For most scraping workloads you don’t want raw HTML - you want clean JSON with the fields that matter. Pass parsed_data=true to /getPageSource and ScrapeUnblocker extracts structured data using the best available method for that page.

Request

curl -X POST "https://api.scrapeunblocker.com/getPageSource?url=https://www.amazon.com/dp/B08N5WRWNW&parsed_data=true" \
  -H "x-scrapeunblocker-key: YOUR_API_KEY"
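If you call the endpoint from Python instead of curl, URL-encode the target page. Otherwise any query string it carries could be mistaken for parameters of /getPageSource. The sketch below is minimal and uses only the standard library. build_request is a hypothetical helper name. It assumes the API decodes percent-encoded query values the way standard HTTP servers do. The helper builds the request but does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.scrapeunblocker.com/getPageSource"


def build_request(api_key: str, target_url: str, parsed_data: bool = True) -> Request:
    """Build (but do not send) a POST to /getPageSource.

    build_request is a hypothetical helper; the endpoint, header name,
    and parsed_data flag come from the curl example above.
    """
    params = {"url": target_url}
    if parsed_data:
        # Match the lowercase "true" used in the curl example
        # (str(True) in Python would give "True").
        params["parsed_data"] = "true"
    # urlencode percent-encodes the target URL, so "?" or "&" inside it
    # cannot leak into the outer query string.
    query = urlencode(params)
    return Request(
        f"{API_BASE}?{query}",
        method="POST",
        headers={"x-scrapeunblocker-key": api_key},
    )
```

To send the request, pass it to urllib.request.urlopen and decode the body with json.loads.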

Response shape

{
  "data": {
    "page_type": "product",
    "source": "schema_org",
    "data": {
      "title": "Echo Dot (4th Gen)",
      "price": "49.99",
      "currency": "USD",
      "brand": "Amazon",
      "availability": "InStock",
      "rating": 4.7,
      "review_count": 123456
    }
  }
}
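The result is nested two levels deep. The outer data object holds the page metadata, and the inner data object holds the extracted fields. The minimal sketch below unwraps the response into a dataclass. ParsedPage and parse_response are hypothetical names. The sketch assumes the envelope always has the shape shown in the example above.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ParsedPage:
    page_type: str
    source: str
    fields: dict[str, Any]


def parse_response(body: str) -> ParsedPage:
    """Unwrap the outer "data" envelope returned with parsed_data=true."""
    envelope = json.loads(body)["data"]
    return ParsedPage(
        page_type=envelope.get("page_type", "unknown"),
        source=envelope.get("source", ""),
        # The inner "data" object carries the extracted fields.
        fields=envelope.get("data", {}),
    )
```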

page_type

Detected category of the page. Common values:
  • product - e-commerce product detail page
  • listing - search results or category page
  • article - news, blog, or editorial content
  • job - job posting
  • real_estate - property listing
  • unknown - extractor could not classify the page
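These are common values, not a closed set, so client code should tolerate page types it has not seen. The minimal sketch below buckets results by page_type. Any value outside the documented list is treated like unknown; a caller could instead log such values separately. group_by_page_type is a hypothetical helper. Each result is assumed to be the outer data object from the response.

```python
from collections import defaultdict
from typing import Any, Iterable

# Values listed in the documentation; the API may return others.
KNOWN_PAGE_TYPES = {"product", "listing", "article", "job", "real_estate"}


def group_by_page_type(results: Iterable[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket parsed results by page_type, folding unrecognized types into "unknown"."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for result in results:
        page_type = result.get("page_type", "unknown")
        if page_type not in KNOWN_PAGE_TYPES:
            # Covers the explicit "unknown", a missing key, and undocumented values.
            page_type = "unknown"
        groups[page_type].append(result)
    return dict(groups)
```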

source

Which extraction strategy produced the data:
  • schema_org - The page exposed JSON-LD or microdata using the schema.org vocabulary. Most reliable.
  • next_data - Extracted from a Next.js __NEXT_DATA__ <script> block. Common on modern e-commerce sites.
  • nuxt_data - Extracted from a Nuxt __NUXT__ block.
  • og_meta - Fell back to OpenGraph / Twitter Card meta tags. Limited fields, but always normalized.
  • ai_rule - A custom selector rule generated by AI for this domain. Used when no structured data is available.
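The source value can serve as a quality signal. The documentation identifies schema_org as the most reliable strategy and notes that og_meta yields limited fields. Beyond that, how much to trust each strategy is your decision. The sketch below encodes one hypothetical policy: accept embedded structured data outright, and flag everything else for spot-checking. TRUSTED_SOURCES and needs_review are illustrative names, not part of the API.

```python
# Hypothetical policy: this grouping is a caller's choice, not an API guarantee.
TRUSTED_SOURCES = {"schema_org", "next_data", "nuxt_data"}


def needs_review(source: str) -> bool:
    """Return True when a result should be spot-checked before use.

    og_meta and ai_rule are flagged, and so is any source value not
    listed above (e.g. a strategy added later), erring on the side of caution.
    """
    return source not in TRUSTED_SOURCES
```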

data

The extracted fields. Schema depends on page_type. Field names are normalized across sources - a product always has title and price regardless of whether source is schema_org or ai_rule.
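Because field names are normalized, a single validator can cover every source. The documentation guarantees title and price for products. The sketch below checks those two fields. It also converts price, which the example returns as a string, to Decimal to avoid float rounding. REQUIRED_FIELDS and normalize_product are hypothetical names. Required fields for other page types are not documented here, so they are left for you to fill in.

```python
from decimal import Decimal, InvalidOperation
from typing import Any

# Only the product guarantee (title, price) is documented; extend as needed.
REQUIRED_FIELDS = {"product": ("title", "price")}


def normalize_product(fields: dict[str, Any]) -> dict[str, Any]:
    """Validate a product's fields and parse its price into a Decimal."""
    missing = [f for f in REQUIRED_FIELDS["product"] if fields.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing product fields: {missing}")
    try:
        price = Decimal(str(fields["price"]))
    except InvalidOperation as exc:
        raise ValueError(f"unparseable price: {fields['price']!r}") from exc
    # Return a copy with price replaced; other fields pass through unchanged.
    return {**fields, "price": price}
```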

When parsed data is the right choice

Use it when you’re scraping a known page type at scale (products, articles, listings, jobs). It saves you from writing per-site parsers.
Skip it when you need a field the extractor doesn’t expose, or when you need raw HTML for downstream tooling. In those cases, fetch the raw HTML and parse it yourself.
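You can also make this choice per request. Try parsed data first, and fall back to raw HTML only when a field you need is missing. The sketch below is minimal. fetch_fields is a hypothetical helper. fetch stands in for whatever client you use to call /getPageSource; it takes the target URL and a parsed_data flag. The shape of the raw-HTML response is not covered on this page, so parse_html is also a placeholder you supply.

```python
from typing import Any, Callable

# Hypothetical client signature: (target_url, parsed_data) -> response
Fetch = Callable[[str, bool], Any]


def fetch_fields(
    url: str,
    wanted: set[str],
    fetch: Fetch,
    parse_html: Callable[[Any], dict[str, Any]],
) -> dict[str, Any]:
    """Return the wanted fields, preferring parsed_data and falling back to raw HTML."""
    parsed = fetch(url, True)
    fields = parsed.get("data", {}).get("data", {})
    if wanted.issubset(fields):
        return {k: fields[k] for k in wanted}
    # The extractor did not expose every wanted field: fetch the page
    # without parsed_data and extract the fields with your own parser.
    raw = fetch(url, False)
    extracted = parse_html(raw)
    return {k: extracted.get(k) for k in wanted}
```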

Combining with get_cookies

You can set both parsed_data=true and get_cookies=true on the same request. The response gains a cookies field and a proxy field alongside data:
{
  "data": { "page_type": "product", "source": "schema_org", "data": { ... } },
  "cookies": [ { "name": "session", "value": "...", "domain": "..." } ],
  "proxy": "us"
}
See cookies and sessions.
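To replay a session in a follow-up request that you make yourself, you can flatten the cookies array into a Cookie header. The sketch below is minimal. cookie_header is a hypothetical helper, and it reads only the name, value, and domain keys shown above. Its host filter is a simplified suffix match, not a full RFC 6265 implementation, and it ignores paths. The proxy field is not used here.

```python
from typing import Any, Iterable, Optional


def cookie_header(cookies: Iterable[dict[str, Any]], host: Optional[str] = None) -> str:
    """Join returned cookies into a "name=value; name2=value2" Cookie header.

    When host is given, keep only cookies whose domain equals the host or
    is a parent domain of it (simplified matching; paths are ignored).
    """
    pairs = []
    for c in cookies:
        if "name" not in c or "value" not in c:
            continue
        if host is not None:
            domain = str(c.get("domain", "")).lstrip(".").lower()
            h = host.lower()
            if not (h == domain or h.endswith("." + domain)):
                continue
        pairs.append(f"{c['name']}={c['value']}")
    return "; ".join(pairs)
```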