ScrapeUnblocker Documentation

Parsed data extraction

For most scraping workloads you don’t want raw HTML - you want clean JSON with the fields that matter. Pass parsed_data=true to /getPageSource and ScrapeUnblocker extracts structured data using the best available method for that page.

Request

curl -X POST "https://api.scrapeunblocker.com/getPageSource?url=https://www.amazon.com/dp/B08N5WRWNW&parsed_data=true" \
  -H "x-scrapeunblocker-key: YOUR_API_KEY"
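The same request can be sketched in Python using only the standard library. The endpoint, query parameters, and header name come from the curl call above. The helper names `build_request` and `fetch_parsed`, the 60-second timeout, and the choice to percent-encode the target URL are assumptions made for illustration, not part of the documented API.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.scrapeunblocker.com/getPageSource"


def build_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    # urlencode percent-encodes the target URL, so a "?" or "&" inside it
    # cannot be mistaken for part of the API's own query string.
    query = urllib.parse.urlencode({"url": target_url, "parsed_data": "true"})
    return urllib.request.Request(
        f"{API_BASE}?{query}",
        method="POST",
        headers={"x-scrapeunblocker-key": api_key},
    )


def fetch_parsed(target_url: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (timeout is an assumed value)."""
    request = build_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_parsed("https://www.amazon.com/dp/B08N5WRWNW", "YOUR_API_KEY")` would then be expected to return the JSON document described under Response shape.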

Response shape

{
  "data": {
    "page_type": "product",
    "source": "schema_org",
    "data": {
      "title": "Echo Dot (4th Gen)",
      "price": "49.99",
      "currency": "USD",
      "brand": "Amazon",
      "availability": "InStock",
      "rating": 4.7,
      "review_count": 123456
    }
  }
}
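Note that `data` appears twice: the outer `data` is the envelope, and the inner `data` holds the extracted fields. A minimal sketch of unwrapping it, assuming the response has exactly the shape shown above (`unwrap` is a hypothetical helper name):

```python
import json

# Sample copied from the response shape above.
SAMPLE = """
{
  "data": {
    "page_type": "product",
    "source": "schema_org",
    "data": {
      "title": "Echo Dot (4th Gen)",
      "price": "49.99",
      "currency": "USD",
      "brand": "Amazon",
      "availability": "InStock",
      "rating": 4.7,
      "review_count": 123456
    }
  }
}
"""


def unwrap(response: dict) -> tuple[str, str, dict]:
    """Return (page_type, source, fields) from a parsed-data response."""
    envelope = response["data"]       # outer "data": the envelope
    return envelope["page_type"], envelope["source"], envelope["data"]


page_type, source, fields = unwrap(json.loads(SAMPLE))
print(page_type, source, fields["title"])  # product schema_org Echo Dot (4th Gen)
```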

`page_type`

Detected category of the page. Common values:

product - e-commerce product detail page
listing - search results or category page
article - news, blog, or editorial content
job - job posting
real_estate - property listing
unknown - extractor could not classify the page

`source`

Which extraction strategy produced the data:

Source	What it means
`schema_org`	The page exposed JSON-LD or microdata using schema.org vocabulary. Most reliable.
`next_data`	Extracted from a Next.js `__NEXT_DATA__` `<script>` block. Common on modern e-commerce.
`nuxt_data`	Extracted from a Nuxt `__NUXT__` block.
`og_meta`	Fell back to OpenGraph / Twitter Card meta tags. Limited fields but always normalized.
`ai_rule`	Custom selector rule generated by AI for this domain. Used when no structured data is available.
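One way to use `source` downstream is to record how the data was obtained, for example to sample fallback results for manual review. The sketch below groups the values from the table into two tiers. The grouping and the tier names are an interpretation of the descriptions above, not something the API itself returns.

```python
# Sources that read structured data embedded in the page.
STRUCTURED_SOURCES = frozenset({"schema_org", "next_data", "nuxt_data"})
# Sources the table describes as fallbacks (meta tags or AI-generated selectors).
FALLBACK_SOURCES = frozenset({"og_meta", "ai_rule"})


def source_tier(source: str) -> str:
    """Classify a `source` value; the tier labels are hypothetical."""
    if source in STRUCTURED_SOURCES:
        return "structured"
    if source in FALLBACK_SOURCES:
        return "fallback"
    # Guard against values not listed in the table.
    return "unrecognized"
```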

`data`

The extracted fields. The schema depends on page_type. Field names are normalized across sources: a product always has title and price, regardless of whether source is schema_org or ai_rule.
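In the sample response, price arrives as the string "49.99". A sketch of converting it to an exact decimal before doing arithmetic, assuming price may be a string or a number and may occasionally fail to parse (`price_as_decimal` is a hypothetical helper):

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def price_as_decimal(fields: dict) -> Optional[Decimal]:
    """Convert the normalized price field to Decimal, or None if unusable."""
    raw = fields.get("price")
    if raw is None:
        return None
    try:
        # str() first, so a float such as 49.99 becomes Decimal("49.99")
        # rather than its long binary expansion.
        return Decimal(str(raw))
    except InvalidOperation:
        return None
```

Using `Decimal` rather than `float` avoids rounding surprises when prices are summed or compared.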

When parsed data is the right choice

Use it when you’re scraping a known page type at scale: products, articles, listings, or jobs. It saves you from writing per-site parsers.

Skip it when you need a field the extractor doesn’t expose, or when you need raw HTML for downstream tooling. Fetch the HTML and parse it yourself instead.
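A simple way to make that decision per page is to check whether the fields your pipeline requires actually came back. The sketch below assumes the response shape shown earlier. `missing_fields` and the example field name `sku` are hypothetical.

```python
from typing import Iterable


def missing_fields(response: dict, required: Iterable[str]) -> list[str]:
    """List the required field names that are absent or empty in the response."""
    envelope = response.get("data") or {}
    fields = envelope.get("data") or {}
    return [name for name in required if fields.get(name) in (None, "")]


response = {
    "data": {
        "page_type": "product",
        "source": "schema_org",
        "data": {"title": "Echo Dot (4th Gen)", "price": "49.99"},
    }
}
print(missing_fields(response, ["title", "price", "sku"]))  # ['sku']
```

A non-empty result is a signal to fetch the raw HTML for that page and parse the missing field yourself.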

Combining with `get_cookies`

You can set both parsed_data=true and get_cookies=true on the same request. The response gains a cookies field and a proxy field alongside data:

{
  "data": { "page_type": "product", "source": "schema_org", "data": { ... } },
  "cookies": [ { "name": "session", "value": "...", "domain": "..." } ],
  "proxy": "us"
}
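If you want to reuse the returned cookies in a follow-up request of your own, they can be folded into a `Cookie` header. This is a sketch that assumes each cookie object carries the name, value, and domain keys shown above. The `cookie_header` helper and its optional suffix filter are illustrative, not part of the API.

```python
def cookie_header(response: dict, domain_suffix: str = "") -> str:
    """Join cookies into a Cookie header value, optionally filtered by domain."""
    pairs = [
        f'{cookie["name"]}={cookie["value"]}'
        for cookie in response.get("cookies", [])
        # An empty suffix matches every domain.
        if cookie.get("domain", "").endswith(domain_suffix)
    ]
    return "; ".join(pairs)


example = {
    "cookies": [
        {"name": "session", "value": "abc", "domain": ".example.com"},
        {"name": "t", "value": "1", "domain": "other.net"},
    ],
    "proxy": "us",
}
print(cookie_header(example, "example.com"))  # session=abc
```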

See cookies and sessions.
