# Parsed data extraction

> Get structured JSON instead of raw HTML. Extraction uses Schema.org, `__NEXT_DATA__`, or AI-generated rules.

For most scraping workloads you want clean JSON with the fields that matter, not raw HTML. Pass `parsed_data=true` to `/getPageSource` and ScrapeUnblocker extracts structured data using the best extraction method available for that page.

## Request

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST "https://api.scrapeunblocker.com/getPageSource?url=https://www.amazon.com/dp/B08N5WRWNW&parsed_data=true" \
    -H "x-scrapeunblocker-key: YOUR_API_KEY"
  ```

  ```python Python theme={null}
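  import requests

  r = requests.post(
      "https://api.scrapeunblocker.com/getPageSource",
      params={
          "url": "https://www.amazon.com/dp/B08N5WRWNW",
          # requests serializes True as "True"; send the lowercase form used elsewhere on this page.
          "parsed_data": "true",
      },
      headers={"x-scrapeunblocker-key": "YOUR_API_KEY"},
      timeout=120,
  )
  r.raise_for_status()  # surface HTTP errors before parsing the body
  payload = r.json()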
  ```

  ```javascript Node.js theme={null}
  const res = await fetch(
    "https://api.scrapeunblocker.com/getPageSource?url=https://www.amazon.com/dp/B08N5WRWNW&parsed_data=true",
    {
      method: "POST",
      headers: { "x-scrapeunblocker-key": "YOUR_API_KEY" },
    }
  );
  const payload = await res.json();
  ```
</CodeGroup>
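
The cURL example places the target URL directly in the query string, which works here because this Amazon URL has no query parameters of its own. A target containing `?` or `&` would need percent-encoding so that its parameters are not read as parameters of `/getPageSource`. The following is a minimal sketch using only the standard library; `build_request_url` is a hypothetical helper, not part of any ScrapeUnblocker SDK.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scrapeunblocker.com/getPageSource"


def build_request_url(target_url: str, parsed_data: bool = True) -> str:
    """Return the endpoint URL with the target percent-encoded into `url`.

    Hypothetical helper for illustration only.
    """
    query = {"url": target_url}
    if parsed_data:
        # Lowercase "true" matches the parsed_data=true form used on this page.
        query["parsed_data"] = "true"
    return f"{API_ENDPOINT}?{urlencode(query)}"


if __name__ == "__main__":
    # The target's own "?" and "&" are encoded, so they stay inside `url`.
    print(build_request_url("https://example.com/search?q=echo&page=2"))
```

The `requests` example above gets the same encoding for free, because `params=` is encoded by the library.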

## Response shape

```json theme={null}
{
  "data": {
    "page_type": "product",
    "source": "schema_org",
    "data": {
      "title": "Echo Dot (4th Gen)",
      "price": "49.99",
      "currency": "USD",
      "brand": "Amazon",
      "availability": "InStock",
      "rating": 4.7,
      "review_count": 123456
    }
  }
}
```
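
Note the two levels of `data`: the outer key wraps the extraction result, and the inner key holds the extracted fields. A small sketch of unpacking that envelope, assuming the response has exactly the shape shown above (`unpack_parsed` is a hypothetical name):

```python
from typing import Any


def unpack_parsed(payload: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    """Split a parsed-data response into (page_type, source, fields)."""
    envelope = payload["data"]  # outer "data": the extraction result
    fields = envelope["data"]   # inner "data": the extracted fields
    return envelope["page_type"], envelope["source"], fields


# Trimmed copy of the sample response above.
sample = {
    "data": {
        "page_type": "product",
        "source": "schema_org",
        "data": {"title": "Echo Dot (4th Gen)", "price": "49.99", "currency": "USD"},
    }
}

if __name__ == "__main__":
    page_type, source, fields = unpack_parsed(sample)
    print(page_type, source, fields["title"])
```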

### `page_type`

Detected category of the page. Common values:

* `product` - e-commerce product detail page
* `listing` - search results or category page
* `article` - news, blog, or editorial content
* `job` - job posting
* `real_estate` - property listing
* `unknown` - extractor could not classify the page
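
Because the list above is described as common values rather than a closed set, client code may want a default branch for `unknown` and for any value it does not recognize. One way to sketch that routing; the handler names are hypothetical, and the use of `.get("data")` reflects an assumption that the inner `data` could be absent on unclassified pages:

```python
from typing import Any, Callable

Fields = dict[str, Any]


def handle_product(fields: Fields) -> str:
    return f"product: {fields.get('title')}"


def handle_unclassified(fields: Fields) -> str:
    # Reached for "unknown" and for any page_type without a registered handler.
    return "unclassified"


# Register one handler per page_type you care about.
HANDLERS: dict[str, Callable[[Fields], str]] = {"product": handle_product}


def route(payload: dict[str, Any]) -> str:
    """Dispatch the extracted fields to the handler for the page's type."""
    envelope = payload["data"]
    handler = HANDLERS.get(envelope["page_type"], handle_unclassified)
    # Assumption: inner "data" may be missing, so default to an empty dict.
    return handler(envelope.get("data") or {})
```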

### `source`

Which extraction strategy produced the data:

| Source       | What it means                                                                                           |
| ------------ | ------------------------------------------------------------------------------------------------------- |
| `schema_org` | The page exposed JSON-LD or microdata using [schema.org](https://schema.org) vocabulary. Most reliable. |
| `next_data`  | Extracted from a Next.js `__NEXT_DATA__` `<script>` block. Common on modern e-commerce.                 |
| `nuxt_data`  | Extracted from a Nuxt `__NUXT__` block.                                                                 |
| `og_meta`    | Fell back to OpenGraph / Twitter Card meta tags. Limited fields but always normalized.                  |
| `ai_rule`    | Custom selector rule generated by AI for this domain. Used when no structured data is available.        |
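
When scraping in bulk, it can be useful to track which strategy produced each result, for example to see how often a domain falls back to `og_meta` and its limited fields. A minimal sketch that tallies `source` over a batch of responses (`tally_sources` is a hypothetical helper):

```python
from collections import Counter
from typing import Any, Iterable


def tally_sources(payloads: Iterable[dict[str, Any]]) -> Counter:
    """Count how many responses each extraction strategy produced."""
    return Counter(payload["data"]["source"] for payload in payloads)


if __name__ == "__main__":
    # Illustrative batch; only the "source" key matters for the tally.
    batch = [
        {"data": {"page_type": "product", "source": "schema_org", "data": {}}},
        {"data": {"page_type": "product", "source": "og_meta", "data": {}}},
        {"data": {"page_type": "product", "source": "schema_org", "data": {}}},
    ]
    print(tally_sources(batch).most_common())
```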

### `data`

The extracted fields. The schema depends on `page_type`. Field names are normalized across sources: a `product` always has `title` and `price`, whether `source` is `schema_org` or `ai_rule`.
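
In the sample response, `price` is a string (`"49.99"`) while `rating` is a number, so arithmetic on prices needs a conversion first. A sketch of that conversion, assuming `price` is a plain decimal string as in the sample; whether every site yields that format is not stated here, so the helper returns `None` on anything it cannot parse. `product_price` is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation
from typing import Any, Optional


def product_price(fields: dict[str, Any]) -> Optional[Decimal]:
    """Return the product price as a Decimal, or None if it cannot be parsed."""
    raw = fields.get("price")
    if raw is None:
        return None
    try:
        # Decimal avoids the rounding surprises of float for money values.
        return Decimal(str(raw))
    except InvalidOperation:
        return None


if __name__ == "__main__":
    print(product_price({"title": "Echo Dot (4th Gen)", "price": "49.99"}))
```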

## When parsed data is the right choice

<Check>**Use it when** you're scraping a known page type at scale, such as products, articles, listings, or jobs. It saves you from writing per-site parsers.</Check>
<Warning>**Skip it when** you need a field the extractor doesn't expose, or when you need raw HTML for downstream tooling. Fetch the HTML and parse it yourself instead.</Warning>
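
One way to apply this rule in code is to check each parsed response for the fields your pipeline requires and fall back to fetching raw HTML only when something is missing. The sketch below covers the check alone; `missing_fields` and the `sku` field are hypothetical, chosen purely for illustration.

```python
from typing import Any, Iterable


def missing_fields(payload: dict[str, Any], required: Iterable[str]) -> list[str]:
    """List the required field names that are absent or empty in the response."""
    fields = (payload.get("data") or {}).get("data") or {}
    return [name for name in required if fields.get(name) in (None, "")]


if __name__ == "__main__":
    payload = {
        "data": {
            "page_type": "product",
            "source": "schema_org",
            "data": {"title": "Echo Dot (4th Gen)", "price": "49.99"},
        }
    }
    # "sku" is a made-up requirement; a non-empty result signals a fallback.
    print(missing_fields(payload, ["title", "price", "sku"]))
```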

## Combining with `get_cookies`

You can set both `parsed_data=true` and `get_cookies=true` on the same request. The response gains a `cookies` field and a `proxy` field alongside `data`:

```json theme={null}
{
  "data": { "page_type": "product", "source": "schema_org", "data": { ... } },
  "cookies": [ { "name": "session", "value": "...", "domain": "..." } ],
  "proxy": "us"
}
```

See [cookies and sessions](/guides/cookies-and-sessions).
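
As a rough illustration of consuming the combined response, the sketch below turns the `cookies` array into a `Cookie` request header value. It assumes each entry carries `name` and `value` keys as in the example above; `cookie_header` is a hypothetical helper, and the cookie values are placeholders.

```python
from typing import Any


def cookie_header(payload: dict[str, Any]) -> str:
    """Join the returned cookies into a single Cookie header value."""
    cookies = payload.get("cookies") or []
    return "; ".join(f"{cookie['name']}={cookie['value']}" for cookie in cookies)


if __name__ == "__main__":
    # Placeholder values; real responses carry the target site's cookies.
    payload = {
        "data": {"page_type": "product", "source": "schema_org", "data": {}},
        "cookies": [
            {"name": "session", "value": "abc", "domain": "example.com"},
            {"name": "locale", "value": "en", "domain": "example.com"},
        ],
        "proxy": "us",
    }
    print(cookie_header(payload))
```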
