If you already have a Scrapy project, the scrapeunblocker-scrapy-middleware package lets you keep your spiders unchanged: every Request is silently rewritten to go through /getPageSource, with HTML returned to your callbacks as if the spider had fetched the URL directly.
The middleware is maintained at github.com/scrapeunblocker/scrapeunblocker-scrapy-middleware. For the most up-to-date install and configuration instructions, check the README there.
Install
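Assuming the package is published on PyPI under the same name as the repository (check the README if the command differs):

```shell
# Assumed PyPI package name -- see the repository README if this differs
pip install scrapeunblocker-scrapy-middleware
```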
Enable in settings.py
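A minimal sketch of the settings. The middleware import path, the priority value, and the API-key setting name are assumptions; SCRAPEUNBLOCKER_DEFAULTS is described on this page, but the option names inside it are illustrative:

```python
# settings.py -- minimal sketch; the middleware path, priority, and all
# setting names except SCRAPEUNBLOCKER_DEFAULTS are assumptions.
DOWNLOADER_MIDDLEWARES = {
    "scrapeunblocker.middleware.ScrapeUnblockerMiddleware": 610,  # hypothetical path
}

SCRAPEUNBLOCKER_API_KEY = "your-api-key"        # sent as x-scrapeunblocker-key (name assumed)
SCRAPEUNBLOCKER_DEFAULTS = {"render_js": True}  # option names are illustrative
```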
Per-request overrides
Pass options via Request.meta["scrapeunblocker"] to override defaults for a single request:
What the middleware does
For every outgoing request, the middleware:
- Rewrites the URL to https://api.scrapeunblocker.com/getPageSource?url=<original>.
- Changes the method to POST.
- Adds the x-scrapeunblocker-key header.
- Merges SCRAPEUNBLOCKER_DEFAULTS and meta["scrapeunblocker"] into the query string.
- On response, restores the original URL on the Scrapy Response object so your selectors see the URL you requested, not the proxy URL.
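The rewrite steps above can be sketched in plain Python (a behavioral model, not the middleware's actual code):

```python
from urllib.parse import urlencode

API = "https://api.scrapeunblocker.com/getPageSource"

def rewrite(original_url: str, defaults: dict, meta_opts: dict):
    """Model the rewrite: merge defaults with per-request options
    (per-request wins), append the original URL, switch to POST."""
    params = {**defaults, **meta_opts, "url": original_url}
    return "POST", f"{API}?{urlencode(params)}"

method, url = rewrite("https://example.com/", {"render_js": "true"}, {})
```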
Handling parsed data
When parsed_data=True is set, the response body is JSON, not HTML. Use the convenience accessor:
Retry behavior
The middleware does not override Scrapy's RetryMiddleware. Configure retries in settings.py as you would for any other downloader:
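For example, using Scrapy's stock retry settings (the values here are illustrative):

```python
# settings.py -- standard Scrapy RetryMiddleware configuration;
# nothing here is specific to ScrapeUnblocker.
RETRY_ENABLED = True
RETRY_TIMES = 2                                   # retries after the first attempt
RETRY_HTTP_CODES = [403, 429, 500, 502, 503, 504]
```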
A 403 retried by Scrapy will hit ScrapeUnblocker again, which independently rotates through bypass routes, so a second 403 usually means the target is truly hard-blocked that day. See handling failures.
