How to Scrape Websites Without Getting Blocked (2026 Guide)
Getting blocked while scraping? Here's everything I've learned building production scrapers that extract data from 50+ websites daily.
Why You Get Blocked
1. Bot-like User-Agent: default HTTP library headers scream "I'm a bot"
2. Too fast: 100 requests/second is not human behavior
3. No JavaScript: plain HTTP clients never run the page's scripts, and modern sites also detect headless browsers
4. Fingerprinting: canvas, WebGL, and font fingerprinting
5. Cloudflare/AWS WAF: enterprise anti-bot protection
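To see the first point for yourself, here's a minimal sketch that prints the User-Agent Python's standard-library `urllib` sends when you don't set one. The helper name `default_user_agent` is only for illustration.

```python
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent urllib attaches when you set no headers."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs sent with every request
    return dict(opener.addheaders)["User-agent"]


if __name__ == "__main__":
    print(default_user_agent())
```

The value starts with `Python-urllib/` followed by your Python version, which is trivial for a server to match and block.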
Solution 1: Use a Headless Browser
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://target-site.com")

    # Wait for JS to render
    page.wait_for_load_state("networkidle")

    # Extract data
    for item in page.query_selector_all(".product-card"):
        title = item.query_selector(".title")
        price = item.query_selector(".price")
        # Skip cards that are missing either field
        if title and price:
            print(f"{title.text_content()}: {price.text_content()}")

    browser.close()
```
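A default headless launch typically announces itself (the User-Agent usually contains `HeadlessChrome`), so it can help to give the browser context consistent, ordinary-looking settings. Here's a rough sketch: the User-Agent string, viewport, locale, and timezone are example values I picked, and `context_options` and `fetch_titles` are hypothetical helper names. This won't beat serious fingerprinting on its own.

```python
from typing import Any, Dict, List

# Example values only; keep them consistent with each other and with your IP's region.
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def context_options(locale: str = "en-US",
                    timezone_id: str = "America/New_York") -> Dict[str, Any]:
    """Keyword arguments for Playwright's browser.new_context()."""
    return {
        "user_agent": DESKTOP_UA,
        "viewport": {"width": 1366, "height": 768},
        "locale": locale,
        "timezone_id": timezone_id,
    }


def fetch_titles(url: str, selector: str = ".product-card .title") -> List[str]:
    """Open the page in a configured context and return matching text."""
    # Imported here so the helper above works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**context_options())
        page = context.new_page()
        page.goto(url)
        titles = page.locator(selector).all_text_contents()
        browser.close()
        return titles
```

Call it as `fetch_titles("https://target-site.com")`. The point of the separate options function is that every page you open shares one coherent identity instead of a mix of defaults.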
Solution 2: Realistic Headers
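```python
import requests

headers = {
    # Send a complete browser User-Agent (the Chrome version here is an example)
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    # Add "br" only if the brotli package is installed; otherwise requests can't decode the body
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}
response = requests.get("https://target-site.com", headers=headers, timeout=10)
```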
Solution 3: Rate Limiting
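```python
import random
import time

import requests

# Placeholder URLs; `headers` is the dict from Solution 2
urls = ["https://target-site.com/page/1", "https://target-site.com/page/2"]

for url in urls:
    response = requests.get(url, headers=headers, timeout=10)
    # Random delay between 2 and 5 seconds
    time.sleep(random.uniform(2, 5))
```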
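A fixed random delay doesn't react when the site pushes back. One common pattern is to back off when you see a 429 or 503 and to honor `Retry-After` if the server sends it. Here's a minimal sketch of that idea, not a drop-in library: `backoff_delay` and `fetch_with_backoff` are hypothetical names, the `fetch` callable is whatever you wrap around your HTTP client, and only the seconds form of `Retry-After` is handled.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status code, headers, body)
RETRY_STATUSES = (429, 503)


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0,
                  retry_after: Optional[str] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # The server said how long to wait, so do what it asks
        return float(retry_after)
    # Exponential growth with full jitter so retries don't line up
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_backoff(fetch: Callable[[str], Response], url: str,
                       max_retries: int = 4,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call fetch(url), retrying on 429/503 up to max_retries times."""
    attempt = 0
    while True:
        response = fetch(url)
        status, headers, _body = response
        if status not in RETRY_STATUSES or attempt >= max_retries:
            return response
        sleep(backoff_delay(attempt, retry_after=headers.get("Retry-After")))
        attempt += 1


if __name__ == "__main__":
    # Fake fetch that is throttled twice, then succeeds
    canned = iter([(429, {"Retry-After": "1"}, ""), (503, {}, ""), (200, {}, "ok")])
    result = fetch_with_backoff(lambda u: next(canned), "https://target-site.com",
                                sleep=lambda s: print(f"waiting {s:.1f}s"))
    print(result[0])
```

With `requests`, the `fetch` argument can be a small function that calls `requests.get` and returns `(r.status_code, r.headers, r.text)`. Passing `sleep` as a parameter keeps the retry logic easy to test without actually waiting.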
Solution 4: Use an API Instead
For common tasks like screenshots, metadata extraction, or text extraction, use an API and skip the scraping entirely:
```bash
# Screenshot any website
curl -X POST https://api.16761.tech/screenshot \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"url":"https://target-site.com"}' \
  -o screenshot.png

# Extract metadata (title, OG tags, favicon)
curl -X POST https://api.16761.tech/metadata \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"url":"https://target-site.com"}'

# Extract clean text
curl -X POST https://api.16761.tech/text-extract \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"url":"https://target-site.com"}'
```
Free: 100 requests/day. Get API key →
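If you'd rather call these endpoints from Python, the curl commands translate roughly as follows. This sketch uses only the standard library and assumes the API accepts exactly the body and header the curl examples send; `build_request` and `call_api` are hypothetical helper names, and the response format is not something I'm asserting here.

```python
import json
import urllib.request

API_BASE = "https://api.16761.tech"


def build_request(endpoint: str, target_url: str, api_key: str) -> urllib.request.Request:
    """Build the same POST the curl examples send, without sending it."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}",
        data=body,
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def call_api(endpoint: str, target_url: str, api_key: str,
             timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response bytes."""
    request = build_request(endpoint, target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Mirrors the screenshot curl call, including writing the output to a file
    png = call_api("screenshot", "https://target-site.com", "YOUR_KEY")
    with open("screenshot.png", "wb") as f:
        f.write(png)
```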
When to DIY vs Hire Someone
DIY if:
- Simple, static HTML pages
- One-time data collection
- You enjoy debugging anti-bot systems
Hire if:
- JavaScript-heavy sites (React, Vue, Angular)
- Anti-bot protection (Cloudflare, AWS WAF, Akamai)
- Need ongoing/scheduled scraping
- Time is more valuable than money
I offer custom scraping services starting at $300. Details →
Related tools:
- Scraping Demo: see what data I can extract from any URL
- Domain Tech Checker: check what tech a site uses before scraping
- 10 Developer APIs: screenshot, PDF, text extraction, and more