Scraper API

Any Webpage to Clean Data

Extract markdown, HTML, or screenshots. AI-ready output with noise removed.

$0.001 per format

Webpage

https://example.com/blog/article

Clean Data

{
  "markdown": "# Article Title\n\nClean content...",
  "title": "Article Title",
  "url": "https://example.com/...",
  "cost": 0.001
}

Formats: $0.001 each

Proxy: +$0.004

Output Formats

Three Formats, One Call

Request any combination. Each format costs $0.001.

$0.001

Markdown

Clean, readable text with formatting preserved

Best For

LLM training, RAG pipelines, content analysis

$0.001

HTML

Complete page structure with all elements

Best For

Custom parsing, structure preservation

$0.001

Screenshot

Full-page PNG capture (base64 encoded)

Best For

Visual testing, archiving, change detection
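Because the screenshot arrives as a base64-encoded PNG, it has to be decoded before it can be viewed or archived. A minimal sketch using only the Python standard library follows. The helper name `save_screenshot` is hypothetical, and the name of the response field that carries the base64 string is not stated on this page, so the function simply takes the string as an argument.

```python
import base64
from pathlib import Path

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(b64_png: str, path: str) -> int:
    """Decode a base64 PNG string and write it to `path`.

    Hypothetical helper, not part of the llmlayer SDK.
    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_png)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG")
    return Path(path).write_bytes(raw)
```

Checking the PNG signature before writing catches the common mistake of passing the wrong field, or an error message, to the decoder.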

Multi-Format Requests

Request multiple formats in one API call. For example, markdown + screenshot costs $0.002. Enabling the advanced proxy for protected sites adds $0.004 per request.
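The pricing arithmetic above can be sketched as a small estimator. This is an illustration only: `estimate_cost` is a hypothetical helper, the prices are the ones quoted on this page, and counting a repeated format only once is an assumption rather than documented billing behavior.

```python
from decimal import Decimal

# Prices quoted on this page; Decimal avoids float rounding noise.
FORMAT_PRICE = Decimal("0.001")
PROXY_SURCHARGE = Decimal("0.004")
VALID_FORMATS = {"markdown", "html", "screenshot"}


def estimate_cost(formats, advanced_proxy=False):
    """Estimate the cost of one scrape request in USD (hypothetical helper)."""
    unique = set(formats)  # assumption: a repeated format is billed once
    if not unique:
        raise ValueError("at least one format is required")
    unknown = unique - VALID_FORMATS
    if unknown:
        raise ValueError(f"unknown formats: {sorted(unknown)}")
    cost = FORMAT_PRICE * len(unique)
    if advanced_proxy:
        cost += PROXY_SURCHARGE  # flat surcharge per request
    return cost


print(estimate_cost(["markdown", "screenshot"]))  # 0.002
```

The `cost` field returned by the API remains the authoritative figure; an estimator like this is only useful for budgeting before a batch run.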

Main Content Only

Remove ads, nav, footers

Advanced Proxy

Bypass bot detection

Multi-Format

Request all in one call

Quick Start

Start Scraping in Minutes

scrape.py

PYTHON

from llmlayer import LLMLayerClient

client = LLMLayerClient(api_key="...")

# Extract clean markdown
response = client.scrape(
    url="https://example.com/article",
    formats=["markdown"],
    main_content_only=True
)

print(response.markdown)
print(f"Cost: ${response.cost}")

scrape.ts

TYPESCRIPT

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

const response = await client.scrape({
  url: 'https://example.com/pricing',
  formats: ['markdown', 'screenshot'],
  mainContentOnly: true
});

console.log(response.markdown);
// Screenshot is base64 PNG

Terminal

CURL

curl -X POST https://api.llmlayer.dev/api/v2/scrape \
  -H "Authorization: Bearer $LLMLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article", "formats": ["markdown"], "main_content_only": true}'

Reference

API Parameters

POST /api/v2/scrape

PARAMETER	TYPE	DEFAULT	DESCRIPTION
`url`*	string	—	URL to scrape (must include https://)
`formats`*	array	—	Any combination of "markdown", "html", and "screenshot"
`main_content_only`	boolean	false	Remove nav, headers, footers, sidebars, ads
`advanced_proxy`	boolean	false	Bypass bot detection (+$0.004/request)
`include_images`	boolean	true	Include image links in markdown output
`include_links`	boolean	true	Include hyperlinks in markdown output
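For callers who prefer not to use the SDK, the parameters above can be assembled into a raw HTTP request. The sketch below uses only the Python standard library and mirrors the cURL example; `build_scrape_request` is a hypothetical helper, and the client-side validation is an assumption based on the table, not documented server behavior.

```python
import json
import urllib.request

API_URL = "https://api.llmlayer.dev/api/v2/scrape"
VALID_FORMATS = {"markdown", "html", "screenshot"}
OPTIONAL_FLAGS = {"main_content_only", "advanced_proxy",
                  "include_images", "include_links"}


def build_scrape_request(url, formats, api_key, **flags):
    """Build (but do not send) a POST request for the scrape endpoint."""
    if not url.startswith("https://"):
        raise ValueError("url must include https://")
    if not formats or set(formats) - VALID_FORMATS:
        raise ValueError("formats must be a non-empty subset of "
                         f"{sorted(VALID_FORMATS)}")
    unknown = set(flags) - OPTIONAL_FLAGS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Omitted flags fall back to the server-side defaults in the table.
    body = {"url": url, "formats": list(formats), **flags}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen` and parse the body with `json.load`. The response is presumably shaped like the JSON sample near the top of this page (`markdown`, `title`, `url`, `cost`), though only those fields are confirmed here.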