Scraper API

Any Webpage to Clean Data

Extract markdown, HTML, or screenshots. AI-ready output with noise removed.

$0.001 per format

Example: https://example.com/blog/article converted to clean data:
{
  "markdown": "# Article Title\n\nClean content...",
  "title": "Article Title",
  "url": "https://example.com/...",
  "cost": 0.001
}
3 formats | $0.001 each | +$0.004 for advanced proxy
Output Formats

Three Formats, One Call

Request any combination. Each format costs $0.001.

Markdown ($0.001)

Clean, readable text with formatting preserved.

Best for: LLM training, RAG pipelines, content analysis.

HTML ($0.001)

Complete page structure with all elements.

Best for: custom parsing, structure preservation.

Screenshot ($0.001)

Full-page PNG capture (base64 encoded).

Best for: visual testing, archiving, change detection.
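The screenshot arrives as a base64 string rather than an image file. Below is a minimal sketch for turning it into a PNG on disk using only the Python standard library. It assumes you have already pulled the base64 string out of the response (the exact field name is not documented on this page), and it strips an optional `data:image/png;base64,` prefix defensively; that prefix is an assumption, not documented behavior.

```python
import base64
import binascii

# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(b64_data: str) -> bytes:
    """Decode a base64-encoded PNG screenshot into raw bytes.

    Assumption: the API returns plain base64. A data-URI prefix such as
    "data:image/png;base64," is stripped defensively in case one appears.
    """
    if b64_data.startswith("data:"):
        # Keep only the payload after the first comma.
        b64_data = b64_data.split(",", 1)[1]
    # Remove any line breaks or spaces so strict validation can be used.
    b64_data = "".join(b64_data.split())
    try:
        raw = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("screenshot is not valid base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


def save_screenshot(b64_data: str, path: str) -> int:
    """Write the decoded PNG to disk and return the number of bytes written."""
    raw = decode_screenshot(b64_data)
    with open(path, "wb") as f:
        return f.write(raw)
```

For example, `save_screenshot(b64_string, "pricing.png")` writes the file and returns its size in bytes. The signature check catches the case where something other than a PNG was decoded by mistake.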

Multi-Format Requests

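Request multiple formats in one API call; each format is billed at $0.001. For example, markdown + screenshot costs $0.002. Enabling the advanced proxy for protected sites adds $0.004 per request, so markdown + screenshot with the proxy costs $0.006.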

Main Content Only: remove ads, navigation, and footers
Advanced Proxy: bypass bot detection
Multi-Format: request all formats in one call
Quick Start

Start Scraping in Minutes

scrape.py (Python)
from llmlayer import LLMLayerClient

client = LLMLayerClient(api_key="...")

# Extract clean markdown
response = client.scrape(
    url="https://example.com/article",
    formats=["markdown"],
    main_content_only=True
)

print(response.markdown)
print(f"Cost: ${response.cost}")
scrape.ts (TypeScript)
import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

const response = await client.scrape({
  url: 'https://example.com/pricing',
  formats: ['markdown', 'screenshot'],
  mainContentOnly: true
});

console.log(response.markdown);
// Screenshot is base64 PNG
Terminal (cURL)
curl -X POST https://api.llmlayer.dev/api/v2/scrape \
  -H "Authorization: Bearer $LLMLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article", "formats": ["markdown"], "main_content_only": true}'
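If you would rather not install the SDK, the cURL call above is an ordinary JSON POST. The sketch below reproduces it with Python's built-in `urllib.request`. The helper names and the 60-second timeout are illustrative choices, not part of the API.

```python
import json
import os
import urllib.request

# Endpoint taken from the cURL example above.
SCRAPE_ENDPOINT = "https://api.llmlayer.dev/api/v2/scrape"


def build_scrape_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(payload: dict, timeout: float = 60.0) -> dict:
    """Send the request using LLMLAYER_API_KEY and parse the JSON response."""
    req = build_scrape_request(os.environ["LLMLAYER_API_KEY"], payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `scrape({"url": "https://example.com/article", "formats": ["markdown"], "main_content_only": True})` with `LLMLAYER_API_KEY` set should return a dict shaped like the sample response at the top of this page (`markdown`, `title`, `url`, `cost`). Non-2xx responses raise `urllib.error.HTTPError`. Keeping request construction separate from sending lets you inspect the request in tests without network access.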
Reference

API Parameters

POST /api/v2/scrape
| Parameter | Type | Default | Description |
|---|---|---|---|
| url* | string | (required) | URL to scrape (must include https://) |
| formats* | array | (required) | ["markdown"], ["html"], ["screenshot"], or any combination |
| main_content_only | boolean | false | Remove nav, headers, footers, sidebars, and ads |
| advanced_proxy | boolean | false | Bypass bot detection (+$0.004/request) |
| include_images | boolean | true | Include image links in markdown output |
| include_links | boolean | true | Include hyperlinks in markdown output |

* Required parameter
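Validating arguments on the client catches typos before they cost a request. Below is a minimal sketch, assuming the table above is the complete set of rules. It follows the URL rule literally (only `https://` URLs pass; relax this if your targets use `http://`), accepts only the three documented formats, and fills in the documented defaults. `build_scrape_payload` is a hypothetical helper, not part of the SDK.

```python
ALLOWED_FORMATS = ("markdown", "html", "screenshot")


def build_scrape_payload(
    url: str,
    formats: list[str],
    *,
    main_content_only: bool = False,
    advanced_proxy: bool = False,
    include_images: bool = True,
    include_links: bool = True,
) -> dict:
    """Validate arguments against the parameter table and return a JSON-ready body."""
    if not url.startswith("https://"):
        raise ValueError("url must include https://")
    if not formats:
        raise ValueError("formats must list at least one format")
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    return {
        "url": url,
        # Drop duplicate formats while keeping the caller's order.
        "formats": list(dict.fromkeys(formats)),
        "main_content_only": main_content_only,
        "advanced_proxy": advanced_proxy,
        "include_images": include_images,
        "include_links": include_links,
    }
```

The returned dict can be serialized with `json.dumps` and sent as the request body exactly as in the cURL example.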

Use Cases

Built For

LLM Training Data

Extract clean markdown from documentation, blogs, and articles. Use main_content_only for noise-free datasets.
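A minimal sketch of turning scrape results into a JSONL training file, one document per line. It assumes each result is a dict shaped like the sample response at the top of this page (`url`, `title`, `markdown`). The `text` output field, the 200-character minimum, and URL-based deduplication are illustrative choices, not API behavior.

```python
import json
from typing import Iterable, TextIO


def write_jsonl(results: Iterable[dict], out: TextIO, min_chars: int = 200) -> int:
    """Write one JSON object per line, skipping near-empty pages and repeated URLs.

    Returns the number of records written.
    """
    written = 0
    seen_urls = set()
    for r in results:
        text = (r.get("markdown") or "").strip()
        url = r.get("url")
        # Skip pages with too little text, and URLs already written.
        if len(text) < min_chars or url in seen_urls:
            continue
        seen_urls.add(url)
        record = {"url": url, "title": r.get("title", ""), "text": text}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written
```

Pass any writable text stream, for example a file opened with `open("dataset.jsonl", "w", encoding="utf-8")`.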

RAG Pipelines

Feed webpage content directly into vector databases. Markdown format preserves structure while removing HTML noise.
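Markdown headings give natural split points before embedding. The sketch below (plain Python, no vector-database client assumed) splits a scraped markdown string into chunks tagged with their nearest heading, packing paragraphs up to a size limit. The 1,000-character default is arbitrary, a single paragraph longer than the limit is kept whole, and `#` lines inside fenced code blocks would be mistaken for headings, so treat it as a starting point.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[dict]:
    """Split markdown into heading-scoped chunks for embedding."""
    chunks = []
    heading = ""
    buf = []

    def flush():
        text = "\n".join(buf).strip()
        buf.clear()
        paras = [p.strip() for p in text.split("\n\n") if p.strip()]
        # Pack paragraphs into pieces no longer than max_chars where possible.
        piece = ""
        for para in paras:
            candidate = f"{piece}\n\n{para}" if piece else para
            if piece and len(candidate) > max_chars:
                chunks.append({"heading": heading, "text": piece})
                piece = para
            else:
                piece = candidate
        if piece:
            chunks.append({"heading": heading, "text": piece})

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            flush()  # close the section under the previous heading
            heading = m.group(1).strip()
        else:
            buf.append(line)
    flush()
    return chunks
```

Each chunk's `heading` can be stored as metadata next to its embedding so retrieved passages keep their section context.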

Change Detection

Monitor competitor pages with screenshots. Compare visual changes over time for pricing or content updates.
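One simple way to detect changes is to fingerprint each screenshot and compare it with the previous capture of the same URL. This sketch hashes the decoded PNG bytes with SHA-256 and keeps the last fingerprint per URL in a dict (in practice you would persist it). Exact hashing flags any pixel-level difference, so pages with rotating ads or timestamps will register as changed; tolerating small differences would need a perceptual or pixel-diff comparison instead. The function names are hypothetical.

```python
import base64
import hashlib


def screenshot_fingerprint(b64_png: str) -> str:
    """Return the SHA-256 hex digest of the decoded screenshot bytes."""
    raw = base64.b64decode("".join(b64_png.split()))
    return hashlib.sha256(raw).hexdigest()


def detect_change(history: dict[str, str], url: str, b64_png: str) -> bool:
    """Store the latest fingerprint for url; return True if it differs from the last one.

    The first capture of a URL returns False, since there is nothing to compare yet.
    """
    fp = screenshot_fingerprint(b64_png)
    previous = history.get(url)
    history[url] = fp
    return previous is not None and previous != fp
```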

Content Aggregation

Build news aggregators or research tools. Extract and normalize content from diverse sources.

Pricing

Simple, Predictable

Per Request

Each format (markdown, html, screenshot): $0.001
Advanced proxy (optional): +$0.004 per request

Examples

Markdown only: $0.001
Markdown + HTML: $0.002
All 3 formats: $0.003
Markdown + proxy: $0.005
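The pricing rules reduce to a small formula: $0.001 times the number of formats, plus $0.004 when the advanced proxy is on. The sketch below encodes that with `Decimal` to avoid floating-point rounding in dollar amounts and extends it to a batch estimate. Counting a repeated format only once is an assumption; the page does not say how duplicates are billed.

```python
from decimal import Decimal

# Prices from the table above, in US dollars.
PRICE_PER_FORMAT = Decimal("0.001")
ADVANCED_PROXY_FEE = Decimal("0.004")
FORMATS = {"markdown", "html", "screenshot"}


def request_cost(formats: list[str], advanced_proxy: bool = False) -> Decimal:
    """Cost of one request: $0.001 per distinct format, plus $0.004 with the proxy."""
    requested = set(formats)
    unknown = requested - FORMATS
    if unknown:
        raise ValueError(f"unknown formats: {sorted(unknown)}")
    cost = PRICE_PER_FORMAT * len(requested)
    if advanced_proxy:
        cost += ADVANCED_PROXY_FEE
    return cost


def batch_cost(num_urls: int, formats: list[str], advanced_proxy: bool = False) -> Decimal:
    """Estimated cost of scraping num_urls pages with the same options."""
    return request_cost(formats, advanced_proxy) * num_urls
```

For instance, scraping 1,000 pages as markdown + screenshot comes to `batch_cost(1000, ["markdown", "screenshot"])`, which is $2.000.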

Start Extracting Today

Get your API key and start scraping in minutes. Free credits to start. No credit card required.