Crawl API

Crawl Entire Websites

Multi-page extraction with real-time streaming. Results arrive as pages complete.

$0.001 per page
SSE Stream
data: {"type":"page","title":"Getting Started"}
data: {"type":"page","title":"Installation"}
data: {"type":"page","title":"API Reference"}
data: {"type":"usage","cost":0.003}
data: {"type":"done","time":"4.2s"}
Max Pages: 100 · Per Page: $0.001 · Failed: $0
How It Works

Depth-Based Crawling

Follow links from your seed URL to discover and extract content.

SEED URL: https://docs.example.com
Depth 1: /getting-started, /installation, /api-reference
Depth 2: /api/auth, /api/search, ...
depth = 1

Only pages directly linked from seed. Fast and focused.

depth = 2

Pages linked from the seed, plus the pages those link to. Good for documentation sites.

depth = 3+

Deeper crawling. Use with caution on large sites.
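Depth-based crawling amounts to a breadth-first traversal that stops following links once a page sits max_depth hops from the seed. A minimal sketch, assuming a hypothetical get_links(url) helper that returns the links found on a page:

```python
from collections import deque

def crawl_urls(seed, max_depth, get_links):
    """Return every URL reachable from `seed` within `max_depth` link hops."""
    seen = {seed}
    queue = deque([(seed, 0)])          # (url, depth from seed)
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue                    # don't follow links past the limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen
```

With depth = 1 this visits only pages the seed links to directly; depth = 2 adds the pages those link to, and so on.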

Real-Time SSE: results stream as pages complete
Depth Control: configure crawl depth 1-3+
Advanced Proxy: bypass bot detection
Real-Time

SSE Stream Frames

Results arrive via Server-Sent Events as pages complete. No waiting for the entire crawl.

page: content for each crawled page
usage: cost summary after crawling
done: completion with response time
Process As You Go

Don't wait for the crawl to complete. Save pages to your database or process them as each frame arrives.
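As a sketch, a per-frame dispatcher could look like this. Frame shapes follow the SDK stream frames; save_page is a hypothetical stand-in for your own storage call (database insert, file write, queue push):

```python
def handle_frame(frame, save_page):
    """Process one crawl stream frame as it arrives."""
    kind = frame.get("type")
    if kind == "page":
        page = frame["page"]
        if page.get("success"):
            save_page(page["title"], page["markdown"])   # persist right away
    elif kind == "usage":
        print(f"Crawl cost: ${frame['cost']}")
    elif kind == "done":
        print(f"Crawl finished in {frame['time']}")
```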

Quick Start

Start Crawling in Minutes

crawl.py
PYTHON
import asyncio

from llmlayer import LLMLayerClient

client = LLMLayerClient(api_key="...")

async def main():
    async for frame in client.crawl_stream(
        url="https://docs.example.com",
        max_pages=20,
        max_depth=2,
        main_content_only=True,
    ):
        if frame["type"] == "page":
            page = frame["page"]
            if page["success"]:
                print(f"✅ {page['title']}")
                # Save page["markdown"]
        elif frame["type"] == "usage":
            print(f"Cost: ${frame['cost']}")

asyncio.run(main())
crawl.ts
TYPESCRIPT
import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

for await (const frame of client.crawlStream({
  url: 'https://docs.example.com',
  maxPages: 20,
  maxDepth: 2,
  mainContentOnly: true
})) {
  if (frame.type === 'page' && frame.page.success) {
    console.log(`✅ ${frame.page.title}`);
    // Save frame.page.markdown
  }
}
Terminal: cURL (note the -N flag, which disables buffering so frames stream as they arrive)
curl -N -X POST https://api.llmlayer.dev/api/v2/crawl_stream \
  -H "Authorization: Bearer $LLMLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.example.com", "max_pages": 20, "max_depth": 2}'
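Without an SDK you can consume the stream yourself. A minimal stdlib sketch, assuming the endpoint emits standard SSE `data: <json>` lines as shown above:

```python
import json
import urllib.request

def parse_sse_line(line):
    """Decode one SSE `data:` line into a dict; return None for other lines."""
    if line.startswith("data:"):
        return json.loads(line[len("data:"):].strip())
    return None

def stream_crawl(api_key, payload):
    """Yield crawl frames as they arrive from the streaming endpoint."""
    req = urllib.request.Request(
        "https://api.llmlayer.dev/api/v2/crawl_stream",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:   # connection stays open;
        for raw in resp:                        # lines arrive as pages complete
            frame = parse_sse_line(raw.decode())
            if frame is not None:
                yield frame
```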
Reference

API Parameters

POST /api/v2/crawl_stream

url* (string): starting URL to crawl from
max_pages (integer, default 25): maximum pages to crawl (1-100)
max_depth (integer, default 2): link depth from the seed URL
main_content_only (boolean, default false): remove nav, headers, footers, sidebars
advanced_proxy (boolean, default false): bypass bot detection (+$0.004/page)
include_subdomains (boolean, default false): crawl across subdomains (blog.*, docs.*)
timeout (number, default 60): total crawl timeout in seconds

* Required parameter
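Put together, a request body exercising every parameter might look like this (values other than url are illustrative, not recommendations):

```python
payload = {
    "url": "https://docs.example.com",   # required
    "max_pages": 50,                     # 1-100, default 25
    "max_depth": 2,                      # default 2
    "main_content_only": True,           # strip nav, headers, footers, sidebars
    "advanced_proxy": False,             # +$0.004/page when enabled
    "include_subdomains": False,         # also crawl blog.*, docs.*, etc.
    "timeout": 120,                      # total crawl timeout in seconds
}
```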

Pricing

Pay Only for Success

Standard Crawling

$0.001
per successfully crawled page
10 pages: $0.01
50 pages: $0.05
100 pages: $0.10

With Advanced Proxy

$0.005
per page ($0.001 + $0.004 proxy)
10 pages: $0.05
50 pages: $0.25
100 pages: $0.50
Failed Pages Are FREE

If a page fails to load (404, blocked, timeout), you're not charged. Only pay for pages that successfully return content.
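The billing rule above is simple enough to sketch: only successful pages are billed, at $0.001 each, plus $0.004 each when advanced_proxy is enabled.

```python
def crawl_cost(successful_pages, failed_pages=0, advanced_proxy=False):
    """Estimated crawl cost in dollars; failed pages are never billed."""
    per_page = 0.001 + (0.004 if advanced_proxy else 0.0)
    return round(successful_pages * per_page, 4)
```

For example, a 100-page crawl with 7 failures bills 93 pages: $0.093 standard, or $0.465 with the advanced proxy.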

Use Cases

Built For

Documentation Backup

Download entire documentation sites for offline access or archival.

AI Training Data

Build high-quality datasets from curated websites with main_content_only.

RAG Pipelines

Feed crawled markdown directly into vector databases for retrieval.

Site Migration

Extract all content before migrating platforms. Preserve everything.

$0.001 per page

Crawl Websites In Real-Time

Multi-page extraction with streaming results. Only pay for successful pages. Free credits to start.