Overview

Crawl systematically visits and extracts content from an entire website. Give it a starting URL, and it automatically discovers pages, follows links, and extracts clean structured data from every page it visits. Think of it as a smart robot that explores a website for you - reading every page and organizing all the content.

How it works

1. You provide a starting URL

Give Crawl the website or page URL where you want to start.

2. Crawl discovers all pages

  • Reads sitemap.xml for URL lists
  • Follows internal links automatically
  • Discovers pages across the entire site
  • Respects the depth and scope limits you set

3. Crawl visits and extracts from each page

  • Systematically visits every discovered page
  • Parses content from each page
  • Structures the data consistently
  • Processes pages in parallel for speed

4. Crawl delivers results as they're ready

Get organized data via webhook in real time, or poll for completed results.
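Step 1 boils down to a single request whose body carries the starting URL plus optional limits. A minimal Python sketch of assembling that body (field names and defaults mirror the crawl_options shown in the example later on this page; the helper itself is illustrative, not part of the API):

```python
def build_crawl_request(url, limit=10, max_discovery_depth=5,
                        include_paths=None, exclude_paths=None):
    """Assemble a crawl request body.

    Field names mirror the crawl_options in the example response on
    this page; the defaults here are illustrative, not official.
    """
    return {
        "url": url,                                # starting URL for discovery
        "limit": limit,                            # max pages to crawl
        "max_discovery_depth": max_discovery_depth,
        "include_paths": include_paths or [],
        "exclude_paths": exclude_paths or [],
    }

body = build_crawl_request("https://example-store.com/products",
                           include_paths=["/products/*"])
```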

When to use Crawl

Full Site Data Collection

Extract data from hundreds or thousands of pages across an entire website

Product Catalog Scraping

Gather all product information from e-commerce sites automatically

Content Archiving

Create complete snapshots of websites for analysis or backup

Price Monitoring

Track pricing across entire catalogs over time

Common use cases

  • E-commerce data collection - Scrape complete product catalogs, including prices, descriptions, images, and specifications.
  • Content migration - Move content from old platforms to new systems by crawling and extracting all pages.
  • Competitive analysis - Monitor competitor websites for changes in products, pricing, or content strategy.
  • SEO audits - Analyze entire websites for content quality, structure, and optimization opportunities.

Crawl vs. other tools

What you need | Use this
Data from an entire website | Crawl
Data from popular sites (Amazon, Google, etc.) | Public Agent - maintained by Nimble
Data from sites not in the gallery | Custom Agent - create with natural language
Data from specific URLs (expert users) | Extract
Search the web + extract content from results | Search
URLs with context for AI planning | Map

How Crawl discovers pages

Crawl intelligently finds pages using multiple methods:
  • Sitemap analysis - Reads sitemap.xml for structured URL lists
  • Link following - Discovers pages by following internal navigation
  • Depth control - Set how deep to explore from the starting URL
  • Smart filtering - Include or exclude specific paths and patterns
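One way to picture how these controls interact: every discovered URL passes through depth and path filters before being queued. A simplified Python sketch of that decision (illustrative logic only; glob-style fnmatch patterns stand in for whatever pattern syntax the service actually uses):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def should_crawl(url, depth, *, max_depth=5,
                 include_paths=(), exclude_paths=()):
    """Decide whether a discovered URL gets queued.

    Illustrative only: real matching rules may differ. Exclusions win
    over inclusions; an empty include list means "include everything".
    """
    if depth > max_depth:                      # too far from the start URL
        return False
    path = urlparse(url).path
    if any(fnmatch(path, pat) for pat in exclude_paths):
        return False                           # explicitly excluded
    if include_paths and not any(fnmatch(path, pat) for pat in include_paths):
        return False                           # not in the allowed sections
    return True
```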

Why use Crawl

  • Comprehensive - Get data from entire websites, not just single pages
  • Automated - No manual URL lists needed; Crawl discovers every page itself
  • Efficient - Processes thousands of pages with optimal resource usage
  • Flexible - Control depth, scope, and which pages to include or exclude
  • Webhook support - Receive data in real-time as pages are processed

Example

Input: a starting URL plus crawl options (here, a retailer search page)
{
    "url": "https://www.bestbuy.com/site/searchpage.jsp?id=pcat17071&st=laptops",
    "sitemap": "include",
    "country": "US",
    "limit": 10
}
Output: Immediate response with crawl job details
{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "name": "string",
    "url": "https://www.bestbuy.com/site/searchpage.jsp?id=pcat17071&st=laptops",
    "status": "queued",
    "account_name": "your-account",
    "total": 1,
    "pending": 1,
    "completed": 0,
    "failed": 0,
    "created_at": "2026-02-09T10:25:59.512Z",
    "updated_at": "2026-02-09T10:25:59.512Z",
    "completed_at": null,
    "crawl_options": {
        "sitemap": "include",
        "crawl_entire_domain": false,
        "limit": 10,
        "max_discovery_depth": 5,
        "exclude_paths": [],
        "include_paths": [],
        "ignore_query_parameters": false,
        "allow_external_links": false,
        "allow_subdomains": false,
        "callback": null
    },
    "extract_options": null,
    "tasks": [
        {
            "webit_task_id": "0be6081e-90e3-458a-1234-36f5a51f0156",
            "crawl_id": "123e4567-e89b-12d3-a456-426614174000",
            "status": "pending",
            "url": "https://www.bestbuy.com/site/searchpage.jsp?id=pcat17071&st=laptops",
            "created_at": "2026-02-09T10:25:59.512Z",
            "updated_at": "2026-02-09T10:25:59.512Z"
        }
    ]
}
As pages are crawled, you receive crawl tasks either via webhook or by polling the status endpoint:
  • Use GET https://sdk.nimbleway.com/v1/crawl/{crawl_id}
{
    "crawl": {
        "id": "123e4567-e89b-12d3-a456-426614174000",
        "name": "string",
        "url": "https://www.bestbuy.com/site/searchpage.jsp?id=pcat17071&st=laptops",
        "status": "succeeded",
        "account_name": "your-account",
        "total": 10,
        "pending": 0,
        "completed": 10,
        "failed": 0,
        "created_at": "2026-02-09T10:25:59.512Z",
        "updated_at": "2026-02-09T10:26:12.164Z",
        "completed_at": "2026-02-09T10:30:00.000Z",
        "crawl_options": {
            "sitemap": "include",
            "crawl_entire_domain": false,
            "limit": 10,
            "max_discovery_depth": 5,
            "exclude_paths": [],
            "include_paths": [],
            "ignore_query_parameters": false,
            "allow_external_links": false,
            "allow_subdomains": false,
            "callback": null
        },
        "extract_options": null,
        "tasks": [
            {
                "task_id": "0be6081e-90e3-458a-1234-36f5a51f0156",
                "crawl_id": "123e4567-e89b-12d3-a456-426614174000",
                "status": "completed",
                "url": "https://www.bestbuy.com/site/searchpage.jsp?id=pcat17071&st=laptops",
                "created_at": "2026-02-09T10:25:59.512Z",
                "updated_at": "2026-02-09T10:26:12.164Z"
            },
            {
                "task_id": "0be6081e-90e3-1234-1234-36f5a51f0156",
                "crawl_id": "123e4567-e89b-12d3-a456-426614174000",
                "status": "completed",
                "url": "https://www.bestbuy.com/site/searchpage.jsp?af=false&id=pcat17071&st=laptops",
                "created_at": "2026-02-09T10:25:59.512Z",
                "updated_at": "2026-02-09T10:26:12.164Z"
            }
        ]
    }
}
The create crawl response returns webit_task_id in tasks, while the status endpoint returns task_id. Both refer to the same task identifier used to fetch results.
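If you poll instead of using a webhook, the loop is just repeated calls to the status endpoint until the crawl leaves its active state. A sketch with the HTTP call injected as a function, so the pattern can be shown without credentials (the terminal-status check is an assumption; this page's examples show "queued" and "succeeded"):

```python
import time

def wait_for_crawl(fetch_status, crawl_id, *, interval=5.0, sleep=time.sleep):
    """Poll the crawl status endpoint until a terminal status is reached.

    fetch_status(crawl_id) should return the status-endpoint payload,
    i.e. {"crawl": {...}}. Treating anything other than "queued" or
    "running" as terminal is an assumption of this sketch.
    """
    while True:
        crawl = fetch_status(crawl_id)["crawl"]
        if crawl["status"] not in ("queued", "running"):
            return crawl
        sleep(interval)

# Demo with a canned status sequence instead of real HTTP:
states = iter(["queued", "queued", "succeeded"])
fake = lambda cid: {"crawl": {"id": cid, "status": next(states)}}
final = wait_for_crawl(fake, "123e4567", sleep=lambda s: None)
```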
Then, for each crawled page (task), you fetch the task results:
  • Use GET https://sdk.nimbleway.com/v1/tasks/{task_id}/results
{
    "url": "https://www.bestbuy.com/site/searchpage.jsp?af=false&id=pcat17071&st=laptops",
    "task_id": "e8ed8ef6-2657-43ba-98d5-a5c79ea7b551",
    "status": "success",
    "task": {
        "id": "e8ed8ef6-2657-43ba-98d5-a5c79ea7b551",
        "state": "success",
        "created_at": "2026-02-09T10:26:05.817Z",
        "modified_at": "2026-02-09T10:26:05.817Z",
        "account_name": "your-account",
        "input": { ... }
    },
    "data": {
        "html": "...",
        "headers": { ... },
        "parsing": { ... }
    },
    "metadata": {
        "query_time": "2026-02-09T10:26:05.817Z",
        "query_duration": 1877,
        "response_parameters": {
            "input_url": "https://www.bestbuy.com/site/searchpage.jsp?af=false&id=pcat17071&st=laptops"
        },
        "driver": "vx6"
    },
    "status_code": 200
}
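Each task result bundles the raw html, response headers, and any parsed fields under data. A small helper that pulls out the pieces most pipelines need from one result payload (field names taken from the example above; the helper is illustrative):

```python
def summarize_result(result):
    """Collapse a task result payload into a few commonly used fields."""
    data = result.get("data", {})
    return {
        "url": result["url"],
        "ok": result.get("status") == "success" and result.get("status_code") == 200,
        "html": data.get("html"),       # raw page HTML
        "parsed": data.get("parsing"),  # structured parsing output, if any
    }

sample = {
    "url": "https://example.com/p",
    "status": "success",
    "status_code": 200,
    "data": {"html": "<html>...</html>", "parsing": {"title": "x"}},
}
summary = summarize_result(sample)
```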

Key features

  • Async operation - Crawl jobs run asynchronously: start a crawl and receive results via webhook, or check its status later.
  • Scale control - Set limits on pages, depth, and scope to match your needs and budget.
  • Pattern matching - Use include/exclude patterns to target specific sections or content types.
  • Real-time results - Get data as it's extracted via webhooks, or poll for completed results.

SDK methods

Method | Description
nimble.crawl({...}) | Create a new crawl job
nimble.crawl.list() | List all crawls with pagination
nimble.crawl.status(crawl_id) | Get crawl status and task list
nimble.crawl.terminate(crawl_id) | Stop a running crawl
nimble.tasks.results(task_id) | Get extracted content for a page
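These SDK methods map onto the REST endpoints shown earlier. A minimal Python sketch of such a wrapper, with the HTTP transport injected so it can be exercised without credentials (hypothetical: the status and results paths come from this page, but the create path and the real SDK's internals are assumptions):

```python
from dataclasses import dataclass
from typing import Callable, Optional

BASE = "https://sdk.nimbleway.com/v1"

@dataclass
class CrawlClient:
    """Thin sketch of a crawl client over the REST endpoints.

    transport(method, url, body) -> dict is injected so the sketch can
    be tested with a fake instead of a live API.
    """
    transport: Callable[[str, str, Optional[dict]], dict]

    def create(self, options: dict) -> dict:
        # POST path is an assumption of this sketch
        return self.transport("POST", f"{BASE}/crawl", options)

    def status(self, crawl_id: str) -> dict:
        return self.transport("GET", f"{BASE}/crawl/{crawl_id}", None)

    def results(self, task_id: str) -> dict:
        return self.transport("GET", f"{BASE}/tasks/{task_id}/results", None)

# Exercise the wrapper with a fake transport that records calls:
calls = []
def fake(method, url, body):
    calls.append((method, url))
    return {"ok": True}

client = CrawlClient(fake)
client.status("abc")
client.results("t1")
```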

Next steps

Crawl Usage

See all parameters, code examples, and advanced features