Nimble Crawl (async) systematically visits and extracts content from entire websites. Give it a starting URL, and it automatically discovers pages through sitemaps and internal links, then extracts clean structured data from every page it visits.

Quick Start

Example Request

from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl({
    "url": "https://www.example.com",
    "limit": 100
})

crawl_id = result["crawl_id"]
print(f"Crawl started with ID: {crawl_id}")

Example Response

{
    "crawl_id": "e3ca2ff1-b82a-472b-b1a9-ef4d29cc549f",
    "name": null,
    "url": "https://www.example.com",
    "status": "queued",
    "account_name": "your-accound",
    "created_at": "2026-02-09T23:15:40.785Z",
    "updated_at": "2026-02-09T23:15:40.785Z",
    "completed_at": null,
    "crawl_options": {
        "sitemap": "include",
        "crawl_entire_domain": false,
        "limit": 10,
        "max_discovery_depth": 5,
        "ignore_query_parameters": false,
        "allow_external_links": false,
        "allow_subdomains": false
    },
    "extract_options": null
}

How it works

1. You submit a crawl request

Provide a starting URL and configure crawl options (limits, filters, extraction settings).

2. An async crawl job is created

  • Returns immediately with a crawl_id to track progress
  • The crawl runs in the background on Nimble’s infrastructure
  • Optional: Configure webhooks to receive real-time notifications

3. Crawl discovers and processes pages

  • Reads sitemaps and follows internal links
  • Creates individual tasks for each discovered URL
  • Extracts content from pages as they’re visited
  • Status updates live: track pending, completed, and failed counts

4. Retrieve results anytime (a minimal end-to-end sketch follows these steps)

  • Poll crawl status to monitor progress
  • Fetch extracted content for completed tasks using task_id
  • Results persist after the crawl completes for later retrieval
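
A minimal end-to-end sketch of this flow using the SDK calls documented on this page; the polling interval is arbitrary, and the terminal statuses are the ones listed under Response Fields:

import time
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

# 1. Submit the crawl request
crawl = nimble.crawl({"url": "https://www.example.com", "limit": 100})
crawl_id = crawl["crawl_id"]

# 2-3. Poll the async job until it reaches a terminal state
while True:
    status = nimble.crawl.status(crawl_id=crawl_id)["crawl"]
    print(f"{status['status']}: {status['completed']}/{status['total']} pages")
    if status["status"] in ("succeeded", "failed", "canceled"):
        break
    time.sleep(10)  # illustrative polling interval

# 4. Fetch extracted content for every completed task
for task in status["tasks"]:
    if task["status"] == "completed":
        result = nimble.tasks.results(task_id=task["task_id"])
        print(result["url"], result["status_code"])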

Parameters

Supported input parameters (a sketch combining several of them follows this list):
url
string
required
The starting point for your crawl. The crawler will begin here and discover other pages from this URL. Example: https://www.example.com
name
string
Give your crawl a memorable name. This helps you identify it later when you have multiple crawls running. Example: my-zillow-crawl
limit
integer
default:"5000"
Stop the crawl after finding this many pages.
  • Min: 1
  • Max: 10000
  • Default: 5000
sitemap
string
default:"include"
Decide how to use the website’s sitemap for discovering pages. Options:
  • include (default) - Use both the sitemap and discovered links
  • only - Just use the sitemap (fastest)
  • skip - Ignore the sitemap and only follow links
crawl_entire_domain
boolean
default:"false"
Let the crawler explore the entire domain, not just pages “under” your starting URL. For example, if you start at /blog, enabling this lets it also crawl /about and /contact.
allow_subdomains
boolean
default:"false"
Allow the crawler to follow links to subdomains. For example, from www.example.com to blog.example.com or shop.example.com.
include_paths
array
Only crawl pages whose URLs match these regex patterns. Example: ["/blog/.*", "/articles/.*"]
exclude_paths
array
Skip pages whose URLs match these regex patterns. Example: [".*/tag/.*", ".*/page/[0-9]+"]
max_discovery_depth
integer
default:"5"
Control how many “clicks away” from the starting page the crawler can go.
  • Min: 1
  • Max: 20
  • Default: 5
ignore_query_parameters
boolean
default:"false"
Treat URLs with different query parameters as the same page, preventing duplicate crawls.
callback
object
Get notified when your crawl completes or as pages are discovered. Configuration:
  • url (required) - String | Webhook URL to receive notifications
  • headers - Object | Custom headers for authentication
  • metadata - Object | Extra data to include in payloads
  • events - Array | Which events trigger notifications: started, page, completed, failed
Example:
{
    "callback": {
        "url": "https://your-server.com/webhook",
        "headers": {},
        "metadata": {},
        "events": ["started", "completed"]
    }
}
extract_options
object
Automatically extract content from each page as it is crawled. Accepts all Extract API options. Example:
{
    "extract_options": {
        "driver": "vx10",
        "parse": true,
        "formats": ["html", "markdown"]
    }
}
country
string
default:"ALL"
Crawl the site as if you’re browsing from a specific country. Use ISO Alpha-2 country codes like US, GB, FR, DE, CA, JP, etc. Use ALL for random country selection.
locale
string
Set the language preference for crawling. Use the LCID standard. Locale examples:
  • en-US - English (United States)
  • en-GB - English (United Kingdom)
  • fr-FR - French (France)
  • de-DE - German (Germany)
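
As a combined illustration of the options above, a sketch that restricts discovery to the sitemap, limits depth, collapses query-parameter variants, and crawls from a US vantage point (the values are illustrative, not recommendations):

from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl({
    "url": "https://www.example.com",
    "sitemap": "only",                # discover pages from the sitemap only
    "max_discovery_depth": 3,         # stay within 3 clicks of the start URL
    "ignore_query_parameters": True,  # treat ?utm=... variants as one page
    "country": "US",                  # browse as if from the United States
    "locale": "en-US",                # prefer US English content
    "limit": 200
})

print(result["crawl_id"])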

Usage

Basic crawl

Crawl a website using default settings:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl({
    "url": "https://www.example.com"
})

crawl_id = result["id"]
print(f"Crawl started with ID: {crawl_id}")

Filter with URL patterns

Use include and exclude patterns to control which URLs are crawled:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl({
    "url": "https://www.example.com",
    "include_paths": ["/blog/.*", "/articles/.*"],
    "exclude_paths": [".*/tag/.*", ".*/page/[0-9]+"],
    "limit": 500
})

print(result)

Crawl entire domain

Allow crawler to follow all internal links beyond the starting path:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl({
    "url": "https://www.example.com/blog",
    "crawl_entire_domain": True,
    "limit": 2000
})

print(result)

Crawl with extraction

Extract structured data from each page during the crawl:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl({
    "url": "https://www.example.com",
    "limit": 500,
    "extract_options": {
        "driver": "vx10",
        "parse": True,
        "formats": ["html", "markdown"]
    }
})

print(result)

Check crawl status

Get the current status and progress of a crawl (the same crawl.status call is covered in more detail under Managing Crawls):
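from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

# Look up progress by the crawl_id returned when the crawl was created
response = nimble.crawl.status(crawl_id="your-crawl-id")
crawl = response["crawl"]

print(f"Status: {crawl['status']}")
print(f"Progress: {crawl['completed']}/{crawl['total']} pages")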

Get task results

Fetch extracted content for a specific crawled page:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

# Get crawl status first to find task IDs
response = nimble.crawl.status(crawl_id="your-crawl-id")
crawl = response["crawl"]

# Fetch results for each completed task
for task in crawl["tasks"]:
    if task["status"] == "completed":
        result = nimble.tasks.results(task_id=task["task_id"])
        print(f"URL: {result['url']}")

Combined parameters

Crawl with multiple parameters for precise control:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl({
    "url": "https://www.example.com",
    "name": "Product Catalog Crawl",
    "sitemap": "include",
    "allow_subdomains": True,
    "include_paths": ["/products/.*"],
    "limit": 1000,
    "callback": {
        "url": "https://your-server.com/webhook",
        "events": ["completed"]
    }
})

print(f"Crawl started: {result['id']}")

Managing Crawls

1. List crawls

Get all your crawls filtered by status:
  • Available status filters: pending, in_progress, completed, failed, canceled
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

# List crawls by status
response = nimble.crawl.list(status="completed")

for crawl in response["data"]:
    print(f"Crawl {crawl['crawl_id']}: {crawl['name']} - {crawl['status']}")

2. Get crawl status (by crawl_id)

Check progress and get the list of task IDs for a specific crawl:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

response = nimble.crawl.status(crawl_id="e3ca2ff1-b82a-472b-b1a9-ef4d29cc549f")
crawl = response["crawl"]

print(f"Status: {crawl['status']}")
print(f"Progress: {crawl['completed']}/{crawl['total']} pages")
print(f"Tasks: {len(crawl['tasks'])}")
3. Get task results

Use the task_id from the crawl status response to fetch extracted content for each page:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

# Get crawl status to find task IDs
response = nimble.crawl.status(crawl_id="e3ca2ff1-b82a-472b-b1a9-ef4d29cc549f")
crawl = response["crawl"]

# Fetch results for each completed task
for task in crawl["tasks"]:
    if task["status"] == "completed":
        result = nimble.tasks.results(task_id=task["task_id"])
        print(f"URL: {result['url']}")
        print(f"Status Code: {result['status_code']}")
        # Access extracted data
        if "data" in result:
            print(f"HTML length: {len(result['data'].get('html', ''))}")
Example task result response:
{
    "url": "https://www.example.com/page",
    "task_id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
    "status": "success",
    "task": {
        "id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
        "state": "success",
        "created_at": "2026-02-09T23:15:43.549Z",
        "modified_at": "2026-02-09T23:16:39.094Z",
        "account_name": "your-account"
    },
    "data": {
        "html": "<!DOCTYPE html>...",
        "markdown": "# Page Title\n\nContent...",
        "headers": { ... }
    },
    "metadata": {
        "query_time": "2026-02-09T23:15:43.549Z",
        "query_duration": 1877,
        "response_parameters": {
            "input_url": "https://www.example.com/page"
        }
    },
    "status_code": 200
}
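
Building on the loop above, a sketch that writes each completed page’s Markdown to disk; the output directory and file naming are illustrative, and pages without a markdown field are skipped:

import pathlib
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

out_dir = pathlib.Path("crawl_export")  # illustrative output directory
out_dir.mkdir(exist_ok=True)

crawl = nimble.crawl.status(crawl_id="e3ca2ff1-b82a-472b-b1a9-ef4d29cc549f")["crawl"]

for task in crawl["tasks"]:
    if task["status"] != "completed":
        continue
    result = nimble.tasks.results(task_id=task["task_id"])
    markdown = result.get("data", {}).get("markdown")
    if markdown:
        # One file per task, named by its task_id
        (out_dir / f"{task['task_id']}.md").write_text(markdown, encoding="utf-8")
        print(f"Saved {result['url']}")
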
4. Cancel crawl

Stop a running crawl. Completed tasks remain available for result retrieval:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.crawl.cancel(crawl_id="e3ca2ff1-b82a-472b-b1a9-ef4d29cc549f")

print(result)  # {"status": "canceled"}

Response Fields

When you use Crawl, you receive:
  • Async operation - Crawl jobs run in the background; check status on demand or receive webhooks
  • Progress tracking - Monitor total, pending, completed, and failed counts
  • Task-based results - Each page becomes a task with extractable content
  • Webhook support - Get notified in real time as pages are processed (a minimal receiver sketch follows this list)
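
If the callback parameter is configured, the crawl POSTs notifications to your endpoint for the subscribed events. A minimal receiver sketch using Flask (the framework choice is ours, and the payload is simply logged because its exact fields are not documented here):

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def crawl_webhook():
    # Log whatever payload Nimble sends; inspect it to see the actual fields
    payload = request.get_json(silent=True) or {}
    print("Received crawl event:", payload)
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)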

Create Crawl Response

Returns an immediate response with the crawl job details:
{
    "crawl_id": "e3ca2ff1-b82a-472b-b1a9-ef4d29cc549f",
    "name": null,
    "url": "https://www.example.com",
    "status": "queued",
    "account_name": "your-accound",
    "created_at": "2026-02-09T23:15:40.785Z",
    "updated_at": "2026-02-09T23:15:40.785Z",
    "completed_at": null,
    "crawl_options": {
        "sitemap": "include",
        "crawl_entire_domain": false,
        "limit": 10,
        "max_discovery_depth": 5,
        "ignore_query_parameters": false,
        "allow_external_links": false,
        "allow_subdomains": false
    },
    "extract_options": null
}
Field | Type | Description
crawl_id | string | Unique identifier for the crawl job
name | string | Optional name you assigned to the crawl
url | string | Starting URL for the crawl
status | string | One of: queued, running, succeeded, failed, canceled
account_name | string | Your account identifier
created_at | string | Timestamp when the crawl was created
updated_at | string | Timestamp of the last status update
completed_at | string | Timestamp when the crawl completed (null if in progress)
crawl_options | object | Configuration settings applied to this crawl
extract_options | object | Extraction settings (null if not configured)

Get Crawl Status by ID Response

Returns the crawl object wrapped in a crawl key, with progress counters and task list:
{
    "crawl": {
        "crawl_id": "e3ca2ff1-b82a-472b-b1a9-ef4d29cc549f",
        "name": null,
        "url": "https://www.example.com",
        "status": "succeeded",
        "account_name": "your-account",
        "total": 10,
        "pending": 0,
        "completed": 10,
        "failed": 0,
        "created_at": "2026-02-09T23:15:40.785Z",
        "updated_at": "2026-02-09T23:17:08.083Z",
        "completed_at": "2026-02-09T23:17:08.079Z",
        "crawl_options": {
            "sitemap": "include",
            "crawl_entire_domain": false,
            "limit": 10,
            "max_discovery_depth": 5
        },
        "extract_options": null,
        "tasks": [
            {
                "task_id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
                "status": "completed",
                "updated_at": "2026-02-09T23:16:39.094Z",
                "created_at": "2026-02-09T23:15:43.549Z"
            },
            {
                "task_id": "3f6c136c-4bb5-44af-a21b-c8f1db708c2f",
                "status": "completed",
                "updated_at": "2026-02-09T23:16:45.033Z",
                "created_at": "2026-02-09T23:15:42.966Z"
            }
        ]
    }
}
Field | Type | Description
crawl.crawl_id | string | Unique identifier for the crawl job
crawl.status | string | Current crawl status
crawl.total | integer | Total URLs discovered
crawl.pending | integer | URLs waiting to be processed
crawl.completed | integer | Successfully processed URLs
crawl.failed | integer | Failed URL extractions
crawl.tasks | array | List of individual page tasks
crawl.tasks[].task_id | string | Task ID to use with GET /v1/tasks/{id}/results
crawl.tasks[].status | string | One of: pending, processing, completed, failed

SDK methods

Method | Description
nimble.crawl({...}) | Create a new crawl job
nimble.crawl.list() | List all crawls with pagination
nimble.crawl.status(crawl_id) | Get crawl status and task list
nimble.crawl.cancel(crawl_id) | Stop a running crawl
nimble.tasks.results(task_id) | Get extracted content for a page

Use cases

Full Site Data Collection

Extract data from hundreds or thousands of pages across an entire website

Product Catalog Scraping

Gather all product information from e-commerce sites automatically

Content Archiving

Create complete snapshots of websites for analysis or backup

Price Monitoring

Track pricing across entire catalogs over time

Real-world examples

Scenario: You need to gather all product information from a competitor’s online store.
How Crawl helps:
  • Discovers all product pages through sitemaps and navigation
  • Extracts product details, prices, and descriptions from each page
  • Handles pagination and category structures automatically
  • Filters out cart, checkout, and account pages
Result: Complete product catalog data without manual URL collection.
Scenario: You’re migrating a blog to a new platform and need all content.
How Crawl helps:
  • Finds all blog posts through sitemap and internal links
  • Extracts post content, metadata, and images
  • Excludes tag pages, author archives, and pagination
  • Preserves URL structure for redirects
Result: Complete content export ready for migration.
Scenario: You want to create an offline backup of documentation.
How Crawl helps:
  • Maps entire documentation structure
  • Extracts content from all pages
  • Maintains hierarchy and navigation structure
  • Captures code examples and technical content
Result: Complete documentation archive for offline access.
Scenario: You need to track competitor pricing across their entire catalog.
How Crawl helps:
  • Discovers all product pages automatically
  • Extracts pricing information from each page
  • Runs on schedule to track changes over time
  • Handles dynamic pricing and regional variations
Result: Comprehensive price intelligence data.
Scenario: You’re auditing a website’s content for SEO optimization.
How Crawl helps:
  • Discovers all indexable pages
  • Extracts titles, meta descriptions, and headings
  • Identifies orphaned pages and broken links
  • Maps internal linking structure
Result: Complete SEO audit data for analysis.

Crawl vs Map

Need | Use
Extract content from pages | Crawl
Deep link following | Crawl
Complex filtering patterns | Crawl
Webhook notifications | Crawl
Quick URL discovery only | Map - completes in seconds
URL list with titles/descriptions | Map
Tip: Use Map first - discover URLs quickly with Map, then use Crawl to extract content from the pages you need.

Next steps