Quick Start
Example Request
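A minimal request sketch using the SDK methods listed under SDK methods below. The package name, import, and client constructor are assumptions, not part of this page; only documented parameters are passed.

```typescript
// Sketch only: the import path and client constructor are placeholders.
// nimble.crawl({...}) is the SDK method documented below for creating a crawl job.
import { NimbleClient } from "nimble-sdk"; // hypothetical package name

const nimble = new NimbleClient({ apiKey: process.env.NIMBLE_API_KEY! });

// Create an async crawl job starting from a single URL.
const crawl = await nimble.crawl({
  url: "https://www.example.com",
  name: "my-example-crawl",
  limit: 100,
});

console.log(crawl.crawl_id); // keep this ID to check progress later
```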
Example Response
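An illustrative response, abridged to a few of the fields documented under Create Crawl Response below; all values are placeholders.

```json
{
  "crawl_id": "c_123e4567-e89b-12d3-a456-426614174000",
  "url": "https://www.example.com",
  "status": "queued",
  "created_at": "2024-01-01T12:00:00Z"
}
```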
How it works
You submit a crawl request
Provide a starting URL and configure crawl options (limits, filters, extraction settings)
An async crawl job is created
- Returns immediately with a crawl_id to track progress
- The crawl runs in the background on Nimble’s infrastructure
- Optional: Configure webhooks to receive real-time notifications
Crawl discovers and processes pages
- Reads sitemaps and follows internal links
- Creates individual tasks for each discovered URL
- Extracts content from pages as they’re visited
- Status updates live: track pending, completed, and failed counts
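Put together, a typical lifecycle looks like the sketch below: create the job, poll its status until no tasks are pending, then read results per task. It assumes the client from the Quick Start sketch and that the SDK returns the response bodies documented under Response Fields.

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
const { crawl_id } = await nimble.crawl({ url: "https://www.example.com", limit: 50 });

// Poll until every discovered URL has been processed.
let status: any;
do {
  await new Promise((resolve) => setTimeout(resolve, 10_000)); // wait 10s between polls
  ({ crawl: status } = await nimble.crawl.status(crawl_id));
  console.log(`${status.completed}/${status.total} pages completed, ${status.failed} failed`);
} while (status.pending > 0);

// Each discovered page is a task; fetch the extracted content per completed task.
for (const task of status.tasks) {
  if (task.status === "completed") {
    const result = await nimble.tasks.results(task.task_id);
    // ...process the extracted content
  }
}
```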
Parameters
Supported input parameters:
url - Required
The starting point for your crawl. The crawler will begin here and discover other pages from this URL.
Example: https://www.example.com
name
Give your crawl a memorable name. This helps you identify it later when you have multiple crawls running.
Example: my-zillow-crawl
limit
Stop the crawl after finding this many pages.
- Min: 1
- Max: 10000
- Default: 5000
sitemap
Decide how to use the website’s sitemap for discovering pages.
Options:
- include (default) - Use both the sitemap and discovered links
- only - Just use the sitemap (fastest)
- skip - Ignore the sitemap and only follow links
crawl_entire_domain
Let the crawler explore the entire domain, not just pages “under” your starting URL.
For example, if you start at /blog, enabling this lets it also crawl /about and /contact.
allow_subdomains
Allow the crawler to follow links to subdomains.
For example, from www.example.com to blog.example.com or shop.example.com.
include_paths
Only crawl pages whose URLs match these regex patterns.
Example: ["/blog/.*", "/articles/.*"]
exclude_paths
Skip pages whose URLs match these regex patterns.
Example: [".*/tag/.*", ".*/page/[0-9]+"]
max_discovery_depth
Control how many “clicks away” from the starting page the crawler can go.
- Min: 1
- Max: 20
- Default: 5
ignore_query_parameters
Treat URLs with different query parameters as the same page, preventing duplicate crawls.
callback
Get notified when your crawl completes or as pages are discovered.
Configuration:
- url (required) - String | Webhook URL to receive notifications
- headers - Object | Custom headers for authentication
- metadata - Object | Extra data to include in payloads
- events - Array | Which events trigger notifications: started, page, completed, failed
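For example, a callback object using the keys above might look like this inside the request body (URL, header, and metadata values are placeholders):

```json
{
  "callback": {
    "url": "https://your-app.example.com/webhooks/nimble",
    "headers": { "Authorization": "Bearer <your-webhook-secret>" },
    "metadata": { "project": "site-audit" },
    "events": ["started", "page", "completed", "failed"]
  }
}
```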
extract_options
Automatically extract content from each page as you crawl it. Accepts all Extract API options; see Crawl with extraction under Usage for an example.
country
Crawl the site as if you’re browsing from a specific country.
Use ISO Alpha-2 country codes like US, GB, FR, DE, CA, JP, etc. Use ALL for random country selection.
locale
Set the language preference for crawling. Use the LCID standard.
Locale examples:
- en-US - English (United States)
- en-GB - English (United Kingdom)
- fr-FR - French (France)
- de-DE - German (Germany)
Usage
Basic crawl
Crawl a website using default settings:
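A minimal sketch, assuming the client from the Quick Start sketch:

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
// Only the required url parameter is passed; all other options use defaults.
const crawl = await nimble.crawl({
  url: "https://www.example.com",
});
console.log(crawl.crawl_id, crawl.status);
```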
Filter with URL patterns
Use include and exclude patterns to control which URLs are crawled:
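A sketch using the include_paths and exclude_paths parameters documented above:

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
// Only URLs matching include_paths are crawled; URLs matching exclude_paths are skipped.
const crawl = await nimble.crawl({
  url: "https://www.example.com",
  include_paths: ["/blog/.*", "/articles/.*"],
  exclude_paths: [".*/tag/.*", ".*/page/[0-9]+"],
});
```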
Crawl entire domain
Allow crawler to follow all internal links beyond the starting path:
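A sketch using crawl_entire_domain (and, optionally, allow_subdomains):

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
const crawl = await nimble.crawl({
  url: "https://www.example.com/blog",
  crawl_entire_domain: true, // also crawl pages outside /blog, e.g. /about
  allow_subdomains: true,    // follow links to e.g. shop.example.com
});
```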
Crawl with extraction
Extract structured data from each page during the crawl:
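A sketch passing extract_options. The options accepted inside extract_options come from the Extract API and are not documented on this page, so they are left as a placeholder comment:

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
const crawl = await nimble.crawl({
  url: "https://www.example.com/products",
  limit: 1000,
  extract_options: {
    // ...your Extract API options go here (see the Extract API reference)
  },
});
```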
Check crawl status
Get current status and progress of a crawl:
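A sketch using nimble.crawl.status(), assuming the SDK returns the body documented under Get Crawl Status by ID Response:

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
const crawlId = "c_123e4567-e89b-12d3-a456-426614174000"; // crawl_id from the create response
const { crawl } = await nimble.crawl.status(crawlId);
console.log(crawl.status, `${crawl.completed}/${crawl.total} done, ${crawl.failed} failed`);
```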
Get task results
Fetch extracted content for a specific crawled page:
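A sketch using nimble.tasks.results() with a task_id taken from the status response:

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
const taskId = "t_98765432-aaaa-bbbb-cccc-000011112222"; // from crawl.tasks[] in the status response
const result = await nimble.tasks.results(taskId);
```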
Combined parameters
Crawl with multiple parameters for precise control:
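A sketch combining several documented parameters (limits, filtering, geo and locale settings, and a webhook):

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
const crawl = await nimble.crawl({
  url: "https://www.example.com",
  name: "full-site-crawl",
  limit: 2000,
  sitemap: "include",
  max_discovery_depth: 3,
  ignore_query_parameters: true,
  include_paths: ["/blog/.*"],
  country: "US",
  locale: "en-US",
  callback: {
    url: "https://your-app.example.com/webhooks/nimble",
    events: ["completed", "failed"],
  },
});
```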
Managing Crawls
List crawls
Get all your crawls filtered by status:
- Available status filters: pending, in_progress, completed, failed, canceled
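A sketch using nimble.crawl.list() from the SDK methods table. How the status filter and pagination arguments are passed is not shown on this page, so none are used here:

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
// nimble.crawl.list() supports pagination per the SDK methods table;
// filter and pagination arguments are omitted because they are not documented here.
const crawls = await nimble.crawl.list();
```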
Get task results
Use the task_id from the crawl status response to fetch extracted content for each page:
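A sketch that walks the task list from a status response and fetches results for each completed page:

```typescript
// Assumes an initialized `nimble` client (see the Quick Start sketch).
const crawlId = "c_123e4567-e89b-12d3-a456-426614174000";
const { crawl } = await nimble.crawl.status(crawlId);

const results = [];
for (const task of crawl.tasks) {
  if (task.status === "completed") {
    results.push(await nimble.tasks.results(task.task_id));
  }
}
```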
Example Task Results Response
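The result payload depends on your extract_options; the shape below is illustrative only. Only task_id and status correspond to documented fields; the results key is a placeholder.

```json
{
  "task_id": "t_98765432-aaaa-bbbb-cccc-000011112222",
  "status": "completed",
  "results": "...extracted page content, shaped by your extract_options..."
}
```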
Response Fields
When you use Crawl, you receive:
- Async operation - Crawl jobs run in the background; check status or receive webhooks
- Progress tracking - Monitor total, pending, completed, and failed counts
- Task-based results - Each page becomes a task with extractable content
- Webhook support - Get notified in real-time as pages are processed
Create Crawl Response
Returns an immediate response with crawl job details.
Example Response
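An illustrative payload matching the field table below; all values are placeholders.

```json
{
  "crawl_id": "c_123e4567-e89b-12d3-a456-426614174000",
  "name": "my-zillow-crawl",
  "url": "https://www.example.com",
  "status": "queued",
  "account_name": "your-account",
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T12:00:00Z",
  "completed_at": null,
  "crawl_options": {
    "limit": 5000,
    "sitemap": "include"
  },
  "extract_options": null
}
```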
| Field | Type | Description |
|---|---|---|
| crawl_id | string | Unique identifier for the crawl job |
| name | string | Optional name you assigned to the crawl |
| url | string | Starting URL for the crawl |
| status | string | queued, running, succeeded, failed, canceled |
| account_name | string | Your account identifier |
| created_at | string | Timestamp when the crawl was created |
| updated_at | string | Timestamp of the last status update |
| completed_at | string | Timestamp when the crawl completed (null if in progress) |
| crawl_options | object | Configuration settings applied to this crawl |
| extract_options | object | Extraction settings (null if not configured) |
Get Crawl Status by ID Response
Returns the crawl object wrapped in a crawl key, with progress counters and a task list:
Example Response
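An illustrative payload matching the field table below; counts and IDs are placeholders.

```json
{
  "crawl": {
    "crawl_id": "c_123e4567-e89b-12d3-a456-426614174000",
    "status": "running",
    "total": 120,
    "pending": 35,
    "completed": 80,
    "failed": 5,
    "tasks": [
      { "task_id": "t_98765432-aaaa-bbbb-cccc-000011112222", "status": "completed" },
      { "task_id": "t_98765432-aaaa-bbbb-cccc-000011112223", "status": "pending" }
    ]
  }
}
```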
| Field | Type | Description |
|---|---|---|
| crawl.crawl_id | string | Unique identifier for the crawl job |
| crawl.status | string | Current crawl status |
| crawl.total | integer | Total URLs discovered |
| crawl.pending | integer | URLs waiting to be processed |
| crawl.completed | integer | Successfully processed URLs |
| crawl.failed | integer | Failed URL extractions |
| crawl.tasks | array | List of individual page tasks |
| crawl.tasks[].task_id | string | Task ID to use with GET /v1/tasks/{id}/results |
| crawl.tasks[].status | string | pending, processing, completed, failed |
SDK methods
| Method | Description |
|---|---|
| nimble.crawl({...}) | Create a new crawl job |
| nimble.crawl.list() | List all crawls with pagination |
| nimble.crawl.status(crawl_id) | Get crawl status and task list |
| nimble.crawl.terminate(crawl_id) | Stop a running crawl |
| nimble.tasks.results(task_id) | Get extracted content for a page |
Use cases
Full Site Data Collection
Extract data from hundreds or thousands of pages across an entire website
Product Catalog Scraping
Gather all product information from e-commerce sites automatically
Content Archiving
Create complete snapshots of websites for analysis or backup
Price Monitoring
Track pricing across entire catalogs over time
Real-world examples
E-commerce product discovery
Scenario: You need to gather all product information from a competitor’s online store.
How Crawl helps:
- Discovers all product pages through sitemaps and navigation
- Extracts product details, prices, and descriptions from each page
- Handles pagination and category structures automatically
- Filters out cart, checkout, and account pages
Blog content migration
Scenario: You’re migrating a blog to a new platform and need all content.
How Crawl helps:
- Finds all blog posts through sitemap and internal links
- Extracts post content, metadata, and images
- Excludes tag pages, author archives, and pagination
- Preserves URL structure for redirects
Documentation site backup
Scenario: You want to create an offline backup of documentation.
How Crawl helps:
- Maps entire documentation structure
- Extracts content from all pages
- Maintains hierarchy and navigation structure
- Captures code examples and technical content
Competitive price monitoring
Scenario: You need to track competitor pricing across their entire catalog.
How Crawl helps:
- Discovers all product pages automatically
- Extracts pricing information from each page
- Runs on schedule to track changes over time
- Handles dynamic pricing and regional variations
SEO site audit
Scenario: You’re auditing a website’s content for SEO optimization.
How Crawl helps:
- Discovers all indexable pages
- Extracts titles, meta descriptions, and headings
- Identifies orphaned pages and broken links
- Maps internal linking structure
Crawl vs Map
| Need | Use |
|---|---|
| Extract content from pages | Crawl |
| Deep link following | Crawl |
| Complex filtering patterns | Crawl |
| Webhook notifications | Crawl |
| Quick URL discovery only | Map - completes in seconds |
| URL list with titles/descriptions | Map |

