Quick Start
Example Request
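A minimal request sketch using the Python SDK. `nimble.crawl.run()` is the documented entry point; the import path, client construction, and credential handling shown here are assumptions, and the parameters follow the options documented below.

```python
# Minimal crawl request sketch. nimble.crawl.run() is the documented SDK method;
# the import and client initialization below are assumptions for illustration.
from nimble import Nimble  # assumed import path

nimble = Nimble(api_key="YOUR_API_KEY")  # assumed credential handling

crawl = nimble.crawl.run(
    url="https://www.nimbleway.com",  # required: starting URL
    name="my-first-crawl",            # optional: label for this crawl
    limit=100,                        # optional: stop after 100 pages
)
print(crawl)  # returns immediately with a crawl_id to track progress
```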
Example Response
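An illustrative response body based on the Create Crawl Response fields documented later on this page; all values are placeholders.

```json
{
  "crawl_id": "c_0123456789",
  "name": "my-first-crawl",
  "url": "https://www.nimbleway.com",
  "status": "queued",
  "account_name": "your-account",
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T12:00:00Z",
  "completed_at": null,
  "crawl_options": { "limit": 100 },
  "extract_options": null
}
```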
How it works
You submit a crawl request
Provide a starting URL and configure crawl options (limits, filters, extraction settings).
An async crawl job is created
- Returns immediately with a `crawl_id` to track progress
- The crawl runs in the background on Nimble’s infrastructure
- Optional: Configure webhooks to receive real-time notifications
Crawl discovers and processes pages
- Reads sitemaps and follows internal links
- Creates individual tasks for each discovered URL
- Extracts content from pages as they’re visited
- Status updates live: track `pending`, `completed`, and `failed` counts
Parameters
Supported input parameters:
url - Required
The starting point for your crawl. The crawler will begin here and discover other pages from this URL.
Example: `https://www.nimbleway.com`
name
Give your crawl a memorable name. This helps you identify it later when you have multiple crawls running.
Example: `my-zillow-crawl`
limit
Stop the crawl after finding this many pages.
- Min: `1`
- Max: `10000`
- Default: `5000`
extract_options
Automatically extract content from each page as you crawl it. Accepts all Extract API options. See the sketch below.
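A sketch of the shape such an object might take; the option keys below are illustrative placeholders, not confirmed Extract API fields - consult the Extract API reference for the actual options.

```python
# Illustrative extract_options payload; the keys are placeholders, since any
# Extract API option is accepted here. It is passed with the crawl request,
# e.g. nimble.crawl.run(url=..., extract_options=extract_options).
extract_options = {
    "format": "markdown",  # placeholder: desired output format
}
```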
sitemap
Decide how to use the website’s sitemap for discovering pages.
Options:
- `include` (default) - Use both the sitemap and discovered links
- `only` - Just use the sitemap (fastest)
- `skip` - Ignore the sitemap and only follow links
crawl_entire_domain
Let the crawler explore the entire domain, not just pages “under” your starting URL.
For example, if you start at `/blog`, enabling this lets it also crawl `/about` and `/contact`.
allow_subdomains
Allow the crawler to follow links to subdomains.
For example, from `www.example.com` to `blog.example.com` or `shop.example.com`.
include_paths
Only crawl pages whose URLs match these regex patterns.
Example: `["/blog/.*", "/articles/.*"]`
exclude_paths
Skip pages whose URLs match these regex patterns.
Example: `[".*/tag/.*", ".*/page/[0-9]+"]`
max_discovery_depth
Control how many “clicks away” from the starting page the crawler can go.
- Min: `1`
- Max: `20`
- Default: `5`
ignore_query_parameters
Treat URLs with different query parameters as the same page, preventing duplicate crawls.
callback
Get notified when your crawl completes or as pages are discovered.
Configuration:
- `url` (required) - String | Webhook URL to receive notifications
- `headers` - Object | Custom headers for authentication
- `metadata` - Object | Extra data to include in payloads
- `events` - Array | Which events trigger notifications: `started`, `page`, `completed`, `failed`
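A configuration sketch using the fields listed above; the endpoint, header, and metadata values are placeholders, and passing the object as a `callback` keyword argument to `nimble.crawl.run()` is an assumption.

```python
# Webhook callback configuration; field names follow the documented options,
# values are placeholders.
callback = {
    "url": "https://example.com/webhooks/nimble-crawl",          # required: notification endpoint
    "headers": {"Authorization": "Bearer YOUR_WEBHOOK_SECRET"},  # optional: auth headers
    "metadata": {"project": "price-monitoring"},                 # optional: echoed back in payloads
    "events": ["started", "page", "completed", "failed"],        # optional: events to receive
}

crawl = nimble.crawl.run(url="https://www.nimbleway.com", callback=callback)
```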
country
Crawl the site as if you’re browsing from a specific country.
Use ISO Alpha-2 country codes like `US`, `GB`, `FR`, `DE`, `CA`, `JP`, etc. Use `ALL` for random country selection.
locale
Set the language preference for crawling. Use the LCID standard.
Locale examples:
- `en-US` - English (United States)
- `en-GB` - English (United Kingdom)
- `fr-FR` - French (France)
- `de-DE` - German (Germany)
Usage
Basic crawl
Crawl a website using default settings:
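A minimal sketch; it assumes the `nimble` client is initialized as in the Quick Start example and that the documented parameters map to keyword arguments of `nimble.crawl.run()`.

```python
# Basic crawl: only the starting URL is required; all other options use defaults.
# Assumes `nimble` is the SDK client initialized as in the Quick Start sketch.
crawl = nimble.crawl.run(url="https://www.nimbleway.com")
print(crawl)  # includes the crawl_id used to track progress
```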
Filter with URL patterns
Use include and exclude patterns to control which URLs are crawled:
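A sketch of pattern-based filtering, under the same assumptions as the basic example:

```python
# Crawl only blog and article pages, skipping tag and pagination URLs.
crawl = nimble.crawl.run(
    url="https://www.example.com/blog",
    include_paths=["/blog/.*", "/articles/.*"],     # regex allow-list
    exclude_paths=[".*/tag/.*", ".*/page/[0-9]+"],  # regex deny-list
)
```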
Crawl entire domain
Allow the crawler to follow all internal links beyond the starting path:
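A sketch, under the same assumptions as above:

```python
# Start at /blog but allow the crawler to leave that path and cover the whole domain.
crawl = nimble.crawl.run(
    url="https://www.example.com/blog",
    crawl_entire_domain=True,  # also reaches /about, /contact, etc.
)
```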
Crawl with extraction
Extract structured data from each page during the crawl:
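A sketch; the `extract_options` keys are placeholders, as noted in the Parameters section above:

```python
# Extract content from each page as it is crawled.
crawl = nimble.crawl.run(
    url="https://www.example.com/products",
    limit=500,
    extract_options={"format": "markdown"},  # placeholder Extract API option
)
```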
Combined parameters
Crawl with multiple parameters for precise control:
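A sketch combining several of the documented parameters, under the same assumptions as above:

```python
# Combine limits, filters, sitemap handling, geo targeting, and a callback.
crawl = nimble.crawl.run(
    url="https://www.example.com",
    name="full-site-audit",
    limit=2000,
    sitemap="include",                  # use the sitemap plus discovered links
    max_discovery_depth=5,
    ignore_query_parameters=True,
    exclude_paths=[".*/cart.*", ".*/checkout.*"],
    country="US",
    locale="en-US",
    callback={"url": "https://example.com/webhooks/nimble-crawl"},
)
```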
Managing Crawls
List crawls
Get all your crawls filtered by status using the REST API:
- Available status filters: `pending`, `in_progress`, `completed`, `failed`, `canceled`
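A sketch using Python’s `requests` library. The `GET /v1/crawl` path comes from the SDK and API methods table below; the base URL, authentication header, and the `status` query parameter name are assumptions.

```python
import requests

BASE_URL = "https://api.nimbleway.com"                 # placeholder base URL
HEADERS = {"Authorization": "Basic YOUR_CREDENTIALS"}  # placeholder auth scheme

# List crawls, optionally filtered by one of the statuses above.
# "status" as the query parameter name is an assumption.
resp = requests.get(f"{BASE_URL}/v1/crawl", headers=HEADERS, params={"status": "in_progress"})
resp.raise_for_status()
print(resp.json())
```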
Get crawl status (by crawl_id)
Check progress and get the list of task IDs for a specific crawl using the REST API:
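A sketch reusing the `BASE_URL` and `HEADERS` constants from the listing example; the `GET /v1/crawl/{crawl_id}` path and the `crawl` wrapper key are documented on this page.

```python
# Fetch progress counters and the task list for one crawl.
# Reuses BASE_URL and HEADERS from the listing sketch above.
crawl_id = "YOUR_CRAWL_ID"
resp = requests.get(f"{BASE_URL}/v1/crawl/{crawl_id}", headers=HEADERS)
resp.raise_for_status()
status = resp.json()["crawl"]  # response is wrapped in a "crawl" key
print(status["completed"], "of", status["total"], "pages processed")
```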
Get task results
Use the `task_id` from the crawl status response to fetch extracted content for each page using the REST API:
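A sketch reusing the constants and the `status` object from the previous examples; the `GET /v1/tasks/{task_id}/results` path comes from the SDK and API methods table.

```python
# Fetch extracted content for one page task discovered by the crawl.
# Reuses BASE_URL, HEADERS, and the `status` object from the sketches above.
task_id = status["tasks"][0]["task_id"]
resp = requests.get(f"{BASE_URL}/v1/tasks/{task_id}/results", headers=HEADERS)
resp.raise_for_status()
print(resp.json())  # extracted content for that page
```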
Example Task Results Response
Response Fields
When you use Crawl, you receive:
- Async operation - Crawl jobs run in the background, check status or receive webhooks
- Progress tracking - Monitor `total`, `pending`, `completed`, and `failed` counts
- Task-based results - Each page becomes a task with extractable content
- Webhook support - Get notified in real-time as pages are processed
Create Crawl Response
Returns an immediate response with crawl job details.
Example Response
| Field | Type | Description |
|---|---|---|
| `crawl_id` | string | Unique identifier for the crawl job |
| `name` | string | Optional name you assigned to the crawl |
| `url` | string | Starting URL for the crawl |
| `status` | string | `queued`, `running`, `succeeded`, `failed`, `canceled` |
| `account_name` | string | Your account identifier |
| `created_at` | string | Timestamp when crawl was created |
| `updated_at` | string | Timestamp of last status update |
| `completed_at` | string | Timestamp when crawl completed (null if in progress) |
| `crawl_options` | object | Configuration settings applied to this crawl |
| `extract_options` | object | Extraction settings (null if not configured) |
Get Crawl Status by ID Response
Returns the crawl object wrapped in a `crawl` key, with progress counters and task list:
Example Response
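An illustrative response shape based on the fields documented below; all values are placeholders.

```json
{
  "crawl": {
    "crawl_id": "c_0123456789",
    "status": "in_progress",
    "total": 120,
    "pending": 35,
    "completed": 80,
    "failed": 5,
    "tasks": [
      { "task_id": "t_abc123", "status": "completed" },
      { "task_id": "t_def456", "status": "pending" }
    ]
  }
}
```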
| Field | Type | Description |
|---|---|---|
| `crawl.crawl_id` | string | Unique identifier for the crawl job |
| `crawl.status` | string | Current crawl status |
| `crawl.total` | integer | Total URLs discovered |
| `crawl.pending` | integer | URLs waiting to be processed |
| `crawl.completed` | integer | Successfully processed URLs |
| `crawl.failed` | integer | Failed URL extractions |
| `crawl.tasks` | array | List of individual page tasks |
| `crawl.tasks[].task_id` | string | Task ID to use with `GET /v1/tasks/{id}/results` |
| `crawl.tasks[].status` | string | `pending`, `processing`, `completed`, `failed` |
SDK and API methods
| Method | Availability | Description |
|---|---|---|
| `nimble.crawl.run(url=..., ...)` | Python SDK | Create a new crawl job |
| `GET /v1/crawl` | REST API | List all crawls with pagination |
| `GET /v1/crawl/{crawl_id}` | REST API | Get crawl status and task list |
| `DELETE /v1/crawl/{crawl_id}` | REST API | Stop a running crawl |
| `GET /v1/tasks/{task_id}/results` | REST API | Get extracted content for a page |
The Python SDK currently supports creating crawl jobs via
nimble.crawl.run(). For crawl management operations (listing, status,
cancellation) and retrieving task results, use the REST API directly as shown
in the examples above.
Use cases
Full Site Data Collection
Extract data from hundreds or thousands of pages across an entire website
Product Catalog Scraping
Gather all product information from e-commerce sites automatically
Content Archiving
Create complete snapshots of websites for analysis or backup
Price Monitoring
Track pricing across entire catalogs over time
Real-world examples
E-commerce product discovery
Scenario: You need to gather all product information from a competitor’s online store.
How Crawl helps:
- Discovers all product pages through sitemaps and navigation
- Extracts product details, prices, and descriptions from each page
- Handles pagination and category structures automatically
- Filters out cart, checkout, and account pages
Blog content migration
Scenario: You’re migrating a blog to a new platform and need all content.
How Crawl helps:
- Finds all blog posts through sitemap and internal links
- Extracts post content, metadata, and images
- Excludes tag pages, author archives, and pagination
- Preserves URL structure for redirects
Documentation site backup
Scenario: You want to create an offline backup of documentation.
How Crawl helps:
- Maps entire documentation structure
- Extracts content from all pages
- Maintains hierarchy and navigation structure
- Captures code examples and technical content
Competitive price monitoring
Scenario: You need to track competitor pricing across their entire catalog.
How Crawl helps:
- Discovers all product pages automatically
- Extracts pricing information from each page
- Runs on schedule to track changes over time
- Handles dynamic pricing and regional variations
SEO site audit
Scenario: You’re auditing a website’s content for SEO optimization.
How Crawl helps:
- Discovers all indexable pages
- Extracts titles, meta descriptions, and headings
- Identifies orphaned pages and broken links
- Maps internal linking structure
Crawl vs Map
| Need | Use |
|---|---|
| Extract content from pages | Crawl |
| Deep link following | Crawl |
| Complex filtering patterns | Crawl |
| Webhook notifications | Crawl |
| Quick URL discovery only | Map - completes in seconds |
| URL list with titles/descriptions | Map |