Overview
Crawl systematically visits and extracts content from an entire website. Give it a starting URL, and it automatically discovers pages, follows links, and extracts clean, structured data from every page it visits. Think of it as a smart robot that explores a website for you, reading every page and organizing all the content.
How it works
Crawl discovers all pages
- Reads sitemap.xml for URL lists
- Follows internal links automatically
- Discovers pages across the entire site
- Respects depth and scope limits you set
Visits and extracts from each page
- Systematically visits every discovered page
- Parses content from each page
- Structures the data consistently
- Processes pages in parallel for speed
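That two-phase flow can be sketched with the SDK methods listed later on this page. This is a minimal sketch only: it assumes an already-constructed `nimble` SDK client, and the `url` option and the `crawl_id`/`tasks` response fields are assumptions rather than confirmed names.

```typescript
// Sketch only: assumes a configured `nimble` SDK client; the `url` option and
// the crawl_id / tasks response field names are assumptions, not confirmed names.
async function crawlSite(nimble: any, startUrl: string): Promise<void> {
  // Phase 1: start the crawl - Crawl begins discovering pages from startUrl
  const crawl = await nimble.crawl({ url: startUrl });

  // Phase 2: each discovered page becomes a task that is visited and extracted
  const status = await nimble.crawl.status(crawl.crawl_id);
  console.log(`Pages discovered so far: ${status.tasks?.length ?? 0}`);
}
```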
When to use Crawl
Full Site Data Collection
Extract data from hundreds or thousands of pages across an entire website
Product Catalog Scraping
Gather all product information from e-commerce sites automatically
Content Archiving
Create complete snapshots of websites for analysis or backup
Price Monitoring
Track pricing across entire catalogs over time
Common use cases
E-commerce data collection
Scrape complete product catalogs including prices, descriptions, images, and specifications.
Content migration
Move content from old platforms to new systems by crawling and extracting all pages.
Competitive analysis
Monitor competitor websites for changes in products, pricing, or content strategy.
SEO audits
Analyze entire websites for content quality, structure, and optimization opportunities.
Crawl vs. other tools
| What you need | Use this |
|---|---|
| Data from entire website | Crawl |
| Data from popular sites (Amazon, Google, etc.) | Public Agent - maintained by Nimble |
| Data from sites not in the gallery | Custom Agent - create with natural language |
| Data from specific URLs (expert users) | Extract |
| Search web + extract content from results | Search |
| URLs with context for AI planning | Map |
How Crawl discovers pages
Crawl intelligently finds pages using multiple methods:
- Sitemap analysis - Reads sitemap.xml for structured URL lists
- Link following - Discovers pages by following internal navigation
- Depth control - Set how deep to explore from the starting URL
- Smart filtering - Include or exclude specific paths and patterns
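A configuration sketch for the depth and filtering controls above. The option names used here (`max_depth`, `include_patterns`, `exclude_patterns`) are illustrative assumptions, not documented parameters; see the Crawl Usage page for the actual parameter names.

```typescript
// Sketch only: option names below are illustrative assumptions.
async function startScopedCrawl(nimble: any) {
  return nimble.crawl({
    url: "https://example-store.com/products", // starting URL
    max_depth: 3,                              // how far to follow links from the start URL
    include_patterns: ["/products/*"],         // only crawl matching paths
    exclude_patterns: ["/products/reviews/*"], // skip matching paths
  });
}
```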
Why use Crawl
- Comprehensive - Get data from entire websites, not just single pages
- Automated - No manual URL lists needed - Crawl finds everything
- Efficient - Processes thousands of pages with optimal resource usage
- Flexible - Control depth, scope, and which pages to include or exclude
- Webhook support - Receive data in real-time as pages are processed
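For the webhook option, here is a minimal receiver sketch using Express. The payload shape is an assumption rather than the documented schema; the only point illustrated is that each page's result arrives as it is processed.

```typescript
// Sketch only: a minimal Express receiver for per-page webhook deliveries.
// The payload shape is an assumption, not the documented schema.
import express from "express";

const app = express();
app.use(express.json());

app.post("/crawl-webhook", (req, res) => {
  const page = req.body;   // one extracted page, delivered as the crawl progresses
  console.log("Page result:", page);
  res.sendStatus(200);     // acknowledge receipt
});

app.listen(3000);
```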
Example
Input: example-store.com/products
- Use GET https://sdk.nimbleway.com/v1/crawl/{crawl_id} to check the crawl's status and task list.
  The create crawl response returns webit_task_id in tasks, while the status endpoint returns task_id. Both refer to the same task identifier used to fetch results.
- Use GET https://sdk.nimbleway.com/v1/tasks/{task_id}/results to get the extracted content for each page.
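A sketch of that flow over the two REST endpoints above, using fetch. The bearer-token Authorization header and the exact response shapes (a tasks array with task_id fields) are assumptions; the endpoint paths are the ones documented here.

```typescript
// Sketch only: polls the documented status and results endpoints with fetch.
// The auth scheme and response field names are assumptions.
const BASE = "https://sdk.nimbleway.com/v1";

async function fetchCrawlResults(crawlId: string, apiKey: string): Promise<void> {
  const headers = { Authorization: `Bearer ${apiKey}` }; // auth scheme assumed

  // Status endpoint: crawl status plus the list of page tasks
  const status = await fetch(`${BASE}/crawl/${crawlId}`, { headers }).then(r => r.json());

  // Results endpoint: extracted content for each page task
  for (const task of status.tasks ?? []) {
    const results = await fetch(`${BASE}/tasks/${task.task_id}/results`, { headers }).then(r => r.json());
    console.log(task.task_id, results);
  }
}
```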
Key features
Async operation
Crawl jobs run asynchronously - start a crawl and receive results via webhook or check status later.
Scale control
Set limits on pages, depth, and scope to match your needs and budget.
Pattern matching
Use include/exclude patterns to target specific sections or content types.
Real-time results
Get data as it's extracted via webhooks, or poll for completed results.
SDK methods
| Method | Description |
|---|---|
| nimble.crawl({...}) | Create a new crawl job |
| nimble.crawl.list() | List all crawls with pagination |
| nimble.crawl.status(crawl_id) | Get crawl status and task list |
| nimble.crawl.terminate(crawl_id) | Stop a running crawl |
| nimble.tasks.results(task_id) | Get extracted content for a page |
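A sketch tying the methods above together to collect results from a crawl. The method names match the table; the argument and response field names (crawl_id, tasks, task_id) are assumptions, not confirmed shapes.

```typescript
// Sketch only: method names are from the SDK methods table; field names are assumptions.
async function collectResults(nimble: any, crawlId: string): Promise<void> {
  const status = await nimble.crawl.status(crawlId);        // crawl status + task list
  for (const task of status.tasks ?? []) {
    const page = await nimble.tasks.results(task.task_id);  // extracted content for one page
    console.log(page);
  }
  // await nimble.crawl.terminate(crawlId);                 // stop a running crawl early if needed
}
```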
Next steps
Crawl Usage
See all parameters, code examples, and advanced features

