Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
URL to crawl
"https://example.com"
Name of the crawl
"my-crawler-1"
Sitemap and other methods will be used together to find URLs
skip, include, only
"include"
Allows the crawler to follow internal links to sibling or parent URLs, not just child paths
false
Maximum number of pages to crawl
1 <= x <= 10000
100
Maximum depth to crawl based on discovery order
1 <= x <= 20
3
URL pathname regex patterns that exclude matching URLs from the crawl
["/exclude-this-path", "/and-this-path"]
URL pathname regex patterns that include matching URLs in the crawl
["/include-this-path", "/and-this-path"]
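The include and exclude patterns above are regexes matched against the URL pathname. The following is a minimal sketch of how such filtering could behave, not the service's actual implementation. The function name `path_allowed` is hypothetical, and two behaviors are assumptions: that an exclude match takes precedence over an include match, and that an empty include list allows every path.

```python
import re
from urllib.parse import urlparse


def path_allowed(url, include_patterns, exclude_patterns):
    """Return True if the URL's pathname passes the include/exclude filters."""
    path = urlparse(url).path
    # Assumption: an exclude match wins over any include match.
    if any(re.search(p, path) for p in exclude_patterns):
        return False
    # Assumption: an empty include list means every path is allowed.
    if not include_patterns:
        return True
    return any(re.search(p, path) for p in include_patterns)
```

For example, with the include pattern `/include-this-path` and the exclude pattern `/exclude-this-path`, a URL such as `https://example.com/include-this-path/a` would pass, while `https://example.com/other` would not.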
Do not re-scrape the same path with different (or no) query parameters
false
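To illustrate what ignoring query parameters means in practice, the sketch below derives a deduplication key for a URL. The helper `dedupe_key` and its `ignore_query` flag are hypothetical names, and dropping the fragment is an assumption; the crawler's real normalization may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def dedupe_key(url, ignore_query=False):
    """Build the key used to decide whether a URL was already scraped."""
    parts = urlsplit(url)
    # When the option is enabled, URLs that differ only by query string collapse
    # to the same key.
    query = "" if ignore_query else parts.query
    # Assumption: the fragment never distinguishes pages, so it is always dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

With the option enabled, `https://example.com/a?x=1` and `https://example.com/a?x=2` share one key and would be scraped once.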
Allows the crawler to follow links to external websites
false
Allows the crawler to follow links to subdomains of the main domain
false
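The three link-following flags (sibling or parent URLs, external websites, subdomains) can be read as one decision per discovered link. The sketch below is one plausible interpretation under stated assumptions, not the crawler's documented logic: the function and parameter names are hypothetical, the host of the starting URL is treated as the "main domain", and subdomain links are governed only by the subdomain flag.

```python
from urllib.parse import urlparse


def should_follow(start_url, link, allow_backward=False,
                  allow_external=False, allow_subdomains=False):
    """Decide whether a discovered link is in scope for the crawl."""
    start, target = urlparse(start_url), urlparse(link)
    start_host = start.hostname or ""
    target_host = target.hostname or ""

    if target_host == start_host:
        if allow_backward:
            return True  # sibling and parent paths are allowed
        # Otherwise only the start path itself and its child paths are in scope.
        base = start.path if start.path.endswith("/") else start.path + "/"
        return target.path == start.path or target.path.startswith(base)

    # Assumption: a subdomain is any host ending in ".<start host>".
    if target_host.endswith("." + start_host):
        return allow_subdomains

    return allow_external
```

With all flags left at `false`, a crawl started at `https://example.com/docs` would follow `https://example.com/docs/a` but not `https://example.com/blog`.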
Webhook configuration for receiving crawl results
Request body model for the /extract endpoint
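As a rough illustration of how the request parameters above fit together, the sketch below builds (but does not send) a crawl request using only the Python standard library. The endpoint URL and the JSON field names (`url`, `name`, `sitemap`) are placeholders chosen for this example, since this section does not state them; only the `Bearer <token>` header form and the example values come from the descriptions above.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real crawl endpoint for your account.
CRAWL_ENDPOINT = "https://api.example.com/crawl"


def build_crawl_request(token, url, name=None, sitemap="include"):
    """Build an unsent POST request carrying the crawl settings as JSON."""
    # Field names are assumptions made for illustration only.
    payload = {"url": url, "sitemap": sitemap}
    if name is not None:
        payload["name"] = name
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Header form as described above: Bearer <token>
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A call such as `build_crawl_request("YOUR_TOKEN", "https://example.com", name="my-crawler-1")` returns a request object that could then be passed to `urllib.request.urlopen`.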
Successful Response - Crawl Task Created
Starting URL for the crawl
Crawl configuration settings applied to this task
Unique identifier for the crawl task
Current status of the crawl task
pending, in_progress, completed, failed
Timestamp when the crawl task was created
Optional name for the crawl task
Optional extraction configuration for each crawled page
Total number of pages discovered (null until discovery begins)
Number of pages completed (null until crawl begins)
Timestamp when the crawl was completed (null if not completed)
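Because several response fields are null until the crawl reaches a given stage, client code needs to handle missing values. The sketch below summarizes a task from a decoded response. The dictionary keys (`status`, `pages_discovered`, `pages_completed`) are hypothetical names for the fields described above, and treating `completed` and `failed` as the terminal statuses is an assumption.

```python
# Assumption: these two statuses mean the task will not change any further.
TERMINAL_STATUSES = {"completed", "failed"}


def summarize(task):
    """Return whether the task is finished and its progress, if known."""
    discovered = task.get("pages_discovered")  # null until discovery begins
    completed = task.get("pages_completed")    # null until the crawl begins
    if discovered and completed is not None:
        progress = completed / discovered
    else:
        progress = None  # not enough information yet
    return {"done": task["status"] in TERMINAL_STATUSES, "progress": progress}
```

A polling loop could call `summarize` on each response and stop once `done` is true.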