POST /v1/crawl
Create Crawl
curl --request POST \
  --url https://sdk.nimbleway.com/v1/crawl \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "https://example.com",
  "name": "my-crawler-1",
  "sitemap": "include",
  "limit": 100,
  "max_discovery_depth": 3
}
'
{
  "url": "<string>",
  "crawl_options": {
    "url": "<string>",
    "limit": 123,
    "sitemap": "skip",
    "allow_subdomains": true,
    "crawl_entire_domain": true,
    "max_discovery_depth": 123,
    "allow_external_links": true,
    "ignore_query_parameters": true
  },
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "status": "pending",
  "created_at": "2023-11-07T05:31:56Z",
  "name": "<string>",
  "extract_options": {},
  "total": 123,
  "completed": 123,
  "completed_at": "2023-11-07T05:31:56Z"
}
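
The same request can be issued from Python. A minimal sketch using only the standard library; the helper name and the decision not to send the request are illustrative, not part of the API:

```python
import json
import urllib.request

API_URL = "https://sdk.nimbleway.com/v1/crawl"

def build_crawl_request(token, url, **options):
    """Build an urllib Request for POST /v1/crawl.

    `options` may carry any documented body field, e.g. name,
    sitemap, limit, max_discovery_depth.
    """
    body = {"url": url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_crawl_request(
    "<token>",
    "https://example.com",
    name="my-crawler-1",
    sitemap="include",
    limit=100,
    max_discovery_depth=3,
)
# urllib.request.urlopen(req) would send it; omitted here so the
# sketch stays runnable without credentials.
```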

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url
string<uri>
required

URL to crawl

Example:

"https://example.com"

name
string

Name of the crawl

Example:

"my-crawler-1"

sitemap
enum<string>

Controls how the sitemap is used to find URLs: with include, the sitemap and other discovery methods are used together; skip ignores the sitemap; only restricts discovery to sitemap URLs

Available options:
skip,
include,
only
Example:

"include"

crawl_entire_domain
boolean

Allows the crawler to follow internal links to sibling or parent URLs, not just child paths

Example:

false

limit
integer

Maximum number of pages to crawl

Required range: 1 <= x <= 10000
Example:

100

max_discovery_depth
integer

Maximum depth to crawl based on discovery order

Required range: 1 <= x <= 20
Example:

3
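
Both limit and max_discovery_depth carry documented ranges; a small client-side check can reject bad values before the request is sent. A sketch (the helper is mine, not part of any SDK):

```python
def validate_crawl_options(limit=None, max_discovery_depth=None):
    """Range checks mirroring the documented constraints:
    limit in [1, 10000], max_discovery_depth in [1, 20]."""
    if limit is not None and not 1 <= limit <= 10000:
        raise ValueError(f"limit must be in [1, 10000], got {limit}")
    if max_discovery_depth is not None and not 1 <= max_discovery_depth <= 20:
        raise ValueError(
            f"max_discovery_depth must be in [1, 20], got {max_discovery_depth}"
        )

validate_crawl_options(limit=100, max_discovery_depth=3)  # passes silently
```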

exclude_paths
string[]

URL pathname regex patterns that exclude matching URLs from the crawl

Example:
["/exclude-this-path", "/and-this-path"]
include_paths
string[]

URL pathname regex patterns that include matching URLs in the crawl

Example:
["/include-this-path", "/and-this-path"]
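
Both filter lists are regex patterns applied to the URL pathname. A sketch of how such filtering could behave, assuming Python-style `re.search` semantics and exclude-before-include precedence (the server's exact matching rules may differ):

```python
import re
from urllib.parse import urlparse

def passes_path_filters(url, include_paths=(), exclude_paths=()):
    """Return True if the URL's pathname survives the filters.

    A URL is kept when no exclude pattern matches and, if include
    patterns were given, at least one include pattern matches.
    """
    path = urlparse(url).path
    if any(re.search(p, path) for p in exclude_paths):
        return False
    if include_paths:
        return any(re.search(p, path) for p in include_paths)
    return True

passes_path_filters("https://example.com/blog/post-1",
                    exclude_paths=["/exclude-this-path"])  # True
```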
ignore_query_parameters
boolean

Do not re-scrape the same path when it appears with different (or no) query parameters

Example:

false
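
With ignore_query_parameters enabled, URLs that differ only in their query string count as the same page. A sketch of the canonicalization this implies, purely illustrative of the idea rather than the service's actual implementation:

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_key(url, ignore_query_parameters=True):
    """Key used to decide whether two URLs are 'the same page'."""
    parts = urlsplit(url)
    if ignore_query_parameters:
        parts = parts._replace(query="", fragment="")
    return urlunsplit(parts)

# Both variants collapse to one key, so the path is scraped once.
a = canonical_key("https://example.com/p?utm=1")
b = canonical_key("https://example.com/p")
```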

allow_external_links
boolean

Allows the crawler to follow links to external websites

Example:

false

allow_subdomains
boolean

Allows the crawler to follow links to subdomains of the main domain

Example:

false
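
The scope flags above (crawl_entire_domain, allow_external_links, allow_subdomains) decide which discovered links are followed. A sketch of how a crawler might classify a link relative to the start URL; the category names and classification logic are illustrative:

```python
from urllib.parse import urlparse

def classify_link(start_url, link):
    """Classify a link as 'internal', 'subdomain', or 'external'."""
    start_host = urlparse(start_url).hostname
    link_host = urlparse(link).hostname
    if link_host == start_host:
        return "internal"
    if link_host and link_host.endswith("." + start_host):
        return "subdomain"   # followed only when allow_subdomains is true
    return "external"        # followed only when allow_external_links is true
```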

callback

Webhook configuration for receiving crawl results

extract_options
object

Request body model for the /extract endpoint

Response

Successful Response - Crawl Task Created

url
string<uri>
required

Starting URL for the crawl

crawl_options
object
required

Crawl configuration settings applied to this task

id
string<uuid>
required

Unique identifier for the crawl task

status
enum<string>
required

Current status of the crawl task

Available options:
pending,
in_progress,
completed,
failed
created_at
string<date-time>
required

Timestamp when the crawl task was created

name
string | null

Optional name for the crawl task

extract_options
object

Optional extraction configuration for each crawled page

total
integer | null

Total number of pages discovered (null until discovery begins)

completed
integer | null

Number of pages completed (null until crawl begins)

completed_at
string<date-time> | null

Timestamp when the crawl was completed (null if not completed)
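
The status, total, and completed fields together describe progress. A sketch of interpreting a crawl-task response dict (field names come from the schema above; the helper itself is illustrative):

```python
# Terminal statuses per the documented enum.
TERMINAL_STATUSES = {"completed", "failed"}

def crawl_progress(task):
    """Return (done, fraction) for a crawl-task response dict.

    fraction is None until discovery has populated `total`.
    """
    done = task["status"] in TERMINAL_STATUSES
    total, completed = task.get("total"), task.get("completed")
    fraction = completed / total if total else None
    return done, fraction

done, frac = crawl_progress(
    {"status": "in_progress", "total": 100, "completed": 25}
)
```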