Extract data in the background with async requests, or submit multiple URLs at once with batch. Both modes return immediately without blocking your application while Nimble processes the work.

Async requests

Use async extract for:
  • Background jobs: Integrate extraction into scheduled or queued workflows
  • Long-running operations: Handle complex browser actions or slow-loading sites
  • Webhook integration: Get notified when extraction completes
  • Cloud storage: Save results directly to S3 or Google Cloud Storage
For single-page extractions where you need results immediately, use the synchronous Extract API. For multiple URLs at once, use Batch extraction below.

How it works

1. Submit extraction request

Send a POST request to /v1/extract/async with your extraction parameters. The API returns immediately with a task ID. Optionally include:
  • callback_url to receive a webhook notification when complete
  • storage_url and storage_type (s3/gs) to save results directly to cloud storage
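The submission step above amounts to posting a JSON body to the async endpoint. A minimal sketch of assembling that body (the callback and storage values are placeholders; substitute your own, and attach your account's auth headers when sending):

```python
import json

# Request body for POST /v1/extract/async; callback_url and the
# storage fields are optional placeholders shown for illustration.
payload = {
    "url": "https://www.example.com",
    "render": True,
    "formats": ["html", "markdown"],
    "callback_url": "https://your-api.com/webhook/complete",
    "storage_type": "s3",
    "storage_url": "s3://my-bucket/nimble-results/",
}

body = json.dumps(payload)
# Send `body` to https://sdk.nimbleway.com/v1/extract/async with your
# auth headers; the response contains the task ID used in the next steps.
```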
2. Track task status

Use the task ID to check progress at /v1/tasks/{task_id}. The task transitions through states: pending → running → success or failed.
3. Retrieve results

Once complete, fetch results from /v1/tasks/{task_id}/results or from your configured cloud storage.

API endpoint

POST https://sdk.nimbleway.com/v1/extract/async

Parameters

Async extract accepts all the same parameters as the synchronous extract endpoint, plus optional async-specific parameters:

Core extraction parameters

All parameters from the Extract API are supported:
  • url (required) - The webpage to extract
  • formats - Output formats (html, markdown, text, screenshot)
  • render - Enable JavaScript rendering
  • driver - Choose extraction engine (vx6, vx8, vx10, etc.)
  • country, state, city - Geo-targeting options
  • parse - Enable parsing with schemas
  • parser - Define extraction schema
  • browser_actions - Automate interactions
  • network_capture - Capture network requests
  • And all other extract parameters…

Async-specific parameters

storage_type
string
Storage provider for results. When specified, results are saved to your cloud storage instead of Nimble’s servers.
Options: s3 (Amazon S3), gs (Google Cloud Storage)
storage_url
string
Bucket path where results will be saved. Results are stored as {task_id}.json at the specified location.
Format: s3://your-bucket/path/prefix/
Example: s3://my-bucket/nimble-results/
storage_compress
boolean
default:"false"
Compress results with GZIP before saving. Reduces storage costs and transfer time. When true, results are saved as {task_id}.json.gz.
storage_object_name
string
Custom filename for the stored object instead of the default task ID.
Example: "my-custom-name" saves as my-custom-name.json
callback_url
string
Webhook URL to receive a POST request when the task completes. Nimble sends task metadata (without result data) to this URL when extraction finishes.
Example: https://your-api.com/webhook/complete
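When storage_compress is enabled, the object written to your bucket is GZIP-compressed JSON. A minimal sketch of reading one back, assuming you have already downloaded the {task_id}.json.gz bytes from your bucket:

```python
import gzip
import json

def load_compressed_result(raw: bytes) -> dict:
    """Decompress and parse a {task_id}.json.gz object from your bucket."""
    return json.loads(gzip.decompress(raw))

# In-memory sample standing in for a downloaded object:
sample = gzip.compress(json.dumps({"status": "success"}).encode())
result = load_compressed_result(sample)
```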

Response format

The async endpoint returns immediately with task information:
{
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "completed",
    "status_code": 200,
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:36:24.685Z",
    "input": {},
    "api_type": "extract"
  }
}

Task states

  • pending: Task queued, waiting to start
  • running: Extraction in progress
  • success: Extraction finished successfully, results available
  • failed: Extraction failed, check error details

Example usage

Basic async extraction

from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

# Submit async extraction
response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html", "markdown"]
)

task_id = response.task_id
print(f"Task created: {task_id}")

# Check status
import time
while True:
    my_task = nimble.tasks.get(task_id)
    print(f"Status: {my_task.task.state}")

    if my_task.task.state == 'success':
        # Get results
        results = nimble.tasks.results(task_id)
        print(results.data.html[:200])
        break
    elif my_task.task.state == 'failed':
        print(f"Task failed: {my_task.message}")
        break

    time.sleep(2)
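The fixed time.sleep(2) above is fine for short jobs; for longer-running extractions you may prefer capped exponential backoff between status checks to reduce needless polling. A sketch (the interval values are illustrative, not prescribed by the API):

```python
def poll_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 8) -> list[float]:
    """Delays in seconds between status checks: 1, 2, 4, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

# Use with time.sleep(delay) inside the polling loop shown above.
delays = poll_delays()
```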

Save to cloud storage

Store results directly in Amazon S3:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html", "markdown"],
    storage_type="s3",
    storage_url="s3://my-bucket/nimble-extracts/",
    storage_compress=True,
    storage_object_name="example-com-extraction"
)

task_id = response.task_id
print(f"Task created: {task_id}")
print(f"Results will be saved to: s3://my-bucket/nimble-extracts/example-com-extraction.json.gz")

# Results are automatically saved to S3 when complete
# You can still check status and retrieve from Nimble's servers if needed

Webhook notifications

Get notified when extraction completes:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html"],
    callback_url="https://your-api.com/webhooks/extract-complete"
)

task_id = response.task_id
print(f"Task created: {task_id}")
print("Webhook will be called when extraction completes")

Checking task status

Use the Tasks API to check status:
GET https://sdk.nimbleway.com/v1/tasks/{task_id}

Status Response:

{
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "completed",
    "status_code": 200,
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:36:24.685Z",
    "input": {},
    "api_type": "extract"
  }
}

Retrieving results

Once the task is complete, retrieve results:
GET https://sdk.nimbleway.com/v1/tasks/{task_id}/results

Results Response:

{
    "url": "https://www.nimbleway.com/blog/post",
    "task_id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
    "status": "success",
    "task": {
        "id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
        "state": "success",
        "created_at": "2026-02-09T23:15:43.549Z",
        "modified_at": "2026-02-09T23:16:39.094Z",
        "account_name": "your-account"
    },
    "data": {
        "html": "<!DOCTYPE html>...",
        "markdown": "# Page Title\n\nContent...",
        "headers": { ... }
    },
    "metadata": {
        "query_time": "2026-02-09T23:15:43.549Z",
        "query_duration": 1877,
        "response_parameters": {
            "input_url": "https://www.nimbleway.com/blog/post"
        },
		"driver": "vx6"
    },
    "status_code": 200
}
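A minimal sketch of pulling the useful fields out of that response body, using a trimmed in-memory sample in place of a real HTTP response:

```python
import json

# Trimmed sample standing in for the /v1/tasks/{task_id}/results body.
raw = json.dumps({
    "status": "success",
    "data": {"markdown": "# Page Title\n\nContent..."},
})

result = json.loads(raw)
if result["status"] == "success":
    markdown = result["data"]["markdown"]
```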

Batch extraction

Submit multiple URLs in a single request. Each URL is processed as an independent task — results are delivered to cloud storage as each one completes. Use batch when you:
  • Extract many URLs at once: Submit up to 1,000 URLs in a single API call instead of looping
  • Apply shared settings: Set common params once (render, country, driver) across all URLs
  • Scale efficiently: Process large URL lists without managing individual async requests
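Batches are capped at 1,000 URLs; for larger lists, split the input and make multiple batch calls. A sketch of the chunking step (the loop in the comment uses the SDK call shown in the example below):

```python
def chunk(urls: list[str], size: int = 1000) -> list[list[str]]:
    """Split a URL list into batch-sized chunks (max 1,000 per API call)."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

# Each chunk becomes one extract_batch call:
# for batch in chunk(all_urls):
#     nimble.extract_batch(params=[{"url": u} for u in batch], shared_params={...})
```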

API endpoint

POST https://sdk.nimbleway.com/v1/extract/batch

Parameters

params
array
required
Array of per-URL extraction requests. Supports up to 1,000 items per batch. Each item accepts all Core extraction parameters; url is the only required field per item. Per-item values override anything set in shared_params. This lets you mix and match. For example, run most URLs with a US proxy but override country on specific items:
"params": [
  { "url": "https://example.com/us-page", "country": "US" },
  { "url": "https://example.com/il-page", "country": "IL" }
]
shared_params
object
Default parameters applied to every item in the batch. Accepts two categories of params:
Async delivery params (applied to the whole batch):
  • storage_type — s3 or gs
  • storage_url — bucket path for results
  • storage_compress — GZIP compress results
  • storage_object_name — custom filename prefix
  • callback_url — webhook on completion
Core extraction params used as defaults (can be overridden per item):
  • render, driver, formats, country, locale, parse, parser, and others
Any value set in shared_params can be overridden by the same field in an individual params item.
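This override behavior works like a plain dict merge with the per-item dict winning. A sketch of the effective parameters for one item, using field names from the examples above:

```python
shared_params = {"render": True, "country": "US", "formats": ["markdown"]}
item = {"url": "https://example.com/il-page", "country": "IL"}

# Per-item values take precedence over shared defaults:
effective = {**shared_params, **item}
```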

Example

Extract multiple pages using shared settings, with per-item country overrides:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract_batch(
    params=[
        # Uses shared country (US)
        {"url": "https://www.example.com/page1"},
        # Overrides country for this item only
        {"url": "https://www.example.com/page2", "country": "IL"},
        {"url": "https://www.example.com/page3"},
    ],
    shared_params={
        # Core params — applied to all items unless overridden
        "render": True,
        "country": "US",
        "formats": ["markdown"],
        # Async delivery params — applied to the whole batch
        "storage_type": "s3",
        "storage_url": "s3://my-bucket/batch-results/",
    }
)

print(f"Batch ID: {response.batch_id}")
print(f"Tasks submitted: {response.batch_size}")
for task in response.tasks:
    print(f"  Task {task.id}: {task.state}")

Batch response

The endpoint returns immediately with a batch_id and the list of created tasks:
{
  "batch_id": "b7e1a2f3-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
  "batch_size": 3,
  "tasks": [
    {
      "id": "task-001-uuid",
      "state": "pending",
      "batch_id": "b7e1a2f3-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
      "created_at": "2026-03-19T10:00:00.000Z",
      "modified_at": "2026-03-19T10:00:00.000Z",
      "api_type": "extract"
    },
    {
      "id": "task-002-uuid",
      "state": "pending",
      "batch_id": "b7e1a2f3-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
      "created_at": "2026-03-19T10:00:00.000Z",
      "modified_at": "2026-03-19T10:00:00.000Z",
      "api_type": "extract"
    }
  ]
}
Each task in the batch is independent. Tasks complete at different times based on page complexity. Use the task IDs to poll individual results, or use cloud storage to receive each result as it completes.
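Because tasks finish independently, a polling loop needs to track which task ids are still outstanding. A sketch of that bookkeeping, with states refreshed via nimble.tasks.get as shown earlier:

```python
TERMINAL = {"success", "failed"}

def still_pending(states: dict) -> list:
    """Given {task_id: state}, return ids that need another status check."""
    return [tid for tid, state in states.items() if state not in TERMINAL]

# Refresh states for the remaining ids on each polling pass.
states = {"task-001-uuid": "success", "task-002-uuid": "pending"}
remaining = still_pending(states)
```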

Data Retention & Expiration

Result retention

Results are typically retained for 7 days. If you need longer retention:
  • Use cloud storage (storage_url) to persist results indefinitely
  • Download results promptly after completion
  • Implement your own archival system

Task expiration

  • Pending tasks: Expire after 24 hours if not started
  • Completed results: Available for 24-48 hours (unless using cloud storage)
  • Failed tasks: Retry data available for 24 hours