Extract data in the background with async requests, or submit multiple URLs at once with batch. Both modes return immediately without blocking your application while Nimble processes the work.

Quick Start

from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["markdown"]
)

print(f"Task created: {response.task_id}")
# Poll GET /v1/tasks/{task_id} until state == "success"
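That last comment glosses over the polling loop; a minimal generic helper (an illustrative sketch; the fetch_task callable, interval, and timeout are assumptions, not part of the SDK) could look like:

```python
import time

def poll_until_done(fetch_task, interval=2.0, timeout=120.0):
    """Call fetch_task() until the task leaves the "pending" state.

    fetch_task should return the task dict from GET /v1/tasks/{task_id},
    e.g. {"state": "pending", ...}. Returns the final task dict.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch_task()
        if task.get("state") != "pending":
            return task
        time.sleep(interval)
    raise TimeoutError("task did not finish within the timeout")
```

Wire fetch_task to a GET on /v1/tasks/{task_id} with your HTTP client of choice.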
Use async or batch when you need to:
  • Run extractions without blocking your application
  • Process large volumes of URLs efficiently
  • Deliver results to cloud storage (S3 / GCS) automatically
  • Receive webhook notifications when tasks complete
  • Integrate extraction into scheduled or queued workflows

How it works

1. Submit a request

Send a POST request to the async or batch endpoint. The API returns immediately with a task_id (async) or batch_id (batch) — no waiting for extraction to finish.
2. Nimble processes in the background

Extraction runs asynchronously. For batch, each URL becomes an independent task processed in parallel.
3. Receive results your way

Choose how results are delivered:
  • Polling — check task status and fetch results on demand
  • Webhook — get notified automatically when a task completes
  • Cloud storage — results saved directly to your S3 or GCS bucket
For setup instructions, code examples, and bucket permission configuration, see the Callbacks & Delivery guide.

Async

Submit a single URL and receive a task_id immediately. Nimble processes the extraction in the background — retrieve results via polling, webhook, or cloud storage.
POST https://sdk.nimbleway.com/v1/extract/async

Parameters

Accepts all parameters from the Extract API, plus async-specific delivery options:
  • url (required) — The webpage to extract
  • formats — Output formats: html, markdown, text, screenshot
  • render — Enable JavaScript rendering
  • driver — Extraction engine: vx6, vx8, vx10, etc.
  • country, state, city — Geo-targeting
  • parse, parser — Structured data extraction
  • browser_actions, network_capture — Advanced interactions
Async-specific parameters:
storage_type
string
Storage provider for results. When specified, results are saved to your cloud storage instead of Nimble’s servers. Options: s3 (Amazon S3), gs (Google Cloud Storage)
storage_url
string
Bucket path where results will be saved. Results are stored as {task_id}.json at the specified location. Format: s3://your-bucket/path/prefix/
storage_compress
boolean
default: false
Compress results with GZIP before saving. When true, results are saved as {task_id}.json.gz.
storage_object_name
string
Custom filename for the stored object instead of the default task ID. Example: "my-custom-name" saves as my-custom-name.json
callback_url
string
Webhook URL to receive a POST request when the task completes. Nimble sends task metadata (without result data) to this URL when extraction finishes. Example: https://your-api.com/webhook/complete

Examples

Submit a URL and receive a task_id immediately. All three delivery methods below return the same initial response — the difference is how you retrieve results once the task completes.
Poll the task endpoint to check status and retrieve results when complete.
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html", "markdown"]
)

task_id = response.task_id
print(f"Task created: {task_id}")
Results are saved automatically to your bucket as the task completes. No need to poll — the file appears at storage_url/storage_object_name.json.gz when done.
response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html", "markdown"],
    storage_type="s3",
    storage_url="s3://my-bucket/nimble-extracts/",
    storage_compress=True,
    storage_object_name="example-com-extraction"
)

print(f"Task created: {response.task_id}")
print(f"Results will be saved to: s3://my-bucket/nimble-extracts/example-com-extraction.json.gz")
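Reading the compressed object back is a plain GZIP + JSON decode; a sketch (the boto3 call in the comment is an assumption about your S3 client, not part of Nimble's SDK):

```python
import gzip
import json

def load_compressed_result(blob: bytes) -> dict:
    """Decode a storage_compress=True object ({task_id}.json.gz) into a dict."""
    return json.loads(gzip.decompress(blob))

# With boto3 (assumed installed and configured with AWS credentials),
# the bytes would come from something like:
#   blob = boto3.client("s3").get_object(
#       Bucket="my-bucket",
#       Key="nimble-extracts/example-com-extraction.json.gz",
#   )["Body"].read()
```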
Nimble sends a POST to your callback_url when the task completes. No polling required — your server receives the notification automatically.
response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html"],
    callback_url="https://your-api.com/webhooks/extract-complete"
)

print(f"Task created: {response.task_id}")
Nimble POSTs task metadata to your URL when complete:
{
  "status": "success",
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "success",
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:36:24.685Z",
    "input": {}
  }
}
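A receiving endpoint only needs to parse that metadata and acknowledge with a 2xx; a minimal stdlib sketch (the handler class and port are illustrative, not prescribed by the API):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_callback(body: bytes) -> dict:
    """Extract the task metadata from a completion webhook payload."""
    return json.loads(body)["task"]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        task = parse_callback(self.rfile.read(length))
        # The payload carries metadata only; fetch the actual data via
        # GET /v1/tasks/{task['id']}/results
        print("task finished:", task["id"], task["state"])
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("", 8000), WebhookHandler).serve_forever()
```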
The endpoint returns immediately with a task ID:
{
  "status": "success",
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "pending",
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:36:24.685Z",
    "input": {}
  }
}

Status & Results

When polling, the typical flow is:
  1. Poll GET /v1/tasks/{task_id} until state: "success"
  2. Call GET /v1/tasks/{task_id}/results to retrieve the extracted data
Task states:
State     Description
pending   Task queued, waiting to start
success   Extraction complete, results available
error     Extraction failed
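Put together with the requests library (assumed installed; YOUR_API_KEY and the timeout values are placeholders), the two-step flow might look like:

```python
import time
import requests

API = "https://sdk.nimbleway.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def wait_for_results(task_id, interval=2.0, timeout=300.0):
    """Poll the task until it leaves "pending", then fetch its results."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = requests.get(
            f"{API}/tasks/{task_id}", headers=HEADERS
        ).json()["task"]
        if task["state"] == "success":
            return requests.get(
                f"{API}/tasks/{task_id}/results", headers=HEADERS
            ).json()
        if task["state"] == "error":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {timeout}s")
```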

Retrieve results

GET https://sdk.nimbleway.com/v1/tasks/{task_id}/results
{
  "url": "https://www.nimbleway.com/blog/post",
  "task_id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
  "status": "success",
  "data": {
    "html": "<!DOCTYPE html>...",
    "markdown": "# Page Title\n\nContent..."
  },
  "metadata": {
    "query_duration": 1877,
    "driver": "vx6"
  },
  "status_code": 200
}

Check a task

GET https://sdk.nimbleway.com/v1/tasks/{task_id}
{
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "success",
    "status_code": 200,
    "created_at": "2026-01-24T12:36:24.685Z",
    "api_type": "extract"
  }
}

List all tasks (paginated)

GET https://sdk.nimbleway.com/v1/tasks?limit=100&cursor={cursor}
{
  "data": [
    {
      "id": "8e8cfde8-...",
      "state": "success",
      "batch_id": "b7e1a2f3-...",
      "api_type": "extract",
      "created_at": "2026-03-19T10:00:00.000Z",
      "download_url": "https://sdk.nimbleway.com/v1/tasks/8e8cfde8-.../results"
    }
  ],
  "pagination": {
    "has_next": true,
    "next_cursor": "eyJpZCI6Ij...",
    "total": 142
  }
}
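Draining the cursor is a standard pagination loop; a sketch with the requests library (assumed installed; YOUR_API_KEY is a placeholder):

```python
import requests

API = "https://sdk.nimbleway.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def list_all_tasks(limit=100):
    """Walk the cursor-paginated /v1/tasks listing until has_next is false."""
    tasks, cursor = [], None
    while True:
        params = {"limit": limit}
        if cursor:
            params["cursor"] = cursor
        page = requests.get(f"{API}/tasks", headers=HEADERS, params=params).json()
        tasks.extend(page["data"])
        if not page["pagination"]["has_next"]:
            return tasks
        cursor = page["pagination"]["next_cursor"]
```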

Batch

Submit up to 1,000 URLs in a single request. Each URL runs as an independent async task. Use shared_inputs to apply common settings across all URLs — individual items in inputs can override any shared value.
POST https://sdk.nimbleway.com/v1/extract/batch

Parameters

inputs
array
required
Array of per-URL extraction requests. Supports up to 1,000 items per batch. Each item accepts all core extraction parameters; url is the only required field per item. Per-item values override anything set in shared_inputs:
"inputs": [
  { "url": "https://example.com/us-page", "country": "US" },
  { "url": "https://example.com/il-page", "country": "IL" }
]
shared_inputs
object
Default parameters applied to every item in the batch. Accepts two categories:
Delivery params (batch-wide, not overridable per item):
  • storage_type — s3 or gs
  • storage_url — bucket path for results
  • storage_compress — GZIP compress results
  • storage_object_name — custom filename prefix
  • callback_url — webhook on completion
Extraction defaults (applied to all items, overridable per item in inputs):
  • render, driver, formats, country, locale, parse, parser, and others
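For the extraction defaults, the precedence rule behaves like a plain dictionary merge in which per-item keys win; a sketch of the effective per-item request (illustrative only, not the server's actual implementation):

```python
shared_inputs = {"country": "CA", "locale": "en-CA", "render": True}
item = {"url": "https://example.com/il-page", "country": "IL"}

# Per-item values override shared defaults; delivery params
# (storage_*, callback_url) stay batch-wide and are not merged per item.
effective = {**shared_inputs, **item}
```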

Examples

Parameters set in shared_inputs are applied as defaults to all items in inputs. Any value set inside an individual item overrides the shared default.
Extract several unique URLs with results delivered to S3 and a webhook callback on completion:
response = nimble.extract_batch(
    inputs=[
        {"url": "https://www.finance.com"},
        {"url": "https://www.travel.com"},
        {"url": "https://www.socialmedia.com"},
    ],
    shared_inputs={
        "storage_type": "s3",
        "storage_url": "s3://your-bucket/results/",
        "callback_url": "https://your-api.com/webhooks/batch-complete",
    }
)

print(f"Batch ID: {response.batch_id}")
print(f"Tasks submitted: {response.batch_size}")
Set a different country per URL. Items without a country fall back to the shared default (CA):
response = nimble.extract_batch(
    inputs=[
        {"url": "https://www.finance.com",     "country": "US", "locale": "en-US"},
        {"url": "https://www.travel.com",       "country": "FR", "locale": "fr-FR"},
        {"url": "https://www.socialmedia.com",  "country": "DE", "locale": "de-DE"},
        {"url": "https://www.searchengine.com"},  # falls back to shared country: CA
    ],
    shared_inputs={
        "country": "CA",
        "locale": "en-CA",
        "storage_type": "s3",
        "storage_url": "s3://your-bucket/results/",
        "callback_url": "https://your-api.com/webhooks/batch-complete",
    }
)
Set the URL once in shared_inputs and vary only the country per item — useful for geo-comparison:
response = nimble.extract_batch(
    inputs=[
        {"country": "US", "locale": "en-US"},
        {"country": "FR", "locale": "fr-FR"},
        {"country": "DE", "locale": "de-DE"},
    ],
    shared_inputs={
        "url": "https://www.finance.com",
        "storage_type": "s3",
        "storage_url": "s3://your-bucket/results/",
        "callback_url": "https://your-api.com/webhooks/batch-complete",
    }
)
The endpoint returns immediately with a batch_id and the initial task list:
{
  "batch_id": "b7e1a2f3-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
  "batch_size": 3,
  "tasks": [
    { "id": "task-001-uuid", "state": "pending", "batch_id": "b7e1a2f3-..." },
    { "id": "task-002-uuid", "state": "pending", "batch_id": "b7e1a2f3-..." }
  ]
}

Status & Results

When polling, the typical flow is:
  1. Poll /v1/batches/{batch_id}/progress until completed: true
  2. Fetch /v1/batches/{batch_id} to get all task IDs and states
  3. For each success task, call GET /v1/tasks/{task_id}/results
Batch states:
State         Description
pending       Task queued, waiting to start
in_progress   Task is currently being processed
success       Extraction complete, results available
error         Extraction failed
1. Poll for batch completion

Call /v1/batches/{batch_id}/progress repeatedly until completed: true. This is a lightweight endpoint — use it for polling.
GET https://sdk.nimbleway.com/v1/batches/{batch_id}/progress
{
  "id": "b7e1a2f3-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
  "completed": false,
  "completed_count": 47,
  "progress": 0.47,
  "completed_at": null
}
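The progress loop can be sketched with the requests library (assumed installed; YOUR_API_KEY, interval, and timeout are placeholders):

```python
import time
import requests

API = "https://sdk.nimbleway.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def wait_for_batch(batch_id, interval=5.0, timeout=600.0):
    """Poll the lightweight progress endpoint until the batch completes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        progress = requests.get(
            f"{API}/batches/{batch_id}/progress", headers=HEADERS
        ).json()
        if progress["completed"]:
            return progress
        time.sleep(interval)
    raise TimeoutError(f"batch {batch_id} did not complete in time")
```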
2. Fetch the full batch details

Once completed: true, fetch the batch details to get all task IDs, states, and download URLs.
GET https://sdk.nimbleway.com/v1/batches/{batch_id}
{
  "id": "b7e1a2f3-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
  "completed": true,
  "completed_count": 3,
  "progress": 1.0,
  "tasks": [
    {
      "id": "task-001-uuid",
      "state": "success",
      "download_url": "https://sdk.nimbleway.com/v1/tasks/task-001-uuid/results"
    },
    {
      "id": "task-002-uuid",
      "state": "error",
      "error": "Connection timeout"
    }
  ]
}
3. Retrieve results per task

Iterate over the task list and call GET /v1/tasks/{task_id}/results for each success task.
import requests

batch = requests.get(
    f"https://sdk.nimbleway.com/v1/batches/{batch_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
).json()

for task in batch["tasks"]:
    if task["state"] == "success":
        result = requests.get(
            task["download_url"],
            headers={"Authorization": "Bearer YOUR_API_KEY"}
        ).json()
        print(result["data"]["markdown"])

List all batches

GET https://sdk.nimbleway.com/v1/batches
{
  "data": [
    {
      "id": "b7e1a2f3-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
      "completed": true,
      "created_at": "2026-03-19T10:00:00.000Z",
      "tasks": ["task-001-uuid", "task-002-uuid", "task-003-uuid"]
    }
  ],
  "pagination": { "has_next": false, "next_cursor": null, "total": 1 }
}

Data Retention

Results are retained on Nimble's servers only temporarily (see the table below). For longer retention, use cloud storage (storage_url) to persist results indefinitely.
Item                Expiration
Pending tasks       24 hours if not started
Completed results   24–48 hours (indefinite with cloud storage)
Failed tasks        24 hours

API Reference

Tasks APIs

Check the status of a single async task

Batch APIs

Retrieve the full task list and states for a batch