Use async requests when you need to extract data from multiple pages, run long-running operations, or integrate extraction into background job systems. Async requests return immediately with a task ID, letting you check status and retrieve results later.

When to use async

Use async extract for:
  • Batch processing: Extract data from hundreds or thousands of URLs
  • Background jobs: Integrate extraction into scheduled or queued workflows
  • Long-running operations: Handle complex browser actions or slow-loading sites
  • Webhook integration: Get notified when extraction completes
  • Cloud storage: Save results directly to S3 or Google Cloud Storage
Async requests are ideal for high-volume extraction where you don’t need immediate results. For single-page extractions where you need results right away, use the synchronous Extract API.

How it works

1. Submit extraction request

Send a POST request to /v1/extract/async with your extraction parameters. The API returns immediately with a task ID. Optionally include:
  • callback_url to receive a webhook notification when complete
  • storage_url and storage_type (s3/gs) to save results directly to cloud storage

2. Track task status

Use the task ID to check progress at /v1/tasks/{task_id}. The task transitions through states: pending → running → success or failed.

3. Retrieve results

Once complete, fetch results from /v1/tasks/{task_id}/results or from your configured cloud storage.

API endpoint

POST https://sdk.nimbleway.com/v1/extract/async

Parameters

Async extract accepts all the same parameters as the synchronous extract endpoint, plus optional async-specific parameters:

Core extraction parameters

All parameters from the Extract API are supported:
  • url (required) - The webpage to extract
  • formats - Output formats (html, markdown, text, screenshot)
  • render - Enable JavaScript rendering
  • driver - Choose extraction engine (vx6, vx8, vx10, etc.)
  • country, state, city - Geo-targeting options
  • parse - Enable parsing with schemas
  • parser - Define extraction schema
  • browser_actions - Automate interactions
  • network_capture - Capture network requests
  • And all other extract parameters…

Async-specific parameters

storage_type
string
Storage provider for results. Use s3 for Amazon S3 or gs for Google Cloud Storage. When specified, results are automatically saved to your cloud storage instead of Nimble’s servers. Options: s3, gs
storage_url
string
Repository URL where results will be saved, in the form s3://Your.Bucket.Name/path/prefix/. Results are saved as {TASK_ID}.json in the specified location. Example: s3://my-bucket/nimble-results/
storage_compress
boolean
default:"false"
Compress results using GZIP before saving to storage. Reduces storage costs and transfer time. When true, results are saved as {TASK_ID}.json.gz
storage_object_name
string
Custom name for the stored object instead of the default task ID. Example: "my-custom-name" saves as my-custom-name.json
callback_url
string
Webhook URL to receive a POST request when the task completes. The API sends task metadata (without result data) to this URL when extraction finishes. Example: https://your-api.com/webhooks/extract-complete
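Putting the async-specific parameters together with the core ones, a request body might look like this (bucket name, object name, and webhook URL are placeholders):

```json
{
  "url": "https://www.example.com",
  "render": true,
  "formats": ["html", "markdown"],
  "storage_type": "s3",
  "storage_url": "s3://my-bucket/nimble-results/",
  "storage_compress": true,
  "storage_object_name": "example-com-extraction",
  "callback_url": "https://your-api.com/webhooks/extract-complete"
}
```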

Response format

The async endpoint returns immediately with task information:
{
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "pending",
    "status_code": 200,
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:36:24.685Z",
    "input": {},
    "api_type": "extract"
  }
}

Task states

State      Description
pending    Task queued, waiting to start
running    Extraction in progress
success    Extraction finished successfully, results available
failed     Extraction failed, check error details
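A polling loop only needs to distinguish the two terminal states (success, failed) from the in-flight ones. A small generic helper with exponential backoff, a hypothetical utility rather than part of the SDK:

```python
import time

TERMINAL_STATES = {"success", "failed"}


def poll_until_done(get_state, timeout=600.0, initial_delay=1.0, max_delay=30.0):
    """Call get_state() with exponential backoff until a terminal state.

    get_state is any zero-argument callable returning the current state string.
    """
    delay = initial_delay
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() + delay > deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # back off to spare the API
```

With the SDK shown below, get_state would be something like `lambda: nimble.tasks.get(task_id).task.state`.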

Example usage

Basic async extraction

from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

# Submit async extraction
response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html", "markdown"]
)

task_id = response.task_id
print(f"Task created: {task_id}")

# Check status
import time
while True:
    my_task = nimble.tasks.get(task_id)
    print(f"Status: {my_task.task.state}")

    if my_task.task.state == 'success':
        # Get results
        results = nimble.tasks.results(task_id)
        print(results.data.html[:200])
        break
    elif my_task.task.state == 'failed':
        print(f"Task failed: {my_task.message}")
        break

    time.sleep(2)

Save to cloud storage

Store results directly in Amazon S3:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html", "markdown"],
    storage_type="s3",
    storage_url="s3://my-bucket/nimble-extracts/",
    storage_compress=True,
    storage_object_name="example-com-extraction"
)

task_id = response.task_id
print(f"Task created: {task_id}")
print(f"Results will be saved to: s3://my-bucket/nimble-extracts/example-com-extraction.json.gz")

# Results are automatically saved to S3 when complete
# You can still check status and retrieve from Nimble's servers if needed
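Since storage_compress is enabled above, the stored object is GZIP-compressed JSON. A sketch of decoding it after download; the boto3 fetch shown in the comment assumes credentials for the bucket are already configured:

```python
import gzip
import json


def decode_result(raw: bytes, compressed: bool = True) -> dict:
    """Decode a stored result object back into a Python dict."""
    if compressed:
        raw = gzip.decompress(raw)  # undo the .json.gz compression
    return json.loads(raw)


# Fetching the object with boto3 (standard get_object usage):
# import boto3
# obj = boto3.client("s3").get_object(
#     Bucket="my-bucket",
#     Key="nimble-extracts/example-com-extraction.json.gz",
# )
# result = decode_result(obj["Body"].read())
```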

Webhook notifications

Get notified when extraction completes:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract_async(
    url="https://www.example.com",
    render=True,
    formats=["html"],
    callback_url="https://your-api.com/webhooks/extract-complete"
)

task_id = response.task_id
print(f"Task created: {task_id}")
print("Webhook will be called when extraction completes")
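On the receiving side, a minimal webhook endpoint can be built with the standard library alone. The page states the callback carries task metadata without result data; the exact field names inside the payload are not documented here, so log the raw body first and adapt:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body: bytes) -> dict:
    """Parse the webhook body into task metadata."""
    return json.loads(body)


class ExtractWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        task = parse_callback(self.rfile.read(length))
        print("extraction finished:", task)  # hand off to your job queue here
        self.send_response(200)
        self.end_headers()


# To run the receiver:
# HTTPServer(("0.0.0.0", 8000), ExtractWebhook).serve_forever()
```

On receiving the callback, fetch the full results from /v1/tasks/{task_id}/results (or from your configured cloud storage).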

Checking task status

Use the Tasks API to check status:
GET https://sdk.nimbleway.com/v1/tasks/{task_id}

Status Response:

{
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "success",
    "status_code": 200,
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:36:24.685Z",
    "input": {},
    "api_type": "extract"
  }
}

Retrieving results

Once the task is complete, retrieve results:
GET https://sdk.nimbleway.com/v1/tasks/{task_id}/results

Results Response

{
    "url": "https://www.nimbleway.com/blog/post",
    "task_id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
    "status": "success",
    "task": {
        "id": "ec89b1f7-1cf2-40eb-91b4-78716093f9ed",
        "state": "success",
        "created_at": "2026-02-09T23:15:43.549Z",
        "modified_at": "2026-02-09T23:16:39.094Z",
        "account_name": "your-account"
    },
    "data": {
        "html": "<!DOCTYPE html>...",
        "markdown": "# Page Title\n\nContent...",
        "headers": { ... }
    },
    "metadata": {
        "query_time": "2026-02-09T23:15:43.549Z",
        "query_duration": 1877,
        "response_parameters": {
            "input_url": "https://www.nimbleway.com/blog/post"
        },
        "driver": "vx6"
    },
    "status_code": 200
}

Data Retention & Expiration

Result retention

Results are typically retained for 7 days. If you need longer retention:
  • Use cloud storage (storage_url) to persist results indefinitely
  • Download results promptly after completion
  • Implement your own archival system
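Since results expire, a simple archival step after each completed task covers the "download promptly" option. A hypothetical helper; the path layout is a suggestion, not part of the API:

```python
import json
from pathlib import Path


def archive_result(task_id: str, result: dict, root: str = "archive") -> Path:
    """Write a completed task's result to <root>/<task_id>.json and return the path."""
    path = Path(root) / f"{task_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    return path
```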

Task expiration

  • Pending tasks: Expire after 24 hours if not started
  • Completed results: Available for 24-48 hours (unless using cloud storage)
  • Failed tasks: Retry data available for 24 hours