Use async requests when you need to extract data from multiple pages, run long-running operations, or integrate extraction into background job systems. Async requests return immediately with a task ID, letting you check status and retrieve results later.

When to use async

Use async extract for:
  • Batch processing: Extract data from hundreds or thousands of URLs
  • Background jobs: Integrate extraction into scheduled or queued workflows
  • Long-running operations: Handle complex browser actions or slow-loading sites
  • Webhook integration: Get notified when extraction completes
  • Cloud storage: Save results directly to S3 or Google Cloud Storage
Async requests are ideal for high-volume extraction where you don’t need immediate results. For single-page extractions where you need results right away, use the synchronous Extract API.

How it works

1. Submit extraction request

Send a POST request to /v1/extract/async with your extraction parameters. The API returns immediately with a task ID. Optionally include:
  • callback_url to receive a webhook notification when complete
  • storage_url and storage_type (s3/gs) to save results directly to cloud storage

2. Track task status

Use the task ID to check progress at /v1/tasks/{task_id}. The task moves from pending to running, then to completed or failed.

3. Retrieve results

Once complete, fetch results from /v1/tasks/{task_id}/results or from your configured cloud storage.

API endpoint

POST https://sdk.nimbleway.com/v1/extract/async
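
For example, you can submit via raw HTTP with Python's requests library (a sketch; the Bearer auth scheme shown is an assumption, so use whatever credentials your account requires):
import requests

API_KEY = "YOUR_API_KEY"

# Submit an async extraction; the response carries the task metadata
response = requests.post(
    "https://sdk.nimbleway.com/v1/extract/async",
    headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme assumed
    json={
        "url": "https://www.example.com",
        "render": True,
        "formats": ["html"],
    },
    timeout=30,
)
task = response.json()["task"]
print(task["id"], task["status_url"])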

Parameters

Async extract accepts all the same parameters as the synchronous extract endpoint, plus optional async-specific parameters:

Core extraction parameters

All parameters from the Extract API are supported:
  • url (required) - The webpage to extract
  • formats - Output formats (html, markdown, text, screenshot)
  • render - Enable JavaScript rendering
  • driver - Choose extraction engine (vx6, vx8, vx10, etc.)
  • country, state, city - Geo-targeting options
  • parse - Enable parsing with schemas
  • parser - Define extraction schema
  • browser_actions - Automate interactions
  • network_capture - Capture network requests
  • And all other extract parameters…

Async-specific parameters

storage_type
string
Storage provider for results. Use s3 for Amazon S3 or gs for Google Cloud Storage. When specified, results are automatically saved to your cloud storage instead of Nimble’s servers. Options: s3, gs
storage_url
string
Repository URL where results will be saved. Format: s3://Your.Bucket.Name/path/prefix/
Results are saved as {TASK_ID}.json in the specified location. Example: s3://my-bucket/nimble-results/
storage_compress
boolean
default:"false"
Compress results using GZIP before saving to storage. Reduces storage costs and transfer time. When true, results are saved as {TASK_ID}.json.gz
storage_object_name
string
Custom name for the stored object instead of the default task ID. Example: "my-custom-name" saves as my-custom-name.json
callback_url
string
Webhook URL to receive a POST request when the task completes. The API sends task metadata (without result data) to this URL when extraction finishes. Example: https://your-api.com/webhooks/extract-complete
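
Putting these together, a complete async request body might look like this sketch (the bucket and webhook URL are placeholders):
payload = {
    # Core extraction parameters (same as the synchronous Extract API)
    "url": "https://www.example.com/products",
    "render": True,
    "formats": ["html", "markdown"],
    # Async-specific parameters
    "storage_type": "s3",
    "storage_url": "s3://my-bucket/nimble-results/",  # placeholder bucket
    "storage_compress": True,  # results saved as {TASK_ID}.json.gz
    "callback_url": "https://your-api.com/webhooks/extract-complete",
}

Pass this dict to the SDK call shown below, or send it as the JSON body of a raw POST to /v1/extract/async.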

Response format

The async endpoint returns immediately with task information:
{
  "status": "success",
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "pending",
    "status_url": "https://sdk.nimbleway.com/v1/tasks/8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:36:24.685Z",
    "input": {},
    "api_type": "extract",
    "download_url": "https://sdk.nimbleway.com/v1/tasks/8e8cfde8-345b-42b8-b3e2-0c61eb11e00f/results"
  }
}

Task states

State        Description
pending      Task queued, waiting to start
running      Extraction in progress
completed    Extraction finished successfully, results available
failed       Extraction failed, check error details
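
Only completed and failed are terminal; pending and running mean "poll again". A minimal dispatch sketch using the SDK helpers from the examples below:
def handle_task(nimble, task_id):
    """Return results if done, raise on failure, None if still in progress (sketch)."""
    status = nimble.get_task_status(task_id)
    if status["state"] == "completed":
        return nimble.get_task_results(task_id)
    if status["state"] == "failed":
        raise RuntimeError(status.get("error", "Unknown error"))
    return None  # pending or running: check back later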

Example usage

Basic async extraction

from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

# Submit async extraction
# Note: `async` is a reserved word in Python, so the method name is
# assumed to be `async_` here; check the SDK for the exact name.
response = nimble.extract.async_({
    "url": "https://www.example.com",
    "render": True,
    "formats": ["html", "markdown"]
})

task_id = response['task']['id']
print(f"Task created: {task_id}")

# Check status
import time
while True:
    status = nimble.get_task_status(task_id)
    print(f"Status: {status['state']}")

    if status['state'] == 'completed':
        # Get results
        results = nimble.get_task_results(task_id)
        print(results['html'][:200])
        break
    elif status['state'] == 'failed':
        print(f"Task failed: {status.get('error')}")
        break

    time.sleep(2)

Batch processing

Extract data from multiple URLs:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

urls = [
    "https://www.example.com/page1",
    "https://www.example.com/page2",
    "https://www.example.com/page3"
]

# Submit all tasks
task_ids = []
for url in urls:
    response = nimble.extract.async_({
        "url": url,
        "render": True,
        "formats": ["html"]
    })
    task_ids.append(response['task']['id'])
    print(f"Submitted: {url}{response['task']['id']}")

# Wait for all to complete
import time
completed = []
while len(completed) < len(task_ids):
    for task_id in task_ids:
        if task_id in completed:
            continue

        status = nimble.get_task_status(task_id)
        if status['state'] == 'completed':
            completed.append(task_id)
            print(f"Completed: {task_id}")
        elif status['state'] == 'failed':
            completed.append(task_id)
            print(f"Failed: {task_id}")

    if len(completed) < len(task_ids):
        time.sleep(5)

# Retrieve all results
for task_id in task_ids:
    results = nimble.get_task_results(task_id)
    print(f"\n{task_id}:\n{results['html'][:100]}")

Save to cloud storage

Store results directly in Amazon S3:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract.async_({
    "url": "https://www.example.com",
    "render": True,
    "formats": ["html", "markdown"],
    "storage_type": "s3",
    "storage_url": "s3://my-bucket/nimble-extracts/",
    "storage_compress": True,
    "storage_object_name": "example-com-extraction"
})

task_id = response['task']['id']
print(f"Task created: {task_id}")
print(f"Results will be saved to: s3://my-bucket/nimble-extracts/example-com-extraction.json.gz")

# Results are automatically saved to S3 when complete
# You can still check status and retrieve from Nimble's servers if needed
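
Once the task finishes, you can read the compressed object back with boto3 (a sketch; the bucket and key mirror the storage parameters above, and AWS credentials are assumed to be configured):
import gzip
import json

import boto3

s3 = boto3.client("s3")

# Key follows storage_url + storage_object_name + the .json.gz suffix
obj = s3.get_object(
    Bucket="my-bucket",
    Key="nimble-extracts/example-com-extraction.json.gz",
)
results = json.loads(gzip.decompress(obj["Body"].read()))
print(results.keys())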

Webhook notifications

Get notified when extraction completes:
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")

response = nimble.extract.async_({
    "url": "https://www.example.com",
    "render": True,
    "formats": ["html"],
    "callback_url": "https://your-api.com/webhooks/extract-complete"
})

task_id = response['task']['id']
print(f"Task created: {task_id}")
print("Webhook will be called when extraction completes")

# Your webhook endpoint will receive:
# POST https://your-api.com/webhooks/extract-complete
# {
#   "task": {
#     "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
#     "state": "completed",
#     "api_type": "extract",
#     "download_url": "https://sdk.nimbleway.com/v1/tasks/8e8cfde8-345b-42b8-b3e2-0c61eb11e00f/results"
#   }
# }
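
A minimal receiver for that payload might look like this Flask sketch (Flask is one option among many; because the webhook omits result data, the handler fetches it from download_url, with the auth scheme assumed):
import requests
from flask import Flask, request

app = Flask(__name__)
API_KEY = "YOUR_API_KEY"

@app.route("/webhooks/extract-complete", methods=["POST"])
def extract_complete():
    task = request.get_json()["task"]
    if task["state"] == "completed":
        # Webhook payload has no result data; fetch it separately
        results = requests.get(
            task["download_url"],
            headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme assumed
            timeout=30,
        ).json()
        # ... process results ...
    return "", 204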

Checking task status

Use the Tasks API to check status:
GET https://sdk.nimbleway.com/v1/tasks/{task_id}
Response:
{
  "task": {
    "id": "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f",
    "state": "completed",
    "created_at": "2026-01-24T12:36:24.685Z",
    "modified_at": "2026-01-24T12:37:15.123Z",
    "api_type": "extract",
    "download_url": "https://sdk.nimbleway.com/v1/tasks/8e8cfde8-345b-42b8-b3e2-0c61eb11e00f/results"
  }
}

Retrieving results

Once the task is complete, retrieve results:
GET https://sdk.nimbleway.com/v1/tasks/{task_id}/results
The response format is identical to the synchronous extract endpoint.
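
With raw HTTP this is a single GET (auth scheme assumed, as in the earlier sketches):
import requests

task_id = "8e8cfde8-345b-42b8-b3e2-0c61eb11e00f"
results = requests.get(
    f"https://sdk.nimbleway.com/v1/tasks/{task_id}/results",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # auth scheme assumed
    timeout=30,
).json()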

Best practices

Polling intervals

Don’t poll too frequently - it wastes resources and may trigger rate limits:
# ✅ Good: Reasonable polling interval
time.sleep(5)  # Check every 5 seconds

# ❌ Bad: Excessive polling
time.sleep(0.5)  # Checking twice per second
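
For tasks that may run for minutes, exponential backoff is a reasonable middle ground; a sketch:
import time

def wait_for_task(nimble, task_id, initial=2.0, cap=30.0, max_wait=600.0):
    """Poll with exponential backoff until the task reaches a terminal state."""
    delay, waited = initial, 0.0
    while waited < max_wait:
        status = nimble.get_task_status(task_id)
        if status["state"] in ("completed", "failed"):
            return status
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, cap)  # 2s, 4s, 8s, ... capped at 30s
    raise TimeoutError(f"Task {task_id} did not finish within {max_wait}s")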

Error handling

Always handle failed tasks:
status = nimble.get_task_status(task_id)

if status['state'] == 'failed':
    error = status.get('error', 'Unknown error')
    print(f"Task failed: {error}")
    # Implement retry logic or alert
elif status['state'] == 'completed':
    results = nimble.get_task_results(task_id)
    # Process results

Batch size limits

Don’t submit thousands of tasks simultaneously:
# ✅ Good: Submit in batches
from itertools import islice

def batch(iterable, size):
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk

urls = [...]  # Large list of URLs
for url_batch in batch(urls, 100):
    for url in url_batch:
        nimble.extract.async_({"url": url})
    time.sleep(10)  # Pause between batches

# ❌ Bad: Submit all at once
for url in urls:  # 10,000+ URLs
    nimble.extract.async_({"url": url})

Result retention

Results are typically retained for 24-48 hours. If you need longer retention:
  • Use cloud storage (storage_url) to persist results indefinitely
  • Download results promptly after completion
  • Implement your own archival system (see the sketch below)
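
For example, a minimal archival step that writes each result to local disk as soon as its task completes:
import json
from pathlib import Path

def archive_results(nimble, task_id, out_dir="archive"):
    """Download results and persist them before the retention window closes (sketch)."""
    results = nimble.get_task_results(task_id)
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    out_file = path / f"{task_id}.json"
    out_file.write_text(json.dumps(results))
    return out_file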

Limitations

Task expiration

  • Pending tasks: Expire after 24 hours if not started
  • Completed results: Available for 24-48 hours (unless using cloud storage)
  • Failed tasks: Retry data available for 24 hours

Concurrent tasks

Concurrent task limits vary by plan:
Plan         Max Concurrent Tasks
Starter      10
Growth       50
Pro          100
Enterprise   Custom
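
To stay under your plan's ceiling, cap the number of in-flight tasks; a sketch (assumes the async_ method name from the examples above):
import time

MAX_CONCURRENT = 10  # match your plan's limit

def submit_bounded(nimble, urls):
    """Keep at most MAX_CONCURRENT tasks pending/running at once (sketch)."""
    urls, in_flight, done = list(urls), [], []
    while urls or in_flight:
        # Move finished tasks out of the in-flight set
        still_running = []
        for task_id in in_flight:
            state = nimble.get_task_status(task_id)["state"]
            (done if state in ("completed", "failed") else still_running).append(task_id)
        in_flight = still_running
        # Top up to the concurrency ceiling
        while urls and len(in_flight) < MAX_CONCURRENT:
            response = nimble.extract.async_({"url": urls.pop(0)})
            in_flight.append(response["task"]["id"])
        time.sleep(5)  # poll interval between sweeps
    return done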

Request limits

All extraction parameters have the same limits as synchronous requests:
  • Maximum timeout: 180000ms (3 minutes)
  • Maximum URL length: 2048 characters
  • Maximum cookies: 100 per request
  • Maximum headers: 50 per request