Async requests
Use async extract when you:
- Background jobs: Integrate extraction into scheduled or queued workflows
- Long-running operations: Handle complex browser actions or slow-loading sites
- Webhook integration: Get notified when extraction completes
- Cloud storage: Save results directly to S3 or Google Cloud Storage
For single-page extractions where you need results immediately, use the synchronous Extract API. For multiple URLs at once, use Batch extraction below.
How it works
Submit extraction request
Send a POST request to /v1/extract/async with your extraction parameters. The API returns immediately with a task ID.

Optionally include:
- callback_url to receive a webhook notification when the task completes
- storage_url and storage_type (s3/gs) to save results directly to cloud storage
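As a sketch, a submission including both optional fields might look like the following Python. The host, auth scheme, and token are placeholders, not confirmed by this page; only the path and parameter names come from the docs.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # assumed host -- substitute your real endpoint
API_TOKEN = "YOUR_TOKEN"              # placeholder credential

# Extraction parameters plus the optional async-specific fields.
payload = {
    "url": "https://example.com/product/123",
    "render": True,
    "callback_url": "https://your-api.com/webhook/complete",
    "storage_type": "s3",
    "storage_url": "s3://my-bucket/nimble-results/",
}

def submit_async_extract(body: dict) -> dict:
    """POST to /v1/extract/async; the API replies immediately with task info."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/extract/async",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# task = submit_async_extract(payload)  # returns without waiting for extraction
```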
Track task status
Use the task ID to check progress at /v1/tasks/{task_id}. The task transitions through states: pending → running → success or failed.

API endpoint
Parameters
Async extract accepts all the same parameters as the synchronous extract endpoint, plus optional async-specific parameters.

Core extraction parameters

All parameters from the Extract API are supported:
- url (required) - The webpage to extract
- formats - Output formats (html, markdown, text, screenshot)
- render - Enable JavaScript rendering
- driver - Choose extraction engine (vx6, vx8, vx10, etc.)
- country, state, city - Geo-targeting options
- parse - Enable parsing with schemas
- parser - Define extraction schema
- browser_actions - Automate interactions
- network_capture - Capture network requests
- And all other extract parameters…
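For illustration, a request body exercising several of these core parameters might look like this. All values, and the shape of the parser schema, are hypothetical; consult the Extract API reference for the exact formats.

```python
payload = {
    "url": "https://example.com/products",  # required
    "formats": ["markdown", "screenshot"],  # output formats (shape assumed)
    "render": True,                         # enable JavaScript rendering
    "driver": "vx8",                        # extraction engine
    "country": "US",                        # geo-targeting
    "parse": True,
    # Hypothetical parser schema -- the real schema format is defined
    # in the Extract API docs, not here.
    "parser": {"title": {"selector": "h1"}},
}
```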
Async-specific parameters
storage_type

Storage provider for results. When specified, results are saved to your cloud storage instead of Nimble’s servers.
Options: s3 (Amazon S3), gs (Google Cloud Storage)

storage_url

Bucket path where results will be saved. Results are stored as {task_id}.json at the specified location.
Format: s3://your-bucket/path/prefix/
Example: s3://my-bucket/nimble-results/

storage_compress

Compress results with GZIP before saving. Reduces storage costs and transfer time. When true, results are saved as {task_id}.json.gz.

storage_object_name

Custom filename for the stored object instead of the default task ID.
Example: "my-custom-name" saves as my-custom-name.json

callback_url

Webhook URL to receive a POST request when the task completes. Nimble sends task metadata (without result data) to this URL when extraction finishes.
Example: https://your-api.com/webhook/complete

Response format
The async endpoint returns immediately with task information.

Task states
| State | Description |
|---|---|
| pending | Task queued, waiting to start |
| running | Extraction in progress |
| success | Extraction finished successfully, results available |
| failed | Extraction failed, check error details |
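As an illustrative sketch of the immediate response (the field names here are assumptions, not confirmed by this page):

```json
{
  "task_id": "abc123",
  "state": "pending",
  "created_at": "2024-01-01T12:00:00Z"
}
```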
Example usage
Basic async extraction
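A minimal end-to-end sketch: submit the task, then poll /v1/tasks/{task_id} until it reaches a terminal state. The host, auth scheme, and response field names (task_id, state) are assumptions; the paths and state names come from this page.

```python
import json
import time
import urllib.request

API_BASE = "https://api.example.com"  # assumed host
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_TOKEN",  # assumed auth scheme
}

def post_json(path: str, body: dict) -> dict:
    req = urllib.request.Request(
        API_BASE + path, data=json.dumps(body).encode(), headers=HEADERS
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def get_json(path: str) -> dict:
    req = urllib.request.Request(API_BASE + path, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_async(url: str, poll_seconds: int = 5) -> dict:
    """Submit an async extraction and poll until success or failed."""
    task = post_json("/v1/extract/async", {"url": url, "render": True})
    task_id = task["task_id"]  # assumed field name
    while True:
        status = get_json(f"/v1/tasks/{task_id}")
        if status["state"] in ("success", "failed"):  # terminal states
            return status
        time.sleep(poll_seconds)  # still pending or running

# result = extract_async("https://example.com")
```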
Save to cloud storage
Store results directly in Amazon S3:

Webhook notifications
Get notified when extraction completes:

Checking task status
Use the Tasks API to check status:

Status Response:
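Illustrative only (field names assumed): a status response for an in-progress task might resemble:

```json
{
  "task_id": "abc123",
  "state": "running",
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T12:00:07Z"
}
```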
Retrieving results
Once the task is complete, retrieve results:

Results Response
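A sketch of what a completed task's result might look like; the field names and nesting are assumptions, not taken from this page:

```json
{
  "task_id": "abc123",
  "state": "success",
  "result": {
    "url": "https://example.com",
    "markdown": "# Example Domain…"
  }
}
```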
Batch extraction
Submit multiple URLs in a single request. Each URL is processed as an independent task — results are delivered to cloud storage as each one completes. Use batch when you:
- Extract many URLs at once: Submit up to 1,000 URLs in a single API call instead of looping
- Apply shared settings: Set common params once (render, country, driver) across all URLs
- Scale efficiently: Process large URL lists without managing individual async requests
API endpoint
Parameters
params — Required
Array of per-URL extraction requests. Supports up to 1,000 items per batch. Each item accepts all Core extraction parameters — url is the only required field per item.

shared_params

Common parameters applied once and merged into every item in the batch. Per-item values override anything set in shared_params. This lets you mix and match — for example, run most URLs with a US proxy but override country on specific items.
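The override rule can be pictured as a simple dict merge. This Python sketch is illustrative of the semantics, not the API's implementation: a per-item country wins over the shared one, while unset fields inherit.

```python
shared_params = {"render": True, "country": "US", "driver": "vx8"}

items = [
    {"url": "https://example.com/a"},                   # inherits country US
    {"url": "https://example.com/b", "country": "GB"},  # overrides to GB
]

# Effective parameters per item: shared values first, item values win.
effective = [{**shared_params, **item} for item in items]

print(effective[0]["country"])  # US
print(effective[1]["country"])  # GB
```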
Example
Extract multiple pages using shared settings, with per-item country overrides:

Batch response
The endpoint returns immediately with a batch_id and the list of created tasks:
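As an illustrative sketch (field names are assumptions, not confirmed by this page):

```json
{
  "batch_id": "batch_789",
  "tasks": [
    {"task_id": "abc123", "url": "https://example.com/a", "state": "pending"},
    {"task_id": "def456", "url": "https://example.com/b", "state": "pending"}
  ]
}
```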
Each task in the batch is independent. Tasks complete at different times based on page complexity. Use the task IDs to poll individual results, or use cloud storage to receive each result as it completes.
Data Retention & Expiration
Result retention
Results are typically retained for 7 days. If you need longer retention:
- Use cloud storage (storage_url) to persist results indefinitely
- Download results promptly after completion
- Implement your own archival system
Task expiration
- Pending tasks: Expire after 24 hours if not started
- Completed results: Available for 24-48 hours (unless using cloud storage)
- Failed tasks: Retry data available for 24 hours