## When to use async

Use async extract for:

- **Batch processing**: Extract data from hundreds or thousands of URLs
- **Background jobs**: Integrate extraction into scheduled or queued workflows
- **Long-running operations**: Handle complex browser actions or slow-loading sites
- **Webhook integration**: Get notified when extraction completes
- **Cloud storage**: Save results directly to S3 or Google Cloud Storage
Async requests are ideal for high-volume extraction where you don’t need immediate results. For single-page extractions where you need results right away, use the synchronous Extract API.
## How it works

### 1. Submit extraction request

Send a POST request to `/v1/extract/async` with your extraction parameters. The API returns immediately with a task ID.

Optionally include:

- `callback_url` to receive a webhook notification when the task completes
- `storage_url` and `storage_type` (`s3`/`gs`) to save results directly to cloud storage

### 2. Track task status

Use the task ID to check progress at `/v1/tasks/{task_id}`. The task transitions through states: `pending` → `running` → `completed` or `failed`.
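The two steps above can be sketched in Python. This is a minimal illustration, not the official client: the base URL, the bearer-token auth header, and the `task_id` response field are assumptions — substitute your account's actual host and credentials.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"   # placeholder: use your actual API host
API_TOKEN = "YOUR_API_TOKEN"           # placeholder credential

def build_request(method, path, body=None):
    """Construct an authenticated JSON request for the given endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode() if body is not None else None,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method=method,
    )

def call(method, path, body=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(method, path, body)) as resp:
        return json.load(resp)

# Step 1: submit — returns immediately with a task ID
# task = call("POST", "/v1/extract/async", {"url": "https://example.com"})
# Step 2: track — poll the task until it completes or fails
# status = call("GET", f"/v1/tasks/{task['task_id']}")
```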
## Parameters

Async extract accepts all the same parameters as the synchronous extract endpoint, plus optional async-specific parameters.

### Core extraction parameters

All parameters from the Extract API are supported:

- `url` (required) - The webpage to extract
- `formats` - Output formats (html, markdown, text, screenshot)
- `render` - Enable JavaScript rendering
- `driver` - Choose extraction engine (vx6, vx8, vx10, etc.)
- `country`, `state`, `city` - Geo-targeting options
- `parse` - Enable parsing with schemas
- `parser` - Define extraction schema
- `browser_actions` - Automate interactions
- `network_capture` - Capture network requests
- And all other extract parameters
### Async-specific parameters

- **`storage_type`** - Storage provider for results. Use `s3` for Amazon S3 or `gs` for Google Cloud Storage. When specified, results are automatically saved to your cloud storage instead of Nimble’s servers. Options: `s3`, `gs`
- **`storage_url`** - Repository URL where results will be saved. Format: `s3://Your.Bucket.Name/path/prefix/`. Results are saved as `{TASK_ID}.json` in the specified location. Example: `s3://my-bucket/nimble-results/`
- **Compression** - Compress results using GZIP before saving to storage. Reduces storage costs and transfer time. When `true`, results are saved as `{TASK_ID}.json.gz`
- **Object name** - Custom name for the stored object instead of the default task ID. Example: `"my-custom-name"` saves as `my-custom-name.json`
- **`callback_url`** - Webhook URL to receive a POST request when the task completes. The API sends task metadata (without result data) to this URL when extraction finishes. Example: `https://your-api.com/webhooks/extract-complete`
## Response format

The async endpoint returns immediately with task information.
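For illustration, a submission response might look like the following; the exact field names are assumptions, so check your actual response:

```json
{
  "task_id": "5f6d1a2b-0c3e-4d7f-9a8b-1c2d3e4f5a6b",
  "status": "pending"
}
```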
### Task states

| State | Description |
|---|---|
| `pending` | Task queued, waiting to start |
| `running` | Extraction in progress |
| `completed` | Extraction finished successfully, results available |
| `failed` | Extraction failed, check error details |
## Example usage
### Basic async extraction
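A minimal submission sketch in Python; the host and auth header are placeholders, and the `task_id` response field is an assumption:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"   # placeholder: your API host
API_TOKEN = "YOUR_API_TOKEN"

payload = {
    "url": "https://example.com/products",
    "formats": ["markdown"],
    "render": True,
}

def submit(payload):
    """POST the payload to the async endpoint and return the task info."""
    req = urllib.request.Request(
        BASE_URL + "/v1/extract/async",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Live call (requires network and a valid token):
# task = submit(payload)
# print(task["task_id"])
```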
### Batch processing
Extract data from multiple URLs.
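Assuming a hypothetical `submit(payload)` helper that POSTs to `/v1/extract/async` and returns the task info, a batch can be fanned out like this (URLs are illustrative):

```python
# Build one payload per URL; keep a mapping from URL to task ID so results
# can be matched back to their source pages after the tasks complete.
urls = [f"https://example.com/products?page={n}" for n in range(1, 101)]
payloads = [{"url": u, "formats": ["markdown"]} for u in urls]

# Live run, given a submit(payload) helper (network + valid token required):
# task_ids = {p["url"]: submit(p)["task_id"] for p in payloads}
```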
### Save to cloud storage

Store results directly in Amazon S3.
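For example, using the documented `storage_type` and `storage_url` parameters (bucket name and path are placeholders):

```python
payload = {
    "url": "https://example.com/products",
    "formats": ["markdown"],
    # Save the result to your bucket as {TASK_ID}.json instead of
    # keeping it on Nimble's servers
    "storage_type": "s3",
    "storage_url": "s3://my-bucket/nimble-results/",
}
```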
### Webhook notifications

Get notified when extraction completes.
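For example, using the documented `callback_url` parameter (the endpoint URL is a placeholder for your own service):

```python
payload = {
    "url": "https://example.com/products",
    # The API will POST task metadata (without result data) here when done
    "callback_url": "https://your-api.com/webhooks/extract-complete",
}
# On receiving the webhook, retrieve the full result via the Tasks API.
```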
### Checking task status

Use the Tasks API to check status.
### Retrieving results

Once the task is complete, retrieve results.
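Where the result lives depends on how the task was submitted: inline in the task info, or in your own bucket when `storage_url` was set. A routing sketch — all field names here are assumptions:

```python
def locate_result(task):
    """Return the inline result, or the storage object path when the task
    was submitted with cloud storage parameters."""
    if task["status"] != "completed":
        raise RuntimeError(f"task not finished yet: {task['status']}")
    if task.get("storage_url"):
        # Results were written to your bucket as {TASK_ID}.json
        return task["storage_url"] + task["task_id"] + ".json"
    return task["result"]
```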
## Best practices

### Polling intervals
Don’t poll too frequently; it wastes resources and may trigger rate limits.
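One common pattern is exponential backoff with a cap; the intervals below are illustrative choices, not documented limits, and `get_task` is a hypothetical status helper:

```python
import itertools

def backoff_intervals(start=5.0, factor=2.0, cap=60.0):
    """Yield polling delays that grow geometrically up to a ceiling."""
    delay = start
    while True:
        yield delay
        delay = min(delay * factor, cap)

# Usage with a hypothetical get_task(task_id) helper:
# for delay in backoff_intervals():
#     if get_task(task_id)["status"] in ("completed", "failed"):
#         break
#     time.sleep(delay)

print(list(itertools.islice(backoff_intervals(), 5)))  # [5.0, 10.0, 20.0, 40.0, 60.0]
```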
### Error handling

Always handle failed tasks.
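A sketch of routing finished tasks; the `status` and `error` field names are assumptions:

```python
def check_task(task):
    """Raise on failure so callers can log, alert, or resubmit deliberately."""
    if task["status"] == "failed":
        # Inspect error details before deciding whether to retry
        raise RuntimeError(f"extraction failed: {task.get('error', 'no details')}")
    if task["status"] != "completed":
        raise ValueError(f"task still in progress: {task['status']}")
    return task
```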
### Batch size limits

Don’t submit thousands of tasks simultaneously.
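A throttling sketch: submit in waves no larger than your plan’s concurrent-task limit. The `submit` and `wait_for` helper names are hypothetical:

```python
import itertools

def chunked(items, size):
    """Split an iterable into lists of at most `size` elements."""
    it = iter(items)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

# Live run with hypothetical submit/wait_for helpers, e.g. a limit of 10:
# for wave in chunked(urls, 10):
#     ids = [submit({"url": u})["task_id"] for u in wave]
#     for tid in ids:
#         wait_for(tid)
```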
### Result retention

Results are typically retained for 24-48 hours. If you need longer retention:

- Use cloud storage (`storage_url`) to persist results indefinitely
- Download results promptly after completion
- Implement your own archival system
## Limitations

### Task expiration

- **Pending tasks**: Expire after 24 hours if not started
- **Completed results**: Available for 24-48 hours (unless using cloud storage)
- **Failed tasks**: Retry data available for 24 hours
### Concurrent tasks

Concurrent task limits vary by plan:

| Plan | Max Concurrent Tasks |
|---|---|
| Starter | 10 |
| Growth | 50 |
| Pro | 100 |
| Enterprise | Custom |
### Request limits

All extraction parameters have the same limits as synchronous requests:

- Maximum timeout: 180000ms (3 minutes)
- Maximum URL length: 2048 characters
- Maximum cookies: 100 per request
- Maximum headers: 50 per request

