Batch Processing
Nimble's SERP API can scale up dramatically by using batch requests with up to 1,000 URLs per batch. Below, we outline three real-world use cases before reviewing the full parameter list, response examples, and response codes.
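When a workload exceeds the per-batch limit, it has to be split across several batches before submission. The following is a minimal sketch of that split, assuming the requests are held in a plain Python list; the helper name chunk_requests and the constant MAX_BATCH_SIZE are illustrative and not part of Nimble's API.

```python
from typing import Iterator

MAX_BATCH_SIZE = 1000  # per-batch limit stated above


def chunk_requests(requests: list[dict], size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of `requests`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


# 2,500 hypothetical search terms end up in three batches.
queries = [{"query": f"term {i}"} for i in range(2500)]
batches = list(chunk_requests(queries))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each resulting list can then be used as the requests array of its own batch payload.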
Example one - collecting data for multiple search terms
In this first example, we'll collect data for several unique search terms. To do so, we set the terms we wish to search and collect in the query field of the requests object.
curl -X POST 'https://api.webit.live/api/v1/batch/serp' \
--header 'Authorization: Basic <credential string>' \
--header 'Content-Type: application/json' \
--data-raw '{
"requests": [
{ "query": "Coffee" },
{ "query": "Tea" },
{ "query": "Biscuits" }
],
"search_engine": "google_search",
"storage_type": "s3",
"storage_url": "s3://Your.Repository.Path/",
"callback_url": "https://your.callback.url/path"
}'
Parameters that are placed outside the requests object, such as search_engine, storage_type, storage_url, and callback_url, are automatically applied as defaults to all defined requests.
If a parameter is set both inside and outside the requests object, the value inside the request overrides the one outside.
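The default-and-override rule can be pictured as a dictionary merge. The sketch below only illustrates that rule locally; the merge itself is performed by the API, and the function name effective_requests is hypothetical.

```python
def effective_requests(payload: dict) -> list[dict]:
    """Return the parameters each request ends up with once the top-level
    defaults are applied and per-request values override them."""
    defaults = {key: value for key, value in payload.items() if key != "requests"}
    # Later keys win in a dict merge, so per-request values take precedence.
    return [{**defaults, **request} for request in payload.get("requests", [])]


payload = {
    "requests": [
        {"query": "Coffee"},
        {"query": "Tea", "search_engine": "bing_search"},
    ],
    "search_engine": "google_search",
}
resolved = effective_requests(payload)
print(resolved[0]["search_engine"])  # google_search (inherited default)
print(resolved[1]["search_engine"])  # bing_search (per-request override)
```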
Example two - collecting multiple searches from different countries
In this example, we'll again search for several different terms, but this time, we'll also use a different location for each search. To achieve this, we'll take advantage of the requests object, which allows us to set any parameter inside each request:
curl -X POST 'https://api.webit.live/api/v1/batch/serp' \
--header 'Authorization: Basic <credential string>' \
--header 'Content-Type: application/json' \
--data-raw '{
"requests": [
{ "query": "Coffee", "country": "US", "locale": "en-US" },
{ "query": "Tea", "country": "FR", "locale": "fr" },
{ "query": "Biscuits", "country": "DE", "locale": "de" },
{ "query": "Eggs" }
],
"country": "CA",
"locale": "ca",
"search_engine": "google_search",
"storage_type": "s3",
"storage_url": "s3://Your.Repository.Path/",
"callback_url": "https://your.callback.url/path"
}'
For the above request, each search is performed from the country set in its own request. "Eggs" does not have a country set in its request, and thus falls back to the country defined outside the requests object (CA, Canada). If no default country had been set either, the request would have used a randomly selected country.
Example three - searching for the same phrase with different engines
Any parameter can be defined inside or outside the requests object. We can take advantage of this by setting the parameter that varies (here, the search engine) inside each request, and setting our search term once outside the requests object as a default. For example:
curl -X POST 'https://api.webit.live/api/v1/batch/serp' \
--header 'Authorization: Basic <credential string>' \
--header 'Content-Type: application/json' \
--data-raw '{
"requests": [
{ "search_engine": "google_search" },
{ "search_engine": "bing_search" },
{ "search_engine": "yandex_search" }
],
"query": "Coffee",
"storage_type": "s3",
"storage_url": "s3://Your.Repository.Path/",
"callback_url": "https://your.callback.url/path"
}'
In the above example, three searches are performed for the same phrase, "Coffee", each with a different search engine.
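For readers who prefer to build the payload programmatically, here is a rough Python equivalent of the curl call above, using only the standard library. It is a sketch under assumptions: the helper name build_batch_request is ours, the storage and callback fields are omitted for brevity, and the request is constructed but not sent.

```python
import json
import urllib.request

BATCH_URL = "https://api.webit.live/api/v1/batch/serp"


def build_batch_request(payload: dict, credential: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a batch payload."""
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {credential}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


engines = ["google_search", "bing_search", "yandex_search"]
payload = {
    # One request per engine; the shared query is set once as a default.
    "requests": [{"search_engine": engine} for engine in engines],
    "query": "Coffee",
}
request = build_batch_request(payload, "<credential string>")
```

Passing the resulting object to urllib.request.urlopen would submit the batch.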
Request Options
Batch requests use the same parameters as asynchronous requests, with the exception of the requests object.
requests
Optional
Object array
Allows for defining custom parameters for each request within the batch. Any of the parameters below can be used in an individual request.
query
Required
String
The term or phrase to search for.
search_engine
Required
Enum: google_search | bing_search | yandex_search
The search engine from which to collect results.
tab
Optional (default = null)
Enum: news
When using google_search, setting tab to news will provide Google News results instead of standard search results.
num_results
Optional
Integer
Sets the number of returned search results.
country
Optional (default = all)
String
Country used to access the target URL. Use ISO Alpha-2 country codes, e.g. US, DE, GB.
state
Optional
String
For targeting US states (does not include regions or territories in other countries). Two-letter state code, e.g. NY, IL, etc.
city
Optional
String
For targeting large cities and metro areas around the globe. When targeting major US cities, you must include state as well. Click here for a list of available cities.
locale
Optional (default = en)
String
LCID standard locale used for the URL request. Alternatively, use auto to select the locale automatically based on country targeting.
parse
Optional (default = true)
Boolean
Instructs Nimble whether to structure the results into a JSON format or return the raw HTML.
ads_optimization
Optional (default = false)
Boolean
This flag increases the number of paid ads (sponsored ads) in the results. It works by running the requests in 'incognito' mode.
storage_type
Optional
Enum: s3 | gs
Use s3 for Amazon S3 and gs for Google Cloud Platform. Leave blank to enable Push/Pull delivery.
storage_url
Optional
String
Repository URL: s3://Your.Bucket.Name/your/object/name/prefix/. Output will be saved to TASK_ID.json. Leave blank to enable Push/Pull delivery.
callback_url
Optional
String
A URL to call back once the data is delivered. Nimble APIs will send a POST request to the callback_url with the task details once the task is complete (this “notification” will not include the requested data).
storage_compress
Optional (default = false)
Boolean
When set to true, the response saved to the storage_url will be compressed using GZIP format. This can help reduce storage size and improve data transfer efficiency. If not set or set to false, the response will be saved in its original uncompressed format.
storage_object_name
Optional (default = task_id)
String
Allows setting a custom name for the stored object instead of the default task ID.
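Because storage_compress changes how the stored object is encoded, code that reads results back may need to handle both cases. The sketch below assumes the object has already been downloaded as bytes and that it contains JSON (parse left at its default); load_result is a hypothetical helper name.

```python
import gzip
import json


def load_result(raw: bytes) -> dict:
    """Decode a stored result, transparently handling GZIP-compressed bytes."""
    if raw[:2] == b"\x1f\x8b":  # GZIP magic number
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Stand-in for the bytes of a downloaded result file.
sample = json.dumps({"status": "success"}).encode("utf-8")
print(load_result(sample))                 # plain JSON
print(load_result(gzip.compress(sample)))  # same content, GZIP-compressed
```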
Please add Nimble's system/service user to your GCS or S3 bucket to ensure that data can be delivered successfully.
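Since a batch can contain many requests, it can be worth checking a payload locally before submitting it. The following sketch checks only what the list above states (query and search_engine are required, and the enum values for search_engine and storage_type); it is not Nimble's own validation. The function name validate_batch is hypothetical, as is the choice to treat a payload without requests as a single request built from the top-level parameters.

```python
SEARCH_ENGINES = {"google_search", "bing_search", "yandex_search"}
STORAGE_TYPES = {"s3", "gs"}
MAX_BATCH_SIZE = 1000


def validate_batch(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    defaults = {key: value for key, value in payload.items() if key != "requests"}
    # Assumption: with no `requests` array, the top-level parameters form one request.
    requests = payload.get("requests") or [{}]
    if len(requests) > MAX_BATCH_SIZE:
        problems.append("more than 1,000 requests in one batch")
    for index, request in enumerate(requests):
        merged = {**defaults, **request}  # per-request values override defaults
        if not merged.get("query"):
            problems.append(f"request {index}: missing query")
        engine = merged.get("search_engine")
        if engine not in SEARCH_ENGINES:
            problems.append(f"request {index}: invalid search_engine {engine!r}")
        storage = merged.get("storage_type")
        if storage is not None and storage not in STORAGE_TYPES:
            problems.append(f"request {index}: invalid storage_type {storage!r}")
    return problems


good = {"requests": [{"query": "Coffee"}], "search_engine": "google_search"}
print(validate_batch(good))  # []
```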
Response
Initial Response
Batch requests operate asynchronously, and treat each request as a separate task. The result of each task is stored in a file, and a notification is sent to the provided callback any time an individual task is completed.
{
"batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
"batch_size": 3,
"tasks": [
{
"batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
"id": "2e508d43-8b02-4fc0-96c7-0968ab454a0c",
"state": "pending",
"output_url": "s3://Your.Repository.Path/2e508d43-8b02-4fc0-96c7-0968ab454a0c.json",
"callback_url": "https://your.callback.url/path",
"status_url": "https://api.webit.live/api/v1/tasks/2e508d43-8b02-4fc0-96c7-0968ab454a0c",
"created_at": "2022-07-24T08:09:23.205Z",
"modified_at": "2022-07-24T08:09:23.205Z",
"input": {...},
"status_code": 200
},
{
"batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
"id": "63cc3bd5-01b4-4787-90a2-f382b9960c77",
"state": "pending",
...
},
{
"batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
"id": "4cb39bbf-5580-4c50-8ed4-4a7905e2ec52",
"state": "pending",
...
}
]
}
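Since each task is tracked separately, it can help to index the initial response by task ID so that later callbacks or status checks can be matched back to their tasks. A minimal sketch, assuming the response has been parsed into a dictionary shaped like the sample above; index_tasks is an illustrative name.

```python
def index_tasks(initial_response: dict) -> dict[str, dict]:
    """Map each task id to the fields needed to follow the task up later."""
    return {
        task["id"]: {
            "state": task.get("state"),
            "status_url": task.get("status_url"),
            "output_url": task.get("output_url"),
        }
        for task in initial_response.get("tasks", [])
    }


# Trimmed version of the sample response above.
initial = {
    "batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
    "batch_size": 3,
    "tasks": [
        {"id": "2e508d43-8b02-4fc0-96c7-0968ab454a0c", "state": "pending"},
        {"id": "63cc3bd5-01b4-4787-90a2-f382b9960c77", "state": "pending"},
        {"id": "4cb39bbf-5580-4c50-8ed4-4a7905e2ec52", "state": "pending"},
    ],
}
tasks_by_id = index_tasks(initial)
print(len(tasks_by_id))  # 3
```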
Checking batch progress and status
GET https://api.webit.live/api/v1/batches/<batch_id>/progress
Like asynchronous tasks, the status of a batch is available for 24 hours.
curl -X GET 'https://api.webit.live/api/v1/batches/<batch_id>/progress' \
--header 'Authorization: Basic <credential string>'
Response
The progress of a batch is reported as a fraction between 0 and 1.
{
"status": "success",
"completed": false,
"progress": 0.333333
}
Once a batch is finished, its progress will be reported as “1”.
{
"status": "success",
"completed": true,
"progress": 1
}
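A client will typically poll this endpoint until completed becomes true. The sketch below shows one way to do that with the standard library. The polling interval, the maximum number of polls, and the function names are our own assumptions and not values recommended by Nimble.

```python
import json
import time
import urllib.request
from typing import Callable

PROGRESS_URL = "https://api.webit.live/api/v1/batches/{batch_id}/progress"


def fetch_progress(batch_id: str, credential: str) -> dict:
    """GET the progress document for one batch."""
    request = urllib.request.Request(
        PROGRESS_URL.format(batch_id=batch_id),
        headers={"Authorization": f"Basic {credential}"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_batch(
    fetch: Callable[[], dict],
    interval_seconds: float = 10.0,
    max_polls: int = 360,
) -> dict:
    """Call `fetch` until it reports `completed: true` or `max_polls` is reached."""
    for attempt in range(max_polls):
        progress = fetch()
        if progress.get("completed"):
            return progress
        if attempt < max_polls - 1:
            time.sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"batch not completed after {max_polls} polls")
```

A call such as wait_for_batch(lambda: fetch_progress("<batch_id>", "<credential string>")) would block until the batch finishes. Passing the fetch step in as a callable also makes the loop easy to exercise without network access.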
Checking Batch List
To list your batches, use the following endpoint:
GET https://api.webit.live/api/v1/batches/list
Parameters
limit
Optional (default = 100)
Number | List item limit
cursor
Optional
String | Cursor for pagination.
Example Request:
curl -X GET 'https://api.webit.live/api/v1/batches/list?limit=20' \
--header 'Authorization: Basic <credential string>'
Example Response:
{
"data": [
...
],
"pagination": {
"hasNext": true,
"nextCursor": ...,
"total": 102
}
}
To page through the full list, pass pagination.nextCursor as the cursor parameter of the next request, and repeat until pagination.hasNext is false or pagination.nextCursor is null.
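The pagination loop can be sketched as a generator. This assumes that the nextCursor value is sent back through the cursor query parameter, and it leaves the HTTP call to a caller-supplied fetch_page function; both helper names and the item shapes in the demonstration are hypothetical.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

LIST_URL = "https://api.webit.live/api/v1/batches/list"


def list_url(limit: int = 100, cursor: Optional[str] = None) -> str:
    """Build the list URL for one page."""
    params = {"limit": limit}
    if cursor is not None:
        params["cursor"] = cursor
    return f"{LIST_URL}?{urlencode(params)}"


def iter_batches(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every list item, following cursors until the list is exhausted."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        pagination = page.get("pagination", {})
        cursor = pagination.get("nextCursor")
        if not pagination.get("hasNext") or cursor is None:
            break


# Two canned pages stand in for real API responses.
pages = {
    None: {"data": [{"item": 1}, {"item": 2}],
           "pagination": {"hasNext": True, "nextCursor": "c1", "total": 3}},
    "c1": {"data": [{"item": 3}],
           "pagination": {"hasNext": False, "nextCursor": None, "total": 3}},
}
print(list(iter_batches(lambda cursor: pages[cursor])))
```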
Retrieving Batch Summary
Once a batch has finished, it’s possible to retrieve a summary of the completed tasks by using the following endpoint:
GET https://api.webit.live/api/v1/batches/<batch_id>
For example:
curl -X GET 'https://api.webit.live/api/v1/batches/<batch_id>' \
--header 'Authorization: Basic <credential string>'
The response object lists the status of the overall batch, as well as the individual tasks and their details:
Response
{
"status": "success",
"tasks": [
{
"batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
"id": "2e508d43-8b02-4fc0-96c7-0968ab454a0c",
"state": "success",
"output_url": "s3://Your.Repository.Path/2e508d43-8b02-4fc0-96c7-0968ab454a0c.json",
"callback_url": "https://your.callback.url/path",
"status_url": "https://[base_url]/api/v1/tasks/2e508d43-8b02-4fc0-96c7-0968ab454a0c",
"created_at": "2022-07-24T08:09:23.205Z",
"modified_at": "2022-07-24T08:10:27.244Z",
"input": {...},
"status_code": 200
},
{
"batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
"id": "63cc3bd5-01b4-4787-90a2-f382b9960c77",
"state": "success",
"output_url": "s3://Your.Repository.Path/63cc3bd5-01b4-4787-90a2-f382b9960c77.json",
"callback_url": "https://your.callback.url/path",
"status_url": "https://[base_url]/api/v1/tasks/63cc3bd5-01b4-4787-90a2-f382b9960c77",
"created_at": "2022-07-24T08:09:23.205Z",
"modified_at": "2022-07-24T08:10:27.973Z",
"input": {...},
"status_code": 200
},
{
"batch_id": "7a07a96d-c402-4d98-a17f-4ecb390d11a3",
"id": "4cb39bbf-5580-4c50-8ed4-4a7905e2ec52",
"state": "success",
"output_url": "s3://Your.Repository.Path/4cb39bbf-5580-4c50-8ed4-4a7905e2ec52.json",
"callback_url": "https://your.callback.url/path",
"status_url": "https://[base_url]/api/v1/tasks/4cb39bbf-5580-4c50-8ed4-4a7905e2ec52",
"created_at": "2022-07-24T08:09:23.205Z",
"modified_at": "2022-07-24T08:10:30.292Z",
"input": {...},
"status_code": 200
}
],
"completed": true,
"progress": 1
}
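A summary like the one above is often reduced to a few figures before the outputs are fetched. The sketch below counts tasks per state and collects the output URLs of successful tasks. It assumes the summary has been parsed into a dictionary, and summarize_tasks is an illustrative name.

```python
from collections import Counter


def summarize_tasks(summary: dict) -> dict:
    """Count tasks per state and collect the output URLs of successful ones."""
    tasks = summary.get("tasks", [])
    return {
        "completed": bool(summary.get("completed")),
        "states": dict(Counter(task.get("state") for task in tasks)),
        "outputs": [
            task["output_url"]
            for task in tasks
            if task.get("state") == "success" and task.get("output_url")
        ],
    }


# Hypothetical, shortened summary used only to exercise the function.
example = {
    "status": "success",
    "tasks": [
        {"id": "1", "state": "success", "output_url": "s3://Your.Repository.Path/1.json"},
        {"id": "2", "state": "pending"},
    ],
    "completed": False,
    "progress": 0.5,
}
report = summarize_tasks(example)
print(report["states"])
```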
500 error
{
"status": "error",
"task_id": "<task_id>",
"msg": "can't download the query response - please try again"
}
400 Input Error
{
"status": "failed",
"msg": "<error message>"
}
Response Codes
200
OK
400
The requested resource could not be reached
401
Unauthorized/invalid credential string
500
Internal service error
501
An error was encountered by the proxy service