Overview
Managed Web Search Agents are turn‑key data‑extraction agents operated by Nimble. Unlike generic crawlers, Nimble’s Web Search Agents render websites, follow links and interpret dynamic pages – even those personalized or localized – to produce clean, structured data in real time. These agents act as a complete ETL layer: browsing, parsing and streaming high‑quality data directly into your analytics or AI stack.

Managed Agents are fully maintained by Nimble: infrastructure, scaling and monitoring are handled for you. The service provides real‑time data feeds backed by an SLA, schema and logic updates, and 24×7 monitoring. Because the agents are customized for you, they deliver deeper, more reliable web intelligence than one‑size‑fits‑all scrapers.

What’s included:
- Fully managed infrastructure and execution: Nimble maintains the agent fleet, proxy management and headless browsers, ensuring reliable operation and scalability.
- Real‑time, structured data: Agents interpret pages rather than simply downloading HTML: they render JavaScript, follow links and adapt to personalized or localized content.
- 24/7 monitoring and issue resolution: Dedicated operations teams monitor agents around the clock to detect and remediate failures quickly.
- SLA‑backed delivery: Guaranteed process initiation, infrastructure availability and issue response times (see SLA section).
- Schema and logic maintenance: Nimble updates extraction logic and schema definitions as websites change, so your pipelines stay current.
Integration Guide
This section explains how to configure inputs and outputs, provision cross‑account access and schedule your agent.

Input Configuration
S3 path structure
Agents accept input files from your designated Amazon S3 bucket. Organize uploads using a date‑partitioned path, replacing <bucket‑name> and <agent‑name> with your own values.

AWS Role Provisioning
To allow Nimble to read your input bucket and write results back, you need to create cross‑account IAM roles. The AWS Knowledge Center describes two options for granting cross‑account access: IAM policies combined with bucket policies, or IAM roles. The simplest pattern for Agents is to create a read‑only IAM role in your AWS account for inputs and a write‑only IAM role for outputs. Each role’s trust policy should allow the Nimble AWS account (provided during onboarding) to assume the role.

- Create an IAM role for input files. Grant the role s3:GetObject and s3:ListBucket permissions on the input prefix. A minimal policy might look like:
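A sketch of such a read‑only policy, with <bucket‑name> and <input‑prefix> as placeholders for your own values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<bucket-name>/<input-prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>",
      "Condition": { "StringLike": { "s3:prefix": "<input-prefix>/*" } }
    }
  ]
}
```

Scoping s3:ListBucket with the s3:prefix condition keeps the role from listing objects outside the input prefix.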
- Create an IAM role for output files. Grant the role s3:PutObject and s3:ListBucket permissions on the output prefix. For example:
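A sketch of the corresponding write‑only policy, again with placeholder bucket and prefix names:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<bucket-name>/<output-prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>",
      "Condition": { "StringLike": { "s3:prefix": "<output-prefix>/*" } }
    }
  ]
}
```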
- Add Nimble as a trusted entity. In each role’s trust policy, specify Nimble’s AWS account ID and external ID, which you will receive during onboarding. This allows the Nimble service to assume the role without granting broader access.
- Share the role ARNs. Provide the Amazon Resource Names (ARNs) of the roles to your Nimble account team so the service can be configured to use them.
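The trust policy described in the steps above might look like the following sketch, where <nimble-account-id> and <external-id> stand for the values you receive during onboarding:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<nimble-account-id>:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "<external-id>" } }
    }
  ]
}
```

The sts:ExternalId condition is what prevents other Nimble customers from assuming your role (the "confused deputy" problem).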
Supported Input Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | UTF‑8 encoding required; must include a header row. |
| Parquet | .parquet | Preferred for large data sets due to columnar storage. |
| JSON Lines | .jsonl | One JSON object per line. |
Input File Requirements
- Required fields: Your file must include all fields defined in the agent’s input schema. Fields should be named exactly as specified (case‑sensitive) to avoid ingestion errors.
- File size: Maximum 1 GB per file.
- Empty files: Empty files are ignored. Each file must contain at least one valid record.
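The requirements above can be checked before uploading. The sketch below is illustrative, not part of Nimble’s tooling; the validate_csv_input helper and the field names you pass to it are your own, taken from your agent’s input schema:

```python
import csv
import os

MAX_BYTES = 1 * 1024**3  # 1 GB per-file limit


def validate_csv_input(path: str, required_fields: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty (empty files are ignored)")
    if size > MAX_BYTES:
        problems.append("file exceeds the 1 GB limit")
    # UTF-8 encoding is required for CSV inputs
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        # Field names are case-sensitive and must match the input schema exactly
        missing = [name for name in required_fields if name not in header]
        if missing:
            problems.append(f"missing required fields: {missing}")
        if next(reader, None) is None:
            problems.append("no data rows after the header")
    return problems
```

Running this locally before each upload catches schema mismatches that would otherwise surface as ingestion errors.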
Output Configuration
S3 Path Structure
Agents deliver results to your output bucket in an hourly partitioned structure. When you choose cloud delivery, Nimble writes directly to your S3 bucket, so you must grant the output IAM role the s3:PutObject and s3:ListBucket permissions described above. The general path pattern for S3 output is partitioned by date and hour.
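One way such an hourly partitioned prefix could look is sketched below. The Hive-style partition names (year=/month=/day=/hour=) are an assumption for illustration; confirm the exact pattern with your account team:

```python
from datetime import datetime, timezone


def output_prefix(bucket: str, agent: str, ts: datetime) -> str:
    """Build a hypothetical hourly-partitioned S3 prefix for a batch delivered at ts."""
    ts = ts.astimezone(timezone.utc)  # partitions are keyed on UTC time
    return (
        f"s3://{bucket}/{agent}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
    )
```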
Supported Output Formats
The default output format is Parquet. Additional formats may be enabled upon request.

| Format | Extension | Best For |
|---|---|---|
| Parquet | .parquet | Columnar analytics and data warehouses (e.g., Databricks, Snowflake). |
| CSV | .csv | Integration with spreadsheet tools or simple pipelines. |
| JSON Lines | .jsonl | Streaming ingestion and API integrations. |
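For JSON Lines output, a delivered file can be consumed with nothing but the standard library. The read_jsonl helper below is illustrative:

```python
import json


def read_jsonl(path: str) -> list[dict]:
    """Load a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For Parquet output you would typically use a columnar reader such as pyarrow or your warehouse’s native loader instead.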
File Naming Convention
Output file names are built from the following components:
- agent‑name: Identifies which Agent produced the file.
- batch‑id: A unique identifier for the processing batch. A single batch may produce multiple output files.
- sequence: The sequence number of the file within the batch, padded to three digits (001, 002, etc.).
- ext: The file extension (parquet, csv or jsonl).
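Putting the components above together, a file name might be composed as sketched below. The underscore separator is an assumption for illustration; confirm the exact pattern with your account team:

```python
def output_filename(agent_name: str, batch_id: str, sequence: int, ext: str) -> str:
    """Compose a hypothetical output file name; sequence is zero-padded to three digits."""
    return f"{agent_name}_{batch_id}_{sequence:03d}.{ext}"
```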
Scheduling & cadence
Agents run on a schedule that you configure during onboarding. Schedules are defined using UTC time. Available cadences include:
- Hourly: The agent executes once every hour.
- Multiple times per day: For example, every 4 hours or at specific times.
- Weekly: Choose specific days and times to run.
- Custom intervals: Tailored schedules such as every 36 hours or only on business days.
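Because schedules are defined in UTC, it can help to compute the next run in your own monitoring. The next_run helper below is an illustrative sketch for the "every N hours" cadence, assuming intervals are aligned to midnight UTC:

```python
from datetime import datetime, timedelta, timezone


def next_run(now: datetime, interval_hours: int) -> datetime:
    """Next run time after `now`, on an every-N-hours cadence aligned to midnight UTC."""
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    elapsed = now - midnight
    # Number of whole intervals already elapsed today, plus one for the next run
    intervals = int(elapsed / timedelta(hours=interval_hours)) + 1
    return midnight + timedelta(hours=interval_hours) * intervals
```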
Service level agreement (SLA)
Data retention & recovery
Agent output files remain available in your delivery bucket for a defined retention period based on your subscription tier. Contact your account team for plan details. If a file is lost or corrupted within the retention window, Nimble can re‑deliver it upon request.

Coverage commitments
The Managed Agent SLA covers the following areas:
- Process initiation: Agents start processing at the scheduled time.
- Infrastructure availability: Nimble’s underlying infrastructure – including proxies, headless browsers and scheduler – will be available and functioning.
- Issue response time: Operations teams will acknowledge and begin investigating issues within a defined timeframe.
The SLA does not cover:
- Target website availability or blocking: If a target site is unavailable, blocks traffic or changes structure outside of Nimble’s control, data completeness may be affected.
- Input data quality problems: Missing or invalid fields, malformed files or unsupported formats may cause job failures.
- Customer‑side S3 access issues: Misconfigured IAM roles or network restrictions that prevent Nimble from reading or writing to your bucket.
Change Management & Support
Schema & logic changes
Websites evolve over time. Nimble continuously monitors target sites and updates the agent’s extraction logic and schemas as needed. If you need to modify the input schema, add new output fields or change the target URLs, follow this change management process:
- Submit a change request: Contact your account team or open a support ticket with a description of the change.
- Feasibility review: Nimble evaluates the request, including complexity and impact. Expect an initial response within 1–2 business days.
- Testing in staging: Changes are deployed in a staging environment and validated using sample data.
- Scheduled deployment: Once testing is complete, changes are rolled out to production with at least 48 hours notice.
Supported change types include:
- Input schema modifications: Adding or removing fields, adjusting data types.
- Output field additions: Including new attributes or computed metrics in the output.
- Extraction logic updates: Updating selectors or parsing rules to accommodate website changes.
- Target URL changes: Switching to different pages or domains.
Support Channels
| Channel | Response Time | Use For |
|---|---|---|
|  | 24 hours | General inquiries, change requests, non‑urgent issues. |
| Slack (if configured) | 4 hours | Urgent issues, delivery questions, troubleshooting. |

