Overview
Managed Web Search Agents are turn‑key data‑extraction agents operated by Nimble. Unlike generic crawlers, Nimble’s Web Search Agents render websites, follow links and interpret dynamic pages – even those personalized or localized – to produce clean, structured data in real time. These agents act as a complete ETL layer: browsing, parsing and streaming high‑quality data directly into your analytics or AI stack.

Managed Agents are fully maintained by Nimble: infrastructure, scaling and monitoring are handled for you. The service provides real‑time data feeds backed by an SLA, schema and logic updates, and 24×7 monitoring. Because the agents are customized for you, they deliver deeper, more reliable web intelligence than one‑size‑fits‑all scrapers.

What’s included:
- Fully managed infrastructure and execution: Nimble maintains the agent fleet, proxy management and headless browsers, ensuring reliable operation and scalability.
- Real‑time, structured data: Agents interpret pages rather than simply downloading HTML: they render JavaScript, follow links and adapt to personalized or localized content.
- 24/7 monitoring and issue resolution: Dedicated operations teams monitor agents around the clock to detect and remediate failures quickly.
- SLA‑backed delivery: Guaranteed process initiation, infrastructure availability and issue response times (see SLA section).
- Schema and logic maintenance: Nimble updates extraction logic and schema definitions as websites change, so your pipelines stay current.
Integration Guide
This section explains how to configure inputs and outputs, provision cross‑account access and schedule your agent.

Input Configuration
S3 path structure
Agents accept input files from your designated Amazon S3 bucket. Organize uploads using a date‑partitioned path, replacing <bucket‑name> and <agent‑name> with your own values.

AWS Role Provisioning
To allow Nimble to read your input bucket and write results back, you need to create cross‑account IAM roles. The AWS Knowledge Center describes two options for granting cross‑account access: IAM policies combined with bucket policies, or IAM roles. The simplest pattern for Agents is to create a read‑only IAM role in your AWS account for inputs and a write‑only IAM role for outputs. Each role’s trust policy should allow the Nimble AWS account (provided during onboarding) to assume the role.

- Create an IAM role for input files. Grant the role s3:GetObject and s3:ListBucket permissions on the input prefix. A minimal policy might look like:
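A sketch of such a read‑only policy, with <bucket‑name> and <input‑prefix> as placeholders for your own values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<bucket-name>/<input-prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>",
      "Condition": { "StringLike": { "s3:prefix": "<input-prefix>/*" } }
    }
  ]
}
```

Scoping s3:ListBucket with the s3:prefix condition keeps the role from listing objects outside the input prefix.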
- Create an IAM role for output files. Grant the role s3:PutObject and s3:ListBucket permissions on the output prefix. For example:
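A sketch of the corresponding write‑only policy, again with placeholder bucket and prefix names:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<bucket-name>/<output-prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>",
      "Condition": { "StringLike": { "s3:prefix": "<output-prefix>/*" } }
    }
  ]
}
```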
- Add Nimble as a trusted entity. In each role’s trust policy, specify Nimble’s AWS account ID and external ID, which you will receive during onboarding. This allows the Nimble service to assume the role without granting broader access.
- Share the role ARNs. Provide the Amazon Resource Names (ARNs) of the roles to your Nimble account team so the service can be configured to use them.
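The trust policy described in the steps above might look like the following sketch, where <nimble-account-id> and <external-id> stand for the values you receive during onboarding:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<nimble-account-id>:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "<external-id>" } }
    }
  ]
}
```

The sts:ExternalId condition is what prevents other Nimble customers from assuming your role (the "confused deputy" problem).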
Supported Input Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | UTF‑8 encoding required; must include a header row. |
| Parquet | .parquet | Preferred for large data sets due to columnar storage. |
| JSON Lines | .jsonl | One JSON object per line. |
Input File Requirements
- Required fields: Your file must include all fields defined in the agent’s input schema. Fields should be named exactly as specified (case‑sensitive) to avoid ingestion errors.
- File size: Maximum 1 GB per file.
- Empty files: Empty files are ignored. Each file must contain at least one valid record.
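The requirements above can be checked before uploading. The sketch below is illustrative, not part of Nimble’s tooling; the validate_csv_input helper and the field names you pass to it are your own, taken from your agent’s input schema:

```python
import csv
import os

MAX_BYTES = 1 * 1024**3  # 1 GB per-file limit


def validate_csv_input(path: str, required_fields: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty (empty files are ignored)")
    if size > MAX_BYTES:
        problems.append("file exceeds the 1 GB limit")
    # UTF-8 encoding is required for CSV inputs
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        # Field names are case-sensitive and must match the input schema exactly
        missing = [name for name in required_fields if name not in header]
        if missing:
            problems.append(f"missing required fields: {missing}")
        if next(reader, None) is None:
            problems.append("no data rows after the header")
    return problems
```

Running this locally before each upload catches schema mismatches that would otherwise surface as ingestion errors.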
Output Configuration
S3 Path Structure
Agents deliver results to your output bucket in an hourly partitioned structure. When you choose cloud delivery, Nimble writes directly to your S3 bucket, so you must grant the output IAM role the s3:PutObject and s3:ListBucket permissions described above. The general path pattern for S3 output is partitioned by date and hour.
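One way such an hourly partitioned prefix could look is sketched below. The Hive-style partition names (year=/month=/day=/hour=) are an assumption for illustration; confirm the exact pattern with your account team:

```python
from datetime import datetime, timezone


def output_prefix(bucket: str, agent: str, ts: datetime) -> str:
    """Build a hypothetical hourly-partitioned S3 prefix for a batch delivered at ts."""
    ts = ts.astimezone(timezone.utc)  # partitions are keyed on UTC time
    return (
        f"s3://{bucket}/{agent}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
    )
```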
Supported Output Formats
The default output format is Parquet. Additional formats may be enabled upon request.

| Format | Extension | Best For |
|---|---|---|
| Parquet | .parquet | Columnar analytics and data warehouses (e.g., Databricks, Snowflake). |
| CSV | .csv | Integration with spreadsheet tools or simple pipelines. |
| JSON Lines | .jsonl | Streaming ingestion and API integrations. |
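For JSON Lines output, a delivered file can be consumed with nothing but the standard library. The read_jsonl helper below is illustrative:

```python
import json


def read_jsonl(path: str) -> list[dict]:
    """Load a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For Parquet output you would typically use a columnar reader such as pyarrow or your warehouse’s native loader instead.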
File Naming Convention
Output file names are built from the following components:
- agent‑name: Identifies which Agent produced the file.
- batch‑id: A unique identifier for the processing batch. A single batch may produce multiple output files.
- sequence: The sequence number of the file within the batch, padded to three digits (001, 002, etc.).
- ext: The file extension (parquet, csv or jsonl).
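Putting the components above together, a file name might be composed as sketched below. The underscore separator is an assumption for illustration; confirm the exact pattern with your account team:

```python
def output_filename(agent_name: str, batch_id: str, sequence: int, ext: str) -> str:
    """Compose a hypothetical output file name; sequence is zero-padded to three digits."""
    return f"{agent_name}_{batch_id}_{sequence:03d}.{ext}"
```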
Scheduling & cadence
Agents run on a schedule that you configure during onboarding. Schedules are defined using UTC time. Available cadences include:
- Hourly: The agent executes once every hour.
- Multiple times per day: For example, every 4 hours or at specific times.
- Weekly: Choose specific days and times to run.
- Custom intervals: Tailored schedules such as every 36 hours or only on business days.
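Because schedules are defined in UTC, it can help to compute the next run in your own monitoring. The next_run helper below is an illustrative sketch for the "every N hours" cadence, assuming intervals are aligned to midnight UTC:

```python
from datetime import datetime, timedelta, timezone


def next_run(now: datetime, interval_hours: int) -> datetime:
    """Next run time after `now`, on an every-N-hours cadence aligned to midnight UTC."""
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    elapsed = now - midnight
    # Number of whole intervals already elapsed today, plus one for the next run
    intervals = int(elapsed / timedelta(hours=interval_hours)) + 1
    return midnight + timedelta(hours=interval_hours) * intervals
```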
Service level agreement (SLA)
Data retention & recovery
Agent output files remain available in your delivery bucket for a defined retention period based on your subscription tier. Contact your account team for plan details. If a file is lost or corrupted within the retention window, Nimble can re‑deliver it upon request.

Coverage commitments
The Managed Agent SLA covers the following areas:
- Process initiation: Agents start processing at the scheduled time.
- Infrastructure availability: Nimble’s underlying infrastructure – including proxies, headless browsers and scheduler – will be available and functioning.
- Issue response time: Operations teams will acknowledge and begin investigating issues within a defined timeframe.
The SLA does not cover:
- Target website availability or blocking: If a target site is unavailable, blocks traffic or changes structure outside of Nimble’s control, data completeness may be affected.
- Input data quality problems: Missing or invalid fields, malformed files or unsupported formats may cause job failures.
- Customer‑side S3 access issues: Misconfigured IAM roles or network restrictions that prevent Nimble from reading or writing to your bucket.
Change Management & Support
Schema & logic changes
Websites evolve over time. Nimble continuously monitors target sites and updates the agent’s extraction logic and schemas as needed. If you need to modify the input schema, add new output fields or change the target URLs, follow this change management process:
- Submit a change request: Contact your account team or open a support ticket with a description of the change.
- Feasibility review: Nimble evaluates the request, including complexity and impact. Expect an initial response within 1–2 business days.
- Testing in staging: Changes are deployed in a staging environment and validated using sample data.
- Scheduled deployment: Once testing is complete, changes are rolled out to production with at least 48 hours notice.
Supported change types include:
- Input schema modifications: Adding or removing fields, adjusting data types.
- Output field additions: Including new attributes or computed metrics in the output.
- Extraction logic updates: Updating selectors or parsing rules to accommodate website changes.
- Target URL changes: Switching to different pages or domains.
Support Channels
| Channel | Response Time | Use For |
|---|---|---|
|  | 24 hours | General inquiries, change requests, non‑urgent issues. |
| Slack (if configured) | 4 hours | Urgent issues, delivery questions, troubleshooting. |

