Skip to main content
This integration is ideal for data engineers who need live web data inside their Databricks pipelines — without leaving SQL or building a custom API integration. Nimble ships as a set of SQL-native table functions in Unity Catalog. SELECT … FROM nimble_search(…) and SELECT … FROM nimble_extract(…) run live web search and page extraction directly inside a query, and results compose with CREATE TABLE … AS to land in a governed Delta table — access-controlled, lineage-tracked, time-travelable. The same shape exposes every Web Search Agent for structured, per-row enrichment.
Live web search and page extraction, straight from SQL
-- Web search → rows of (title, description, url, content)
SELECT title, url
FROM   nimble_integration.tools.nimble_search('AI agent frameworks news', 5, 'news');

-- Enrich a table of URLs and persist as a governed Delta table
CREATE TABLE my_catalog.web.page_text AS
SELECT u.id, x.url, x.content
FROM   my_catalog.web.urls u,
       LATERAL nimble_integration.tools.nimble_extract(u.url) x;
Every capability is a table function — call it in the FROM clause. No stored procedures, no Python middle-tier, no http_request() + from_json plumbing. Notebooks, dashboards, dbt models, Databricks Workflows, and Databricks Genie all consume the output transparently.

What you get

  • Search and extract as first-class SQL. nimble_search(query, …) returns rows of title, description, url, content; nimble_extract(url, …) returns a row of url, format, content (markdown / html / links). Both hit the live web through Nimble at query time.
  • Results land as governed Delta tables. Wrap any call in CREATE TABLE … AS and the output is a managed Unity Catalog table — Delta time travel, column/table lineage, and GRANT-based access control come for free.
  • Every Web Search Agent, one function. nimble_agent_run(agent, params_json) runs any agent (google_maps_search, amazon_serp, linkedin_company_details, …) and returns the parsed payload as a navigable VARIANT. nimble_agent_list() enumerates the catalog.
  • Per-row error isolation. Each tool yields zero rows on failure instead of raising, so a batch CREATE TABLE … AS over thousands of inputs is never aborted by one bad row.
  • Genie-native. Databricks Genie registers table functions as tools, so each function is directly registrable as a Genie agent tool — the function and column COMMENTs are the spec the LLM reads.
  • Stays inside your Databricks account. Only authorized outbound calls hit Nimble. The API key lives in a Databricks secret scope and is injected server-side — it never appears in a function signature or at a call site.

Prerequisites

  • A Databricks workspace with a serverless SQL warehouse.
  • Privilege to create a catalog/schema (or an existing schema you own) plus CREATE FUNCTION on it.
  • The Databricks CLI v0.205+, authenticated (databricks auth login).
  • A Nimble API key (sign up free).
Enable outbound networking for the warehouse — or every tool returns zero rows. A serverless SQL warehouse blocks Python UDF/UDTF egress by default. Two one-time steps fix it:
  1. Workspace Settings → Previews → enable “Enable networking for isolated workloads in Serverless SQL Warehouses” (Public Preview).
  2. Cold-restart the warehouse — fully Stop it, then Start it. A plain “restart” is not enough; the new network config is only picked up on a cold start.
Symptom of skipping this: a tool returns zero rows and the underlying request fails with Connection refused (Errno 111) even though DNS resolves. The account-level serverless egress network policy is a separate control and is usually already “allow all” — it is not this setting.Workspaces that can’t enable the preview can use the http_request() fallback instead.

Install in Databricks

1

Store the Nimble API key in a secret scope

The tools read the key from secret('nimble','api_key'), so no token ever appears in a function body or at a call site. Create the scope once per workspace.
databricks secrets create-scope nimble
databricks secrets put-secret  nimble api_key   # paste the raw token, then Ctrl-D
databricks secrets put-acl     nimble users READ
READ lets every workspace user call secret('nimble','api_key') from SQL — values come back redacted in any display or log, so the plaintext can’t be extracted. Replace users with a group name to scope access more tightly.
2

Create the catalog and schemas

nimble_search, nimble_extract, and the agent functions all live in nimble_integration.tools. The DDL is re-runnable (CREATE … IF NOT EXISTS).
CREATE CATALOG IF NOT EXISTS nimble_integration
  COMMENT 'Nimble web-data integration: UC table functions wrapping Nimble APIs and agents.';

CREATE SCHEMA IF NOT EXISTS nimble_integration.tools
  COMMENT 'Table functions wrapping Nimble APIs / agents. Callable from Genie, agents, dashboards.';

CREATE SCHEMA IF NOT EXISTS nimble_integration.recipes
  COMMENT 'Sample inputs and end-to-end recipe outputs (Delta tables) built on the tools schema.';
If your workspace uses Unity Catalog Default Storage, the plain CREATE CATALOG may return “Metastore storage root URL does not exist.” Create the catalog from Catalog Explorer (Create Catalog → Default Storage), or use the explicit CREATE CATALOG nimble_integration MANAGED LOCATION '<uri>' form pointed at one of your UC external locations.

cookbook/databricks/01_setup.sql

Catalog + schema scaffolding, with the Default Storage fallbacks documented inline
3

Deploy the tools

Each tool is a Python UDTF (_nimble_xxx) that makes the HTTP call, behind a thin SQL RETURNS TABLE wrapper (nimble_xxx) that supplies defaults and injects the API key. The cookbook ships helpers/deploy_sql.py to post each multi-statement file to the warehouse (it treats the $$ … $$ UDTF bodies as opaque).
WH=<your-serverless-warehouse-id>   # databricks warehouses list

python3 databricks/helpers/deploy_sql.py --file databricks/01_setup.sql --warehouse "$WH"
for f in databricks/tools/*.sql; do
    python3 databricks/helpers/deploy_sql.py --file "$f" --warehouse "$WH"
done
nimble_search defaults to search_depth = lite (fast / deep are premium and not enabled on every account). The wrapper is what you call and what Genie registers — there is no scalar twin.

cookbook/databricks/tools/

nimble_search.sql, nimble_extract.sql, nimble_agent_list.sql, nimble_agent_run.sql — full UDTF + wrapper for each, with COMMENTs and smoke tests
4

Verify with a smoke test

Both should return a populated result within ~30 seconds.
-- Web search → expect > 0 rows
SELECT count(*) AS n
FROM   nimble_integration.tools.nimble_search('AI agents news', 5);

-- Page extraction → expect a large character count
SELECT length(content) AS md_len
FROM   nimble_integration.tools.nimble_extract('https://www.nimbleway.com');
If a tool returns zero rows, re-check the networking Preview + cold restart from Prerequisites.

The tools

FunctionWhat it does
nimble_search(query, max_results, focus, search_depth, …)Live web search → rows of title, description, url, content.
nimble_extract(url, render, format, …)Fetch + parse one URL → a row of url, format, content (markdown / html / links).
nimble_agent_list(privacy, managed_by, …)The Nimble agent catalog → one row per agent.
nimble_agent_run(agent, params_json, localization)Run any agent → one row: response envelope + parsing (VARIANT).
Two functions per tool. Each capability is a Python UDTF (_nimble_xxx, does the HTTP call and yields rows) behind a thin SQL RETURNS TABLE wrapper (nimble_xxx). The wrapper does three things the UDTF can’t: supply DEFAULT parameter values (UC Python UDTFs reject them), inject secret('nimble','api_key') once (a UDTF can’t read secret() itself), and — for nimble_agent_runparse_json() the payload into a navigable VARIANT (a Python UDF/UDTF can’t return VARIANT). You only ever call the public wrapper.

Common workflows

Land web data as a governed Delta table

The payoff of SQL-native web data: results compose with CREATE TABLE … AS and inherit Unity Catalog governance — access control, lineage, and time travel. Enrich a table of company homepages into a managed Delta table in one statement:
CREATE TABLE my_catalog.web.company_pages AS
SELECT c.company_id,
       c.homepage,
       x.content            AS homepage_markdown,
       current_timestamp()  AS fetched_at
FROM   my_catalog.crm.companies c,
       LATERAL nimble_integration.tools.nimble_extract(c.homepage) x;
Because nimble_extract yields zero rows on a failed fetch (instead of raising), the batch over every company is never aborted by one bad URL. The result is a real Unity Catalog Delta table: DESCRIBE HISTORY it for time travel, GRANT SELECT it to a group, and trace its column lineage back through nimble_extract to companies.

Local business discovery at scale

Build a fresh, location-aware view of every business that matters for a category in a market: feed a location_queries table of research terms and get back the ranked Google Maps landscape — one row per business per query. Useful for market sizing, account-universe building, competitive territory mapping, and POI databases. google_maps_search returns ~20 results per query under parsing:entities:SearchResult. nimble_agent_run is a table function whose parsing column is a VARIANT, so navigate it with : paths and explode the results array with LATERAL variant_explode — one row per business — then persist with CREATE TABLE … AS.
CREATE OR REPLACE TABLE nimble_integration.recipes.local_businesses AS
SELECT
    q.query,
    q.category,
    biz:position::int                          AS position,
    biz:title::string                          AS name,
    biz:business_category[0]::string           AS business_category,
    biz:street_address::string                 AS street_address,
    biz:city::string                           AS city,
    biz:review_summary:overall_rating::double  AS rating,
    coalesce(biz:review_summary:review_count::int,
             biz:number_of_reviews::int)       AS review_count,
    biz:price_level::string                    AS price_level,
    biz:place_information:website_url::string  AS website,
    biz:business_status::string                AS business_status,
    biz:sponsored::boolean                     AS sponsored,
    current_timestamp()                        AS enriched_at
FROM nimble_integration.recipes.location_queries q,
     LATERAL nimble_integration.tools.nimble_agent_run(
         'google_maps_search',
         to_json(named_struct('query', q.query))
     ) a,
     LATERAL variant_explode(a.parsing:entities:SearchResult) AS e(pos, key, biz)
WHERE a.status = 'success';
Input: location_queries
querycategory
coffee shops in Williamsburg Brooklyncoffee
pizza restaurants in Chicago Looppizza
gyms in Austin Texasgym
Output: local_businesses (one row per business per refresh)
querypositionnamecityratingreview_countprice_level
coffee shops in Williamsburg Brooklyn1DevociónBrooklyn4.52481$$
coffee shops in Williamsburg Brooklyn2Sey CoffeeBrooklyn4.61840$$
pizza restaurants in Chicago Loop1Giordano’sChicago4.410618$$
google_maps_search nests results under parsing:entities:SearchResult — not a flat array. LATERAL variant_explode(a.parsing:entities:SearchResult) is the correct unfold path. Different agents nest results differently; inspect once with SELECT parsing FROM nimble_agent_run(<agent>, …) before pinning the explode target and projection.

cookbook/databricks/recipes/local_business_universe.sql

Full runnable recipe: the location_queries seed, the stitch-and-explode CTAS, sanity-check queries, and the governance proof points (time travel + lineage + GRANT)

Structured per-row enrichment with any agent

When there is no domain-specific need beyond a single identifier, nimble_agent_run enriches each input row with that agent’s typed payload. PDP-style agents (Amazon PDP, LinkedIn Profile, Walmart PDP) return a single object; navigate it inline with : paths and cast leaves with ::type.
SELECT
    p.sku,
    a.parsing:brand::string                    AS brand,
    a.parsing:product_title::string            AS title,
    a.parsing:web_price::decimal(10, 2)        AS price,
    a.parsing:availability::boolean            AS in_stock,
    a.parsing:average_of_reviews::double       AS rating
FROM   my_catalog.crm.products p,
       LATERAL nimble_integration.tools.nimble_agent_run(
           'amazon_pdp',
           to_json(named_struct('asin', p.amazon_asin))
       ) a
WHERE  p.amazon_asin IS NOT NULL
  AND  a.status = 'success';
One input row → one output row, no variant_explode needed. Field names inside parsing differ per agent (product_title vs. product_name, web_price vs. price) — discover the catalog with SELECT * FROM nimble_agent_list() and inspect a single response with SELECT parsing FROM nimble_agent_run(<agent>, …) before pinning a projection.

Schedule recurring enrichment

Each tool runs in a single SQL statement, so a Databricks Workflow can execute the enrichment directly on a schedule — point a SQL task at the CREATE OR REPLACE TABLE … AS statement above and set a cron trigger. No wrapper procedure required. The same statement also drops straight into a dbt model body for orchestrated, incremental refreshes.

Fallback: http_request() without the networking preview

If a workspace can’t enable UDTF egress, the http_request() SQL builtin reaches the Nimble API through a different, always-on egress path — at the cost of from_json parsing in SQL and no per-row isolation. It needs a one-time UC HTTP CONNECTION (requires the CREATE CONNECTION privilege):
CREATE CONNECTION IF NOT EXISTS nimble_api TYPE HTTP
OPTIONS (
    host         'https://sdk.nimbleway.com',
    port         '443',
    base_path    '/',
    bearer_token secret('nimble', 'api_key')
);
A fallback query then calls http_request(conn => 'nimble_api', method => 'POST', path => '/v1/search', headers => map('Content-Type','application/json'), json => to_json(named_struct(…))), gates on response.status_code, and parses response.text with from_json. See ADDING_A_TOOL.md in the cookbook for the full pattern.

Roll out across your organization

  • Tune for your Nimble rate-limit tier. Each tool call is one upstream request — one per input row. For high-cardinality inputs (thousands of rows), watch for 429s; because failures yield zero rows, a single retry pass over the rows that produced no output is the typical recovery. Nimble’s rate-limits page lists per-tier ceilings.
  • Right-size the warehouse. A small serverless warehouse handles daily refreshes over a few hundred input rows. For larger enrichments (10K+ rows), step up the size only for the duration of the scheduled job.
  • Surface failures separately. Because tools yield zero rows on failure, an input row with no matching output row is a failed call — left-join the input table against the result to find the gaps and audit them, rather than grepping job logs.
  • Don’t ship guesses. Field names inside parsing differ per agent. Inspect the shape once per agent (SELECT parsing FROM nimble_agent_run(<agent>, …)) before pinning a projection into a dbt model or production view.

Use it from Databricks Genie

Genie registers table functions as tools, so each public function is directly registrable — point a Genie space at nimble_integration.tools.nimble_search (and the others). The function COMMENT and per-column COMMENTs are what the LLM reads to decide when and how to call each tool. The cookbook ships helpers/create_genie_space.py to create a space wired to all four functions programmatically.

Resources

Nimbleway/cookbook (databricks/)

Every SQL file on this page, the one-time prereqs, the local-business recipe, and the deploy helpers

Databricks MCP Server

The agent-side surface: Nimble MCP Server from the Databricks Marketplace, wired into Genie and AI Playground

Nimble Web Search Agents

Overview of WSAs — what they are, when to use them, how they parse

Nimble Agent Gallery

Browse the catalog of pre-built agents and their per-agent parsing shapes

Databricks UC Python UDFs/UDTFs

How Unity Catalog tabular UDFs work — eval() handler, RETURNS TABLE, and network egress

Databricks Genie

Register these table functions as tools in a Genie space for natural-language web data