SELECT … FROM nimble_search(…) and SELECT … FROM nimble_extract(…) run live web search and page extraction directly inside a query, and results compose with CREATE TABLE … AS to land in a governed Delta table — access-controlled, lineage-tracked, time-travelable. The same shape exposes every Web Search Agent for structured, per-row enrichment.
Live web search and page extraction, straight from SQL
FROM clause. No stored procedures, no Python middle-tier, no http_request() + from_json plumbing. Notebooks, dashboards, dbt models, Databricks Workflows, and Databricks Genie all consume the output transparently.
What you get
- Search and extract as first-class SQL.
nimble_search(query, …)returns rows oftitle, description, url, content;nimble_extract(url, …)returns a row ofurl, format, content(markdown / html / links). Both hit the live web through Nimble at query time. - Results land as governed Delta tables. Wrap any call in
CREATE TABLE … ASand the output is a managed Unity Catalog table — Delta time travel, column/table lineage, andGRANT-based access control come for free. - Every Web Search Agent, one function.
nimble_agent_run(agent, params_json)runs any agent (google_maps_search,amazon_serp,linkedin_company_details, …) and returns the parsed payload as a navigableVARIANT.nimble_agent_list()enumerates the catalog. - Per-row error isolation. Each tool yields zero rows on failure instead of raising, so a batch
CREATE TABLE … ASover thousands of inputs is never aborted by one bad row. - Genie-native. Databricks Genie registers table functions as tools, so each function is directly registrable as a Genie agent tool — the function and column
COMMENTs are the spec the LLM reads. - Stays inside your Databricks account. Only authorized outbound calls hit Nimble. The API key lives in a Databricks secret scope and is injected server-side — it never appears in a function signature or at a call site.
Prerequisites
- A Databricks workspace with a serverless SQL warehouse.
- Privilege to create a catalog/schema (or an existing schema you own) plus
CREATE FUNCTIONon it. - The Databricks CLI v0.205+, authenticated (
databricks auth login). - A Nimble API key (sign up free).
Install in Databricks
Store the Nimble API key in a secret scope
The tools read the key from
secret('nimble','api_key'), so no token ever appears in a function body or at a call site. Create the scope once per workspace.READ lets every workspace user call secret('nimble','api_key') from SQL — values come back redacted in any display or log, so the plaintext can’t be extracted. Replace users with a group name to scope access more tightly.Create the catalog and schemas
nimble_search, nimble_extract, and the agent functions all live in nimble_integration.tools. The DDL is re-runnable (CREATE … IF NOT EXISTS).If your workspace uses Unity Catalog Default Storage, the plain
CREATE CATALOG may return “Metastore storage root URL does not exist.” Create the catalog from Catalog Explorer (Create Catalog → Default Storage), or use the explicit CREATE CATALOG nimble_integration MANAGED LOCATION '<uri>' form pointed at one of your UC external locations.cookbook/databricks/01_setup.sql
Catalog + schema scaffolding, with the Default Storage fallbacks documented inline
Deploy the tools
Each tool is a Python UDTF (
_nimble_xxx) that makes the HTTP call, behind a thin SQL RETURNS TABLE wrapper (nimble_xxx) that supplies defaults and injects the API key. The cookbook ships helpers/deploy_sql.py to post each multi-statement file to the warehouse (it treats the $$ … $$ UDTF bodies as opaque).nimble_search defaults to search_depth = lite (fast / deep are premium and not enabled on every account). The wrapper is what you call and what Genie registers — there is no scalar twin.cookbook/databricks/tools/
nimble_search.sql, nimble_extract.sql, nimble_agent_list.sql, nimble_agent_run.sql — full UDTF + wrapper for each, with COMMENTs and smoke testsVerify with a smoke test
Both should return a populated result within ~30 seconds.If a tool returns zero rows, re-check the networking Preview + cold restart from Prerequisites.
The tools
| Function | What it does |
|---|---|
nimble_search(query, max_results, focus, search_depth, …) | Live web search → rows of title, description, url, content. |
nimble_extract(url, render, format, …) | Fetch + parse one URL → a row of url, format, content (markdown / html / links). |
nimble_agent_list(privacy, managed_by, …) | The Nimble agent catalog → one row per agent. |
nimble_agent_run(agent, params_json, localization) | Run any agent → one row: response envelope + parsing (VARIANT). |
Common workflows
Land web data as a governed Delta table
The payoff of SQL-native web data: results compose withCREATE TABLE … AS and inherit Unity Catalog governance — access control, lineage, and time travel. Enrich a table of company homepages into a managed Delta table in one statement:
nimble_extract yields zero rows on a failed fetch (instead of raising), the batch over every company is never aborted by one bad URL. The result is a real Unity Catalog Delta table: DESCRIBE HISTORY it for time travel, GRANT SELECT it to a group, and trace its column lineage back through nimble_extract to companies.
Local business discovery at scale
Build a fresh, location-aware view of every business that matters for a category in a market: feed alocation_queries table of research terms and get back the ranked Google Maps landscape — one row per business per query. Useful for market sizing, account-universe building, competitive territory mapping, and POI databases.
google_maps_search returns ~20 results per query under parsing:entities:SearchResult. nimble_agent_run is a table function whose parsing column is a VARIANT, so navigate it with : paths and explode the results array with LATERAL variant_explode — one row per business — then persist with CREATE TABLE … AS.
location_queries
| query | category |
|---|---|
| coffee shops in Williamsburg Brooklyn | coffee |
| pizza restaurants in Chicago Loop | pizza |
| gyms in Austin Texas | gym |
local_businesses (one row per business per refresh)
| query | position | name | city | rating | review_count | price_level |
|---|---|---|---|---|---|---|
| coffee shops in Williamsburg Brooklyn | 1 | Devoción | Brooklyn | 4.5 | 2481 | $$ |
| coffee shops in Williamsburg Brooklyn | 2 | Sey Coffee | Brooklyn | 4.6 | 1840 | $$ |
| pizza restaurants in Chicago Loop | 1 | Giordano’s | Chicago | 4.4 | 10618 | $$ |
cookbook/databricks/recipes/local_business_universe.sql
Full runnable recipe: the
location_queries seed, the stitch-and-explode CTAS, sanity-check queries, and the governance proof points (time travel + lineage + GRANT)Structured per-row enrichment with any agent
When there is no domain-specific need beyond a single identifier,nimble_agent_run enriches each input row with that agent’s typed payload. PDP-style agents (Amazon PDP, LinkedIn Profile, Walmart PDP) return a single object; navigate it inline with : paths and cast leaves with ::type.
variant_explode needed. Field names inside parsing differ per agent (product_title vs. product_name, web_price vs. price) — discover the catalog with SELECT * FROM nimble_agent_list() and inspect a single response with SELECT parsing FROM nimble_agent_run(<agent>, …) before pinning a projection.
Schedule recurring enrichment
Each tool runs in a single SQL statement, so a Databricks Workflow can execute the enrichment directly on a schedule — point a SQL task at theCREATE OR REPLACE TABLE … AS statement above and set a cron trigger. No wrapper procedure required. The same statement also drops straight into a dbt model body for orchestrated, incremental refreshes.
Fallback: http_request() without the networking preview
If a workspace can’t enable UDTF egress, the http_request() SQL builtin reaches the Nimble API through a different, always-on egress path — at the cost of from_json parsing in SQL and no per-row isolation. It needs a one-time UC HTTP CONNECTION (requires the CREATE CONNECTION privilege):
http_request(conn => 'nimble_api', method => 'POST', path => '/v1/search', headers => map('Content-Type','application/json'), json => to_json(named_struct(…))), gates on response.status_code, and parses response.text with from_json. See ADDING_A_TOOL.md in the cookbook for the full pattern.
Roll out across your organization
- Tune for your Nimble rate-limit tier. Each tool call is one upstream request — one per input row. For high-cardinality inputs (thousands of rows), watch for 429s; because failures yield zero rows, a single retry pass over the rows that produced no output is the typical recovery. Nimble’s rate-limits page lists per-tier ceilings.
- Right-size the warehouse. A small serverless warehouse handles daily refreshes over a few hundred input rows. For larger enrichments (10K+ rows), step up the size only for the duration of the scheduled job.
- Surface failures separately. Because tools yield zero rows on failure, an input row with no matching output row is a failed call — left-join the input table against the result to find the gaps and audit them, rather than grepping job logs.
- Don’t ship guesses. Field names inside
parsingdiffer per agent. Inspect the shape once per agent (SELECT parsing FROM nimble_agent_run(<agent>, …)) before pinning a projection into a dbt model or production view.
Use it from Databricks Genie
Genie registers table functions as tools, so each public function is directly registrable — point a Genie space atnimble_integration.tools.nimble_search (and the others). The function COMMENT and per-column COMMENTs are what the LLM reads to decide when and how to call each tool. The cookbook ships helpers/create_genie_space.py to create a space wired to all four functions programmatically.
Resources
Nimbleway/cookbook (databricks/)
Every SQL file on this page, the one-time prereqs, the local-business recipe, and the deploy helpers
Databricks MCP Server
The agent-side surface: Nimble MCP Server from the Databricks Marketplace, wired into Genie and AI Playground
Nimble Web Search Agents
Overview of WSAs — what they are, when to use them, how they parse
Nimble Agent Gallery
Browse the catalog of pre-built agents and their per-agent parsing shapes
Databricks UC Python UDFs/UDTFs
How Unity Catalog tabular UDFs work —
eval() handler, RETURNS TABLE, and network egressDatabricks Genie
Register these table functions as tools in a Genie space for natural-language web data