AI Parsing Skills

Nimble Labs Beta Feature

Introduction

Nimble's AI Parsing Skills is an automated parsing system that uses HTML-trained LLM technology to dynamically generate accurate, customizable web data parsers, making it possible to parse data from any website.

A "Skill" is a parser, generated by Nimble's behind-the-scenes AI Agents, that extracts a particular set of data points from a particular page type (product details page, stock ticker page, etc.). Users can use Nimble's out-of-the-box Skills or create their own custom Skills using a Skill schema.

AI Parsing Skills can operate fully automatically, in which case the system will:

Alternatively, users can create their own custom Skills (coming soon) by giving Nimble's AI Parsing engine basic instructions, in the form of a schema, about what data they want to extract from a particular page:

In both cases, Nimble's Parsing Engine uses AI Agents to monitor and manage the entire process. If the webpage later changes to a new layout and the previous parsing logic fails to extract the needed data, the AI Agent recreates the Skill based on the new webpage and validates that the data is extracted successfully.

Currently available only in auto mode. Custom AI Parsing Skills are coming soon.

Key Features

How It Works

The process begins with a request for a new, unparsed webpage.

  1. Along with the Web API request, the user sends a Skill schema that lists the fields they'd like parsed out of the webpage.

  2. Nimble Browser completes the request and passes the user's schema and the resulting page's HTML to Parsing Skills.

  3. Parsing Skills launches an AI Agent, which feeds the schema into an HTML-trained LLM. The LLM interprets the fields the user requested and examines the webpage's HTML source code.

  4. The LLM locates the requested data in the source code and generates multiple parsers (also called Skills) to extract it.

  5. The AI Agent organizes the resulting data, which includes:

    1. The IDs of each of the generated Skills

    2. The output produced by each of the generated Skills

    3. The original HTML source code

  6. The user then receives the data and can review the generated Skills to determine which output best suits their needs.
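Steps 5 and 6 can be pictured with a small sketch. It assumes, for illustration only, that the returned data can be modeled as a list of Skill results (each with an ID and its output) plus the original HTML; `SkillResult`, `ParsingResponse`, `filled_fields`, and `rank_skills` are hypothetical names, not Nimble's actual response format. Ranking Skills by how many requested fields they filled is just one possible way to review the outputs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SkillResult:
    """One generated Skill and the output it produced (hypothetical structure)."""
    skill_id: str
    output: dict[str, str | None]


@dataclass
class ParsingResponse:
    """Data returned to the user: every Skill's result plus the original HTML (hypothetical)."""
    skills: list[SkillResult]
    html: str


def filled_fields(result: SkillResult, schema_fields: list[str]) -> int:
    """Count requested fields that this Skill extracted with a non-empty value."""
    return sum(1 for name in schema_fields if result.output.get(name))


def rank_skills(response: ParsingResponse, schema_fields: list[str]) -> list[SkillResult]:
    """Order Skills from most to fewest filled fields; ties keep their original order."""
    return sorted(
        response.skills,
        key=lambda r: filled_fields(r, schema_fields),
        reverse=True,
    )


response = ParsingResponse(
    skills=[
        SkillResult("skill-a", {"product_name": "Mug", "price": None}),
        SkillResult("skill-b", {"product_name": "Mug", "price": "$12"}),
    ],
    html="<html>...</html>",
)
print(rank_skills(response, ["product_name", "price"])[0].skill_id)  # skill-b
```

Counting filled fields only catches missing data; judging whether a value is actually correct still requires the user's review described in step 6.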

Sample Schema

This example illustrates a schema that might be used to extract information about a product.

"schema": {
    "name": "product",
    "fields": {
        "product_name": { "type": "str" },
        "product_description": { "type": "str" },
        "image_url": { "type": "str" },
        "sku": { "type": "str" },
        "price": { "type": "str" }
    }
}
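A schema like this can be assembled and sanity-checked in code before it is attached to a request. The sketch below assumes only the structure shown in the sample (a `name` plus a `fields` map whose entries each carry a `type`); `build_schema` and `validate_schema` are hypothetical helpers, and `"str"` is the only field type the sample confirms.

```python
import json


def build_schema(name: str, field_names: list, field_type: str = "str") -> dict:
    """Build a schema with the same shape as the sample: a name plus typed fields."""
    return {
        "schema": {
            "name": name,
            "fields": {f: {"type": field_type} for f in field_names},
        }
    }


def validate_schema(body: dict) -> list:
    """Return a list of problems; an empty list means the schema looks well-formed."""
    schema = body.get("schema")
    if not isinstance(schema, dict):
        return ["missing 'schema' object"]
    problems = []
    if not isinstance(schema.get("name"), str) or not schema["name"]:
        problems.append("'name' must be a non-empty string")
    fields = schema.get("fields")
    if not isinstance(fields, dict) or not fields:
        problems.append("'fields' must be a non-empty object")
        return problems
    for field_name, spec in fields.items():
        if not isinstance(spec, dict) or not isinstance(spec.get("type"), str):
            problems.append(f"field {field_name!r} needs a string 'type'")
    return problems


body = build_schema(
    "product",
    ["product_name", "product_description", "image_url", "sku", "price"],
)
print(validate_schema(body))  # []
print(json.dumps(body, indent=4))
```

Generating the schema from a list of field names keeps every entry in the same shape, which is the part most prone to typos when the JSON is written by hand.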

Once the preferred Skill is identified, the user can refer to it by ID in future requests to apply the same parsing logic to other pages. This process can be repeated to generate increasingly sophisticated Skills, Skills for webpages with different layouts, and so on.
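To make the reuse-by-ID idea concrete, the sketch below models a Skill locally as a function from page HTML to extracted fields, stores it under an ID, and applies it to several pages that share a layout. This is an illustration only: the `SKILLS` registry, `register_skill`, `apply_skill`, and the class-based extraction rule are hypothetical stand-ins, not Nimble's API or the parsing logic its AI Agents actually generate.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Callable

# Hypothetical model: a Skill is a function from page HTML to extracted fields.
Skill = Callable[[str], dict]

# Hypothetical local registry standing in for Skill IDs tracked by Nimble.
SKILLS: dict[str, Skill] = {}


class ClassTextExtractor(HTMLParser):
    """Capture the first non-empty text inside an element carrying each target class."""

    def __init__(self, targets: dict[str, str]) -> None:
        super().__init__()
        self.targets = targets  # field name -> CSS class that marks it
        self.result: dict[str, str] = {}
        self._pending: str | None = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field_name, css_class in self.targets.items():
            if css_class in classes and field_name not in self.result:
                self._pending = field_name

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.result[self._pending] = data.strip()
            self._pending = None


def register_skill(skill_id: str, targets: dict[str, str]) -> None:
    """Store parsing logic under an ID so later calls can reuse it."""
    def skill(html: str) -> dict:
        parser = ClassTextExtractor(targets)
        parser.feed(html)
        parser.close()
        return parser.result
    SKILLS[skill_id] = skill


def apply_skill(skill_id: str, html: str) -> dict:
    """Reuse a stored Skill on a new page by referring to its ID."""
    return SKILLS[skill_id](html)


register_skill("skill-b", {"product_name": "title", "price": "price"})
pages = [
    '<div><h1 class="title">Mug</h1><span class="price">$12</span></div>',
    '<div><h1 class="title">Kettle</h1><span class="price">$30</span></div>',
]
for page in pages:
    print(apply_skill("skill-b", page))
# {'product_name': 'Mug', 'price': '$12'}
# {'product_name': 'Kettle', 'price': '$30'}
```

The point of the design is that the parsing logic is generated once and then only looked up by ID, so applying it to additional pages with the same layout requires no further generation.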

What is a "Skill"? A Skill is an AI-generated parser. Users create schemas to tell the Parsing Skills system which fields they'd like extracted from a webpage. Parsing Skills then generates one or more Skills, each with its own unique logic, that parse the requested fields from the HTML source.

Auto-Healing Capabilities

When a user requests a particular Skill by referring to its ID, Nimble applies that Skill's parsing logic directly to the source HTML. At the same time, the Parsing Skills AI Agent monitors the Skill's output for signs of extraction failure.

When a webpage's structure changes and breaks the Skill's parsing logic, the AI Agent detects that parsing did not work as expected and regenerates the Skill from its original schema using the updated HTML source. This restores the Skill's ability to extract data accurately from the target webpage with zero user intervention.
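The detect-and-regenerate cycle can be sketched as a wrapper around a Skill. For illustration, this assumes that "signs of extraction failure" means a requested field came back missing or empty; the actual signals Nimble's AI Agent monitors are not documented here. `parse_with_healing`, `looks_failed`, `make_regex_skill`, and the `regenerate` callback are hypothetical, and `fake_regenerate` merely stands in for the AI Agent by returning a hand-written rule for the new layout.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical model: a Skill maps page HTML to a dict of extracted fields.
Parser = Callable[[str], dict]


def looks_failed(output: dict, schema_fields: list[str]) -> bool:
    """Assumed heuristic: the run failed if any requested field is missing or empty."""
    return any(not output.get(name) for name in schema_fields)


def parse_with_healing(
    skill: Parser,
    html: str,
    schema_fields: list[str],
    regenerate: Callable[[list[str], str], Parser],
) -> tuple[dict, Parser]:
    """Apply a Skill; if it breaks, regenerate it from the schema and the new HTML, then validate."""
    output = skill(html)
    if not looks_failed(output, schema_fields):
        return output, skill
    new_skill = regenerate(schema_fields, html)
    new_output = new_skill(html)
    if looks_failed(new_output, schema_fields):
        raise RuntimeError("regenerated Skill still fails validation")
    return new_output, new_skill


def make_regex_skill(patterns: dict[str, str]) -> Parser:
    """Toy Skill: each field is the first capture group of its regex, or None if absent."""
    def skill(html: str) -> dict:
        out = {}
        for name, pattern in patterns.items():
            match = re.search(pattern, html)
            out[name] = match.group(1) if match else None
        return out
    return skill


old_skill = make_regex_skill({"price": r'<span class="price">([^<]+)</span>'})
new_layout = '<div data-price="$12"></div>'  # the page changed; the old rule no longer matches


def fake_regenerate(fields: list[str], html: str) -> Parser:
    # Stand-in for the AI Agent: simply returns a rule written for the new layout.
    return make_regex_skill({"price": r'data-price="([^"]+)"'})


output, skill = parse_with_healing(old_skill, new_layout, ["price"], fake_regenerate)
print(output)  # {'price': '$12'}
```

Validating the regenerated Skill before returning it mirrors the requirement that healing must actually restore extraction, rather than silently swapping one broken parser for another.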
