AI Parsing Skills


Last updated 8 months ago

Nimble Labs Beta Feature

Introduction

Nimble's AI Parsing Skills is an automated parsing system that uses HTML-trained LLM technology to dynamically generate accurate, customizable web data parsers, making it possible to parse data from any website.

A "Skill" is a parser - generated by Nimble's behind-the-scenes AI Agents - that extracts a particular set of data points from a particular page type (product details page, stock ticker page, etc.). Users can use Nimble's out-of-the-box Skills, or create their own custom Skills using a Skill schema.

AI Parsing Skills can operate completely automatically, in which the system will:

Alternatively, users can create their own custom Skills (coming soon), in which they provide basic instructions to Nimble's AI Parsing engine regarding what data they are looking to extract from a particular page in the form of a schema:

In both cases, Nimble's Parsing Engine uses AI Agents to monitor and manage the entire process. If at some point in the future the webpage changes to a new layout and the previous parsing logic fails to extract the needed data, the AI Agent recreates the Skill based on the new webpage, and validates that data is extracted successfully.

Currently available only in auto mode. Custom AI Parsing Skills are coming soon.

Key Features

How it works

The process begins with a request for a new, unparsed webpage.

  1. Along with the Web API request, the user sends a Skill schema that lists the fields they'd like parsed out of the webpage.

  2. Nimble Browser completes the request, and passes the user's schema and the resulting page's HTML to Parsing Skills.

  3. Parsing Skills launches an AI Agent which feeds the schema into an HTML-trained LLM that understands the fields the user requested, and examines the webpage's HTML source code.

  4. The LLM identifies the requested data in the source code, and generates multiple parsers (also called Skills) for its extraction from the HTML source code.

  5. The AI Agent organizes the resulting data, which includes:

    1. The IDs of each of the generated Skills

    2. The output produced by each of the generated Skills

    3. The original HTML source code

  6. The user then receives the data, and can review the generated Skills to see which output is the best for their needs.
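The review in step 6 can be partly automated. The following is a minimal sketch, assuming each result carries a Skill ID and that Skill's output as a dictionary. The keys `skill_id` and `output`, the function name, and the "count the filled fields" heuristic are illustrative assumptions, not Nimble's documented response format.

```python
from typing import Any


def rank_skill_outputs(
    schema_fields: list[str], results: list[dict[str, Any]]
) -> list[tuple[str, int]]:
    """Rank Skills by how many of the requested fields each one populated.

    The "skill_id" and "output" keys are hypothetical names used for
    illustration only.
    """
    ranked = []
    for result in results:
        output = result.get("output") or {}
        # A field counts as filled when it is present and not empty.
        filled = sum(
            1 for field in schema_fields if output.get(field) not in (None, "")
        )
        ranked.append((result["skill_id"], filled))
    # Highest count first; Python's sort is stable, so ties keep their order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Made-up example data: two Skills generated for the same page.
results = [
    {"skill_id": "skill-a", "output": {"product_name": "Mug", "price": ""}},
    {"skill_id": "skill-b", "output": {"product_name": "Mug", "price": "9.99"}},
]
ranking = rank_skill_outputs(["product_name", "price"], results)
best_skill_id = ranking[0][0]
```

A filled-field count is only a first filter. The user would still want to inspect the values themselves before settling on a Skill.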

Sample Schema

This example illustrates a schema that might be used to extract information about a product.

"schema": {
    "name": "product",
    "fields": {
        "product_name": { "type": "str" },
        "product_description": { "type": "str" },
        "image_url": { "type": "str" },
        "sku": { "type": "str" },
        "price": { "type": "str" }
    }
}
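One way to use a schema like this on the client side is to check a Skill's output against it. The sketch below mirrors the sample schema as a Python dictionary and reports missing or mistyped fields. The `TYPE_MAP` mapping and the `find_schema_problems` helper are hypothetical. The sample only shows the `"str"` type, so no other type names are assumed.

```python
from typing import Any

# The sample schema from above, expressed as a Python dictionary.
SCHEMA = {
    "name": "product",
    "fields": {
        "product_name": {"type": "str"},
        "product_description": {"type": "str"},
        "image_url": {"type": "str"},
        "sku": {"type": "str"},
        "price": {"type": "str"},
    },
}

# Assumption: maps schema type names to Python types. Only "str" appears
# in the sample, so only "str" is mapped here.
TYPE_MAP = {"str": str}


def find_schema_problems(schema: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; empty means the record fits."""
    problems = []
    for field, spec in schema["fields"].items():
        expected = TYPE_MAP.get(spec["type"])
        if field not in record:
            problems.append(f"missing field: {field}")
        elif expected is not None and not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}: expected {spec['type']}")
    return problems


# Made-up parsed record: "sku" is absent and "price" is a float, not a string.
record = {
    "product_name": "Ceramic Mug",
    "product_description": "A 350 ml mug.",
    "image_url": "https://example.com/mug.png",
    "price": 9.99,
}
problems = find_schema_problems(SCHEMA, record)
```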

Once the preferred Skill is identified, the user can refer to it by its ID in future requests to apply the same parsing logic to other pages. This process can be repeated to generate increasingly sophisticated Skills, to generate Skills for webpages with different layouts, etc.

What is a "Skill"? A Skill is an AI-generated parser. Users create schemas to tell the Parsing Skills system which fields they'd like extracted from a webpage. Parsing Skills then generates one or more Skills - each with its own unique logic - that parse the requested fields from the HTML source.

Auto-Healing Capabilities

When a user requests a particular Skill by referring to its ID, Nimble applies that Skill's parsing logic directly to the source HTML. However, the Parsing Skills AI Agent also monitors the output of the Skill for signs of extraction failure.

When a webpage's structure changes - breaking the parsing logic of the Skill - the AI Agent detects that parsing did not work as expected, and regenerates the Skill from its original schema with the updated HTML source. This restores the Skill's ability to extract data accurately from the target webpage with zero user intervention.
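The auto-healing flow described above can be illustrated with a small, self-contained sketch. It is not Nimble's implementation: the failure check (any requested field empty), the `regenerate` callback, and the toy regex-based parsers are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional

# A "Skill" is modeled here as a function from HTML to extracted fields.
Skill = Callable[[str], dict[str, Optional[str]]]


def extraction_failed(fields: list[str], output: dict[str, Optional[str]]) -> bool:
    """Assumed failure signal: any requested field is missing or empty."""
    return any(output.get(field) in (None, "") for field in fields)


def parse_with_healing(
    html: str,
    fields: list[str],
    skill: Skill,
    regenerate: Callable[[list[str], str], Skill],
) -> tuple[Skill, dict[str, Optional[str]]]:
    """Apply the Skill; if extraction fails, regenerate it and retry once."""
    output = skill(html)
    if not extraction_failed(fields, output):
        return skill, output
    # Rebuild the Skill from the original fields and the updated HTML.
    healed = regenerate(fields, html)
    healed_output = healed(html)
    if extraction_failed(fields, healed_output):
        raise ValueError("regenerated Skill still fails to extract all fields")
    return healed, healed_output


def old_skill(html: str) -> dict[str, Optional[str]]:
    # Toy parser written for the page's old layout.
    match = re.search(r'<h1 class="title">(.*?)</h1>', html)
    return {"product_name": match.group(1) if match else None}


def toy_regenerate(fields: list[str], html: str) -> Skill:
    # Stand-in for the AI Agent: returns a parser for the new layout.
    def new_skill(page: str) -> dict[str, Optional[str]]:
        match = re.search(r'<h2 class="name">(.*?)</h2>', page)
        return {"product_name": match.group(1) if match else None}

    return new_skill


# The page changed from <h1 class="title"> to <h2 class="name">.
new_layout = '<h2 class="name">Ceramic Mug</h2>'
active_skill, data = parse_with_healing(
    new_layout, ["product_name"], old_skill, toy_regenerate
)
```

In this sketch the caller receives the healed Skill along with the data, so later requests can keep using it instead of regenerating each time.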

auto mode