Data Parsing
What?
Transforming raw HTML into clean, accurate, and usable data is no easy task. Because every website has its own layout and can change without notice, it's important to have a diverse set of powerful tools to ensure consistent and accurate data extraction.
Nimble's Web API comes with three built-in tools that help you extract the key data you need easily, reliably, and at scale.
Let's look at each one in more detail and examine some examples to understand when it's right to use each one.
Why?
Enhanced Accuracy: LLMs are adept at understanding the context and structure of web content, enabling them to parse complex web data more accurately than traditional parsing tools. This results in higher-quality data extraction, particularly from sophisticated web pages and from sites whose structure changes.
Scalability: AI models can handle a wide range of website layouts and structures without needing specific rules for each site. This scalability makes it easier to process data from a broad spectrum of sources with minimal setup time.
Continuity: Unlike traditional parsers that require pre-defined schemas and are often brittle to changes in web page design, AI-based parsing adapts to changes in webpage layouts and content schemes, reducing the need for frequent manual updates.
Efficiency: By automating the structuring of data into usable formats, this feature saves significant time and effort that would otherwise be spent on manual data cleaning and organization. This allows users to focus on analysis and insights rather than data preprocessing.
Integration Readiness: The structured data output from AI Parsing is readily integrable into various data analysis tools and applications, enhancing the workflow from data collection to actionable insights.
Which tool is right for me?
Each tool has its own advantages and disadvantages. The table below should help clarify the features of each tool and help you decide which is right for you. It's also important to remember that these tools can operate in parallel within each request, and we encourage users to try each one and experiment to get the best results.
Fully-automated
Manual control
Auto-healing
Easy to use
CSS Selector targeting
Additional Information
Supported by realtime (except cloud delivery), asynchronous, and batch requests.
Unsupported endpoints: Social
Request Option
Enable Parsing
To run a Nimble API request that requires data parsing (HTML -> JSON), the user simply needs to set the parse parameter to true. Behind the scenes, the Nimble AI Parser will dynamically parse the webpage's HTML content into a structured data format (JSON).
Data Formatting
To set the Nimble API data response format to JSON (instead of HTML), the user simply needs to include the parameter "format": JSON in the body of the request. JSON is already the default value of the format parameter, so the user doesn't need to set it manually, but the value is configurable.
parse
Optional (default = false)
Enum: true | false
true: the page's content will be parsed and returned in JSON format. false: the response will include the page headers and raw data (without parsing).
format
Optional (default = JSON)
Enum: JSON | HTML
The data response format. With HTML, an error is still returned as JSON containing the error message.
When parse is set to true, format must be set to JSON (which is the default format).
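The rule above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of the Nimble API itself: the helper name build_parsing_options is hypothetical, and the lowercase string values "json" and "html" are an assumption about how the enum values are written in the request body.

```python
# Hypothetical client-side helper; not provided by Nimble.
# Assumption: the enum values are sent as lowercase strings.
ALLOWED_FORMATS = {"json", "html"}


def build_parsing_options(parse=False, fmt="json"):
    """Return the parse/format part of a request body.

    Defaults mirror the documented ones: parse = false, format = JSON.
    Raises ValueError if parse is true while format is not JSON.
    """
    fmt = fmt.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if parse and fmt != "json":
        raise ValueError("parse = true requires format = json")
    return {"parse": parse, "format": fmt}


# Parsing enabled with the default format passes the check.
options = build_parsing_options(parse=True)
print(options)
```

Rejecting an invalid combination locally gives a clearer error than waiting for the API to refuse the request.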
Example Request
The format parameter does not need to be included in the request, as JSON is its default value.
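As an illustration, a request with parsing enabled could be assembled as sketched below. Only the parse and format parameters come from this page; the endpoint URL, the credential placeholder, the Basic authorization scheme, and the url body field are assumptions made for the example, so consult the API reference for the real values. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical placeholders; replace with the values from the API reference.
ENDPOINT = "https://api.example.com/realtime/web"
CREDENTIALS = "<base64-encoded-credentials>"


def build_request(target_url):
    """Build (but do not send) a POST request that enables parsing."""
    body = {
        "url": target_url,  # assumed name of the target-page field
        "parse": True,      # ask for the HTML to be parsed into JSON
        # "format" is omitted on purpose: JSON is the documented default.
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {CREDENTIALS}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_request("https://www.example.com")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it once real endpoint and credential values are in place.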
Next Steps
Dive into the full guides for each of Nimble's parsing solutions: