
Scrapy


Last updated 8 months ago

Scrapy is a popular open-source web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It is written in Python and provides a complete toolset for scraping tasks.

Scrapy simplifies the process of writing complex spiders, which are programs that browse the Web and extract data based on a set of instructions. It's highly extensible, allowing for the implementation of custom functionality through plugins, and it can handle a wide range of web scraping and crawling tasks, making it an ideal choice for projects ranging from simple data extraction to large-scale web mining.

Configuration

Setting Up Your Nimble Account

If you haven't already, you'll need to create an account with Nimble to access the Nimble Web API.

Configure Scrapy Settings

The first step is to install Nimble's Scrapy middleware using pip:

pip install scrapy-nimble

Next, configure your Scrapy project to interact with Nimble's Web API by updating your settings.py:

# settings.py
NIMBLE_ENABLED = True

NIMBLE_USERNAME = "username"
NIMBLE_PASSWORD = "password"
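Hardcoded credentials in settings.py are easy to leak through version control. As a minimal sketch, assuming you export the credentials as environment variables, the settings could be built at load time. The environment variable names `NIMBLE_API_USERNAME` and `NIMBLE_API_PASSWORD` and the helper `nimble_settings` are hypothetical and not part of scrapy-nimble; only the setting names come from the configuration above.

```python
import os
from typing import Mapping


def nimble_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build the Nimble settings from environment variables (hypothetical names)."""
    required = ("NIMBLE_API_USERNAME", "NIMBLE_API_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail loudly instead of sending unauthenticated requests.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "NIMBLE_ENABLED": True,
        "NIMBLE_USERNAME": env["NIMBLE_API_USERNAME"],
        "NIMBLE_PASSWORD": env["NIMBLE_API_PASSWORD"],
    }


# In settings.py you could then expose the values as module-level settings:
# globals().update(nimble_settings())
```

Passing the environment as a parameter keeps the helper easy to test with a plain dictionary.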

Then, add the Nimble middleware to your downloader middlewares:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    "scrapy_nimble.middlewares.NimbleWebApiMiddleware": 570,
}
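Scrapy orders downloader middlewares by their numeric value, and its built-in HttpCompressionMiddleware sits at 590 in DOWNLOADER_MIDDLEWARES_BASE, so the Nimble middleware needs a lower number. The following is a small sanity check you could run against your own settings; the helper name `nimble_order_is_valid` is hypothetical and not part of Scrapy or scrapy-nimble.

```python
NIMBLE = "scrapy_nimble.middlewares.NimbleWebApiMiddleware"

# Default order of HttpCompressionMiddleware in DOWNLOADER_MIDDLEWARES_BASE.
HTTP_COMPRESSION_ORDER = 590


def nimble_order_is_valid(middlewares: dict) -> bool:
    """Return True if the Nimble middleware is enabled with an order below 590."""
    order = middlewares.get(NIMBLE)
    return order is not None and order < HTTP_COMPRESSION_ORDER


DOWNLOADER_MIDDLEWARES = {NIMBLE: 570}
print(nimble_order_is_valid(DOWNLOADER_MIDDLEWARES))  # True
```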

Basic Usage

Middleware Handling

With the middleware configured, every request sent from your Scrapy spiders will automatically pass through the Nimble Web API. There's no need for additional changes in your spider code for basic usage.

Advanced Features

Real-time URL Requests

To use the Web API's real-time request options, such as geolocation and page rendering, add them to the meta dictionary of your request:

# Inside your spider
yield scrapy.Request(
    "https://nimbleway.com",
    meta={
        "nimble_country": "DE",  # geolocate the request
        "nimble_locale": "uk",  # locale to request
        "nimble_render": True,  # render dynamic content
    },
)
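If several requests share the same options, a small helper can keep the meta keys in one place. This is only a sketch: `nimble_meta` is a hypothetical helper, not part of scrapy-nimble, and it uses only the three meta keys shown above.

```python
from typing import Optional


def nimble_meta(
    country: Optional[str] = None,
    locale: Optional[str] = None,
    render: bool = False,
) -> dict:
    """Build a request meta dict containing only the Nimble options that were set."""
    meta = {}
    if country:
        meta["nimble_country"] = country
    if locale:
        meta["nimble_locale"] = locale
    if render:
        meta["nimble_render"] = True
    return meta


# Inside a spider you could then write:
# yield scrapy.Request(url, meta=nimble_meta(country="DE", render=True))
```

Options that are left unset are omitted from the dictionary, so those requests carry no Nimble-specific meta keys.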

Development Environment Setup

Python Environment

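# Create and activate an isolated Python 3.11.6 environment (requires the pyenv-virtualenv plugin)
pyenv virtualenv 3.11.6 myvenv
pyenv activate myvenv
# Install the project in editable mode
python -m pip install -e .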

Now, your development environment is set up, and you're ready to develop your Scrapy project with Nimble's Web API.

Note on middleware order: ensure that the Nimble middleware is configured to run before scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware, which is enabled by default in DOWNLOADER_MIDDLEWARES_BASE at an order of 590. The order of 570 used in the Configuration section satisfies this.

Note on advanced features: the Nimble Web API enhances your scraping capabilities with options for real-time URL requests. These options allow for dynamic content rendering, geolocated requests, and more.

Note on the Python environment: it's recommended to use pyenv for managing Python versions and creating an isolated development environment, as in the commands under Development Environment Setup.