Creating Jobs
Learn how to configure scraping jobs with all available options and best practices.
Job Configuration Overview
A scraping job in SnowScrape consists of several components that work together to extract data from websites. Understanding each component helps you create efficient and reliable scrapers.
Basic Configuration
Job Name
Choose a descriptive name that helps you identify the job later. Good naming conventions include:
- Amazon - Product Prices - Electronics
- LinkedIn - Job Postings - Software Engineer
- News Site - Headlines - Daily
Source URL
The URL of the page you want to scrape. SnowScrape supports:
- Single URLs - Scrape one page (e.g., https://example.com/product/123)
- URL Lists - Upload a CSV file with multiple URLs to scrape
- Dynamic URLs - Use pagination patterns like page=[1-100]
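To make the Dynamic URLs option concrete, the sketch below shows one way a pattern such as page=[1-100] could be expanded into individual URLs. It is an illustration only: the function name `expand_url_pattern` is hypothetical, and the exact pattern syntax SnowScrape accepts is assumed from the single example above (one inclusive numeric range in square brackets).

```python
import re

# Matches a bracketed numeric range such as [1-100].
# Assumption: ranges are inclusive and only the first one is expanded.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_url_pattern(url: str) -> list[str]:
    """Expand the first [start-end] range in `url` into one URL per value."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        return [url]  # no range present: treat it as a single URL
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"range start {start} exceeds end {end}")
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


urls = expand_url_pattern("https://example.com/products?page=[1-100]")
print(len(urls))  # 100
print(urls[0])    # https://example.com/products?page=1
print(urls[-1])   # https://example.com/products?page=100
```

Expanding a pattern locally like this is a quick way to check how many pages a job would request before you run it.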
Extraction Queries
Queries define what data to extract from each page. Each query has:
| Field | Description | Example |
|---|---|---|
| name | Column name in output | product_title |
| type | Query language | xpath, css, regex |
| query | The selector expression | //h1[@class='title']/text() |
| join | Combine multiple matches | true or false |
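The fields in the table can be checked before a job is submitted. The following is a minimal sketch of such a check, not SnowScrape's own validation: `validate_query` is a hypothetical helper, and it assumes that `join` is optional and must be a boolean when present.

```python
# Query languages listed in the table above.
ALLOWED_TYPES = {"xpath", "css", "regex"}


def validate_query(query: dict) -> list[str]:
    """Return a list of problems found in one query definition (empty if none)."""
    problems = []
    name = query.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name must be a non-empty string")
    query_type = query.get("type")
    if not isinstance(query_type, str) or query_type not in ALLOWED_TYPES:
        problems.append(f"type must be one of {sorted(ALLOWED_TYPES)}")
    expression = query.get("query")
    if not isinstance(expression, str) or not expression.strip():
        problems.append("query must be a non-empty string")
    # Assumption: join may be omitted, but must be a boolean when present.
    if "join" in query and not isinstance(query["join"], bool):
        problems.append("join must be true or false")
    return problems


print(validate_query({"name": "product_title", "type": "xpath",
                      "query": "//h1[@class='title']/text()"}))  # []
print(validate_query({"name": "price", "type": "jsonpath", "query": "$.price"}))
```

The second call reports a single problem because `jsonpath` is not one of the three query languages listed in the table.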
Example Query Configuration
```json
{
  "queries": [
    {
      "name": "title",
      "type": "xpath",
      "query": "//h1[@id='product-title']/text()"
    },
    {
      "name": "price",
      "type": "css",
      "query": ".price-value::text"
    },
    {
      "name": "description",
      "type": "xpath",
      "query": "//div[@class='description']//text()",
      "join": true
    }
  ]
}
```
Rate Limiting
Control how fast SnowScrape makes requests to avoid overwhelming target servers or getting blocked.
Recommended Settings
- Low traffic sites: 10-20 requests/minute
- Medium traffic sites: 5-10 requests/minute
- High traffic / protected sites: 1-3 requests/minute
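A requests-per-minute setting translates directly into a pause between requests: 10 requests/minute means one request every 6 seconds. The sketch below illustrates that arithmetic with a simple fixed-interval throttle. It is an assumption about how such a limit could be enforced, not a description of SnowScrape's internals, and the names `delay_between_requests` and `Throttle` are hypothetical.

```python
import time


def delay_between_requests(requests_per_minute: float) -> float:
    """Seconds to wait between requests for a given per-minute limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


class Throttle:
    """Spaces calls to wait() at least `delay` seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_between_requests(requests_per_minute)
        self._clock = clock          # injectable so the class can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        scheduled = now if self._next_allowed is None else max(now, self._next_allowed)
        pause = scheduled - now
        if pause > 0:
            self._sleep(pause)
        self._next_allowed = scheduled + self.delay
        return pause


print(delay_between_requests(10))  # 6.0
print(delay_between_requests(3))   # 20.0
```

In a scraping loop you would call `Throttle(10).wait()` before each request. The first call returns immediately and later calls sleep for whatever remains of the 6-second interval.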
Important
Always respect the target website's robots.txt and terms of service. Aggressive scraping can lead to IP blocks or legal issues.
Scheduling
Set up recurring scrapes to keep your data fresh. SnowScrape supports flexible scheduling:
- Days - Select which days of the week to run (the examples below imply 0 = Sunday through 6 = Saturday)
- Hours - Choose specific hours (24-hour format)
- Minutes - Fine-tune the exact minute
Schedule Examples
- Daily at 9 AM: Days: [0-6], Hours: [9], Minutes: [0]
- Weekdays at 6 PM: Days: [1-5], Hours: [18], Minutes: [0]
- Every 6 hours: Days: [0-6], Hours: [0, 6, 12, 18], Minutes: [0]
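The schedule examples can be read as three sets that must all match the current time. The sketch below checks whether a given moment falls on a schedule. It assumes day 0 is Sunday (inferred from weekdays being written as 1-5) and leaves time zones out entirely. `schedule_day` and `is_due` are hypothetical helper names.

```python
from datetime import datetime


def schedule_day(moment: datetime) -> int:
    """Map a datetime to a day number where 0 = Sunday ... 6 = Saturday (assumed)."""
    return (moment.weekday() + 1) % 7  # datetime.weekday() uses Monday = 0


def is_due(moment: datetime, days, hours, minutes) -> bool:
    """True when `moment` matches the day, hour, and minute lists."""
    return (
        schedule_day(moment) in days
        and moment.hour in hours
        and moment.minute in minutes
    )


# "Weekdays at 6 PM" from the examples above.
weekdays_6pm = {"days": [1, 2, 3, 4, 5], "hours": [18], "minutes": [0]}

# 1 January 2024 was a Monday, 6 January 2024 a Saturday.
print(is_due(datetime(2024, 1, 1, 18, 0), **weekdays_6pm))  # True
print(is_due(datetime(2024, 1, 6, 18, 0), **weekdays_6pm))  # False
```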
Advanced Options
JavaScript Rendering
Enable this for websites that load content dynamically with JavaScript. SnowScrape will use a headless browser to render the page before extracting data.
Learn more about JavaScript rendering →
Proxy Configuration
Use proxies to avoid IP blocks and access geo-restricted content. Options include:
- Rotation Strategy - Round-robin, random, or sticky sessions
- Geo-targeting - Route requests through specific countries
- Automatic Retries - Retry failed requests with different proxies
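The three rotation strategies differ only in how the next proxy is chosen. The sketch below illustrates the idea with a small in-memory pool. It is not SnowScrape's implementation: the `ProxyPool` class, the strategy strings, and the proxy addresses are all made up for the example.

```python
import itertools
import random


class ProxyPool:
    """Chooses a proxy by round-robin, random, or sticky-session strategy."""

    STRATEGIES = {"round-robin", "random", "sticky"}

    def __init__(self, proxies, strategy="round-robin", rng=None):
        if not proxies:
            raise ValueError("at least one proxy is required")
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self._proxies = list(proxies)
        self._strategy = strategy
        self._cycle = itertools.cycle(self._proxies)
        self._rng = rng or random.Random()
        self._sessions = {}  # session id -> proxy, used by the sticky strategy

    def pick(self, session_id=None):
        if self._strategy == "random":
            return self._rng.choice(self._proxies)
        if self._strategy == "sticky":
            if session_id is None:
                raise ValueError("sticky strategy needs a session_id")
            # Assign a proxy on first use, then keep returning the same one.
            if session_id not in self._sessions:
                self._sessions[session_id] = next(self._cycle)
            return self._sessions[session_id]
        return next(self._cycle)  # round-robin: proxies in order, repeating


# Hypothetical proxy addresses for illustration.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]

pool = ProxyPool(PROXIES, strategy="round-robin")
print([pool.pick() for _ in range(3)])
# ['http://proxy-a.example:8080', 'http://proxy-b.example:8080', 'http://proxy-a.example:8080']
```

Sticky sessions suit sites that tie a login or cart to one IP address, while round-robin and random spread requests evenly across the pool.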
Learn more about proxy rotation →