Understanding Jobs
Learn how scraping jobs work in SnowScrape and how they move through their lifecycle.
A job is the fundamental unit of work in SnowScrape. It defines what to scrape, how to extract data, and when to run. Understanding jobs is essential for effective web scraping.
Job Anatomy
Every job consists of these core components:
Source
The URL or URLs to scrape. This can be a single page, a list of URLs, or a URL pattern with pagination.
Queries
XPath expressions, CSS selectors, or regular expressions that define what data to extract from each page.
Schedule
Optional configuration for when the job should run automatically.
Configuration
Advanced options like rate limiting, proxy settings, and JavaScript rendering.
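To make the four components concrete, here is a minimal sketch of how a job definition could be modeled in Python. The class names, field names, and configuration keys (`Job`, `Query`, `rate_limit_per_second`, `render_javascript`) are illustrative assumptions, not SnowScrape's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model of a job; names are assumptions for illustration only.
SUPPORTED_QUERY_TYPES = {"xpath", "css", "regex"}


@dataclass
class Query:
    name: str        # label for the extracted field
    type: str        # one of "xpath", "css", "regex"
    expression: str  # the pattern applied to each page

    def __post_init__(self) -> None:
        if self.type not in SUPPORTED_QUERY_TYPES:
            raise ValueError(f"unsupported query type: {self.type}")


@dataclass
class Job:
    source: list[str]                 # URL(s) to scrape
    queries: list[Query]              # what to extract
    schedule: Optional[str] = None    # None means no automatic runs
    config: dict = field(default_factory=dict)  # advanced options


job = Job(
    source=["https://example.com/products?page=1"],
    queries=[Query("title", "css", "h1.product-title")],
    config={"rate_limit_per_second": 2, "render_javascript": False},
)
```

Leaving `schedule` unset models a job that only runs when started manually.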
Job Lifecycle
Jobs progress through several states during their lifecycle. Each status determines which actions are available next.
Status Definitions
| Status | Description | Next Actions |
|---|---|---|
| Scheduled | Waiting for scheduled time | Run Now, Edit, Delete |
| Running | Currently executing | Pause, View Progress |
| Paused | Manually paused | Resume, Delete |
| Success | Completed successfully | Download, Run Again |
| Failed | Execution failed | View Logs, Retry, Edit |
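The table can also be read as a lookup from status to permitted actions. The sketch below encodes it that way in Python; the `Status` enum and the `can_perform` helper are hypothetical names introduced for illustration and are not part of any SnowScrape client library.

```python
from enum import Enum


class Status(Enum):
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    PAUSED = "Paused"
    SUCCESS = "Success"
    FAILED = "Failed"


# Allowed next actions, copied from the status table above.
NEXT_ACTIONS = {
    Status.SCHEDULED: ("Run Now", "Edit", "Delete"),
    Status.RUNNING: ("Pause", "View Progress"),
    Status.PAUSED: ("Resume", "Delete"),
    Status.SUCCESS: ("Download", "Run Again"),
    Status.FAILED: ("View Logs", "Retry", "Edit"),
}


def can_perform(status: Status, action: str) -> bool:
    """Return True if the action is listed for the given status (case-insensitive)."""
    allowed = {name.casefold() for name in NEXT_ACTIONS[status]}
    return action.casefold() in allowed
```

For example, `can_perform(Status.FAILED, "Retry")` is true, while `can_perform(Status.RUNNING, "Delete")` is false because a running job must be paused first.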
Job Execution
When a job runs, SnowScrape performs these steps:
1. URL Resolution: Expands URL patterns and loads URL lists
2. Request Queuing: Creates a queue of pages to scrape with rate limiting
3. Page Fetching: Downloads each page (with optional JS rendering)
4. Data Extraction: Applies queries to extract structured data
5. Result Storage: Saves extracted data to your account
6. Webhook Notification: Sends completion notification (if configured)
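The steps above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not SnowScrape's implementation: the `{page}` placeholder syntax, the function names, the fixed delay used for rate limiting, and the in-memory result list are all hypothetical, and only regex queries are shown. The page fetcher is passed in as a function so the example runs without network access.

```python
import re
import time
from collections import deque
from typing import Callable, Iterable, Optional


def resolve_urls(pattern: str, pages: Iterable[int]) -> list[str]:
    """URL resolution: expand a hypothetical '{page}' placeholder."""
    return [pattern.format(page=p) for p in pages]


def run_job(
    urls: list[str],
    fetch: Callable[[str], str],
    queries: dict[str, str],
    min_interval: float = 0.0,
    notify: Optional[Callable[[dict], None]] = None,
) -> list[dict]:
    queue = deque(urls)                      # request queuing
    results = []
    while queue:
        url = queue.popleft()
        html = fetch(url)                    # page fetching
        row = {"url": url}
        for name, regex in queries.items():  # data extraction (regex only)
            match = re.search(regex, html)
            row[name] = match.group(1) if match else None
        results.append(row)                  # result storage (in memory here)
        if queue and min_interval > 0:
            time.sleep(min_interval)         # naive rate limiting between requests
    if notify is not None:                   # webhook notification, if configured
        notify({"status": "Success", "rows": len(results)})
    return results


# Stand-in fetcher that fabricates a page instead of making an HTTP request.
def fake_fetch(url: str) -> str:
    page = url.rsplit("=", 1)[-1]
    return f"<h1>Product {page}</h1>"


urls = resolve_urls("https://example.com/list?page={page}", range(1, 4))
events: list[dict] = []
rows = run_job(urls, fake_fetch, {"title": r"<h1>(.*?)</h1>"}, notify=events.append)
```

Injecting `fetch` and `notify` as callables keeps each stage separately testable; a real fetcher would also need error handling so that a failure can move the job to the Failed status.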
Job Types
One-Time Jobs
Run once, either started manually or triggered via the API. Good for ad-hoc data collection.
Scheduled Jobs
Run automatically on a schedule. Perfect for monitoring prices, tracking changes, or collecting data at regular intervals.
Template-Based Jobs
Created from pre-built templates for popular websites. Fastest way to get started with proven configurations.
Best Practice
Start with a small test run (one or two URLs) to verify your queries work correctly before running a full scrape on hundreds of pages.
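One way to act on this advice is to check, for a couple of sample pages, which queries return nothing. The sketch below does that for regex queries against pages already held as strings; the `check_queries` helper and the sample HTML are hypothetical and only illustrate the idea.

```python
import re


def check_queries(sample_pages: dict[str, str], queries: dict[str, str]) -> dict[str, list[str]]:
    """Return, per query name, the sample URLs where the query matched nothing."""
    misses: dict[str, list[str]] = {}
    for name, regex in queries.items():
        failed = [url for url, html in sample_pages.items() if re.search(regex, html) is None]
        if failed:
            misses[name] = failed
    return misses


# Two sample pages; the second uses a different class name for the price.
sample_pages = {
    "https://example.com/item/1": '<h1>Blue Mug</h1><span class="price">$12.00</span>',
    "https://example.com/item/2": '<h1>Red Mug</h1><span class="cost">$9.50</span>',
}
queries = {"title": r"<h1>(.*?)</h1>", "price": r'class="price">([^<]+)<'}

misses = check_queries(sample_pages, queries)
```

Here `misses` flags the `price` query on the second page, which is the kind of gap worth fixing before scaling up to hundreds of URLs.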