Query Types
Master XPath, CSS selectors, and Regex patterns to extract exactly the data you need.
SnowScrape supports three query types for data extraction. Each has its strengths, and you can mix them within a single job based on what works best for each data point.
XPath
Best for complex DOM traversal
CSS
Simple and widely known
Regex
Pattern matching in text
XPath Selectors
XPath (XML Path Language) is a powerful query language for selecting nodes in XML/HTML documents. It excels at navigating complex document structures.
Basic Syntax
| Expression | Description | Example |
|---|---|---|
| //tag | Select all matching tags | //div |
| /tag | Direct child only | /html/body/div |
| [@attr] | Has attribute | //a[@href] |
| [@attr='value'] | Attribute equals value | //div[@class='price'] |
| /text() | Get text content | //h1/text() |
| /@attr | Get attribute value | //a/@href |
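The expressions in this table can be tried locally before they go into a job. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath and is not necessarily the engine SnowScrape uses. The sample markup is hypothetical, and the `/text()` and `/@attr` steps are emulated with `.text` and `.get()` because ElementTree does not support them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree requires valid XML).
SAMPLE = """
<html>
  <body>
    <h1>Widget</h1>
    <div class="price">$19.99</div>
    <div class="stock">In stock</div>
    <a href="/buy">Buy now</a>
    <a>No link</a>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# //div -> every div at any depth (ElementTree spells this ".//div")
all_divs = root.findall(".//div")

# //a[@href] -> only anchors that carry an href attribute
linked = root.findall(".//a[@href]")

# //div[@class='price'] -> attribute equals a value
price_div = root.find(".//div[@class='price']")

# ElementTree has no /text() or /@attr steps, so read them from the element.
title_text = root.find(".//h1").text       # stands in for //h1/text()
hrefs = [a.get("href") for a in linked]    # stands in for //a/@href

print(len(all_divs), price_div.text, title_text, hrefs)
```

A full XPath 1.0 engine would accept the table's expressions verbatim; the subset here is enough to check the basic selection logic.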
Advanced XPath
Common Patterns
| Expression | Description |
|---|---|
| //div[contains(@class, 'product')] | Class contains "product" |
| //span[starts-with(@id, 'price-')] | ID starts with "price-" |
| //ul[@class='items']/li[position() <= 5] | First 5 list items |
| //div[@class='review']//span[@class='rating']/text() | Rating text inside review divs |
CSS Selectors
CSS selectors are familiar to web developers and great for simple selections. They're more concise but less powerful than XPath for complex queries.
Basic Syntax
| Selector | Description | Example |
|---|---|---|
| tag | Element type | div |
| .class | Class selector | .product-title |
| #id | ID selector | #main-content |
| [attr] | Has attribute | [data-price] |
| [attr=value] | Attribute equals | [type="submit"] |
| parent > child | Direct child | ul > li |
| ancestor descendant | Any descendant | .card .price |
CSS Pseudo-selectors
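| Selector | Description |
|---|---|
| li:first-child | First list item |
| li:nth-child(2) | Second list item |
| p:not(.intro) | Paragraphs without .intro class |
| ::text | Extract text content (a scraper extension, not standard CSS) |
Regular Expressions (Regex)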
Regex patterns work on the raw HTML or extracted text. Use them when you need to extract specific patterns that CSS/XPath can't easily select.
Common Patterns
| Pattern | Matches | Example Match |
|---|---|---|
| \$[\d,]+\.?\d* | US Dollar prices | $1,299.99 |
| \d+\.\d+ stars? | Star ratings | 4.5 stars |
| [A-Z]\d{9} | Product codes (one letter, nine digits) | B012345678 |
| \d{1,3}(,\d{3})* | Numbers with commas | 1,234,567 |
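A quick way to sanity-check these patterns is to run them against sample text. The sketch below uses Python's built-in `re` module; the sample strings and the `first_match` helper are made up for illustration, and SnowScrape's own regex flavor may differ in edge cases.

```python
import re
from typing import Optional

# Patterns copied from the table; the sample strings are hypothetical.
CASES = {
    r"\$[\d,]+\.?\d*": "Now only $1,299.99 today",
    r"\d+\.\d+ stars?": "Rated 4.5 stars",
    r"[A-Z]\d{9}": "SKU: B012345678",
    r"\d{1,3}(,\d{3})*": "1,234,567 reviews",
}


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the full text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


results = {p: first_match(p, t) for p, t in CASES.items()}
for pattern, found in results.items():
    print(f"{pattern!r:24} -> {found}")
```

The helper reads `group(0)` rather than calling `re.findall`, because `findall` returns only the captured group when a pattern contains parentheses, as the last pattern does.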
Pro Tip: Use Capture Groups
Use parentheses to capture specific parts of a match. For example, Price: \$(\d+\.\d+) captures just the number (such as 19.99), not the "Price: $" prefix.
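To illustrate the difference between the whole match and the captured group, here is a small sketch with Python's `re` module. The input string is hypothetical; the pattern is the one from the tip.

```python
import re

# Hypothetical text containing two prices.
text = "Price: $24.50 (was Price: $30.00)"
pattern = r"Price: \$(\d+\.\d+)"

m = re.search(pattern, text)
whole = m.group(0)    # the entire match, including the "Price: $" prefix
number = m.group(1)   # only what the parentheses captured

# With exactly one capture group, findall returns just the captured parts.
all_prices = [float(p) for p in re.findall(pattern, text)]

print(whole, number, all_prices)
```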
Choosing the Right Query Type
| Use Case | Recommended | Why |
|---|---|---|
| Simple class/ID selection | CSS | Concise and readable |
| Text content extraction | XPath | text() function is explicit |
| Parent/sibling navigation | XPath | CSS can't go upward |
| Pattern in text | Regex | Best for pattern matching |
| Attribute contains value | XPath | contains() function |
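Query types can also be combined for one data point: a structural selector finds the element, and a regex cleans up its text. The sketch below shows that idea with Python's standard library only. The markup, the `text_of` helper, and the field names are hypothetical, and ElementTree's limited XPath subset stands in for a full engine.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical product snippet (well-formed so ElementTree can parse it).
SAMPLE = """
<div class="product">
  <h2 class="title">Acme Blender</h2>
  <span class="price">Price: $1,299.99 (incl. tax)</span>
  <span class="rating">4.5 stars</span>
</div>
"""

root = ET.fromstring(SAMPLE)


def text_of(path: str) -> str:
    """Return the stripped text of the first element matching path, or ''."""
    node = root.find(path)
    return node.text.strip() if node is not None and node.text else ""


# Step 1: structural selection (XPath-style) picks the right elements.
title = text_of(".//h2[@class='title']")
price_raw = text_of(".//span[@class='price']")
rating_raw = text_of(".//span[@class='rating']")

# Step 2: regex with capture groups pulls the value out of the text.
price_match = re.search(r"\$([\d,]+\.?\d*)", price_raw)
price = float(price_match.group(1).replace(",", "")) if price_match else None

rating_match = re.search(r"(\d+\.\d+) stars?", rating_raw)
rating = float(rating_match.group(1)) if rating_match else None

record = {"title": title, "price": price, "rating": rating}
print(record)
```

Keeping selection and pattern matching separate tends to make each step easier to debug: if the selector returns nothing, the regex never runs, and the field simply stays `None`.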
Testing Your Queries
Before running a full scrape, test your queries using browser developer tools:
- Open DevTools (F12 or right-click → Inspect)
- Go to the Console tab
- For XPath: $x('//your/xpath/here')
- For CSS: $$('your.css.selector')
- For Regex: Use the Elements panel search (Ctrl+F)