Overview
Termly’s cookie scanner automatically detects cookies and tracking technologies across your website. By default, it uses optimized settings that work well for most websites.
For larger or more complex sites, Scan Optimization Configuration allows you to customize how the scanner works. This gives you better control over which pages are scanned and how the 500-page scan limit is used.
These settings help you:
- Improve coverage of important pages
- Avoid scanning repetitive or low-value pages
- Discover hidden or unlinked pages
- Balance scan depth and breadth
All changes apply to future scans only.
How the Scanner Works
The scanner starts from your homepage and follows links to discover pages. It stops when:
- It reaches the 500-page limit, or
- There are no more pages to scan
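The loop described above can be pictured with a minimal sketch. It is not Termly's implementation: the `crawl` function, the in-memory `site` link map, and the `page_limit` parameter are hypothetical names used only for illustration, and a real scanner would fetch pages over HTTP rather than read a dictionary.

```python
from collections import deque

PAGE_LIMIT = 500  # the scan limit described in this article

def crawl(links, start="/", page_limit=PAGE_LIMIT):
    """Follow links from `start` until the limit is hit or no pages remain."""
    scanned = []
    seen = {start}
    queue = deque([start])
    while queue and len(scanned) < page_limit:
        page = queue.popleft()
        scanned.append(page)
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append(target)
    return scanned

# Hypothetical site: each key is a page, each value is the pages it links to.
site = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/item-1", "/products/item-2"],
}

all_pages = crawl(site)                  # stops because no pages remain
first_three = crawl(site, page_limit=3)  # stops because the limit is reached
```

Here `all_pages` contains all five pages, while `first_three` stops after the homepage, `/about`, and `/products`.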
Without optimization, the scanner may:
- Spend much of the 500-page limit on similar pages (e.g., product listings)
- Miss important pages like privacy policies
- Fail to detect pages not linked in navigation
Scan Optimization Settings
1. Breadth-First Scan (BFS)
Default: Enabled
Controls the order in which pages are scanned.
- Enabled (BFS): Scans top-level pages first, then moves deeper
- Disabled (DFS): Scans one path deeply before moving to others
Why it matters:
Helps ensure important top-level pages (privacy, terms, contact) are scanned early, before the 500-page limit is reached.
Recommendation: Keep enabled for most websites.
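To see why the scan order matters, the sketch below compares the two traversal orders on a small, made-up site. The `scan_order` function and the `site` link map are hypothetical; they illustrate the general breadth-first and depth-first techniques, not the scanner's actual code.

```python
from collections import deque

def scan_order(links, start="/", breadth_first=True):
    """Return the order pages are visited in BFS (queue) or DFS (stack) mode."""
    order = []
    seen = {start}
    frontier = deque([start])
    while frontier:
        # BFS takes the oldest discovered page, DFS the newest.
        page = frontier.popleft() if breadth_first else frontier.pop()
        order.append(page)
        children = [c for c in links.get(page, []) if c not in seen]
        if not breadth_first:
            children.reverse()  # so the first link on a page is followed first
        for child in children:
            seen.add(child)
            frontier.append(child)
    return order

# Hypothetical site with one deep path and one top-level legal page.
site = {
    "/": ["/shop", "/privacy"],
    "/shop": ["/shop/shoes"],
    "/shop/shoes": ["/shop/shoes/item-1"],
}

bfs = scan_order(site, breadth_first=True)
dfs = scan_order(site, breadth_first=False)
```

In breadth-first mode `/privacy` is the third page visited. In depth-first mode it is visited last, after the whole `/shop` path, so on a large site it could fall outside the scan limit.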
2. Crawl Depth Limit
Default: 5
Controls how deep the scanner goes from the homepage.
- Lower values focus on top-level pages
- Higher values allow deeper scanning
Use cases:
- Increase for sites with deep navigation
- Decrease for large sites hitting the scan limit
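A depth limit can be sketched as follows. This example assumes depth is counted in link hops from the homepage, which this article does not state explicitly; `crawl_with_depth_limit`, `max_depth`, and the `chain` link map are hypothetical names.

```python
from collections import deque

def crawl_with_depth_limit(links, start="/", max_depth=5):
    """Scan pages at most `max_depth` link hops away from `start`."""
    scanned = []
    seen = {start}
    queue = deque([(start, 0)])  # (page, hops from the homepage)
    while queue:
        page, depth = queue.popleft()
        scanned.append(page)
        if depth == max_depth:
            continue  # scan this page, but do not follow its links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return scanned

# Hypothetical site where each page links one level deeper.
chain = {"/": ["/a"], "/a": ["/a/b"], "/a/b": ["/a/b/c"]}

shallow = crawl_with_depth_limit(chain, max_depth=1)
deep = crawl_with_depth_limit(chain, max_depth=5)
```

With `max_depth=1` only the homepage and `/a` are scanned. With the default of 5, all four pages are reached.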
3. URL Pattern Detection
Default: Enabled
Detects similar page structures and limits how many are scanned.
Example (both URLs share the same structure and differ only in the item number):
/products/item-1
/products/item-2
Why it matters:
Prevents wasting scan budget on repetitive pages.
Recommendation: Keep enabled unless individual pages within a pattern use their own unique cookies or trackers.
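One simple way to picture pattern detection is to replace the variable part of each URL with a placeholder and group URLs that end up identical. The rule below (collapse every run of digits) is an assumption chosen for illustration; the scanner's real detection logic is not documented here and is likely more sophisticated. `url_pattern` and `group_by_pattern` are hypothetical names.

```python
import re
from collections import defaultdict

def url_pattern(path):
    """Collapse digit runs so /products/item-1 and /products/item-2 match."""
    return re.sub(r"\d+", "{n}", path)

def group_by_pattern(paths):
    """Map each detected pattern to the URLs that follow it."""
    groups = defaultdict(list)
    for path in paths:
        groups[url_pattern(path)].append(path)
    return dict(groups)

groups = group_by_pattern(["/products/item-1", "/products/item-2", "/about"])
```

Both product URLs land in the same `/products/item-{n}` group, while `/about` forms a group of its own.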
4. Max URLs Per Pattern
Default: 10
Controls how many pages are scanned per pattern.
- Lower value: More aggressive filtering
- Higher value: More pages scanned
Use cases:
- Lower (for example, 3–5) for large product catalogs
- Higher if pages may contain unique cookies
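The effect of this setting on the scan budget is simple arithmetic: each pattern contributes at most the configured number of pages. The sketch below assumes that model; `pages_scanned` and the `catalog` figures are made up for illustration.

```python
def pages_scanned(pattern_sizes, max_per_pattern=10, page_limit=500):
    """Estimate pages scanned when each pattern is capped at `max_per_pattern`."""
    capped = sum(min(size, max_per_pattern) for size in pattern_sizes.values())
    return min(capped, page_limit)

# Hypothetical site: number of pages that follow each URL pattern.
catalog = {
    "/products/item-{n}": 2000,
    "/blog/post-{n}": 120,
    "/about": 1,
    "/privacy": 1,
}

with_default = pages_scanned(catalog, max_per_pattern=10)  # 10 + 10 + 1 + 1
aggressive = pages_scanned(catalog, max_per_pattern=5)     # 5 + 5 + 1 + 1
uncapped = pages_scanned(catalog, max_per_pattern=2000)    # hits the page limit
```

With the default cap, this site uses 22 of the 500 pages. With a cap of 5 it uses 12. Without effective capping, the product pages alone would exhaust the limit.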
5. Sitemap Scanning
Default: Enabled
Allows the scanner to discover pages through your sitemap.
Why it matters:
Finds pages not linked in navigation, such as:
- Campaign pages
- Archived content
- Hidden landing pages
6. Max Sitemap URLs
Default: 50
Limits how many sitemap URLs are included in the scan.
- Lower value: Focus on navigation links
- Higher value: More sitemap pages included
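As an illustration of sitemap discovery with a cap, the sketch below reads the `<loc>` entries from a standard XML sitemap and keeps only the first few. Which URLs the scanner keeps when a sitemap exceeds the limit is not stated here, so taking them in document order is an assumption. `sitemap_urls`, `max_urls`, and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text, max_urls=50):
    """Return up to `max_urls` page URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]
    return locs[:max_urls]  # assumption: keep entries in document order

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/spring-campaign</loc></url>
  <url><loc>https://example.com/archive/2019</loc></url>
</urlset>"""

capped = sitemap_urls(sample, max_urls=2)
everything = sitemap_urls(sample)
```

With `max_urls=2` only the first two entries are returned, so the archive page would be left for link-based discovery.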
7. Priority Scoring
Default: Enabled
Prioritizes important pages based on keywords like:
- privacy
- terms
- legal
- cookie
Why it matters:
Ensures compliance-related pages are scanned early.
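Keyword-based prioritization can be sketched as a scoring function applied to the queue of discovered pages. This example assumes the keywords are matched against the URL path and that each match adds one point; the actual scoring rules are not described in this article. `priority_score` and `prioritize` are hypothetical names.

```python
# Keywords listed in this article.
PRIORITY_KEYWORDS = ("privacy", "terms", "legal", "cookie")

def priority_score(path):
    """Count how many priority keywords appear in the URL path."""
    lowered = path.lower()
    return sum(1 for keyword in PRIORITY_KEYWORDS if keyword in lowered)

def prioritize(paths):
    """Order pages by score, highest first; ties keep their discovery order."""
    return sorted(paths, key=priority_score, reverse=True)

discovered = ["/products", "/blog", "/privacy-policy", "/legal/terms", "/contact"]
ordered = prioritize(discovered)
```

Here `/legal/terms` (two matches) and `/privacy-policy` (one match) move to the front of the queue, and the remaining pages keep their original order.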