Workspace
Settings
Defaults for new crawls, storage, extraction, and notifications. Per-job overrides win.
Crawler defaults
These apply to every new crawl unless overridden per-job.
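As a rough illustration of "defaults apply unless overridden per-job", the merge could look like the sketch below. The key names, the placeholder values, and the `resolve_job_settings` helper are hypothetical and not taken from the product; the only behavior drawn from this page is that a per-job value replaces the workspace default.

```python
# Hypothetical workspace defaults; keys and values are placeholders,
# not the product's real schema or real default values.
WORKSPACE_DEFAULTS = {
    "user_agent": "example-bot/1.0",
    "strategy": "browser",
    "concurrent_workers": 8,
    "request_timeout_s": 30,
    "respect_robots_txt": True,
    "auto_throttle_on_429": True,
}


def resolve_job_settings(defaults, overrides):
    """Return the effective settings for one job.

    Per-job overrides win over workspace defaults. Unknown keys are
    rejected here (an assumption) so that typos do not pass silently.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    # Later mappings win in a dict merge, so overrides replace defaults.
    return {**defaults, **overrides}


job = resolve_job_settings(WORKSPACE_DEFAULTS, {"strategy": "deep"})
```

The merge builds a new dict, so the workspace defaults themselves are never mutated by a job.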
User agent
Identify yourself to site owners. robots.txt rules that target this user agent are honored.
Default strategy
Browser runs JavaScript; deep uses the headless cluster.
Concurrent workers
Per-job. Cluster-wide limit is 200.
Request timeout
Seconds before a request is abandoned.
Respect robots.txt
Auto-throttle on 429
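One way an "auto-throttle on 429" toggle might behave is sketched below. This is an assumption about the general technique (exponential backoff that honors a numeric Retry-After value), not a description of how this crawler implements it. The function name, the 1-second base delay, and the 60-second cap are all hypothetical.

```python
def next_delay(status_code, current_delay, retry_after_s=None,
               base_delay=1.0, max_delay=60.0):
    """Return how many seconds to wait before the next request to a host.

    On HTTP 429 the delay follows the server's Retry-After value when one
    is given in seconds, otherwise it doubles (exponential backoff).
    On any other status the delay is halved until it drops below the base
    delay, at which point throttling stops.
    """
    if status_code == 429:
        if retry_after_s is not None:
            return min(float(retry_after_s), max_delay)
        return min(max(current_delay * 2, base_delay), max_delay)
    return current_delay / 2 if current_delay >= base_delay else 0.0
```

Retry-After may also be sent as an HTTP date; this sketch handles only the seconds form.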
Extraction pipeline
How raw HTML becomes structured data.
Default extractor
LLM fallback
Deduplicate by URL
Strip tracking params
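The last two toggles above can be illustrated together: strip tracking parameters first, then deduplicate on the cleaned URL. This is a minimal sketch using Python's standard `urllib.parse`. Which parameters the product treats as tracking is not stated on this page, so the `utm_` prefix and the `fbclid`/`gclid` names are assumptions, as are both function names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the product's real list is not documented here.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking_params(url):
    """Return the URL with assumed tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def dedupe_urls(urls):
    """Keep the first occurrence of each URL after stripping tracking params."""
    seen = set()
    unique = []
    for url in urls:
        cleaned = strip_tracking_params(url)
        if cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique
```

Stripping before deduplicating means two links that differ only in campaign tags count as one page. With the stripping toggle off, deduplication would compare the raw URLs instead.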
Storage
Where crawled data lives before it ships downstream.
Region
Storage class
Retention (days)
Compress archives
Notifications
Where to send failure and quota alerts.
Email
Slack webhook
Alert on failure
Alert on quota > 80%