AK
Workspace

Settings

Defaults for new crawls, storage, extraction, and notifications. Per-job overrides win.

Crawler defaults

These apply to every new crawl unless overridden per-job.

User agent
Identify yourself to site owners. Honored in robots.txt.
Default strategy
Browser runs JS; deep uses headless cluster.
Concurrent workers
Per-job. Cluster-wide limit is 200.
Request timeout
Seconds before a request is abandoned.
Respect robots.txt
Auto-throttle on 429

Extraction pipeline

How raw HTML becomes structured data.

Default extractor
LLM fallback
Deduplicate by URL
Strip tracking params

Storage

Where crawled data lives before it ships downstream.

Region
Storage class
Retention (days)
Compress archives

Notifications

Where to send failure and quota alerts.

Email
Slack webhook
Alert on failure
Alert on quota > 80%
3 unsaved changes