What it is, why it matters, who it’s for.
Multi-tenant SaaS that crawls full domains, scores every page across 40+ signals, runs Lighthouse audits and TTFB probes, and streams findings to dashboards in real time. Teams, invites, billing, and credits all live under one membership model.
SEO crawlers either run as nightly batch jobs or burn through API budgets. Teams that work daily on technical SEO need feedback that arrives while they’re still on the page.
In-house SEO leads and agencies running weekly audits across dozens of client domains.
Sub-second feedback per page, fair across tenants, on one box.
01. Single bare-metal node — no autoscaling budget
02. No cross-tenant data leaks under any failure mode
03. Polite to upstream hosts — robots.txt, adaptive backoff
04. Every analyzer must be hot-reloadable in dev
Crawlers degrade in three places at once: scheduling, analysis, and delivery. Solving one usually breaks another, and multi-tenant fairness amplifies every mistake.
Decisions, in order of stakes.
01. Work-stealing, not round-robin
    A central scheduler hands work to whichever worker is idle. Per-host token buckets keep us polite without idling cores when a tenant’s domain is rate-limited.
02. Analyzers as a registry
    Every check is an isolated module behind a typed contract. New analyzers register at startup, can be hot-reloaded in dev, and ship as a signed bundle in prod (a sketch of the contract follows this list).
03. Stream, don’t poll
    Per-tenant SSE channels push findings the moment an analyzer returns. The UI patches an immutable store — no refetch, no flicker.
04. RLS as the only tenant boundary
    Postgres row-level security scopes every query. The API sets `app.tenant_id` per request and the database enforces the rest.
05. Lighthouse + TTFB as first-class signals
    Lab Lighthouse runs and synthetic TTFB probes are domains of their own — separate workers, separate quotas, regression alerts when a tracked URL drifts (a sketch of the drift check follows this list). SEO posture and performance posture share the same dashboard.
06. One membership model for teams, invites, billing
    Teams, invites, permissions, subscriptions, and credit ledgers all hang off a single membership table. New surfaces (Lighthouse credits, exports) plug into the same accounting without a parallel auth path.
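The analyzer contract from decision 02 is what the rest of the pipeline leans on, so here is a minimal sketch of what it could look like. The `Analyzer`, `PageContext`, and `Finding` names and the registry helpers are illustrative stand-ins, not the production interface.

```typescript
// Hypothetical analyzer contract; names are illustrative, not the shipped interface.
export interface Finding {
  analyzer: string;                    // id of the check that produced it
  severity: "info" | "warn" | "error";
  message: string;
  url: string;
}

export interface PageContext {
  tenantId: string;
  url: string;
  html: string;
  headers: Record<string, string>;
}

export interface Analyzer {
  id: string;                                  // stable key, e.g. "missing-canonical"
  run(page: PageContext): Promise<Finding[]>;  // pure function of the fetched page
}

// Analyzers register themselves at startup; re-registering under the same id
// replaces the old module, which is what makes dev hot-reload cheap.
const registry = new Map<string, Analyzer>();

export function register(analyzer: Analyzer): void {
  registry.set(analyzer.id, analyzer);
}

export function analyzersFor(_tenantId: string): Analyzer[] {
  // A real version could filter on per-tenant settings; this sketch returns everything.
  return [...registry.values()];
}
```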
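Decision 05's regression alerts can be as simple as comparing a fresh probe against a rolling baseline. This is a hypothetical check, not the shipped one: the `TtfbProbe` shape, the five-sample minimum, and the 50% / 200 ms thresholds are invented for illustration.

```typescript
// Hypothetical drift check for a tracked URL's TTFB; thresholds are invented.
export interface TtfbProbe {
  url: string;
  ttfbMs: number;
  at: Date;
}

export function ttfbRegressed(history: TtfbProbe[], latest: TtfbProbe): boolean {
  if (history.length < 5) return false;                 // not enough baseline yet
  const baseline = median(history.map((p) => p.ttfbMs));
  // Alert only when the new probe is both 50% and 200 ms slower than the baseline,
  // so a single noisy sample on a fast page does not page anyone.
  return latest.ttfbMs > baseline * 1.5 && latest.ttfbMs > baseline + 200;
}

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}
```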
Each worker pulls a job from the shared queue, takes a token from the per-host bucket, fetches the page, and walks every analyzer registered for that tenant. If any analyzer returns a finding, it lands on the SSE channel within milliseconds.
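Condensed into code, that loop might look like the sketch below. The `queue`, per-host bucket, and `publish` helpers are assumptions standing in for the real scheduler, token buckets, and SSE fan-out; only the order of operations mirrors the description above.

```typescript
// Illustrative worker loop; queue, bucket, and publish shapes are assumptions.
import { analyzersFor } from "./registry"; // the registry sketch from the decisions above

interface CrawlJob {
  tenantId: string;
  url: string;
}

export async function workerLoop(
  queue: { next(): Promise<CrawlJob> },                   // shared work queue
  buckets: Map<string, { take(): Promise<void> }>,        // per-host token buckets
  publish: (tenantId: string, finding: unknown) => void,  // per-tenant SSE fan-out
): Promise<void> {
  for (;;) {
    const job = await queue.next();                       // an idle worker grabs the next job
    const host = new URL(job.url).host;
    await buckets.get(host)?.take();                      // politeness: wait for a host token

    const res = await fetch(job.url);
    const page = {
      tenantId: job.tenantId,
      url: job.url,
      html: await res.text(),
      headers: Object.fromEntries(res.headers.entries()),
    };

    for (const analyzer of analyzersFor(job.tenantId)) {
      const findings = await analyzer.run(page);
      for (const finding of findings) {
        publish(job.tenantId, finding);                   // stream immediately, no batching
      }
    }
  }
}
```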
Multi-tenant safety is enforced in one place. Every table has an RLS policy; the API sets the tenant ID inside the same transaction as the query. There is no second path.
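A plausible shape for that single path: a standard Postgres RLS policy keyed on an `app.tenant_id` setting, plus an API helper that sets it with `set_config(..., true)` so the value lives and dies with the transaction. The `pages` table and the `withTenant` helper are illustrative names, not the actual schema.

```typescript
// Illustrative only: the table and helper names are assumptions; the RLS shape is standard Postgres.
import { Pool, PoolClient } from "pg";

const pool = new Pool();

// Migration-time: every tenant-scoped table gets a policy of this shape.
export const PAGES_RLS_SQL = `
  ALTER TABLE pages ENABLE ROW LEVEL SECURITY;
  CREATE POLICY tenant_isolation ON pages
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
`;

// Request-time: set the tenant inside the same transaction as the queries it scopes.
export async function withTenant<T>(
  tenantId: string,
  fn: (client: PoolClient) => Promise<T>,
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // set_config(..., true) makes the setting transaction-local, so it cannot
    // leak to the next request that reuses this pooled connection.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```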
Lighthouse and TTFB runs aren’t free. A credit ledger settles in the same transaction as the run insert — no double-spend, no orphaned jobs, and the billing surface reads from the same view that gates the queue.
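One way to get that no-double-spend guarantee, sketched on top of the hypothetical `withTenant` helper above: take a per-team advisory lock, debit the ledger and insert the run in the same transaction, and let an insufficient balance abort both. The `credit_ledger` and `lighthouse_runs` tables are invented for the example.

```typescript
// Illustrative: credit_ledger and lighthouse_runs are invented names, not the real schema.
import { PoolClient } from "pg";

export async function chargeAndEnqueueLighthouseRun(
  client: PoolClient,        // a client already inside withTenant's transaction
  teamId: string,
  url: string,
  cost = 1,
): Promise<boolean> {
  // Serialize concurrent spends for this team; the lock releases at commit/rollback.
  await client.query("SELECT pg_advisory_xact_lock(hashtext($1))", [teamId]);

  // Debit only if the remaining balance covers the cost. If the WHERE clause
  // fails, nothing is inserted and rowCount is 0.
  const debit = await client.query(
    `INSERT INTO credit_ledger (team_id, delta, reason)
     SELECT $1, -$2::int, 'lighthouse'
     WHERE (SELECT COALESCE(SUM(delta), 0) FROM credit_ledger WHERE team_id = $1) >= $2::int
     RETURNING id`,
    [teamId, cost],
  );
  if (debit.rowCount === 0) return false;   // insufficient credits, no run enqueued

  // Same transaction: the run row exists only if the debit committed with it.
  await client.query(
    `INSERT INTO lighthouse_runs (team_id, url, status) VALUES ($1, $2, 'queued')`,
    [teamId, url],
  );
  return true;
}
```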
Real-time crawls. One node. Zero leaks.
10k pages crawled in under four minutes per node. First finding in under 200 ms. Tenant isolation enforced at the database — verified by red-team probes through launch.
- Fairness is a scheduling concern, not a quota concern. Token buckets per host beat per-tenant caps.
- RLS keeps the leak surface inside Postgres. Application-layer isolation always finds a way to fail.
- Streaming is a UX concern first. Sub-second feedback changes how teams use the tool.
- Wire OpenTelemetry traces from API → scheduler → analyzer earlier — debugging fairness issues in prod was expensive.
- Build a replay tool for analyzer development before adding the tenth analyzer, not the thirtieth.
