How we detect outages

Plain-language documentation of what WebsiteDown actually measures, where the data comes from, and what we deliberately don't claim.

The short version

Our outage check runs one HTTP probe from one serverless region per request — not from many. (The speed test has a separate, optional 5-region latency panel.)
Outage detection combines our probe with community reports submitted by real users.
The AI summary on each page is generated by Vantlir Intelligence; every claim links to a source.
Alerts fire within 1–5 minutes of detection — not instantly.
We do not claim multi-region uptime monitoring, and we do not publish fake testimonials or invented uptime percentages for sites we don't monitor.

1. The synthetic probe

When you check a site on the homepage or visit an /is-X-down page, our backend issues one HTTP request to that domain from a single serverless region (Vercel iad1, US East). We record the HTTP status code, latency in milliseconds, and any error message.

That's it. The probe is intentionally simple, and we deliberately don't claim more than that. A successful probe means our request from that one region reached the server within ~10 seconds. It does not mean the site is reachable from every part of the world.
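A minimal sketch of what a single probe like this could look like. The names probeOnce, ProbeResult, and FetchLike are hypothetical, and the use of HTTPS and the default redirect behaviour are assumptions; only the recorded fields (status code, latency in milliseconds, error message) and the ~10-second budget come from the description above.

```typescript
// Hypothetical result shape: status code, latency in ms, and any error message.
interface ProbeResult {
  domain: string;
  statusCode: number | null; // null when no HTTP response was received
  latencyMs: number;
  error: string | null;
}

// Narrow fetch-like signature so the probe can be exercised without a network.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal }
) => Promise<{ status: number }>;

const PROBE_TIMEOUT_MS = 10_000; // "~10 seconds" per the text; exact value assumed

async function probeOnce(
  domain: string,
  fetchImpl: FetchLike = fetch,
  timeoutMs: number = PROBE_TIMEOUT_MS
): Promise<ProbeResult> {
  const controller = new AbortController();
  // Abort the request if it has not completed within the timeout budget.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    // Assumption: the probe targets the site root over HTTPS.
    const response = await fetchImpl(`https://${domain}/`, {
      signal: controller.signal,
    });
    return {
      domain,
      statusCode: response.status,
      latencyMs: Date.now() - startedAt,
      error: null,
    };
  } catch (err) {
    // DNS failures, refused connections, TLS errors, and timeouts all land here.
    const message = err instanceof Error ? err.message : String(err);
    return {
      domain,
      statusCode: null,
      latencyMs: Date.now() - startedAt,
      error: message,
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Note that a 4xx or 5xx response still counts as "a response" in this sketch: the status code is recorded and interpreting it is left to a later step.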

For sites you monitor, we run that same kind of probe on a schedule (every 60 seconds on the paid Starter/Growth plans, every 5 minutes on Free) and store each result in our monitor_checks table. That's the only continuous probe we do; sites you don't monitor are checked on demand only.

2. Community reports

On every status page (/status/[domain]) and every /is-X-down page, you can press “Report issue” or “It’s working for me”. These submissions are stored in our outage_reports table along with country (when the browser provides it), report type (down / working), and an anonymous fingerprint for deduplication.

Reports are the regional signal. Our probe runs from one region; reports tell us when a site is broken for users elsewhere — geo-blocks, regional ISPs, partial CDN failures. A site with a healthy 200 OK from our probe but 50+ user reports in 15 minutes is almost always degraded for someone, and the page reflects that.

We aggregate counts in 15-minute, 1-hour, 24-hour, 7-day, and 30-day windows and surface them on the status page and in the snapshot pipeline below.
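As an illustration of the windowed aggregation, here is a small sketch that counts reports in each of the five windows. The OutageReport shape and the function name countDownReports are hypothetical, and counting only "down" reports is an assumption; the window lengths are the ones listed above.

```typescript
type ReportType = "down" | "working";

// Hypothetical in-memory shape of a row from outage_reports.
interface OutageReport {
  domain: string;
  type: ReportType;
  createdAt: number; // epoch milliseconds
}

// The five aggregation windows named in the text, in milliseconds.
const WINDOWS_MS = {
  "15m": 15 * 60_000,
  "1h": 60 * 60_000,
  "24h": 24 * 60 * 60_000,
  "7d": 7 * 24 * 60 * 60_000,
  "30d": 30 * 24 * 60 * 60_000,
} as const;

type WindowKey = keyof typeof WINDOWS_MS;

function countDownReports(
  reports: OutageReport[],
  now: number
): Record<WindowKey, number> {
  const counts: Record<WindowKey, number> = {
    "15m": 0,
    "1h": 0,
    "24h": 0,
    "7d": 0,
    "30d": 0,
  };
  for (const report of reports) {
    if (report.type !== "down") continue; // assumption: only "down" reports count
    const ageMs = now - report.createdAt;
    if (ageMs < 0) continue; // ignore reports stamped in the future
    // A report contributes to every window that is at least as long as its age.
    for (const key of Object.keys(WINDOWS_MS) as WindowKey[]) {
      if (ageMs <= WINDOWS_MS[key]) counts[key] += 1;
    }
  }
  return counts;
}
```

Because the windows are nested, each count is always at least as large as the count for the next shorter window.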

3. The snapshot pipeline

A scheduled job (the snapshot cron) runs every 15 minutes, aggregates reports per service, and compares the rolling count against a per-hour-of-day baseline. When reports cross 3× the baseline AND exceed an absolute floor, we record a detected incident in live_incidents with a severity level: minor, major, or critical.

Detected incidents power the “Last incident: 3 days ago, lasted 12 minutes” line on each status page, the active-incident banner, and the comparison-page metric rows. They are an objective spike-detection signal — not a human review — so a brief baseline-crossing burst can flag as “minor” even if the site recovered quickly.
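The spike rule can be sketched roughly as follows. Only the 3× multiplier, the requirement that both conditions hold, and the three severity names come from the description above; the floor value, the severity cut-offs, and the names detectIncident and SpikeRule are hypothetical placeholders.

```typescript
type Severity = "minor" | "major" | "critical";

interface SpikeRule {
  multiplier: number; // 3x baseline, per the text
  absoluteFloor: number; // hypothetical value
  majorRatio: number; // hypothetical severity cut-off
  criticalRatio: number; // hypothetical severity cut-off
}

const DEFAULT_RULE: SpikeRule = {
  multiplier: 3,
  absoluteFloor: 10,
  majorRatio: 6,
  criticalRatio: 12,
};

// hourlyBaseline holds 24 entries: the typical report count for each hour of day.
function detectIncident(
  rollingCount: number,
  hourlyBaseline: number[],
  hourOfDay: number,
  rule: SpikeRule = DEFAULT_RULE
): Severity | null {
  const baseline = hourlyBaseline[hourOfDay] ?? 0;
  const crossesBaseline = rollingCount > rule.multiplier * baseline;
  const exceedsFloor = rollingCount > rule.absoluteFloor;
  // Both conditions must hold; the floor stops quiet services with a
  // near-zero baseline from flagging on a handful of reports.
  if (!crossesBaseline || !exceedsFloor) return null;

  // Assumption: severity scales with how far the count sits above baseline.
  const ratio = rollingCount / Math.max(baseline, 1);
  if (ratio >= rule.criticalRatio) return "critical";
  if (ratio >= rule.majorRatio) return "major";
  return "minor";
}
```

With a baseline of 4 reports for the current hour, 13 reports in the window would flag as "minor" under these placeholder thresholds, while 11 would not flag at all because 11 does not exceed 3 × 4.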

4. AI summary (Vantlir Intelligence)

When you open a status page or trigger an outage check, we generate a 1–2 sentence summary of what's happening. The model classifies the issue type (network / DNS / SSL / login wall / regional / unknown), assigns a confidence level, and lists the signals that drove the conclusion.

Every AI output is paired with a sources panel: links to the official status page, Twitter/X discussions, Reddit threads, news coverage, and our own probe + reports. We don't label anything “verified”: the AI summarises the signals; it doesn't verify them. You can always click through and read the underlying sources for yourself.

We never list the underlying model name in user-facing surfaces; the public brand for the AI layer is Vantlir Intelligence.

5. The verdict classifier

Every status page emits one of seven verdicts based on a deterministic rule chain — not the AI:

Confirmed outage — probe failed AND community reports or a live incident corroborate.
Probable outage — probe failed but no corroboration yet.
Likely regional or partial — probe succeeded but reports indicate users elsewhere are affected.
Possible degradation — probe succeeded with elevated latency or a 4xx/5xx status code.
Operational — probe succeeded with 2xx/3xx response, low latency, and no anomalous report spike.
Pending first check — monitor was just added; the first probe hasn't completed yet.
Unknown — the probe couldn't determine status (rate-limited, blocked, ambiguous response).

The classifier rules live in src/lib/confidence.ts and are deliberately conservative: we never claim a confirmed outage without corroboration, and a 200-OK with mixed user signals reads as “regional or partial”, never “outage”.
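To make the rule chain concrete, here is a hedged sketch of a deterministic classifier that produces the seven verdicts. It is not the contents of src/lib/confidence.ts: the type names, the order in which rules are checked, and the latency threshold are all assumptions for illustration.

```typescript
type Verdict =
  | "confirmed_outage"
  | "probable_outage"
  | "regional_or_partial"
  | "possible_degradation"
  | "operational"
  | "pending_first_check"
  | "unknown";

// Hypothetical input shape combining the probe outcome with community signals.
interface Signals {
  probe:
    | { kind: "pending" } // first probe has not completed yet
    | { kind: "inconclusive" } // rate-limited, blocked, ambiguous response
    | { kind: "failed" } // no usable HTTP response
    | { kind: "responded"; statusCode: number; latencyMs: number };
  reportSpike: boolean; // anomalous spike in community reports
  liveIncident: boolean; // a detected incident is currently open
}

const SLOW_LATENCY_MS = 3_000; // hypothetical "elevated latency" threshold

function classifyVerdict(signals: Signals): Verdict {
  const probe = signals.probe;
  if (probe.kind === "pending") return "pending_first_check";
  if (probe.kind === "inconclusive") return "unknown";

  if (probe.kind === "failed") {
    // A failed probe alone is only "probable"; corroboration upgrades it.
    const corroborated = signals.reportSpike || signals.liveIncident;
    return corroborated ? "confirmed_outage" : "probable_outage";
  }

  // From here on the probe got a response.
  // Assumption: a report spike takes priority over latency or status checks.
  if (signals.reportSpike) return "regional_or_partial";

  const errorStatus = probe.statusCode >= 400;
  const slow = probe.latencyMs > SLOW_LATENCY_MS;
  if (errorStatus || slow) return "possible_degradation";

  return "operational";
}
```

Because every branch is a plain comparison on recorded signals, the same inputs always produce the same verdict, which is the property the text attributes to the rule chain.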

6. Alerts (paid monitoring)

For sites you monitor, we send alerts on status changes through the channels you've configured: Slack, Email, Telegram, Discord, Teams, and Webhook. Alerts fire within 1–5 minutes of detection, not instantly and not in real time. The exact lag is one cron tick (60 seconds on paid plans) plus the alert dispatcher's outbound HTTP latency.

We anti-flap aggressively: if a monitor recovered within the last 15 minutes, we record the incident but skip the alert. That keeps notification noise down on flickering sites without losing the timeline data.
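The anti-flap rule can be expressed as a small decision function. This is a sketch under assumptions: the names onDownTransition, MonitorState, and AlertDecision are hypothetical, and only the 15-minute window and the "record but don't alert" behaviour come from the paragraph above.

```typescript
const ANTI_FLAP_WINDOW_MS = 15 * 60_000; // 15 minutes, per the text

// Hypothetical per-monitor state; timestamps are epoch milliseconds.
interface MonitorState {
  lastRecoveredAt: number | null; // null if the monitor has never recovered
}

interface AlertDecision {
  recordIncident: boolean;
  sendAlert: boolean;
}

// Called when a monitor transitions from up to down.
function onDownTransition(state: MonitorState, now: number): AlertDecision {
  const recentlyRecovered =
    state.lastRecoveredAt !== null &&
    now - state.lastRecoveredAt < ANTI_FLAP_WINDOW_MS;
  // The incident is always recorded so the timeline stays complete;
  // only the notification is suppressed for flickering sites.
  return { recordIncident: true, sendAlert: !recentlyRecovered };
}
```

Keeping the recording decision separate from the notification decision is what lets the timeline stay complete while alerts are suppressed.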

7. What we deliberately don't claim

Multi-region uptime monitoring. Continuous monitoring and the outage check run from one Vercel region (US East). The speed test does have an optional, clearly-labeled 5-region panel — a one-off reachability + round-trip-latency probe from US East, US West, EU/London, Tokyo, and Hong Kong — but that's an on-demand measurement, not always-on coverage. We don't claim continuous multi-region uptime for the sites we watch.
Uptime % for sites we don't monitor. We don't have continuous probe data for arbitrary services. Comparison pages like /compare/aws-vs-google-cloud show outage reports and detected incidents only — not fabricated 99.94% uptime numbers.
“Real-time” / “instant” alerts. The honest cadence is 1–5 minutes from detection.
Fake testimonials, customer logos, or invented usage stats. Anything that looks like social proof is real or it's not on the page.
“Verified by AI”. The AI summarises sources; it doesn't verify anything. The sources panel is on every AI surface so you can read them yourself.

8. Where data lives

Every signal on the site maps to a real database table or external API:

Signal	Source
HTTP probe (status page, /is-X-down)	On-demand HTTP request from Vercel iad1
Continuous monitor checks (paid)	monitor_checks table — 60s or 5-min cadence
Community reports	outage_reports table — Report issue / Working buttons
Aggregated report counts	report_snapshots table — 15-min rollups
Detected incidents	live_incidents table — snapshot cron with 3× baseline rule
AI summary + sources	Vantlir Intelligence (Anthropic + Perplexity APIs)
Official status pages	Status-page polling cron — statuspage.io endpoints
Alert deliveries	alert_channels table + Resend (email) / Telegram bot / Slack, Discord & generic webhooks

Questions or corrections?

If something on a status page looks wrong, let us know. We'd rather be told we're wrong than appear right.