How we detect outages
Plain-language documentation of what WebsiteDown actually measures, where the data comes from, and what we deliberately don't claim.
- We run one HTTP probe from one serverless region per check. Not from many regions.
- Outage detection combines our probe with community reports submitted by real users.
- The AI summary on each page is generated by Vantlir Intelligence; every claim links to a source.
- Alerts fire within 1–5 minutes of detection — not instantly.
- We do not claim multi-region uptime, publish fake testimonials, or invent uptime percentages for sites we don't monitor.
1. The synthetic probe
When you check a site on the homepage or visit an /is-X-down page, our backend issues one HTTP request to that domain from a single serverless region (Vercel iad1, US East). We record the HTTP status code, latency in milliseconds, and any error message.
That's it. It is intentionally simple, and we deliberately don't claim more than that. A successful probe means our probe reached the server in that region within ~10 seconds — not that the site is reachable from every part of the world.
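The probe described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: `probeOnce`, `ProbeResult`, and the injectable `fetchImpl` parameter are hypothetical names, and the 10-second timeout simply mirrors the "~10 seconds" figure in the text.

```typescript
// The three things the text says are recorded for each probe.
interface ProbeResult {
  status: number | null; // HTTP status code, or null if no response arrived
  latencyMs: number;     // wall-clock duration of the attempt
  error: string | null;  // error message, if the request failed
}

// Minimal fetch-like signature (hypothetical) so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ status: number }>;

async function probeOnce(
  url: string,
  fetchImpl: FetchLike,
  timeoutMs = 10_000, // assumption: matches the "~10 seconds" window
): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    // One request, one region: no retries and no fan-out.
    const res = await fetchImpl(url, { signal: controller.signal });
    return { status: res.status, latencyMs: Date.now() - started, error: null };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: null, latencyMs: Date.now() - started, error: message };
  } finally {
    clearTimeout(timer);
  }
}
```

In a runtime with a global `fetch`, a call might look like `await probeOnce("https://example.com", fetch)`. Note that a 4xx or 5xx response still counts as a response here; only timeouts and network errors produce a `null` status.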
For sites you monitor, we run that same kind of probe on a schedule (every 60 seconds on the paid Starter and Growth plans, every 5 minutes on Free) and store the result in our monitor_checks table. That's the only continuous probe we do; sites you don't monitor are checked on demand only.
2. Community reports
On every status page (/status/[domain]) and every /is-X-down page, you can press “Report issue” or “It’s working for me”. These submissions are stored in our outage_reports table along with country (when the browser provides it), report type (down / working), and an anonymous fingerprint for deduplication.
Reports are the regional signal. Our probe runs from one region; reports tell us when a site is broken for users elsewhere — geo-blocks, regional ISPs, partial CDN failures. A site with a healthy 200 OK from our probe but 50+ user reports in 15 minutes is almost always degraded for someone, and the page reflects that.
We aggregate counts in 15-minute, 1-hour, 24-hour, 7-day, and 30-day windows and surface them on the status page and in the snapshot pipeline below.
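As a rough sketch of the windowed aggregation, the function below counts "down" reports in each of the five windows named above. The `Report` shape and the function name are assumptions made for illustration; the real outage_reports schema is not documented here.

```typescript
// Hypothetical in-memory shape of a community report.
interface Report {
  type: "down" | "working";
  createdAt: number; // epoch milliseconds
}

// The five windows listed in the text, in milliseconds.
const WINDOWS_MS = {
  "15m": 15 * 60_000,
  "1h": 60 * 60_000,
  "24h": 24 * 60 * 60_000,
  "7d": 7 * 24 * 60 * 60_000,
  "30d": 30 * 24 * 60 * 60_000,
} as const;

type WindowKey = keyof typeof WINDOWS_MS;

function countDownReports(reports: Report[], now: number): Record<WindowKey, number> {
  const counts: Record<WindowKey, number> = { "15m": 0, "1h": 0, "24h": 0, "7d": 0, "30d": 0 };
  for (const r of reports) {
    if (r.type !== "down") continue; // "working" reports are not counted here
    const age = now - r.createdAt;
    if (age < 0) continue; // ignore timestamps in the future
    // A report counts toward every window it is young enough to fall into.
    for (const key of Object.keys(WINDOWS_MS) as WindowKey[]) {
      if (age <= WINDOWS_MS[key]) counts[key] += 1;
    }
  }
  return counts;
}
```

Windows are cumulative: a report from five minutes ago contributes to all five counts, while one from two days ago contributes only to the 7-day and 30-day counts.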
3. The snapshot pipeline
A scheduled job (the snapshot cron) runs every 15 minutes, aggregates reports per service, and compares the rolling count against a per-hour-of-day baseline. When reports cross 3× the baseline AND exceed an absolute floor, we record a detected incident in live_incidents with a severity level: minor, major, or critical.
Detected incidents power the “Last incident: 3 days ago, lasted 12 minutes” line on each status page, the active-incident banner, and the comparison-page metric rows. They are an objective spike-detection signal — not a human review — so a brief baseline-crossing burst can flag as “minor” even if the site recovered quickly.
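The spike rule (more than 3× the baseline and above an absolute floor) can be illustrated with a small function. Only the 3× multiplier and the three severity names come from the text. The floor value, the severity cut-offs, and the minimum-baseline clamp are placeholder assumptions chosen to make the sketch concrete.

```typescript
type Severity = "minor" | "major" | "critical";

interface SpikeConfig {
  multiplier: number;    // 3, per the text
  absoluteFloor: number; // hypothetical value
  majorRatio: number;    // hypothetical: count/baseline ratio for "major"
  criticalRatio: number; // hypothetical: count/baseline ratio for "critical"
}

const DEFAULT_CONFIG: SpikeConfig = {
  multiplier: 3,
  absoluteFloor: 10,
  majorRatio: 6,
  criticalRatio: 12,
};

// Returns a severity when the rolling count qualifies as an incident, else null.
function detectIncident(
  count: number,
  baseline: number,
  cfg: SpikeConfig = DEFAULT_CONFIG,
): Severity | null {
  // Assumption: clamp the baseline so a zero baseline cannot make any count a spike.
  const effective = Math.max(baseline, 1);
  // Both conditions must hold: relative spike AND absolute floor.
  if (count <= effective * cfg.multiplier) return null;
  if (count <= cfg.absoluteFloor) return null;
  const ratio = count / effective;
  if (ratio >= cfg.criticalRatio) return "critical";
  if (ratio >= cfg.majorRatio) return "major";
  return "minor";
}
```

The absolute floor is what keeps low-traffic services quiet: five reports against a baseline of one is a 5× spike, but it stays below the floor and records nothing.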
4. AI summary (Vantlir Intelligence)
When you open a status page or trigger an outage check, we generate a 1–2 sentence summary of what's happening. The model classifies the issue type (network / DNS / SSL / login wall / regional / unknown), assigns a confidence level, and lists the signals that drove the conclusion.
Every AI output is paired with a sources panel: links to the official status page, Twitter/X discussions, Reddit threads, news coverage, and our own probe + reports. We don't narrate “verified” — the AI summarises the signals, it doesn't verify them. You can always click through and read the underlying sources for yourself.
We never list the underlying model name in user-facing surfaces; the public brand for the AI layer is Vantlir Intelligence.
5. The verdict classifier
Every status page emits one of seven verdicts based on a deterministic rule chain — not the AI:
- Confirmed outage — probe failed AND community reports or a live incident corroborate.
- Probable outage — probe failed but no corroboration yet.
- Likely regional or partial — probe succeeded but reports indicate users elsewhere are affected.
- Possible degradation — probe succeeded with elevated latency or a 4xx/5xx status code.
- Operational — probe succeeded with 2xx/3xx response, low latency, and no anomalous report spike.
- Pending first check — monitor was just added; the first probe hasn't completed yet.
- Unknown — the probe couldn't determine status (rate-limited, blocked, ambiguous response).
The classifier rules live in src/lib/confidence.ts and are deliberately conservative: we never report a confirmed outage without corroboration, and a 200-OK with mixed user signals reads as “regional or partial”, never “outage”.
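A deterministic rule chain of this kind might look like the sketch below. It is not the contents of src/lib/confidence.ts. The `Probe` and `Signals` types, the rule ordering, and the 3-second latency threshold are assumptions, and "probe failed" is taken to mean that no HTTP response arrived at all.

```typescript
type Verdict =
  | "Confirmed outage"
  | "Probable outage"
  | "Likely regional or partial"
  | "Possible degradation"
  | "Operational"
  | "Pending first check"
  | "Unknown";

// Hypothetical probe states feeding the classifier.
type Probe =
  | { kind: "pending" }       // first probe has not completed yet
  | { kind: "inconclusive" }  // rate-limited, blocked, or ambiguous response
  | { kind: "failed" }        // no HTTP response (timeout, DNS or TLS error)
  | { kind: "responded"; status: number; latencyMs: number };

interface Signals {
  probe: Probe;
  reportSpike: boolean;  // anomalous spike in community reports
  liveIncident: boolean; // an open row in live_incidents
}

const SLOW_MS = 3_000; // hypothetical "elevated latency" threshold

function classify(s: Signals): Verdict {
  const { probe } = s;
  if (probe.kind === "pending") return "Pending first check";
  if (probe.kind === "inconclusive") return "Unknown";
  if (probe.kind === "failed") {
    // A failed probe alone is only "probable"; corroboration upgrades it.
    return s.reportSpike || s.liveIncident ? "Confirmed outage" : "Probable outage";
  }
  // From here on the probe received an HTTP response.
  if (s.reportSpike) return "Likely regional or partial";
  if (probe.status >= 400 || probe.latencyMs > SLOW_MS) return "Possible degradation";
  return "Operational";
}
```

Because every branch is a plain comparison, the same inputs always yield the same verdict, which is what makes the result auditable without involving the AI layer.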
6. Alerts (paid monitoring)
For sites you monitor, we send alerts on status changes through the channels you've configured: email, Telegram, Discord. Alerts fire within 1–5 minutes of detection — not instantly, not in real-time. The exact lag is one cron tick (60 seconds on paid plans) plus the alert dispatcher's outbound HTTP latency.
We anti-flap aggressively: if a monitor recovered within the last 15 minutes, we record the incident but skip the alert. That keeps notification noise down on flickering sites without losing the timeline data.
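The anti-flap rule reduces to a single time comparison. In this sketch the function name and return shape are hypothetical; the 15-minute quiet window is the figure given above.

```typescript
const QUIET_WINDOW_MS = 15 * 60_000; // 15 minutes, per the text

interface DownTransitionDecision {
  recordIncident: boolean; // timeline data is always kept
  sendAlert: boolean;      // suppressed if the monitor recovered recently
}

// Decide what to do when a monitor flips to "down".
// lastRecoveredAt is the epoch-ms time of the most recent recovery, or null if none.
function onDownTransition(lastRecoveredAt: number | null, now: number): DownTransitionDecision {
  const recentlyRecovered =
    lastRecoveredAt !== null && now - lastRecoveredAt <= QUIET_WINDOW_MS;
  return { recordIncident: true, sendAlert: !recentlyRecovered };
}
```

A site that recovered five minutes ago and fails again produces an incident record but no notification. The same failure twenty minutes after recovery alerts as usual.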
7. What we deliberately don't claim
- Multi-region probes. We run from one Vercel region. If you need true regional coverage you need a different tool — and we'll be honest about that on this page rather than overclaim.
- Uptime % for sites we don't monitor. We don't have continuous probe data for arbitrary services. Comparison pages like /compare/aws-vs-google-cloud show outage reports and detected incidents only, not fabricated 99.94% uptime numbers.
- “Real-time” / “instant” alerts. The honest cadence is 1–5 minutes from detection.
- Fake testimonials, customer logos, or invented usage stats. Anything that looks like social proof is real or it's not on the page.
- “Verified by AI”. The AI summarises sources; it doesn't verify anything. The sources panel is on every AI surface so you can read them yourself.
8. Where data lives
Every signal on the site maps to a real database table or external API:
| Signal | Source |
|---|---|
| Synthetic probe (status page, /is-X-down) | On-demand HTTP request from Vercel iad1 |
| Continuous monitor checks (paid) | monitor_checks table — 60s or 5-min cadence |
| Community reports | outage_reports table — Report issue / Working buttons |
| Aggregated report counts | report_snapshots table — 15-min rollups |
| Detected incidents | live_incidents table — snapshot cron with 3× baseline rule |
| AI summary + sources | Vantlir Intelligence (Anthropic + Perplexity APIs) |
| Official status pages | Status-page polling cron — statuspage.io endpoints |
| Alert deliveries | alert_channels table + Resend / Telegram / Discord webhooks |
Questions or corrections?
If something on a status page looks wrong, let us know. We'd rather be told we're wrong than appear right.