Three steps.
Then we don't bother you.
Setup takes about twenty minutes. After that, the goal is silence — broken only by the occasional "here's what broke and how to fix it" note in your inbox.
Point us at the URLs that matter.
Drop your app URL and the URLs of the flows you care about — login, checkout, webhook receivers, scheduled jobs. For Bubble, Webflow, Softr, and similar, we ask for read-only editor access. For custom code we add a single webhook for deploys. No SDK, no code change.
- Read-only access only — nothing we do can break your live app.
- Setup wizard for the major no-code platforms; manual config for everything else.
- Twenty minutes from "I signed up" to "we're watching."
Around the clock. Every flow that matters.
We probe each flow on a schedule — every minute for the critical ones, every five for the rest. We watch for response codes, latency drift, schema changes, and silent failures (the cron job that didn't fire). When something looks wrong, we re-check three times before we declare it a real incident.
- Confirmed retries from us-east-1, so a transient network blip has to fail repeatedly before it pages.
- Latency baselines per-flow, per-hour — we know what "normal" looks like at 3 a.m.
- Webhook contracts pinned to your last-known-good schema.
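The re-check rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not Nightlamp's implementation: `confirm_incident`, the `probe` callable, and `delay_seconds` are hypothetical names, and the sketch assumes that a single passing re-check is enough to treat the failure as a transient blip.

```python
import time
from typing import Callable


def confirm_incident(
    probe: Callable[[], bool],
    rechecks: int = 3,
    delay_seconds: float = 0.0,
) -> bool:
    """Return True only if the flow fails the first probe and every re-check.

    `probe` is a hypothetical callable that returns True when the flow is healthy.
    """
    if probe():
        return False  # first probe passed, nothing to confirm
    for _ in range(rechecks):
        time.sleep(delay_seconds)
        if probe():
            return False  # recovered on a re-check: treat as a transient blip
    return True  # failed the initial probe plus all re-checks


# Usage: a flow that fails once and then recovers does not page.
results = iter([False, True])
print(confirm_incident(lambda: next(results)))
```

Passing the probe in as a callable keeps the confirmation logic separate from how a flow is actually exercised, so the same rule could apply to a login probe or a webhook probe.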
A real engineer pages you with the fix recipe.
When something breaks, an on-call engineer (a real one, not a tier-one chatbot) takes the page, reproduces the break, and replies with the root cause plus a recommended fix in plain English. You (or your developer, agency, or no-code builder) ship the fix in your platform. We re-probe the flow afterward to confirm it's healthy again.
- First response within 5 minutes, 24/7 on Priority and White-glove — contractual.
- Root cause + fix recipe within 60 minutes on Priority — contractual, backed by the auto-credit guarantee.
- We determine a root cause and recommend a likely fix — no more than that.
- Every incident gets a written post-mortem the same day.
This is what we actually send you.
Not a stack trace, not a red dashboard — a written report your whole team can act on. Here's a sample, built on the failure mode we diagnose most often.
Checkout webhook failing — Stripe events not reaching Bubble
Root cause, in plain English
Your live Stripe webhook still points at the development URL, the one with /version-test/ in it. Test-mode events reach the development version of your app, so everything looked fine while you were testing. Live events hit a workflow that doesn't exist on the deployed app, so Stripe got a 502 and queued retries. Nothing in your Bubble editor is broken; the pointer is.
Fix recipe
- Open the live webhook endpoint in Stripe
Stripe Dashboard → Developers → Webhooks. You have one endpoint receiving both test and live events — that's the first thing to untangle.
- Replace the URL with your deployed workflow URL
The live endpoint still points at the development URL — the one containing /version-test/. Drop that segment so live events reach the deployed version of the workflow.
- Keep test and live endpoints separate
One endpoint per mode, each storing its own signing secret. Mixed keys are the next failure waiting behind this one.
- Replay the failed events
In Stripe's delivery log, resend the failed checkout.session.completed events. The four affected checkouts complete without customers having to retry.
After you ship it, we re-probe the flow and confirm it's healthy — you get a one-line "all clear."
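The URL check behind this recipe can be done mechanically. The sketch below assumes only what the report states, that the development URL carries a /version-test/ path segment and the deployed URL does not. The function names and the example endpoint are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DEV_SEGMENT = "version-test"  # path segment that marks the development version


def points_at_development(url: str) -> bool:
    """True if the URL path contains the development-version segment."""
    return DEV_SEGMENT in urlsplit(url).path.split("/")


def to_deployed_url(url: str) -> str:
    """Drop the development segment so the URL targets the deployed version."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s != DEV_SEGMENT]
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical endpoint, for illustration only.
dev_url = "https://example.com/version-test/api/1.1/wf/stripe-webhook"
print(points_at_development(dev_url))
print(to_deployed_url(dev_url))
```

Matching on whole path segments rather than on a substring avoids flagging a URL that merely contains the text "version-test" inside a longer name.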
What we checked
- Live endpoint URL vs deployed workflow URL: mismatch (root cause)
- Event-type subscription (checkout.session.completed): subscribed
- Endpoint signing secret matches the workflow: match
- Workflow exposed without user authentication: publicly callable
- Adjacent flows (login, password reset, cron): healthy
Who actually answers the page.
Nightlamp is founder-operated. Incident response is led by the founding engineer, and every responder on the rotation is named on this page — we don't outsource diagnosis to anonymous contractors or a tier-one support farm.
Priority and White-glove pages go to a 24/7 on-call rotation. Baseline incidents are handled during business hours with a written post-mortem on every incident.
On-call engineers work from the read-only access you grant: scoped editor logins, webhook payloads from monitored flows, response codes, and platform logs for those flows. They cannot push code, deploy releases, or touch your customer database.
How we access your app: read-only, audited →
Built Nightlamp and leads the on-call rotation personally. The person who picks up your incident page works directly with the product, the probes, and the post-mortems, not a tier-one queue.
Honest avatars, real names. As the rotation grows, every responder is added here — never stock photos, never invented engineers.
Six things that quietly carry the business.
Login & signup flows
Synthetic users running every minute. We notice when password reset stops sending email before your customer does.
Payment webhooks
Stripe, Paddle, Lemon Squeezy. We probe the contract — payload shape, status code, retry behaviour.
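A payload-shape check of this kind might compare each incoming event against a pinned last-known-good shape. The sketch below is an assumption about how such a check could look, not Nightlamp's probe. `PINNED_SHAPE`, `contract_violations`, and the field names are hypothetical, and only top-level fields and their types are compared.

```python
from typing import Any

# Hypothetical last-known-good shape: field name -> expected Python type.
PINNED_SHAPE: dict[str, type] = {
    "id": str,
    "type": str,
    "created": int,
    "data": dict,
}


def contract_violations(payload: dict[str, Any], shape: dict[str, type]) -> list[str]:
    """List the ways a payload departs from the pinned shape (empty if none)."""
    problems = []
    for field, expected in shape.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# A payload whose `created` field changed from an integer to a string.
drifted = {
    "id": "evt_123",
    "type": "checkout.session.completed",
    "created": "1700000000",
    "data": {},
}
print(contract_violations(drifted, PINNED_SHAPE))
```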
Third-party API calls
Klaviyo, Twilio, Postmark, OpenAI. We log when their latency drifts and when their schema changes break yours.
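Latency drift is easiest to judge against a per-flow, per-hour baseline. The sketch below assumes a median baseline and a fixed multiplier. The class name, the `drift_factor` threshold, and the `min_samples` cutoff are hypothetical choices for illustration, not documented values.

```python
from collections import defaultdict
from statistics import median


class LatencyBaseline:
    """Keeps latency samples per (flow, hour of day) and flags drift."""

    def __init__(self, drift_factor: float = 2.0, min_samples: int = 5):
        self.drift_factor = drift_factor  # assumed threshold, not a documented value
        self.min_samples = min_samples
        self._samples: dict[tuple[str, int], list[float]] = defaultdict(list)

    def record(self, flow: str, hour: int, latency_ms: float) -> None:
        self._samples[(flow, hour)].append(latency_ms)

    def is_drifting(self, flow: str, hour: int, latency_ms: float) -> bool:
        samples = self._samples.get((flow, hour), [])
        if len(samples) < self.min_samples:
            return False  # not enough history to know what "normal" is
        return latency_ms > self.drift_factor * median(samples)


baseline = LatencyBaseline()
for ms in [110, 120, 115, 130, 125]:
    baseline.record("checkout", 3, ms)  # samples taken at 3 a.m.

print(baseline.is_drifting("checkout", 3, 400))  # far above the 3 a.m. median
print(baseline.is_drifting("checkout", 3, 140))  # close to the 3 a.m. median
```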
Scheduled jobs
Daily digests, weekly billing runs, hourly sync jobs. We track expected cadence and notice silence.
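Noticing silence comes down to comparing the last observed run against the expected cadence. A minimal sketch, assuming each job exposes a last-run timestamp and that a short grace period is tolerated. `is_silent` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_silent(
    last_run: datetime,
    expected_every: timedelta,
    now: datetime,
    grace: timedelta = timedelta(minutes=5),
) -> bool:
    """True if the job has missed its expected run by more than the grace period."""
    return now - last_run > expected_every + grace


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)

print(is_silent(now - timedelta(minutes=50), hourly, now))  # last run 50 minutes ago
print(is_silent(now - timedelta(hours=2), hourly, now))     # last run 2 hours ago
```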
Email delivery
Magic-link, OTP, signup confirmation, and round-trip messages. AgentDraft lets us verify that the email actually reached an inbox.
Custom user journeys
Describe the path that matters in plain English. We turn it into a probe and watch it like the rest.
The response commitments, plan by plan.
One canonical table backs every response-time claim on this site — the same numbers you'll find on pricing.
| Plan | First response | Written diagnosis | Coverage | Commitment type |
|---|---|---|---|---|
| Watchtower | No human-response SLA — alerting only | Best-effort plain-English summary attached to alerts | Automated alerts in near real-time when a check fails | descriptive |
| Baseline | First response within 1 business day | Written post-mortem on every incident | Business hours, weekdays | descriptive |
| Priority | First response within 5 minutes, 24/7 | Root cause + fix recipe within 60 minutes (typically ~30 minutes; typical, not a commitment) | 24/7 on-call | contractual |
| White-glove | First response within 5 minutes, 24/7 | Same-day deep root-cause investigation · dedicated engineer | 24/7 on-call · dedicated engineer | contractual |
| Studio | Priority SLAs across all client apps | Priority SLAs across all client apps | 24/7 on-call | contractual |
Contractual numbers are backed by the SLA auto-credit. Miss a contractual SLA and we credit the full month — automatically on your next invoice. No tickets, no negotiation. Numbers labeled typical are observed medians, not commitments. Descriptive rows carry no minute-level promise.
How Nightlamp works, in five questions.
How long does setup take?
About twenty minutes. Drop your app URL, point us at the flows that matter (login, checkout, webhooks, cron), and grant read-only access. No SDK and no code change required for the no-code platforms we support.
What happens when a flow breaks?
A real on-call engineer takes the page, reproduces the failure, and replies with a plain-English root cause plus a recommended fix. Priority and White-glove carry a contractual first response within 5 minutes, 24/7; Priority also carries a contractual root cause + fix recipe within 60 minutes — see the table above.
Does Nightlamp deploy the fix?
No. We diagnose and recommend; you (or your developer or no-code builder) ship the change in your platform. After deploy, we re-probe the flow and confirm it's healthy.
What if I'm not technical?
The setup wizard speaks plain English: paste your URL, click the platforms you use, describe the flows that matter. The dashboard is a list of "things working" and "things we're handling."
How is this different from Pingdom or UptimeRobot?
URL pings tell you a page is up. We probe whether the actual flow works (login, checkout, webhook contract, scheduled job) and send a diagnosed root cause plus a recommended fix when it doesn't.