Heartbeat monitors

A dead-man's switch for cron jobs, schedulers, queue workers, and self-converging control planes — alert on the run that didn't happen.

A heartbeat monitor is passive: instead of Nightlamp probing your service, your job POSTs (or GETs) a unique ping URL on each successful run. If no ping arrives within the expected interval plus a grace window, Nightlamp opens an incident. It catches a job that silently stops, that throws and swallows the error, or (when the ping is gated on the work actually completing) that exits 0 without doing its work. A normal uptime check sees only a healthy endpoint and will never notice these failures.

What is a heartbeat monitor?

Every other check type is active: Nightlamp makes a request and evaluates the response. A heartbeat is the inverse: the monitored job tells Nightlamp it succeeded, and the absence of that signal is the alert. The only required setting is expected_interval_seconds, how often the job should check in; Nightlamp watches for a gap longer than that interval plus the grace window.

expected_interval_seconds — how often the job is expected to ping (30–604800s).
grace_seconds — extra slack before a missed ping alerts. Optional; defaults to half the interval, so an incident opens at roughly 1.5× the interval.
last_ping_at — set by each ping; read-only.
heartbeat_ping_url — the unique URL to ping; generated when you create the monitor.

How do I create one and where is the ping URL?

Create it like any monitor — pick Heartbeat — dead-man's switch as the type and set the expected interval. The ping URL is generated on creation and shown on the monitor page (and on the edit form). The unguessable token in the path is the credential, so treat the URL as a secret; rotate it by recreating the monitor.

From automation, POST to the monitors API:

Create

curl -X POST "https://api.nightlamp.app/apps/$APP_ID/monitors" \
  -H "Authorization: Bearer $NIGHTLAMP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"converge","check_type":"heartbeat","expected_interval_seconds":600}'

The response includes heartbeat_ping_url. The monitor's check_type is fixed after creation, but expected_interval_seconds and grace_seconds remain editable.

How do I ping from a cron job?

POST or GET the URL after the work succeeds. GET is accepted so plain curl / wget one-liners work. Ping only on success — putting the ping at the end of the script, after the real work, means a crash or non-zero exit withholds it.

cron

*/10 * * * * /usr/local/bin/converge.sh && curl -fsS -X POST https://api.nightlamp.app/heartbeat/$MONITOR_ID/$TOKEN

For a self-converging control plane, gate the ping behind a post-run health check so a rolled-back or no-op convergence withholds it — see the control-plane self-convergence guide.
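
The gating described above can be sketched as a small wrapper. This is a minimal illustration, not Nightlamp tooling: run_then_ping and send_ping are hypothetical names, the health-check command is whatever your own control plane provides, and PING_URL is assumed to hold the monitor's heartbeat_ping_url.

```shell
# Hypothetical wrapper: ping only when the job AND a follow-up health
# check both succeed, so a failed or no-op run withholds the heartbeat.

# send_ping is split out so it can be replaced with a stub in tests.
# Assumes PING_URL holds the monitor's heartbeat_ping_url.
send_ping() {
  curl -fsS -X POST --max-time 10 --retry 3 "$PING_URL" >/dev/null
}

# Usage: run_then_ping JOB_COMMAND HEALTH_CHECK_COMMAND
# Each argument is a command string, run through sh -c so it may
# include its own arguments.
run_then_ping() {
  job=$1
  check=$2
  if ! sh -c "$job"; then
    echo "job failed; withholding ping" >&2
    return 1
  fi
  if ! sh -c "$check"; then
    echo "health check failed; withholding ping" >&2
    return 1
  fi
  send_ping
}
```

A cron entry would then invoke a script that defines these functions and calls run_then_ping with /usr/local/bin/converge.sh and your health-check command. Keeping the ping as the last step means any earlier failure leaves the heartbeat to go stale on its own.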

When does Nightlamp open an incident?

On each evaluation, Nightlamp compares the current time against the last ping. When now − last_ping_at > expected_interval_seconds + grace_seconds, the monitor goes down and an incident opens through the normal alert pipeline. Before the first ping ever arrives, the clock runs from the monitor's creation time — so a job that never starts pinging is caught too, not just one that stops. When a ping arrives after a miss, the next evaluation resolves the incident automatically.
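
The rule above can be written out as a few lines of shell arithmetic. This is only a sketch of the documented comparison, not Nightlamp's evaluator: heartbeat_is_down is a hypothetical name, all times are Unix seconds, and integer halving for the default grace is an assumption about rounding.

```shell
# Illustrative only: mirrors the rule in the text, not Nightlamp's code.
# Usage: heartbeat_is_down NOW LAST_PING_AT CREATED_AT INTERVAL [GRACE]
# Pass an empty LAST_PING_AT if no ping has arrived yet.
# Exit status 0 means "down", 1 means "up".
heartbeat_is_down() (
  now=$1
  last_ping_at=$2
  created_at=$3
  interval=$4
  grace=${5:-$((interval / 2))}   # default grace: half the interval
  # Before the first ping, the clock runs from the creation time.
  reference=${last_ping_at:-$created_at}
  [ $((now - reference)) -gt $((interval + grace)) ]
)

# 600s interval, default 300s grace: the limit is 900s of silence.
heartbeat_is_down 1900 1000 0 600 && echo down || echo up   # prints up
heartbeat_is_down 1901 1000 0 600 && echo down || echo up   # prints down
heartbeat_is_down 901 "" 0 600 && echo down || echo up      # prints down
```

The third call shows the never-pinged case: with no last ping, the silence is measured from creation, so a job that never starts is flagged on the same schedule as one that stops.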

The ingest endpoint and its payload

Pings go to https://api.nightlamp.app/heartbeat/{monitor_id}/{token} over POST or GET. There is no session or API-key auth — the unguessable token in the path is the credential. Every accepted ping returns {"ok": true, "received_at": <unix>}; any miss (wrong id, wrong token, deleted monitor) returns a uniform 404 so the endpoint never reveals which ids or tokens exist.

A POST may carry a JSON body of numeric fields. Nightlamp stores it as last_ping_metrics on the monitor and evaluates it against any payload_thresholds you've set (below). GET pings carry no body and leave the previously-stored metrics untouched, so a plain cron one-liner still keeps the dead-man's switch alive.

ping with metrics

# POST a JSON body of numeric fields; Nightlamp stores it as last_ping_metrics
curl -fsS -X POST "https://api.nightlamp.app/heartbeat/$MONITOR_ID/$TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"backlog": 12, "oldest_age_seconds": 47}'
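
In a real job the numbers come from the work itself and not from a literal. One possible way to build the body is sketched below; metrics_body and ping_with_metrics are hypothetical helper names, and the two fields simply reuse the ones from the example above.

```shell
# Hypothetical helper: build the JSON body from values the job measured.
# %d keeps each value an integer, since the payload must be numeric.
# Usage: metrics_body BACKLOG OLDEST_AGE_SECONDS
metrics_body() {
  printf '{"backlog": %d, "oldest_age_seconds": %d}' "$1" "$2"
}

# Hypothetical helper: send the body, read by curl from stdin.
# Assumes MONITOR_ID and TOKEN are set as in the examples above.
ping_with_metrics() {
  metrics_body "$1" "$2" | curl -fsS -X POST \
    "https://api.nightlamp.app/heartbeat/$MONITOR_ID/$TOKEN" \
    -H "Content-Type: application/json" --data-binary @-
}

metrics_body 12 47   # prints {"backlog": 12, "oldest_age_seconds": 47}
```

Building the body in one place keeps quoting mistakes out of the curl line, and the job can call ping_with_metrics as its final step.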

Alerting on a value, not just a missed ping

A heartbeat can also carry numeric payload thresholds. On each evaluation, once Nightlamp confirms a ping arrived on time, it reads the most recent last_ping_metrics and checks every rule. If any rule is breached the monitor goes down with a detail like backlog=83.0 > 50.0. This lets a job self-report not just that it's alive but that its work is actually keeping up — e.g. a forwarder whose queue is draining and whose oldest item is fresh.

field — a key in the posted JSON. Dotted paths like queue.backlog resolve into nested objects.
op — one of gt, gte, lt, lte, eq, ne.
value — the numeric threshold to compare against.

A field absent from the latest ping is skipped, never treated as a breach, so partial payloads don't false-alarm. A missed ping always takes precedence over a threshold breach — a stale heartbeat is the louder signal. Up to 20 rules per monitor.
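
To make the operator semantics concrete, here is a sketch of checking one rule against one value. It is illustrative only and not Nightlamp's implementation: rule_breached is a hypothetical name, the value is assumed to have been extracted from the ping body already (dotted-path lookup is not shown), and an empty value stands in for an absent field.

```shell
# Illustrative only: check one threshold rule against one value.
# Usage: rule_breached VALUE OP THRESHOLD   (exit status 0 = breached)
# An empty VALUE models a field absent from the ping: never a breach.
rule_breached() {
  [ -n "$1" ] || return 1
  awk -v v="$1" -v op="$2" -v t="$3" 'BEGIN {
    v += 0; t += 0                  # force numeric comparison
    if      (op == "gt")  r = (v >  t)
    else if (op == "gte") r = (v >= t)
    else if (op == "lt")  r = (v <  t)
    else if (op == "lte") r = (v <= t)
    else if (op == "eq")  r = (v == t)
    else if (op == "ne")  r = (v != t)
    else                  r = 0     # unknown operator: no breach
    exit (r ? 0 : 1)
  }'
}

rule_breached 83 gt 50 && echo "backlog breached"       # prints backlog breached
rule_breached "" gt 50 || echo "absent field skipped"   # prints absent field skipped
```

With the rule {"field":"backlog","op":"gt","value":50}, a ping reporting backlog 83 breaches and one reporting 12 does not. A ping that omits backlog is skipped.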

create with thresholds

curl -X POST "https://api.nightlamp.app/apps/$APP_ID/monitors" \
  -H "Authorization: Bearer $NIGHTLAMP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "forwarder",
    "check_type": "heartbeat",
    "expected_interval_seconds": 600,
    "payload_thresholds": [
      {"field": "backlog", "op": "gt", "value": 50},
      {"field": "oldest_age_seconds", "op": "gt", "value": 300}
    ]
  }'

How do I pause alerts during maintenance?

Pause the monitor for the window — the Pause control on the monitor page, or PATCH /monitors/{id} with a future paused_until timestamp. A paused monitor is skipped by the evaluator, so a deliberately stopped job won't page on-call, and it resumes automatically when the window passes.