Scheduled job stopped running and nobody noticed
Digests stop sending, syncs drift stale, cleanup never runs — and you find out days later from a customer. The scheduler dashboard shows nothing obviously wrong.
- Bubble
- Supabase
- Replit
- Any stack
- Scheduled jobs
Root cause, in plain English
Schedulers fail silently by default: a job that doesn't run produces no error event anywhere you look. Bubble's recurring and recursive workflows stop permanently when a run errors before rescheduling itself, or when the plan that allows them lapses. pg_cron jobs fail inside the database, where app logs never see them. Platforms that sleep idle apps simply aren't awake at trigger time. Every path ends the same way: silence that looks like health.
How to fix it
Establish the last successful run from the job's own output (last email sent, last row written), not from the scheduler's configuration screen — "scheduled" and "running" are different facts.
Check the scheduler's execution history: Bubble's Scheduler/Logs tabs, cron.job_run_details for pg_cron, your host's deployment logs. Look for the first failed or missing run, not the most recent.
Fix the specific break: re-arm a stopped recursive workflow by scheduling it once manually, repair the failing step it died on, or move jobs off hosts that sleep.
Make the job prove it ran: add a one-line HTTP ping to a heartbeat URL as the job's final step, so completing the work and reporting it are the same action.
Alert on absence: a heartbeat monitor that expects the ping every N hours turns "silently didn't run" into a page within one missed cycle.
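The "make the job prove it ran" step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `HEARTBEAT_URL`, `ping_heartbeat`, and `run_with_heartbeat` are hypothetical names, and the URL is a placeholder for whatever endpoint your monitor gives you. The ordering is the point: the ping sits after the work, so a job that crashes never reports success.

```python
import urllib.request
from typing import Callable

# Hypothetical heartbeat endpoint; substitute the URL your monitor gives you.
HEARTBEAT_URL = "https://example.com/heartbeat/nightly-digest"


def ping_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> bool:
    """Send one GET to the heartbeat URL; return True on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and socket timeouts are OSError subclasses. A failed ping
        # should not crash a job that already finished its work.
        return False


def run_with_heartbeat(job: Callable[[], None],
                       ping: Callable[[], bool] = ping_heartbeat) -> bool:
    """Run the job, then ping. If the job raises, the ping is never sent."""
    job()          # any exception propagates here and skips the ping
    return ping()  # reached only after the work completed
```

Calling `run_with_heartbeat(send_digests)` (where `send_digests` stands in for your own job function) keeps "did the work" and "reported the work" on the same code path, so the monitor hears nothing when the work fails.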
Go deeper: read the full guide · copy the open-source health-check recipe.
How Nightlamp detects this automatically
- Heartbeat
- API canary
A heartbeat monitor inverts the failure mode: your job pings Nightlamp on every successful run, and the alert fires when the ping doesn't arrive — which is the only reliable way to detect something that didn't happen. An api_canary can additionally verify the job's output (fresh rows, recent timestamps) through your API.
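As a rough illustration of the output check described above, the sketch below decides whether a job's newest row is recent enough. It does not use Nightlamp's actual api_canary configuration, which this page does not show. The response shape (a `latest_row_at` field holding an ISO 8601 timestamp) and the function name `output_is_fresh` are assumptions made for the example.

```python
import json
from datetime import datetime, timedelta, timezone


def output_is_fresh(payload_json: str, max_age: timedelta, now: datetime) -> bool:
    """Return True if the newest row reported by the API is within max_age.

    Assumes a hypothetical response body such as
    {"latest_row_at": "2024-05-01T02:00:00+00:00"}; adapt it to your own API.
    `now` must be timezone-aware.
    """
    payload = json.loads(payload_json)
    latest = datetime.fromisoformat(payload["latest_row_at"])
    if latest.tzinfo is None:
        # Treat naive timestamps as UTC so the subtraction below is valid.
        latest = latest.replace(tzinfo=timezone.utc)
    return now - latest <= max_age
```

A check like this catches the case a heartbeat alone can miss: the job ran and pinged, but wrote nothing new.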
Catch this before your customers do
Nightlamp runs these checks continuously against your live app and sends a plain-English diagnosis — not a wall of logs — the moment this pattern shows up.
Frequently asked questions
- Why didn't anything alert me when the job stopped?
- Monitoring is built around events: errors, status codes, log lines. A job that never starts emits no event, so there is nothing to alert on. You have to monitor for the absence of an expected signal — that's what heartbeat checks are for.
- Why do Bubble recursive workflows stop permanently?
- A recursive workflow keeps running only because each run schedules the next one. If any run errors before that scheduling step — or the app's capacity or plan blocks it — the chain has no next link, and it stays stopped until a human schedules it again.
- Where do pg_cron failures show up?
- Inside Postgres: query the cron.job_run_details table for status and return messages. Failures there never reach your application logs, which is why a broken database job can hide for weeks.
- What interval should a heartbeat monitor expect?
- The job's schedule plus a grace window for normal variance — for a nightly job, expecting a ping every 24–26 hours is typical. Tighter than that pages you for jitter; looser than that delays detection by a full cycle.
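The interval-plus-grace rule from the last answer can be written down directly. The sketch below assumes you have a list of past ping timestamps to work from. `is_overdue` and `first_missed_cycle` are hypothetical helper names, not part of any monitoring product. The second function finds where the first gap opened, which is usually more useful than the most recent run when tracing when a job actually stopped.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def is_overdue(last_ping: datetime, now: datetime,
               interval: timedelta, grace: timedelta) -> bool:
    """True once more than interval + grace has passed since the last ping."""
    return now - last_ping > interval + grace


def first_missed_cycle(pings: Sequence[datetime], interval: timedelta,
                       grace: timedelta) -> Optional[datetime]:
    """Return the last ping before the first oversized gap, or None if none."""
    ordered = sorted(pings)
    # Compare each ping with the one that followed it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > interval + grace:
            return earlier
    return None
```

For a nightly job, `interval=timedelta(hours=24)` and `grace=timedelta(hours=2)` give the 24–26 hour window mentioned above: a ping 25.5 hours after the previous one is tolerated, and one 26.5 hours after is not.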