Ops · Medium · 14 min · claude-opus-4

On-call alert → AI runbook

PagerDuty → Claude reads last 5 deploys + logs → draft runbook in the alert thread.

Latest output · what your users see
PagerDuty · warning

p95 latency on /api/checkout crossed 1200ms for 6 min. Correlated deploy: web@d41f2a. Suggested rollback command attached.

deduped ×8 · runbook: runbooks/checkout-latency.md#rollback
The problem

Page goes off at 03:00. The on-call spends 15 minutes hunting for the latest deploy and the right log query before they even start debugging.

The outcome

Before the on-call opens their laptop, the incident channel already has: probable cause, suspect deploy, copy-paste log query, suggested rollback command.

Ingredients & skills

Secrets
  • ANTHROPIC_API_KEY
  • PAGERDUTY_WEBHOOK_SECRET
  • GITHUB_TOKEN
  • DATADOG_API_KEY
  • SLACK_BOT_TOKEN
Providers
  • Anthropic
  • PagerDuty
  • GitHub
  • Datadog
  • Slack
MCP servers
  • github-mcp
  • datadog-mcp
  • slack-mcp
#ops #incident #pagerduty

How it works

A PagerDuty webhook hands the alert to Claude. Claude pulls the last 5 deploys, tails the relevant logs, and posts a runbook draft into the incident channel within 30 seconds.
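
The flow above (gather context, draft, post, all inside a 30-second budget) can be sketched as one small orchestration function. This is a minimal sketch, not the recipe's own implementation: `runLoop`, `LoopDeps`, `withDeadline`, and `DeadlineError` are hypothetical names, and the three steps are injected so the sketch stays independent of any particular client library.

```typescript
// Hypothetical dependency shape: none of these names come from the recipe itself.
interface LoopDeps<I, C> {
  gather: (incident: I) => Promise<C>; // deploys + logs + dashboards
  draft: (incident: I, context: C) => Promise<string>; // the Claude call
  post: (incident: I, runbook: string) => Promise<void>; // the Slack post
}

class DeadlineError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DeadlineError";
  }
}

// Reject if `work` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying work.
function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineError(`exceeded ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Run gather -> draft -> post in order, bounded by the budget (30 s by default).
async function runLoop<I, C>(
  incident: I,
  deps: LoopDeps<I, C>,
  budgetMs = 30_000,
): Promise<string> {
  const run = async (): Promise<string> => {
    const context = await deps.gather(incident);
    const runbook = await deps.draft(incident, context);
    await deps.post(incident, runbook);
    return runbook;
  };
  return withDeadline(run(), budgetMs);
}
```

Injecting the steps keeps the budget logic testable with fakes. If the deadline fires, the caller can fall back to posting a plain "runbook unavailable" note instead of leaving the channel empty.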

Step 1

1 — Webhook receiver

Verify the PagerDuty signature first. Always. Treat unsigned posts as hostile.

src/routes/api/public/pagerduty.ts
export const Route = createFileRoute("/api/public/pagerduty")({
  server: {
    handlers: {
      POST: async ({ request }) => {
        // Read the raw body as text: the signature covers the exact bytes sent.
        const sig = request.headers.get("x-pagerduty-signature") ?? "";
        const body = await request.text();
        // Reject anything that fails verification before parsing it.
        if (!verifyPD(sig, body, process.env.PAGERDUTY_WEBHOOK_SECRET!)) {
          return new Response("bad sig", { status: 401 });
        }
        const incident = JSON.parse(body);
        await draftRunbook(incident);
        return new Response("ok");
      },
    },
  },
});
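
The handler calls `verifyPD` without defining it. A minimal sketch follows, assuming PagerDuty's v3 webhook scheme: the `X-PagerDuty-Signature` header carries one or more comma-separated `v1=<hex>` values, each an HMAC-SHA256 of the raw request body keyed with the webhook secret. Check this against the current PagerDuty documentation before relying on it. The function body is illustrative, not the recipe's own code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true if any v1 signature in the header matches the HMAC of the raw body.
// Assumption: header format is "v1=<64 hex chars>[,v1=<64 hex chars>...]".
export function verifyPD(header: string, rawBody: string, secret: string): boolean {
  if (!header || !secret) return false;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();

  return header.split(",").some((part) => {
    const [version, hex] = part.trim().split("=");
    // Only accept well-formed v1 entries: 32 bytes encoded as 64 hex characters.
    if (version !== "v1" || !hex || !/^[0-9a-f]{64}$/i.test(hex)) return false;
    const candidate = Buffer.from(hex, "hex");
    // timingSafeEqual throws on length mismatch, so check lengths first.
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

The comparison uses `timingSafeEqual` rather than `===` so the check does not leak how many leading bytes matched. Several signatures can appear in one header (for example during a secret rotation), which is why any single match is enough.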
Step 2

2 — Gather context

Run three reads in parallel: recent deploys, matching logs, and the service's dashboards. Pass all three to Claude in one message.

typescript
const [deploys, logs, dashboards] = await Promise.all([
  github.listDeploys({ since: '30m' }),
  datadog.logs.search({ query: incident.tags.join(' AND '), from: '30m ago' }),
  datadog.dashboards.list({ tag: incident.service }),
]);
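
One way to fold the three results into a single message is a plain prompt builder. This is a sketch under assumptions: the `Incident`, `Deploy`, `LogLine`, and `Dashboard` shapes, the `buildPrompt` name, and the 50-line log cap are all hypothetical, since the recipe does not show what its clients return. It keeps only the 5 newest deploys, matching the "last 5 deploys" described earlier.

```typescript
// Hypothetical shapes: adjust to whatever your GitHub and Datadog clients return.
interface Incident { id: string; title: string; service: string }
interface Deploy { sha: string; service: string; deployedAt: string } // ISO 8601, UTC
interface LogLine { timestamp: string; message: string }
interface Dashboard { title: string; url: string }

const MAX_DEPLOYS = 5;
const MAX_LOG_LINES = 50; // assumed cap to keep the prompt small

function buildPrompt(
  incident: Incident,
  deploys: Deploy[],
  logs: LogLine[],
  dashboards: Dashboard[],
): string {
  // ISO 8601 timestamps in the same timezone sort chronologically as strings.
  const recent = [...deploys]
    .sort((a, b) => b.deployedAt.localeCompare(a.deployedAt))
    .slice(0, MAX_DEPLOYS);
  const tail = logs.slice(-MAX_LOG_LINES); // keep the last lines of the log

  return [
    `Incident ${incident.id}: ${incident.title} (service: ${incident.service})`,
    "",
    "Recent deploys (newest first):",
    ...recent.map((d) => `- ${d.deployedAt} ${d.service}@${d.sha}`),
    "",
    "Log tail:",
    ...tail.map((l) => `${l.timestamp} ${l.message}`),
    "",
    "Dashboards:",
    ...dashboards.map((d) => `- ${d.title}: ${d.url}`),
    "",
    "Draft a runbook with: probable cause, suspect deploy, a copy-paste log query, and a suggested rollback command.",
  ].join("\n");
}
```

The returned string would go in as the content of a single user message. Capping deploys and log lines keeps the input token count predictable when a noisy service produces thousands of lines.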
Step 3

3 — Draft and post

Post the runbook as a single Slack message in the incident's channel.

typescript
await slack.chat.postMessage({
  channel: `incident-${incident.id}`,
  // When blocks are present, Slack uses text as the fallback for notifications.
  text: runbook,
  blocks: toBlocks(runbook),
});
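
`toBlocks` is not defined in the recipe. A minimal sketch follows, assuming the runbook is Slack-flavoured markdown with paragraphs separated by blank lines. It respects two Block Kit limits: 3000 characters of text per section block and 50 blocks per message. The `SectionBlock` type and the splitting strategy are illustrative choices, not the recipe's own.

```typescript
// Minimal shape of a Block Kit section block carrying mrkdwn text.
interface SectionBlock {
  type: "section";
  text: { type: "mrkdwn"; text: string };
}

const MAX_SECTION_CHARS = 3000; // Slack's limit for section text
const MAX_BLOCKS = 50; // Slack's limit for blocks in one message

function toBlocks(runbook: string): SectionBlock[] {
  const blocks: SectionBlock[] = [];

  // One section per paragraph (paragraphs are separated by blank lines).
  for (const paragraph of runbook.split(/\n{2,}/)) {
    const trimmed = paragraph.trim();
    if (!trimmed) continue;

    // Hard-split oversized paragraphs. This can cut through formatting,
    // which is acceptable for a sketch but worth refining for real use.
    for (let i = 0; i < trimmed.length; i += MAX_SECTION_CHARS) {
      blocks.push({
        type: "section",
        text: { type: "mrkdwn", text: trimmed.slice(i, i + MAX_SECTION_CHARS) },
      });
    }
  }

  // Anything past the block limit is dropped rather than failing the post.
  return blocks.slice(0, MAX_BLOCKS);
}
```

Truncating instead of throwing means an unusually long draft still reaches the channel; the full text remains available through the message's `text` field.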
One-line deploy

The button above runs the same command with your saved config. This is the raw CLI form.

bash
locker deploy oncall-runbook --trigger pagerduty
