I've been running Claude Code sessions in the background for a while now. Kick off a pipeline, walk away, come back to a branch with working code. That part works great. The part I kept getting wrong was everything around it.

How do you know when it finishes, or if it's stuck? What if it errors out at 2 AM and nobody notices until morning? What if it succeeds at 2 AM and the tmux session just sits there, burning nothing but still cluttering your setup?

I spent the last couple of weeks building what I'm calling the coding-agent skill, a lifecycle manager for background Claude Code sessions. It's evolved a lot.

What you actually type

I wrote before about how the spec is the product — the idea that if you get the spec right, the implementation follows. That thinking drove how I structured the pipeline. Speckit generates the work. Three commands.

New Claude Code instance on main branch:

/speckit.specify I want XYZ. Watch out for FOO. ETC

Creates a branch with a generated spec. Review it, look at the clarification questions it raises.

Then clarify, as many rounds as you want:

/clear
/speckit.clarify

Can leave it blank or tell it what to focus on first. /speckit.clarify Change the pagination to 50 instead of 100. Each round clears context and starts fresh so nothing drifts.

When the spec says what you mean, kick off the pipeline:

/clear
/speckit.pipeline

Runs five stages automatically. Homer clarifies the spec (up to 10 iterations). Plan generation. Task breakdown. Lisa analyzes the plan (up to 10 iterations). Ralph implements each task one at a time with a buffer for retries. Every stage runs as a sub-agent with clean context.

Whole thing can take hours. That's the "kick off and walk away" part.

And that's where the coding-agent skill comes in. Pipeline is the work. Everything below is what sits around it. How I know when it's done and what to do when it's stuck.

The ACP disaster

OpenClaw has an ACP (Agent Communication Protocol) runtime that lets you spawn isolated agent sessions. Thread-bound, persistent, full I/O history you can drop into from any channel. On paper it was perfect for background Claude Code work.

In practice it was a disaster. acpx claude prompt --session would fail with "Query closed before response received". One-shot exec worked fine. Persistent sessions — the thing I actually needed — did not. Dug into it. Bug somewhere in @zed-industries/claude-agent-acp or the acpx queue system. Not something I could fix from my side.

Spent more time than I should have trying to make it work before accepting the L. Wrote down exactly what failed and moved on.

The handoff

The problem with ACP and OpenClaw subagents is the same. They run inside the agent's process. No terminal you can attach to, no way to type into a running session.

Jama (my AI assistant) runs as a separate process with its own Anthropic API token configured in OpenClaw. That's how it has access to Claude. When I tell Jama to have Claude Code work on something, it doesn't try to do the work inline. The coding-agent skill launches Claude Code as an independent process, then watches it from the outside. Runner script launches it. Watcher monitors and cleans up. All orchestration, no reasoning work.

That split is why the rest of this works. Claude Code does the thinking. Jama does the babysitting. I drop in when I want to.

From exec to tmux

Next attempt: OpenClaw's exec tool with pty:true to launch Claude Code in the background. Worked, but I couldn't drop into a running session to see what was happening. It was a black box. I'd check the process log and get a dump of output, but I couldn't type into the session if Claude Code asked a question or got stuck on something.

Switched to named tmux sessions. Changed everything. Now I can tmux attach -t claude-myapp-20260303-170000 from any SSH terminal and see exactly what Claude Code is doing in real time. Type directly if needed. Ctrl+B D to detach without killing anything. Jama can simultaneously monitor via tmux capture-pane and send input via tmux send-keys. Two observers, one session.

The naming convention matters more than you'd think. claude-<project-slug>-<timestamp> means I can glance at tmux list-sessions and know exactly what's running, where, and when it started.
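The launch itself can be sketched in a few lines of shell. This is a sketch under assumptions, not the actual runner script: "myapp", the task text, and the paths are placeholders.

```shell
# Sketch: launch a named, attachable, logged background session.
# Project name, task, and paths are placeholders -- adjust to taste.
project=myapp
stamp=$(date +%Y%m%d-%H%M%S)
session="claude-${project}-${stamp}"
log="/tmp/coding-agent-${project}-${stamp}.log"

# Detached tmux session; script -qf mirrors all terminal output to $log.
command -v tmux >/dev/null && tmux new-session -d -s "$session" \
  "script -qf $log -c \"cd ~/Projects/$project && claude --dangerously-skip-permissions 'your task here'\""

echo "$session"
```

From there it's tmux attach -t whatever echo printed, and Ctrl+B D to leave it running.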

Persistent logs

tmux panes are ephemeral. Scroll buffer fills up, session crashes, whatever. Pane output is gone. The first time I lost a session's output to a tmux crash, I added script -qf inside every session. It captures all terminal output to a log file on disk.

script -qf /tmp/coding-agent-myapp-20260303.log -c "cd ~/Projects/myapp && claude --dangerously-skip-permissions 'your task here'"

Log file is the source of truth. Not the tmux pane, not process output. The file on disk.

Two-layer completion detection

How do you know when Claude Code finishes? Two mechanisms, one instant and one backup.

The instant path: chain an openclaw system event after the claude command. When claude exits, the event fires immediately and I get a notification.

claude --dangerously-skip-permissions 'task' ; openclaw system event --text 'session finished' --mode now

Backup path: a watcher cron that polls every 5 minutes. It checks whether the tmux session still exists and whether things look healthy or stuck. If the session vanished, it reads the log, summarizes what happened, and notifies me on Telegram.

Two layers because neither is perfect alone. The chained event misses crashes (process dies before the ; chain runs). The watcher cron has a delay.
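The watcher's existence check is the simple half. Roughly this shape, as a sketch — the summary handling and notification path here are placeholders for whatever you wire in:

```shell
# Watcher sketch: is the session still there? If not, grab log context.
# What you do with the summary (Telegram, etc.) is up to you.
check_session() {
  local session="$1" log="$2"
  if ! tmux has-session -t "$session" 2>/dev/null; then
    # Session gone: keep the log tail around as a summary source.
    tail -n 100 "$log" 2>/dev/null > /tmp/last-session-summary.txt
    echo "GONE"
  else
    echo "RUNNING"
  fi
}
```

Usage: check_session claude-myapp-20260303-170000 /tmp/coding-agent-myapp-20260303.log. The hard half — deciding whether a RUNNING session is actually doing anything — is where it got interesting.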

The idle problem

Claude Code finishes its task and just... sits there. The tmux session is still running. The watcher cron sees RUNNING and reports HEARTBEAT_OK. Session could sit idle for hours.

First fix seemed obvious. Check the log file's modification time. If nothing's been written in 2 minutes, the session is idle. Read the last output. If it's an error, ping me. If it's clean, auto-kill and clean up.

Seemed efficient.

I was wrong

Ran a test. Launched Claude Code with a simple task: read /etc/hostname and tell me what it says. Watched the log file with a polling loop.

t=1 size=24002 mtime_age=2s
t=2 size=24002 mtime_age=4s
t=3 size=24002 mtime_age=6s
...
t=30 size=24002 mtime_age=61s
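The loop behind those numbers was roughly this (a sketch; stat -c %Y is GNU coreutils — macOS wants stat -f %m):

```shell
# Poll a log file's size and mtime age every 2 seconds.
poll_log() {
  local log="$1" rounds="${2:-30}"
  for i in $(seq 1 "$rounds"); do
    # tr strips the leading padding some wc implementations emit.
    local size=$(wc -c < "$log" | tr -d ' ')
    local age=$(( $(date +%s) - $(stat -c %Y "$log") ))
    echo "t=$i size=$size mtime_age=${age}s"
    sleep 2
  done
}

# poll_log /tmp/coding-agent-myapp-20260303.log
```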

24,002 bytes. Didn't change for 61 seconds. Completely static.

Meanwhile, tmux capture-pane showed Claude Code's full UI: the welcome screen, the thinking spinner animation, the response, the idle prompt. All of that visual activity produced exactly zero new bytes in the script log.

Why? Claude Code's spinner uses ANSI cursor repositioning. It overwrites the same terminal position repeatedly. The bytes are \r and cursor movement escape sequences that rewrite in place rather than appending new content. script -qf captures those raw bytes but the file modification time only updates when new data actually gets written. Terminal repaints that overwrite existing positions don't trigger an mtime update consistently enough to rely on.

A stale log doesn't mean idle. It could mean Claude Code is actively thinking. Or stuck on a long npm install that isn't producing output.

If I'd shipped the mtime-only check, the watcher would've been killing active sessions that were just thinking.

Two signals, not one

Fix was straightforward once I understood the problem. Idle detection requires two signals, both present:

  1. Log file hasn't been modified in 2+ minutes
  2. The input prompt is visible in tmux capture-pane output

The prompt is Claude Code's "I'm done, waiting for input" indicator. If you can see it in the last few lines of the pane, Claude Code genuinely finished and is sitting there.

Log stale + prompt visible = idle. Auto-kill, clean up, notify. Log stale + no prompt = still working. Back off. Log fresh = actively producing output. Obviously working.
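That decision table sketches into a few lines of shell. Two assumptions to flag: the idle prompt is matched here as a line starting with ">", which you should verify against your own capture-pane output, and stat -c %Y is GNU coreutils (macOS: stat -f %m).

```shell
# Two-signal idle check: stale log AND visible input prompt.
# The "^>" prompt pattern is an assumption -- check your own pane output.
is_idle() {
  local session="$1" log="$2"
  # Signal 1: log untouched for 2+ minutes.
  local age=$(( $(date +%s) - $(stat -c %Y "$log") ))
  [ "$age" -ge 120 ] || return 1
  # Signal 2: prompt visible in the last few lines of the pane.
  tmux capture-pane -pt "$session" | tail -n 5 | grep -q '^>' || return 1
  return 0
}
```

If the log is fresh, the function bails before it even touches tmux — the cheap, common path.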

If the session is idle and the last output contains error indicators, don't auto-kill. Ping me on Telegram with the context so I can decide whether to investigate. If it's clean, tear it all down. Kill the tmux session, delete the log, remove the watcher cron, update the session tracker.

Model economics

I started with Opus for everything. The whole orchestration layer, all running on the most expensive model. It worked, but it was like hiring a senior engineer to watch a progress bar.

As the patterns stabilized I could see what actually needed reasoning versus what was just mechanical. Dropped the orchestration to Sonnet. Same results, fraction of the cost. The watcher cron runs every 5 minutes. That's 288 checks per day per session if something really goes sideways. These checks are dead simple: run three commands, compare two numbers, maybe read 100 lines of a log. No creativity required, just pattern matching.

Once I trusted the process, I pushed it down again to Haiku. Haiku handles session management and monitoring now. It's fast, cheap, and perfectly fine for "is this tmux session still alive? Is the prompt visible?" logic.

Opus still does the actual thinking. Claude Code runs on Opus for feature development and complex reasoning. Jama uses Opus when writing specs or reviewing code. But the babysitting layer? That's Haiku. The cost difference between Opus and Haiku polling every 5 minutes adds up fast.

The full lifecycle

Every background session now works the same way. Named tmux session with script -qf for persistent logging. System event chained after the command for instant completion detection. Watcher cron on haiku every 5 minutes as backup. Everything tracked in a session file that any channel can read.

The watcher checks if the session exists, whether it's idle (two-signal check), whether it's stuck or erroring, and cleans up when it's done. On completion it reads the log, summarizes what happened, notifies me on Telegram, and tears down all the scaffolding. Doesn't matter whether the chained event or the watcher caught it.

I can drop into any session from any terminal. I can ask Jama "how's session claude-myapp going?" from Telegram and get a progress summary. If Claude Code asks a question nobody's around to answer, the watcher catches it and pings me.

What it looks like

I'm on Telegram. I tell Jama "have Claude Code add pagination to the API endpoints in the web app."

Jama responds in a few seconds:

Session launched: claude-webapp-20260303-142000
Drop in: tmux attach -t claude-webapp-20260303-142000
Detach with Ctrl+B D to let it keep working.

Claude Code is already running in that tmux session. Persistent log at /tmp/claude-runner-webapp-20260303-142000.log captures all output. Watcher cron is ticking every 5 minutes.

I go make coffee. Come back, curious.

tmux attach -t claude-webapp-20260303-142000

Claude Code is halfway through wiring up cursor-based pagination on the listings endpoint. I watch for a minute, looks good. Ctrl+B D. Back to whatever I was doing.

Twenty minutes later, Telegram pings:

Session claude-webapp-20260303-142000 complete.
6 files changed. All tests passing. Branch: feat/api-pagination

Watcher caught the idle state. Log stale, prompt visible. Summarized what happened, tore everything down.

If something had gone wrong — test failures, Claude Code asking a question nobody answered — the watcher pings me but leaves the session alive. I attach, see what happened, type a response or fix it manually. Session stays up until I deal with it or kill it.

One message from me. Everything else is automatic.

Same pattern

Same lesson as the Simpsons Loops + Spec-Kit + Claude Code work. Don't trust a single signal. Test the actual behavior before you assume infrastructure does what you think.

I assumed script -qf would continuously update during Claude Code's thinking animation. Seemed like a reasonable assumption. Terminal is visually active, bytes are flowing. Ran a test, watched the numbers. Wrong.

One polling loop saved me from shipping a watcher that kills thinking sessions.

Still figuring it out

This whole setup is two weeks old. It works today, but I've already torn it apart and rebuilt it three times. The ACP attempt. The exec approach. Now tmux. Each version taught me something the previous one couldn't.

I don't think this is the final version either. The idle detection heuristics are good enough but not bulletproof. The watcher cron is simple but blunt. There are edge cases I haven't hit yet that will force another round of changes.

I'm sharing this as a snapshot of where I am, not a blueprint. If you're building something similar, steal what's useful and expect to throw some of it away. I know I will.

The OpenClaw skills, Simpsons Loops implementation, and Speckit pipeline code are all in the spec-kit-simpsons-loops repo. Poke around, fork it, break it.

If you have questions or want to swap notes on your own setup, open an issue on the repo — that's the best place for it. Bug reports, feature ideas, "I tried this and it exploded" stories all welcome. I'm figuring this out in the open and the feedback loop makes it better.

Teaching my AI to babysit itself