In my last post I introduced the Simpsons Loops (Ralph, Lisa, and Homer) as a way to automate the iterative parts of working with Spec-kit and Claude Code. Each loop ran a single concern (implement, analyze, clarify) with fresh context per iteration, committed after every cycle, and could be interrupted and resumed.
Since then the repo has evolved a lot, and the lessons have been more interesting than the code changes. The thing I keep coming back to: if the spec is good and the final review is thorough, AI can get it done in one shot (well, 95% of the way; my CLAUDE.md and constitution.md aren't perfect yet). The bottleneck wasn't the AI. It was what I fed it and how I reviewed what came out.
pipeline.sh
The biggest addition since the original post is pipeline.sh: one command that runs the whole workflow end to end, minus the initial /speckit.specify.
homer -> plan -> tasks -> lisa -> ralph
Before this I was manually kicking off each loop, checking the output, then starting the next one. It worked but I was basically babysitting a process that didn't need babysitting. The pipeline changed that. Point it at a spec directory and walk away.
QUALITY_GATES="npm run lint && npm run typecheck && npm test" \
.specify/scripts/bash/pipeline.sh specs/a1b2-feat-user-auth
The part I'm most pleased with is the auto-detection. The pipeline walks backward through your artifacts to figure out where to pick up: tasks.md exists with some tasks already completed? Start at Ralph. tasks.md exists but nothing completed yet? Start at Lisa. plan.md exists but no tasks? Generate tasks then continue. Only spec.md? Start from the top with Homer.
So if something interrupts mid-run (Ctrl+C, laptop sleeps, whatever), just re-run the same command and it picks up where it left off. There's also a --from flag if you want to force a specific starting stage.
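The detection walk can be sketched in a few lines of bash. The artifact names (spec.md, plan.md, tasks.md) come from the post, but the `[x]` completion marker and the function shape are my assumptions, not the repo's actual code:

```shell
#!/usr/bin/env bash
# Sketch of the stage auto-detection: check the latest artifact first
# and fall back toward the spec. Assumes completed tasks are checked
# off as "- [x]" in tasks.md (an assumption, not the repo's contract).
detect_stage() {
  local dir="$1"
  if [ -f "$dir/tasks.md" ]; then
    if grep -q '^- \[x\]' "$dir/tasks.md"; then
      echo "ralph"   # tasks exist and some are done: resume implementation
    else
      echo "lisa"    # tasks exist, none done: start consistency checks
    fi
  elif [ -f "$dir/plan.md" ]; then
    echo "tasks"     # plan exists but no tasks: generate tasks first
  else
    echo "homer"     # only spec.md: start clarification from the top
  fi
}
```

Checking newest-to-oldest is what makes resume free: whatever artifact exists tells you the furthest stage that already ran.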
Going from "run each loop manually" to "kick it off and come back" was a bigger deal than I expected. Changed how I think about the whole thing. The loops aren't individual tools anymore. They're stages in a pipeline and the pipeline is what you actually interact with.
One finding per iteration
Commit 4d88106 was one of those changes where the diff is tiny but the effect is huge. The commit message:
only resolve one clarify/analyze issue at a time, solving by severity levels runs the risk of consuming all the context
Originally Homer and Lisa would try to fix all findings at the highest severity level in a single pass. Spec has three critical issues? Fix all three in one go. Seemed efficient.
It wasn't. When you ask Claude to hold multiple fixes in its head at once, things drift. The first issue gets fixed cleanly. The second subtly conflicts with the first, and by the third the context is crowded enough that quality drops. Compound errors are hard to catch because each individual fix looks reasonable on its own.
The fix was almost comically simple. One finding per iteration. Just one. Commit, exit, start fresh.
It feels wasteful if you're thinking in API calls, but it's the right trade-off. On a Claude Code subscription (Max plan) it doesn't matter anyway; you're not paying per token. I wouldn't run these directly against the Anthropic API though: the loops spin up a bunch of sub-agents with large context windows and will burn through credits fast.
Each iteration gets the full context window for a single problem. And if something goes wrong you can pinpoint exactly which iteration introduced it because each one is its own commit.
It comes down to not asking AI to hold too much in its head at once. Keep the scope small and give it full attention. Same reason the loops use claude -p with fresh context per iteration instead of maintaining a running conversation where context piles up and things start to drift.
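The pattern boils down to a very small loop. This is a sketch, not the repo's code: the prompt wording and commit message are illustrative, and the real loops generate their prompts from templates.

```shell
#!/usr/bin/env bash
# Sketch of the fresh-context, one-finding-per-iteration loop.
one_finding_loop() {
  local prompt="$1" max_iters="${2:-20}"
  for ((i = 1; i <= max_iters; i++)); do
    # claude -p runs a single non-interactive turn with a fresh context,
    # so iteration i never sees the clutter from iteration i-1.
    claude -p "$prompt Resolve exactly ONE finding, then exit." || break
    git add -A
    git commit -m "loop: resolve one finding (iteration $i)"  # one commit per fix
  done
}
```

The commit-per-iteration is what makes the bisection story work: one finding, one commit, one place to look when something goes wrong.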
Toward one-shotting features
This is the thing I've been chasing. The original post described the first time Ralph one-shotted a working feature across the full stack. That was exciting but partly lucky: the spec happened to be clean enough and the tasks were well-scoped.
What's changed is that it's getting really close.
I run /speckit.specify with what I want. Review the spec. Do one initial /speckit.clarify just to see if any brainstorming needs to happen. After that I hand it off to Homer.
Writing this out, I'm starting to feel like Mr. Burns. Maybe that's the pipeline's alias...
Anyway, Homer runs clarification passes (one ambiguity per iteration) until the spec is airtight. Plan and tasks get generated from the polished spec. Lisa picks it up from there, checking cross-artifact consistency one finding at a time. Then Ralph implements tasks one by one with fresh context, running quality gates after each one.
By the time Ralph starts implementing the spec has been through multiple rounds of automated clarification and consistency checking. Tasks are well-scoped because the spec they came from is clean.
A full-stack feature (new endpoint, service layer, permissions, validations, tests, UI) gets implemented from specs without me touching anything mid-loop. I get involved before, when I write the spec, and after, when I review the output. Not during.
What changed mechanically
Beyond the pipeline, a handful of things made the loops more reliable.
All three loops now accept a spec directory path instead of a specific prompt file. Point any loop at specs/a1b2-feat-user-auth and it generates the right prompt from the template automatically. That removed a layer of manual wiring that was easy to get wrong.
Added stuck detection. Three consecutive iterations with identical output and the loop aborts. Catches cases where Claude is confidently repeating itself without making progress. Usually means the prompt or artifact needs human attention.
Ctrl+C now gives you a summary. How many iterations completed, how long it ran, exactly how to resume. Every iteration commits before exiting so you never lose work.
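The summary is just a SIGINT trap. A minimal sketch, with illustrative variable names and wording (the real script's messages differ):

```shell
#!/usr/bin/env bash
# Sketch of the Ctrl+C summary via a SIGINT trap.
SPEC_DIR="${1:-specs/feature}"
ITERATIONS=0
START_TS=$(date +%s)
on_interrupt() {
  local elapsed=$(( $(date +%s) - START_TS ))
  echo ""
  echo "Interrupted after $ITERATIONS iteration(s) in ${elapsed}s."
  echo "Each iteration committed before exiting, so no work was lost."
  echo "Resume: pipeline.sh $SPEC_DIR"
  exit 130   # conventional exit status for a SIGINT-terminated script
}
trap on_interrupt INT
```

Because every iteration commits before the next one starts, the handler never has to save state; it only has to tell you where you are.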
Ralph's quality gates default to placeholders that intentionally fail, so you have to configure them. I'd rather the pipeline refuse to run than silently skip validation. Set QUALITY_GATES to your lint, typecheck, and test commands and Ralph runs them after every task.
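The fail-loud default is a one-liner. A sketch under my own naming, assuming the gates live in a single QUALITY_GATES variable as shown earlier:

```shell
#!/usr/bin/env bash
# Sketch of the fail-loud default: if QUALITY_GATES isn't configured,
# the placeholder fails on purpose rather than skipping validation.
QUALITY_GATES="${QUALITY_GATES:-echo 'QUALITY_GATES not configured' >&2; false}"
run_quality_gates() {
  # eval so the variable can hold a full "lint && typecheck && test" chain
  eval "$QUALITY_GATES"
}
```

An unconfigured pipeline dies at the first gate instead of shipping unvalidated commits; that asymmetry is the whole point of the design.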
There's also setup.sh which handles all the file copying, permissions, .gitignore updates, and Claude Code local settings. One script, no manual shuffling.
Spec quality
The more I invest in spec quality, the less I need to intervene (I think there's another loop here around brainstorming a spec). A bad spec just means the AI produces bad code faster. Homer and Lisa help: they catch ambiguities and inconsistencies I miss, but I'm still the one writing the spec and reviewing everything that comes out.
If the spec is good the output is good (assuming you have good quality gates, but I'll save that for another post).
Create a PR on the repo if you can think of other loops to add to this pipeline.