The move was upstream
I spent most of a day pointing at broken cases. Each time I pointed at one, the model patched it. The next regeneration would surface a different broken case in a different file. We’d patch that. Then a third. By late afternoon I was the one suggesting we slow down and return to first principles — the model wasn’t pulling me up a level on its own.
The bug itself was small. I’ve been wiring up a pipeline that turns transcripts of my conversations with AI tools into clean Markdown for an Obsidian vault. The reasoning part — extracting topics, summarizing, tagging — was working. The rendering kept breaking. Stray characters, broken callouts, content that ended up harder to read than the original conversation had been. I was working on it inside Gemini CLI, mostly on their fast tier with occasional escalation to their reasoning model. Twelve back-and-forth turns across roughly six hours of intermittent work, and we still hadn’t landed it.
That evening I opened a fresh session in Claude Code on Opus 4.7. New context, no commitments to anything we’d already tried. I described the problem and pasted screenshots. Forty-five minutes and nine turns later, the fix was committed and verified clean.
The pivotal moment came partway through. After I’d shown Opus yet another broken example, it replied with a sentence that flipped the work:
“I’ve been adding layer on layer of text mutation without making them aware of where they shouldn’t apply. The simple thing we’re missing is that all these mutations need to skip code regions.”
That’s a structural observation, not a patch. The bug wasn’t in any one of the individual fixes — they were all roughly correct in isolation. The bug was that several fixes were stepping on each other because none of them knew where they shouldn’t run. Once that was named, the actual code change was small and final.
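The shape of that fix can be sketched in a few lines. This is a minimal illustration, not the actual pipeline code: the regex, the helper names, and the sample mutation are all assumptions. The idea is just that every mutation pass routes through one function that knows where code regions are and leaves them alone.

```python
import re

# Matches fenced code blocks and inline code spans. A real Markdown
# parser would be more robust; this is only a sketch of the idea.
CODE_REGION = re.compile(r"(```.*?```|`[^`\n]+`)", re.DOTALL)

def apply_outside_code(text: str, mutate) -> str:
    """Apply `mutate` to every span of `text` that is NOT a code region."""
    # re.split with a capturing group keeps the code regions in the
    # result at odd indices; non-code text lands at even indices.
    parts = CODE_REGION.split(text)
    return "".join(
        part if i % 2 else mutate(part)  # odd index = code, pass through
        for i, part in enumerate(parts)
    )

# A hypothetical mutation pass: escape stray Obsidian callout markers
# so they don't render as callouts mid-paragraph.
def escape_callout_markers(s: str) -> str:
    return s.replace("[!", r"\[!")

md = "Note [!info] here\n```\nraw [!info] stays untouched\n```\n"
print(apply_outside_code(md, escape_callout_markers))
```

Each mutation pass stays roughly correct in isolation, as before; the difference is that none of them can run where they shouldn’t.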
There’s a book by Dan Heath called Upstream about exactly this shape of problem. The argument is that we spend most of our energy reacting to problems downstream — handling each new symptom as it surfaces — when the leverage is almost always upstream, at the source. A river of broken rendering cases is downstream. Mutation passes that don’t know about code regions are upstream.
That’s what I think actually happened across those two sessions. Gemini was downstream, dutifully patching each case I pointed at. Opus walked upstream and named the source. Both are real moves. The mistake was mine for staying downstream as long as I did.
I want to be careful about the easy version of this story. Opus 4.7 is the top of Anthropic’s reasoning lineup; the model I spent most of the Gemini session on was the fast, cheap tier. That’s a budget difference as much as a capability difference, and the fairer comparison would have involved escalating to Gemini’s reasoning model sooner inside that session, before I switched harnesses at all. I also don’t think those Gemini turns were wasted — by the time Opus saw the problem, the failure modes had been thoroughly characterized, and that almost certainly made the upstream pattern easier to spot.
The lesson I’m carrying out of it isn’t which model is smarter. It’s this: when you’ve patched the same kind of thing three times in a row, you’re downstream. The next move is not another patch. It’s stepping back far enough to ask what’s producing the river.
A fresh window with a different model — or even just the most capable model in the same harness — is one of the cheapest ways to force that step. Not because the new session is smarter than the old one. Because it can’t inherit the downstream assumptions you and the first model built up together. It has no choice but to look at the problem from where it actually starts.
The cost of opening a second tab is small. The cost of staying downstream is the rest of your day.