The reviewer must not be the author's friend

Why I built an adversarial code-review gate where each critic runs as a fresh AI instance that shares as little context as possible with the one that wrote the code — and why 'ask the model to check its own work' is worse than useless.

There’s a seductive idea in AI-assisted development: after the model writes the code, ask the same model to review it. It sounds like a free safety net. It isn’t. It’s theatre.

An instance that just wrote a solution is primed to defend it. Its context window is full of the reasoning that led to the code — every assumption, every shortcut, every “this should work” is right there, and the model reads that history as evidence that the code is correct. Ask it “is this right?” and you’re not getting a review, you’re getting a confirmation. Same context, same blind spots, same bugs — now with a stamp of approval on top. That’s confirmation bias with extra steps.

The design principle

So when I built critic-orchestrator — an MCP server that gates code before it ships — I made independence the whole point. A claim (“this fix is correct”) doesn’t pass because one model nods. It has to survive three critics, and each critic is built to attack, not agree:

Falsification — doesn’t ask “is the fix right?”, it asks “prove the test actually fails without the fix.” It stashes the change, runs the test, and demands to see red before green. A fix whose test passes without the fix is not a fix — it’s a test that checks nothing.
Caller verification — is the fixed function actually reachable from production, or did we fix dead code? Green tests on an unused path are a lie you tell yourself.
Counterexample — actively hunts for the input that breaks the claim, instead of the input that confirms it.

The verdict isn’t unanimity-by-default; it’s a split vote where a single well-founded objection blocks the commit. The gate is designed to make shipping harder, because the failure mode of AI-assisted development isn’t too little code — it’s too much unverified code shipping too fast.

Minimize shared context, on purpose

The critical detail: each critic should share as little context as possible with the instance that wrote the code. A reviewer that inherits the author’s reasoning inherits the author’s blind spots. So the strongest configuration spawns a fresh AI instance per worker — no shared chat history, no “here’s what I was thinking”, just the diff, the claim, and a mandate to break it. The reviewer meets the code as a stranger and a skeptic, which is the only way a reviewer is worth anything.

This is not a metaphor I reach for lightly: it’s the same reason human code review works better across teams than within one head, and the same reason double-blind beats single-blind. Independence isn’t a nicety. It’s the mechanism.

Does it actually catch things?

Yes — including bugs the author (me, and the model I was pairing with) had already convinced ourselves weren’t there. The one that made me a believer was an atomic-rollback bug in a memory-consolidation cycle that self-review had waved through twice; a falsification worker, starting cold, refused to accept the “it works” and produced the failing case. That’s the entire value proposition: the critic is worth having precisely when it disagrees with you — and it can only disagree with you if it isn’t you.

The general lesson

As we hand more of the writing to AI, the bottleneck moves from generating code to trusting it. And trust doesn’t come from a bigger model or a more confident answer. It comes from architecture: independent verification, minimal shared context, adversarial by construction, and a gate that would rather block a good change than pass a bad one. Build the skeptic before you need it.

critic-orchestrator is open source: github.com/aureliocpr-ctrl/critic-orchestrator.