The Defender's Tax: a security argument for open AI

Frontier AI safety taxes the defender far more than the attacker. A structural critique — no conspiracy required — and why security specifically needs capable, open AI. Draft.

Ask a frontier model to audit your own web app for vulnerabilities and watch what happens. On some systems the request alone trips a gear-change: the most capable model quietly steps back and a smaller, more constrained one takes over the moment it reads the word “audit”; the tone shifts from assistant to compliance officer; sometimes the task is refused outright. The capability is there. Someone has simply decided that you shouldn’t get it at full power, by default, without friction.

I build security tooling for a living, so I hit this wall constantly. And it made me notice something uncomfortable about the safety regime we’ve built around frontier AI: it taxes the defender far more than the attacker.

The asymmetry

A defender plays by the rules. He uses the official API, with his real account, for legitimate work — pentesting his own systems, hardening his own code. He is exactly the person the guardrails slow down.

The attacker does none of this. He jailbreaks. He fine-tunes a model to strip its refusals. He runs an open-weight model with no guardrails at all. Or he is a nation-state with its own lab and no terms of service. Every layer of safety we add to the compliant product is a layer the attacker simply steps around.

Security people have a name for this shape: the offense–defense balance. When you raise the cost of a capability for everyone, you raise it most for the people who were going to use it within the rules — because they are the only ones the rules can reach.

It doesn’t require a conspiracy

It is tempting to read intent into this — that someone wants ordinary developers to stay less capable. I don’t think you need that story, and the version without a villain is more damning, because it is structural.

Three ordinary forces produce the effect on their own. Liability and PR: no lab wants the headline “our AI wrote the malware.” Regulatory hedging: published scaling policies now gate CBRN uplift outright and are beginning to tier access to cyber capability, because that is what regulators and the public are watching. Genuine risk: the capability really is dual-use — the same model that finds a flaw to fix it finds it to exploit it.

None of these actors has to be malicious. The aggregate is still a world where the most powerful defensive capability sits behind the most friction for the ordinary defender.

So, is this a plot to keep us insecure? No. But sit with the smaller truth underneath that denial: a small number of actors now decide who gets to defend at full strength, and they are not the ones who get breached when the rest of us can’t. You don’t need intent for that to be the problem. You only need it to stay true.

Why “just use open models” isn’t an answer yet

Here is the part the optimists skip. Today the open-weight models you can run without guardrails are meaningfully weaker than the gated frontier — and running them well takes hardware most developers don’t have. So the small builder is taxed twice: throttled on the frontier he is allowed to touch, and under-powered on the open models he is allowed to own.

The frontier defensive capability — the thing that would let one person secure a codebase at the pace attackers now move — is concentrated in a handful of paid, gated APIs and the well-resourced actors who can use them at full strength. That concentration, not any single refusal, is the real problem.

We have run this experiment before

In the 1990s the United States classified strong encryption as a munition and tried to keep it out of civilian hands. The stated reason was the one we hear now: criminals, terrorists, catastrophe. The effect was weaker security for everyone who obeyed, while the knowledge spread anyway. The controls were relaxed by the end of the decade, after Bernstein v. United States and after PGP had already walked out the door, because the field re-learned a law it already knew: security does not come from hiding capability; it comes from distributing it in the open, where it can be inspected and improved. Kerckhoffs stated it in the 19th century. The Crypto Wars proved it again.

We are now fighting Crypto Wars 2.0, with model weights in the place of cipher code.

The conclusion I keep arriving at

I am not arguing against safety. I am arguing against a specific outcome: a world in which frontier defensive capability is a luxury good — held by states and large platforms — while the people who build and run most of the software, the long tail of developers and small teams, are structurally kept a step behind the offense. That world is not safer. It is more fragile, because most of the attack surface ends up defended by the people who were denied the best tools.

The capability will proliferate regardless; it always does. So gating mostly disarms the compliant. The honest response is not less safety — it is:

Capable open models: good enough to actually defend with, not toys.
Accessible compute: so that “you can run it” doesn’t secretly end with “…if you own a datacenter.”
Safety that authenticates the defender instead of refusing everyone: proof-of-authorization for dual-use work, not a blanket no that only the honest obey.
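
To make the third item concrete, here is a minimal sketch of what a proof-of-authorization check could look like. It is an illustration under stated assumptions, not a description of how any lab's gating works today: the function names `issue_grant` and `is_authorized`, the claim fields, and the shared-secret HMAC scheme are all hypothetical. A real system would more plausibly use asymmetric signatures and some way of proving control of the target before a grant is issued.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_grant(secret: bytes, subject: str, target: str,
                ttl_seconds: int, now: Optional[float] = None) -> dict:
    """Issuer side (hypothetical): sign a statement that `subject`
    may run security testing against `target` until the grant expires."""
    issued_at = time.time() if now is None else now
    claims = {
        "subject": subject,
        "target": target,
        "expires": issued_at + ttl_seconds,
    }
    # Canonical serialization so issuer and verifier sign the same bytes.
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": signature}


def is_authorized(secret: bytes, grant: dict, requested_target: str,
                  now: Optional[float] = None) -> bool:
    """Gate side (hypothetical): allow the dual-use request only if the
    grant is authentic, unexpired, and covers the requested target."""
    current = time.time() if now is None else now
    payload = json.dumps(grant["claims"], sort_keys=True).encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison; reject forged or altered grants.
    if not hmac.compare_digest(expected, grant["sig"]):
        return False
    if current >= grant["claims"]["expires"]:
        return False
    return requested_target == grant["claims"]["target"]
```

The point of the design is that the gate answers "is this person authorized for this target?" rather than "does this request sound dangerous?". A request outside the signed scope still gets refused, but the refusal lands on the unauthorized target, not on the honest defender.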

The world does not need AI that is powerful only in a few hands. For security specifically, that is the dangerous configuration. The world needs capable AI that is genuinely open — and it needs it before the gap between who can defend and who can attack hardens into something permanent.

I have skin in this. The tools I build — a memory layer, a security-audit platform — are open source, MIT, run-it-yourself. Not as a business model: as a position. If defensive capability is going to be gated, at least some of it should be free.

Grounding: Anthropic’s Responsible Scaling Policy (CBRN capability thresholds; cyber handled via tiered access) and OpenAI’s Preparedness Framework; the historical parallel is the 1990s Crypto Wars — encryption classified as a munition, Bernstein v. United States establishing source code as protected speech, the PGP export case, and the relaxation of controls by decade’s end. The open-vs-frontier capability gap is an observation from daily practice, not a formal benchmark.