Limiting the Blast Radius
When it comes to cyber-attacks, I've never believed in “if it happens”. The honest framing is “when it happens”. Any business operating on the internet long enough will get hit, or at least that should be your working assumption.
The same logic applies to coding agents introducing major defects. Not if one of them ships something catastrophic. When. The setup that survives is the one that already assumes something bad will eventually happen.
That framing is what pushed me to run a simulation on my own agentic setup in the last few weeks. I wanted to know what could go wrong.
My initial setup is standard. I run coding agents locally on my dev machine most of the time. I don’t have any credentials or tokens set up by default on my machine (npm keys, AWS tokens, Cloudflare Wrangler, etc.), and only my pipeline inside GitHub Actions is authorized to deploy to production. So, in theory, a rogue coding agent could not destroy my production database without going through the pull request review process.
But my simulation helped me realise that I was one mistake away from that not being true anymore. Not through anything sophisticated. For example, just by a human (me) temporarily pasting an AWS authorization token into a terminal session, and then reusing that same session for an agent by mistake.
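A minimal sketch of the kind of pre-flight check that would have caught that mistake, assuming the token was exported as an environment variable. The function names (findLeakedCredentials, assertCleanEnvironment) are hypothetical, and the name patterns are common conventions rather than an exhaustive list. It only inspects variable names, so it would not catch a secret stored under an unusual name or in a CLI config file.

```typescript
// Hypothetical guard to run before launching an agent in a terminal session.
// Patterns cover a few conventional credential variable names only.
const SENSITIVE_PATTERNS: RegExp[] = [
  /^AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY|SESSION_TOKEN)$/,
  /^CLOUDFLARE_API_(TOKEN|KEY)$/,
  /^(NPM|NODE_AUTH|GITHUB|GH)_TOKEN$/,
];

// Returns the names (never the values) of credential-looking variables that are set.
function findLeakedCredentials(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => env[name] !== undefined && env[name] !== "")
    .filter((name) => SENSITIVE_PATTERNS.some((pattern) => pattern.test(name)))
    .sort();
}

// Throws if the session is not clean, so the agent never starts in it.
function assertCleanEnvironment(env: Record<string, string | undefined>): void {
  const leaked = findLeakedCredentials(env);
  if (leaked.length > 0) {
    throw new Error(
      "Refusing to start agent, credentials found in environment: " + leaked.join(", ")
    );
  }
}
```

In a Node.js wrapper script this could be called as assertCleanEnvironment(process.env) right before spawning the agent. It is a tripwire, not a sandbox: it narrows the thin line, it does not remove it.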
That's a thin line.
The shift in the threat model
I wrote earlier this month about the cybersecurity convergence: AI offense getting cheaper at the same time AI-built apps are getting more vulnerable. What I didn't write about is what happens inside your own system when an agent does something you didn't intend.
Most of the conversation about AI safety in code is about prevention. Better tests. Better guardrails. Better review agents reviewing the coding agents. All of that matters, but prevention has a ceiling. At some point, an agent will run in a misconfigured terminal. And when that happens, the only question that matters is: how much can it actually break?
The Shopify model, applied inward
Shopify doesn't trust the apps in its app store. Not because all developers are malicious, but because trust at scale is impossible. So the platform was built around the assumption that any individual app could misbehave at any time.
Apps run in iframes. They get short-lived tokens. They communicate with the core platform through a controlled message bus. Their backends integrate through scoped APIs that know what they're allowed to touch and what they aren't.
The result: a third-party app can have a terrible day, get hacked, or push a broken update, and the platform absorbs it. The merchant's data is fine. The checkout still works. The blast radius is the app, not the store.
I've been thinking about what happens if you apply that pattern inward. Not to third-party developers. To your own code. To your own agents.
The core app becomes a thin wrapper: authentication, permissions, navigation, core data model, billing. Maybe an audit log. Nothing else.
Every other piece of functionality is built as if it were a third-party app you don't fully trust. It loads in an iframe. It receives a short-lived token scoped to exactly what it needs. It talks to the wrapper through messaging. Its backend has scoped access to the wrapper's backend, never direct access to anything underneath.
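To make the messaging boundary concrete, here is a rough sketch of how the wrapper might gate messages coming from one sandboxed module. Everything in it is an assumption for illustration: the type names, the gateMessage function, and the idea of a per-module allowlist of message types. It is not how Shopify does it, just the same idea in miniature.

```typescript
// Hypothetical shapes for messages a module may send to the wrapper.
interface ModuleMessage {
  type: string;
  payload?: unknown;
}

// Per-module policy: where the iframe is served from and what it may ask for.
interface ModulePolicy {
  origin: string;
  allowedTypes: readonly string[];
}

type Verdict =
  | { status: "accepted"; message: ModuleMessage }
  | { status: "rejected"; reason: string };

// Decides whether the wrapper should act on a message. Anything not
// explicitly allowed is rejected, so a broken module cannot invent new powers.
function gateMessage(policy: ModulePolicy, origin: string, data: unknown): Verdict {
  if (origin !== policy.origin) {
    return { status: "rejected", reason: "unexpected origin" };
  }
  if (typeof data !== "object" || data === null) {
    return { status: "rejected", reason: "malformed message" };
  }
  const candidate = data as { type?: unknown; payload?: unknown };
  if (typeof candidate.type !== "string") {
    return { status: "rejected", reason: "missing type" };
  }
  if (policy.allowedTypes.indexOf(candidate.type) === -1) {
    return { status: "rejected", reason: "type not allowed: " + candidate.type };
  }
  return {
    status: "accepted",
    message: { type: candidate.type, payload: candidate.payload },
  };
}
```

In the browser, the wrapper would call this from a "message" event listener on window, passing event.origin and event.data. Keeping the decision in a pure function means the boundary can be tested without a browser.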
When an agent works on the dashboard, it's working inside that sandbox. The worst it can do is break the dashboard. The auth layer is untouchable. The user records are untouchable. The token it is holding expires before it could be useful for anything else.
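The token side of that can be sketched in a few lines. This is a simplified model under stated assumptions: ScopedToken, issueToken, and authorize are hypothetical names, scope strings like "reports:read" are made up, and signing or storing the token securely is left out entirely. It only shows the two checks that bound the damage: scope and expiry.

```typescript
// Hypothetical token record held by the wrapper's backend.
interface ScopedToken {
  moduleId: string;
  scopes: readonly string[];
  expiresAt: number; // epoch milliseconds
}

// Issues a token limited to the given scopes and lifetime.
// `now` is passed in so the logic stays deterministic and testable.
function issueToken(
  moduleId: string,
  scopes: readonly string[],
  ttlMs: number,
  now: number
): ScopedToken {
  return { moduleId, scopes: scopes.slice(), expiresAt: now + ttlMs };
}

// A request is allowed only if the token is still valid and carries
// the exact scope the endpoint requires.
function authorize(token: ScopedToken, requiredScope: string, now: number): boolean {
  if (now >= token.expiresAt) {
    return false;
  }
  return token.scopes.indexOf(requiredScope) !== -1;
}
```

With a token issued for the dashboard with only "reports:read" and a one-minute lifetime, a call that needs "users:write" is refused immediately, and every call is refused once the minute is up.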
Why this used to be a bad idea
Architecting a small product this way used to be insane. The overhead of building a platform-style structure for a single-team product (the boundary contracts, the message bus, the scoped APIs) was hard to justify when one developer could just write the code in the same repo and ship it.
But the math has shifted.
The expensive part of this architecture was always the boundaries. Defining contracts. Writing the glue. Maintaining the scoped APIs as the product evolves. That work is exactly the kind of repetitive, well-specified, high-context coding that agents are now good at.
The structural overhead got cheap. And the value of the containment it offers has just increased significantly.
Trusting the agents
The reason I never built software this way before is that I trusted my own code. Or more accurately, I trusted the small number of humans writing it, and the review process that caught their mistakes.
I don't fully trust my agents yet. And I probably never will. The honest answer to the question "how confident am I that one of them won't do something catastrophic in the next twelve months" is: not confident enough to bet user data on it.
So I'm building like I don't trust them. The same way Shopify built like it didn't trust app developers, and ended up with a more resilient platform.
Maybe the right way to think about agents isn't as employees you're managing. Maybe it's as third parties you're integrating with.
