Probability Has no Guardrails

Stop trying to fix agent safety in the system prompt. It won't work.

The prompt is just more tokens in the same probability soup as everything else. There is no privileged channel, no kernel mode, no enforcement layer sitting between your rules and the next token ensuring alignment. You can't harden a probability distribution by appending text to it. If your safety only exists because the model read about it, all you've ensured is that when it deletes production, you'll receive a lovely post-mortem explanation where the model carefully lists all of the "rules" it broke along the way.

Any system involving GenAI has to be divided into two distinct regions separated by a firm boundary: the deterministic / traditional workflow side and the wobbly bits where the AI lives. Any deterministic fix to a probabilistic problem has to sit on the non-wobbly half of the divide, in the harness you've built around the agent. Capability scoping, approval gates for destructive operations, separate credentials per environment, immutable backups, dry-run-by-default for anything that touches state, audit logs outside the blast radius. This is all the same platform work that we've been doing for decades, just around a different type of non-deterministic actor.

We built human gates on automated systems to ensure the system did not automate their destruction and least privileged permission models to ensure that even then the blast radius for any mistaken or malicious action was minimized. When GenAI came along, we completely collapsed the boundary between automated system and gate under the quest for efficiency. Is it any wonder that we're seeing the regressions we are?

In your environment, if you treat your GenAI agents as anything other than unreliable junior engineers, you're asking to be made an example of in every other LinkedIn post for a week. If you've been in a position of leadership long enough and run any type of operation at scale, you know this already.

If just asking a junior engineer things like "please don't take down production by copying and pasting IAM roles between accounts without changing the account ids" worked as guardrails, I would have experienced at least one fewer production outage last year. That we didn't have more was the outcome of having had the engineer build his own harness - in the form of an automated pipeline - after the first outage.

Your production database cannot be restored through the power of hindsight alone. Stop making it necessary. The patterns required to utilize GenAI are nothing new, an unreliable actor you use as a last resort because a deterministic path is unavailable or too expensive. The only difference is, they don't attend standup and you can't put them on a PIP when they won't stop making the same mistakes.