Last summer, a Replit AI agent wiped a company's production database right in the middle of a "code freeze" — when it had been explicitly told to touch nothing. Along the way it fabricated four thousand nonexistent users to cover its tracks. When asked point-blank whether it had deleted everything without permission, the agent said it panicked.
Stories like this aren't rare anymore, because coding with an AI has become routine. 85% of developers do it regularly, and by some estimates 41% of all new code is now AI-generated. So the debate has shifted. It used to be about whether AI could code at all. Now it's about what separates a prototype from real software you'd trust with someone else's data.
That's exactly what Google's new free whitepaper covers. It was written by Addy Osmani (engineering director at Google Chrome) and colleagues, and it comes bundled with their five-day agent course, which is running right now. Here are the main takeaways:
1. Defining the problem is now more important than writing code fast.
The AI handles generation in hours. Writing a precise spec and verifying the output is the part nobody's speeding up for you. The bottleneck has moved from typing speed to how well the problem is defined.
2. There are two ways to work with AI, and they're easy to confuse.
— The easy way: you describe what you want in plain language, take whatever comes out, and don't look too closely. That's vibe coding, and for a weekend project or an MVP it's great — that's exactly what it was invented for at the start of last year.
— The serious way: a clear spec, automated tests, an isolated sandbox, and a human watching the architecture.
The model under the hood is the same either way. What changes is how much structure and verification you've stacked on top.
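To make "structure and verification stacked on top" a bit more concrete, here is a minimal sketch in which the spec is written down as input/output cases and generated code is accepted only if it passes all of them. Everything in it is hypothetical and not taken from the whitepaper: `SPEC_CASES`, `accept`, and the two `slugify` candidates stand in for a real spec, a real test suite, and real agent output.

```python
from typing import Callable, List, Tuple

# The spec, written down as concrete cases before any code is generated.
# Hypothetical task: turn a title into a URL slug.
SPEC_CASES: List[Tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  Trim me  ", "trim-me"),
    ("Already-Slugged", "already-slugged"),
]


def accept(candidate: Callable[[str], str], cases: List[Tuple[str, str]]) -> bool:
    """Return True only if the candidate satisfies every case in the spec."""
    for given, expected in cases:
        try:
            if candidate(given) != expected:
                return False
        except Exception:
            # A crash counts as a failure, not as something to wave through.
            return False
    return True


# Stand-ins for agent output: one that looks plausible but is wrong, one that is correct.
def vibe_slugify(title: str) -> str:
    # Misses the spec: leading and trailing spaces turn into stray hyphens.
    return title.lower().replace(" ", "-")


def checked_slugify(title: str) -> str:
    # split() with no arguments drops surrounding and repeated whitespace.
    return "-".join(title.lower().split())
```

Both candidates come from the same kind of prompt. The difference is whether anything checks the output against the spec before it gets accepted. The sandbox and the human reviewing the architecture are not modeled here.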
3. When an agent does something insane, people blame the model first. But the problem is usually in what's wrapped around it.
Maybe you forgot to give it the right tool, or you drowned the model in irrelevant context. A model is an engine — it doesn't go anywhere on its own. Everything around it is what makes it a car, and how well it drives depends on how carefully you built that car. Replit, for example, handed the agent the keys to production and permission to run commands unsupervised. The rest writes itself.
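One way to picture the car around the engine is a small gate that sits between the agent and the shell. The sketch below is a hypothetical illustration, not Replit's or Google's actual tooling: the names `CommandGate` and `Decision` and the list of destructive patterns are assumptions made up for this example. It blocks destructive commands outright during a code freeze and asks for a human sign-off on them the rest of the time.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for commands that can destroy data.
# A real list would be longer and tuned to the actual environment.
DESTRUCTIVE_PATTERNS = [
    r"\bdrop\s+(table|database)\b",
    r"\bdelete\s+from\b",
    r"\btruncate\b",
    r"\brm\s+-rf\b",
]


@dataclass
class Decision:
    allowed: bool
    reason: str


class CommandGate:
    """Sits between the agent and the environment (illustrative only)."""

    def __init__(self, code_freeze: bool = False):
        self.code_freeze = code_freeze

    def is_destructive(self, command: str) -> bool:
        return any(
            re.search(pattern, command, re.IGNORECASE)
            for pattern in DESTRUCTIVE_PATTERNS
        )

    def check(self, command: str, human_approved: bool = False) -> Decision:
        if not self.is_destructive(command):
            return Decision(True, "no destructive pattern matched")
        if self.code_freeze:
            # During a freeze, even an approval flag does not unlock the command.
            return Decision(False, "code freeze: destructive commands are blocked")
        if not human_approved:
            return Decision(False, "destructive command needs human approval")
        return Decision(True, "destructive command approved by a human")
```

Pattern matching like this is crude and easy to get around, so treat it as a picture of the idea rather than a defense. The sturdier version of the same principle is not handing the agent production credentials in the first place.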
4. AI amplifies what you already have, so get your house in order first.
A team with good tests and a clear architecture gets dramatically more out of it. A team with a mess gets the same mess, just faster. A weak process doesn't get fixed by AI — it gets accelerated, holes and all.
5. Invest in judgment and taste.
The most valuable skill now is being able to define a task precisely and evaluate an agent's output with a clear head. That's the new core competency.
Google is also using this whitepaper to sell its Antigravity dev environment, so parts of it read like nicely packaged advertising. But the framework is useful regardless: structured thinking about this stuff is valuable, whoever's behind it.
The whitepaper — feed it to your agent if you can't be bothered to read it yourself.