Episode notes
The model isn't the problem. I went back through 20 of my agent's pull requests and the failures looked exactly like a junior's first month.
Three of them tried to rewrite things nobody asked them to rewrite. Five skipped the test, or wrote a test that would have passed either way. Four fixed the bug but broke something else in the process.
I used to assume model quality was the main driver. It isn't.
The agent doesn't ship a one-line fix. It opens a change touching twelve files, half of them unrelated to the bug. It writes the code and skips the test. Or it writes a test that proves nothing.
But notice that none of these are model problems. They're the same mistakes a reviewer would flag in a junior's pull request, just at higher volume and with more confidence.
The practical move: stop logging "the agent failed" and start logging why. Counting ch ...
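One way to log the why instead of a bare "the agent failed" is to tag each reviewed pull request with a reason and tally the tags. This is a minimal sketch, not a description of any real tooling: `FailureReason`, `ReviewOutcome`, and `tally` are hypothetical names, the three labels are simply the failure types described above, and the sample entries are placeholders rather than real data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Tuple


class FailureReason(Enum):
    # Hypothetical labels, one per failure type named in the notes above.
    SCOPE_CREEP = "rewrote things nobody asked for"
    WEAK_OR_MISSING_TEST = "skipped the test or wrote one that proves nothing"
    REGRESSION = "fixed the bug but broke something else"


@dataclass(frozen=True)
class ReviewOutcome:
    pr_id: str                               # placeholder identifier
    reasons: Tuple[FailureReason, ...] = ()  # empty tuple means no failure logged


def tally(outcomes: Iterable[ReviewOutcome]) -> Counter:
    """Count how often each failure reason appears across reviewed PRs."""
    counts: Counter = Counter()
    for outcome in outcomes:
        counts.update(outcome.reasons)
    return counts


if __name__ == "__main__":
    # Illustrative entries only; replace with your own review log.
    sample = [
        ReviewOutcome("pr-a", (FailureReason.SCOPE_CREEP,)),
        ReviewOutcome("pr-b", (FailureReason.WEAK_OR_MISSING_TEST,
                               FailureReason.REGRESSION)),
        ReviewOutcome("pr-c"),
    ]
    for reason, n in tally(sample).most_common():
        print(f"{n:>2}  {reason.value}")
```

A pull request can carry more than one reason, so the tally counts reasons rather than pull requests. That is a design choice; a single-reason-per-PR log would work just as well if you prefer totals that sum to the number of failures.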