Claude 4.8's best upgrade is that it admits when it's wrong

Anthropic put out a new version of Claude last week. Opus 4.8. I almost didn’t bother reading the announcement, because a new one shows up every month or so now and they mostly say the same thing. Scores went up a few points. It’s a bit faster. You read one, you’ve basically read all of them.

But there was one line in this one that had nothing to do with scores, and that’s the only part that stuck with me.

Anthropic says the new version is around four times less likely to write bad code and then stay quiet about it. That is the upgrade they are proud of. Not that the code is better. That when something is off, or it isn’t sure, it is far more likely to actually say so. It got better at telling you “I’m not confident about this part” instead of sounding sure and hoping you don’t check.

If you have never used these tools for real work, that sounds like a small thing. If you have, you know it isn’t. The problem was never really that the AI gets things wrong. Everyone gets things wrong. The problem was that it would get something wrong while sounding totally sure, so you couldn’t tell the good answers from the bad ones until much later.

The useful version isn’t the one that gets everything right. It’s the one that owns up when it doesn’t.

Here is what that buys you, and it’s slightly mad. The developer behind Bun, a tool a lot of programmers use every day, decided to rewrite the whole thing. Not a patch. The entire project, hundreds of thousands of lines, moved from one programming language to another whose programs crash less often (Zig to Rust, if the names mean anything to you). Normally that is months of careful work for a whole team. He had Claude do most of it, with hundreds of copies running at once and each one checking the others. Eleven days, start to finish. His reason was pretty down to earth, actually. He said he was sick of dealing with crashes and memory bugs, and that nobody made him do it.

Then people looked at it properly, and the arguing started. Nobody can read hundreds of thousands of lines of code by hand, so really you are trusting the AI’s word that it did a clean job. A well-known developer, Theo Browne, pointed out that the rewrite is full of little shortcuts that switch off the exact safety the new language was meant to give you. Around thirteen thousand of them, where a similar tool might have fewer than a hundred. You move to a safer language and then poke thirteen thousand holes in it. And the rewrite isn’t even finished. It might go into a future release, or it might not.

Which is the whole reason the honesty part matters. If you are letting an AI write more code in a week than a team writes in a year, you need it to be straight with you about what it got wrong. If it quietly hides the messy parts at that scale, it isn’t saving you anything. You just find the problems later, when they are much harder to fix.

Quick honest note, because I’m not trying to sell you on this. Most people who have actually used 4.8 say it is a small step up, not some huge leap. Simon Willison, who reviews these carefully and doesn’t do hype, called it “a modest but tangible improvement,” which matches how it feels to me. It still fumbles small easy tasks now and then. And that run-hundreds-of-copies feature will eat through your usage limit frighteningly fast. Some people hit the cap in ten minutes.

I’m writing this in Claude Code, on 4.8, with that same hundreds-of-copies button sitting right there in the editor. The thing that rewrote an entire project is just installed on my laptop now. Normal. Billed by usage like everything else.

And this is the small model. Anthropic has a much stronger one called Mythos, the one I wrote about a while back that they said was too dangerous to release, and word is it’s reaching more people soon. Getting the cheaper, safer model to own up to its mistakes before they hand out the powerful one sounds sensible. Also a bit ominous.

So next time one of these launches lands, skip the score charts. The thing worth asking is whether you would trust it enough to walk away from your screen and let it work without watching over it. That is harder to put in a chart, which is probably why nobody leads with it.

If you want the official version, read Anthropic’s announcement. Fair warning, the first third is charts, and the charts are the least interesting thing in it.