I read the entire thing fwiw (pseudo-retired life helps with time here).
It looks like it was a collaborative effort across multiple teams, where each team (research, security, psychology, etc.) submitted ~10 pages or so. It doesn't feel like slop.
I'll copy the highlights here, but the tweets have imagery as well:
> The obvious hype - It crushes benchmarks across the board, and it does so with fewer tokens per task.
> Despite this, they don’t think it can self-improve on its own. There are still areas your average engineer does better with, and despite it accelerating tasks by 4x, that only translates to <2x increase in overall progress.
> They’re probably right to hold this back - its ability to exploit things is unprecedented. Any site running on an old stack right now or any traditional industry with outdated software should be terrified if this becomes accessible.
> Counterintuitively, while it’s the most dangerous model, it’s also the safest. They’ve also seen significant additional improvements in safety between their early versions of Mythos and the preview version.
> Anthropic does a really good job of documenting some of the rare dangerous behaviors the early models had.
> Interestingly, Mythos itself leaked a recent internal “code related artifact” on github.
> Mythos is also RUTHLESS in Vending Bench. Agent-as-a-CEO might be viable?
> The last thing: Mythos has emergent humor. One of the first models I’ve seen that’s witty. The examples are puns it came up with and witty slack responses it had when operating as a bot.