I'm glad I have ChatGPT to turn that image with benchmarks into an accessible table lol. I like Claude Code, but Anthropic's accessibility in anything beyond the accidental accessibility of a CLI is frustrating. Try it. Load a screen reader like VoiceOver for Mac ('cause I know most programmers use Macs) and go to claude.ai. In the "write your prompt to Claude" box, type something like "What will the weather be like tomorrow?" and press Enter/Return. Close your eyes for a good 30 seconds, and tell me how you'd know within those 30 seconds whether the model has replied. Then try the same thing with ChatGPT. I would /love/ to be proven wrong.
Curious if the 1M context window will be available by default in Claude Code. If so, that's a pretty big deal: "Sonnet 4.6’s 1M token context window is enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request. More importantly, Sonnet 4.6 reasons effectively across all that context."
I've read that compute costs for LLMs go up O(n^2) with context window size. But I think it is also a combination of limited compute availability, users' preference for Anthropic models, and Anthropic planning to IPO.
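For a rough sense of what O(n^2) means here, this is a back-of-the-envelope sketch, not a pricing model. It counts only the pairwise query-key scores in plain full self-attention, which is the part that grows quadratically; the rest of a transformer scales roughly linearly, and real deployments use optimizations this ignores. The 200k baseline and the function names are my own assumptions for illustration.

```python
def attention_score_entries(n_tokens: int) -> int:
    """Pairwise query-key scores in one full self-attention pass (n x n)."""
    return n_tokens * n_tokens


def relative_cost(n_tokens: int, baseline_tokens: int) -> float:
    """How many times more attention scores than at the baseline length."""
    return attention_score_entries(n_tokens) / attention_score_entries(baseline_tokens)


# Hypothetical comparison: a 1M-token context vs. an assumed 200k baseline.
ratio = relative_cost(1_000_000, 200_000)
print(ratio)  # 25.0
```

So 5x the context means about 25x the attention scores under this simplified view, which is the intuition behind the quadratic claim.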
Those hours that with gentle work did frame
The lovely gaze where every eye doth dwell,
Will play the tyrants to the very same
And that unfair which fairly doth excel:
I really don't get these companies posting disingenuous benchmarks. Every time, they pick and choose who to compare against. Not comparing to the latest 5.3-codex is absurd when it's been out a couple of weeks now. Who are they trying to kid?
If you were writing a promotional post for your new model, would you include benchmarks of a competitor that's spanking you across the board? This is marketing.
Per the benchmarks, it's similar to or better than Opus 4.5 while being 2x-3x cheaper. Definitely worth it over Opus 4.6 if cost per token is the concern.
Anthropic is again running scared of the open weight models, which are rapidly catching up to them. Neither Sonnet nor Opus is going to help with that at all.
It has already happened with the music gen models. It's only a matter of time before the open weight models overtake Anthropic.
Expect them to dial up the scaremongering until they IPO. The Claude family of models is the only AI product keeping them alive.