Claude Sonnet 4.6

devinprater · 2026-02-17T18:43:25 1771353805

I'm glad I have ChatGPT to turn that image with benchmarks into an accessible table lol. I like Claude Code, but Anthropic's accessibility, beyond the accidental accessibility of the CLI, is frustrating. Try it. Load a screen reader like VoiceOver for Mac ('cause I know most programmers use Macs) and go to claude.ai. In the "write your prompt to Claude" box, type something like "What will the weather be like tomorrow?" and press Enter/Return. Try closing your eyes for a good 30 seconds, and within those 30 seconds, tell me how you'd know whether the model has replied. Then try the same thing with ChatGPT. I would /love/ to be proven wrong.

edding360 · 2026-02-17T19:45:57 1771357557

thanks for sharing! just tried it for the first time.. Anthropic should really do better

dchuk · 2026-02-17T18:16:22 1771352182

Curious whether the 1M context window will be available by default in Claude Code. If so, that's a pretty big deal: "Sonnet 4.6’s 1M token context window is enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request. More importantly, Sonnet 4.6 reasons effectively across all that context."

pkaye · 2026-02-17T18:22:02 1771352522

Above 200k tokens of context they charge a premium. I think it's $10/M tokens of input.

_ink_ · 2026-02-17T18:31:33 1771353093

Interesting. Is it because they can or is it really more expensive for them to process bigger context?

cube2222 · 2026-02-17T18:40:23 1771353623

Attention is, at its core, quadratic wrt context length. So I'd believe that to be the case, yeah.
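
A rough back-of-the-envelope sketch of the quadratic claim, assuming cost is dominated by the n x n attention score computation in a single layer. The function name and the `d_model` value are placeholders made up for illustration; real serving stacks use KV caching and other optimizations, so this is not anyone's actual cost model:

```python
def attention_score_flops(n_tokens: int, d_model: int = 4096) -> int:
    """Approximate FLOPs for QK^T plus the weighted sum over V in one layer.

    Each of the two matmuls costs roughly 2 * n^2 * d FLOPs, so the total
    is about 4 * n^2 * d. d_model=4096 is an arbitrary placeholder.
    """
    return 4 * n_tokens**2 * d_model


base = attention_score_flops(200_000)         # a 200k-token request
long_ctx = attention_score_flops(1_000_000)   # a 1M-token request
ratio = long_ctx / base
print(ratio)  # 5x the context gives 25x the attention FLOPs
```

Under that assumption, going from 200k to 1M tokens multiplies the attention work by 25, not 5, which would be consistent with a per-token premium for long requests.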

pkaye · 2026-02-17T18:51:59 1771354319

I've read that compute costs for LLMs grow as O(n^2) with context window size. But I think it's also a combination of limited compute availability, users' preference for Anthropic models, and Anthropic planning to IPO.

a_void_sky · 2026-02-17T17:48:26 1771350506

Opus 4.6 but cheaper

rishabhaiover · 2026-02-17T18:24:35 1771352675

I am not seeing it on claude-code yet

ChrisArchitect · 2026-02-17T19:38:03 1771357083

Discussion here apparently: https://news.ycombinator.com/item?id=47050488

mudkipdev · 2026-02-17T18:00:14 1771351214

What happened to sonnet 5?

meetpateltech · 2026-02-17T18:34:08 1771353248

They're probably saving 5 for a bigger leap.

hxugufjfjf · 2026-02-17T18:24:12 1771352652

Those hours that with gentle work did frame The lovely gaze where every eye doth dwell, Will play the tyrants to the very same And that unfair which fairly doth excel:

deanc · 2026-02-17T18:24:14 1771352654

I really don't get these companies posting disingenuous benchmarks. Every time, they pick and choose who to compare against. Not comparing to the latest 5.3-codex is absurd when it's been out a couple of weeks now. Who are they trying to kid?

falloon · 2026-02-17T18:31:57 1771353117

If you were writing a promotional post for your new model, would you include benchmarks of a competitor that's spanking you across the board? This is marketing.

AdamConwayIE · 2026-02-17T18:27:45 1771352865

There aren't really any of the typical benchmark suites targeting Codex 5.3 because it's still not in the API.

SWE-bench, for example, has you create a predictions file and then evaluates the results in its harness. Without Codex 5.3 being in the API, that file can't be generated.
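
For context, a SWE-bench predictions file is, as far as I understand the format, a set of JSON records with `instance_id`, `model_name_or_path`, and `model_patch` fields. Producing `model_patch` is the step that needs programmatic model access. A minimal sketch, where `generate_patch` is a hypothetical stand-in for that API call and the instance ID and model name are made-up placeholders:

```python
import json
import tempfile
from pathlib import Path


def generate_patch(instance_id: str) -> str:
    """Hypothetical stand-in for a model API call that returns a diff."""
    return f"diff --git a/placeholder b/placeholder\n# patch for {instance_id}\n"


def write_predictions(instance_ids, model_name, out_path):
    """Write one JSON record per line, one record per benchmark instance."""
    with open(out_path, "w", encoding="utf-8") as f:
        for instance_id in instance_ids:
            record = {
                "instance_id": instance_id,
                "model_name_or_path": model_name,
                "model_patch": generate_patch(instance_id),
            }
            f.write(json.dumps(record) + "\n")
    return out_path


out_file = Path(tempfile.gettempdir()) / "predictions.jsonl"
write_predictions(["example__repo-1"], "my-model", out_file)
```

Without an API there is nothing to put behind `generate_patch`, so the loop cannot be run at benchmark scale.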

tomlis · 2026-02-17T20:55:07 1771361707

gpt-5.3-codex isn't available via the API yet. Pretty sure they were only testing via API access.

rvz · 2026-02-17T18:28:51 1771352931

> Who are they trying to kid?

People who do not know how reproducible research works.

Any benchmark that is presented by AI labs must be reproduced reliably by someone else independent of that AI lab presenting these results.

Otherwise, not only are the results biased, the numbers could simply be made up for marketing purposes.

cube2222 · 2026-02-17T18:14:07 1771352047

So tldr it seems like it's

- a reasonable improvement over sonnet 4.5, esp. with agentic tool use

- generally worse than opus 4.6

Probably not worth it for coding, but a win for anybody building agentic ai assistants of any sort with Sonnet.

Handy-Man · 2026-02-17T18:23:11 1771352591

It’s similar to or better than Opus 4.5 as per benchmarks, while being 2x-3x cheaper. Definitely worth it over Opus 4.6 if cost per token is the concern.

As a reminder, Opus 4.5 was SOTA 2-3 weeks ago.

adastra22 · 2026-02-17T18:25:09 1771352709

Yes but Opus 4.6 is a massive step up. Some applications don’t need that power though.

rvz · 2026-02-17T18:05:17 1771351517

Anthropic again running scared of the open weight models, which are rapidly catching up to them. Not even Sonnet or Opus is going to help with that.

It has already happened with the music gen models. It's only a matter of time before the open weight models overtake Anthropic.

Expect them to dial up the scaremongering until they IPO. The Claude family of models is the only AI product keeping them alive.

throwup238 · 2026-02-17T18:09:49 1771351789

What are the latest open music models?

falloon · 2026-02-17T18:29:46 1771352986

ACE-Step 1.5 is great; only 1.5B params, so very easy to run locally.

https://github.com/ace-step/ACE-Step-1.5

catigula · 2026-02-17T18:12:22 1771351942

Chinese companies distilling frontier models is certainly a crisis but it isn't one that implies said Chinese companies are anywhere in the 'race'.

bigyabai · 2026-02-17T18:13:20 1771352000

The "race" matters less than making money. If those Chinese models perform well in price/performance, AGI might as well pound sand.