The best local models are literally right behind Claude/Gemini/Codex. Check the benchmarks.
That said, Claude Code is designed to work with Anthropic's models. Agents have a buttload of custom work going on in the background to massage specific models to do things well.
I've repeatedly seen Opus 4.5 manufacture malpractice and then disable the checks that complain about it in order to declare the job done, so I would agree with you about benchmarks versus experience.