The key to running LLM services in prod is setting up Gemini on Vertex AI, Anthropic models on AWS Bedrock, and OpenAI models on Azure. It's a completely different world in terms of uptime, latency, and output performance.
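For what it's worth, here's roughly what that split looks like with each vendor's Python SDK. This is just a minimal sketch; the project ID, regions, model IDs, and deployment names are placeholders, not values from anyone's actual setup:

```python
# One client per vendor's "home" cloud. All IDs/regions below are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel
from anthropic import AnthropicBedrock
from openai import AzureOpenAI

# Gemini on Vertex AI (uses GCP application default credentials)
vertexai.init(project="my-gcp-project", location="us-central1")
gemini = GenerativeModel("gemini-1.5-pro")
print(gemini.generate_content("ping").text)

# Claude on AWS Bedrock (uses the standard AWS credential chain)
claude = AnthropicBedrock(aws_region="us-east-1")
msg = claude.messages.create(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    max_tokens=64,
    messages=[{"role": "user", "content": "ping"}],
)
print(msg.content[0].text)

# OpenAI on Azure (note: "model" is your Azure deployment name, not a raw model ID)
azure = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)
resp = azure.chat.completions.create(
    model="my-gpt-4o-deployment",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```

The nice part is that once each client is wired up, you can route per model family and keep a common message format across all three.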
Have you had any luck getting your Claude quota bumped on Bedrock? I tried working through AWS support but got nowhere. I gave up and used Vertex + Gemini instead.
Does OpenAI on Azure still have that insane latency from content filtering? Last time I checked it added a big hit to time to first token, making Azure hosting impractical for real-time scenarios.
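If you want to re-check, the quickest way I know is to stream a tiny request and time the first content chunk. A rough sketch (endpoint, key, API version, and deployment name are placeholders):

```python
# Quick-and-dirty time-to-first-token check against an Azure OpenAI deployment.
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # Azure deployment name, not a raw model ID
    messages=[{"role": "user", "content": "ping"}],
    stream=True,
)
for chunk in stream:
    # Azure can emit a leading chunk with empty choices (content-filter
    # annotations), so wait for the first chunk that actually carries text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"TTFT: {time.perf_counter() - start:.2f}s")
        break
```

Run it a handful of times and compare against the same prompt on the first-party API to see how much of the gap is the filter versus the deployment itself.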