
The key to running LLM services in prod is setting up Gemini on Vertex, Anthropic models on AWS Bedrock, and OpenAI models on Azure. It's a completely different world in terms of uptime, latency, and output performance. A rough sketch of what that split looks like is below.
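Minimal sketch of the three-cloud split, one call per vendor. Regions, project IDs, model IDs, deployment names, and env var names are placeholders you'd swap for your own; it uses boto3 for Bedrock, the vertexai SDK for Gemini, and the openai package's AzureOpenAI client.

    # Placeholder regions, project IDs, model IDs, deployment names, env vars.
    import json
    import os

    import boto3                                   # AWS SDK, for Bedrock
    import vertexai
    from vertexai.generative_models import GenerativeModel
    from openai import AzureOpenAI


    def claude_via_bedrock(prompt: str) -> str:
        """Anthropic models via AWS Bedrock (InvokeModel, Anthropic message schema)."""
        client = boto3.client("bedrock-runtime", region_name="us-east-1")
        resp = client.invoke_model(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": prompt}],
            }),
        )
        return json.loads(resp["body"].read())["content"][0]["text"]


    def gemini_via_vertex(prompt: str) -> str:
        """Gemini via Vertex AI."""
        vertexai.init(project=os.environ["GCP_PROJECT"], location="us-central1")
        model = GenerativeModel("gemini-1.5-pro")                 # placeholder model name
        return model.generate_content(prompt).text


    def gpt_via_azure(prompt: str) -> str:
        """OpenAI models via Azure OpenAI; 'model' here is the Azure deployment name."""
        client = AzureOpenAI(
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            api_version="2024-02-01",
        )
        resp = client.chat.completions.create(
            model="gpt-4o",                                       # placeholder deployment name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content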


Have you had any luck getting your Claude quota bumped on Bedrock? I tried working through AWS support but got nowhere. Gave up and used Vertex + Gemini instead.


Does OpenAI on Azure still have that insane latency from content filtering? Last time I checked it added a huge amount to time to first token, making Azure hosting impractical for real-time scenarios.
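If you want to re-check, a rough way is to stream a completion and time the first chunk that carries content; a minimal sketch, with placeholder endpoint, key, API version, and deployment name:

    # Eyeball time-to-first-token on an Azure OpenAI deployment (placeholder config).
    import os
    import time

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )

    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="gpt-4o",                                   # Azure deployment name (placeholder)
        messages=[{"role": "user", "content": "Say hi."}],
        stream=True,
    )
    for chunk in stream:
        # The first chunk with actual content approximates time to first token;
        # earlier chunks may carry only filter metadata and have empty choices.
        if chunk.choices and chunk.choices[0].delta.content:
            print(f"TTFT: {time.perf_counter() - start:.2f}s")
            break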


Yes.

Unless you can convince MS to give you access to the "Provisioned Throughput" model, which also requires being big enough for sales to listen to you.



