It took me a while, but I think the difference between the Vertex and Gemini APIs is that Vertex is meant for existing GCP users and the Gemini API is for everyone else. If you are already using GCP, then the Vertex API works like everything else there. If you are not, then the Gemini API is much easier. But they really should spell it out; currently it's really confusing.
Also, they should make it clearer which SDKs, documents, pricing, SLAs, etc. apply to each. I still get confused when I google some detail and end up reading the wrong document.
> I think the difference between Vertex and Gemini APIs is that Vertex is meant for existing GCP users and Gemini API for everyone else
Nahh, not really - Vertex has a HUGE feature surface, and can run a ton of models and frameworks. Gemini happens to be one of them, but you could also run non-Google LLMs, non-LLM stuff, run notebooks against your dataset, manage data flow and storage, and and and…
The key to running LLM services in prod is setting up Gemini in Vertex, Anthropic models on AWS Bedrock and OpenAI models on Azure. It's a completely different world in terms of uptime, latency and output performance.
Have you had any luck getting your Claude quota bumped on Bedrock? I tried working through AWS support but got nowhere. Gave up and used Vertex + Gemini.
Does OpenAI on Azure still have that insane latency for content filtering? Last time I checked, it added a huge amount to time to first token, making Azure hosting for real-time scenarios impractical.
Ex-googler here. Google shipped their org hierarchy here.
The Vertex API is managed by the Vertex team in Google Cloud. This is production-ready infrastructure that is SRE-managed, but usually one or two steps behind the bleeding edge.
Gemini API, Jules, etc. are built by Google Labs. This is close to the bleeding edge but not as production-ready.