Thank you very much for releasing these models! It's great to see Google enter the battle with a strong hand.
I'm wondering if you're able to provide any insight into the below hyperparameter decisions in Gemma's architecture, as they differ significantly from what we've seen with other recent models?
* On the 7B model, the `d_model` (3072) is smaller than `num_heads * d_head` (16*256=4096). I don't know of any other model where these numbers don't match (see the quick shape sketch at the end of this comment).
* The FFN expansion factor of 16x is MUCH higher than the Llama-2-7B's 5.4x, which itself was chosen to be equi-FLOPS with PaLM's 4x.
* The vocab is much larger - 256k, where most small models use 32k-64k.
* GQA is only used on the 2B model, where we've seen other models prefer to save it for larger models.
These observations are in no way meant to be criticism - I understand that Llama's hyperparameters are also somewhat arbitrarily inherited from its predecessors like PaLM and GPT-2, and that it's non-trivial to run hyperopt on such large models. I'm just really curious about what findings motivated these choices.
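To make the first point concrete, here's a minimal PyTorch sketch (purely illustrative, not Gemma's actual code) of what decoupling `d_model` from `num_heads * d_head` looks like: the query/key/value projections map the 3072-dim residual stream into a wider 4096-dim attention space, and the output projection maps back.
```python
# Purely illustrative sketch (not Gemma's code): the projections in and out of
# attention when d_model != num_heads * d_head, as in the 7B config above.
import torch
import torch.nn as nn

d_model, num_heads, d_head = 3072, 16, 256   # 16 * 256 = 4096 != 3072

q_proj = nn.Linear(d_model, num_heads * d_head, bias=False)  # 3072 -> 4096
o_proj = nn.Linear(num_heads * d_head, d_model, bias=False)  # 4096 -> 3072

x = torch.randn(1, 8, d_model)   # (batch, seq, d_model) residual stream
q = q_proj(x)                    # attention operates in the wider 4096-dim space
y = o_proj(q)                    # projected back to the 3072-dim residual stream
print(y.shape)                   # torch.Size([1, 8, 3072])
```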
Good idea. I've confirmed all the leadership / tech leads listed on page 12 are still at Google.
Can someone with a Twitter account call out the tweet linked above and ask them specifically who they are referring to? Seems there is no evidence of their claim.
It's also possible Google removed the names of people who left. It's not really a research paper, more a marketing piece, so they might do that (I don't think they would with a conference paper).
We'll see if the person making this claim responds with specific Gemma developers that have left. Otherwise, I think it's safe to assume they are just lying.
EDIT: it seems this is likely an Ollama bug, please keep that in mind for the rest of this comment :)
I ran Gemma in Ollama and noticed two things. First, it is slow: Gemma got less than 40 tok/s while Llama 2 7B got over 80 tok/s. Second, it is very bad at output generation. I said "hi", and it responded with this:
```
Hi, . What is up? melizing with you today!
What would you like to talk about or hear from me on this fine day??
```
With longer and more complex prompts it goes completely off the rails. Here's a snippet from its response to "Explain how to use Qt to get the current IP from https://icanhazip.com":
``` python
print( "Error consonming IP arrangration at [local machine's hostname]. Please try fufing this function later!") ##
guanomment messages are typically displayed using QtWidgets.MessageBox
```
Do you see similar results on your end or is this just a bug in Ollama? I have a terrible suspicion that this might be a completely flawed model, but I'm holding out hope that Ollama just has a bug somewhere.
Not a question, but thank you for your hard work! Also, brave of you to join the HN comments, I appreciate your openness. Hope y'all get to celebrate the launch :)
Yes - they are open weights and open inference code, which means they can be integrated into Ollama.
They are not “open training” (either in the training code or training data sense), so they are not reproducible, which some have suggested ought to be a component of the definition of open models.
It really should, shouldn't it? I'm quite ML-naïve, but surely providing the model without 'training code or training data' is just like providing a self-hostable binary without the source code? Nobody calls that open source; it's not even source available.
It is widely believed (and in some cases acknowledged) that a lot of models are trained on copyrighted data scraped from the web. In some cases, even scrapes of ebook piracy websites - google 'books3' to learn more.
Some companies (such as those working on AI) believe this is legal, others (such as the copyright holders to those books) believe it isn't.
In any case, IMHO it's unlikely any cutting edge models will be offering us their training data any time soon.
I'm not complaining, I'm unlikely ever to use it (regardless of how open or not it is) so it doesn't really matter to me, just surprised to learn what people mean by 'open' in this context.
Yes - similar to the Llama models, you'll also need to accept the license to download them officially. But the Llama models have been unofficially downloadable without accepting the license for quite a while, so it's probably just a matter of time.
Yes, models can be downloaded locally. In addition to the Python NN frameworks and ggml as options, we also implemented a standalone C++ implementation that you can run locally at https://github.com/google/gemma.cpp
Mistral weights are released under an Apache 2.0 license, but Llama 2 weights are released under a proprietary license that prohibits use by large organizations and imposes usage restrictions, violating terms 5 and 6 of the Open Source Definition[0]. Even if you accept that a model with a proprietary training dataset and proprietary training code can be considered "open source", there's no way Llama 2 qualifies.
For consistency with existing definitions[1], Llama 2 should be labeled a "weights available" model.
> We all know that Google thinks that saying that 1800s English kings were white is "harmful".
If you know how to make "1800s english kings" show up as white 100% of the time without also making "kings" show up as white 100% of the time, maybe you should apply to Google? Clearly you must have advanced knowledge on how to perfectly remove bias from training distributions if you casually throw stones like this.
It has no problem with other cultures and ethnicities, yet somehow white or Japanese just throws everything off?
I suppose 'bias' is the new word for "basic historic accuracy". I can get curious about other peoples without forcibly promoting them at the expense of my own Western and British people and culture. This 'anti-bias' keyword injection is a laughably bad, in-your-face solution to a non-issue.
I lament the day 'anti-bias' AI this terrible is used to make real world decisions. At least we now know we can't trust such a model because it has already been so evidently crippled by its makers.
Not sure why you're getting downvoted. I would have thought HN of all places would recognize the power and value of OSI licensing and the danger of the proliferation of these source available but definitely not Open Source licenses.
Great question - we compare to the Mistral 7B 0.1 pretrained models (since there were no pretrained checkpoint updates in 0.2) and the Mistral 7B 0.2 instruction-tuned models in the technical report here: https://goo.gle/GemmaReport
Does this model also think Germans were black 200 years ago? Or is it afraid to answer basic stuff? Because if that's the case, no one will care about this model.
We're at the basic-knowledge level here; if your RAG relies on any of it, you can get bad results too. Anyway, would you use a model that makes nonsense responses like this, or one that doesn't? I know which one I'd prefer for sure...
If this was better at specific RAG or coding performance I would absolutely, certainly without a doubt use it over a general instruct model in those instances.
People getting so used to being manipulated and lied to that they don't even bother anymore is a huge part of the problem. But sure, do what suits you the best.
I don't know anything about these twitter accounts so I don't know how credible they are, but here are some examples for your downvoters, who I'm guessing think you're just trolling or grossly exaggerating:
Yea. Just ask it anything about historical people/cultures and it will seemingly lobotomize itself.
I asked it about early Japan and it talked about how European women used katanas and how Native Americans rode across the grassy plains carrying traditional Japanese weapons. Pure made-up nonsense that not even primitive models would produce. Not sure what they did to it. I asked it why it assumed Native Americans were in Japan in the 1100s and it said:
> I assumed [...] various ethnicities, including Indigenous American, due to the diversity present in Japan throughout history. However, this overlooked [...] I focused on providing diverse representations without adequately considering the specific historical context.
How am I supposed to take this seriously? Especially on topics I'm unfamiliar with?
> they insert random keywords into the prompts to counter bias; that got revealed with something else, I think. It had T-shirts with "diverse" written on them as an artifact
This was exposed as being the case with OpenAI's DALL-E as well - someone had typed a prompt of "Homer Simpson wearing a namebadge" and it generated an image of Homer with brown skin wearing a namebadge that said 'ethnically ambiguous'.
This is ludicrous - if they are fiddling with your prompt in this way, it will only stoke more frustration and resentment - achieving the opposite of why this has been implemented. Surely if we want diversity we will ask for it, but sometimes you don't, and that should be at the user's discretion.
I hope y'all consider longer context models as well.
Also, are y'all looking at alternative architectures like Mamba? Being "first" with a large Mamba model would cement your architectural choices/framework support like Llama did for Meta.
This would be really interesting in my opinion, but we are not releasing datasets at this time. See the C4 dataset for an earlier open dataset from Google.
It's cool that you guys are able to release open stuff; that must be a nice change from the modus operandi at goog. I'll have to double check, but it looks like Phi-2 beats your models in some cases while being smaller. I'm guessing the value proposition of these models is being small and good while also having more knowledge baked in?
We deeply respect the Phi team and all other teams in the open model space. You’ll find that different models have different strengths and not all can be quantified with existing public evals. Take them for a spin and see what works for you.
Any reason you decided to go with a token vocabulary size of 256k? Smaller vocab sizes, like the ~16-32k most models of this size use, are much easier to work with. Would love to understand the technical reasoning here, which unfortunately isn't detailed in the report :(.
I'm not sure if this was mentioned in the paper somewhere, but how much does the super large 256k tokenizer vocabulary influence inference speed, and how much higher is the average text compression compared to Llama's usual 32k? In short, is it really worth going beyond GPT-4's 100k?
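For what it's worth, the compression half of the question is easy to check empirically. A rough sketch, assuming you can load both tokenizers from Hugging Face transformers (the model IDs and `sample.txt` are placeholders, and both repos are gated, so you may need to authenticate first):
```python
# Rough, illustrative comparison of tokenizer compression on your own text.
from transformers import AutoTokenizer

text = open("sample.txt").read()  # any representative corpus

gemma_tok = AutoTokenizer.from_pretrained("google/gemma-7b")           # ~256k vocab
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # 32k vocab

n_gemma = len(gemma_tok.encode(text))
n_llama = len(llama_tok.encode(text))

print(f"Gemma tokens: {n_gemma}, Llama 2 tokens: {n_llama}")
print(f"Fewer tokens with the larger vocab: {1 - n_gemma / n_llama:.1%}")
```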
May I ask what the RAM requirement is for running the 2B model on CPU on an average consumer Windows laptop? I have 16 GB of RAM but I am seeing a CPU/memory traceback. I'm using the transformers implementation.
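For context, my rough math: ~2.5B parameters at 4 bytes each (float32, the transformers default unless you ask otherwise) is about 10 GB of weights before activations, while bfloat16 halves that. A minimal loading sketch under that assumption (model ID is the standard google/gemma-2b checkpoint; adjust to whichever one you downloaded):
```python
# Minimal sketch, assuming the standard transformers API; bfloat16 keeps the
# ~2.5B weights around 5 GB instead of ~10 GB in float32.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half the memory of the float32 default
    low_cpu_mem_usage=True,       # avoid a second full copy during loading
)

inputs = tokenizer("Hello", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```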
Hi! This is such an exciting release. Congratulations!
I work on Ollama and used the provided GGUF files to quantize the model. As mentioned by a few people here, the 4-bit integer quantized models (which Ollama defaults to) seem to have strange output with non-existent words and funny use of whitespace.
Do you have a link/reference as to how the models were converted to GGUF format? And is it expected that quantizing the models might cause this issue?
> We are really excited to answer any questions you may have about our models.
I cannot count how many times I've seen similar posts on HN, followed by tens of questions from other users, three of which actually get answered by the OP. This one seems to be no exception so far.
It is a pretty clean release! I had some 500 issues with Kaggle validating my license approval, so you might too, but after a few attempts I could access the model.
Will this be available as a Vertex AI foundational model like Gemini 1.0, without deploying a custom endpoint? Any info on pricing? (Also, when will Gemini 1.5 be available on Vertex?)
out of curiosity, why is this a "terms" and not a license? I'm used to reading and understanding the software as coming with a license to use it. Do the terms give us license to use this explicitly?
They do, but unlike a known license, these terms are custom and non-standard. Which means I would guide my commercial clients away from this particular model.
I find the snide remarks around open source in the paper and announcement rather off-putting.
As the ecosystem evolves, we urge the corporate AI community to move beyond demanding to be taken seriously as a player in open source for models that are not actually open, and avoid preaching with a PR statement that can be interpreted as uninformed at best or malicious at worst.
It would be great to understand what you mean by this -- we have a deep love for open source and the open developer ecosystem. Our open source team also released a blog today describing the rationale and approach for open models and continuing AI releases in the open ecosystem:
If you truly love Open Source, you should update the language you use to describe your models so it doesn't mislead people into thinking it has something to do with Open Source.
Despite being called "Open", the Gemma weights are released under a license that is incompatible with the Open Source Definition. It has more in common with Source-Available Software, and as such it should be called a "Weights-Available Model".
Open source is not defined as strictly as you are suggesting. If you wish to have a stricter definition, a new term should probably be used. I believe I've heard it referred to as libre software in the past.
"Open Source Software" always refers to software that meets the Open Source Definition. "Libre Software" always refers to software that meets the Free Software Definition. In practice the two are often identical, hence the abbreviations "FOSS" (Free and Open Source Software) and "FLOSS" (Free/Libre and Open Source Software).
Although I don't know Google's motivation for using "Open" to describe proprietary model weights, the practical result is increasing confusion about Open Source Software. It's behavior that benefits any organization wanting to enjoy the good image of the Open Source Software community while not actually caring about that community at all.
If, on the Llama 2 version release date, the monthly active users [...] is greater than 700 million monthly active users [...] you are not authorized to exercise any of the rights under this Agreement
I would guess this is Google being careful to not be burned by this lame clause in the Llama 2 license.
It's aimed directly at them (and OpenAI and Microsoft) so they have to honor it if they don't want a legal battle. But there's nothing stopping others from doing benchmarking.
For the reference of people seeing this now: The tweet that person linked has now been deleted and the scientist who tweeted it has acknowledged they were wrong and retracted their claim, as all good scientists should.
The snide remarks at Meta's Llama license (which doesn't allow companies with over 700 million monthly active users to use it), while this model also doesn't have a really 'open' license itself; and also this paragraph:
>As the ecosystem evolves, we urge the wider AI community to move beyond simplistic ’open vs. closed’ debates, and avoid either exaggerating or minimising potential harms, as we believe a nuanced, collaborative approach to risks and benefits is essential. At Google DeepMind we’re committed to developing high-quality evaluations and invite the community to join us in this effort for a deeper understanding of AI systems.
Well, given that that restriction added to the meta-llama license is aimed at Google, is petty, and goes against open source norms, I think it’s reasonable that they should feel this way about it.
Ah, thanks for clarifying! It's a good flag, though I wouldn't classify it as a snide comment personally. I'd be interested in hearing what you find snide or offensive about it -- do you think we shouldn't be trying to bring the whole community along for evals/safety/etc, regardless of open/closed?
Opinions are our own and not of Google DeepMind.