Thank you very much for releasing these models! It's great to see Google enter the battle with a strong hand.
I'm wondering if you're able to provide any insight into the below hyperparameter decisions in Gemma's architecture, as they differ significantly from what we've seen with other recent models?
* On the 7B model, the `d_model` (3072) is smaller than `num_heads * d_head` (16*256=4096). I don't know of any other model where these numbers don't match (see the quick shape sketch at the end of this comment).
* The FFN expansion factor of 16x is MUCH higher than the Llama-2-7B's 5.4x, which itself was chosen to be equi-FLOPS with PaLM's 4x.
* The vocab is much larger - 256k, where most small models use 32k-64k.
* GQA is only used on the 2B model, where we've seen other models prefer to save it for larger models.
These observations are in no way meant to be criticism - I understand that Llama's hyperparameters are also somewhat arbitrarily inherited from its predecessors like PaLM and GPT-2, and that it's non-trivial to run hyperopt on such large models. I'm just really curious about what findings motivated these choices.
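To make the first point concrete, here's a minimal PyTorch sketch (purely illustrative, not Gemma's actual code) of what decoupling `d_model` from `num_heads * d_head` looks like: the query/key/value projections map the 3072-dim residual stream into a wider 4096-dim attention space, and the output projection maps back.
```python
# Purely illustrative sketch (not Gemma's code): the projections in and out of
# attention when d_model != num_heads * d_head, as in the 7B config above.
import torch
import torch.nn as nn

d_model, num_heads, d_head = 3072, 16, 256   # 16 * 256 = 4096 != 3072

q_proj = nn.Linear(d_model, num_heads * d_head, bias=False)  # 3072 -> 4096
o_proj = nn.Linear(num_heads * d_head, d_model, bias=False)  # 4096 -> 3072

x = torch.randn(1, 8, d_model)   # (batch, seq, d_model) residual stream
q = q_proj(x)                    # attention operates in the wider 4096-dim space
y = o_proj(q)                    # projected back to the 3072-dim residual stream
print(y.shape)                   # torch.Size([1, 8, 3072])
```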
Good idea. I've confirmed all the leadership / tech leads listed on page 12 are still at Google.
Can someone with a Twitter account call out the tweet linked above and ask them specifically who they are referring to? Seems there is no evidence of their claim.
It's also possible Google removed the names of people who left. It's not really a research paper, more a marketing piece, so they might do that (I don't think they would with a conference paper).
We'll see if the person making this claim responds with specific Gemma developers that have left. Otherwise, I think it's safe to assume they are just lying.
EDIT: it seems this is likely an Ollama bug, please keep that in mind for the rest of this comment :)
I ran Gemma in Ollama and noticed two things. First, it is slow: Gemma got less than 40 tok/s while Llama 2 7B got over 80 tok/s. Second, it is very bad at output generation. I said "hi", and it responded with this:
```
Hi, . What is up? melizing with you today!
What would you like to talk about or hear from me on this fine day??
```
With longer and more complex prompts it goes completely off the rails. Here's a snippet from its response to "Explain how to use Qt to get the current IP from https://icanhazip.com":
``` python
print( "Error consonming IP arrangration at [local machine's hostname]. Please try fufing this function later!") ##
guanomment messages are typically displayed using QtWidgets.MessageBox
```
Do you see similar results on your end or is this just a bug in Ollama? I have a terrible suspicion that this might be a completely flawed model, but I'm holding out hope that Ollama just has a bug somewhere.
Not a question, but thank you for your hard work! Also, brave of you to join the HN comments, I appreciate your openness. Hope y'all get to celebrate the launch :)
Yes - they are open weights and open inference code, which means they can be integrated into Ollama.
They are not “open training” (either in the training code or training data sense), so they are not reproducible, which some have suggested ought to be a component of the definition of open models.
It really should, shouldn't it? I'm quite ML-naïve, but surely providing the model without 'training code or training data' is just like providing a self-hostable binary without the source code? Nobody calls that open source; it's not even source available.
It is widely believed (and in some cases acknowledged) that a lot of models are trained on copyrighted data scraped from the web. In some cases, even scrapes of ebook piracy websites - google 'books3' to learn more.
Some companies (such as those working on AI) believe this is legal, others (such as the copyright holders to those books) believe it isn't.
In any case, IMHO it's unlikely any cutting edge models will be offering us their training data any time soon.
I'm not complaining, I'm unlikely ever to use it (regardless of how open or not it is) so it doesn't really matter to me, just surprised to learn what people mean by 'open' in this context.
Yes - similar to the Llama models, you'll also need to accept the license to download them officially. But the Llama models have been unofficially downloadable without accepting the license for quite a while, so it's probably just a matter of time.
Yes, models can be downloaded locally. In addition to the Python NN frameworks and ggml as options, we also implemented a standalone C++ implementation that you can run locally at https://github.com/google/gemma.cpp
Mistral weights are released under an Apache 2.0 license, but Llama 2 weights are released under a proprietary license that prohibits use by large organizations and imposes usage restrictions, violating terms 5 and 6 of the Open Source Definition[0]. Even if you accept that a model with a proprietary training dataset and proprietary training code can be considered "open source", there's no way Llama 2 qualifies.
For consistency with existing definitions[1], Llama 2 should be labeled a "weights available" model.
> We all know that Google thinks that saying that 1800s English kings were white is "harmful".
If you know how to make "1800s english kings" show up as white 100% of the time without also making "kings" show up as white 100% of the time, maybe you should apply to Google? Clearly you must have advanced knowledge on how to perfectly remove bias from training distributions if you casually throw stones like this.
It has no problem with other cultures and ethnicities, yet somehow white or Japanese just throws everything off?
I suppose 'bias' is the new word for "basic historic accuracy". I can get curious about other peoples without forcibly promoting them at the expense of my own Western and British people and culture. This 'anti-bias' keyword injection is a laughably bad, in-your-face solution to a non-issue.
I lament the day 'anti-bias' AI this terrible is used to make real world decisions. At least we now know we can't trust such a model because it has already been so evidently crippled by its makers.
Not sure why you're getting downvoted. I would have thought HN of all places would recognize the power and value of OSI licensing and the danger of the proliferation of these source available but definitely not Open Source licenses.
Great question - we compare to the Mistral 7B 0.1 pretrained models (since there were no pretrained checkpoint updates in 0.2) and the Mistral 7B 0.2 instruction-tuned models in the technical report here: https://goo.gle/GemmaReport
Does this model also think Germans were black 200 years ago? Or is it afraid to answer basic stuff? Because if that's the case, no one will care about this model.
We're at the basic-knowledge level here; if your RAG relies on any of it, you can get bad results too. Anyway, would you use a model that makes nonsense responses like this, or one that doesn't? I know which one I'd prefer for sure...
If this was better at specific RAG or coding performance I would absolutely, certainly without a doubt use it over a general instruct model in those instances.
People getting so used to being manipulated and lied to that they don't even bother anymore is a huge part of the problem. But sure, do what suits you the best.
I don't know anything about these twitter accounts so I don't know how credible they are, but here are some examples for your downvoters, who I'm guessing think you're just trolling or grossly exaggerating:
Yea. Just ask it anything about historical people/cultures and it will seemingly lobotomize itself.
I asked it about early Japan and it talked about how European women used katanas and how Native Americans rode across the grassy plains carrying traditional Japanese weapons. Pure made-up nonsense that not even primitive models would produce. Not sure what they did to it. I asked it why it assumed Native Americans were in Japan in the 1100s and it said:
> I assumed [...] various ethnicities, including Indigenous American, due to the diversity present in Japan throughout history. However, this overlooked [...] I focused on providing diverse representations without adequately considering the specific historical context.
How am I supposed to take this seriously? Especially on topics I'm unfamiliar with?
> they insert random keywords into the prompts to counter bias; that got revealed with something else, I think. It had T-shirts with "diverse" written on them as an artifact
This was exposed as being the case with OpenAI's DALL-E as well - someone had typed a prompt of "Homer Simpson wearing a namebadge" and it generated an image of Homer with brown skin wearing a namebadge that said 'ethnically ambiguous'.
This is ludicrous - if they are fiddling with your prompt in this way, it will only stoke more frustration and resentment - achieving the opposite of why this has been implemented. Surely if we want diversity we will ask for it, but sometimes you don't, and that should be at the user's discretion.
I hope y'all consider longer context models as well.
Also, are y'all looking at alternative architectures like Mamba? Being "first" with a large Mamba model would cement your architectural choices/framework support like Llama did for Meta.
This would be really interesting in my opinion, but we are not releasing datasets at this time. See the C4 dataset for an earlier open dataset from Google.
It's cool that you guys are able to release open stuff; that must be a nice change from the modus operandi at goog. I'll have to double check, but it looks like Phi-2 beats your models in some cases while being smaller. I'm guessing the value proposition of these models is being small and good while also having more knowledge baked in?
We deeply respect the Phi team and all other teams in the open model space. You’ll find that different models have different strengths and not all can be quantified with existing public evals. Take them for a spin and see what works for you.
Any reason you decided to go with a token vocabulary size of 256k? Smaller vocab sizes, like the ~16-32k most models of this size use, are much easier to work with. Would love to understand the technical reasoning here, which unfortunately isn't detailed in the report :(.
I'm not sure if this was mentioned in the paper somewhere, but how much does the super large 256k tokenizer vocabulary influence inference speed, and how much higher is the average text compression compared to Llama's usual 32k? In short, is it really worth going beyond GPT-4's 100k?
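For what it's worth, the compression half of the question is easy to check empirically. A rough sketch, assuming you can load both tokenizers from Hugging Face transformers (the model IDs and `sample.txt` are placeholders, and both repos are gated, so you may need to authenticate first):
```python
# Rough, illustrative comparison of tokenizer compression on your own text.
from transformers import AutoTokenizer

text = open("sample.txt").read()  # any representative corpus

gemma_tok = AutoTokenizer.from_pretrained("google/gemma-7b")           # ~256k vocab
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # 32k vocab

n_gemma = len(gemma_tok.encode(text))
n_llama = len(llama_tok.encode(text))

print(f"Gemma tokens: {n_gemma}, Llama 2 tokens: {n_llama}")
print(f"Fewer tokens with the larger vocab: {1 - n_gemma / n_llama:.1%}")
```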
May I ask what the RAM requirement is for running the 2B model on CPU on an average consumer Windows laptop? I have 16 GB of RAM but I am seeing a CPU/memory traceback. I'm using the transformers implementation.
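For context, my rough math: ~2.5B parameters at 4 bytes each (float32, the transformers default unless you ask otherwise) is about 10 GB of weights before activations, while bfloat16 halves that. A minimal loading sketch under that assumption (model ID is the standard google/gemma-2b checkpoint; adjust to whichever one you downloaded):
```python
# Minimal sketch, assuming the standard transformers API; bfloat16 keeps the
# ~2.5B weights around 5 GB instead of ~10 GB in float32.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half the memory of the float32 default
    low_cpu_mem_usage=True,       # avoid a second full copy during loading
)

inputs = tokenizer("Hello", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```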
Hi! This is such an exciting release. Congratulations!
I work on Ollama and used the provided GGUF files to quantize the model. As mentioned by a few people here, the 4-bit integer quantized models (which Ollama defaults to) seem to have strange output with non-existent words and funny use of whitespace.
Do you have a link/reference as to how the models were converted to GGUF format? And is it expected that quantizing the models might cause this issue?
> We are really excited to answer any questions you may have about our models.
I cannot count how many times I've seen similar posts on HN, followed by tens of questions from other users, three of which actually get answered by the OP. This one seems to be no exception so far.
It is a pretty clean release! I had some 500 issues with Kaggle validating my license approval, so you might too, but after a few attempts I could access the model.
Will this be available as a Vertex AI foundational model like Gemini 1.0, without deploying a custom endpoint? Any info on pricing? (Also, when will Gemini 1.5 be available on Vertex?)
out of curiosity, why is this a "terms" and not a license? I'm used to reading and understanding the software as coming with a license to use it. Do the terms give us license to use this explicitly?
They do, but unlike a known license, these terms are custom and non-standard. Which means I would guide my commercial clients away from this particular model.
I find the snide remarks around open source in the paper and announcement rather off-putting.
As the ecosystem evolves, we urge the corporate AI community to move beyond demanding to be taken seriously as a player in open source for models that are not actually open, and avoid preaching with a PR statement that can be interpreted as uninformed at best or malicious at worst.
It would be great to understand what you mean by this -- we have a deep love for open source and the open developer ecosystem. Our open source team also released a blog today describing the rationale and approach for open models and continuing AI releases in the open ecosystem:
If you truly love Open Source, you should update the language you use to describe your models so it doesn't mislead people into thinking it has something to do with Open Source.
Despite being called "Open", the Gemma weights are released under a license that is incompatible with the Open Source Definition. It has more in common with Source-Available Software, and as such it should be called a "Weights-Available Model".
Open source is not defined as strictly as you are suggesting. If you wish to have a stricter definition, a new term should probably be used. I believe I've heard it referred to as libre software in the past.
"Open Source Software" always refers to software that meets the Open Source Definition. "Libre Software" always refers to software that meets the Free Software Definition. In practice the two are often identical, hence the abbreviations "FOSS" (Free and Open Source Software) and "FLOSS" (Free/Libre and Open Source Software).
Although I don't know Google's motivation for using "Open" to describe proprietary model weights, the practical result is increasing confusion about Open Source Software. It's behavior that benefits any organization wanting to enjoy the good image of the Open Source Software community while not actually caring about that community at all.
If, on the Llama 2 version release date, the monthly active users [...] is greater than 700 million monthly active users [...] you are not authorized to exercise any of the rights under this Agreement
I would guess this is Google being careful to not be burned by this lame clause in the Llama 2 license.
It's aimed directly at them (and OpenAI and Microsoft) so they have to honor it if they don't want a legal battle. But there's nothing stopping others from doing benchmarking.
For the reference of people seeing this now: The tweet that person linked has now been deleted and the scientist who tweeted it has acknowledged they were wrong and retracted their claim, as all good scientists should.
The snide remarks at Meta's Llama license (which doesn't allow companies with over 700 million monthly active users to use it), while this model also doesn't have a really 'open' license itself; and also this paragraph:
>As the ecosystem evolves, we urge the wider AI community to move beyond simplistic ’open vs. closed’ debates, and avoid either exaggerating or minimising potential harms, as we believe a nuanced, collaborative approach to risks and benefits is essential. At Google DeepMind we’re committed to developing high-quality evaluations and invite the community to join us in this effort for a deeper understanding of AI systems.
Well, given that that restriction added to the meta-llama license is aimed at Google, is petty, and goes against open source norms, I think it’s reasonable that they should feel this way about it.
Ah, thanks for clarifying! It's a good flag, though I wouldn't classify it as a snide comment personally. I'd be interested in hearing what you find snide or offensive about it -- do you think we shouldn't be trying to bring the whole community along for evals/safety/etc, regardless of open/closed?
Opinions are our own and not of Google DeepMind.