
Hello on behalf of the Gemma team! We are really excited to answer any questions you may have about our models.

Opinions are our own and not those of Google DeepMind.



Thank you very much for releasing these models! It's great to see Google enter the battle with a strong hand.

I'm wondering if you're able to provide any insight into the below hyperparameter decisions in Gemma's architecture, as they differ significantly from what we've seen with other recent models?

* On the 7B model, the `d_model` (3072) is smaller than `num_heads * d_head` (16*256=4096). I don't know of any other model where these numbers don't match (see the sketch at the end of this comment for what that implies for the attention projections).

* The FFN expansion factor of 16x is MUCH higher than the Llama-2-7B's 5.4x, which itself was chosen to be equi-FLOPS with PaLM's 4x.

* The vocab is much larger - 256k, where most small models use 32k-64k.

* GQA is only used on the 2B model, where we've seen other models prefer to save it for larger models.

These observations are in no way meant to be criticism - I understand that Llama's hyperparameters are also somewhat arbitrarily inherited from its predecessors like PaLM and GPT-2, and that it's non-trivial to run hyperopt on such large models. I'm just really curious about what findings motivated these choices.
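
For concreteness, here is a minimal PyTorch sketch (my own illustration, not Gemma's actual code; the module and parameter names are made up) of what the first point implies: with d_model=3072, num_heads=16, and d_head=256, the concatenated heads live in a 4096-dim space and the output projection simply maps them back down to the 3072-dim residual stream.

    import torch
    import torch.nn as nn

    class DecoupledHeadAttention(nn.Module):
        # Hypothetical module: num_heads * d_head (4096) deliberately != d_model (3072).
        def __init__(self, d_model=3072, num_heads=16, d_head=256):
            super().__init__()
            self.num_heads, self.d_head = num_heads, d_head
            inner = num_heads * d_head                           # 4096, larger than d_model
            self.q_proj = nn.Linear(d_model, inner, bias=False)
            self.k_proj = nn.Linear(d_model, inner, bias=False)
            self.v_proj = nn.Linear(d_model, inner, bias=False)
            self.o_proj = nn.Linear(inner, d_model, bias=False)  # 4096 -> 3072

        def forward(self, x):                                    # x: (batch, seq, d_model)
            b, s, _ = x.shape
            def split(t):                                        # (b, s, 4096) -> (b, heads, s, d_head)
                return t.view(b, s, self.num_heads, self.d_head).transpose(1, 2)
            q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
            out = nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
            out = out.transpose(1, 2).reshape(b, s, -1)          # back to (b, s, 4096)
            return self.o_proj(out)                              # project to (b, s, 3072)

Nothing breaks mathematically; it just spends extra parameters and FLOPs in the projections compared with tying num_heads * d_head to d_model.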


I would love answers to these questions too, particularly on the vocab size


Is there any truth behind this claim that folks who worked on Gemma have left Google?

https://x.com/yar_vol/status/1760314018575634842


I confirmed all the folks listed on page 12 are still at Google (listed below). I am guessing the linked tweet is a BS claim.

   # Product Management
   Tris Warkentin
   Ludovic Peran

   # Program Management
   Minh Giang

   # Executive Sponsors
   Clement Farabet
   Oriol Vinyals
   Jeff Dean
   Koray Kavukcuoglu
   Demis Hassabis
   Zoubin Ghahramani
   Douglas Eck
   Joelle Barral
   Fernando Pereira
   Eli Collins

   # Leads
   Armand Joulin
   Noah Fiedel
   Evan Senter

   # Tech Leads
   Alek Andreev†
   Kathleen Kenealy†


Always funny to see your own name as you scroll through HN comments =). You're right, though!


It seems very easy to check, no? Look at the names in the paper and check where they are working now.


Good idea. I've confirmed all the leadership / tech leads listed on page 12 are still at Google.

Can someone with a Twitter account call out the tweet linked above and ask them specifically who they are referring to? There seems to be no evidence for their claim.


It's also possible Google removed names of people who left. It's not really a research paper, more a marketing piece, so it might be possible (I don't think they would do that with a conf paper)


We'll see if the person making this claim responds with specific Gemma developers that have left. Otherwise, I think it's safe to assume they are just lying.


Them: here to answer questions

Question

Them: :O


To be fair, I think they are in London, so I assume they have wound down for the day. Will probably have to wait ~12-18 hours for a response.


To be fair, the tweet says that they don't work on the models at Google anymore, not that they have left Google.

Might be true, might not be. It's unsourced speculation.


EDIT: it seems this is likely an Ollama bug, please keep that in mind for the rest of this comment :)

I ran Gemma in Ollama and noticed two things. First, it is slow. Gemma got less than 40 tok/s while Llama 2 7B got over 80 tok/s. Second, it is very bad at output generation. I said "hi", and it responded this:

``` Hi, . What is up? melizing with you today!

What would you like to talk about or hear from me on this fine day?? ```

With longer and more complex prompts it goes completely off the rails. Here's a snippet from its response to "Explain how to use Qt to get the current IP from https://icanhazip.com":

``` python print( "Error consonming IP arrangration at [local machine's hostname]. Please try fufing this function later!") ## guanomment messages are typically displayed using QtWidgets.MessageBox ```

Do you see similar results on your end or is this just a bug in Ollama? I have a terrible suspicion that this might be a completely flawed model, but I'm holding out hope that Ollama just has a bug somewhere.


I was going to try these models with Ollama. Did you use a small number of bits/quantization?


The problem exists with the default 7B model. I don't know if different quantizations would fix the problem. The 2B model is fine, though.


Not a question, but thank you for your hard work! Also, brave of you to join the HN comments, I appreciate your openness. Hope y'all get to celebrate the launch :)


Will there be Gemma-vision models or multimodal Gemma models?


We have many exciting things planned that we can't reveal just yet :)


Have the same question.


It seems you have exposed the internal debugging tool link in the blog post. You may want to do something about it.


Ah, I see -- the link is wrong, thank you for flagging! Fixing now.


The blog post shares the link for the debugging tool as https://*.*.corp.google.com/codelabs/responsible-ai/lit-gemm...

The .corp domain and the login redirect make me believe it was supposed to be an internal link.



Same for the “safety classifier”


The link to the debugging tool is an internal one, no one outside Google can access it


The link in the Debugging section redirects to a Google SSO login page


Will these soon be available on lmsys for human comparison against other models? Can they run with llama.cpp?



I came here wondering if these models are "open" in the sense that they'll show up on sites like Ollama where you can download and run them locally.

Am I correct to conclude that this means they eventually will?

It's unclear to me from Google's docs exactly what "open" means for Gemma.


Yes - they are open weights and open inference code, which means they can be integrated into Ollama.

They are not “open training” (either in the training code or training data sense), so they are not reproducible, which some have suggested ought to be a component of the definition of open models.


It really should, shouldn't it? I'm quite ML-naïve, but surely providing the model without 'training code or training data' is just like providing a self-hostable binary without the source code? Nobody calls that open source; it's not even source available.


It is widely believed (and in some cases acknowledged) that a lot of models are trained on copyrighted data scraped from the web. In some cases, even scrapes of ebook piracy websites - google 'books3' to learn more.

Some companies (such as those working on AI) believe this is legal, others (such as the copyright holders to those books) believe it isn't.

In any case, IMHO it's unlikely any cutting edge models will be offering us their training data any time soon.


Can training data be generated from an LLM, with the right prompt?


That’s why they’re called open as in free to use how you wish, not open source where the source of the training is also provided.


But my point is that there's nothing analogous that we call open? It's like self-hostable, or free (as in beer).


That’s a fair comment, maybe free-to-use is more appropriate.


Yes, and there has been some discussion of that

Meta’s LLaMa 2 license is not Open Source https://news.ycombinator.com/item?id=36820122


Man, people will find anything to complain about.


I'm not complaining, I'm unlikely ever to use it (regardless of how open or not it is) so it doesn't really matter to me, just surprised to learn what people mean by 'open' in this context.


https://huggingface.co/google/gemma-7b-it/tree/main

yes, similar to the llama models, you'll also need to accept the license to download them officially. But the llama models have been unofficially downloadable without accepting the license for quite a while, so it's probably just a matter of time.


Can the Gemma models be downloaded to run locally, like the open-source models Llama 2, Mistral, etc.?

Or is your definition of "open" different?


Yes, the models can be downloaded and run locally. In addition to the Python NN frameworks and ggml as options, we also implemented a standalone C++ implementation that you can run locally at https://github.com/google/gemma.cpp


Yes, you can get started downloading the model and running inference on Kaggle: https://www.kaggle.com/models/google/gemma ; for a full list of ways to interact with the model, you can check out https://ai.google.dev/gemma.


Can we have llamafile releases as well?

https://github.com/Mozilla-Ocho/llamafile


A small typo in your model link that breaks it. There’s an extra ; on the end.


Corrected - thanks :)


It should be possible to run it via llama.cpp[0] now.

[0] https://github.com/ggerganov/llama.cpp/pull/5631


Amazing how quickly this happened.


Mistral weights are released under an Apache 2.0 license, but Llama 2 weights are released under a proprietary license that prohibits use by large organizations and imposes usage restrictions, violating terms 5 and 6 of the Open Source Definition[0]. Even if you accept that a model with a proprietary training dataset and proprietary training code can be considered "open source", there's no way Llama 2 qualifies.

For consistency with existing definitions[1], Llama 2 should be labeled a "weights available" model.

[0] https://en.wikipedia.org/wiki/The_Open_Source_Definition

[1] https://en.wikipedia.org/wiki/Source-available_software


Their definition of "open" is "not open", i.e. you're only allowed to use Gemma in "non-harmful" way.

We all know that Google thinks that saying that 1800s English kings were white is "harmful".


> We all know that Google thinks that saying that 1800s English kings were white is "harmful".

If you know how to make "1800s english kings" show up as white 100% of the time without also making "kings" show up as white 100% of the time, maybe you should apply to Google? Clearly you must have advanced knowledge on how to perfectly remove bias from training distributions if you casually throw stones like this.


Tell me you take this seriously: https://twitter.com/napoleon21st/status/1760116228746805272

It has no problem with other cultures and ethnicities, yet somehow white or Japanese just throws everything off?

I suppose 'bias' is the new word for "basic historic accuracy". I can get curious about other peoples without forcibly promoting them at the expense of my own Western and British people and culture. This 'anti-bias' keyword injection is a laughably bad, in-your-face solution to a non-issue.

I lament the day 'anti-bias' AI this terrible is used to make real world decisions. At least we now know we can't trust such a model because it has already been so evidently crippled by its makers.


Not sure why you're getting downvoted. I would have thought HN of all places would recognize the power and value of OSI licensing and the danger of the proliferation of these source available but definitely not Open Source licenses.


How are these performing so well compared to Llama 2? Are there any documents on the architecture and differences? Is it MoE?

Also note some of the links in the blog post don't work, e.g. the debugging tool.


We've documented the architecture (including key differences) in our technical report here (https://goo.gle/GemmaReport), and you can see the architecture implementation in our Git Repo (https://github.com/google-deepmind/gemma).


Congrats on the launch and thanks for the contribution! This looks like it's on par with or better than Mistral 7B 0.1, or is that 0.2?

Are there plans for MoE or 70B models?


Great question - we compare to the Mistral 7B 0.1 pretrained models (since there were no pretrained checkpoint updates in 0.2) and the Mistral 7B 0.2 instruction-tuned models in the technical report here: https://goo.gle/GemmaReport


Does this model also think Germans were black 200 years ago? Or is it afraid to answer basic stuff? Because if this is the case, no one will care about that model.


I disagree, coding and RAG performance is all that matters to me. I'm not using an LLM to learn basic facts I already know.


We're at the basic-knowledge level; if your RAG relies on some of it, you can get bad results too. Anyway, would you use a model that makes this nonsense response or one that doesn't? I know which one I would prefer, for sure...


If this was better at specific RAG or coding performance I would absolutely, certainly without a doubt use it over a general instruct model in those instances.


People getting so used to being manipulated and lied to that they don't even bother anymore is a huge part of the problem. But sure, do what suits you the best.


How do you ragebait for premium pearl clutching?


I don't know anything about these twitter accounts so I don't know how credible they are, but here are some examples for your downvoters, who I'm guessing think you're just trolling or grossly exaggerating:

https://twitter.com/aginnt/status/1760159436323123632

https://twitter.com/Black_Pilled/status/1760198299443966382


Yea. Just ask it anything about historical people/cultures and it will seemingly lobotomize itself.

I asked it about early Japan and it talked about how European women used Katanas and how Native Americans rode across the grassy plains carrying traditional Japanese weapons. Pure made up nonsense that not even primitive models would get wrong. Not sure what they did to it. I asked it why it assumed Native Americans were in Japan in the 1100s and it said:

> I assumed [...] various ethnicities, including Indigenous American, due to the diversity present in Japan throughout history. However, this overlooked [...] I focused on providing diverse representations without adequately considering the specific historical context.

How am I supposed to take this seriously? Especially on topics I'm unfamiliar with?


From one of the Twitter threads linked above:

> they insert random keyword in the prompts randomly to counter bias, that got revealed with something else I think. Had T shirts written with "diverse" on it as artifact

This was exposed as being the case with OpenAI's DALL-E as well - someone had typed a prompt of "Homer Simpson wearing a namebadge" and it generated an image of Homer with brown skin wearing a namebadge that said 'ethnically ambiguous'.

This is ludicrous - if they are fiddling with your prompt in this way, it will only stoke more frustration and resentment - achieving the opposite of why this has been implemented. Surely if we want diversity we will ask for it, but sometimes you don't, and that should be at the user's discretion.

Another thread for context: https://twitter.com/napoleon21st/status/1760116228746805272


Do you have a plan of releasing higher parameter models?


We have many great things in research and development phases, so stay tuned. I'm hopeful we can share more in the coming weeks and months!


That is awesome!

I hope y'all consider longer context models as well.

Also, are y'all looking at alternative architectures like Mamba? Being "first" with a large Mamba model would cement your architectural choices/framework support like Llama did for Meta.


This doesn't answer the question at all


Training on 4096 v5es, how did you handle the crazy batch size? :o


Are there any plans for releasing the datasets used?


This would be really interesting in my opinion, but we are not releasing datasets at this time. See the C4 dataset for an earlier open dataset from Google.


It's cool that you guys are able to release open stuff, that must be a nice change from the modus operandi at goog. I'll have to double check, but it looks like phi-2 beats your performance in some cases while being smaller. I'm guessing the value proposition of these models is being small and good while also having more knowledge baked in?


We deeply respect the Phi team and all other teams in the open model space. You’ll find that different models have different strengths and not all can be quantified with existing public evals. Take them for a spin and see what works for you.


Hi alekandreev,

Any reason you decided to go with a token vocabulary size of 256k? Smaller vocab/vector sizes like most models of this size seem to be using (~16-32k) are much easier to work with. Would love to understand the technical reasoning here, which unfortunately isn't detailed in the report :(.


I'm not sure if this was mentioned in the paper somewhere, but how much does the super large 256k tokenizer vocabulary influence inference speed, and how much higher is the average text compression compared to Llama's usual 32k? In short, is it really worth going beyond GPT-4's 100k?
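
For what it's worth, the compression side is easy to measure yourself with something like the sketch below (assumes you've accepted both model licenses on Hugging Face; the model ids are the public Hub ones, and sample.txt is any representative text of your own):

    from transformers import AutoTokenizer

    text = open("sample.txt").read()                                     # your own representative corpus

    gemma = AutoTokenizer.from_pretrained("google/gemma-7b")             # ~256k vocab
    llama = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")    # 32k vocab

    n_gemma, n_llama = len(gemma.encode(text)), len(llama.encode(text))
    print(f"Gemma: {n_gemma} tokens, Llama 2: {n_llama} tokens, "
          f"ratio: {n_llama / n_gemma:.2f}x fewer tokens with the larger vocab")

The speed question is harder to eyeball, since the larger embedding and output matrices trade off against generating fewer tokens for the same text.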


May I ask what the RAM requirement is for running the 2B model on CPU on an average consumer Windows laptop? I have 16 GB of RAM but I am seeing a CPU/memory traceback. I'm using the transformers implementation.
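
For reference, this is roughly what I'm running (a minimal sketch; my understanding is that 2B parameters at 2 bytes each in bfloat16 is about 4-5 GB of weights, so 16 GB ought to be enough unless the float32 default load roughly doubles that):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("google/gemma-2b")
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-2b",
        torch_dtype=torch.bfloat16,    # halves weight memory vs the float32 default
        low_cpu_mem_usage=True,
    )
    inputs = tok("Hello", return_tensors="pt")
    print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))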


Hi, what is the cutoff date?


September 2023.


All it will tell me is mid-2018.


Hi! This is such an exciting release. Congratulations!

I work on Ollama and used the provided GGUF files to quantize the model. As mentioned by a few people here, the 4-bit integer quantized models (which Ollama defaults to) seem to have strange output with non-existent words and funny use of whitespace.

Do you have a link/reference as to how the models were converted to GGUF format? And is it expected that quantizing the models might cause this issue?

Thanks so much!


As a data point, using the Huggingface Transformers 4-bit quantization yields reasonable results: https://twitter.com/espadrine/status/1760355758309298421
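
A minimal sketch of that 4-bit Transformers path, in case it helps others reproduce (assumes a CUDA GPU plus the bitsandbytes and accelerate packages; I'm assuming the public instruction-tuned checkpoint id):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
    tok = AutoTokenizer.from_pretrained("google/gemma-7b-it")
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-7b-it", quantization_config=bnb, device_map="auto"
    )

    chat = [{"role": "user", "content": "hi"}]
    ids = tok.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt").to(model.device)
    print(tok.decode(model.generate(ids, max_new_tokens=40)[0], skip_special_tokens=True))

If the bitsandbytes 4-bit output looks fine but the GGUF 4-bit output doesn't, that would point at the conversion/quantization step rather than the weights themselves.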


> We are really excited to answer any questions you may have about our models.

I cannot count how many times I've seen similar posts on HN, followed by tens of questions from other users, three of which actually get answered by the OP. This one seems to be no exception so far.


Sorry, doing our best here :)


Thank you!


What are you talking about? The team is in this thread answering questions.


Only simple and convenient ones.


are there plans to release an official GGUF version to use with llama.cpp?


It is already part of the release on Huggingface: https://huggingface.co/google/gemma-7b/blob/main/gemma-7b.gg...

It is a pretty clean release! I had some 500 issues with Kaggle validating my license approval, so you might too, but after a few attempts I could access the model.


I didn't see this when searching, thanks


Will this be available as a Vertex AI foundational model like Gemini 1.0, without deploying a custom endpoint? Any info on pricing? (Also, when will Gemini 1.5 be available on Vertex?)


What is the license? I couldn’t find it on the 1P site or Kaggle.


You can find the terms on our website, ai.google.dev/gemma:

https://ai.google.dev/gemma/terms


out of curiosity, why is this a "terms" and not a license? I'm used to reading and understanding the software as coming with a license to use it. Do the terms give us license to use this explicitly?


They do, but unlike a known license, these terms are custom and non-standard. Which means I would guide my commercial clients away from this particular model.


What are the supported languages of these models?


This v1 model is focused on English support, but you may find some multilingual capabilities.


Can you share the training loss curve?


Will there be "extended context" releases like 01.ai did for Yi?

Also, is the model GQA?


It's MQA, documented in the tech report


I find the snide remarks around open source in the paper and announcement rather off-putting.

As the ecosystem evolves, we urge the corporate AI community to move beyond demanding to be taken seriously as a player in open source for models that are not actually open, and avoid preaching with a PR statement that can be interpreted as uninformed at best or malicious at worst.


It would be great to understand what you mean by this -- we have a deep love for open source and the open developer ecosystem. Our open source team also released a blog today describing the rationale and approach for open models and continuing AI releases in the open ecosystem:

https://opensource.googleblog.com/2024/02/building-open-mode...

Thoughts and feedback welcome, as always.


If you truly love Open Source, you should update the language you use to describe your models so it doesn't mislead people into thinking it has something to do with Open Source.

Despite being called "Open", the Gemma weights are released under a license that is incompatible with the Open Source Definition. It has more in common with Source-Available Software, and as such it should be called a "Weights-Available Model".


Open source is not defined as strictly as you are suggesting it is. If you wish to have a stricter definition, a new term should probably be used. I believe I've heard it referred to as libre software in the past.


"Open Source Software" always refers to software that meets the Open Source Definition. "Libre Software" always refers to software that meets the Free Software Definition. In practice the two are often identical, hence the abbreviations "FOSS" (Free and Open Source Software) and "FLOSS" (Free/Libre and Open Source Software).

Although I don't know Google's motivation for using "Open" to describe proprietary model weights, the practical result is increasing confusion about Open Source Software. It's behavior that benefits any organization wanting to enjoy the good image of the Open Source Software community while not actually caring about that community at all.


The statement that you are not able to use LLaMA 2 to benchmark is also false and highly misleading, see https://x.com/BlancheMinerva/status/1760302091166241163?s=20


    If, on the Llama 2 version release date, the monthly active users [...] is greater than 700 million monthly active users [...] you are not authorized to exercise any of the rights under this Agreement
I would guess this is Google being careful to not be burned by this lame clause in the Llama 2 license.


It's aimed directly at them (and OpenAI and Microsoft) so they have to honor it if they don't want a legal battle. But there's nothing stopping others from doing benchmarking.


For the reference of people seeing this now: The tweet that person linked has now been deleted and the scientist who tweeted it has acknowledged they were wrong and retracted their claim, as all good scientists should.


Working at google is like this, where no matter how much you try to do the right thing you're always under attack.


Which remarks are you referring to?


The snide remarks at Meta's Llama license that doesn't allow companies with more than 700 million monthly active users to use it, while this model also doesn't have a really 'open' license itself, and also this paragraph:

>As the ecosystem evolves, we urge the wider AI community to move beyond simplistic ’open vs. closed’ debates, and avoid either exaggerating or minimising potential harms, as we believe a nuanced, collaborative approach to risks and benefits is essential. At Google DeepMind we’re committed to developing high-quality evaluations and invite the community to join us in this effort for a deeper understanding of AI systems.


Well, given that that restriction added to the meta-llama license is aimed at Google, is petty, and goes against open source norms, I think it’s reasonable that they should feel this way about it.


How is this a snide remark? It's factual and prevented their team from benchmarking against Llama 2.


Quick question -- can you tell me where you got that quote? It's not in the main blog or any of the launch communications that I can see.



Ah, thanks for clarifying! It's a good flag, though I wouldn't classify it as a snide comment personally. I'd be interested in hearing what you find snide or offensive about it -- do you think we shouldn't be trying to bring the whole community along for evals/safety/etc, regardless of open/closed?



