
TBH the community has largely outrun Mistral's own finetuning. The 7B model in particular is such a popular target because it's so practical to train.


Strong disagree - a Mistral fine-tune of Llama 70B was the top-performing Llama fine-tune. They have lots of data the community simply does not.


Miqu was (allegedly) an internal continued pretrain Mistral did as a test, which was leaked as a GGUF.

Maybe it's just semantics - it is technically a finetune... But to me there's a big difference between expensive "continuation training" (like Solar 10.7B or Mistral 70B) and a much less intense finetuning. The former is almost like releasing a whole new base model.

It would be awesome if Mistral did that with their data, but that's very different from releasing a Gemma Instruct finetune.


There’s typically a difference in LR between a ‘continued pretrain’ and a ‘fine-tune.’ I don’t have the details around Miqu, but was merely trying to say that Mistral could produce a better version of these models than the OSS community might. If the size of the corpora they use means we are no longer in fine-tuning territory, then okay.
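To make the LR point concrete, here's a rough sketch using Hugging Face TrainingArguments. The specific numbers and corpus sizes are illustrative assumptions on my part, not Mistral's actual recipe:

    from transformers import TrainingArguments

    # "Continued pretraining" style: all weights updated, LR close to the tail
    # end of the original pretraining schedule, one pass over a large corpus.
    continued_pretrain_args = TrainingArguments(
        output_dir="ckpt-continued",
        learning_rate=1e-4,          # illustrative; near pretraining-scale LR
        num_train_epochs=1,          # single pass over a large general corpus
        warmup_ratio=0.01,
        lr_scheduler_type="cosine",
    )

    # "Fine-tune" style: much smaller LR (or LoRA adapters), multiple passes
    # over a comparatively small instruction/task dataset.
    finetune_args = TrainingArguments(
        output_dir="ckpt-finetune",
        learning_rate=2e-5,          # illustrative; roughly an order of magnitude lower
        num_train_epochs=3,
        warmup_ratio=0.03,
        lr_scheduler_type="cosine",
    )

The point is just that the line between the two is mostly about LR, data volume, and compute, which is why a leaked continued pretrain looks a lot like a new base model.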


Arthur Mensch, the Mistral CEO, confirmed the leak. https://twitter.com/arthurmensch/status/1752737462663684344


Also, it led to one of the funniest PRs I've seen in a while:

https://huggingface.co/miqudev/miqu-1-70b/discussions/10


No shot. Mistral Medium's API outputs were virtually identical. Miqu really was Mistral Medium, which happened to be a continued pretrain.



