
TBH the community has largely outrun Mistral's own finetuning. The 7B model in particular is such a popular target because it's so practical to train.


Strong disagree - a Mistral fine-tune of Llama 70B was the top-performing Llama fine-tune. They have lots of data the community simply does not.


Miqu was (allegedly) an internal continued pretrain Mistral did as a test, which was leaked as a GGUF.

Maybe it's just semantics - it is technically a finetune... But to me there's a big difference between expensive "continuation training" (like Solar 10.7B or Mistral 70B) and a much less intense finetuning. The former is almost like releasing a whole new base model.

It would be awesome if Mistral did that with their data, but that's very different from releasing a Gemma Instruct finetune.


There’s typically a difference in LR between a ‘continued pretrain’ and a ‘fine-tune.’ I don’t have the details around Miqu, but was merely trying to say that Mistral could produce a better version of these models than the OSS community might. If the size of the corpora they use means we are no longer in fine-tuning territory, then okay.
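To make the LR point concrete, here's a rough sketch using Hugging Face TrainingArguments. The specific numbers and corpus sizes are illustrative assumptions on my part, not Mistral's actual recipe:

    from transformers import TrainingArguments

    # "Continued pretraining" style: all weights updated, LR close to the tail
    # end of the original pretraining schedule, one pass over a large corpus.
    continued_pretrain_args = TrainingArguments(
        output_dir="ckpt-continued",
        learning_rate=1e-4,          # illustrative; near pretraining-scale LR
        num_train_epochs=1,          # single pass over a large general corpus
        warmup_ratio=0.01,
        lr_scheduler_type="cosine",
    )

    # "Fine-tune" style: much smaller LR (or LoRA adapters), multiple passes
    # over a comparatively small instruction/task dataset.
    finetune_args = TrainingArguments(
        output_dir="ckpt-finetune",
        learning_rate=2e-5,          # illustrative; roughly an order of magnitude lower
        num_train_epochs=3,
        warmup_ratio=0.03,
        lr_scheduler_type="cosine",
    )

The point is just that the line between the two is mostly about LR, data volume, and compute, which is why a leaked continued pretrain looks a lot like a new base model.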


Arthur Mensch, the Mistral CEO, confirmed the leak. https://twitter.com/arthurmensch/status/1752737462663684344


Also, it led to one of the funniest PRs I've seen in a while:

https://huggingface.co/miqudev/miqu-1-70b/discussions/10


No shot. Mistral Medium's API outputs were virtually identical. Miqu really was Mistral Medium, which happened to be a continued pretrain.



