Miqu was (allegedly) an internal continued pretrain that Mistral did as a test, which was then leaked as a GGUF.
Maybe it's just semantics, and it is technically a finetune... but to me there's a big difference between expensive "continuation training" (like Solar 10.7B or Mistral's 70B, i.e. Miqu) and a much less intense finetune. The former is almost like releasing a whole new base model.
It would be awesome if Mistral did that with their data, but that's very different from releasing a Gemma Instruct finetune.
There's typically a difference in learning rate between a "continued pretrain" and a "finetune." I don't have the details on Miqu, but I was merely trying to say that Mistral could produce a better version of these models than the OSS community might. If the size of the corpora they use means we're no longer in fine-tuning territory, then okay.
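To make that distinction concrete, here is a minimal sketch of the two regimes using Hugging Face's `TrainingArguments`. Every number below is an illustrative ballpark for a 7B-70B class model, not Mistral's actual recipe:

```python
from transformers import TrainingArguments

# Continued pretraining: near-pretraining learning rate, one pass over a
# very large corpus (tens to hundreds of billions of tokens), large
# effective batch size. All values here are illustrative assumptions.
continued_pretrain_args = TrainingArguments(
    output_dir="continued-pretrain",
    learning_rate=1e-4,              # close to the original pretraining LR
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    num_train_epochs=1,              # single epoch over a huge corpus
    per_device_train_batch_size=8,
    gradient_accumulation_steps=64,  # pretraining-scale effective batch
)

# Fine-tuning: much smaller learning rate, a few epochs over a small
# instruction dataset (millions of tokens). Again, illustrative numbers.
finetune_args = TrainingArguments(
    output_dir="finetune",
    learning_rate=2e-5,              # roughly 5-10x lower than above
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
)
```

The point is the combination of the two knobs: continued pretraining runs at a near-pretraining learning rate over an enormous corpus, which meaningfully reshapes the base model, while a finetune uses a much lower LR over a small dataset and mostly just steers it.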