Miqu was (allegedly) an internal continued pretrain that Mistral did as a test, which was then leaked as a GGUF.
Maybe it's just semantics; it is technically a finetune... But to me there's a big difference between expensive "continuation training" (like Solar 10.7B or Mistral 70B) and a much lighter finetune. The former is almost like releasing a whole new base model.
It would be awesome if Mistral did that with their data, but that's very different from releasing a Gemma Instruct finetune.
There’s typically a difference in learning rate between a ‘continued pretrain’ and a ‘fine-tune.’ I don’t have the details around Miqu, but was merely trying to say that Mistral could produce a better version of these models than the OSS community might. If the size of the corpora they use means we’re no longer in fine-tuning territory, then okay.
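To make that distinction concrete, here's a rough sketch of the two regimes people usually mean. The numbers are purely illustrative ballpark assumptions, not anything Mistral has published about Miqu:

```python
# Illustrative ballpark regimes only; not Mistral's actual setup.

continued_pretrain = {
    "learning_rate": 1e-4,   # re-warmed, close to original pretraining LR
    "tokens": 500e9,         # tens to hundreds of billions of tokens
    "data": "broad general corpus, same next-token objective as pretraining",
}

finetune = {
    "learning_rate": 2e-5,   # much smaller LR to avoid catastrophic forgetting
    "tokens": 1e9,           # millions to low billions of tokens
    "data": "narrow instruction/chat or domain-specific set",
}

# Roughly an order of magnitude apart on LR, and several on data volume,
# which is why the former feels closer to releasing a new base model.
```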