• brucethemoose@lemmy.world
      2 days ago

      Some of the distillations are trained on top of Qwen 2.5.

      And in certain niches, FuseAI (a special merge of several thinking models), Qwen Coder, EVA-Gutenberg Qwen, or other specialized models do a better job than DeepSeek 32B.