• brucethemoose@lemmy.world
    2 days ago

    Some of the distillations are trained on top of Qwen 2.5.

    And in certain niches, FuseAI (a special merge of several thinking models), Qwen Coder, EVA-Gutenberg Qwen, or other specialized models do a better job than DeepSeek 32B.