I’ve been using llama.cpp, gpt-llama and chatbot-ui for a while now, and I’m very happy with them. However, I’m now looking into a more stable setup that runs only on the GPU. Is llama.cpp still a good candidate for that?

  • Hudsonius@lemmy.ml
    1 year ago

    GPTQ-for-LLaMa with oobabooga works pretty well. I’m not sure how much it uses the CPU, but my GPU sits at 100% during inference, so it seems to run mainly on the GPU.

    • bia@lemmy.mlOP
      1 year ago

      I’ve looked at that before. Do you use it with any UI?

      • Hudsonius@lemmy.ml
        1 year ago

        Yeah, it’s called Text Generation Web UI. If you check out the oobabooga Git repo, it goes into good detail. From what I can tell, it’s based on the AUTOMATIC1111 UI for Stable Diffusion.

        • dragonfyre13@sh.itjust.works
          1 year ago

          It’s using Gradio, which is what auto1111 also uses. Both of these are pretty heavy modifications/extensions that do a lot to push Gradio to its limits, but that’s the package being used in both. Note that it also has an API (check out the --api flag, I believe), and depending on what you want to do, there are various UIs that can hook into the Text Generation Web UI (oobabooga) API in various ways.
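
For anyone who wants to script against that API instead of going through a UI, here is a rough sketch of what a client could look like. Treat the details as assumptions rather than the documented interface: the port (5000), the endpoint path (`/api/v1/generate`), and the payload fields (`prompt`, `max_new_tokens`) are what I would expect from an older build started with `--api`, and they may differ in your version, so check the repo docs first. The helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumption: the web UI was launched with --api and listens locally on port 5000.
# The path below is a guess at an older endpoint layout; verify it for your version.
API_URL = "http://127.0.0.1:5000/api/v1/generate"


def build_request(prompt: str, max_new_tokens: int = 200,
                  url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request asking for a completion."""
    # Assumed payload fields; other sampling parameters are left at server defaults.
    payload = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the server's JSON response unmodified."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `generate("Tell me a joke about GPUs.")` would return whatever JSON the server sends back. The sketch deliberately does not pick fields out of the response, since its shape depends on the version you are running.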