The first GPT-4-class AI model anyone can download has arrived: Llama 405B

sabreW4K3@lazysoci.al · 8 months ago

The first GPT-4-class AI model anyone can download has arrived: Llama 405B

DeprecatedCompatV2@programming.dev · 8 months ago

I’ve heard that performance improves offline. Is it possible to set a model loose on a project and let it iteratively work, or is there a better approach?

Mikina@programming.dev · edit-2 8 months ago

If you are interested in code completion, I recommend taking a look at https://refact.ai/. Hosting it (last time I tried) was almost painless. Setting up Docker to work with your GPU takes some time, but it is documented reasonably well on NVIDIA’s site, and after that you just run the container and it works.

It runs a server you can connect to, e.g. with a VSCode plugin, which will provide code completion or a chatbot (depending on what model you run). It also has an option to let it loose on your project: you set training hours, give it a git repo (or a zip file with the whole project), and it starts training, which should tailor it towards giving more relevant code completion in the context of the project. I’m not sure if you can do that for the chatbot models, though.
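If you go the zip file route, here is a minimal sketch of packing a project while skipping directories that probably won't help training. The `zip_project` helper and the `SKIP_DIRS` list are my own illustration, not anything Refact requires, so adjust them to your project.

```python
import zipfile
from pathlib import Path

# Assumed skip list (hypothetical, adjust to your project): directories that
# usually add noise rather than useful code for fine-tuning.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}

def zip_project(project_dir: str, archive_path: str) -> list[str]:
    """Write project files into a zip archive; return the archived relative paths."""
    root = Path(project_dir).resolve()
    archive = Path(archive_path).resolve()
    added = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside the project.
            if not path.is_file() or path == archive:
                continue
            rel = path.relative_to(root)
            if any(part in SKIP_DIRS for part in rel.parts):
                continue
            zf.write(path, rel.as_posix())
            added.append(rel.as_posix())
    return added
```

Calling `zip_project("my_project", "my_project.zip")` (both paths are placeholders) returns the list of files that went into the archive, which is handy for checking that nothing sensitive got packed.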

However, I was trying it on my spare gaming PC turned server, which has an unused NVIDIA GTX 1060, and while I could run some smaller models, I wasn’t able to get the training working: 6 GB of VRAM simply isn’t enough for that. I also tried running it on the PC I work on, but the container kept eating around 20-30 GB of RAM, which made it hard to do anything else on the PC.
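To check up front whether a card has a chance, a rough sketch like the one below can help. It reads total and used VRAM through `nvidia-smi` and estimates how much memory the weights alone would take. The helper names are hypothetical, and the estimate deliberately ignores activations, KV cache, and optimizer state, so real usage (training especially) will be noticeably higher.

```python
import subprocess

def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse 'name, memory.total, memory.used' rows (values in MiB) from nvidia-smi."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps the name intact even if it happens to contain a comma.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "total_mib": int(total),
            "free_mib": int(total) - int(used),
        })
    return gpus

def query_gpus() -> list[dict]:
    """Ask nvidia-smi for per-GPU memory; requires the NVIDIA driver to be installed."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)

def weights_gib(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Lower-bound estimate: size of the weights alone, 16-bit by default."""
    return params_billions * 1e9 * bytes_per_param / 2**30
```

By this estimate, the 405B model from the post needs roughly 750 GiB for 16-bit weights alone, which is why small code-completion models are the realistic target for a single consumer GPU.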

However, if you have a spare PC/server with a good GPU that can run it, I’d say it’s one of the better ways to get personalized code completion that keeps your data local and secure.

As a side note, I think you can give it API keys and let it use online models, but that would kind of defeat the point.