Incredibly, running a local LLM (large language model) on just the CPU is possible with Llama.cpp! However, it can be pretty slow: I get about 1 token every 2 seconds with a 34-billion-parameter model on an 11th-gen Intel Framework laptop with 64GB of RAM.

I have an external Nvidia GPU connected to my Pop!_OS laptop, and I’ve used the following technique to compile Llama.cpp with CLBlast (a BLAS adapter library) to speed up various LLMs (such as codellama-34b.Q4_K_M.gguf). As a rough estimate, I get about a 5x speed-up on my Nvidia 3080 Ti.
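
For orientation before the Docker-based walkthrough, here’s a minimal sketch of what a bare-metal CLBlast build of Llama.cpp looks like, assuming the CLBlast and OpenCL development packages are already installed; the `-ngl` value below is just an illustrative layer count to tune for your VRAM:

```bash
# Clone and build Llama.cpp with CLBlast enabled (build flag from upstream llama.cpp)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CLBLAST=1

# Run a quantized model, offloading layers to the GPU via OpenCL
# (-ngl 40 is an example value; raise or lower it to fit your card's memory)
./main -m models/codellama-34b.Q4_K_M.gguf -ngl 40 -p "Write a hello world program in C"
```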

Here’s how to compile Llama.cpp inside a Docker container on Pop!_OS.