5 easy ways to run an LLM locally

Deploying a large language model on your own system can be surprisingly simple—if you have the right tools. Here's how to use LLMs like Meta's new Code Llama on your desktop.

PrivateGPT features scripts to ingest data files, split them into chunks, create "embeddings" (numerical representations of the meaning of the text), and store those embeddings in a local Chroma vector store. When you ask a question, the app searches for relevant documents and sends just those to the LLM to generate an answer.
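To make that pattern concrete, here's a minimal sketch of the same ingest-then-retrieve flow using the chromadb Python client. This is not PrivateGPT's own code: the document chunks, collection name, and final prompt-building step are placeholders for illustration, and a real app would pass that prompt to a local LLM rather than printing it.

# Minimal retrieval-augmented generation sketch (not PrivateGPT's actual code).
# Requires: pip install chromadb
import chromadb

# Persist embeddings to a local Chroma store on disk
client = chromadb.PersistentClient(path="./local_vector_store")
collection = client.get_or_create_collection("my_docs")

# "Ingest": store text chunks; Chroma computes embeddings with its
# default local embedding model
chunks = [
    "PrivateGPT ingests documents and stores embeddings locally.",
    "Retrieved chunks are passed to the LLM to ground its answer.",
]
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

# "Query": embed the question and find the most similar chunks
question = "Where are the embeddings stored?"
results = collection.query(query_texts=[question], n_results=2)
context = "\n".join(results["documents"][0])

# A real app would now send `context` plus the question to a local LLM;
# here we just print the prompt that would be sent.
print(f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}")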

If you're familiar with Python and how to set up Python projects, you can clone the full PrivateGPT repository and run it locally. If you're less knowledgeable about Python, you may want to check out a simplified version of the project that author Iván Martínez created for a conference workshop, which is considerably easier to get running.

That version's README file includes detailed instructions that don't assume Python sysadmin expertise. The repo comes with a source_documents folder full of Penpot documentation, but you can delete those and add your own.

PrivateGPT includes the features you'd likely most want in a "chat with your own documents" app in the terminal, but the documentation warns it's not meant for production. And once you run it, you may see why: Even the small model option ran very slowly on my home PC. But just remember, the early days of home Internet were painfully slow, too. I expect these types of individual projects will speed up.

More ways to run a local LLM

There are more ways to run LLMs locally than just these five, ranging from other desktop applications to writing scripts from scratch, all with varying degrees of setup complexity.

A PrivateGPT spinoff, LocalGPT, includes more options for models and has detailed instructions as well as three how-to videos, including a 17-minute detailed code walk-through. Opinions may differ on whether this installation and setup is "easy," but it does look promising. As with PrivateGPT, though, documentation warns that running it on a CPU alone will be slow.

Another desktop app I tried, LM Studio, has an easy-to-use interface for running chats, but you're more on your own with picking models. If you know what model you want to download and run, this could be a good choice. If you're just coming from the world of using ChatGPT and have limited knowledge of how best to balance precision with size, all the choices may be a bit overwhelming at first. Hugging Face Hub is the main source of model downloads inside LM Studio, and it has a lot of models.

Unlike the other LLM options, which all downloaded the models I chose on the first try, I had problems downloading one of the models within LM Studio. Another didn't run well, which was my fault for maxing out my Mac's hardware, but I didn't immediately see a suggested minimum non-GPU RAM for model choices. If you don't mind being patient about selecting and downloading models, though, LM Studio has a nice, clean interface once you're running the chat. As of this writing, the UI didn't have a built-in option for running the LLM over your own data.

[Screenshot: The LM Studio interface, showing the chat list at left, the main chat UI in the center, and configuration options at right. Credit: Sharon Machlis for IDG.]

It does have a built-in server that can be used "as a drop-in replacement for the OpenAI API," as the documentation notes, so code that was written to use an OpenAI model via the API will run instead on the local model you've selected.
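For example, with the openai Python package you can point the client at the local server instead of api.openai.com. The base URL below is LM Studio's default local server address as of this writing (check the app's server settings to confirm), and the model name is only a placeholder, since the server answers with whichever model you've loaded.

# Use the OpenAI Python client against a local OpenAI-compatible server.
# Requires: pip install openai
from openai import OpenAI

# No real API key is needed for a local server, but the client requires one
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # the server uses whatever model is currently loaded
    messages=[{"role": "user", "content": "Say hello from a local LLM."}],
    temperature=0.7,
)
print(response.choices[0].message.content)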

Like h2oGPT, LM Studio also throws a warning on Windows that it's an unverified app. LM Studio code is not available on GitHub and isn't from a long-established organization, though, so not everyone will be comfortable installing it.

In addition to using a pre-built model download interface through apps like h2oGPT, you can also download and run some models directly from Hugging Face, a platform and community for artificial intelligence that includes many LLMs. (Not all models there include download options.) Mark Needham, developer advocate at StarTree, has a nice explainer on how to do this, including a YouTube video. He's also got some related code in a GitHub repo, including sentiment analysis with a local LLM.

Hugging Face also has some documentation of its own about how to install and run available models locally.
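At its most basic, running a downloaded model looks something like the sketch below, which uses the transformers library. The tiny gpt2 model is used only so the example runs on modest hardware; in practice you'd substitute a larger chat-tuned model from the Hub that fits your machine's RAM.

# Download a model from the Hugging Face Hub and run it locally.
# Requires: pip install transformers torch
from transformers import pipeline

# The first run downloads and caches the model weights locally
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "A vector store is useful because",
    max_new_tokens=60,
    do_sample=True,
)
print(result[0]["generated_text"])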

Another popular option is to download and use LLMs locally in LangChain, a framework for creating end-to-end generative AI applications. That does require getting up to speed with writing code using the LangChain ecosystem. If you know LangChain basics, you may want to check out the documentation on Hugging Face Local Pipelines, Titan Takeoff (requires Docker as well as Python), and OpenLLM for running LangChain with local models. OpenLLM is another robust, standalone platform designed for deploying LLM-based applications into production.
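As a rough idea of the Hugging Face Local Pipelines route, here's a minimal sketch. The model ID and prompt are placeholders, and the exact import path depends on your LangChain version (older releases expose HuggingFacePipeline under langchain.llms instead of langchain_community.llms).

# Run a local Hugging Face model through LangChain's local-pipelines integration.
# Requires: pip install langchain langchain-community transformers torch
from langchain_community.llms import HuggingFacePipeline

# Load a local text-generation pipeline; swap in any model your hardware can handle
llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 50},
)

print(llm.invoke("Local LLMs are useful because"))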

Copyright © 2023 IDG Communications, Inc.
