Impressive large language models (LLMs) have been released by technology companies, including OpenAI's ChatGPT, Facebook's Llama, and Google's Bard. LLMs can generate text, summarize documents, answer questions, and more, but attention has largely centered on their ability to answer questions surprisingly well.
These models have grown to 100+ billion parameters and are trained on hundreds of gigabytes of text. That much data, combined with the parameter counts and advances in model architecture, makes for very convincing AI. Still, the models have their limitations and come with warnings about potential biases and hallucinations.
How do we run such large models? It turns out to be surprisingly hard: with so many parameters, simply loading a model can require dozens of gigabytes of memory. But what if we want to run one on our own machines?
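To put rough numbers on that, here is a quick back-of-envelope estimate of how much memory the weights alone take at different precisions. The parameter counts are the published Llama sizes; activations, caches, and runtime overhead all come on top of this, so treat it as a lower bound rather than an exact figure.

# Rough weight memory: number of parameters * bytes per parameter.
# Activations, caches, and runtime overhead are not included.
llama_sizes = {"7B": 7e9, "13B": 13e9, "65B": 65e9}
bytes_per_param = {"fp32": 4, "fp16": 2, "int4": 0.5}

for name, n_params in llama_sizes.items():
    estimates = ", ".join(
        f"{fmt}: {n_params * size / 1e9:.1f} GB"
        for fmt, size in bytes_per_param.items()
    )
    print(f"{name} -> {estimates}")
# e.g. 7B -> fp32: 28.0 GB, fp16: 14.0 GB, int4: 3.5 GB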
With a few tricks and a smaller variant of the model, we can run an LLM on a consumer desktop and pop it into a web app for easy use. Here we are going to use the Facebook Llama 7B model, the smallest variant of Llama. In addition, we are going to quantize the weights down to 4 bits and set the batch size to one. Quantization reduces the precision of the weights while maintaining most of the model's performance; a sketch of the idea follows below. Running with a batch size of one limits throughput and scalability, since requests are no longer processed in parallel, but it keeps memory usage manageable.
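To make the quantization idea concrete, here is a minimal sketch of one simple scheme: each block of float weights is mapped to 4-bit integers plus a single scale factor. This is only illustrative; the actual 4-bit format used by llama.cpp, which Dalai builds on, is more elaborate.

import numpy as np

def quantize_4bit(block):
    # Toy symmetric quantization: 16 integer levels (-8..7) plus one float scale.
    # Assumes the block is not all zeros.
    scale = float(np.abs(block).max()) / 7.0
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return q.astype(np.float32) * scale

block = np.random.randn(16).astype(np.float32)
q, scale = quantize_4bit(block)
approx = dequantize_4bit(q, scale)
print(np.abs(block - approx).max())  # small error, at a quarter of the fp16 footprint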
Finally, we will use the open source Dalai software, which will quantize the model and serve it in our browser!
1. Install packages for the model
sudo apt update
sudo apt upgrade
sudo apt install g++ build-essential python3.10 python3.10-venv
2. Update ~/.bashrc with
alias python=python3
3. Reload the configuration in the current terminal, or restart it
source ~/.bashrc
4. Install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
5. Install Node with nvm
nvm install 18.15.0
6. Install Dalai and run it
npx dalai llama
npx dalai serve
7. Go to localhost:3000 and start using your very own language model!
There you have it: your own personal language model! There is much more to explore about LLMs, from improving worker productivity to integrating them with existing products. Follow us to learn more… like how you can use the new ChatGPT plugins, which connect the model to external knowledge through web browsing, a code interpreter, and retrieval from self-hosted knowledge bases!