A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits in 4GB of RAM and runs on the CPU.
- SvelteKit frontend
- Redis for storing chat history & parameters
- FastAPI + LangChain for the API, wrapping calls to llama.cpp through its Python bindings (see the sketch below)
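To illustrate that last piece, here is a minimal sketch of the pattern — not Aya-LLM's actual code. The `llama-cpp-python` package, the model path, and the endpoint name are all assumptions:

```python
# Minimal sketch of the API pattern, NOT Aya-LLM's actual code.
# Assumes the llama-cpp-python bindings and a local GGML weights file.
from fastapi import FastAPI
from llama_cpp import Llama

app = FastAPI()
# Load the model once at startup; the path is a placeholder.
llm = Llama(model_path="weights/ggml-alpaca-7b-q4.bin")

@app.get("/api/generate")
def generate(prompt: str, max_tokens: int = 128):
    # Run a blocking completion and return the generated text.
    result = llm(prompt, max_tokens=max_tokens)
    return {"text": result["choices"][0]["text"]}
```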
Setting up Aya-LLM is very easy. Starting it up can be done in a single command:
```bash
docker run -d -v weights:/usr/src/app/weights -v datadb:/data/db/ -p 8008:8008 ghcr.io/umilab/aya-llm:latest
```
Then just go to http://localhost:8008/ and you're good to go!
The API documentation can be found at http://localhost:8008/api/docs
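If you want the machine-readable spec instead, FastAPI typically serves an OpenAPI schema alongside the docs; assuming the default layout, it should be reachable with:

```bash
# Assumes FastAPI's default schema location relative to the docs URL.
curl http://localhost:8008/api/openapi.json
```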
Make sure you have Docker Desktop installed (with WSL2 configured if you're on Windows) and enough free RAM to run your chosen model (see below).
Instructions for setting up Aya-LLM on Kubernetes or with Docker Compose can be found in the wiki.
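For reference, a minimal Compose file mirroring the `docker run` command above might look like this — a sketch, not the project's actual compose file; see the wiki for the real manifests:

```yaml
# Minimal sketch mirroring the docker run command above.
services:
  aya-llm:
    image: ghcr.io/umilab/aya-llm:latest
    ports:
      - "8008:8008"
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/

volumes:
  weights:
  datadb:
```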
Currently the following models are supported:
- Alpaca 7B
- Alpaca 7B-native
- Alpaca 13B
- Alpaca 30B
- GPT4All
- Vicuna 7B
- Vicuna 13B
- Open Assistant 13B
- Open Assistant 30B
If you have existing weights from another project, you can add them to the `aya-llm_weights` volume using `docker cp`.
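For example (a hypothetical invocation — substitute your actual container name from `docker ps` and your weights filename):

```bash
# Hypothetical example: copies a local weights file into the container,
# where the aya-llm_weights volume is mounted at /usr/src/app/weights.
docker cp ./ggml-alpaca-7b-q4.bin <container-name>:/usr/src/app/weights/
```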
LLaMA will just crash if you don't have enough available memory for your model:
- 7B requires about 4.5GB of free RAM
- 13B requires about 12GB of free RAM
- 30B requires about 20GB of free RAM
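You can check how much memory is actually available before starting a model, for example on Linux:

```bash
# Look at the "available" column to see how much RAM a model can use.
free -h
```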
Feel free to join the Discord if you need help with the setup: https://discord.gg/nhB8hv3Rf5
Aya-LLM is always open for contributions! If you catch a bug or have a feature idea, feel free to open an issue or a PR.
If you want to run Aya-LLM in development mode (with hot-module reloading for Svelte & auto-reload for FastAPI), you can do so like this:

```bash
git clone https://github.com/umilab/aya-llm.git
cd aya-llm
DOCKER_BUILDKIT=1 docker compose -f docker-compose.dev.yml up -d --build
```
You can test the production image with:

```bash
DOCKER_BUILDKIT=1 docker compose up -d --build
```
On the roadmap:
- Front-end to interface with the API
- Pass model parameters when creating a chat
- Manager for model files
- Support for other models
- LangChain integration
- User profiles & authentication
And a lot more!