- Mimic a lightweight OpenAI API server endpoint to serve the text-generation service.
- Use the llama.cpp library, created by ggerganov in pure C/C++, for text generation. It can be deployed on various platforms (embedded devices, cloud, mobile (Android, iPhone), ...).
- A simple UI tool to explore and experiment with the text-generation service.
This is a demonstration version; some issues and error checking are not fully validated.
Contact me via avble.harry dot gmail.com if you find any.
- A lightweight OpenAI API compatible server: av_connect HTTP server in C++
- Text generation: llama.cpp
- Web UI: a simple web interface to explore and experiment
Obtain the latest container image from Docker Hub:
docker image pull harryavble/av_llm
Run from Docker:
docker run -p 8080:8080 harryavble/av_llm:latest
Access the web interface at http://127.0.0.1:8080
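Because the server mimics the OpenAI API, a standard chat-completion request should work against it. Below is a minimal sketch of building such a request; the `/v1/chat/completions` route and the `"llama"` model name are assumptions based on the OpenAI API convention, not confirmed by this README:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request (constructed, not sent)."""
    payload = {
        "model": model,  # assumption: whatever model the server has loaded
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",  # assumed OpenAI-compatible route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://127.0.0.1:8080", "llama", "Hello!")
# Once the container is running, send it with: urllib.request.urlopen(req)
```

The request is only constructed here; sending it requires the server from the `docker run` step above to be listening on port 8080.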
- LLaMA 1
- LLaMA 2
- LLaMA 3
- Mistral-7B
- Mixtral MoE
- DBRX
- Falcon
- Chinese-LLaMA-Alpaca

This application is built on top of llama.cpp, so it should work with any model that llama.cpp supports.
Run with your own model by mounting a host folder into the container:
docker run -p 8080:8080 -v $your_host_model_folder:/work/model av_llm ./av_llm -m /work/model/$your_model_file
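A successful request to an OpenAI-compatible server returns a chat-completion JSON body. This sketch extracts the generated text, assuming the response follows the standard OpenAI chat-completion schema (a `choices` list whose entries carry a `message` object); the sample response string is hand-written for illustration:

```python
import json

def extract_reply(response_body: str) -> str:
    """Pull the generated text out of an OpenAI-style chat-completion response."""
    data = json.loads(response_body)
    # Assumed schema: {"choices": [{"message": {"role": ..., "content": ...}}]}
    return data["choices"][0]["message"]["content"]

# Hand-written example response in the assumed schema:
sample = '{"choices": [{"message": {"role": "assistant", "content": "Hi there"}}]}'
print(extract_reply(sample))  # -> Hi there
```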
T.B.D
It should work with the UIs below.
- Support more LLM models
- Support more OpenAI API endpoints
- Support more applications