This Open LLM framework serves as a powerful and flexible tool for generating text embeddings and chat completions using state-of-the-art open-source language models. Built on the Transformers library, it exposes a variety of natural language processing (NLP) tasks through simple HTTP endpoints modeled on the OpenAI API.
- Provides an easy-to-use API for running powerful NLP models locally, without requiring deep machine-learning expertise.
- Integrates with a variety of applications, including chatbots, content-creation tools, and recommendation systems.
- Supports multiple models from the Transformers library, enabling diverse NLP tasks.
- Uses GPU acceleration when available to improve processing speed and efficiency.
- Supports tunneling to expose the endpoints to other hosts.
- Reduces dependency on external APIs, potentially lowering operational costs.
- Gives you control over the computational resources used, allowing you to optimize for cost and performance.
For GraphRAG:
- Python >= 3.10
- Docker >= 23.0.3
```shell
# Clone the repository
git clone https://github.com/rushizirpe/open-llm-server.git

# Install dependencies
cd open-llm-server
pip install -e .

# Launch the server
llm-server start --host 127.0.0.1 --port 8888 --reload
```
Params:

- `start`: Start the server
- `stop`: Stop the server
- `status`: Check the server status
- `--host`: Specify the host IP (default: 127.0.0.1)
- `--port`: Specify the port number (default: 8888)
- `--reload`: Enable auto-reload for development
- Chat Completions: `/v1/chat/completions`
- Embeddings: `/v1/embeddings`
- System Metrics: `/v1/metrics`
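All three endpoints live under the same base URL. A minimal sketch of composing the full URLs client-side, assuming the server's default host and port from above:

```python
# Compose absolute endpoint URLs from the server's default host/port.
BASE_URL = "http://127.0.0.1:8888"

ENDPOINTS = {
    "chat": "/v1/chat/completions",
    "embeddings": "/v1/embeddings",
    "metrics": "/v1/metrics",
}

def endpoint_url(name: str) -> str:
    """Return the absolute URL for a named endpoint."""
    return BASE_URL + ENDPOINTS[name]
```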
```shell
# Pull the Docker image
docker pull thisisrishi/open-llm-server

# Run the container
docker run -it -p 8888:8888 thisisrishi/open-llm-server:latest
```

OR

```shell
# Run on a custom port
docker run -e PORT=8000 -p 8000:8000 thisisrishi/open-llm-server:latest
```

OR

```shell
# Create and start the container with Compose
docker compose up
```
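For the Compose option, a minimal `docker-compose.yml` consistent with the image and port mapping used above might look like the following sketch (the service name is a placeholder; adjust the port mapping as needed):

```yaml
services:
  open-llm-server:    # service name is an arbitrary placeholder
    image: thisisrishi/open-llm-server:latest
    ports:
      - "8888:8888"   # host:container, matching the docker run example
```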
- URL: `/`
- Method: `GET`
- Description: Check the status of the API and the availability of a GPU.
- Usage:

```shell
curl http://localhost:8888/
```

- Response:

```json
{
  "status": "System Status: Operational",
  "gpu": "Available",
  "gpu_details": {
    "GPU 0": {
      "compute_capability": "(8, 9)",
      "device_name": "NVIDIA L4"
    }
  }
}
```
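Client code can inspect this response to decide whether GPU-backed models are worth requesting. A small sketch using the sample payload above (in practice you would fetch it over HTTP rather than from a string literal):

```python
import json

# Sample health-check response, as documented for GET / above.
raw = """{
  "status": "System Status: Operational",
  "gpu": "Available",
  "gpu_details": {
    "GPU 0": {"compute_capability": "(8, 9)", "device_name": "NVIDIA L4"}
  }
}"""

def gpu_available(payload: dict) -> bool:
    """Return True when the server reports a usable GPU."""
    return payload.get("gpu") == "Available"

health = json.loads(raw)
print(gpu_available(health))  # True for the sample above
```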
- URL: `/v1/embeddings`
- Method: `POST`
- Description: Generate embeddings for a list of input texts using a specified model.
- Usage:

```shell
curl http://localhost:8888/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer DUMMY_KEY" \
  -d '{"input": ["the quick brown fox", "the lazy dog"], "model": "nomic-ai/nomic-embed-text-v1.5"}'
```

- Response:

```json
{
  "object": "list",
  "data": [
    {"embedding": [0.56324344, 0.25775233, -0.123355], "index": 0},
    {"embedding": [0.30823462, -0.23636326, 0.543345], "index": 1}
  ],
  "model": "nomic-ai/nomic-embed-text-v1.5",
  "usage": {"total_tokens": 5}
}
```
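A common next step with embeddings is comparing texts by cosine similarity. A minimal sketch using the (truncated) example vectors from the response above:

```python
import math

# Truncated example vectors taken from the sample response above.
a = [0.56324344, 0.25775233, -0.123355]
b = [0.30823462, -0.23636326, 0.543345]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# A vector compared with itself scores ~1.0; unrelated vectors score lower.
print(round(cosine_similarity(a, a), 6))
print(round(cosine_similarity(a, b), 6))
```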
- URL: `/v1/chat/completions`
- Method: `POST`
- Description: Generate chat completions based on conversation history using a specified model.
- Request body:

```json
{
  "model": "openai-community/gpt2",
  "messages": [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hi there! How can I help you today?"}
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "top_p": 1.0,
  "n": 1,
  "stop": null
}
```

- Usage:

```shell
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer DUMMY_KEY" \
  -d '{"model": "openai-community/gpt2", "messages": [{"role": "user", "content": "Hi!"}], "max_tokens": 150, "temperature": 0.7}'
```

- Response:

```json
{
  "choices": [
    {"index": 0, "text": "Hello, I can help you with a variety of tasks, such as ..."}
  ]
}
```
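The same request body can be built programmatically instead of hand-writing JSON in a curl command. A minimal Python sketch that serializes a request matching the schema above (the helper name is illustrative, not part of the API; sending it requires a running server, so the HTTP call is shown only as a comment):

```python
import json

def build_chat_request(user_message: str,
                       model: str = "openai-community/gpt2",
                       max_tokens: int = 150,
                       temperature: float = 0.7) -> str:
    """Serialize a chat-completions request body matching the documented schema."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return json.dumps(body)

payload = build_chat_request("Hi!")

# To send it against a running server (illustrative, stdlib only):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8888/v1/chat/completions",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json",
#              "Authorization": "Bearer DUMMY_KEY"})
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```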
This project is licensed under the MIT License. See the LICENSE file for more details.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
For any inquiries or support, please contact me.