Open LLM

This Open LLM Framework serves as a powerful and flexible tool for generating text embeddings and chat completions using state-of-the-art, open-source language models. Built on the Hugging Face Transformers library, it enables various natural language processing (NLP) tasks to be performed via simple HTTP endpoints that mirror OpenAI's API.

  • Provides an easy-to-use API interface to leverage powerful NLP models locally without needing deep expertise in machine learning.

  • (Planned) Integration with various applications, including chatbots, content creation tools, and recommendation systems.

  • Supports multiple models from the Transformers library, enabling diverse NLP tasks.

  • Utilizes GPU acceleration when available to enhance processing speed and efficiency.

  • Supports tunneling to expose the local endpoints for external access.

  • Reduces dependency on external APIs, potentially lowering operational costs.

  • Enables control over the computational resources used, optimizing for cost and performance.

Notebooks

For GraphRAG:

  • Open In Colab
  • More Coming Soon...

Usage

  • Prerequisites

    • Python >= 3.10
    • Docker >= 23.0.3
  • Source

# Clone the Repository
git clone https://github.com/rushizirpe/open-llm-server.git

# Install Dependencies
cd open-llm-server
pip install -e .

# Launch the server
llm-server start --host 127.0.0.1 --port 8888 --reload

Params:

  • start: Start the server
  • stop: Stop the server
  • status: Check the server status
  • --host: Specify the host IP (default: 127.0.0.1)
  • --port: Specify the port number (default: 8888)
  • --reload: Enable auto-reload for development
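
For example, to check whether the server is running and then shut it down:

# Check the server status
llm-server status

# Stop the server
llm-server stop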

API Endpoints

  1. Chat Completions: /v1/chat/completions
  2. Embeddings: /v1/embeddings
  3. System Metrics: /v1/metrics
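
The metrics endpoint is a plain GET and can be checked in the same way as the endpoints detailed below (shown against the default host and port; the bearer token is assumed to be required, as for the other endpoints):

# Query system metrics
curl http://localhost:8888/v1/metrics \
    -H "Authorization: Bearer DUMMY_KEY"
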
  • DockerHub

# Pull Docker Image
docker pull thisisrishi/open-llm-server
# Run Docker
docker run -it -p 8888:8888 thisisrishi/open-llm-server:latest

OR

# Run on Custom Port
docker run -e PORT=8000 -p 8000:8000 thisisrishi/open-llm-server:latest

OR

# Create and Start Container
docker compose up

Endpoints

  • Health Check

    • URL: /
    • Method: GET
    • Description: Check the status of the API and the availability of a GPU.
  • Usage

curl http://localhost:8888/
  • Response:
{
    "status": "System Status: Operational",
    "gpu": "Available",
    "gpu_details": {
        "GPU 0": {
            "compute_capability": "(8, 9)",
            "device_name": "NVIDIA L4"
        }
    }
}
  • Embeddings

    • URL: /v1/embeddings
    • Method: POST
    • Description: Generate embeddings for a list of input texts using a specified model.
  • Usage

curl http://localhost:8888/v1/embeddings \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer DUMMY_KEY"  \
    -d '{"input": "the quick brown fox", "model": "nomic-ai/nomic-embed-text-v1.5"}' 
  • Response:
{
    "object": "list",
    "data": [
        {"embedding": [0.56324344, 0.25775233, -0.123355], "index": 0},
        {"embedding": [0.30823462, -0.23636326, 0.543345], "index": 1}
    ],
    "model": "nomic-ai/nomic-embed-text-v1.5",
    "usage": {"total_tokens": 5}
}
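
Since the request and response shapes are plain JSON, the endpoint can also be called from Python. A minimal sketch using the requests library, assuming the server is running on the default host and port:

import requests

response = requests.post(
    "http://localhost:8888/v1/embeddings",
    headers={"Authorization": "Bearer DUMMY_KEY"},
    json={
        "input": "the quick brown fox",
        "model": "nomic-ai/nomic-embed-text-v1.5",
    },
)
# Each item in "data" carries an embedding vector and its input index
for item in response.json()["data"]:
    print(item["index"], item["embedding"][:3])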
  • Chat Completions

    • URL: /v1/chat/completions
    • Method: POST
    • Description: Generate chat completions based on conversation history using a specified model.
  • Request Body:

{
    "model": "openai-community/gpt2",
    "messages": [
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hi there! How can I help you today?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7,
    "top_p": 1.0,
    "n": 1,
    "stop": null
}
  • Usage
curl http://localhost:8888/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer DUMMY_KEY" \
    -d '{"model": "openai-community/gpt2","messages": [{"role": "user", "content": "Hi!"}],"max_tokens": 150,"temperature": 0.7}'
  • Response:
{
    "choices": [
        {"index": 0, "text": "Hello, I can help you with a variety of tasks, such as ..."}
    ]
}
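
The same call from Python, again a minimal sketch with the requests library against the default host and port:

import requests

response = requests.post(
    "http://localhost:8888/v1/chat/completions",
    headers={"Authorization": "Bearer DUMMY_KEY"},
    json={
        "model": "openai-community/gpt2",
        "messages": [{"role": "user", "content": "Hi!"}],
        "max_tokens": 150,
        "temperature": 0.7,
    },
)
# Each choice carries an "index" and the generated "text"
print(response.json()["choices"][0]["text"])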

License

   This project is licensed under the MIT License. See the LICENSE file for more details.

Contributing

  Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

Contact

  For any inquiries or support, please contact me.
