
# Open LLM

The Open LLM Framework is a powerful, flexible tool for generating text embeddings and chat completions with state-of-the-art open-source language models. By leveraging models from the Transformers library, it exposes a range of natural language processing (NLP) tasks through simple HTTP endpoints that mirror the OpenAI API.

- Provides an easy-to-use API for running powerful NLP models locally, without requiring deep machine-learning expertise.
- {TD - Allow integration with various applications, including chatbots, content creation tools, and recommendation systems.}
- Supports multiple models from the Transformers library, enabling diverse NLP tasks.
- Uses GPU acceleration when available to improve processing speed and efficiency.
- Supports tunneling to expose the local endpoints to external clients.
- Reduces dependency on external APIs, potentially lowering operational costs.
- Gives control over the computational resources used, allowing cost and performance trade-offs.

## Notebooks

For GraphRAG:

- Open In Colab
- More coming soon...

## Usage

- Prerequisites
  - Python >= 3.10
  - Docker >= 23.0.3
- Source

```shell
# Clone the repository
git clone https://github.com/rushizirpe/open-llm-server.git

# Install dependencies
cd open-llm-server
pip install -e .

# Launch the server
llm-server start --host 127.0.0.1 --port 8888 --reload
```

Params:

- `start`: Start the server
- `stop`: Stop the server
- `status`: Check the server status
- `--host`: Host IP to bind (default: 127.0.0.1)
- `--port`: Port number (default: 8888)
- `--reload`: Enable auto-reload for development

## API Endpoints

1. Chat Completions: `/v1/chat/completions`
2. Embeddings: `/v1/embeddings`
3. System Metrics: `/v1/metrics`

- DockerHub

```shell
# Pull the Docker image
docker pull thisisrishi/open-llm-server

# Run the container
docker run -it -p 8888:8888 thisisrishi/open-llm-server:latest
```

OR

```shell
# Run on a custom port
docker run -e PORT=8000 -p 8000:8000 thisisrishi/open-llm-server:latest
```

OR

```shell
# Create and start the container with Docker Compose
docker compose up
```

## Endpoints

- Health Check
  - URL: `/`
  - Method: `GET`
  - Description: Check the status of the API and the availability of a GPU.
- Usage:

```shell
curl http://localhost:8888/
```

- Response:

```json
{
    "status": "System Status: Operational",
    "gpu": "Available",
    "gpu_details": {
        "GPU 0": {
            "compute_capability": "(8, 9)",
            "device_name": "NVIDIA L4"
        }
    }
}
```
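The same health check can be sketched in Python using only the standard library. This is a minimal example, assuming the server from this README is running locally on port 8888; the `parse_health` helper is a hypothetical convenience, not part of the project's API.

```python
import json
import urllib.request

def parse_health(payload: dict) -> bool:
    """Return True when the health-check JSON reports an available GPU."""
    return payload.get("gpu") == "Available"

def check_health(base_url: str = "http://localhost:8888") -> dict:
    """Fetch and parse the health-check response from the root endpoint."""
    with urllib.request.urlopen(f"{base_url}/") as resp:
        return json.loads(resp.read())

# Usage (with the server running):
#     health = check_health()
#     print(health["status"], "| GPU available:", parse_health(health))
```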
- Embeddings
  - URL: `/v1/embeddings`
  - Method: `POST`
  - Description: Generate embeddings for a list of input texts using a specified model.
- Usage:

```shell
curl http://localhost:8888/v1/embeddings \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer DUMMY_KEY" \
    -d '{"input": "the quick brown fox", "model": "nomic-ai/nomic-embed-text-v1.5"}'
```

- Response:

```json
{
    "object": "list",
    "data": [
        {"embedding": [0.56324344, 0.25775233, -0.123355], "index": 0},
        {"embedding": [0.30823462, -0.23636326, 0.543345], "index": 1}
    ],
    "model": "nomic-ai/nomic-embed-text-v1.5",
    "usage": {"total_tokens": 5}
}
```
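For programmatic use, the curl call above can be mirrored in Python with the standard library alone. This is a sketch: the model name, base URL, and dummy API key simply repeat the values from the example, and the helper names are illustrative, not part of the project.

```python
import json
import urllib.request

def build_embeddings_request(texts,
                             model="nomic-ai/nomic-embed-text-v1.5",
                             base_url="http://localhost:8888",
                             api_key="DUMMY_KEY"):
    """Build the POST request for /v1/embeddings without sending it."""
    body = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def get_embeddings(texts, **kwargs):
    """Send the request and return one embedding vector per data item."""
    with urllib.request.urlopen(build_embeddings_request(texts, **kwargs)) as resp:
        payload = json.loads(resp.read())
    return [item["embedding"] for item in payload["data"]]

# Usage (with the server running):
#     vectors = get_embeddings("the quick brown fox")
```

Splitting request construction from sending keeps the payload easy to inspect and test without a live server.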
- Chat Completions
  - URL: `/v1/chat/completions`
  - Method: `POST`
  - Description: Generate chat completions based on conversation history using a specified model.
- Request Body:

```json
{
    "model": "openai-community/gpt2",
    "messages": [
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hi there! How can I help you today?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7,
    "top_p": 1.0,
    "n": 1,
    "stop": null
}
```

- Usage:

```shell
curl http://localhost:8888/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer DUMMY_KEY" \
    -d '{"model": "openai-community/gpt2","messages": [{"role": "user", "content": "Hi!"}],"max_tokens": 150,"temperature": 0.7}'
```

- Response:

```json
{
    "choices": [
        {"index": 0, "text": "Hello, I can help you with a variety of tasks, such as ..."}
    ]
}
```
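An equivalent chat-completion call can be sketched in Python with the standard library. The field names and defaults below mirror the request body shown above; the `chat` helper is a hypothetical wrapper, and the response parsing assumes the `choices[].text` shape from the example response.

```python
import json
import urllib.request

def build_chat_request(messages,
                       model="openai-community/gpt2",
                       max_tokens=150,
                       temperature=0.7,
                       base_url="http://localhost:8888",
                       api_key="DUMMY_KEY"):
    """Build the POST request for /v1/chat/completions without sending it."""
    body = json.dumps({
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def chat(messages, **kwargs):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        payload = json.loads(resp.read())
    return payload["choices"][0]["text"]

# Usage (with the server running):
#     reply = chat([{"role": "user", "content": "Hi!"}])
```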

## License

This project is licensed under the MIT License. See the LICENSE file for more details.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

## Contact

For any inquiries or support, please contact me.