# Deploying CodeGen on Intel Xeon CPUs with Docker Compose

This README provides instructions for deploying the CodeGen application using Docker Compose on systems equipped with Intel Xeon CPUs.
## Table of Contents

- Overview
- Prerequisites
- Quick Start
- Available Deployment Options
- Configuration Parameters
- Building Custom Images (Optional)
- Validate Services
- Accessing the User Interface (UI)
- Troubleshooting
- Stopping the Application
- Next Steps
## Overview

This guide focuses on running the pre-configured CodeGen service using Docker Compose on Intel Xeon CPUs. It leverages containers optimized for Intel architecture for the CodeGen gateway, LLM serving (vLLM or TGI), RAG components (Embedding, Retriever, Vector DB), and UI.
## Prerequisites

- Docker and Docker Compose installed.
- Intel Xeon CPU.
- Git installed (for cloning the repository).
- Hugging Face Hub API Token (for downloading models).
- Access to the internet (or a private model cache).
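You can quickly confirm the tooling prerequisites from a shell before starting; the exact output will vary by system:

```bash
# Quick sanity checks for the prerequisites (output varies by system)
docker --version            # Docker Engine
docker compose version      # Docker Compose v2 plugin
git --version               # Git
lscpu | grep "Model name"   # confirm you are on an Intel Xeon CPU
```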
## Quick Start

1. **Clone the `GenAIExamples` repository:**

   ```bash
   git clone https://github.com/opea-project/GenAIExamples.git
   cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
   ```

   This guide uses the default vLLM-based deployment profile (`codegen-xeon-vllm`).
2. **Configure Environment:** Set the required environment variables in your shell:

   ```bash
   # Replace with your host's external IP address (do not use localhost or 127.0.0.1)
   export HOST_IP="your_external_ip_address"
   # Replace with your Hugging Face Hub API token
   export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token"

   # Optional: Configure proxy if needed
   # export http_proxy="your_http_proxy"
   # export https_proxy="your_https_proxy"
   # export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary

   source ../../../set_env.sh
   ```

   *Note: The compose file might read additional variables from a `.env` file or expect them to be defined elsewhere. Ensure all required variables, such as ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.), are set if you are not using the defaults from the compose file. See the tip after these steps for one way to determine `HOST_IP`.*
3. **Start Services (vLLM Profile):**

   ```bash
   docker compose --profile codegen-xeon-vllm up -d
   ```
4. **Validate:** Wait several minutes for models to download (especially the first time) and for services to initialize. Check container logs (`docker compose logs -f <service_name>`) or proceed to the validation steps below.
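If you are unsure which address to use for `HOST_IP`, the first address reported by the host is often a reasonable choice. The snippet below is only a sketch that works on most Linux distributions and assumes a single primary network interface:

```bash
# One way to pick an external IP for HOST_IP (adjust if the host has multiple interfaces)
export HOST_IP=$(hostname -I | awk '{print $1}')
echo "Using HOST_IP=${HOST_IP}"
```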
## Available Deployment Options

The `compose.yaml` file uses Docker Compose profiles to select the LLM serving backend.
- Profile: `codegen-xeon-vllm`
  - Description: Uses vLLM optimized for Intel CPUs as the LLM serving engine. This is the default profile used in the Quick Start.
  - Services Deployed: `codegen-vllm-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
- Profile: `codegen-xeon-tgi`
  - Description: Uses Hugging Face Text Generation Inference (TGI) optimized for Intel CPUs as the LLM serving engine.
  - Services Deployed: `codegen-tgi-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
  - To Run:

    ```bash
    # Ensure environment variables (HOST_IP, HUGGINGFACEHUB_API_TOKEN) are set
    docker compose --profile codegen-xeon-tgi up -d
    ```
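Only one profile should be active at a time, so switching backends amounts to bringing one profile down and the other up; a minimal sketch:

```bash
# Switch from the default vLLM backend to TGI (or swap the profile names to go back)
docker compose --profile codegen-xeon-vllm down
docker compose --profile codegen-xeon-tgi up -d
```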
## Configuration Parameters

Key parameters are configured via environment variables set before running `docker compose up`.
| Environment Variable                       | Description                                                                                                  | Default (Set Externally)                                                                          |
| ------------------------------------------ | ------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------- |
| `HOST_IP`                                   | External IP address of the host machine. Required.                                                          | `your_external_ip_address`                                                                         |
| `HUGGINGFACEHUB_API_TOKEN`                  | Your Hugging Face Hub token for model access. Required.                                                     | `your_huggingface_token`                                                                            |
| `LLM_MODEL_ID`                              | Hugging Face model ID for the CodeGen LLM (used by the TGI/vLLM service). Configured within `compose.yaml`. | `Qwen/Qwen2.5-Coder-7B-Instruct`                                                                    |
| `EMBEDDING_MODEL_ID`                        | Hugging Face model ID for the embedding model (used by the TEI service). Configured within `compose.yaml`.  | `BAAI/bge-base-en-v1.5`                                                                             |
| `LLM_ENDPOINT`                              | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in `compose.yaml`.     | `http://codegen-tgi-server:80/generate` or `http://codegen-vllm-server:8000/v1/chat/completions`    |
| `TEI_EMBEDDING_ENDPOINT`                    | Internal URL for the embedding service. Configured in `compose.yaml`.                                       | `http://codegen-tei-embedding-server:80/embed`                                                      |
| `DATAPREP_ENDPOINT`                         | Internal URL for the data preparation service. Configured in `compose.yaml`.                                | `http://codegen-dataprep-server:80/dataprep`                                                        |
| `BACKEND_SERVICE_ENDPOINT`                  | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`.                 | `http://${HOST_IP}:7778/v1/codegen`                                                                 |
| `*_PORT` (internal)                         | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`.                                   | N/A                                                                                                 |
| `http_proxy` / `https_proxy` / `no_proxy`   | Network proxy settings (if required).                                                                        | `""`                                                                                                |
Most of these parameters are defined in `set_env.sh`; you can either modify that file or override the variables by exporting them in your shell before deployment.

```bash
source CodeGen/docker_compose/set_env.sh
```
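For example, to try a different code model you could override `LLM_MODEL_ID` before bringing the stack up. This is only a sketch; whether an exported value takes precedence depends on how `set_env.sh` assigns its defaults:

```bash
# Override the default model, then load the shared environment and deploy
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"   # replace with any compatible Hugging Face model ID
source CodeGen/docker_compose/set_env.sh
docker compose --profile codegen-xeon-vllm up -d
```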
Docker Compose profiles (`codegen-xeon-vllm`, `codegen-xeon-tgi`) control which LLM serving backend (vLLM or TGI) and its associated dependencies are started. Typically only one profile should be active at a time.
## Building Custom Images (Optional)

If you need to modify the microservices:

1. Clone the OPEA GenAIComps repository.
2. Follow the build instructions in the respective component directories (e.g., `comps/llms/text-generation`, `comps/codegen`, `comps/ui/gradio`, etc.). Use the provided Dockerfiles (e.g., `CodeGen/Dockerfile`, `CodeGen/ui/docker/Dockerfile.gradio`).
3. Tag your custom images appropriately (e.g., `my-custom-codegen:latest`).
4. Update the `image:` fields in the `compose.yaml` file to use your custom image tags.
Refer to the main CodeGen README for links to relevant GenAIComps components.
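As a rough, hypothetical sketch of the last two steps (the exact build context and Dockerfile path depend on the component you are rebuilding):

```bash
# Hypothetical example: rebuild the CodeGen gateway image under a custom tag
cd GenAIExamples/CodeGen
docker build -t my-custom-codegen:latest -f Dockerfile .

# Then point the relevant image: field in compose.yaml at my-custom-codegen:latest
# and redeploy with: docker compose --profile <profile_name> up -d
```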
## Validate Services

Ensure all containers associated with the chosen profile are running:

```bash
docker compose --profile <profile_name> ps
# Example: docker compose --profile codegen-xeon-vllm ps
```

Check the logs of a specific service with `docker compose logs <service_name>`.
Use `curl` commands to test the main service endpoints. Ensure `HOST_IP` is correctly set in your environment.
- **Validate the LLM Serving Endpoint** (example for vLLM, which listens on port 8000 internally; the host port mapping may differ):

  ```bash
  # This command structure targets the OpenAI-compatible vLLM endpoint
  curl http://${HOST_IP}:8000/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
  ```

  Expected Output: A JSON response with generated code in `choices[0].message.content`.
- **Validate the CodeGen Gateway** (MegaService, default port 7778):

  ```bash
  curl http://${HOST_IP}:7778/v1/codegen \
    -H "Content-Type: application/json" \
    -d '{"messages": "Write a Python function that adds two numbers."}'
  ```

  Expected Output: A stream of JSON data chunks containing generated code, ending with `data: [DONE]`.
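Because model download and warm-up can take several minutes, a small polling loop (a sketch that reuses the gateway endpoint and payload shown above) saves repeated manual checks:

```bash
# Poll the CodeGen gateway until it answers, then report success
until curl -sf http://${HOST_IP}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "hello"}' > /dev/null; do
  echo "Waiting for the CodeGen gateway to become ready..."
  sleep 15
done
echo "CodeGen gateway is responding."
```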
## Accessing the User Interface (UI)

Multiple UI options can be configured via the `compose.yaml`.

### Gradio UI (Default)

Access the default Gradio UI by navigating to `http://{HOST_IP}:8080`.

(Port `8080` is the default host mapping for `codegen-gradio-ui-server`.)
### Svelte UI (Optional)

1. Modify `compose.yaml`: Comment out the `codegen-gradio-ui-server` service and uncomment/add the `codegen-xeon-ui-server` (Svelte) service definition, ensuring the port mapping is correct (e.g., `- "5173:5173"`).
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
3. Access `http://{HOST_IP}:5173` (or the host port you mapped).
### React UI (Optional)

1. Modify `compose.yaml`: Comment out the default UI service and uncomment/add the `codegen-xeon-react-ui-server` definition, ensuring the correct port mapping (e.g., `- "5174:80"`).
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
3. Access `http://{HOST_IP}:5174` (or the host port you mapped).
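If a UI does not load, it helps to confirm that the expected host port is actually mapped and answering. A quick check, assuming the default ports above (8080 for Gradio, 5173/5174 for Svelte/React):

```bash
# List the UI containers and their port mappings, then probe the default Gradio port
docker compose ps | grep ui
curl -I http://${HOST_IP}:8080
```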
### VS Code Extension (Optional)

Users can interact with the backend service using the `Neural Copilot` VS Code extension.

1. Install: Find and install `Neural Copilot` from the VS Code Marketplace.
2. Configure: Set the "Service URL" in the extension settings to your CodeGen backend endpoint: `http://${HOST_IP}:7778/v1/codegen` (use the correct port if changed).
3. Usage: Interact with the CodeGen backend service directly from within VS Code.
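Before configuring the extension, you may want to confirm that the endpoint is reachable from the machine running VS Code (this reuses the request from the validation section):

```bash
# From the machine running VS Code, confirm the CodeGen gateway is reachable
curl http://${HOST_IP}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Write a hello world function in Python."}'
```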
## Troubleshooting

- **Model Download Issues:** Check `HUGGINGFACEHUB_API_TOKEN`. Ensure internet connectivity or correct proxy settings. Check the logs of `tgi-service`/`vllm-service` and `tei-embedding-server`. Gated models require prior access approval on Hugging Face.
- **Connection Errors:** Verify `HOST_IP` is correct and accessible. Check `docker ps` for port mappings. Ensure `no_proxy` includes `HOST_IP` if using a proxy. Check the logs of the service that fails to connect (e.g., `codegen-backend-server` logs if it cannot reach `codegen-llm-server`).
- **"Container name is in use":** Stop existing containers (`docker compose down`) or change `container_name` in `compose.yaml`.
- **Resource Issues:** CodeGen models can be memory-intensive. Monitor host RAM usage and increase Docker resources if needed.
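A few generic commands cover most of the checks above (the service names are illustrative; use the ones reported by `docker compose ps`):

```bash
# Follow the serving backend's logs while it downloads the model
docker compose logs -f codegen-vllm-server   # or codegen-tgi-server

# Check published port mappings and container status
docker ps --format "table {{.Names}}\t{{.Ports}}\t{{.Status}}"

# Watch memory pressure on the host and per container
free -h
docker stats --no-stream
```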
## Stopping the Application

```bash
docker compose --profile <profile_name> down
# Example: docker compose --profile codegen-xeon-vllm down
```
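If you also want to discard persisted state (for example the Redis vector store, assuming it uses a Compose-managed volume), `down` accepts the standard `-v` flag:

```bash
# Stop the stack and also remove its Compose-managed volumes
docker compose --profile codegen-xeon-vllm down -v
```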
## Next Steps

- Consult the OPEA GenAIComps repository for details on individual microservices.
- Refer to the main CodeGen README for links to benchmarking and Kubernetes deployment options.