Impact Level: [Medium] - Introduces a standardized Docker container runtime environment for the project, simplifying environment setup (especially CUDA and PyTorch dependencies), with no effect on existing local runtime logic.
Key Changes:
✨ Added Dockerfile: Based on PyTorch and CUDA 13.0 images, automatically configures Alibaba Cloud APT mirror and pre-installs the flashinfer library along with system dependencies.
✨ Added docker-compose.yml: Provides one-click startup configuration, supporting full GPU mounting, 32GB shared memory, and persistent model caching.
✨ Updated README.md: Added comprehensive Docker usage instructions, including build/run commands and guidance for adjusting CUDA versions to match local machines.
🔧 Added .dockerignore: Excludes .git, .venv, caches, and build artifacts to optimize image build speed.
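For readers who prefer plain docker over Compose, the docker-compose.yml settings described here (all GPUs, 32GB shared memory, persistent model caches) roughly correspond to the docker run invocation sketched below. This is an approximation under assumptions, not the PR's actual configuration: the image tag lingbot-map:local and the run_container helper are hypothetical, while the port and volume paths follow the values listed in the PR description.

```shell
#!/bin/sh
# Hypothetical image tag; the PR does not name the built image.
IMAGE="${IMAGE:-lingbot-map:local}"

# run_container CMD...: start a throwaway container with the same resources
# the compose file is described as providing. DOCKER can be overridden
# (e.g. DOCKER=echo) to print the command instead of running it.
run_container() {
    ${DOCKER:-docker} run --rm \
        --gpus all \
        --shm-size=32g \
        -p 8080:8080 \
        -v "$PWD":/workspace \
        -v hf_cache:/root/.cache/huggingface \
        -v torch_cache:/root/.cache/torch \
        "$IMAGE" "$@"
}

# Example (not executed here): run_container python demo.py
```

Named volumes such as hf_cache are created on first use, so downloaded models survive container removal in the same way as with the compose setup.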
Description: Added full Docker support configuration, making it easy to replicate complex GPU/CUDA environments across different machines (source files: Dockerfile, docker-compose.yml, .dockerignore).
Added an example for running the demo script demo.py inside the container.
High-value guidance: Included concrete Diff examples for modifying the Dockerfile to support other CUDA versions (e.g., downgrading from CUDA 13.0 to the more common CUDA 12.8).
4. Impact & Risk Assessment
⚠️ Breaking Changes: None. All changes are new files and documentation updates, with no impact on existing local development/runtime workflows using conda/pip.
🧪 Testing Recommendations:
Build Test: Run docker compose build on a machine with NVIDIA drivers and Docker installed, verifying successful Alibaba Cloud mirror switching and dependency installation (network variations may cause APT/PIP download failures).
GPU Passthrough Test: After starting the container, run nvidia-smi and python -c "import torch; print(torch.cuda.is_available())" in the terminal to confirm GPU detection.
Business Logic Test: Run demo.py inside the container to verify flashinfer and model inference work as expected without errors.
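The first two recommendations can be chained into a small smoke-test script. The sketch below is one possible arrangement, not part of the PR: run_step and smoke_test are hypothetical helper names, and the service name lingbot-map is taken from the run command quoted in the README changes. The demo.py check is left out because its arguments are not given here.

```shell
#!/bin/sh
# Service name as used in the PR's "docker compose run" command.
SERVICE="${SERVICE:-lingbot-map}"
# DOCKER is overridable so the script can be dry-run with DOCKER=echo.
DOCKER="${DOCKER:-docker}"

# run_step LABEL CMD...: run CMD, print PASS/FAIL, and return non-zero on failure.
run_step() {
    label="$1"; shift
    if "$@"; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# smoke_test: build the image, then check GPU visibility from inside the container.
# Stops at the first failing step.
smoke_test() {
    run_step "build" $DOCKER compose build &&
    run_step "nvidia-smi" $DOCKER compose run --rm "$SERVICE" nvidia-smi &&
    run_step "torch sees GPU" $DOCKER compose run --rm "$SERVICE" \
        python -c "import torch, sys; sys.exit(0 if torch.cuda.is_available() else 1)"
}

# Example (not executed here): smoke_test
```

Exiting non-zero from the Python one-liner, rather than only printing the result, lets the script detect a container that starts but cannot see the GPU.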
Hi, The Dockerfile’s apt-get retry loop does not fail the build if all attempts fail, so the image can be produced without required runtime dependencies (e.g., ffmpeg/libGL) and later fail at runtime.
Severity: action required | Category: reliability
How to fix: Fail build if apt fails
Agent prompt to fix - you can give this to your LLM of choice:
Issue description
The Dockerfile retries apt-get update/install, but the build can still succeed even if every retry fails, leaving the image without required system packages.
Issue Context
This is caused by the for i in 1 2 3; do ... && break; ...; done loop having no final exit 1 / success guard.
Fix Focus Areas
Dockerfile[8-29]
Suggested fix
After the loop, add a verification that installation succeeded, e.g.:
Track a success=0/1 flag and exit 1 if it never becomes 1, or
Replace the loop with an until/|| structure that ultimately fails, e.g.:
apt-get update && apt-get install ... || (sleep 5; false) and repeat with explicit failure after N tries.
Also consider adding a small command -v ffmpeg sanity check if ffmpeg is required.
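A minimal sketch of the suggested guard, written as a reusable POSIX shell function. It is an illustration under assumptions rather than the PR's actual Dockerfile: the retry helper name, the attempt count, and the delay are hypothetical, and the package list in the usage comment is abbreviated to the packages named in the PR summary.

```shell
#!/bin/sh
# retry N DELAY CMD...: run CMD up to N times, sleeping DELAY seconds between
# failed attempts. Returns 0 on the first success and 1 if every attempt fails,
# so a failed install can no longer pass silently.
retry() {
    attempts="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i/$attempts failed: $*" >&2
        # Sleep only if another attempt is still to come.
        if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
        i=$((i + 1))
    done
    return 1
}

# Hypothetical use inside a single Dockerfile RUN step (the function must be
# defined in the same RUN, since each RUN starts a fresh shell):
#   retry 3 5 sh -c 'apt-get update && apt-get install -y ffmpeg libgl1 libsm6' || exit 1
#   command -v ffmpeg >/dev/null || exit 1
```

The explicit `|| exit 1` after the call is what turns exhausted retries into a failed build, and the `command -v` line is the sanity check mentioned above.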
We noticed a couple of other issues in this PR as well - happy to share if helpful.
1. High-Level Summary (TL;DR)
2. Visual Overview (Code & Logic Map)
graph TD
    %% Style definition (high contrast)
    classDef host fill:#bbdefb,color:#0d47a1,stroke:#0d47a1
    classDef docker fill:#c8e6c9,color:#1a5e20,stroke:#1a5e20
    classDef volume fill:#fff3e0,color:#e65100,stroke:#e65100
    subgraph "Host Machine"
        User["Developer"]:::host
        Source["Local code directory (./)"]:::host
    end
    subgraph "Docker Environment"
        Compose["docker-compose.yml"]:::docker
        Container["lingbot-map container (CUDA 13.0)"]:::docker
        subgraph "Volumes"
            HFCache["hf_cache"]:::volume
            TorchCache["torch_cache"]:::volume
        end
    end
    User -->|"docker compose run"| Compose
    Compose -->|"Build & Start"| Container
    Source -.->|"Mount workspace (/workspace)"| Container
    Container -.->|"Persistent read/write"| HFCache
    Container -.->|"Persistent read/write"| TorchCache
3. Detailed Change Analysis
🐳 Docker Containerization Configuration
Description: Added full Docker support configuration, making it easy to replicate complex GPU/CUDA environments across different machines (source files: Dockerfile, docker-compose.yml, .dockerignore).
Base Environment & Dependencies (Dockerfile):
Base image: spxiong/pytorch:2.11.0-py3.10.19-cuda13.0.2-ubuntu22.04
APT mirror: https://mirrors.aliyun.com/ubuntu (UBUNTU_MIRROR)
System dependencies: ffmpeg, libgl1, libsm6, etc.
Python packages: flashinfer-python, flashinfer-jit-cache (cu130), and the main project package (.[vis])
Environment & Resource Mounting (docker-compose.yml):
Volumes: .:/workspace, hf_cache:/root/.cache/huggingface, torch_cache:/root/.cache/torch
Port mapping: 8080:8080
GPUs: all
Shared memory: 32gb
📝 Documentation Update
Description: Updated user guide to support the new Docker workflow (source file: README.md).
Build and run commands (docker compose build & docker compose run --rm --service-ports lingbot-map).
An example for running demo.py inside the container.
Diff examples for modifying the Dockerfile to support other CUDA versions (e.g., downgrading from CUDA 13.0 to the more common CUDA 12.8).