ci: Specify MPI implementation to mpich #2182
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: Across all CUDA version Dockerfiles (cu126, cu128, cu129, cu130), mpich is now installed alongside mpi4py in the py312 conda environment. This change is applied consistently to both standard and development-variant Dockerfiles, expanding the MPI backend availability.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes. This is a repetitive configuration change applied uniformly across 8 Dockerfiles with identical patterns. Each modification simply adds mpich to an existing conda install line with no logic, build flow, or behavioral changes.
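For concreteness, the per-file change amounts to the following (a sketch: only the `RUN` lines are taken from this PR, the surrounding Dockerfile context is omitted):

```dockerfile
# Before: the MPI backend is left to the conda solver, which may resolve to Intel MPI.
# RUN conda install -n py312 -y mpi4py

# After: pin the backend to MPICH explicitly.
RUN conda install -n py312 -y mpi4py mpich
```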
Summary of Changes

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request standardizes the MPI environment within the project's Docker images by explicitly specifying mpich as the MPI implementation when installing mpi4py.
Code Review
This pull request consistently specifies mpich as the MPI implementation when installing mpi4py across all Dockerfiles. This is a good practice for ensuring reproducible environments. My main feedback is to also clean the conda cache after installation to optimize the Docker image sizes. I've added specific suggestions for this on each of the changed lines.
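A minimal sketch of that cache-cleanup suggestion, assuming the install and cleanup can share a single layer (the exact placement in each Dockerfile is an assumption):

```dockerfile
# Install the pinned MPI stack and drop conda's package cache in the same
# layer so the cache never persists in the final image.
RUN conda install -n py312 -y mpi4py mpich && \
    conda clean -afy
```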
Additionally, I've noticed significant duplication across the various Dockerfiles (cu126, cu128, etc., and their .dev variants). While a full refactor is outside the scope of this PR, you might consider consolidating them in the future using a single base Dockerfile with build arguments to handle the CUDA version differences. This would greatly improve maintainability.
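One possible shape for such a consolidation, sketched under the assumption that the CUDA version only affects the base image tag; the file layout and defaults below are hypothetical:

```dockerfile
# Hypothetical shared Dockerfile; the CUDA version is injected at build time,
# e.g. `docker build --build-arg CUDA_VERSION=12.8.1 .`
ARG CUDA_VERSION=12.6.3
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04

# Steps below are identical across CUDA variants (conda/env setup omitted for brevity).
RUN conda install -n py312 -y mpi4py mpich && \
    conda clean -afy
```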
yzh119 left a comment
LGTM overall.
In the long term, I wonder whether we should consider installing MPI from apt-get like sglang does: https://github.com/sgl-project/sglang/blob/cee93a6f26023d978b5187725bcb3c15ba604343/docker/Dockerfile#L474
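A hedged sketch of that alternative on a Debian/Ubuntu base; the package names are the stock Ubuntu MPICH packages and are not taken from the linked sglang Dockerfile:

```dockerfile
# Install MPICH from the distribution packages instead of conda, then build
# mpi4py against the system MPI.
RUN apt-get update && \
    apt-get install -y --no-install-recommends mpich libmpich-dev && \
    rm -rf /var/lib/apt/lists/*
RUN python -m pip install --no-cache-dir mpi4py
```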
📌 Description
We currently have unit tests failing as:
These tests should be skipped in a single-GPU environment, but they fail instead, which indicates that they are failing at MPI module load time.
The current `dockerfile.cuXXX` installs MPI via `RUN conda install -n py312 -y mpi4py`. Upon investigating the Docker build logs: a month ago (Nov. 4), mpich was being installed as the MPI backend, but as of yesterday, impi is being installed instead.
mpich and impi are two different implementations of the MPI standard: MPICH and Intel MPI, respectively. This switch in implementation is currently the suspected cause of the MPI load failures.
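One way to confirm which implementation an image actually resolved to is to print the linked MPI library at build time (a diagnostic sketch; the `py312` environment name comes from the existing Dockerfiles, everything else is assumed):

```dockerfile
# Prints e.g. "MPICH Version: ..." or "Intel(R) MPI Library ..." during the build.
RUN conda run -n py312 python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
```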
The current PR specifies the MPI implementation via `RUN conda install -n py312 -y mpi4py mpich`. The resulting build log shows mpich being installed again, which now matches what we had before.
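To keep this from regressing silently, a fail-fast import check could follow the install step so a broken MPI backend breaks the image build rather than the unit tests (a sketch, not part of this PR):

```dockerfile
# Fail the image build early if mpi4py cannot load its MPI backend.
RUN conda run -n py312 python -c "from mpi4py import MPI"
```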
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- Tests have been added or updated as needed.
- All tests are passing (`unittest`, etc.).

Reviewer Notes