Submission of Docker project #5
Open: AdrienG9 wants to merge 6 commits into vib-tcp:main from AdrienG9:main
Commits
6cb902a repository cloning done (AdrienG9)
5e1603c containerized and ran training the ml model following the proposed st… (AdrienG9)
c2d1848 readme ticked boxes (AdrienG9)
d215405 containerizing and serving the ml model with documentation on readme (AdrienG9)
709917b storing images on dockerhub and trying to build apptainer images (AdrienG9)
9965913 commit without large sif files (AdrienG9)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
infer_container.sif | ||
train_container.sif |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,29 @@ | ||
# Dockerfile for the inference server | ||
# This Dockerfile is used to create a Docker image for the inference server | ||
# It is based on the Python 3.9 slim image | ||
FROM python:3.9-slim | ||
|
||
# Set working directory | ||
# This is the directory where the inference server will be executed | ||
# and where the model will be loaded | ||
WORKDIR /app | ||
|
||
# Copy the list of Python packages required for the serving script | ||
COPY requirements.txt . | ||
COPY server.py . | ||
FROM python:3.9-slim | ||
CMD ["python", "server.py"] | ||
|
||
# Install the Python dependencies inside the container | ||
RUN pip install --no-cache-dir -r requirements.txt | ||
|
||
# Copy the serving script to the working directory | ||
# (placed after the dependency install so code changes do not invalidate the cached pip layer) | ||
COPY server.py . | ||
|
||
# Expose the port that the inference server will listen on | ||
# This is the port that the server will use to communicate with clients | ||
# The default port for the inference server is 8080 | ||
EXPOSE 8080 | ||
|
||
# Run the inference server | ||
CMD ["python", "server.py"] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,42 @@ | ||
FROM <base image> | ||
ARG OWNER=jupyter | ||
ARG BASE_CONTAINER=$OWNER/scipy-notebook:python-3.11.5 | ||
FROM $BASE_CONTAINER | ||
|
||
# TODO: Set a working directory | ||
# This is the directory where the training script will be executed | ||
# and where the model will be saved | ||
WORKDIR /home/jovyan/app | ||
|
||
# Switch to root to create model output dir | ||
USER root | ||
RUN mkdir -p /home/jovyan/app/models | ||
# Give ownership to jovyan so it can write to it | ||
RUN chown -R jovyan /home/jovyan/app | ||
|
||
# Switch back to jovyan user | ||
USER jovyan | ||
|
||
# TODO: Copy the requirements.txt file to the working directory | ||
# This file contains the list of Python packages required for the training script | ||
# The requirements.txt file should be in the same directory as the Dockerfile | ||
COPY requirements.txt . | ||
|
||
# TODO: Install the Python dependencies | ||
# TODO: Install the Python dependencies inside the container | ||
# Use --no-cache-dir to avoid caching the packages | ||
# Useful for keeping the image size smaller | ||
RUN pip install --no-cache-dir -r requirements.txt | ||
|
||
# TODO: Copy the training script (train.py) to the working directory | ||
# This script contains the code to train the model | ||
# and save it to a specified location | ||
# The script should be in the same directory as the Dockerfile | ||
# If the script is in a different directory, adjust the path accordingly | ||
# For example, if the script is in a subdirectory called 'src', use: | ||
# COPY src/train.py . | ||
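# Note: COPY can only read files inside the build context, so a parent-directory | ||
# path such as ../train.py does not reach outside it; build from a context that contains the script | ||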
COPY train.py . | ||
|
||
# TODO: Run the training script that generates the model | ||
CMD [...] | ||
# This command will be executed when the container starts | ||
CMD ["python", "train.py"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
## Containerizing and running the training of the machine learning model | ||
|
||
0. I didn't pull an image with the needed dependencies because I already had one from the training session with the libraries that are going to be used (Repository: jupyter/scipy-notebook; TAG: python-3.11.5) | ||
|
||
2. I then modified the recipe Dockerfile.train following the advice given in the Dockerfile | ||
|
||
3. I created a first version, but it was missing some information, so I created a second one | ||
|
||
4. Built the training image using | ||
`docker build . --tag train:version2 -f Dockerfile.train` | ||
|
||
5. Ran the training container using | ||
`docker run --rm --volume "$PWD"/app/models:/app/models train:version2` | ||
|
||
6. Once completed, this produced: iris_model.pkl | ||
|
||
## Containerizing and serving the ML model | ||
|
||
7. Started by pulling the proposed image using | ||
`docker pull python:3.9-slim` | ||
|
||
8. Modified Dockerfile.infer | ||
|
||
9. Built the image using | ||
`docker build . --tag infer:version1 -f Dockerfile.infer` | ||
|
||
10. Ran the serving container, publishing port 8080 and mounting the models directory | ||
`docker run --rm -- p 8080:8080 -v "$PWD"/app/models:/app/models infer:version1` | ||
|
||
11. Tested it on another terminal using | ||
`curl http://localhost:8080/` which returned "Welcome to Docker Lab" | ||
|
||
## Storing images on Docker Hub | ||
|
||
12. I then logged in to Docker Hub via VS Code | ||
`docker login` | ||
|
||
13. Then retagged my images for Docker Hub | ||
```bash | ||
docker tag train:version2 agrondin1/train:version2 | ||
docker tag infer:version1 agrondin1/infer:version1 | ||
``` | ||
|
||
14. Finally pushed the images to Docker Hub | ||
```bash | ||
docker push agrondin1/train:version2 | ||
docker push agrondin1/infer:version1 | ||
``` | ||
|
||
## Building an Apptainer image on the HPC | ||
|
||
15. I first connected to https://login.hpc.ugent.be | ||
|
||
16. Started a shell session with 1 node and 4 cores for 4 hours on the donphan cluster | ||
|
||
17. I then entered my scratch directory: | ||
`cd scratch/gent/491/vsc49179` | ||
|
||
18. Then went on to write the script that builds the Apptainer images: `nano build_apptainer_images.sh` | ||
```bash | ||
#!/bin/bash | ||
#SBATCH --job-name=apptainer_build_all | ||
#SBATCH --output=apptainer_build.log | ||
#SBATCH --time=01:00:00 | ||
#SBATCH --ntasks=1 | ||
|
||
# Apptainer is available system-wide so no need to load module | ||
|
||
# Build training image | ||
apptainer build train_container.sif docker://agrondin1/train:version2 | ||
|
||
# Build inference image | ||
apptainer build infer_container.sif docker://agrondin1/infer:version1 | ||
``` | ||
|
||
19. Submitted the job to slurm | ||
`sbatch build_apptainer_images.sh` | ||
|
||
This created the inference container but not the training one, so I relaunched the job with only the training image build. | ||
|
||
20. I then switched to VS Code Server and modified the script to create the containers as follows: | ||
|
||
```bash | ||
#!/bin/bash | ||
#SBATCH --job-name=job_submission | ||
#SBATCH --output=apptainer_build.log | ||
#SBATCH --partition=donphan | ||
#SBATCH --mem=8G | ||
#SBATCH --time=00:30:00 | ||
# Apptainer is available system-wide so no need to load module | ||
``` | ||
|
||
```bash | ||
# Build training image | ||
apptainer build --fakeroot train_container.sif docker://agrondin1/train:version2 | ||
``` | ||
|
||
```bash | ||
# Build inference image | ||
apptainer build --fakeroot infer_container.sif docker://agrondin1/infer:version1 | ||
``` | ||
|
||
```bash | ||
mv train_container.sif $VSC_SCRATCH/. | ||
mv infer_container.sif $VSC_SCRATCH/. | ||
``` | ||
|
||
|
||
I finally managed to create both images plus the log of the Slurm job. Unfortunately the .log is not complete: I had created the *.sif files but forgotten to use #SBATCH --output=apptainer_build.log, so I had to run the job again, and now the log only states that the files already exist. | ||
|
||
21. I copied the files to my user home directory: | ||
`cp apptainer_build.log infer_container.sif train_container.sif /user/gent/491/vsc49179` | ||
|
||
22. I then downloaded the files and pushed everything except the *.sif files, because they exceed the file size limit supported by GitHub: | ||
```bash | ||
remote: error: File infer_container.sif is 120.25 MB; this exceeds GitHub's file size limit of 100.00 MB | ||
remote: error: File train_container.sif is 1246.62 MB; this exceeds GitHub's file size limit of 100.00 MB | ||
``` | ||
Review comment: Thanks for your efforts @AdrienG9 - no need to provide the sif files! ;-)
|
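The push error in step 22 could be caught locally before committing. The following is a minimal sketch, not part of the submission: `find_large_files` is a hypothetical helper name, and it assumes a `find` that accepts the `M` size suffix (mebibytes, so the threshold only approximates the limit quoted by GitHub).

```bash
#!/bin/bash
# Sketch only: find_large_files is a hypothetical helper, not part of the PR.

# find_large_files DIR [LIMIT_MB]
# Print every regular file under DIR that is larger than LIMIT_MB MiB
# (default 100, mirroring the limit quoted in the push error).
# The .git directory is skipped.
find_large_files() {
  find "$1" -path '*/.git' -prune -o -type f -size +"${2:-100}"M -print
}
```

Running `find_large_files . 100` from the repository root before `git add` would list files such as the two `.sif` images, which can then be added to `.gitignore` as the PR does.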
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
FATAL: While checking build target: build target 'train_container.sif' already exists. Use --force if you want to overwrite it | ||
FATAL: While checking build target: build target 'infer_container.sif' already exists. Use --force if you want to overwrite it | ||
mv: 'train_container.sif' and '/scratch/gent/491/vsc49179/./train_container.sif' are the same file | ||
mv: 'infer_container.sif' and '/scratch/gent/491/vsc49179/./infer_container.sif' are the same file |
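The four log lines above come from rerunning the job when the images already existed and the job directory was already the scratch directory. A guarded version of the build and move steps might look like the sketch below. The function names are hypothetical; the `apptainer build` flags `--fakeroot` and `--force` are the ones that appear in the script and in the log message.

```bash
#!/bin/bash
# Sketch only: build_if_missing and move_unless_same are hypothetical helpers.

# build_if_missing TARGET SOURCE
# Build TARGET from SOURCE unless TARGET already exists.
# Set FORCE=1 to rebuild and overwrite an existing image.
build_if_missing() {
  if [ -e "$1" ] && [ "${FORCE:-0}" != "1" ]; then
    echo "skip: $1 already exists (set FORCE=1 to overwrite)"
    return 0
  fi
  if [ "${FORCE:-0}" = "1" ]; then
    apptainer build --fakeroot --force "$1" "$2"
  else
    apptainer build --fakeroot "$1" "$2"
  fi
}

# move_unless_same FILE DEST_DIR
# Move FILE into DEST_DIR only when FILE is not already there,
# which avoids the "are the same file" error from mv.
move_unless_same() {
  src_dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
  dest_dir=$(cd "$2" && pwd -P) || return 1
  if [ "$src_dir" = "$dest_dir" ]; then
    echo "skip: $1 is already in $2"
  else
    mv "$1" "$2"/
  fi
}

# Example calls for the job script (commented out so the sketch has no side effects):
# build_if_missing train_container.sif docker://agrondin1/train:version2
# move_unless_same train_container.sif "$VSC_SCRATCH"
```

With these guards a rerun would leave existing images untouched and still write a meaningful line to `apptainer_build.log` for each step.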
There is a typo in this line: the port declaration does not work as written.
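For reference, the typo is most likely the `-- p` in step 10's `docker run` command: a bare `--` ends option parsing, so `p` would be read as the image name. The sketch below writes the flag as `-p`; the wrapper function `serve_infer` and its `RUN` switch are hypothetical and only there so the command can be inspected without Docker installed.

```bash
#!/bin/bash
# Sketch only: serve_infer and the RUN variable are hypothetical.
# Prints the corrected command by default; set RUN=1 to execute it (needs Docker).
serve_infer() {
  set -- docker run --rm -p 8080:8080 \
    -v "$PWD/app/models:/app/models" infer:version1
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

serve_infer
```

Once the container is running, the `curl http://localhost:8080/` check from step 11 can be used unchanged.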