28 changes: 25 additions & 3 deletions Dockerfile.infer
@@ -1,7 +1,29 @@
FROM python:3.9-slim


# Set a working directory
WORKDIR /app


# Copy the requirements.txt file to the working directory
COPY requirements.txt .


# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt


# Indicate which port should be exposed in the image
EXPOSE 8080


# Copy the server script (server.py) to the working directory
COPY server.py .


# Run the server script that starts the inference server
CMD ["python", "server.py"]


# Command to build the image:
# docker build . --tag server:v01 -f Dockerfile.infer
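The documented build command can be paired with a run command. A hedged sketch: the `server:v01` tag follows the build comment above, and the `-p 8080:8080` mapping assumes `server.py` listens on the port named in `EXPOSE`:

```shell
# Build the inference image from Dockerfile.infer
docker build . --tag server:v01 -f Dockerfile.infer

# Run it, publishing container port 8080 on the host
# (assumes server.py listens on 8080, matching EXPOSE 8080)
docker run --rm -p 8080:8080 server:v01
```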
36 changes: 29 additions & 7 deletions Dockerfile.train
@@ -1,12 +1,34 @@
FROM python:3.9-slim

# Set a working directory
WORKDIR /app

# Copy the requirements.txt file to the working directory
COPY requirements.txt ./

# Install the Python dependencies
RUN apt update && apt -y upgrade
RUN apt install -y wget
RUN pip install --no-cache-dir -r requirements.txt

# Copy the training script (train.py) to the working directory
COPY train.py ./

# Set up an app user so the container doesn't run as the root user
# RUN useradd app
# USER app

[Review comment — Contributor]: interesting suggestion, which we could apply in another edition.

# Run the training script that generates the model
CMD ["python", "train.py"]

# Command to build the image:
# docker build . --tag train:v01 -f Dockerfile.train
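The README also asks for the images to be stored on Docker Hub. A hedged sketch of the re-tag-and-push step, where `<dockerhub-user>` is a placeholder for your own account name (the slurm log in this PR pulls from the `ddebeer` namespace):

```shell
# Build the training image from Dockerfile.train
docker build . --tag train:v01 -f Dockerfile.train

# Re-tag with your Docker Hub namespace, then push
docker tag train:v01 <dockerhub-user>/train:v01
docker push <dockerhub-user>/train:v01
```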
41 changes: 41 additions & 0 deletions How-to-use-apptainer-on-hpc.MD
@@ -0,0 +1,41 @@
# How to use Apptainer on the HPC

## Step 1: connect to the HPC

1. Open WinSCP and connect to the HPC

2. Open a PuTTY terminal to communicate with the HPC


## Step 2: create a batch file

1. Create a batch file (`.sh`)

* Start with `#!/bin/bash`

* Specify job options via `#SBATCH --<job option>=<your choice>` ([see documentation](https://docs.hpc.ugent.be/Windows/running_batch_jobs/#defining-and-submitting-your-job))

* Change the working directory, for instance to `$VSC_DATA` or `$VSC_SCRATCH`

* Use `module purge` to purge all loaded modules

* Set the cache directory for Apptainer: `export APPTAINER_CACHEDIR=$VSC_SCRATCH`

2. Pull container images via `apptainer pull <name_version.sif> <location>`
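As a concrete example of step 2, using the `ddebeer/train:v01` Docker Hub image that this repository's slurm job pulls:

```shell
# Cache image layers on scratch, then convert the Docker image to a SIF file
export APPTAINER_CACHEDIR=$VSC_SCRATCH
apptainer pull train_v01.sif docker://ddebeer/train:v01
```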


## Step 3: save/copy the batch file to the DATA folder

* Copy via WinSCP
* Upload via the HPC portal website
* Create directly using vi


## Step 4: submit the batch file as a job

* Submit using `sbatch <batch_file.sh>`
* Check running and queued jobs with `squeue`




50 changes: 26 additions & 24 deletions README.md
@@ -1,44 +1,46 @@
# Project Docker Microcredential
micro-credential VIB/UGent - Reproducible data analysis

In this project, you will train, run and serve a machine learning model using Docker.
Furthermore, you will store the Docker images on your own account on Docker Hub.
Using the image of the training step, you will build an Apptainer image on the HPC of UGent.

## Deliverables

- [x] Clone this repository to your personal GitHub account
- [x] Containerize training the machine learning model
- [x] Containerize serving of the machine learning model
- [x] Train and run the machine learning model using Docker
- [x] Run the Docker container serving the machine learning model
- [x] Store the Docker images on your personal account on Docker Hub
- [x] Provide the resulting Dockerfiles in GitHub
- [x] Build an Apptainer image on an HPC of your choice
- [x] Provide the logs of the Slurm job in GitHub
- [x] Document the steps in a text document in GitHub

## Proposed steps - containerize and run training the machine learning model

Complete file named `Dockerfile.train`

- Copy requirements.txt and install dependencies
- Copy train.py to the working directory
- Set the command to run train.py
- Run the training of the model on your computer
- Document the command as a comment in the Dockerfile
- Store the created Dockerfile in your cloned GitHub repository

## Proposed steps - containerize and serve the machine learning model

- Correct the order of the instructions in the Dockerfile.infer
- Document the steps in the Dockerfile.infer as comments
- Document the successful `docker run` command in the Dockerfile.infer as a comment

## Proposed steps - store images on Dockerhub and build an Apptainer image on the HPC

- Create an account on Docker Hub
- Store the built images on your account
- Create a shell script on the HPC of your preference
- Store the shell script in your cloned GitHub repository



18 changes: 18 additions & 0 deletions assignment.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --partition=donphan
#SBATCH --mem=8G
#SBATCH --time=00:30:00
#SBATCH [email protected]
#SBATCH --ntasks=1


module purge

export APPTAINER_CACHEDIR=$VSC_SCRATCH

cd $VSC_DATA
echo Start Job
apptainer pull server_v0.sif docker://ddebeer/server:v01
echo halfway
apptainer pull train_v0.sif docker://ddebeer/train:v01
echo end Job
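Once the `.sif` files exist, they could be used in a follow-up job. A hedged sketch, assuming the images' default `CMD` (from the Dockerfiles) is what you want to execute:

```shell
# Run the image's default command (CMD from the Dockerfile)
apptainer run train_v0.sif

# Or execute an explicit command inside the container
apptainer exec server_v0.sif python /app/server.py
```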
2 changes: 1 addition & 1 deletion server.py
@@ -7,7 +7,7 @@
app = Flask(__name__)

# Check if the model file exists and wait until it does
model_path = '/app/models/iris_model.pkl'
model_path = '/app/iris_model.pkl'
[Review comment — Contributor]: OK, works as well.

while not os.path.exists(model_path):
print(f"Waiting for model file at {model_path}...")
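The wait loop in `server.py` can be factored into a small helper. A self-contained sketch with an added timeout; the `wait_for_file` name and the timeout behaviour are my additions, not part of the PR:

```python
import os
import time


def wait_for_file(path: str, poll_seconds: float = 1.0, timeout: float = 60.0) -> bool:
    """Poll until `path` exists; return True, or False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        print(f"Waiting for model file at {path}...")
        time.sleep(poll_seconds)
    return True
```

Blocking at startup like this works for a single shared-volume file, but note that it delays the whole container until training has written the model.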
54 changes: 54 additions & 0 deletions slurm-20167422.out
@@ -0,0 +1,54 @@
The following modules were not unloaded:
(Use "module --force purge" to unload all):

1) env/vsc/donphan 3) env/software/donphan
2) env/slurm/donphan 4) cluster/donphan
Start Job
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Copying blob sha256:05802d3ba2ead9e590dd748b23da547106549ef0fa66bbe6cf14583d1450db04
Copying blob sha256:8a628cdd7ccc83e90e5a95888fcb0ec24b991141176c515ad101f12d6433eb96
Copying blob sha256:74018f7cfa8f2965fd86b13c38f71417bc846e071a5f5bb5ae569ccb5a6e7248
Copying blob sha256:a0b0cfc480ce03c723a597904bcfbf28c71438c689e6d5097c2332835f67a40c
Copying blob sha256:97d21b95fb00ac3b08975ab6f8709f3a7e35a05d75e2f9a70fa95348279dac27
Copying blob sha256:7c0a46d2d00fd6b3bfbaf17d1a66701c9f045b106b2b77d30308d83b4997e91a
Copying blob sha256:722a684821197aa750a57327cce12518f2aff787af1bf353c4ddae4beae7ad44
Copying blob sha256:179efd259301554f31db82f1ab6362ef03724ce395462216777ef0636ad6a7c0
Copying config sha256:9302d5dd202ce9285b6d62d0f0fb5d8e95b7e6a278db8bdd7ba0baf6961cc324
Writing manifest to image destination
2025/04/24 15:09:22 info unpack layer: sha256:8a628cdd7ccc83e90e5a95888fcb0ec24b991141176c515ad101f12d6433eb96
2025/04/24 15:09:23 info unpack layer: sha256:74018f7cfa8f2965fd86b13c38f71417bc846e071a5f5bb5ae569ccb5a6e7248
2025/04/24 15:09:23 info unpack layer: sha256:a0b0cfc480ce03c723a597904bcfbf28c71438c689e6d5097c2332835f67a40c
2025/04/24 15:09:24 info unpack layer: sha256:97d21b95fb00ac3b08975ab6f8709f3a7e35a05d75e2f9a70fa95348279dac27
2025/04/24 15:09:24 info unpack layer: sha256:7c0a46d2d00fd6b3bfbaf17d1a66701c9f045b106b2b77d30308d83b4997e91a
2025/04/24 15:09:24 info unpack layer: sha256:05802d3ba2ead9e590dd748b23da547106549ef0fa66bbe6cf14583d1450db04
2025/04/24 15:09:24 info unpack layer: sha256:722a684821197aa750a57327cce12518f2aff787af1bf353c4ddae4beae7ad44
2025/04/24 15:09:27 info unpack layer: sha256:179efd259301554f31db82f1ab6362ef03724ce395462216777ef0636ad6a7c0
INFO: Creating SIF file...
halfway
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Copying blob sha256:05802d3ba2ead9e590dd748b23da547106549ef0fa66bbe6cf14583d1450db04
Copying blob sha256:8a628cdd7ccc83e90e5a95888fcb0ec24b991141176c515ad101f12d6433eb96
Copying blob sha256:74018f7cfa8f2965fd86b13c38f71417bc846e071a5f5bb5ae569ccb5a6e7248
Copying blob sha256:a0b0cfc480ce03c723a597904bcfbf28c71438c689e6d5097c2332835f67a40c
Copying blob sha256:97d21b95fb00ac3b08975ab6f8709f3a7e35a05d75e2f9a70fa95348279dac27
Copying blob sha256:7c0a46d2d00fd6b3bfbaf17d1a66701c9f045b106b2b77d30308d83b4997e91a
Copying blob sha256:91f69e43c7b2a4191a9e05dd6f01c34b0993d76608c14e955f33397cb915ed5f
Copying blob sha256:d6713be2e29283b75210dbd238dbd8a40037de079961bf7afa4865cd8687ef0e
Copying blob sha256:0d13e422e987d4968e849a9fcaab036164e43e03c0d8c69d4394ad8dc1a01c7b
Copying blob sha256:fc73ac045e6437103a9183eb474e41a67bbc65874896418df1049c6cb1cc9ecb
Copying config sha256:64f10bf38baaa55ff5b37c771a34fe67de171b8c98ea87a7ffb73b2b506adcaf
Writing manifest to image destination
2025/04/24 15:10:25 info unpack layer: sha256:8a628cdd7ccc83e90e5a95888fcb0ec24b991141176c515ad101f12d6433eb96
2025/04/24 15:10:26 info unpack layer: sha256:74018f7cfa8f2965fd86b13c38f71417bc846e071a5f5bb5ae569ccb5a6e7248
2025/04/24 15:10:26 info unpack layer: sha256:a0b0cfc480ce03c723a597904bcfbf28c71438c689e6d5097c2332835f67a40c
2025/04/24 15:10:27 info unpack layer: sha256:97d21b95fb00ac3b08975ab6f8709f3a7e35a05d75e2f9a70fa95348279dac27
2025/04/24 15:10:27 info unpack layer: sha256:7c0a46d2d00fd6b3bfbaf17d1a66701c9f045b106b2b77d30308d83b4997e91a
2025/04/24 15:10:27 info unpack layer: sha256:05802d3ba2ead9e590dd748b23da547106549ef0fa66bbe6cf14583d1450db04
2025/04/24 15:10:27 info unpack layer: sha256:91f69e43c7b2a4191a9e05dd6f01c34b0993d76608c14e955f33397cb915ed5f
2025/04/24 15:10:28 info unpack layer: sha256:d6713be2e29283b75210dbd238dbd8a40037de079961bf7afa4865cd8687ef0e
2025/04/24 15:10:28 info unpack layer: sha256:0d13e422e987d4968e849a9fcaab036164e43e03c0d8c69d4394ad8dc1a01c7b
2025/04/24 15:10:31 info unpack layer: sha256:fc73ac045e6437103a9183eb474e41a67bbc65874896418df1049c6cb1cc9ecb
INFO: Creating SIF file...
end Job
2 changes: 1 addition & 1 deletion train.py
@@ -12,6 +12,6 @@
model = clf.fit(iris.data, iris.target_names[iris.target])

#Save the trained model to the shared volume (make sure to use the correct path)
joblib.dump(model, '/app/models/iris_model.pkl')
joblib.dump(model, '/app/iris_model.pkl')
[Review comment — Contributor]: OK, works as well.

print("Model training complete and saved as iris_model.pkl")
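`train.py` persists the classifier with `joblib.dump`. The same save/load round-trip can be sketched with the stdlib `pickle` module (used here only so the example runs without scikit-learn or joblib installed; `save_model`/`load_model` are illustrative names, not functions from this repository):

```python
import pickle


def save_model(model, path):
    # Serialize the object to disk (joblib.dump plays this role in train.py)
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Deserialize it back, as the serving side would before predicting
    with open(path, "rb") as f:
        return pickle.load(f)
```

In the containers, the path both sides agree on is `/app/iris_model.pkl`, which is why the matching edit to `server.py` in this PR works.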