Handing in project #1

Open: wants to merge 5 commits into main
17 changes: 15 additions & 2 deletions Dockerfile.infer
```dockerfile
# set base image
FROM python:3.9-slim

# set working directory
WORKDIR /app

# copy requirements file
COPY requirements.txt .

# install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# copy the server code
COPY server.py .

# run the server
CMD ["python", "server.py"]

# some info on the default port (documentation only; doesn't publish the port)
EXPOSE 8080
```
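The `server.py` that this image runs is not part of the diff. A minimal sketch of what it might look like, assuming a pickled model on the mounted `/app/models` volume and a JSON-over-HTTP predict endpoint (the path, payload shape, and `model.predict` interface are all assumptions):

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_PATH = "/app/models/model.pkl"  # assumed location on the mounted volume


def load_model(path=MODEL_PATH):
    # the training container writes the model to the same shared volume
    with open(path, "rb") as f:
        return pickle.load(f)


def make_response(model, features):
    """Build the JSON response body for one feature vector."""
    return json.dumps({"prediction": model.predict([features])[0]})


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = make_response(self.server.model, features).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # port matches the EXPOSE line and the -p 8080:8080 run flag
    server = HTTPServer(("0.0.0.0", port), PredictHandler)
    server.model = load_model()
    server.serve_forever()

# server.py would end with: serve()
```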
25 changes: 18 additions & 7 deletions Dockerfile.train
```dockerfile
ARG OWNER=jupyter
ARG BASE_CONTAINER=$OWNER/scipy-notebook:python-3.11.5
FROM $BASE_CONTAINER

# Create an additional folder for model storage
USER root
RUN mkdir -p /app/models
USER jovyan

# Set a working directory
WORKDIR /app

# Copy the requirements.txt file to the working directory
COPY requirements.txt .

# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the training script (train.py) to the working directory
COPY train.py .

# Run the training script that generates the model
CMD ["python", "train.py"]
```
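`train.py` is likewise not shown in this diff. A minimal sketch of the shape such a script could take, assuming it writes a pickled model into the `/app/models` folder the Dockerfile creates (the trivial mean "model" and all names are illustrative):

```python
import os
import pickle

MODEL_DIR = "/app/models"  # created in Dockerfile.train, volume-mounted at run time


def fit_mean_model(values):
    """'Train' the simplest possible model: predict the mean of the training data."""
    return {"mean": sum(values) / len(values)}


def save_model(model, model_dir=MODEL_DIR, name="model.pkl"):
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, name)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


if __name__ == "__main__":
    model = fit_mean_model([1.0, 2.0, 3.0, 4.0])
    # inside the container this would be save_model(model) -> /app/models/model.pkl
    print("saved", save_model(model, model_dir="models"))
```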
75 changes: 39 additions & 36 deletions README.md
# Project Docker Microcredential
micro-credential VIB/UGent - Reproducible data analysis

Instructions:
In this project, you will train, run and serve a machine learning model using Docker. Furthermore, you will store the Docker images on your own account on Docker Hub. Using the image of the training step, you will build an Apptainer image on the HPC of UGent.

## Deliverables

- [X] Clone this repository to your personal GitHub account
- [X] Containerize training the machine learning model
- [X] Containerize serving of the machine learning model
- [X] Train and run the machine learning model using Docker
- [X] Run the Docker container serving the machine learning model
- [X] Store the Docker images on your personal account on Docker Hub
- [X] Provide the resulting Dockerfiles in GitHub
- [X] Build an Apptainer image on an HPC of your choice
- [X] Provide the logs of the Slurm job in GitHub
- [X] Document the steps in a text document in GitHub

## Steps
1. Build the Docker images locally and run the containers
```bash
docker build . --tag train:v1 -f Dockerfile.train
docker build . --tag infer:v1 -f Dockerfile.infer
docker run --rm --volume "$PWD"/app/models:/app/models --name train_model train:v1
docker run --rm -p 8080:8080 --volume "$PWD"/app/models:/app/models --name ml_server infer:v1
```
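With the `ml_server` container from the step above running, a small hypothetical Python client could query it (the URL and JSON body shape are assumptions, since `server.py` is not shown in this diff):

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080"  # forwarded by the -p 8080:8080 flag


def encode_features(features):
    # JSON body carrying the feature vector; the shape is an assumption
    return json.dumps(features).encode("utf-8")


def query_server(features, url=SERVER_URL):
    req = urllib.request.Request(
        url,
        data=encode_features(features),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# with the container running: query_server([5.1, 3.5, 1.4, 0.2])
```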

2. Push the Docker images to Docker Hub
```bash
docker login
docker tag train:v1 aapostel/train:v1
docker tag infer:v1 aapostel/infer:v1
docker push aapostel/train:v1
docker push aapostel/infer:v1
```

3. Create a PBS job script that builds the Apptainer SIFs on the HPC and stores them in `$VSC_DATA`

4. Submit the job script
```bash
qsub apptainer.pbs
```


30 changes: 30 additions & 0 deletions apptainer.pbs
```bash
#!/bin/bash
#SBATCH --job-name=build-apptainer-train
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=1:00:00
#SBATCH --output=build_train_server.stdout
#SBATCH --error=build_train_server.stderr

# create a per-user scratch directory and work from there
mkdir -p /tmp/$USER
cd /tmp/$USER

echo "Start Job"
date

# build SIFs, keeping the Apptainer cache and tmp dirs on local scratch
APPTAINER_CACHEDIR=/tmp/$USER \
APPTAINER_TMPDIR=/tmp/$USER \
apptainer build --fakeroot train_model.sif docker://aapostel/train:v1

APPTAINER_CACHEDIR=/tmp/$USER \
APPTAINER_TMPDIR=/tmp/$USER \
apptainer build --fakeroot server.sif docker://aapostel/infer:v1

# move the built images to a persistent location
mv train_model.sif $VSC_DATA/
mv server.sif $VSC_DATA/

date
echo "End Job"
```