Handing in project #1

Open: wants to merge 5 commits into main
17 changes: 15 additions & 2 deletions Dockerfile.infer
```dockerfile
# set base image
FROM python:3.9-slim

# set working directory
WORKDIR /app

# copy requirements file
COPY requirements.txt .

# install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# copy the server code
COPY server.py .

# run the server
CMD ["python", "server.py"]

# some info on the default port (documentation only; doesn't publish the port)
EXPOSE 8080
```
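The `server.py` that this image runs is not part of the diff. A minimal sketch of what it might look like, assuming a pickled model on the mounted `/app/models` volume and a JSON-over-HTTP predict endpoint (the path, payload shape, and `model.predict` interface are all assumptions):

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_PATH = "/app/models/model.pkl"  # assumed location on the mounted volume


def load_model(path=MODEL_PATH):
    # the training container writes the model to the same shared volume
    with open(path, "rb") as f:
        return pickle.load(f)


def make_response(model, features):
    """Build the JSON response body for one feature vector."""
    return json.dumps({"prediction": model.predict([features])[0]})


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = make_response(self.server.model, features).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # port matches the EXPOSE line and the -p 8080:8080 run flag
    server = HTTPServer(("0.0.0.0", port), PredictHandler)
    server.model = load_model()
    server.serve_forever()

# server.py would end with: serve()
```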
25 changes: 18 additions & 7 deletions Dockerfile.train
```dockerfile
ARG OWNER=jupyter
ARG BASE_CONTAINER=$OWNER/scipy-notebook:python-3.11.5
FROM $BASE_CONTAINER

# Create an additional folder for model storage
USER root
RUN mkdir -p /app/models
USER jovyan

# Set a working directory
WORKDIR /app

# Copy the requirements.txt file to the working directory
COPY requirements.txt .

# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the training script (train.py) to the working directory
COPY train.py .

# Run the training script that generates the model
CMD ["python", "train.py"]
```
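`train.py` is likewise not shown in this diff. A minimal sketch of the shape such a script could take, assuming it writes a pickled model into the `/app/models` folder the Dockerfile creates (the trivial mean "model" and all names are illustrative):

```python
import os
import pickle

MODEL_DIR = "/app/models"  # created in Dockerfile.train, volume-mounted at run time


def fit_mean_model(values):
    """'Train' the simplest possible model: predict the mean of the training data."""
    return {"mean": sum(values) / len(values)}


def save_model(model, model_dir=MODEL_DIR, name="model.pkl"):
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, name)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


if __name__ == "__main__":
    model = fit_mean_model([1.0, 2.0, 3.0, 4.0])
    # inside the container this would be save_model(model) -> /app/models/model.pkl
    print("saved", save_model(model, model_dir="models"))
```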
75 changes: 39 additions & 36 deletions README.md
# Project Docker Microcredential
micro-credential VIB/UGent - Reproducible data analysis

Instructions:
In this project, you will train, run and serve a machine learning model using Docker. Furthermore, you will store the Docker images on your own account on Docker Hub. Using the image of the training step, you will build an Apptainer image on the HPC of UGent.

## Deliverables

- [X] Clone this repository to your personal GitHub account
- [X] Containerize training the machine learning model
- [X] Containerize serving of the machine learning model
- [X] Train and run the machine learning model using Docker
- [X] Run the Docker container serving the machine learning model
- [X] Store the Docker images on your personal account on Docker Hub
- [X] Provide the resulting Dockerfiles in GitHub
- [X] Build an Apptainer image on an HPC of your choice
- [X] Provide the logs of the Slurm job in GitHub
- [X] Document the steps in a text document in GitHub

## Steps
1. Build the Docker images locally and run the containers
```bash
docker build . --tag train:v1 -f Dockerfile.train
docker build . --tag infer:v1 -f Dockerfile.infer
docker run --rm --volume "$PWD"/app/models:/app/models --name train_model train:v1
docker run --rm -p 8080:8080 --volume "$PWD"/app/models:/app/models --name ml_server infer:v1
```
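With the `ml_server` container from the step above running, a small hypothetical Python client could query it (the URL and JSON body shape are assumptions, since `server.py` is not shown in this diff):

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080"  # forwarded by the -p 8080:8080 flag


def encode_features(features):
    # JSON body carrying the feature vector; the shape is an assumption
    return json.dumps(features).encode("utf-8")


def query_server(features, url=SERVER_URL):
    req = urllib.request.Request(
        url,
        data=encode_features(features),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# with the container running: query_server([5.1, 3.5, 1.4, 0.2])
```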

2. Push the Docker images to Docker Hub
```bash
docker login
docker tag train:v1 aapostel/train:v1
docker tag infer:v1 aapostel/infer:v1
docker push aapostel/train:v1
docker push aapostel/infer:v1
```

3. Create a PBS job script that builds the Apptainer SIFs on the HPC and stores them in `$VSC_DATA`

4. Submit the job script
```bash
qsub apptainer.pbs
```


30 changes: 30 additions & 0 deletions apptainer.pbs
```bash
#!/bin/bash
#SBATCH --job-name=build-apptainer-train
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=1:00:00
#SBATCH --output=build_train_server.stdout
#SBATCH --error=build_train_server.stderr

# create a per-user scratch directory and work from there
mkdir -p /tmp/$USER
cd /tmp/$USER

echo "Start Job"
date

# build SIFs, keeping the Apptainer cache and tmp dirs on local scratch
APPTAINER_CACHEDIR=/tmp/$USER \
APPTAINER_TMPDIR=/tmp/$USER \
apptainer build --fakeroot train_model.sif docker://aapostel/train:v1

APPTAINER_CACHEDIR=/tmp/$USER \
APPTAINER_TMPDIR=/tmp/$USER \
apptainer build --fakeroot server.sif docker://aapostel/infer:v1

# move the built images to a persistent location
mv train_model.sif $VSC_DATA/
mv server.sif $VSC_DATA/

date
echo "End Job"
```