Unified FastAPI-based backend for ChemScraper (CLEAN job-manager and Molli are future scope).
Create a `.env` from the `.env.tpl` file in this repo. The default env is fine without modifications for testing; change the passwords for production use.
cp .env.tpl .env
- Install Minikube.
- Start Minikube with an external network (defined in this repo's `docker-compose.yml`):
minikube start --network=mmli-net --driver=docker --memory=24384
- Ensure it's running:
minikube kubectl cluster-info
Kubernetes control plane is running at https://192.168.49.2:8443
CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
- Apply the necessary configurations:
# Create `mmli` namespace
minikube kubectl -- create ns mmli
# Apply secret and config (adjust the absolute paths below to your local checkout)
minikube kubectl -- apply -f /home/kastan/ncsa/mmli/mmli-backend/app/cfg/local.secret.yaml -n mmli
minikube kubectl -- apply -f /home/kastan/ncsa/mmli/mmli-backend/app/cfg/local.config.yaml -n mmli
# Create PVC needed by molli jobs
minikube kubectl -- apply -f /home/kastan/ncsa/mmli/mmli-backend/chart/weights.pvc.yaml
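As an optional sanity check, you can list the resources to confirm they were created (standard kubectl listing commands; the exact resource names depend on the manifests, and the PVC may land in the default namespace or one set inside its manifest):
# Verify the secret and config exist in the `mmli` namespace
minikube kubectl -- get secret,configmap -n mmli
# Verify the PVC exists, wherever it was created
minikube kubectl -- get pvc -A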
Edit the `docker-compose.yml` to expose your kube config. In our case, Minikube requires three files: `ca.crt`, `client.crt`, and `client.key`. Copy-paste this into the `docker-compose.yml`, under the `rest` container:
⚠️ Note: I had problems with `${HOME}` and had to provide full absolute paths manually; e.g. replace `${HOME}` with `/home/username`. ⚠️
rest:
  container_name: mmli-backend
  ...
  volumes:
    - ./app:/code/app
    - ./migrations:/code/migrations
    - ${HOME}/.kube/config:/opt/kubeconfig
    - ${HOME}/.minikube/ca.crt:${HOME}/.minikube/ca.crt
    - ${HOME}/.minikube/profiles/minikube/client.crt:${HOME}/.minikube/profiles/minikube/client.crt
    - ${HOME}/.minikube/profiles/minikube/client.key:${HOME}/.minikube/profiles/minikube/client.key
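If you're unsure which certificate and key files your kube config actually references (and therefore which host paths to mount), you can grep them out of `~/.kube/config`; the field names below are the standard kubeconfig keys:
# Show which cert/key paths the kubeconfig points at; mount those same paths into the container
grep -E 'certificate-authority|client-certificate|client-key' ${HOME}/.kube/config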
Finally, start the compose stack. Monitor the logs for errors from `mmli-backend`.
docker compose up --build # optionally add -d for detached
This will run MinIO + PostgreSQL + the Python app `mmli-backend`.
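To watch the backend specifically, you can tail its logs by service name (`rest` is the service name from the compose snippet above; adjust if yours differs):
# List the running services
docker compose ps
# Tail logs from just the backend service
docker compose logs -f rest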
Test that the service works: navigate to `localhost:8080/docs` and you should see the FastAPI Swagger docs.
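If you prefer a terminal check, FastAPI also serves the OpenAPI spec, so a quick curl should return JSON (the port assumes the default mapping used throughout this README):
curl -s localhost:8080/openapi.json | head -c 300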
Initialize the Postgres database; this creates the SQL tables.
docker compose exec -w /code rest alembic upgrade head
# You should see the logs:
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> d775ee615d7b, init
INFO [alembic.runtime.migration] Running upgrade d775ee615d7b -> 88355d0f323b, added moleculecacheentry for caching molecules, modified job schema, added flaggedmolecule for saving flagged molecules
INFO [alembic.runtime.migration] Running upgrade 88355d0f323b -> e8569ab45dd1, removed moleculecacheentry
INFO [alembic.runtime.migration] Running upgrade e8569ab45dd1 -> 30b240622d34, add chemical identifier model and table
Finally, verify the tables are created:
- Exec into the database container, running the `psql` command.
docker exec -it mmli-backend-postgresql psql -U postgres mmli
- Run the `\d` command to list tables.
psql (15.8 (Debian 15.8-1.pgdg120+1))
Type "help" for help.
mmli=# \d
List of relations
Schema | Name | Type | Owner
--------+----------------------------+----------+----------
public | alembic_version | table | postgres
public | chemical_identifier | table | postgres
public | chemical_identifier_id_seq | sequence | postgres
public | flaggedmolecule | table | postgres
public | job | table | postgres
(5 rows)
- Check the `job` table with `\d job`:
mmli=# \d job
Table "public.job"
Column | Type | Collation | Nullable | Default
--------------+-------------------+-----------+----------+---------
job_info | character varying | | |
email | character varying | | |
job_id | character varying | | not null |
run_id | character varying | | |
phase | character varying | | not null |
type | character varying | | not null |
image | character varying | | |
command | character varying | | |
time_created | integer | | not null |
time_start | integer | | not null |
time_end | integer | | not null |
deleted | integer | | not null |
user_agent | character varying | | not null |
Indexes:
"job_id_pk" PRIMARY KEY, btree (job_id)
Referenced by:
TABLE "flaggedmolecule" CONSTRAINT "flaggedmolecule_job_id_fkey" FOREIGN KEY (job_id) REFERENCES job(job_id)
🎉 All done! 🎉 Check the Swagger docs at `localhost:8080/docs` for the important endpoints.
First, submit a job using curl, Swagger, or Postman, e.g.:
curl -X POST https://mmli.kastan.ai/aceretro/jobs \
-H "Content-Type: application/json" \
-d '{
"job_id": "123",
"run_id": "123",
"email": "[email protected]",
"job_info": "{\"nuc\": \"hi\", \"CORES_FILE_NAME\": \"hi\", \"SUBS_FILE_NAME\": \"hi\"}"
}'
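The example above targets a deployed instance; if you are testing the local compose setup, the same request against `localhost:8080` should work, assuming the route prefix is unchanged:
# Same JSON body as above, but against the local compose stack
curl -X POST localhost:8080/aceretro/jobs \
  -H "Content-Type: application/json" \
  -d '{"job_id": "123", "run_id": "123", "email": "[email protected]", "job_info": "{\"nuc\": \"hi\", \"CORES_FILE_NAME\": \"hi\", \"SUBS_FILE_NAME\": \"hi\"}"}'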
Monitoring the job:
# after submitting a job, it should create a pod
minikube kubectl -- get pods -A
# Then get details of the pod, including failures
minikube kubectl -- describe pod mmli-job-molli-123456-j4pwd -n mmli
# get the logs from a pod
minikube kubectl -- logs mmli-job-aceretro-222222222222-n5cl4 -n mmli -c job
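You can also check the job's state straight from the database, using the `job` table columns shown earlier (`job_id`, `phase`, `time_created`):
docker exec mmli-backend-postgresql \
  psql -U postgres mmli -c "SELECT job_id, phase, time_created FROM job ORDER BY time_created DESC;"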
Create a `.env` from the `.env.tpl` file in this repo. The default env is fine without modifications for testing; change the passwords for production use.
cp .env.tpl .env
Setting `DEBUG=true` will enable automatic reloading of the app when the Python source code changes.
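For example, a development `.env` might just flip that flag on top of the template's defaults (the exact variable names come from `.env.tpl`; `DEBUG` is the only one referenced here):
# .env (copied from .env.tpl)
DEBUG=true
# keep the remaining defaults from the template for local testing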
Or, you can use Python + pip if you have them installed locally.
To install dependencies:
# create a new virtual environment, e.g. for conda `conda create -n mmli-backend python=3.10 -y`
# conda activate mmli-backend
pip install -r requirements.txt
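With dependencies installed, you can start the app with uvicorn. The module path and port below are assumptions (a typical FastAPI layout, serving on the same port 8080 used elsewhere in this README), so check the repo's actual entrypoint if it differs:
# Hypothetical entrypoint -- adjust `app.main:app` to match this repo's FastAPI app module
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload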
This will only run the Python app. You must run MinIO and PostgreSQL yourself; set their credentials in the `.env` file.
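If you don't already have them, one quick way to get MinIO and PostgreSQL locally is via Docker. This is only a sketch; the credentials, ports, and database name below are placeholders that must match whatever you put in `.env`:
# PostgreSQL (placeholder password and db name -- match your .env)
docker run -d --name mmli-postgres -p 5432:5432 \
  -e POSTGRES_PASSWORD=changeme -e POSTGRES_DB=mmli postgres:15
# MinIO (placeholder credentials -- match your .env)
docker run -d --name mmli-minio -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"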
Initialize the Postgres database; this creates the SQL tables, starting with the "init" migration.
alembic upgrade head
# You should see these logs:
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> d775ee615d7b, init
...
Finally, verify the tables are created:
- Run the `psql` command.
psql mmli
- Run the `\d` command to list tables.
psql (15.8 (Debian 15.8-1.pgdg120+1))
Type "help" for help.
mmli=# \d
List of relations
Schema | Name | Type | Owner
--------+----------------------------+----------+----------
public | alembic_version | table | postgres
public | chemical_identifier | table | postgres
public | chemical_identifier_id_seq | sequence | postgres
public | flaggedmolecule | table | postgres
public | job | table | postgres
(5 rows)
- Check the `job` table with `\d job`:
mmli=# \d job
Table "public.job"
Column | Type | Collation | Nullable | Default
--------------+-------------------+-----------+----------+---------
job_info | character varying | | |
email | character varying | | |
job_id | character varying | | not null |
run_id | character varying | | |
phase | character varying | | not null |
type | character varying | | not null |
image | character varying | | |
command | character varying | | |
time_created | integer | | not null |
time_start | integer | | not null |
time_end | integer | | not null |
deleted | integer | | not null |
user_agent | character varying | | not null |
Indexes:
"job_id_pk" PRIMARY KEY, btree (job_id)
Referenced by:
TABLE "flaggedmolecule" CONSTRAINT "flaggedmolecule_job_id_fkey" FOREIGN KEY (job_id) REFERENCES job(job_id)
🎉 All done! 🎉 Check the Swagger docs at `localhost:8080/docs` for the important endpoints.
Any time you add, modify, or remove anything in the `Job` or `JobBase` classes, the database schema is affected.
Migrations are handled using Alembic.
You can use Alembic to automatically generate a script that migrates the database to a new schema version.
See the migrations README for more info.
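For reference, the usual workflow after changing the models is to autogenerate a revision and then apply it (standard Alembic commands; review the generated script before running it):
# Generate a new migration script from the model changes
alembic revision --autogenerate -m "describe your schema change"
# Apply it
alembic upgrade head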