Unified FastAPI based backend for ChemScraper, (CLEAN job-manager and Molli - future scope)



⭐️ Recommended local development (Docker)

(1/4) Create a .env

Create a .env from the .env.tpl file in this repo. The defaults are fine without modification for testing. Change the passwords for production use.

cp .env.tpl .env
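
Before moving a deployment beyond local testing, it can help to confirm that no credential in .env still carries its template default. The following is a minimal sketch, not part of this repo: it assumes both files use plain KEY=VALUE lines, and the name hints (PASSWORD, SECRET, KEY) are a guess at how credential variables are named rather than something read from the actual template.

```python
from pathlib import Path

# Substrings suggesting a variable holds a credential. This list is an
# assumption for illustration, not taken from the repo's template file.
SECRET_HINTS = ("PASSWORD", "SECRET", "KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def unchanged_secrets(template: dict[str, str], env: dict[str, str]) -> list[str]:
    """Return secret-looking keys whose value still equals the template default."""
    return sorted(
        key
        for key, value in env.items()
        if any(hint in key.upper() for hint in SECRET_HINTS)
        and template.get(key) == value
    )


if __name__ == "__main__":
    tpl_path, env_path = Path(".env.tpl"), Path(".env")
    if tpl_path.is_file() and env_path.is_file():
        stale = unchanged_secrets(
            parse_env(tpl_path.read_text()), parse_env(env_path.read_text())
        )
        for key in stale:
            print(f"{key} still has its template default; change it before production use")
```

Run it from the repository root after copying the template. It prints nothing when every secret-looking value differs from the template.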

(2/4) Set up a Kubernetes cluster (here we use Minikube)

  1. Install Minikube.
  2. Start Minikube with an external network (defined in this repo's docker-compose.yml):
minikube start --network=mmli-net --driver=docker --memory=24384
  3. Ensure it's running: minikube kubectl cluster-info
Kubernetes control plane is running at
CoreDNS is running at
  4. Apply the necessary configurations:
# Create `mmli` namespace
minikube kubectl -- create ns mmli

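# Apply secret and config
# (the absolute paths below are from the author's machine; substitute the path to your own checkout)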
minikube kubectl -- apply -f /home/kastan/ncsa/mmli/mmli-backend/app/cfg/local.secret.yaml -n mmli
minikube kubectl -- apply -f /home/kastan/ncsa/mmli/mmli-backend/app/cfg/local.config.yaml -n mmli

# Create PVC needed by molli jobs
minikube kubectl -- apply -f /home/kastan/ncsa/mmli/mmli-backend/chart/weights.pvc.yaml

(3/4) Run Docker Compose build

Edit the docker-compose.yml to expose your kube config. In our case, minikube requires 3 values: ca.crt, client.crt, client.key.

Copy-paste this into the docker-compose.yml, under the rest service:

⚠️ Note: I had problems with ${HOME} and had to provide full absolute paths manually; e.g. replace ${HOME} with /home/username. ⚠️

    container_name: mmli-backend

        - ./app:/code/app
        - ./migrations:/code/migrations
        - ${HOME}/.kube/config:/opt/kubeconfig
        - ${HOME}/.minikube/ca.crt:${HOME}/.minikube/ca.crt
        - ${HOME}/.minikube/profiles/minikube/client.crt:${HOME}/.minikube/profiles/minikube/client.crt
        - ${HOME}/.minikube/profiles/minikube/client.key:${HOME}/.minikube/profiles/minikube/client.key
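
With short-syntax bind mounts, Docker creates a missing host path as an empty directory, which tends to surface later as a confusing kubeconfig error inside the container. A minimal sketch of a pre-flight check follows. The four paths are the ones mounted above; the function names are hypothetical and the profile name minikube is assumed to be the default.

```python
from pathlib import Path


def minikube_credential_paths(home: Path) -> list[Path]:
    """Host-side files mounted into the rest service in the snippet above."""
    profile = home / ".minikube" / "profiles" / "minikube"
    return [
        home / ".kube" / "config",
        home / ".minikube" / "ca.crt",
        profile / "client.crt",
        profile / "client.key",
    ]


def missing_files(paths: list[Path]) -> list[Path]:
    """Return the paths that do not exist as regular files."""
    return [path for path in paths if not path.is_file()]


if __name__ == "__main__":
    # Report anything that should exist before `docker compose up --build`.
    for path in missing_files(minikube_credential_paths(Path.home())):
        print(f"missing: {path}")
```

If it prints anything, start Minikube first (it writes these files) or correct the paths in docker-compose.yml.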

Finally, start the Compose stack. Watch the logs for errors from mmli-backend.

docker compose up --build # optionally add -d for detached

This will run MinIO + PostgreSQL + the Python app mmli-backend.

Test that the service works: navigate to localhost:8080/docs, where you should see the FastAPI Swagger docs.
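
The same check can be scripted, for example in a setup script that waits for the backend. This is a minimal sketch using only the standard library; the function name is hypothetical, and the URL is the localhost:8080/docs address given above.

```python
import urllib.error
import urllib.request


def docs_reachable(url: str = "http://localhost:8080/docs", timeout: float = 3.0) -> bool:
    """Return True if the Swagger docs page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("docs reachable" if docs_reachable() else "docs not reachable yet")
```

A False result right after `docker compose up` usually just means the container is still starting, so retry before digging into the logs.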

(4/4) Initialize the database

Initialize the Postgres database; this creates the SQL tables.

docker compose exec -w /code rest alembic upgrade head

# You should see these logs:
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> d775ee615d7b, init
INFO  [alembic.runtime.migration] Running upgrade d775ee615d7b -> 88355d0f323b, added moleculecacheentry for caching molecules, modified job schema, added flaggedmolecule for saving flagged molecules
INFO  [alembic.runtime.migration] Running upgrade 88355d0f323b -> e8569ab45dd1, removed moleculecacheentry
INFO  [alembic.runtime.migration] Running upgrade e8569ab45dd1 -> 30b240622d34, add chemical identifier model and table
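
To confirm which revision the upgrade ended on without reading the log by eye, the "Running upgrade" lines can be parsed. This is a minimal sketch assuming the log format shown above; the helper names are hypothetical and not part of Alembic.

```python
import re
from typing import Optional

# Matches e.g. "Running upgrade d775ee615d7b -> 88355d0f323b, message".
# The source revision is empty for the very first migration.
UPGRADE_RE = re.compile(r"Running upgrade\s+(\w*)\s*->\s*(\w+),\s*(.*)")


def applied_revisions(log_text: str) -> list[tuple[str, str, str]]:
    """Return (from_rev, to_rev, message) for each 'Running upgrade' line."""
    found = []
    for line in log_text.splitlines():
        match = UPGRADE_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2), match.group(3)))
    return found


def head_revision(log_text: str) -> Optional[str]:
    """Return the last revision the log upgraded to, or None if there is none."""
    revisions = applied_revisions(log_text)
    return revisions[-1][1] if revisions else None
```

For the log above, the last target revision should match the single row stored in the alembic_version table.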

Finally, verify the tables are created:

  1. Exec into the database container and run the psql command:
docker exec -it mmli-backend-postgresql psql -U postgres mmli
  2. Run the \d command to list tables:
psql (15.8 (Debian 15.8-1.pgdg120+1))
Type "help" for help.

mmli=# \d 
                     List of relations
 Schema |            Name            |   Type   |  Owner
 public | alembic_version            | table    | postgres
 public | chemical_identifier        | table    | postgres
 public | chemical_identifier_id_seq | sequence | postgres
 public | flaggedmolecule            | table    | postgres
 public | job                        | table    | postgres
(5 rows)
  3. Check the job table with \d job:
mmli=# \d job
                        Table "public.job"
    Column    |       Type        | Collation | Nullable | Default
 job_info     | character varying |           |          |
 email        | character varying |           |          |
 job_id       | character varying |           | not null |
 run_id       | character varying |           |          |
 phase        | character varying |           | not null |
 type         | character varying |           | not null |
 image        | character varying |           |          |
 command      | character varying |           |          |
 time_created | integer           |           | not null |
 time_start   | integer           |           | not null |
 time_end     | integer           |           | not null |
 deleted      | integer           |           | not null |
 user_agent   | character varying |           | not null |
    "job_id_pk" PRIMARY KEY, btree (job_id)
Referenced by:
    TABLE "flaggedmolecule" CONSTRAINT "flaggedmolecule_job_id_fkey" FOREIGN KEY (job_id) REFERENCES job(job_id)

🎉 All done! 🎉 Check the Swagger docs for important commands on localhost:8080/docs.

How to monitor running jobs

First, submit a job using curl, Swagger, or Postman. E.g.:

curl -X POST \
-H "Content-Type: application/json" \
-d '{
  "job_id": "123",
  "run_id": "123",
  "email": "user@example.com",
  "job_info": "{\"nuc\": \"hi\", \"CORES_FILE_NAME\": \"hi\", \"SUBS_FILE_NAME\": \"hi\"}"

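Note that job_info in the request above is itself a JSON document encoded as a string, which is why its inner quotes are escaped. Building the body programmatically avoids escaping by hand. This is a minimal sketch that only constructs the payload. The field names come from the curl example, the email is a placeholder, the function name is hypothetical, and the endpoint is left out because it is not shown here (check the Swagger docs for it).

```python
import json


def build_job_payload(job_id: str, run_id: str, email: str, job_info: dict) -> str:
    """Serialize a job request; job_info is JSON-encoded into a string first."""
    body = {
        "job_id": job_id,
        "run_id": run_id,
        "email": email,
        # Encoded separately so the server receives a string, as in the curl example.
        "job_info": json.dumps(job_info),
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(
        build_job_payload(
            "123",
            "123",
            "user@example.com",  # placeholder address
            {"nuc": "hi", "CORES_FILE_NAME": "hi", "SUBS_FILE_NAME": "hi"},
        )
    )
```

The printed text can be saved to a file and passed to curl with `-d @payload.json`.
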
Monitoring the job:

# After a job is submitted, it should create a pod
minikube kubectl -- get pods -A

# Then get details of the pod, including failures
minikube kubectl -- describe pod mmli-job-molli-123456-j4pwd -n mmli

# get the logs from a pod
minikube kubectl -- logs mmli-job-aceretro-222222222222-n5cl4 -n mmli -c job
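
When several jobs are running, a short script can summarize their pods instead of scanning the full listing. This is a minimal sketch that parses the output of `kubectl get pods -o json`. The mmli-job- prefix is inferred from the pod names in the commands above and is an assumption, as is the function name.

```python
import json
import subprocess

# Same command as above, restricted to the mmli namespace with JSON output.
GET_PODS_CMD = ["minikube", "kubectl", "--", "get", "pods", "-n", "mmli", "-o", "json"]


def job_pod_phases(pods_json: str, prefix: str = "mmli-job-") -> dict[str, str]:
    """Map job pod names to their phase, given `kubectl get pods -o json` output."""
    items = json.loads(pods_json).get("items", [])
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in items
        if item["metadata"]["name"].startswith(prefix)
    }


if __name__ == "__main__":
    try:
        result = subprocess.run(GET_PODS_CMD, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not list pods: {exc}")
    else:
        for name, phase in job_pod_phases(result.stdout).items():
            print(f"{name}: {phase}")
```

Pods reported as Pending or Failed are the ones worth inspecting with the describe and logs commands.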

Local development Setup (without Docker, not recommended)

(1/3) Configure Environment

Create a .env from the .env.tpl file in this repo. The defaults are fine without modification for testing. Change the passwords for production use.

cp .env.tpl .env

Setting DEBUG=true makes the app reload automatically when the Python source code changes.

(2/3) Install dependencies

Use Python and pip if you have them installed locally.

To install dependencies:

# create a new virtual environment, e.g. for conda `conda create -n mmli-backend python=3.10 -y`
# conda activate mmli-backend
pip install -r requirements.txt

This will only run the Python app.

⚠️ You must run MinIO and PostgreSQL yourself. Set their credentials in the .env file.

(3/3) Initialize the database

Initialize the Postgres database; this creates the SQL tables with the "init" migration.

alembic upgrade head
# You should see these logs:
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> d775ee615d7b, init

Finally, verify the tables are created:

  1. Run the psql command:
psql mmli
  2. Run the \d command to list tables:
psql (15.8 (Debian 15.8-1.pgdg120+1))
Type "help" for help.

mmli=# \d 
                     List of relations
 Schema |            Name            |   Type   |  Owner
 public | alembic_version            | table    | postgres
 public | chemical_identifier        | table    | postgres
 public | chemical_identifier_id_seq | sequence | postgres
 public | flaggedmolecule            | table    | postgres
 public | job                        | table    | postgres
(5 rows)
  3. Check the job table with \d job:
mmli=# \d job
                        Table "public.job"
    Column    |       Type        | Collation | Nullable | Default
 job_info     | character varying |           |          |
 email        | character varying |           |          |
 job_id       | character varying |           | not null |
 run_id       | character varying |           |          |
 phase        | character varying |           | not null |
 type         | character varying |           | not null |
 image        | character varying |           |          |
 command      | character varying |           |          |
 time_created | integer           |           | not null |
 time_start   | integer           |           | not null |
 time_end     | integer           |           | not null |
 deleted      | integer           |           | not null |
 user_agent   | character varying |           | not null |
    "job_id_pk" PRIMARY KEY, btree (job_id)
Referenced by:
    TABLE "flaggedmolecule" CONSTRAINT "flaggedmolecule_job_id_fkey" FOREIGN KEY (job_id) REFERENCES job(job_id)

🎉 All done! 🎉 Check the Swagger docs for important commands on localhost:8080/docs.

Database Migrations

Any time that you add, modify, or remove anything in the Job or JobBase classes, this will affect the database schema.

Migrations are handled using Alembic.

You can use Alembic to automatically generate a script that will migrate the database to a new schema version.

See the migrations README for more info.
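
As a rough illustration of that workflow, the sketch below assembles the two Alembic CLI calls involved: `alembic revision --autogenerate` to generate the script and `alembic upgrade head` to apply it. The Docker prefix reuses the `docker compose exec -w /code rest` form from the setup steps. The helper name and the example message are hypothetical, and the migrations README remains the authoritative procedure.

```python
import shlex

# Prefix for running a command inside the rest service, as in the setup steps.
DOCKER_PREFIX = ["docker", "compose", "exec", "-w", "/code", "rest"]


def alembic_commands(message: str, in_docker: bool = True) -> list[list[str]]:
    """Commands to autogenerate a migration script and then apply it."""
    prefix = DOCKER_PREFIX if in_docker else []
    return [
        prefix + ["alembic", "revision", "--autogenerate", "-m", message],
        prefix + ["alembic", "upgrade", "head"],
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so the generated
    # script can be reviewed between the two steps.
    for cmd in alembic_commands("describe the schema change here"):
        print(shlex.join(cmd))
```

Autogenerate does not detect every kind of change, so review the generated script before running the upgrade.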

