
Commit aad8fc7

Added MPM end to end demo
1 parent 748c580 commit aad8fc7

8 files changed: +2760 -0 lines changed

+2
@@ -0,0 +1,2 @@
training/preprocessed_data.csv
__pycache__

guides/MPM/end_to_end_example/data_processing/credit_scoring_dataset.csv

+1,201
Large diffs are not rendered by default.
@@ -0,0 +1,68 @@
import os
from io import StringIO

import comet_ml
import pandas as pd


def get_raw_data(workspace_name: str, artifact_name: str) -> pd.DataFrame:
    """
    Check whether the raw data exists in Comet Artifacts. If it does, download it from there;
    if not, upload it from the local directory.

    Once the file is available locally, load it into a pandas DataFrame and return it.
    Note: `workspace_name` is accepted but not currently used.
    """
    exp = comet_ml.get_running_experiment()

    try:
        artifact = exp.get_artifact(artifact_name=f"{artifact_name}_raw")

        # Download the artifact into the current directory
        artifact.download(path="./")
    except Exception as e:
        # The artifact could not be fetched (for example, it does not exist yet),
        # so create it from the local CSV file instead
        print(f"Error downloading artifact: {e}")
        artifact = comet_ml.Artifact(name=f"{artifact_name}_raw", artifact_type="dataset")
        artifact.add("./credit_scoring_dataset.csv")
        exp.log_artifact(artifact)

    df = pd.read_csv("./credit_scoring_dataset.csv")
    return df


def preprocess_data(df: pd.DataFrame, artifact_name: str) -> pd.DataFrame:
    """
    Preprocess the data to make it ready for the model and store the preprocessed data in a
    new Comet Artifact.
    """
    # Select the relevant columns; copy so the edits below do not touch the caller's DataFrame
    df = df.loc[:, ['CustAge', 'CustIncome', 'EmpStatus', 'UtilRate', 'OtherCC', 'ResStatus', 'TmAtAddress', 'TmWBank',
                    'probdefault']].copy()

    # Rename the target column
    df = df.rename(columns={'probdefault': 'probability_default'})

    # Convert the categorical columns to category type
    for c in ['EmpStatus', 'OtherCC', 'ResStatus']:
        df[c] = df[c].astype('category')

    # Save the preprocessed data to a new Comet Artifact via an in-memory buffer
    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=False)
    csv_buffer.seek(0)

    artifact = comet_ml.Artifact(name=f"{artifact_name}_preprocessed", artifact_type="dataset")
    artifact.add(local_path_or_data=csv_buffer, logical_path="preprocessed_data.csv")

    exp = comet_ml.get_running_experiment()
    exp.log_artifact(artifact)

    return df


if __name__ == "__main__":
    workspace_name = os.environ["COMET_WORKSPACE"]
    project_name = os.environ["COMET_PROJECT_NAME"]
    # The project name doubles as the artifact name prefix
    artifact_name = os.environ["COMET_PROJECT_NAME"]

    exp = comet_ml.Experiment(workspace=workspace_name, project_name=project_name)
    df = get_raw_data(workspace_name, artifact_name)

    processed_df = preprocess_data(df, artifact_name)

    print("Data preprocessing complete.")
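The in-memory buffer step in `preprocess_data` can be illustrated on its own. The sketch below is a stand-in, not the script's code: it uses the standard library `csv` module instead of pandas, the helper name `rows_to_csv_buffer` is hypothetical, and the row values are made up. It shows why the buffer is rewound with `seek(0)` before being handed to a consumer.

```python
import csv
from io import StringIO


def rows_to_csv_buffer(header, rows):
    """Write rows into an in-memory text buffer and rewind it for reading."""
    buffer = StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    # After writing, the position is at the end of the buffer.
    # Rewind so that the next reader starts from the first byte.
    buffer.seek(0)
    return buffer


# Made-up rows, for illustration only
buffer = rows_to_csv_buffer(["CustAge", "probability_default"], [[53, 0.12], [61, 0.05]])
first_line = buffer.readline().strip()
remaining = buffer.read()
```

Without the `seek(0)` call, a reader that starts at the current position would see an empty stream, because writing leaves the position at the end of the buffer.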
+62
@@ -0,0 +1,62 @@
# MPM example scripts

The MPM examples are all based on the same credit scoring example. The goal of the model is to identify users who are likely to default on their loan.

This folder contains three sets of scripts that showcase MPM:
* `data_processing`: Script that processes the raw data and creates a new CSV file with the model's features
* `training`: Script that trains a machine learning model and uploads it to Comet's Model Registry
* `serving`: FastAPI inference server that downloads a model from Comet's Model Registry and logs its predictions to MPM

## Setup
In order to run these demo scripts you will need to set these environment variables:
```bash
export COMET_API_KEY="<Comet API Key>"
export COMET_WORKSPACE="<Comet workspace to log data to>"
export COMET_PROJECT_NAME="<Comet project name>"
export COMET_MODEL_REGISTRY_NAME="<Comet model registry name>"

export COMET_URL_OVERRIDE="<EM endpoint, similar format to https://www.comet.com/clientlib/>"
export COMET_URL="<MPM ingestion endpoint, similar format to https://www.comet.com/>"
```

You will also need to install the Python libraries in `requirements.txt`.

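Before running any of the scripts, it can help to confirm that the variables above are set. The following is a minimal sketch, not part of the demo: the helper name `missing_env_vars` is hypothetical, and treating all six variables as mandatory is an assumption based on the list above.

```python
import os

# Variable names taken from the Setup section of this README
REQUIRED_VARS = [
    "COMET_API_KEY",
    "COMET_WORKSPACE",
    "COMET_PROJECT_NAME",
    "COMET_MODEL_REGISTRY_NAME",
    "COMET_URL_OVERRIDE",
    "COMET_URL",
]


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty, in the order listed above."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` at the top of a script and stopping when the returned list is non-empty gives a clearer error than a `KeyError` raised halfway through a run.
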
## Data processing

For this demo, we will be using a simple credit scoring dataset available in the `data_processing` folder.

The preprocessing step is quite simple in this demo, but it showcases how you can use Comet's Artifacts feature to track all your data processing steps.

The code can be run using:
```
cd data_processing
python data_processing.py
```

## Training
For this demo, we train a LightGBM model and then upload it to the model registry.

The code can be run using:
```
cd training
python model_training.py
```

## Serving
**Dependency**: To use this inference server, you first need to train a model and upload it to the model registry using the training scripts.

The inference server is built using FastAPI and demonstrates how to use both the model registry to store models and MPM to log predictions.

The code can be run using:
```
cd serving
uvicorn main:app --reload
```

Once the server is running, it is available at `http://localhost:8000` and exposes the following endpoints:
* `http://localhost:8000/`: Returns the string `FastAPI inference service`, which indicates the inference server is running
* `http://localhost:8000/health_check`: Simple health check to make sure the server is running and accepting requests
* `http://localhost:8000/prediction`: Makes a prediction and logs it to MPM
* `http://localhost:8000/create_demo_data`: Creates 10,000 predictions over a one-week period to populate MPM dashboards

**Note:** It can take a few minutes for the data to appear in the debugger tab in the MPM UI.
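
As a rough illustration of calling the prediction endpoint from Python, the sketch below builds a request with the standard library only. Several details are assumptions, since the serving code is not shown here: that `/prediction` accepts a `POST` with a JSON body, and that the body is a flat object of feature values. The helper name `build_prediction_request` is hypothetical, the field names mirror the columns selected in the data processing script, and the values are made up.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address given in this README


def build_prediction_request(features, base_url=BASE_URL):
    """Build (but do not send) a JSON request for the /prediction endpoint.

    Assumption: the endpoint accepts POST with a JSON object of feature values.
    """
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/prediction",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: values are made up for illustration
example_features = {
    "CustAge": 53,
    "CustIncome": 50000,
    "EmpStatus": "Employed",
    "UtilRate": 0.22,
    "OtherCC": "Yes",
    "ResStatus": "Home Owner",
    "TmAtAddress": 62,
    "TmWBank": 55,
}
request = build_prediction_request(example_features)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Check the request schema defined in the `serving` folder before relying on this payload shape.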
@@ -0,0 +1,10 @@
comet_ml
pandas
numpy
lightgbm
fastapi
requests
asyncio
tqdm
comet_mpm
uvicorn
