Open
117 commits
622ee61
adding count mlflow objects utilities
dcoles-cc Apr 25, 2023
373802e
adding ex/import `common` overrides
dcoles-cc Apr 25, 2023
eb25624
`count-exported-mlflow-objects`: mod path because of `export_all`
dcoles-cc Apr 27, 2023
eb73753
`Export_All`: log stdout to file
dcoles-cc Apr 27, 2023
049d143
`count-objects-in-mlflow`: swap out `MlflowClient` functionality
dcoles-cc Apr 27, 2023
3d52b58
write logs to file for import notebooks
dcoles-cc Apr 27, 2023
1dae66f
`count-object-in-mlflow`: dropping view_type
dcoles-cc Apr 28, 2023
3239444
update logging
dcoles-cc May 2, 2023
3064b14
count-exported-mlflow-objects: fixing path error
dcoles-cc May 3, 2023
52f1242
Export_Models: rm dead code
dcoles-cc May 3, 2023
8d9b867
removing ThreadPoolExecutor argument
dcoles-cc May 3, 2023
894d3cb
create `mycode/` and migrate my code there
dcoles-cc May 5, 2023
0194261
create manual-export-models nb
dcoles-cc May 5, 2023
4f82e0c
export logs
dcoles-cc May 5, 2023
4e24379
remove dead configs
dcoles-cc May 5, 2023
9cb8a01
remove dead logs
dcoles-cc May 5, 2023
f0e0eaa
updated `count-export-mlflow-objects`
dcoles-cc May 8, 2023
be83bab
created `manual-export-experiments`
dcoles-cc May 8, 2023
c341da1
adding .gitignore
dcoles-cc May 8, 2023
9c262ab
Delete exported_experiments
dcoles-cc May 8, 2023
84a14cb
Delete export_models_20230504_1709.log
dcoles-cc May 8, 2023
784aad8
Delete exported_models
dcoles-cc May 8, 2023
d8b904d
updated .gitignore
dcoles-cc May 8, 2023
30d86bf
created `manual-import-experiments`
dcoles-cc May 8, 2023
82dab7b
updating credential security
dcoles-cc May 8, 2023
5ecc79b
rename import/export NBs
dcoles-cc May 10, 2023
341207c
directly implementing looping logic for model imports
dcoles-cc May 10, 2023
4974cc9
adding count validation utilities
dcoles-cc May 10, 2023
79d9488
created single model import for CLI
dcoles-cc May 10, 2023
2cf19e0
dead code
dcoles-cc May 10, 2023
58c83d4
change path
dcoles-cc May 10, 2023
e51fb10
temp commit
dcoles-cc May 11, 2023
12ed20d
count-objects-in-mlflow: added start_time widget
dcoles-cc May 11, 2023
375365c
creating get-model-hash to validate migrated models
dcoles-cc May 25, 2023
ccd4448
get-model-hash: documentation
dcoles-cc May 25, 2023
7bbf9b6
get-model-hash: updating
dcoles-cc May 30, 2023
056a480
creating get-model-hashes to hash all models
dcoles-cc May 30, 2023
a12d1e8
get-model-hash: bugfix
dcoles-cc May 30, 2023
38be473
get-model-hash: changing run logic based on whether model files exist
dcoles-cc May 30, 2023
a22a957
get-model-hash: bugfix
dcoles-cc May 30, 2023
653532e
update gitignore
Feb 11, 2025
7ba4cfa
revisiting export-experiments
Feb 11, 2025
358c0d8
update cli-export-experiments
Feb 14, 2025
2fd7603
changes to aws mlflow exim utils
dcoles-cc Feb 14, 2025
e256492
update mount
dcoles-cc Feb 14, 2025
73757a8
Merge pull request #3 from dcoles-cc/mybranch
dcoles-cc Feb 14, 2025
6f2764c
small updates
dcoles-cc Feb 18, 2025
f7ceb7b
adding logic to discard failed model registrations
dcoles-cc Feb 18, 2025
d6654df
small updates
dcoles-cc Feb 19, 2025
be4f3c5
Merge pull request #4 from dcoles-cc/mybranch
dcoles-cc Feb 19, 2025
2fbe5d5
add files of model hashes from both platforms
Feb 19, 2025
3a861ce
update output dir
Feb 19, 2025
aedb24d
use default `stages` option in `cli-export-models`
Feb 19, 2025
65c327f
create nb to compare model hashes across platforms
Feb 19, 2025
a21265d
Merge pull request #5 from dcoles-cc/mybranch
dcoles-cc Feb 19, 2025
f226fae
create get_credentials_path()
Feb 19, 2025
44c8eaf
update export utils to use get_credentials_path()
Feb 19, 2025
5998e8a
update get_model_hash() to use get_credentials_path()
Feb 19, 2025
acd9aca
Merge pull request #6 from dcoles-cc/export-branch
dcoles-cc Feb 19, 2025
8e7d991
Merge pull request #7 from dcoles-cc/master
dcoles-cc Feb 19, 2025
ce2bcf1
update import side to use ./credentials
dcoles-cc Feb 19, 2025
d202efe
update manual import to use ./credentials
dcoles-cc Feb 19, 2025
960bedf
Merge pull request #8 from dcoles-cc/import-branch
dcoles-cc Feb 19, 2025
72b2364
Merge pull request #9 from dcoles-cc/master
dcoles-cc Feb 19, 2025
d28bff1
dead code
dcoles-cc Feb 19, 2025
a10930e
Merge pull request #10 from dcoles-cc/import-branch
dcoles-cc Feb 19, 2025
49445a7
Merge pull request #11 from dcoles-cc/export-branch
dcoles-cc Feb 19, 2025
7a585da
add table viz for compare-model-hashes
Feb 19, 2025
5f12a0b
create models-by-owner
Feb 19, 2025
8ba0848
Merge pull request #12 from dcoles-cc/export-branch
dcoles-cc Feb 19, 2025
dd6adaf
create get-model-hash-from-us
dcoles-cc Feb 24, 2025
84c841c
Merge pull request #13 from dcoles-cc/import-branch
dcoles-cc Feb 24, 2025
9104bfd
update get-model-hash to operate in both AWS and Azure with the new, …
Feb 24, 2025
e20d35d
update get-model-hashes to operate on both platforms
Feb 24, 2025
45a0506
address case where model has no `Production` or `champion` version
Feb 24, 2025
f38fc9d
just hash `model.pkl` and don't worry about the rest
Feb 24, 2025
7c597cd
update model fetching logic to `*.pkl`
Feb 24, 2025
fae89f3
Merge pull request #14 from dcoles-cc/export-branch
dcoles-cc Feb 25, 2025
922ca98
delete get-model-hash-from-uc
dcoles-cc Feb 25, 2025
1248db0
Merge pull request #15 from dcoles-cc/import-branch
dcoles-cc Feb 25, 2025
965d9b9
update nb description + fix model uri in AWS
dcoles-cc Feb 25, 2025
0dcd08e
Merge pull request #16 from dcoles-cc/import-branch
dcoles-cc Feb 25, 2025
f5b1737
change to workspace/UC as opposed to azure/aws
dcoles-cc Feb 25, 2025
8ab85f0
Merge pull request #17 from dcoles-cc/import-branch
dcoles-cc Feb 25, 2025
159ecc6
update azure-model-hashes
Feb 25, 2025
7468a45
updating/creating aws hash lists
dcoles-cc Feb 25, 2025
8efd9c2
Merge pull request #18 from dcoles-cc/export-branch
dcoles-cc Feb 25, 2025
cb5066f
Merge pull request #19 from dcoles-cc/import-branch
dcoles-cc Feb 25, 2025
61e3c28
ensure get latest version of Production model when in workspace
Feb 25, 2025
fb5acf2
update compare model hash logic
dcoles-cc Feb 25, 2025
51321c4
cover case where no Production model exists in workspace registry
dcoles-cc Feb 25, 2025
a5a9c10
az model hashes
Feb 25, 2025
11f43a8
aws model hashes
dcoles-cc Feb 25, 2025
43de083
update model deletion logic to handle UC and Workspace registries
dcoles-cc Feb 26, 2025
70530a2
dead code
dcoles-cc Feb 26, 2025
4dc9fbe
create hash comparison logic for UC models
dcoles-cc Feb 26, 2025
fa6118e
disable threading on model import
dcoles-cc Feb 27, 2025
458a4bd
update datalake path in count logic
dcoles-cc Feb 27, 2025
54d7228
update aws UC model hashes
dcoles-cc Feb 27, 2025
641bd1d
created a `mount` notebook to help mount S3 buckets
May 8, 2025
c238a6d
modding cli-export-models to operate for azure-dev-to-aws-nonprod
May 8, 2025
79472ba
adding documentation
May 8, 2025
3e52fb5
replace external mount check
May 9, 2025
119a936
update cli-export-export
May 9, 2025
47bbd04
parse out stats by experiment & model
May 9, 2025
05d2ad2
update logic for ds-non-prod
dcoles-cc May 12, 2025
b906341
impl model count routine
dcoles-cc May 12, 2025
f96927e
Merge pull request #20 from dcoles-cc/import-branch
dcoles-cc May 12, 2025
f12a5b4
Merge pull request #21 from dcoles-cc/master
dcoles-cc May 12, 2025
394bb18
Merge branch 'export-branch' into master
dcoles-cc May 12, 2025
64e5895
Merge pull request #22 from dcoles-cc/master
dcoles-cc May 12, 2025
c1cf499
Merge pull request #23 from dcoles-cc/export-branch
dcoles-cc May 12, 2025
2b959cb
limit export to Production stage
May 12, 2025
16a16e8
update count prod models logic
May 12, 2025
652048b
rename count models nb
May 12, 2025
e30ebaa
Merge pull request #24 from dcoles-cc/export-branch
dcoles-cc May 12, 2025
d5f63ad
Merge pull request #25 from dcoles-cc/master
dcoles-cc May 12, 2025
38 changes: 26 additions & 12 deletions databricks_notebooks/bulk/Export_All.py
@@ -62,15 +62,29 @@

# COMMAND ----------

-from mlflow_export_import.bulk.export_all import export_all
-
-export_all(
-    output_dir = output_dir,
-    stages = stages,
-    export_latest_versions = export_latest_versions,
-    run_start_time = run_start_date,
-    export_permissions = export_permissions,
-    export_deleted_runs = export_deleted_runs,
-    notebook_formats = notebook_formats,
-    use_threads = use_threads
-)
+# MAGIC %%capture captured
+# MAGIC
+# MAGIC from mlflow_export_import.bulk.export_all import export_all
+# MAGIC
+# MAGIC export_all(
+# MAGIC     output_dir = output_dir,
+# MAGIC     stages = stages,
+# MAGIC     export_latest_versions = export_latest_versions,
+# MAGIC     run_start_time = run_start_date,
+# MAGIC     export_permissions = export_permissions,
+# MAGIC     export_deleted_runs = export_deleted_runs,
+# MAGIC     notebook_formats = notebook_formats,
+# MAGIC     use_threads = use_threads
+# MAGIC )
+
+# COMMAND ----------
+
+# DBTITLE 1,write log file
+filepath = "/mnt/public-blobs/dcoles/mlflow_export_log.txt"
+
+dbutils.fs.rm(filepath)
+dbutils.fs.put(filepath, captured.stdout)
+
+# COMMAND ----------
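The `%%capture captured` cell and the `write log file` cell in this hunk capture the export's stdout and then persist it to a file. Outside a notebook, roughly the same idea can be sketched with the standard library. The `run_export` function and the temporary log path below are hypothetical stand-ins for illustration and are not part of this PR.

```python
import contextlib
import io
import tempfile
from pathlib import Path


def run_export() -> None:
    """Hypothetical stand-in for export_all(); it only prints progress."""
    print("exporting experiments")
    print("exporting models")


def capture_to_file(func, log_path: Path) -> str:
    """Run func, capture everything it prints to stdout, write it to log_path."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    text = buffer.getvalue()
    log_path.write_text(text)  # replaces any earlier log at the same path
    return text


# Assumed location for the sketch; the notebook writes to a mounted path.
log_path = Path(tempfile.gettempdir()) / "mlflow_export_log_sketch.txt"
captured_text = capture_to_file(run_export, log_path)
```

This only captures text written to `sys.stdout`; anything the export sends to stderr or to a logging handler would need to be redirected separately.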


20 changes: 18 additions & 2 deletions databricks_notebooks/bulk/Export_Experiments.py
@@ -1,8 +1,8 @@
# Databricks notebook source
# MAGIC %md ## Export Experiments
-# MAGIC
+# MAGIC
# MAGIC Export multiple experiments and all their runs.
-# MAGIC
+# MAGIC
# MAGIC Widgets
# MAGIC * `1. Experiments` - comma delimited list of either experiment IDs or experiment names. `all` will export all experiments.
# MAGIC * `2. Output directory` - shared directory between source and destination workspaces.
@@ -53,6 +53,22 @@

# COMMAND ----------

+# DBTITLE 1,set up log file
+import os
+from datetime import datetime
+import pytz
+
+cst = pytz.timezone('US/Central')
+now = datetime.now(tz=cst)
+date = now.strftime("%Y-%m-%d-%H:%M:%S")
+
+logfile = f"export_experiments.{date}.log"
+os.environ["MLFLOW_EXPORT_IMPORT_LOG_OUTPUT_FILE"] = logfile
+
+print("Logging to", logfile)
+
+# COMMAND ----------
+
assert_widget(experiments, "1. Experiments")
assert_widget(output_dir, "2. Output directory")
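
The `set up log file` cell added in this hunk builds a timestamped filename and hands it to the library through the `MLFLOW_EXPORT_IMPORT_LOG_OUTPUT_FILE` environment variable. A minimal sketch of the same step using the standard-library `zoneinfo` module (Python 3.9+) instead of `pytz` follows. The `make_logfile_name` helper, the `America/Chicago` zone name, the UTC fallback, and the colon-free timestamp format are assumptions for illustration, not what the PR uses.

```python
import os
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def make_logfile_name(prefix: str, now: datetime) -> str:
    """Build '<prefix>.<timestamp>.log' with no colons in the timestamp."""
    return f"{prefix}.{now.strftime('%Y-%m-%d-%H%M%S')}.log"


try:
    tz = ZoneInfo("America/Chicago")  # the zone that 'US/Central' aliases
except ZoneInfoNotFoundError:
    tz = timezone.utc  # assumed fallback when no tz database is installed

logfile = make_logfile_name("export_experiments", datetime.now(tz=tz))
os.environ["MLFLOW_EXPORT_IMPORT_LOG_OUTPUT_FILE"] = logfile
print("Logging to", logfile)
```

Dropping the colons is a design choice in this sketch: some filesystems reject `:` in filenames, so a timestamp such as `2023-05-05-170900` may be more portable than `2023-05-05-17:09:00`.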

52 changes: 19 additions & 33 deletions databricks_notebooks/bulk/Export_Models.py
@@ -1,8 +1,8 @@
# Databricks notebook source
# MAGIC %md ## Export Models
-# MAGIC
+# MAGIC
# MAGIC Export specified models, their version runs and the experiments that the runs belong to.
-# MAGIC
+# MAGIC
# MAGIC Widgets
# MAGIC * `1. Models` - comma seperated registered model names to be exported. `all` will export all models.
# MAGIC * `2. Output directory` - shared directory between source and destination workspaces.
@@ -13,7 +13,7 @@
# MAGIC * `7. Export deleted runs`
# MAGIC * `8. Notebook formats`
# MAGIC * `9. Use threads`
-# MAGIC
+# MAGIC
# MAGIC See: https://github.com/mlflow/mlflow-export-import/blob/master/README_bulk.md#registered-models.

# COMMAND ----------
@@ -69,6 +69,22 @@

# COMMAND ----------

+# DBTITLE 1,set up log file
+import os
+from datetime import datetime
+import pytz
+
+cst = pytz.timezone('US/Central')
+now = datetime.now(tz=cst)
+date = now.strftime("%Y-%m-%d-%H:%M:%S")
+
+logfile = f"export_models.{date}.log"
+os.environ["MLFLOW_EXPORT_IMPORT_LOG_OUTPUT_FILE"] = logfile
+
+print("Logging to", logfile)
+
+# COMMAND ----------
+
assert_widget(models, "1. Models")
assert_widget(output_dir, "2. Output directory")

@@ -91,33 +107,3 @@
notebook_formats = notebook_formats,
use_threads = use_threads
)

-# COMMAND ----------
-
-# MAGIC %md ### Display exported files
-
-# COMMAND ----------
-
-# MAGIC %sh
-# MAGIC echo $OUTPUT_DIR
-# MAGIC ls -l $OUTPUT_DIR
-
-# COMMAND ----------
-
-# MAGIC %sh cat $OUTPUT_DIR/manifest.json
-
-# COMMAND ----------
-
-# MAGIC %sh ls -l $OUTPUT_DIR/models
-
-# COMMAND ----------
-
-# MAGIC %sh cat $OUTPUT_DIR/models/models.json
-
-# COMMAND ----------
-
-# MAGIC %sh ls -l $OUTPUT_DIR/experiments
-
-# COMMAND ----------
-
-# MAGIC %sh cat $OUTPUT_DIR/experiments/experiments.json
26 changes: 24 additions & 2 deletions databricks_notebooks/bulk/Import_Experiments.py
@@ -1,12 +1,12 @@
# Databricks notebook source
# MAGIC %md ## Import Experiments
-# MAGIC
+# MAGIC
# MAGIC Widgets
# MAGIC * `1. Input directory` - directory of exported experiments.
# MAGIC * `2. Import source tags`
# MAGIC * `3. Experiment rename file` - Experiment rename file.
# MAGIC * `4. Use threads` - use multi-threaded import.
-# MAGIC
+# MAGIC
# MAGIC See https://github.com/mlflow/mlflow-export-import/blob/master/README_bulk.md#Import-experiments.

# COMMAND ----------
@@ -15,6 +15,22 @@

# COMMAND ----------

+# DBTITLE 1,set up log file
+import os
+from datetime import datetime
+import pytz
+
+cst = pytz.timezone('US/Central')
+now = datetime.now(tz=cst)
+date = now.strftime("%Y-%m-%d-%H:%M:%S")
+
+logfile = f"import_experiments.{date}.log"
+os.environ["MLFLOW_EXPORT_IMPORT_LOG_OUTPUT_FILE"] = logfile
+
+print("Logging to", logfile)
+
+# COMMAND ----------
+
dbutils.widgets.text("1. Input directory", "")
input_dir = dbutils.widgets.get("1. Input directory")
input_dir = input_dir.replace("dbfs:","/dbfs")
@@ -40,6 +56,8 @@

# COMMAND ----------

+#%%capture captured
+
from mlflow_export_import.bulk.import_experiments import import_experiments

import_experiments(
@@ -48,3 +66,7 @@
experiment_renames = experiment_rename_file,
use_threads = use_threads
)

+# COMMAND ----------
+
+
38 changes: 28 additions & 10 deletions databricks_notebooks/bulk/Import_Models.py
@@ -1,14 +1,14 @@
# Databricks notebook source
# MAGIC %md ## Import Models
-# MAGIC
+# MAGIC
# MAGIC Widgets
# MAGIC * `1. Input directory` - directory of exported models.
# MAGIC * `2. Delete model` - delete the current contents of model
# MAGIC * `3. Model rename file` - Model rename file.
# MAGIC * `4. Experiment rename file` - Experiment rename file.
# MAGIC * `5. Import source tags`
# MAGIC * `6. Use threads` - use multi-threaded import
-# MAGIC
+# MAGIC
# MAGIC See https://github.com/mlflow/mlflow-export-import/blob/master/README_bulk.md#Import-registered-models

# COMMAND ----------
@@ -47,17 +47,35 @@

# COMMAND ----------

+# DBTITLE 1,set up log file
+import os
+from datetime import datetime
+import pytz
+
+cst = pytz.timezone('US/Central')
+now = datetime.now(tz=cst)
+date = now.strftime("%Y-%m-%d-%H:%M:%S")
+
+logfile = f"import_models.{date}.log"
+os.environ["MLFLOW_EXPORT_IMPORT_LOG_OUTPUT_FILE"] = logfile
+
+print("Logging to", logfile)
+
+# COMMAND ----------
+
assert_widget(input_dir, "1. Input directory")

# COMMAND ----------

-from mlflow_export_import.bulk.import_models import import_all
+from mlflow_export_import.bulk.import_models import import_models

-import_all(
-    input_dir = input_dir,
-    delete_model = delete_model,
-    import_source_tags = import_source_tags,
-    experiment_renames = experiment_rename_file,
-    model_renames = model_rename_file,
-    use_threads = use_threads
+import_models(
+    input_dir = input_dir,
+    delete_model = delete_model,
+    use_src_user_id = True,
+    verbose=True,
+    import_source_tags = import_source_tags,
+    experiment_renames = experiment_rename_file,
+    model_renames = model_rename_file,
+    use_threads = use_threads
)
2 changes: 2 additions & 0 deletions databricks_notebooks/bulk/export_common.py
@@ -0,0 +1,2 @@
+# Databricks notebook source
+# MAGIC %run ./Common
14 changes: 14 additions & 0 deletions databricks_notebooks/bulk/import_common.py
@@ -0,0 +1,14 @@
+# Databricks notebook source
+# DBTITLE 1,install mlflow-export-import from local
+# MAGIC %pip install ../../../mlflow-export-import --use-feature=in-tree-build
+
+# COMMAND ----------
+
+def assert_widget(value, name):
+    if len(value.rstrip())==0:
+        raise Exception(f"ERROR: '{name}' widget is required")
+
+# COMMAND ----------
+
+import mlflow
+mlflow_client = mlflow.client.MlflowClient()
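
The `assert_widget` helper in this file guards the notebooks against empty required widgets. A small sketch of how it behaves, with the function body copied from the diff and the widget values invented for illustration (in the notebooks they come from `dbutils.widgets.get`):

```python
def assert_widget(value: str, name: str) -> None:
    """Raise if a required widget value is empty or only whitespace."""
    if len(value.rstrip()) == 0:
        raise Exception(f"ERROR: '{name}' widget is required")


# Hypothetical widget values standing in for dbutils.widgets.get(...).
assert_widget("/dbfs/mnt/exports", "1. Input directory")  # passes silently

try:
    assert_widget("   ", "1. Input directory")  # whitespace counts as empty
except Exception as err:
    message = str(err)
```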
2 changes: 1 addition & 1 deletion mlflow_export_import/bulk/export_experiments.py
@@ -99,7 +99,7 @@ def export_experiments(
     export_results = []
     futures = []
     notebook_formats = utils.string_to_list(notebook_formats)
-    with ThreadPoolExecutor(max_workers=max_workers) as executor:
+    with ThreadPoolExecutor() as executor:
         for exp_id_or_name in experiments:
             run_ids = experiments_dct.get(exp_id_or_name, None)
             future = executor.submit(_export_experiment,
2 changes: 1 addition & 1 deletion mlflow_export_import/bulk/export_models.py
@@ -104,7 +104,7 @@ def _export_models(

notebook_formats = utils.string_to_list(notebook_formats),
futures = []
-with ThreadPoolExecutor(max_workers=max_workers) as executor:
+with ThreadPoolExecutor() as executor:
for model_name in model_names:
dir = os.path.join(output_dir, model_name)
future = executor.submit(export_model,
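
Both bulk export modules now construct `ThreadPoolExecutor()` without `max_workers`. A minimal sketch of that submit-and-collect pattern is shown here; `export_one` and the model names are hypothetical stand-ins for `export_model` / `_export_experiment` and real registry entries.

```python
from concurrent.futures import ThreadPoolExecutor


def export_one(name: str) -> str:
    """Hypothetical stand-in for export_model / _export_experiment."""
    return f"exported {name}"


names = ["model_a", "model_b", "model_c"]  # hypothetical model names

# No max_workers argument: the executor picks its own default pool size.
with ThreadPoolExecutor() as executor:
    futures = [executor.submit(export_one, name) for name in names]
    # Read results in submission order; result() blocks until each is done.
    results = [future.result() for future in futures]
```

With `max_workers` omitted, Python derives the pool size from the machine's CPU count, so the degree of concurrency is no longer something the caller configures.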
5 changes: 5 additions & 0 deletions mycode/.gitignore
@@ -0,0 +1,5 @@
+export_*
+exported_*
+import_*
+imported_*
+scratch*