
Commit c0c81ea

ML Insights - Update and Add new sample notebooks

1 parent 2f6f3b3

File tree

4 files changed: +949 −37 lines changed


ml-insights/sample_notebooks/11_Data_Correlation_Metrics.ipynb

+190-37
@@ -0,0 +1,259 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bb530e57",
   "metadata": {},
   "source": [
    "# Improved Insights Configuration Authoring Experience"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44c63619",
   "metadata": {
    "tags": []
   },
   "source": [
    "This notebook demonstrates features that ease the developer experience of authoring the Insights JSON configuration. It shows how to author the Insights configuration programmatically using the InsightsBuilder and InsightsConfigWriter APIs.\n",
    "\n",
    "The InsightsBuilder class is used to define and customise all of the core features, such as data schema, data ingestion, data transformation, metric calculation and post-processing of metric output.\n",
    "\n",
    "The InsightsConfigWriter class from the ML Insights library is used to build a config JSON file from an InsightsBuilder class instance.\n",
    "\n",
    "This notebook includes the following examples:\n",
    "\n",
    "- Generate the Insights configuration JSON from an InsightsBuilder instance\n",
    "- Detect an approximate input_schema from a sample dataset, then generate the Insights configuration JSON"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d807a7da",
   "metadata": {},
   "source": [
    "# Install ML Observability Insights Library SDK\n",
    "\n",
    "- Prerequisites\n",
    "    - Linux/Mac (Intel CPU)\n",
    "    - Python 3.8 and 3.9 only\n",
    "\n",
    "- Installation\n",
    "    - ML Insights is made available as a Python package (via Artifactory) that can be installed with pip as shown below. Depending on the execution engine, use the corresponding scoped package: for Dask, use oracle-ml-insights[dask]; for Spark, use oracle-ml-insights[spark]; for native execution, use oracle-ml-insights. To install all of the dependencies, use oracle-ml-insights[all].\n",
    "\n",
    "      !pip install oracle-ml-insights\n",
    "\n",
    "Refer: [Installation and Setup](https://docs.oracle.com/en-us/iaas/tools/ml-insights-docs/latest/ml-insights-documentation/html/user_guide/tutorials/install.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2d719f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python3 -m pip install oracle-ml-insights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e02be26b",
   "metadata": {},
   "source": [
    "# 1 ML Insights Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e59af580",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import json\n",
    "\n",
    "# Import Data Quality metrics\n",
    "from mlm_insights.core.metrics.mean import Mean\n",
    "from mlm_insights.core.metrics.standard_deviation import StandardDeviation\n",
    "\n",
    "# Import Data Integrity metrics\n",
    "from mlm_insights.core.metrics.rows_count import RowCount\n",
    "\n",
    "from mlm_insights.builder.builder_component import MetricDetail\n",
    "from mlm_insights.constants.types import FeatureType, DataType, VariableType\n",
    "from mlm_insights.core.metrics.metric_metadata import MetricMetadata\n",
    "from mlm_insights.core.post_processors.local_writer_post_processor import LocalWriterPostProcessor\n",
    "\n",
    "# import data reader\n",
    "from mlm_insights.core.data_sources import LocalDatePrefixDataSource\n",
    "from mlm_insights.mlm_native.readers import CSVNativeDataReader\n",
    "\n",
    "# import InsightsBuilder\n",
    "from mlm_insights.builder.insights_builder import InsightsBuilder\n",
    "\n",
    "# import InsightsConfigWriter\n",
    "from mlm_insights.config_writer.insights_config_writer import InsightsConfigWriter"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5159be09",
   "metadata": {},
   "source": [
    "## 2 Generate Insights Configuration JSON using InsightsConfigWriter\n",
    "\n",
    "The section below shows how the InsightsBuilder class is used to define and customise all of the core features, such as data schema, data ingestion, metric calculation and post-processing of metric output.\n",
    "\n",
    "The InsightsConfigWriter class from the ML Insights library is used to build a config file from an InsightsBuilder class instance via its to_json() method.\n",
    "\n",
    "The config can be saved to Object Storage using the save_config_to_object_storage() method of the InsightsConfigWriter class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7d05c7b",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "def get_input_schema():\n",
    "    return {\n",
    "        \"Pregnancies\": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),\n",
    "        \"BloodPressure\": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS)\n",
    "    }\n",
    "\n",
    "def get_metrics_input():\n",
    "    metrics = [\n",
    "        MetricMetadata(klass=Mean),\n",
    "        MetricMetadata(klass=StandardDeviation)\n",
    "    ]\n",
    "    uni_variate_metrics = {\n",
    "        \"BloodPressure\": metrics\n",
    "    }\n",
    "    metric_details = MetricDetail(univariate_metric=uni_variate_metrics,\n",
    "                                  dataset_metrics=[MetricMetadata(klass=RowCount)])\n",
    "    return metric_details\n",
    "\n",
    "def get_reader():\n",
    "    data = {\n",
    "        \"file_type\": \"csv\",\n",
    "        \"date_range\": {\"start\": \"2023-06-26\", \"end\": \"2023-06-27\"}\n",
    "    }\n",
    "    base_location = \"input_data/diabetes_prediction\"\n",
    "    ds = LocalDatePrefixDataSource(base_location, **data)\n",
    "    csv_reader = CSVNativeDataReader(data_source=ds)\n",
    "    return csv_reader\n",
    "\n",
    "def write_config(config_json, file_name):\n",
    "    \"\"\"\n",
    "    Writes the configuration dictionary to a JSON file.\n",
    "    \"\"\"\n",
    "    with open(file_name, \"w\") as f:\n",
    "        json.dump(config_json, f, indent=4)  # indent for readability\n",
    "    print(\"Configuration file created\")\n",
    "\n",
    "def main():\n",
    "    # Set up the Insights builder by passing: input schema, metrics, reader and post-processor details\n",
    "    runner = InsightsBuilder(). \\\n",
    "        with_input_schema(get_input_schema()). \\\n",
    "        with_metrics(metrics=get_metrics_input()). \\\n",
    "        with_reader(reader=get_reader()). \\\n",
    "        with_post_processors(post_processors=[LocalWriterPostProcessor(file_location=\"output_data/profiles\", file_name=\"classification_metrics_profile.bin\")])\n",
    "\n",
    "    # Build the configuration JSON from the builder instance\n",
    "    config_writer = InsightsConfigWriter(insights_builder=runner)\n",
    "    config_json_from_builder = config_writer.to_json()\n",
    "    return config_json_from_builder\n",
    "\n",
    "config_json = main()\n",
    "config_json_1 = json.loads(config_json)\n",
    "print(config_json_1)\n",
    "write_config(config_json_1, \"config_json_1.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "512ab83d",
   "metadata": {},
   "source": [
    "## 2.1 Generate Configuration with Automatic Approximate input_schema Detection\n",
    "\n",
    "The section above showed how to define the input schema feature by feature while defining the other components with InsightsBuilder. To ease the developer experience, the section below shows how to use automatic approximate input_schema detection from a sample dataset. This feature infers the data_type and variable_type of each feature and creates the input schema.\n",
    "\n",
    "Here we use the with_input_schema_using_dataset() method of the InsightsBuilder class, which takes the sample dataset and the column-type feature details and auto-generates an approximate input_schema, instead of requiring each feature schema to be defined by hand.\n",
    "\n",
    "Note: The auto-generated input_schema is an approximation and may not be 100% correct. Validate the input_schema and make any necessary changes.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5aea5b41",
   "metadata": {
    "pycharm": {
     "is_executing": true,
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def config_authoring_using_auto_generated_input_schema():\n",
    "    data_set_location = \"input_data/diabetes_prediction/2023-06-26/2023-06-26.csv\"\n",
    "    target_features = [\"Outcome\"]\n",
    "    prediction_features = [\"Prediction\"]\n",
    "    prediction_score_features = [\"Prediction_Score\"]\n",
    "    # Set up the Insights builder by passing: the dataset location (to generate the approximate input_schema), column-type feature names, metrics, reader and post-processor details\n",
    "    runner = InsightsBuilder(). \\\n",
    "        with_input_schema_using_dataset(data_set_location, target_features, prediction_features, prediction_score_features). \\\n",
    "        with_metrics(metrics=get_metrics_input()). \\\n",
    "        with_reader(reader=get_reader()). \\\n",
    "        with_post_processors(post_processors=[LocalWriterPostProcessor(file_location=\"output_data/profiles\", file_name=\"classification_metrics_profile.bin\")])\n",
    "\n",
    "    # Build the configuration JSON from the builder instance\n",
    "    config_writer = InsightsConfigWriter(insights_builder=runner)\n",
    "    config_json_from_builder = config_writer.to_json()\n",
    "    print(config_json_from_builder)\n",
    "    return config_json_from_builder\n",
    "\n",
    "config_json = config_authoring_using_auto_generated_input_schema()\n",
    "config_json_2 = json.loads(config_json)\n",
    "\n",
    "write_config(config_json_2, \"config_json_2.json\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
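The notebook's write_config() helper dumps the builder-generated configuration with json.dump and the notebook reloads InsightsConfigWriter output with json.loads. A minimal standalone sketch of that write/read round trip, using only the standard library and a stand-in config dict (the dict contents and the reuse of the notebook's config_json_2.json file name are illustrative assumptions, not real InsightsConfigWriter output):

```python
import json

# Stand-in for the dict parsed from InsightsConfigWriter.to_json();
# the keys here are illustrative, not the library's actual schema.
config = {
    "input_schema": {
        "BloodPressure": {"data_type": "FLOAT", "variable_type": "CONTINUOUS"}
    }
}

def write_config(config_json, file_name):
    # Same shape as the notebook's helper: dump the dict with indentation.
    with open(file_name, "w") as f:
        json.dump(config_json, f, indent=4)

write_config(config, "config_json_2.json")

# Reload and confirm the file parses back to the identical structure.
with open("config_json_2.json") as f:
    loaded = json.load(f)

print(loaded == config)  # → True
```

Because json.dump/json.load round-trip plain dicts losslessly, this check is a quick way to validate any written config file before uploading it to Object Storage.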

0 commit comments
