Commit 5af29ea

Merge pull request #56 from splunk/fix-3400660-keras-issues
Update all images to 5.1.2 version of DSDL including fix branch 3400660 with all issues
2 parents: 7e4cf60 + 58b4b49

File tree

78 files changed: +11474 / -4560 lines


README.md

Lines changed: 1 addition & 1 deletion

@@ -73,7 +73,7 @@ There are a number of scripts in this repo which can help in various tasks when
 | --- | --- | --- | --- |
 | `build.sh` | Build a container using a configuration tag found in `tag_mapping.csv` | `./build.sh minimal-cpu splunk/ 5.1.1` | |
 | `bulk_build.sh` | Build all containers in a tag list | `./bulk_build.sh tag_mapping.csv splunk/ 5.1.1` | |
-| `compile_image_python_requirements.sh` | Use a base image and simplified dockerfile to pre-compute the python dependency versions for all libraries listed in the tag's referenced requirements files | `./compile_image_python_requirements.sh minimal-cpu Dockerfile.5.1.1.debian.requirements` | If the Dockerfile for the tag is not specified, the script looks for the tag's Dockerfile plus the `.requirements` extension. If this does not exist, please create a requirements dockerfile or specify an appropriate requirements dockerfile. An example can be found in /dockerfiles/Dockerfile.5.1.1.debian.requirements |
+| `compile_image_python_requirements.sh` | Use a base image and simplified dockerfile to pre-compute the python dependency versions for all libraries listed in the tag's referenced requirements files | `./compile_image_python_requirements.sh minimal-cpu` | If the Dockerfile for the tag is not specified, the script looks for the tag's Dockerfile plus the `.requirements` extension. If this does not exist, please create a requirements dockerfile or specify an appropriate requirements dockerfile. An example can be found in /dockerfiles/Dockerfile.debian.requirements |
 | `bulk_compile.sh` | Attempt to pre-compile python dependency versions for all containers in a tag list | `./bulk_build.sh tag_mapping.csv` | Makes assumptions about dockerfile names as described above. |
 | `scan_container.sh` | Scan a built container for vulnerabilities and produce a report with Trivy | `./scan_container.sh minimal-cpu splunk/ 5.1.1` | Downloads the Trivy container to run the scan. |
 | `test_container.sh` | Run a set of simulated tests using Playwright on a built container. | `./test_container.sh minimal-cpu splunk/ 5.1.1` | Requires the setup of a python virtual environment that can run Playwright. Specific python versions and dependencies may be required at the system level. |
Lines changed: 189 additions & 0 deletions

@@ -0,0 +1,189 @@
+#!/usr/bin/env python
+# coding: utf-8
+
+
+# In[1]:
+
+
+# this definition exposes all python module imports that should be available in all subsequent commands
+import json
+import numpy as np
+import pandas as pd
+import os
+
+# for operationalization of the model we want to use a few other libraries later
+from sklearn.preprocessing import OneHotEncoder
+from sklearn.ensemble import IsolationForest
+
+# global constants
+MODEL_DIRECTORY = "/srv/app/model/data/"
+
+
+# In[3]:
+
+
+# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
+def stage(name):
+    with open("data/" + name + ".csv", 'r') as f:
+        df = pd.read_csv(f)
+    with open("data/" + name + ".json", 'r') as f:
+        param = json.load(f)
+    return df, param
+
+
+# In[12]:
+
+
+# initialize your model
+# available inputs: data and parameters
+# returns the model object which will be used as a reference to call fit, apply and summary subsequently
+def init(df, param):
+    model = {}
+    model['encoder'] = OneHotEncoder(handle_unknown='ignore')
+    model['detector'] = IsolationForest(contamination=0.01)
+    return model
+
+
+# In[26]:
+
+
+# train your model
+# returns a fit info json object and may modify the model object
+def fit(model, df, param):
+    features_to_encode = df[['ComputerName', 'EventCode']]
+    model['encoder'].fit(features_to_encode)
+    encoded_features = model['encoder'].transform(features_to_encode)
+    df_encoded_features = pd.concat([df[['count']], pd.DataFrame(encoded_features.toarray()).add_prefix('f_')], axis=1)
+    model['detector'].fit(df_encoded_features)
+    info = {"message": "model trained"}
+    return info
+
+
+# In[28]:
+
+
+# apply your model
+# returns the calculated results
+def apply(model, df, param):
+    features_to_encode = df[['ComputerName', 'EventCode']]
+    encoded_features = model['encoder'].transform(features_to_encode)
+    df_encoded_features = pd.concat([df[['count']], pd.DataFrame(encoded_features.toarray()).add_prefix('f_')], axis=1)
+    outliers = model['detector'].predict(df_encoded_features)
+    result = pd.DataFrame(outliers, columns=['outlier'])
+    return result
+
+
+# In[30]:
+
+
+# save model to name in expected convention "<algo_name>_<model_name>"
+def save(model, name):
+    # we skip saving and loading in this example, but of course you can build your preferred serialization here
+    #with open(MODEL_DIRECTORY + name + ".json", 'w') as file:
+    #    json.dump(model, file)
+    return model
+
+
+# In[31]:
+
+
+# load model from name in expected convention "<algo_name>_<model_name>"
+def load(name):
+    # we skip saving and loading in this example, but of course you can build your preferred deserialization here
+    model = {}
+    #with open(MODEL_DIRECTORY + name + ".json", 'r') as file:
+    #    model = json.load(file)
+    return model
+
+
+# In[32]:
+
+
+# return a model summary
+def summary(model=None):
+    returns = {"version": {"numpy": np.__version__, "pandas": pd.__version__}}
+    return returns
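The file above implements the standard DSDL/MLTK container model interface (init, fit, apply, save, load, summary). As a rough local smoke test of the same encode-then-detect pipeline, here is a sketch using synthetic data (the hosts, event codes, and counts below are made up, not from the commit) with the column names the file assumes:

```python
# Local smoke test of the encode-then-detect pipeline defined above
# (a sketch; all data values here are synthetic, not from the commit).
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import IsolationForest

# toy event counts with the column names the file assumes
df = pd.DataFrame({
    "ComputerName": ["host-a", "host-b", "host-a", "host-c"] * 25,
    "EventCode":    ["4624", "4625", "4688", "4624"] * 25,
    "count":        [3, 1, 2, 500] * 25,
})

# init(): one-hot encoder for the categorical fields, IsolationForest as detector
model = {"encoder": OneHotEncoder(handle_unknown="ignore"),
         "detector": IsolationForest(contamination=0.01, random_state=0)}

# fit(): encode the categoricals, stack with the numeric count, train the detector
features = df[["ComputerName", "EventCode"]]
encoded = model["encoder"].fit_transform(features)
X = pd.concat([df[["count"]],
               pd.DataFrame(encoded.toarray()).add_prefix("f_")], axis=1)
model["detector"].fit(X)

# apply(): predict() returns 1 for inliers and -1 for outliers
result = pd.DataFrame(model["detector"].predict(X), columns=["outlier"])
print(result["outlier"].value_counts())
```

Inside a DSDL container these stages are not called directly; MLTK drives them through the `| fit` and `| apply` SPL commands.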

app/model/anomaly_detection_ecod.py

Lines changed: 9 additions & 9 deletions

@@ -3,7 +3,7 @@
-# In[ ]:
+# In[1]:
 # this definition exposes all python module imports that should be available in all subsequent commands
@@ -29,7 +29,7 @@
-# In[ ]:
+# In[6]:
 # this cell is not executed from MLTK and should only be used for staging data into the notebook environment
@@ -51,7 +51,7 @@ def stage(name):
-# In[ ]:
+# In[10]:
 # initialize your model
@@ -62,7 +62,7 @@ def init(df,param):
 # parallelization options for ECOD:
 # ECOD(n_jobs=2)
 # most other PyOD models would work similarly, e.g. replace with Isolation Forest:
-#model = IForest()
+# model = IForest()
     return model
@@ -73,7 +73,7 @@ def init(df,param):
-# In[ ]:
+# In[12]:
 # train your model
@@ -95,7 +95,7 @@ def fit(model,df,param):
-# In[ ]:
+# In[14]:
 # apply your model
@@ -117,7 +117,7 @@ def apply(model,df,param):
-# In[ ]:
+# In[16]:
 # save model to name in expected convention "<algo_name>_<model_name>"
@@ -133,7 +133,7 @@ def save(model,name):
-# In[ ]:
+# In[17]:
 # load model from name in expected convention "<algo_name>_<model_name>"
@@ -148,7 +148,7 @@ def load(name):
-# In[ ]:
+# In[18]:
 # return a model summary
