Commit 1ff3cf5

SaschaHeyer authored and k8s-ci-robot committed
added named entity recognition example (kubeflow#590)
* added named entity recognition example kubeflow/website#853
* added previous and next steps
* changed all absolute links to relative links
* changed headline for better understanding
* moved dataset description section to top
* fixed style
* added missing Jupyter notebook
* changed headline
* added link to documentation
* fixed meaning of images and components
* adapted documentation to https://www.kubeflow.org/docs/about/style-guide/#address-the-audience-directly
* added link to AI Platform models
* made it clear these are optional extensions
* changed summary and goals
* added Kubeflow version
* fixed s/an/a/ and checked the rest of the documentation
* added #!/bin/sh
* added environment variables for build scripts and adapted documentation
* changed PROJECT to PROJECT_ID
* added link to Kaggle dataset and removed the copy script that is no longer required (the dataset is publicly available in gs://); adapted the Jupyter notebook input data path
* added a hint to make clear that no further steps are required
* fixed s/Run/RUN/
* grammar fix
* optimized text
* added prev link to index
* removed model description due to lack of information
* added significance and congrats =)
* added example
* guided the user's attention to specific screens/metrics/graphs
* explanation of pieces
* updated main readme
* updated parts
* fixed typo
* adapted dataset path
* made scripts executable with chmod +x
* Update step-1-setup.md: swapped sections and added env variables to the gsutil command
* added information regarding public access
* fixed lint error
* fixed lint issues
* figured kubeflow examples use 2 spaces rather than 4 (due to TensorFlow standards)
* lint fixes
* reverted changes
* removed unused import
* removed object inherit
* added kwargs to ignored-argument-names (due to best practice in Google custom prediction routines)
* set pylintrc back to default and removed unused argument
1 parent 78a79e7 commit 1ff3cf5


41 files changed (+1458, −0 lines)

README.md

Lines changed: 11 additions & 0 deletions
@@ -11,6 +11,17 @@ This repository is home to the following types of examples and demos:

## End-to-end

### [Named Entity Recognition](./named_entity_recognition)
Author: [Sascha Heyer](https://github.com/saschaheyer)

This example covers the following concepts:
1. Build reusable pipeline components
1. Run Kubeflow Pipelines with Jupyter notebooks
1. Train a Named Entity Recognition model on a Kubernetes cluster
1. Deploy a Keras model to AI Platform
1. Use Kubeflow metrics
1. Use Kubeflow visualizations

### [GitHub issue summarization](./github_issue_summarization)
Author: [Hamel Husain](https://github.com/hamelsmu)

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# custom
custom_prediction_routine.egg-info
custom_prediction_routine*

named_entity_recognition/README.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# Named Entity Recognition with Kubeflow and Keras

In this walkthrough, you will learn how to use Kubeflow to build reusable components, train your model on a Kubernetes cluster, and deploy it to AI Platform.

## Goals

* Demonstrate how to build reusable pipeline components
* Demonstrate how to use Keras-only models
* Demonstrate how to train a Named Entity Recognition model on a Kubernetes cluster
* Demonstrate how to deploy a Keras model to AI Platform
* Demonstrate how to use a custom prediction routine
* Demonstrate how to use Kubeflow metrics
* Demonstrate how to use Kubeflow visualizations

## What is Named Entity Recognition

Named Entity Recognition is a word-level classification problem that extracts pieces of data, called entities, from text.

![solution](documentation/files/solution.png)

### Steps

1. [Setup Kubeflow and clone repository](documentation/step-1-setup.md)
1. [Build the pipeline components](documentation/step-2-build-components.md)
1. [Upload the dataset](documentation/step-3-upload-dataset.md)
1. [Custom prediction routine](documentation/step-4-custom-prediction-routine.md)
1. [Run the pipeline](documentation/step-5-run-pipeline.md)
1. [Monitor the training](documentation/step-6-monitor-training.md)
1. [Predict](documentation/step-7-predictions.md)
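As a concrete illustration of the word-level classification framing (not the Keras model this example trains), a toy tagger might map each token to an IOB-style entity label; the lookup table below is purely illustrative:

```python
# Toy illustration of NER as word-level classification: every token
# gets an entity label, and "O" means "not part of any entity".
# The real example trains a Keras model for this task; this lookup
# table is a hypothetical stand-in for demonstration only.
GAZETTEER = {
    "berlin": "B-LOC",
    "google": "B-ORG",
    "sascha": "B-PER",
}

def tag_tokens(tokens):
    """Return (token, label) pairs; unknown tokens are labeled 'O'."""
    return [(t, GAZETTEER.get(t.lower(), "O")) for t in tokens]

print(tag_tokens(["Google", "opened", "an", "office", "in", "Berlin"]))
# → [('Google', 'B-ORG'), ('opened', 'O'), ('an', 'O'),
#    ('office', 'O'), ('in', 'O'), ('Berlin', 'B-LOC')]
```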
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
#!/bin/sh

printf "\nBuild and push preprocess component\n"
./preprocess/build_image.sh

printf "\nBuild and push train component\n"
./train/build_image.sh

printf "\nBuild and push deploy component\n"
./deploy/build_image.sh
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
#!/bin/sh

BUCKET="your-bucket-name"

printf "\nCopy component specifications to Google Cloud Storage\n"
gsutil cp preprocess/component.yaml gs://${BUCKET}/components/preprocess/component.yaml
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/preprocess/component.yaml

gsutil cp train/component.yaml gs://${BUCKET}/components/train/component.yaml
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/train/component.yaml

gsutil cp deploy/component.yaml gs://${BUCKET}/components/deploy/component.yaml
gsutil acl ch -u AllUsers:R gs://${BUCKET}/components/deploy/component.yaml
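Because `gsutil acl ch -u AllUsers:R` makes each specification world-readable, a pipeline can later reference it by its public HTTPS URL. A minimal sketch of how that URL is derived (`component_url` is a hypothetical helper, not part of this commit, and the bucket name is a placeholder):

```python
def component_url(bucket, component):
    # Hypothetical helper: the public HTTPS URL of a GCS object that
    # `gsutil acl ch -u AllUsers:R` has made world-readable.
    return (f"https://storage.googleapis.com/{bucket}"
            f"/components/{component}/component.yaml")

print(component_url("your-bucket-name", "preprocess"))
# → https://storage.googleapis.com/your-bucket-name/components/preprocess/component.yaml
```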
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
FROM google/cloud-sdk:latest
ADD ./src /pipelines/component/src
RUN chmod 755 /pipelines/component/src/deploy.sh
ENTRYPOINT ["/pipelines/component/src/deploy.sh"]
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#!/bin/sh

image_name=gcr.io/$PROJECT_ID/kubeflow/ner/deploy
image_tag=latest

full_image_name=${image_name}:${image_tag}

cd "$(dirname "$0")"

docker build -t "${full_image_name}" .
docker push "$full_image_name"
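The image naming scheme used by the build script can be summarized as a small helper (hypothetical, for illustration only; the script itself composes the name from shell variables):

```python
def full_image_name(project_id, tag="latest"):
    # Mirrors build_image.sh: gcr.io/$PROJECT_ID/kubeflow/ner/deploy:$image_tag.
    # project_id is your GCP project; "latest" is the default tag.
    return f"gcr.io/{project_id}/kubeflow/ner/deploy:{tag}"

print(full_image_name("my-gcp-project"))
# → gcr.io/my-gcp-project/kubeflow/ner/deploy:latest
```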
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
name: deploy
description: Deploy the model with a custom prediction routine
inputs:
- name: Model path
  type: GCSPath
  description: 'Path of the GCS directory containing the exported TensorFlow model.'
- name: Model name
  type: String
  description: 'The name of the model (created if it does not already exist).'
- name: Model region
  type: String
  description: 'The region where the model will be deployed.'
- name: Model version
  type: String
  description: 'The version of the model.'
- name: Model runtime version
  type: String
  description: 'The runtime version of the model.'
- name: Model prediction class
  type: String
  description: 'The prediction class of the model.'
- name: Model python version
  type: String
  description: 'The Python version of the model.'
- name: Model package uris
  type: String
  description: 'The package URI of the model.'
outputs:
implementation:
  container:
    image: gcr.io/<PROJECT-ID>/kubeflow/ner/deploy:latest
    command: [
      sh, /pipelines/component/src/deploy.sh
    ]
    args: [
      --model-path, {inputValue: Model path},
      --model-name, {inputValue: Model name},
      --model-region, {inputValue: Model region},
      --model-version, {inputValue: Model version},
      --model-runtime-version, {inputValue: Model runtime version},
      --model-prediction-class, {inputValue: Model prediction class},
      --model-python-version, {inputValue: Model python version},
      --model-package-uris, {inputValue: Model package uris},
    ]
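When Kubeflow Pipelines runs this component, each `{inputValue: …}` placeholder in `args` is replaced with the concrete value wired into the component. A simplified sketch of that substitution (illustrative only, not the actual SDK implementation):

```python
def render_args(arg_template, inputs):
    """Replace {inputValue: name} placeholders with concrete values,
    as the Kubeflow Pipelines backend does for container components.
    Simplified sketch for illustration, not the real SDK code."""
    rendered = []
    for item in arg_template:
        if isinstance(item, dict) and "inputValue" in item:
            rendered.append(inputs[item["inputValue"]])
        else:
            rendered.append(item)
    return rendered

template = ["--model-name", {"inputValue": "Model name"},
            "--model-region", {"inputValue": "Model region"}]
print(render_args(template, {"Model name": "ner", "Model region": "us-central1"}))
# → ['--model-name', 'ner', '--model-region', 'us-central1']
```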
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# loop through all parameters
while [ "$1" != "" ]; do
  case $1 in
    "--model-path")
      shift
      MODEL_PATH="$1"
      shift
      ;;
    "--model-name")
      shift
      MODEL_NAME="$1"
      shift
      ;;
    "--model-region")
      shift
      MODEL_REGION="$1"
      shift
      ;;
    "--model-version")
      shift
      MODEL_VERSION="$1"
      shift
      ;;
    "--model-runtime-version")
      shift
      RUNTIME_VERSION="$1"
      shift
      ;;
    "--model-prediction-class")
      shift
      MODEL_PREDICTION_CLASS="$1"
      shift
      ;;
    "--model-python-version")
      shift
      MODEL_PYTHON_VERSION="$1"
      shift
      ;;
    "--model-package-uris")
      shift
      MODEL_PACKAGE_URIS="$1"
      shift
      ;;
    *)
      # skip unknown flags so the loop always makes progress
      shift
      ;;
  esac
done

# echo inputs
echo MODEL_PATH = "${MODEL_PATH}"
echo MODEL_NAME = "${MODEL_NAME}"
echo MODEL_REGION = "${MODEL_REGION}"
echo MODEL_VERSION = "${MODEL_VERSION}"
echo RUNTIME_VERSION = "${RUNTIME_VERSION}"
echo MODEL_PREDICTION_CLASS = "${MODEL_PREDICTION_CLASS}"
echo MODEL_PYTHON_VERSION = "${MODEL_PYTHON_VERSION}"
echo MODEL_PACKAGE_URIS = "${MODEL_PACKAGE_URIS}"

# create the model if it does not exist yet
modelname=$(gcloud ai-platform models list | grep -w "$MODEL_NAME")
echo "$modelname"
if [ -z "$modelname" ]; then
  echo "Creating model $MODEL_NAME in region $MODEL_REGION"

  gcloud ai-platform models create ${MODEL_NAME} \
    --regions ${MODEL_REGION}
else
  echo "Model $MODEL_NAME already exists"
fi

# create version with custom prediction routine (beta)
echo "Creating version $MODEL_VERSION from $MODEL_PATH"
gcloud beta ai-platform versions create ${MODEL_VERSION} \
  --model ${MODEL_NAME} \
  --origin ${MODEL_PATH} \
  --python-version ${MODEL_PYTHON_VERSION} \
  --runtime-version ${RUNTIME_VERSION} \
  --package-uris ${MODEL_PACKAGE_URIS} \
  --prediction-class ${MODEL_PREDICTION_CLASS}
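The `while`/`shift` loop in the script above consumes `--flag value` pairs one at a time, ignoring anything it does not recognize. The same logic expressed as a small Python sketch (illustrative only; the component itself runs the shell script):

```python
def parse_flags(argv):
    """Consume `--flag value` pairs into a dict, skipping unknown
    flags, mirroring deploy.sh's while/shift argument loop."""
    known = {"--model-path", "--model-name", "--model-region",
             "--model-version", "--model-runtime-version",
             "--model-prediction-class", "--model-python-version",
             "--model-package-uris"}
    flags, i = {}, 0
    while i < len(argv):
        if argv[i] in known and i + 1 < len(argv):
            flags[argv[i]] = argv[i + 1]  # flag followed by its value
            i += 2
        else:
            i += 1  # skip anything unrecognized
    return flags

print(parse_flags(["--model-name", "ner", "--model-region", "us-central1"]))
# → {'--model-name': 'ner', '--model-region': 'us-central1'}
```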
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
ARG BASE_IMAGE_TAG=1.12.0-py3
FROM tensorflow/tensorflow:$BASE_IMAGE_TAG
RUN python3 -m pip install keras
COPY ./src /pipelines/component/src
