Automate the training process by duchoang · Pull Request #27 · muvr/muvr-analytics

duchoang · 2015-11-11T09:29:19Z

Note: this PR should be merged first: #37

In the Python training program, split the data per label before training: X% of the examples of each label are used for training.
Generate dataset statistics and the accuracy for each exercise.
Given a set of models with different shapes (defined in YAML format), run the training process sequentially for each model and report the results.
Ready for review
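The first two points above can be sketched roughly as follows. This is a minimal illustration in plain Python 3, not the code in this PR: the function names `split_per_label` and `accuracy_per_label`, the `(features, label)` tuple layout, and the default 80% ratio are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_per_label(examples, train_ratio=0.8, seed=42):
    """Split (features, label) pairs so that every label contributes
    roughly train_ratio of its own examples to the training set.

    Hypothetical helper; the PR itself relies on ExampleColl.split().
    """
    by_label = defaultdict(list)
    for features, label in examples:
        by_label[label].append((features, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = int(round(len(items) * train_ratio))
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


def accuracy_per_label(true_labels, predicted_labels):
    """Return {label: share of examples with that true label predicted correctly}."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}
```

Splitting inside each label group, rather than over the whole dataset at once, keeps rare exercises represented in both the training and the test set.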

To run the training with the models defined in YAML format (models/exercise/exercise-cnn.yaml, models/exercise/exercise-mlp-1.yaml, models/exercise/exercise-mlp-2.yaml):

./run_training.sh -d dataset -o output -m basic -y models/exercise

The script runs the training process sequentially on each model and writes the results to:

1. output/exercise-cnn/*
2. output/exercise-mlp-1/*
3. output/exercise-mlp-2/*
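The mapping from model files to output directories could look something like the sketch below. It only illustrates the directory convention described above (one output folder per YAML file, named after the file); `plan_training_runs`, `run_all` and the `train_fn` callback are hypothetical names, and the actual training call in run_training.sh is not shown here.

```python
from pathlib import Path


def plan_training_runs(model_dir, output_dir):
    """Pair every *.yaml model file in model_dir with its own output directory."""
    runs = []
    for model_file in sorted(Path(model_dir).glob("*.yaml")):
        # e.g. models/exercise/exercise-cnn.yaml -> output/exercise-cnn
        runs.append((model_file, Path(output_dir) / model_file.stem))
    return runs


def run_all(model_dir, output_dir, train_fn):
    """Train the models one after another.

    train_fn(model_file, target_dir) is a hypothetical callback standing in
    for the real training routine; its return value is collected per model.
    """
    results = {}
    for model_file, target in plan_training_runs(model_dir, output_dir):
        target.mkdir(parents=True, exist_ok=True)
        results[model_file.stem] = train_fn(model_file, target)
    return results
```

Collecting the results in one dictionary keyed by model name makes it easier to compare the models once all runs have finished.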

tmbo · 2015-11-16T11:18:40Z

mlp/training/examples.py

Please use a logger instead.
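For illustration, a module-level logger from Python's standard `logging` module could replace direct output along these lines. The logger name, the function `report_dataset_size` and the numbers in the usage call are made up for this sketch; the flagged code in mlp/training/examples.py is not reproduced here.

```python
import logging

# Hypothetical logger name; a module would usually pass __name__ instead.
logger = logging.getLogger("training.examples")


def report_dataset_size(num_examples, num_labels):
    """Log a dataset summary instead of writing it straight to stdout."""
    # Lazy %-style arguments: the message is only formatted if INFO is enabled.
    logger.info("Loaded %d examples across %d labels", num_examples, num_labels)


if __name__ == "__main__":
    # Configure output once, at the entry point of the program.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    report_dataset_size(1200, 4)  # illustrative numbers only
```

With a logger, verbosity and output destination can be changed at the entry point without touching the training code.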

tmbo · 2015-11-16T16:18:22Z

What are you trying to do with the label mapping? There is already a label mapping implementation in the AccelerationDataset. You can use it with something like this:

exercises = ["biceps curls (left)", "biceps-curl", "bice", "arms/biceps-curl", "BC ", "bicep curls", "bicep",
             "arms/lateral-raise", "lateral raises", "lateral", "LR", "LR ",
             "triceps-extension ", "tc ", "TE", "tc"]

slacking = ["", "walking"]

def label_mapper(label):
    if label in slacking:
        return "-"
    elif label in exercises:
        return "E"
    else:
        return None

dataset = CSVAccelerationDataset('datasets/Exercise', label_mapper=label_mapper)

duchoang · 2015-11-16T17:29:57Z

mlp/preprocess_data.py

@tomstockton actually, load_data() here is almost the same as the method AccelerationDataset.load_examples().

load_examples() returns one big ExampleColl object that contains all the data.

Here, though, I want to generate a set of ExampleColl objects, each holding the examples of a single label. So in this case the set has size 4 (bicep, tricep, lateral, non-exercise).

In the next step, I reuse the method ExampleColl.split() to divide the data for each label.

In addition, the dataset is generated differently depending on whether we train the slacking model or the exercise model:

For the slacking model, convert all the different exercises to just "exercise".

For the exercise model, remove the ExampleColl with the label non-exercise.
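The grouping and the two model-specific variants described above might be sketched like this. Plain lists and dictionaries stand in for ExampleColl, and the names `group_by_label`, `prepare_for_model` and the `NON_EXERCISE` constant are assumptions for the example, not the code in mlp/preprocess_data.py.

```python
from collections import defaultdict

NON_EXERCISE = "non-exercise"  # hypothetical label name for slacking data


def group_by_label(examples):
    """Group (features, label) pairs into one list of features per label."""
    by_label = defaultdict(list)
    for features, label in examples:
        by_label[label].append(features)
    return dict(by_label)


def prepare_for_model(by_label, model_type):
    """Adapt the per-label groups to the model being trained."""
    if model_type == "slacking":
        # Collapse every concrete exercise into the single label "exercise".
        merged = defaultdict(list)
        for label, items in by_label.items():
            target = NON_EXERCISE if label == NON_EXERCISE else "exercise"
            merged[target].extend(items)
        return dict(merged)
    if model_type == "exercise":
        # Drop the non-exercise group and keep the individual exercises.
        return {label: items for label, items in by_label.items()
                if label != NON_EXERCISE}
    raise ValueError("unknown model type: %r" % (model_type,))
```

Each resulting group can then be split into training and test portions on its own.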

Alright, let me rephrase: Why do you want to clone the dataset with your labels instead of doing the mapping on the fly?

OK, I get your point. At first I intended to separate the 2 tasks:

1/ Map label, divide dataset:

./prepare_dataset.sh -d dataset -o output/ -r 80 -s

2/ Run training on folder output:

./run_training.sh -d output/train -t output/test/ -m -slacking

Then I want to take some of the training input (in the folder output/train/*.csv), put it into the playground and see the classification result.

So do you think we should merge the 2 steps? (Maybe I will only use this script locally for experiments.)

In the future we will probably not split the dataset anymore. Instead we will curate a test & validation dataset with the data labeled as well as possible (even the slacking data, using different labels for different activities) to be able to debug where the model has weaknesses.


tmbo · 2015-12-10T16:06:03Z

This PR only covers the very basic use case of training a model with a defined shape on more data. What we actually need is a way to train multiple models to evaluate them. Therefore, I still don't think this PR covers where we want to go for the following reasons:

The automation trains only a single model. What we need is a way to define, let's say, 10 models and hit the start button, and then evaluate them (manually) after they are all done.
Model parameters are scattered across code & files.
There is no support for non-MLP models, e.g. convolutions.

duchoang changed the title ~~Feature/add slacking model for training~~ Add slacking model for training Nov 11, 2015

tmbo reviewed Nov 16, 2015
Duc Hoang added 7 commits November 16, 2015 16:25

Fix misclassification error

2e2c92d

Add slacking model, modify run_training.sh script

73ecd62

Print statistic of dataset

422147b

Reuse scaling feature, add log in training script

e910d80

Add parameter analysis only (without training)

285ca59

Add preprocess data script (divide training dataset on each label)

d39d693

Add documentation for the training script

858301a

duchoang force-pushed the feature/add-slacking-model-for-training branch from 2bf420f to 858301a Compare November 16, 2015 16:38

Add some comment

80a02fb

duchoang reviewed Nov 16, 2015

Duc Hoang added 11 commits November 18, 2015 09:34

Ignore some exercise, use tanh activation function

f85fded

Merge branch 'develop' into feature/add-slacking-model-for-training

6b9db97

* develop: Fix reset error of label mapping

Experiment shape for small slacking dataset

d4abec5

Ignore some exercise

c5adc1b

Merge branch 'develop' into feature/add-slacking-model-for-training

d8d02f0

* develop: Fixed feature ordering

Merge branch 'develop' into feature/add-slacking-model-for-training

2a7a340

* develop: Update README.md removed csv converter finished notebook cleanup removed mlp init.py cleaned up notebooks & packages

Fix merging error

11d195d

Handle all type of exercises

7e34734

Remove the preprocess script

a625bae

Generate the accuracy of each label

585e994

Split the dataset on each label while training

26b6db2

duchoang changed the title ~~Add slacking model for training~~ Automate the training process Dec 10, 2015

tmbo added 2 commits December 14, 2015 22:16

Add save best callback

68007f3

Added supersampling & highpassfilter

42e07c4

duchoang wants to merge 25 commits into develop from feature/add-slacking-model-for-training


2 participants
