
Automate the training process#27

Open
duchoang wants to merge 25 commits into develop from feature/add-slacking-model-for-training


@duchoang
Contributor

Note: PR #37 should be merged before this one.

  • In the Python training program, split the data before training so that X% of the examples of each label are used for training
  • Generate the dataset statistics and the accuracy for each exercise
  • Given a set of models with different shapes (defined in YAML format), run the training process on each of them sequentially and report the results
  • Ready for review
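The dataset statistics and per-exercise accuracy could be computed along these lines. This is a minimal sketch, not the PR's code: it assumes true labels and predictions are plain Python lists of the same length, and the names label_statistics and per_label_accuracy are hypothetical.

```python
from collections import Counter


def label_statistics(labels):
    """Count how many examples carry each label."""
    return dict(Counter(labels))


def per_label_accuracy(true_labels, predicted_labels):
    """Fraction of the examples of each true label that were predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have the same length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / count for label, count in totals.items()}


if __name__ == "__main__":
    y_true = ["bicep", "bicep", "tricep", "lateral", "tricep"]
    y_pred = ["bicep", "tricep", "tricep", "lateral", "tricep"]
    print(label_statistics(y_true))
    print(per_label_accuracy(y_true, y_pred))
```

Reporting accuracy per label rather than a single overall number keeps a frequent label from hiding poor results on a rare one.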

To run the training with the models defined in YAML format in models/exercise/exercise-cnn.yaml, models/exercise/exercise-mlp-1.yaml and models/exercise/exercise-mlp-2.yaml:

./run_training.sh -d dataset -o output -m basic -y models/exercise

The script runs the training process sequentially for each model and writes the results to:

1. output/exercise-cnn/*
2. output/exercise-mlp-1/*
3. output/exercise-mlp-2/*
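A minimal Python sketch of that sequential loop, assuming one YAML file per model and a hypothetical train_fn(model_file, out_dir) callback standing in for the actual training code (parsing the YAML is left to that callback). run_training.sh itself may be implemented differently.

```python
import tempfile
from pathlib import Path


def run_all_models(model_dir, output_dir, train_fn):
    """Train each model defined in model_dir/*.yaml, one after another.

    models/exercise/<name>.yaml gets its own result folder output/<name>/.
    train_fn(model_file, out_dir) is a hypothetical callback that does the real training.
    Returns a dict mapping each model name to whatever train_fn returned.
    """
    results = {}
    for model_file in sorted(Path(model_dir).glob("*.yaml")):
        out_dir = Path(output_dir) / model_file.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        results[model_file.stem] = train_fn(model_file, out_dir)
    return results


def dummy_train(model_file, out_dir):
    # Placeholder for the real training step: just report where results would go.
    return out_dir.name


if __name__ == "__main__":
    # Demo with empty model definitions in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "models" / "exercise"
        model_dir.mkdir(parents=True)
        for name in ("exercise-cnn", "exercise-mlp-1", "exercise-mlp-2"):
            (model_dir / f"{name}.yaml").write_text("# model definition\n")
        print(run_all_models(model_dir, Path(tmp) / "output", dummy_train))
```

Sorting the file list makes the training order deterministic, so repeated runs produce comparable logs.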

@duchoang duchoang changed the title Feature/add slacking model for training Add slacking model for training Nov 11, 2015
Contributor


Please use a logger instead.
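For example, with Python's standard logging module instead of print. The function report_split and its message are hypothetical, only there to show the pattern:

```python
import logging

# Module-level logger, following the usual logging.getLogger(__name__) convention.
logger = logging.getLogger(__name__)


def report_split(label, n_train, n_test):
    # Emit a log record instead of printing; handlers decide where it ends up.
    logger.info("label %s: %d training / %d test examples", label, n_train, n_test)


if __name__ == "__main__":
    # Configure output once, at the entry point of the script.
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    report_split("bicep", 80, 20)
```

Passing the arguments separately (rather than pre-formatting the string) lets logging skip the formatting when the level is disabled.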

@tmbo
Contributor

tmbo commented Nov 16, 2015

What are you trying to do with the label mapping? There is already a label mapping implementation in the AccelerationDataset. You can use it with something like this:

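# Raw labels that all denote an exercise (some contain trailing spaces).
exercises = ["biceps curls (left)", "biceps-curl", "bice", "arms/biceps-curl", "BC ", "bicep curls", "bicep",
             "arms/lateral-raise", "lateral raises", "lateral", "LR", "LR ",
             "triceps-extension ", "tc ", "TE", "tc"]

# Raw labels treated as slacking, including the empty label.
slacking = ["", "walking"]

def label_mapper(label):
    # Map a raw CSV label to a model class: "-" for slacking, "E" for any exercise.
    if label in slacking:
        return "-"
    elif label in exercises:
        return "E"
    else:
        # Unrecognised labels map to None (presumably these examples are skipped).
        return None

# CSVAccelerationDataset is the project's dataset class; its import is not shown here.
dataset = CSVAccelerationDataset('datasets/Exercise', label_mapper=label_mapper)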

@duchoang duchoang force-pushed the feature/add-slacking-model-for-training branch from 2bf420f to 858301a November 16, 2015 16:38
Contributor Author


@tomstockton, actually the load_data() here is quite similar to AccelerationDataset.load_examples():

  • load_examples returns a single large ExampleColl object that contains all the data.
  • Here, instead, I want to generate a set of ExampleColl objects, each containing the examples of a single label. In this case there are 4 of them (bicep, tricep, lateral, non-exercise).
  • In the next step, I reuse ExampleColl.split() to divide the data of each label.
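The per-label grouping and split can be sketched in plain Python. This is a stand-in, not the project's code: examples are assumed to be (features, label) tuples instead of ExampleColl objects, and group_by_label and split_per_label are hypothetical helpers, not ExampleColl.split().

```python
import random
from collections import defaultdict


def group_by_label(examples):
    """Group (features, label) pairs into one list per label."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[label].append((features, label))
    return dict(groups)


def split_per_label(examples, train_ratio=0.8, seed=0):
    """Put train_ratio of every label into the training set and the rest into the test set."""
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for group in group_by_label(examples).values():
        rng.shuffle(group)  # each group is a fresh list, so shuffling in place is safe
        cut = int(round(len(group) * train_ratio))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```

Splitting each label separately keeps the label proportions of the training and test sets close to those of the full dataset, which a single global split does not guarantee for rare labels.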

In addition, the dataset is generated differently depending on whether we train the slacking model or the exercise model:

  1. For the slacking model, map all the different exercises to a single "exercise" label.
  2. For the exercise model, remove the ExampleColl with the label non-exercise.
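A sketch of those two transformations, assuming the per-label groups are held in a dict keyed by label (a hypothetical stand-in for the set of ExampleColl objects) and using the four label names from the comment above:

```python
EXERCISE_LABELS = {"bicep", "tricep", "lateral"}
NON_EXERCISE = "non-exercise"


def prepare_groups(groups, model_type):
    """Adapt per-label groups to the model being trained.

    groups: dict mapping a label to its list of examples.
    model_type: "slacking" or "exercise".
    """
    if model_type == "slacking":
        # Merge every exercise label into one "exercise" class; keep other labels as-is.
        merged = {"exercise": []}
        for label, examples in groups.items():
            if label in EXERCISE_LABELS:
                merged["exercise"].extend(examples)
            else:
                merged[label] = list(examples)
        return merged
    if model_type == "exercise":
        # Drop the non-exercise group entirely.
        return {label: examples for label, examples in groups.items() if label != NON_EXERCISE}
    raise ValueError(f"unknown model type: {model_type!r}")
```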

Contributor


Alright, let me rephrase: Why do you want to clone the dataset with your labels instead of doing the mapping on the fly?
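To illustrate what mapping on the fly means, here is a minimal generator that relabels rows while they are read, so no relabeled copy of the dataset is written. It assumes rows are (features, raw_label) pairs and that a mapper result of None means the row is skipped; both are assumptions, not confirmed behaviour of AccelerationDataset.

```python
def map_on_the_fly(rows, label_mapper):
    """Yield (features, mapped_label) pairs lazily, skipping labels mapped to None.

    The raw dataset stays untouched, so each model can use its own label_mapper.
    """
    for features, raw_label in rows:
        mapped = label_mapper(raw_label)
        if mapped is not None:
            yield features, mapped
```

With this approach the slacking and exercise models differ only in the mapper they pass in, rather than in which copy of the data they read.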

Contributor Author


OK, I get your point. At first I intended to separate two tasks:

1/ Map the labels and divide the dataset:

./prepare_dataset.sh -d dataset -o output/ -r 80 -s

2/ Run the training on the output folder:

./run_training.sh -d output/train -t output/test/ -m -slacking

Then I want to take some of the training inputs (the files in output/train/*.csv), put them into the playground and look at the classification results.

So do you think we should merge the two steps? (Maybe I will only use this script locally for experiments.)

Contributor


In the future we will probably not split the dataset anymore. Instead, we will curate a test & validation dataset labeled as accurately as possible (even the slacking data, using different labels for different activities) so that we can debug where the model has weaknesses.
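One way such a curated dataset could expose weaknesses is an error rate per fine-grained activity label. A minimal sketch, assuming a hypothetical to_class function that maps each activity to the class the model predicts (e.g. "-" or "E"); the "sitting" activity is also just an illustrative label.

```python
from collections import Counter


def error_rate_by_activity(activities, predicted, to_class):
    """Fraction of the examples of each fine-grained activity that the model misclassified.

    activities: true activity labels, finer than the model's classes (e.g. "walking", "bicep").
    predicted: the model's output class for each example (e.g. "-" or "E").
    to_class: maps an activity to the class the model should have predicted.
    """
    totals = Counter(activities)
    errors = Counter(a for a, p in zip(activities, predicted) if to_class(a) != p)
    return {activity: errors[activity] / count for activity, count in totals.items()}


def example_to_class(activity):
    # Hypothetical mapping: only biceps curls count as exercise here.
    return "E" if activity == "bicep" else "-"


if __name__ == "__main__":
    print(error_rate_by_activity(
        ["walking", "walking", "sitting", "bicep"],
        ["E", "-", "E", "E"],
        example_to_class,
    ))
```

Even though the slacking model only outputs "-" or "E", keeping the finer activity labels in the test set shows which kind of slacking it confuses with exercise.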

@duchoang duchoang changed the title Add slacking model for training Automate the training process Dec 10, 2015
@tmbo
Contributor

tmbo commented Dec 10, 2015

This PR covers only the very basic use case of training a model with a predefined shape on more data. What we actually need is a way to train multiple models and evaluate them. I still don't think this PR gets us where we want to go, for the following reasons:

  • The automation trains only a single model. What we need is a way to define, let's say, 10 models, hit the start button, and evaluate them (manually) once they are all done.
  • Model parameters are scattered across code & files
  • No support for non-MLP models, e.g. convolutional networks


2 participants