Conversation
mlp/training/examples.py
Outdated
What are you trying to do with the label mapping? There is already a label mapping implementation here:

```python
exercises = ["biceps curls (left)", "biceps-curl", "bice", "arms/biceps-curl", "BC ", "bicep curls", "bicep",
             "arms/lateral-raise", "lateral raises", "lateral", "LR", "LR ",
             "triceps-extension ", "tc ", "TE", "tc"]
slacking = ["", "walking"]

def label_mapper(label):
    if label in slacking:
        return "-"
    elif label in exercises:
        return "E"
    else:
        return None

dataset = CSVAccelerationDataset('datasets/Exercise', label_mapper=label_mapper)
```
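A quick check of how the mapper above behaves. This restates the mapper with an abridged exercise list so the snippet runs on its own; `CSVAccelerationDataset` itself is not needed for the mapping.

```python
# Restating the mapper so this snippet runs on its own (abridged label lists).
slacking = ["", "walking"]
exercises = ["biceps-curl", "bicep", "lateral raises", "LR", "TE"]

def label_mapper(label):
    if label in slacking:
        return "-"      # non-exercise / slacking
    elif label in exercises:
        return "E"      # any known exercise
    else:
        return None     # unknown labels are dropped

print(label_mapper("walking"))  # "-"
print(label_mapper("bicep"))    # "E"
print(label_mapper("running"))  # None
```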
2bf420f to 858301a
mlp/preprocess_data.py
Outdated
@tomstockton actually the `load_data()` here is essentially the same as the method `AccelerationDataset.load_examples()`.
- In `load_examples` it returns one big `ExampleColl` object which contains all the data.
- But here I want to generate a set of `ExampleColl` objects, each corresponding to a single label. So in this case the size will be 4 (bicep, tricep, lateral, non-exercise).
- Then in the next step, I reuse the method `ExampleColl.split()` to divide the data for that label.

In addition, depending on whether it is the slacking model or the exercise model, we generate the dataset differently:
- For the slacking model, convert all the different exercises to just "exercise"
- For the exercise model, remove the `ExampleColl` with the label non-exercise
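The per-label loading described above could be sketched roughly like this. Note this uses a minimal stand-in for `ExampleColl`; the real class and the `load_data`/`split` signatures in the project may differ.

```python
import random
from collections import defaultdict

class ExampleColl:
    """Minimal stand-in for the project's ExampleColl (assumed API)."""
    def __init__(self, examples):
        self.examples = examples

    def split(self, ratio):
        # Shuffle, then divide into two collections at the given ratio.
        shuffled = self.examples[:]
        random.shuffle(shuffled)
        cut = int(len(shuffled) * ratio)
        return ExampleColl(shuffled[:cut]), ExampleColl(shuffled[cut:])

def load_data(labeled_examples):
    """Group examples into one ExampleColl per label."""
    by_label = defaultdict(list)
    for label, example in labeled_examples:
        by_label[label].append(example)
    return {label: ExampleColl(examples) for label, examples in by_label.items()}

colls = load_data([("bicep", [0.1]), ("bicep", [0.2]), ("lateral", [0.3])])
train, test = colls["bicep"].split(0.5)
```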
Alright, let me rephrase: Why do you want to clone the dataset with your labels instead of doing the mapping on the fly?
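For context, mapping "on the fly" could be as simple as applying the mapper at read time instead of writing a relabeled copy of the dataset. This is a hypothetical sketch; `MappedDataset` is not a class from the project.

```python
class MappedDataset:
    """Applies a label_mapper lazily, dropping examples that map to None."""
    def __init__(self, examples, label_mapper):
        self._examples = examples   # list of (label, data) pairs
        self._map = label_mapper

    def __iter__(self):
        for label, data in self._examples:
            mapped = self._map(label)
            if mapped is not None:  # None means "exclude this example"
                yield mapped, data

raw = [("bicep", 1), ("walking", 2), ("unknown", 3)]
mapped = list(MappedDataset(raw, lambda l: {"bicep": "E", "walking": "-"}.get(l)))
```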
Ok, I got your point. At first I intended to separate 2 tasks:

1/ Map labels, divide the dataset:
`./prepare_dataset.sh -d dataset -o output/ -r 80 -s`
2/ Run training on the output folder:
`./run_training.sh -d output/train -t output/test/ -m -slacking`

Then I want to take some trained input (in the folder output/train/*.csv), put it into the playground and see the classification result.

So do you think we should merge the 2 steps? (maybe I only use this script locally for experiments)
In the future we will probably not split the dataset anymore. Instead we will curate a test & validation dataset with the data labeled as well as possible (even the slacking data, using different labels for the different activities) so we can debug where the model has weaknesses.
* develop: Fix reset error of label mapping
* develop: Fixed feature ordering
* develop: Update README.md
* develop: removed csv converter
* develop: finished notebook cleanup
* develop: removed mlp __init__.py
* develop: cleaned up notebooks & packages
This PR only covers the very basic use case of training a model with a defined shape on more data. What we actually need is a way to train multiple models to evaluate them. Therefore, I still don't think this PR covers where we want to go for the following reasons:
Note: this PR should be merged first: #37
`X%` of each label is used for training.

To run the training with the models in yaml format `models/exercise/exercise-cnn.yaml`, `models/exercise/exercise-mlp-1.yaml`, `models/exercise/exercise-mlp-2.yaml`:

`./run_training.sh -d dataset -o output -m basic -y models/exercise`

The script will run the training process sequentially on each model and generate the result in:
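The sequential multi-model run could look roughly like this. It is only a sketch: the training call itself is elided, and `run_training` here is a hypothetical helper, not the project's actual script.

```python
import glob
import os
import tempfile

def run_training(model_dir, dataset_dir, output_dir):
    """Train each yaml-defined model in model_dir, one after another (sketch)."""
    result_dirs = []
    for model_path in sorted(glob.glob(os.path.join(model_dir, "*.yaml"))):
        name = os.path.splitext(os.path.basename(model_path))[0]
        # A real implementation would build the model from the yaml
        # definition here and train it on dataset_dir.
        result_dirs.append(os.path.join(output_dir, name))
    return result_dirs

# Demo with two empty model definitions in a temporary directory.
model_dir = tempfile.mkdtemp()
for name in ("exercise-cnn", "exercise-mlp-1"):
    with open(os.path.join(model_dir, name + ".yaml"), "w") as f:
        f.write("layers: []\n")

result_names = [os.path.basename(p) for p in run_training(model_dir, "dataset", "output")]
```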