Nastaran's assignment: investigate data imbalance #29

Nastaran would like us to explore approaches to working with imbalanced datasets, such as anomaly detection. She mentioned boosting methods (XGBoost, AdaBoost, LightGBM) and ensemble models.
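As a starting point for the boosting idea, here is a minimal sketch using scikit-learn's `AdaBoostClassifier` with class-balanced sample weights. The synthetic dataset, the 95/5 class split, and the choice of average precision as the metric are all assumptions for illustration; none of them come from our actual data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in for our data (assumption): roughly 5% positives.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# "balanced" weights give each class the same total weight, so the
# minority class is not drowned out during boosting.
weights = compute_sample_weight(class_weight="balanced", y=y_train)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train, sample_weight=weights)

# Accuracy is misleading on imbalanced data; average precision
# (area under the precision-recall curve) focuses on the minority class.
scores = clf.predict_proba(X_test)[:, 1]
ap = average_precision_score(y_test, scores)
print(f"average precision: {ap:.3f}")
```

The same reweighting idea should carry over to XGBoost and LightGBM, which have their own options for class imbalance, but those are not shown here.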

There was an idea to use `OneClassSVM` from `sklearn.svm` to learn the distribution of the features in the majority class, but it does not really apply to our data as-is because our features are mostly categorical. Tom also suggested plotting the histogram of labels per feature and rearranging it so that the histogram starts to look like a normal distribution. We could then try the one-class SVM approach on the transformed data.
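Here is a rough sketch of one possible reading of Tom's idea. For each categorical feature, it reorders the categories so that the most frequent one sits in the middle and the rarer ones fall off to either side, encodes each category as its position in that order, and fits `OneClassSVM` on the majority class. The helper names `bell_order` and `encode`, the toy columns, and the `nu=0.05` setting are hypothetical, and the ordering it imposes on the categories is artificial.

```python
import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM


def bell_order(values: pd.Series) -> dict:
    """Map categories to integer positions so counts peak in the middle."""
    counts = values.value_counts()  # sorted from most to least frequent
    left, right = [], []
    for i, cat in enumerate(counts.index):
        # Alternate sides so frequency decays away from the centre.
        (right if i % 2 == 0 else left).append(cat)
    ordered = left[::-1] + right
    return {cat: pos for pos, cat in enumerate(ordered)}


# Toy majority-class data (assumption): two categorical features.
rng = np.random.default_rng(0)
majority = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=300, p=[0.6, 0.3, 0.1]),
    "size": rng.choice(["S", "M", "L", "XL"], size=300, p=[0.1, 0.5, 0.3, 0.1]),
})

# Learn one ordering per feature from the majority class only.
mappings = {col: bell_order(majority[col]) for col in majority.columns}


def encode(df: pd.DataFrame) -> np.ndarray:
    """Encode each column by its bell-ordered position; unseen categories -> -1."""
    cols = [
        df[col].map(mappings[col]).fillna(-1).to_numpy(dtype=float)
        for col in df.columns
    ]
    return np.column_stack(cols)


# Fit on the majority class; predict() returns +1 (inlier) or -1 (outlier).
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(encode(majority))
pred = model.predict(encode(majority))
print("fraction flagged as outliers:", np.mean(pred == -1))
```

Whether this transformation actually helps is an open question: the integer positions create distances between categories that may not mean anything, so results should be compared against a plain one-hot baseline.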
