Nastaran's assignment: investigate data imbalance #29

@johannadevos

Nastaran would like us to explore approaches to working with imbalanced datasets, such as anomaly detection. She mentioned boosting methods (XGBoost, AdaBoost, LightGBM) and ensemble models.
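One common way to make a boosting model pay more attention to the minority class is to weight samples inversely to their class frequency. Below is a minimal sketch of that weighting, assuming the "balanced" heuristic `n_samples / (n_classes * class_count)`; the function name and the toy labels are made up for illustration and are not from our data. Whether this helps on our dataset is something we would still need to test.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 8 majority-class samples (0), 2 minority-class samples (1).
labels = [0] * 8 + [1] * 2

class_weights = balanced_class_weights(labels)

# One weight per sample; rare classes get larger weights.
sample_weights = [class_weights[y] for y in labels]

print(class_weights)
```

Per-sample weights like these can usually be handed to a boosting model at training time, for example through the `sample_weight` argument of `fit` in scikit-learn estimators.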

There was an idea to use `OneClassSVM` from sklearn to learn the distribution of the features in the majority class, but it doesn't really apply to our data because our features are mostly categorical. Tom also suggested plotting the histogram of labels per feature and rearranging it so that it starts to look like a normal distribution. We could then try the one-class SVM approach on the transformed data.
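To make the histogram idea a bit more concrete, here is a rough sketch of one possible reading of it: count how often each category of a feature occurs, reorder the categories so the most frequent one sits in the middle and the counts fall off toward both ends, and then encode each category by its position. This interpretation, the `bell_order` helper, and the toy values are all assumptions for illustration, not something we have agreed on.

```python
from collections import Counter


def bell_order(values):
    """Order categories so the most frequent is in the middle
    and frequencies decrease toward both ends."""
    ranked = [cat for cat, _ in Counter(values).most_common()]
    left, right = [], []
    for i, cat in enumerate(ranked):
        # Alternate sides, starting from the most frequent category.
        (right if i % 2 == 0 else left).append(cat)
    return left[::-1] + right


# Hypothetical categorical feature with four categories.
values = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]

order = bell_order(values)

# Map each category to its position in the bell-shaped ordering.
mapping = {cat: idx for idx, cat in enumerate(order)}
encoded = [mapping[v] for v in values]

counts = Counter(values)
print(order, [counts[c] for c in order])
```

The encoded column is numeric with a roughly bell-shaped histogram, so it could be fed to a model that expects continuous inputs. Note that the ordering is artificial: distances between the encoded values carry no real meaning, which may limit how useful this is in practice.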
