data description is here
- Run preprocessing.py to construct, combine, and merge the features. Each code snippet is commented.
- Run model.py to evaluate feature importance.
- Features are the priority.
- It is always necessary to generate new features from the data at hand. The principles for constructing them come from either business insight or industry characteristics. For instance, the retail industry is all about people, shops, and products. The logic for building new features depends on the kind of object the model is meant to evaluate. If the model needs to predict sales, every record in the dataframe should be product-centric. If the model needs to predict re-purchase propensity, every record should be customer-centric.
- Addition is for feature preparation, while subtraction is for modeling. In other words, the first step is to discover as many valuable features as possible. The next step is to analyze feature importance and apply feature selection or feature reduction to remove the less useful features.
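
The two ideas above (choosing the record grain, then adding features before pruning them) can be sketched as follows. This is a minimal illustration, not the code in preprocessing.py or model.py: the transaction table, its column names, the synthetic label, the random forest, and the mean-importance cutoff are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical transaction log: one row per purchase line (synthetic data).
rng = np.random.default_rng(0)
n = 600
tx = pd.DataFrame({
    "customer_id": rng.integers(0, 80, n),
    "product_id": rng.integers(0, 20, n),
    "shop_id": rng.integers(0, 5, n),
    "quantity": rng.integers(1, 6, n),
    "unit_price": rng.uniform(1.0, 50.0, n).round(2),
})
tx["amount"] = tx["quantity"] * tx["unit_price"]

# Product-centric grain: one row per product, suited to sales prediction.
product_features = tx.groupby("product_id").agg(
    units_sold=("quantity", "sum"),
    revenue=("amount", "sum"),
    n_customers=("customer_id", "nunique"),
)

# Customer-centric grain: one row per customer, suited to
# re-purchase propensity.
customer_features = tx.groupby("customer_id").agg(
    n_orders=("amount", "size"),
    total_spend=("amount", "sum"),
    avg_basket=("amount", "mean"),
    n_products=("product_id", "nunique"),
    n_shops=("shop_id", "nunique"),
)

# "Addition": keep adding candidate features, including one that is
# pure noise so the pruning step has something to remove.
customer_features["noise"] = rng.normal(size=len(customer_features))

# Synthetic label for illustration only; a real label would come
# from observed re-purchase behaviour.
y = (customer_features["n_orders"]
     > customer_features["n_orders"].median()).astype(int)
X = customer_features

# "Subtraction": rank features by importance and keep those at or
# above the mean importance (an arbitrary cutoff chosen for the sketch).
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
selected = importance[importance >= importance.mean()].index.tolist()

print(importance)
print("kept:", selected)
```

The same transaction log produces both tables; only the `groupby` key changes with the prediction target. The cutoff rule is a placeholder, and a fixed top-k or a cross-validated selection may suit a real project better.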