Long term prediction
A simple long-term predictive LightGBM model can be found in this notebook. The model was trained on one year of data (2016) to predict the following year (2017).
Simple feature engineering was performed based on the exploratory data analysis (EDA). The EDA of the meter readings showed the following:
- Healthcare, food sales and services, and utility usages show the highest meter readings.
- Hot water meters show the highest meter readings.
- Monthly behaviour (median meter reading): readings are higher in the warm season.
- Hourly behaviour (median meter reading): readings are higher from 06:00 to 19:00.
- Weekday behaviour: readings drop during weekends.
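The median profiles behind these observations can be reproduced with a few group-by operations. The following is a minimal sketch on toy data, not the notebook's code; the column names `timestamp` and `meter_reading` are assumptions.

```python
import pandas as pd

# Toy readings; the column names are assumed for illustration only.
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-01-02 03:00", "2016-01-04 10:00",
        "2016-07-05 10:00", "2016-07-09 15:00",
    ]),
    "meter_reading": [10.0, 30.0, 50.0, 20.0],
})

ts = readings["timestamp"].dt
# Median reading per calendar month, hour of day and day of week (0 = Monday).
monthly = readings.groupby(ts.month)["meter_reading"].median()
hourly = readings.groupby(ts.hour)["meter_reading"].median()
weekday = readings.groupby(ts.dayofweek)["meter_reading"].median()
```

Plotting each of the three series gives the monthly, hourly and weekday profiles.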
The features that were selected, transformed and created are listed below.
The following features were selected from each data set:
- Building metadata
  - Building ID*
  - Site ID*
  - Primary space usage
  - Building size (sqm)
- Weather data
  - Timestamp*
  - Site ID*
  - Air temperature
- Meter reading data
  - Timestamp*
  - Building ID*
  - Meter
  - Meter reading (target)
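One plausible way to combine the three data sets is to join the meter readings to the building metadata by building ID, and then to the weather data by site ID and timestamp. The sketch below uses toy data; every column name (`building_id`, `site_id`, `sqm`, `air_temperature`, and so on) is an assumption, and the notebook may join the data differently.

```python
import pandas as pd

# Toy stand-ins for the three data sets; all column names are assumptions.
metadata = pd.DataFrame({
    "building_id": ["b1", "b2"],
    "site_id": ["s1", "s1"],
    "primaryspaceusage": ["Healthcare", "Office"],
    "sqm": [1200.0, 800.0],
})
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00"]),
    "site_id": ["s1", "s1"],
    "air_temperature": [3.5, 3.1],
})
readings = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 00:00"]
    ),
    "building_id": ["b1", "b1", "b2"],
    "meter": ["electricity", "electricity", "gas"],
    "meter_reading": [12.0, 11.5, 4.2],
})

# Attach building attributes by building ID, then weather by site and timestamp.
data = (
    readings
    .merge(metadata, on="building_id", how="left")
    .merge(weather, on=["site_id", "timestamp"], how="left")
)
```

Left joins keep every meter reading even when metadata or weather values are missing for it.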
The following features were transformed:
- primaryspaceusage: the 16 categories were reduced to four (healthcare, food sales and services, utility and other).
- meter: the 8 categories were preserved.
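The category reduction can be sketched as keeping the three high-consumption usages and mapping everything else to a single level. The category spellings and the `KEPT_USAGES` name below are hypothetical; the actual labels in the metadata may differ.

```python
import pandas as pd

# Usages kept as their own level; the exact spellings are assumptions.
KEPT_USAGES = ["Healthcare", "Food sales and services", "Utility"]

usage = pd.Series(
    ["Healthcare", "Office", "Utility", "Education", "Food sales and services"]
)

# Keep the three usages above and collapse every other category into "Other".
reduced = usage.where(usage.isin(KEPT_USAGES), other="Other")
```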
The following features were created:
- month
- day of the week
- hour of the day
The resulting data set has the following columns:
- Timestamp*
- Site ID
- Building ID
- Month
- Hour
- Day of the week
- Usage (4 levels: healthcare, food, utility, other)
- Building size (sqm)
- Air temperature
- Meter (8 levels)
- Meter reading / target
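The month, hour and day-of-week columns can be derived directly from the timestamp. This is a minimal sketch assuming a pandas datetime column named `timestamp`; the output column names are illustrative.

```python
import pandas as pd

# Toy frame; "timestamp" is an assumed column name.
data = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-07-09 15:00"]),
})

# Calendar features extracted with the pandas datetime accessor.
data["month"] = data["timestamp"].dt.month
data["hour"] = data["timestamp"].dt.hour
data["day_of_week"] = data["timestamp"].dt.dayofweek  # 0 = Monday, 6 = Sunday
```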
The parameters for this model were not systematically tuned; they were adjusted manually to perform better than the defaults.
- "objective": "regression"
- "metric": "rmse"
- "random_state": 55
- "learning_rate": 0.01 (default 0.1)
- "max_bin": 761 (default 255)
- "num_leaves": 2197 (default 31)
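Expressed as a LightGBM parameter dictionary, the settings above look like the sketch below. The `train_baseline` helper and its `num_boost_round` value are hypothetical additions for illustration; the number of boosting rounds used in the notebook is not stated here.

```python
# Parameters as listed above; everything else stays at the LightGBM defaults.
params = {
    "objective": "regression",
    "metric": "rmse",
    "random_state": 55,
    "learning_rate": 0.01,  # default 0.1
    "max_bin": 761,         # default 255
    "num_leaves": 2197,     # default 31
}


def train_baseline(features, target, num_boost_round=100):
    """Fit a booster with the parameters above.

    num_boost_round is an assumed value, not taken from the source.
    """
    # Imported lazily so the dictionary can be inspected without LightGBM installed.
    import lightgbm as lgb

    train_set = lgb.Dataset(features, label=target)
    return lgb.train(params, train_set, num_boost_round=num_boost_round)
```

Calling `train_baseline(X_2016, y_2016)` on the 2016 features and target would return a booster whose `predict` method can be applied to the 2017 features; `X_2016` and `y_2016` are placeholder names.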
As expected, this model performed poorly. It can be used as a baseline for more complex models.
Figure 1: meter_reading real values and values predicted with the long-term model vs. timestamp.
Figure 2: meter_reading values predicted with the long-term model vs. real values.
meter/metric | RMSE | RMSLE | CVRMSE | MBE | R2 |
---|---|---|---|---|---|
all | 55322.5199 | 4.954 | 507.2326 | -12.0286 | 0.7159 |
electricity | 3176.3816 | 4.7 | 2315.3038 | -2311.0688 | -158.8472 |
water | 3615.7081 | 6.248 | 926.2795 | -800.8299 | -6.9786 |
chilledwater | 110294.371 | 4.4007 | 238.3562 | 12.6059 | 0.7745 |
hotwater | 69321.352 | 5.4857 | 167.1094 | 12.054 | 0.594 |
gas | 3326.863 | 6.5313 | 595.6342 | -544.586 | -1.5012 |
steam | 70529.2322 | 4.6466 | 14962.0129 | -1114.1194 | -2090.692 |
solar | 3295.8814 | 7.1066 | 13486.3753 | -13482.7474 | -3114.5355 |
irrigation | 3419.1858 | 7.7316 | 1413.337 | -1272.5739 | -4.1593 |
Table 1: metrics for the long-term model, calculated for all meters together and for each meter individually.
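The metric definitions are not given here, so the following sketch uses common textbook forms and should be read as an assumption rather than the notebook's implementation. In particular, several MBE entries in Table 1 exceed the RMSE in magnitude, which is impossible for a raw mean bias, so the sketch assumes MBE is normalised by the mean reading and expressed in percent, like CVRMSE. The sign convention (actual minus predicted) is also assumed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be greater than -1."""
    squared = ((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared) / len(y_true))


def cvrmse(y_true, y_pred):
    """RMSE as a percentage of the mean actual value."""
    return 100 * rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


def mbe_percent(y_true, y_pred):
    """Mean bias (actual minus predicted, an assumed sign) as a percentage of the mean."""
    bias = sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100 * bias / (sum(y_true) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Under these definitions a negative R2, as seen for most individual meters in Table 1, means the predictions fit worse than a constant equal to that meter's mean reading.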