
gbt dtrees always use float instead of the real data type, causing unexpected prediction results #1628

Open
cmsxbc opened this issue May 21, 2021 · 6 comments

@cmsxbc
Contributor

cmsxbc commented May 21, 2021

Describe the bug
I'm working on converting a LightGBM model to a daal gbt model, and the converted model's predictions differ noticeably from LightGBM's predictions.

I have found that the gbt dtrees model always uses float, as the code shows.

https://github.com/oneapi-src/oneDAL/blob/c6c5219c85e5bceb0392e54e653e00e8cc45e21f/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i#L44

If I change the typedef to typedef double ModelFPType, the result is as expected.
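For intuition, here is a toy sketch (not oneDAL code; all values and the number of rounds are made up) of one way float storage can perturb a GBT prediction: summing a few hundred leaf contributions in float32 drifts slightly away from the double-precision sum.

```python
# Toy illustration only: accumulate a few hundred "leaf values" in float32 vs double.
import numpy as np

rng = np.random.default_rng(0)
leaf_values = rng.normal(scale=0.01, size=340)   # one leaf contribution per boosting round

sum_double = leaf_values.sum()                   # double-precision accumulation

sum_float = np.float32(0.0)
for v in leaf_values.astype(np.float32):         # leaf values truncated to float32
    sum_float += v

print(abs(sum_double - float(sum_float)))        # small but nonzero difference
```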

To Reproduce
Steps to reproduce the behavior:

  1. Use the example model from LightGBM's simple_example, and apply this change to generate a ~200-iteration model so the error is easier to observe.
index 9af83008..debb339c 100644
--- a/examples/python-guide/simple_example.py                                                                          
+++ b/examples/python-guide/simple_example.py
@@ -35,9 +35,10 @@ print('Starting training...')
 # train                                                   
 gbm = lgb.train(params,                                   
                 lgb_train,                                
-                num_boost_round=20,
+                num_boost_round=340,
                 valid_sets=lgb_eval,
-                early_stopping_rounds=5)
+                verbose_eval=False,
+                early_stopping_rounds=100)
  2. Convert the model and predict:
d4p_model = daal4py.get_gbt_model_from_lightgbm(gbm)
d4p_y_pred = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction.reshape(-1)
print("The rmse of daal4py's prediction is:", mean_squared_error(y_test, d4p_y_pred))

print('are preds of daal4py and lightGBM equal:', (d4p_y_pred == y_pred).all())
print('The rmse of daal4py vs lightGBM is:', mean_squared_error(d4p_y_pred, y_pred) ** 0.5)
  3. With the release package, the two prediction results are not equal.

  4. Apply the following patch:

index 6b4ffc3af..16aac9b18 100644
--- a/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
+++ b/cpp/daal/src/algorithms/dtrees/gbt/gbt_predict_dense_default_impl.i
@@ -41,7 +41,7 @@ namespace prediction
 {
 namespace internal
 {
-typedef float ModelFPType;
+typedef double ModelFPType;
 typedef uint32_t FeatureIndexType;
 const FeatureIndexType VECTOR_BLOCK_SIZE = 64;
  5. Everything works as expected (see the consolidated reproduction sketch after these steps).
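For convenience, a consolidated reproduction sketch assembled from the steps above. It assumes the regression.train/regression.test files from the LightGBM examples directory, abbreviated training parameters, and a LightGBM version that still accepts verbose_eval/early_stopping_rounds as lgb.train() arguments (newer releases use callbacks instead); adjust paths and parameters as needed.

```python
# Reproduction sketch; data paths, params and the LightGBM API version are assumptions.
import daal4py
import lightgbm as lgb
import pandas as pd
from sklearn.metrics import mean_squared_error

# same data loading as examples/python-guide/simple_example.py
df_train = pd.read_csv('regression.train', header=None, sep='\t')
df_test = pd.read_csv('regression.test', header=None, sep='\t')
y_train, X_train = df_train[0], df_train.drop(0, axis=1)
y_test, X_test = df_test[0], df_test.drop(0, axis=1)

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# abbreviated relative to simple_example.py
params = {'objective': 'regression', 'metric': 'l2', 'verbose': -1}

# more boosting rounds make the float/double discrepancy easier to observe
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=340,
                valid_sets=lgb_eval,
                verbose_eval=False,
                early_stopping_rounds=100)

y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

# convert to a daal4py gbt model and predict with oneDAL
d4p_model = daal4py.get_gbt_model_from_lightgbm(gbm)
d4p_y_pred = daal4py.gbt_regression_prediction().compute(X_test, d4p_model).prediction.reshape(-1)

print('are preds of daal4py and lightGBM equal:', (d4p_y_pred == y_pred).all())
print('The rmse of daal4py vs lightGBM is:', mean_squared_error(d4p_y_pred, y_pred) ** 0.5)
```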

Expected behavior
Use the real data type, or explain the reason why float was chosen.

Output/Screenshots
The top screenshot shows the unexpected result; the bottom one shows what I get after recompiling with typedef double ModelFPType.

2021-05-22-012202_1885x913_scrot

My patch:
2021-05-22-011158_1896x549_scrot

Environment:

  • OS: Arch Linux
  • Compiler: gcc 11.1.0
  • Version: 2021.2.2
@ShvetsKS
Contributor

@cmsxbc thanks for reporting this bug! Sorry for the long response!
Could you share the actual mean_squared_error difference between float and double usage?
We are currently checking the performance impact.

@cmsxbc
Contributor Author

cmsxbc commented May 28, 2021

@ShvetsKS Sorry for my poor English... I'm not sure what exactly is meant by "actual mean_squared_error difference for float vs double usage".

For the model I mentioned before, which comes from the lightGBM example, mean_squared_error(daal4py_prediction_float, lightGBM_prediction) is 3.367366706112064e-06 when using float. After changing to double, mean_squared_error(daal4py_prediction_double, lightGBM_prediction) is exactly 0.

mean_squared_error(daal4py_prediction_float, daal4py_prediction_double) is also 3.367366706112064e-06, i.e. the same as mean_squared_error(daal4py_prediction_float, lightGBM_prediction).

Actually, I originally found this error with a model built on thousands of features and thousands of iterations, but I'm afraid I can't provide the actual values because of my company's data security policy.

What I can provide is that sqrt(mean_squared_error(d4p_prediction_float, lightGBM_prediction)) / lightGBM_prediction is approximately 0.2, while d4p_prediction_double is exactly equal to lightGBM_prediction.
We can tolerate sqrt(mean_squared_error(d4p_prediction, lightGBM_prediction)) / lightGBM_prediction < 1e-9.
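For clarity, a hedged sketch of the comparison described above. The array names are hypothetical placeholders for the real prediction vectors, and scaling the RMSE by the mean magnitude of the LightGBM predictions is just one possible reading of the ratio used in this comment.

```python
# Sketch only: replace the placeholder arrays with the real prediction vectors.
import numpy as np
from sklearn.metrics import mean_squared_error

def relative_rmse(pred, reference):
    # RMSE of pred vs reference, scaled by the mean magnitude of reference
    return mean_squared_error(pred, reference) ** 0.5 / np.mean(np.abs(reference))

# placeholders standing in for the daal4py (float build), daal4py (double build)
# and LightGBM prediction arrays
lightGBM_prediction = np.array([0.5, 0.6, 0.7])
d4p_prediction_float = lightGBM_prediction + 1e-3
d4p_prediction_double = lightGBM_prediction.copy()

print(relative_rmse(d4p_prediction_float, lightGBM_prediction))
# acceptance criterion from this comment: relative error below 1e-9
assert relative_rmse(d4p_prediction_double, lightGBM_prediction) < 1e-9
```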

@cmsxbc
Contributor Author

cmsxbc commented Jul 15, 2021

@ShvetsKS Sorry to bother you, but is there any progress? T_T

@ShvetsKS
Contributor

@cmsxbc we are currently working on an appropriate solution that preserves support for "old" trained models with float fields. Sorry for the slow fix delivery.

@cmsxbc
Contributor Author

cmsxbc commented Oct 6, 2021

@ShvetsKS Sorry again, but it has taken almost another quarter... T_T

@napetrov added the bug label on Feb 2, 2023
@razdoburdin
Contributor

Hi all.

I reproduced the behavior with the latest version of the library. The accuracy difference has now decreased to approximately 2e-8 compared to the initially reported ~2e-3. We plan to close this issue, as the degradation no longer appears significant.
