
How to fix “DMatrix/Booster has not been initialized or has already been disposed.” #400

Open
invain1218 opened this issue Dec 5, 2024 · 6 comments


@invain1218

invain1218 commented Dec 5, 2024

Description

src/c_api/c_api.cc:675: DMatrix/Booster has not been initialized or has already been disposed. This happened PipeOp surv.xgboost.cox's $predict()

Reproducible example

I trained a surv.xgboost.cox learner with mlr3proba and mlr3extralearners and saved it with saveRDS(). Then I copied it to another system to predict on new data, but this error comes up:

src/c_api/c_api.cc:675: DMatrix/Booster has not been initialized or has already been disposed. This happened PipeOp surv.xgboost.cox's $predict()

Thanks for the help! 😊

I used R 4.2.3, mlr3extralearners 1.0.0, and mlr3proba 0.6.9.

Like this:

library(mlr3)
library(mlr3proba)
library(mlr3verse)
library(mlr3extralearners)
task = tsk("lung")
model = lrn("surv.xgboost.cox")$train(task)
saveRDS(model, "model.rds")

Then, in another R session on the other system:

model = readRDS("model.rds")
model$predict(task_new)

Error in dim.xgb.DMatrix(x) :
[17:45:16] src/c_api/c_api.cc:675: DMatrix/Booster has not been initialized or has already been disposed.
This happened PipeOp surv.xgboost.cox's $predict()
In addition: Warning message:
In warn_deprecated("Learner$data_formats") :
Learner$data_formats is deprecated and will be removed in the future.

@sebffischer
Member

sebffischer commented Dec 5, 2024

Thanks for this bug report!
This happens because XGBoost model objects cannot simply be saved and loaded with saveRDS(): the booster wraps a handle (an external pointer) to a C++ object, and that handle is not preserved by R serialization.
In principle we already have a general solution for this (marshaling), but it has not yet been implemented for the XGBoost learners.
We will do this soon!

As a workaround, I think you can do:

library(mlr3)
library(mlr3proba)
library(mlr3verse)
library(mlr3extralearners)
library(xgboost)
task = tsk("lung")
learner = lrn("surv.xgboost.cox")$train(task)
saveRDS(learner, "learner.rds")
xgb.save(learner$model$model, "xgb.model")

and then:

learner = readRDS("learner.rds")
model = xgb.load("xgb.model")
learner$model$model = model

Let me know whether this works for you.
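
An alternative that might also work (a sketch, not verified here): the reloaded booster usually still carries its raw model bytes, and xgboost::xgb.Booster.complete() can rebuild the invalid handle from them, so the separate xgb.model file may not be needed for the booster itself. As the rest of this thread shows, the stored $train_data DMatrix needs separate treatment either way.

learner = readRDS("learner.rds")
# Rebuild the booster's C++ handle from the raw bytes kept inside the R object
# (assumes xgboost 1.x, where xgb.Booster.complete() is available).
learner$model$model = xgb.Booster.complete(learner$model$model)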

@invain1218
Author

invain1218 commented Dec 6, 2024

Thanks! But:

library(mlr3)
library(mlr3proba)
library(mlr3verse)
library(mlr3extralearners)
library(xgboost)
library(dplyr)
lung_filter = survival::lung |> select(-sex, -ph.ecog)
task = TaskSurv$new(id = "lung", backend = lung_filter, time = "time", event = "status")
learner = lrn("surv.xgboost.cox")$train(task)
saveRDS(task,"task.rds")
saveRDS(learner,"learner.rds")
xgb.save(learner$model$model, "xgb.model")
# [1] TRUE
learner$predict_newdata(lung_filter,task)
# <PredictionSurv> for 228 observations:
#  row_ids time status    crank       lp     distr
#        1  306   TRUE 66.05680 66.05680 <list[1]>
#        2  455   TRUE 65.66539 65.66539 <list[1]>
#        3 1010  FALSE 58.95570 58.95570 <list[1]>
#      ---  ---    ---      ---      ---       ---
#      226  105  FALSE 58.55264 58.55264 <list[1]>
#      227  174  FALSE 63.55852 63.55852 <list[1]>
#      228  177  FALSE 59.69077 59.69077 <list[1]>
learner$predict(task)
# <PredictionSurv> for 228 observations:
#  row_ids time status    crank       lp     distr
#        1  306   TRUE 66.05680 66.05680 <list[1]>
#        2  455   TRUE 65.66539 65.66539 <list[1]>
#        3 1010  FALSE 58.95570 58.95570 <list[1]>
#      ---  ---    ---      ---      ---       ---
#      226  105  FALSE 58.55264 58.55264 <list[1]>
#      227  174  FALSE 63.55852 63.55852 <list[1]>
#      228  177  FALSE 59.69077 59.69077 <list[1]>

rm(list=ls())
lung_filter = survival::lung |> select(-sex, -ph.ecog)
task2 = readRDS("task.rds")
learner2 = readRDS("learner.rds")
learner2$model$model
# ##### xgb.Booster
# Handle is invalid! Suggest using xgb.Booster.complete
# raw: 1.2 Mb 
# call:
#   xgboost::xgb.train(data = data, nrounds = 1000L, verbose = 0L, 
#                      nthread = 1L, objective = "survival:cox", eval_metric = "cox-nloglik")
# params (as set within xgb.train):
#   nthread = "1", objective = "survival:cox", eval_metric = "cox-nloglik", validate_parameters = "TRUE"
# # of features: 6 
# niter: 1000
# nfeatures : 6 
model = xgb.load("xgb.model")
model
# ##### xgb.Booster
# raw: 1.2 Mb 
# xgb.attributes:
#   niter
# niter: 999
learner2$model$model = model
learner2$model$model
# ##### xgb.Booster
# raw: 1.2 Mb 
# xgb.attributes:
#   niter
# niter: 999
learner2$predict_newdata(lung_filter,task2)
# Error in predict.xgb.Booster(model, newdata = train_data, , objective = "survival:cox") : 
#   [09:40:13] src/c_api/c_api.cc:916: DMatrix has not been initialized or has already been disposed.
learner2$predict(task2)
# Error in predict.xgb.Booster(model, newdata = train_data, , objective = "survival:cox") : 
#   [09:41:20] src/c_api/c_api.cc:916: DMatrix has not been initialized or has already been disposed.

Package versions:

[1] dplyr_1.1.4 xgboost_1.7.7.1 mlr3proba_0.6.9 mlr3verse_0.3.0 mlr3tuning_1.0.0 paradox_1.0.1
[7] survival_3.7-0 mlr3extralearners_1.0.0 mlr3_0.22.1

@sebffischer
Member

Ok, we need to do the same for the $train_data:

library(survival)
library(mlr3)
library(mlr3proba)
library(mlr3verse)
library(mlr3extralearners)
library(xgboost)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:xgboost':
#> 
#>     slice
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
lung_filter = lung |> select(-sex,-ph.ecog)
task = TaskSurv$new(id="lung",backend = lung_filter,time = "time",event = "status")
learner = lrn("surv.xgboost.cox")$train(task)
saveRDS(task,"task.rds")
saveRDS(learner,"learner.rds")
xgb.save(learner$model$model, "xgb.model")
#> [1] TRUE
xgb.DMatrix.save(learner$model$train_data, "xgb.data")
#> [1] TRUE

learner = readRDS("learner.rds")
learner$model$model = xgb.load("xgb.model")
learner$model$train_data = xgb.DMatrix("xgb.data")
#> [07:24:06] 228x6 matrix with 1302 entries loaded from xgb.data
learner$predict_newdata(lung_filter,task)
#> <PredictionSurv> for 228 observations:
#>  row_ids time status    crank       lp     distr
#>        1  306   TRUE 65.82970 65.82970 <list[1]>
#>        2  455   TRUE 65.80392 65.80392 <list[1]>
#>        3 1010  FALSE 59.49477 59.49477 <list[1]>
#>      ---  ---    ---      ---      ---       ---
#>      226  105  FALSE 57.72884 57.72884 <list[1]>
#>      227  174  FALSE 63.40212 63.40212 <list[1]>
#>      228  177  FALSE 61.15349 61.15349 <list[1]>

Created on 2024-12-06 with reprex v2.1.1
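
A quick way to see which handle was lost (a sketch; the object name is illustrative): calling dim() on the DMatrix stored in the reloaded learner reproduces the dim.xgb.DMatrix error from the original report, while the DMatrix re-read from xgb.data works.

learner_check = readRDS("learner.rds")
try(dim(learner_check$model$train_data))  # errors: the external pointer was not serialized
dim(xgb.DMatrix("xgb.data"))              # 228 6, reloaded from XGBoost's own format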

@sebffischer
Member

In fact, you only need to do it for the $train_data:

library(survival)
library(mlr3)
library(mlr3proba)
library(mlr3verse)
library(mlr3extralearners)
library(xgboost)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:xgboost':
#> 
#>     slice
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
lung_filter = lung |> select(-sex,-ph.ecog)
task = TaskSurv$new(id="lung",backend = lung_filter,time = "time",event = "status")
learner = lrn("surv.xgboost.cox")$train(task)
saveRDS(task,"task.rds")
saveRDS(learner,"learner.rds")
xgb.save(learner$model$model, "xgb.model")
#> [1] TRUE
xgb.DMatrix.save(learner$model$train_data, "xgb.data")
#> [1] TRUE

learner = readRDS("learner.rds")
learner$model$train_data = xgb.DMatrix("xgb.data")
#> [08:07:24] 228x6 matrix with 1302 entries loaded from xgb.data
learner$predict_newdata(lung_filter,task)
#> <PredictionSurv> for 228 observations:
#>  row_ids time status    crank       lp     distr
#>        1  306   TRUE 65.82970 65.82970 <list[1]>
#>        2  455   TRUE 65.80392 65.80392 <list[1]>
#>        3 1010  FALSE 59.49477 59.49477 <list[1]>
#>      ---  ---    ---      ---      ---       ---
#>      226  105  FALSE 57.72884 57.72884 <list[1]>
#>      227  174  FALSE 63.40212 63.40212 <list[1]>
#>      228  177  FALSE 61.15349 61.15349 <list[1]>

Created on 2024-12-06 with reprex v2.1.1
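
If this is needed regularly, the workaround could be wrapped in two small helpers (a sketch; the function names are made up, and the $model$model / $model$train_data slots are taken from the code above):

# Save the learner plus the XGBoost objects whose handles saveRDS() cannot keep.
save_xgb_learner = function(learner, path) {
  saveRDS(learner, paste0(path, ".rds"))
  xgboost::xgb.DMatrix.save(learner$model$train_data, paste0(path, ".dmatrix"))
  xgboost::xgb.save(learner$model$model, paste0(path, ".model"))
  invisible(path)
}

# Restore the learner and re-attach the booster and training DMatrix.
load_xgb_learner = function(path) {
  learner = readRDS(paste0(path, ".rds"))
  learner$model$train_data = xgboost::xgb.DMatrix(paste0(path, ".dmatrix"))
  learner$model$model = xgboost::xgb.load(paste0(path, ".model"))
  learner
}

# Usage, assuming learner, task and lung_filter from the example above:
# save_xgb_learner(learner, "surv_xgb_cox")
# learner2 = load_xgb_learner("surv_xgb_cox")
# learner2$predict_newdata(lung_filter, task)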

@invain1218
Author

Oh, it works! Thank you so much for your help, I really appreciate it! 😊

sebffischer reopened this Dec 12, 2024
@sebffischer
Member

We should fix this properly with marshaling.
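
For reference, the intended fix uses mlr3's marshaling mechanism, which converts the fitted model into a serialization-safe form before saving and back afterwards. A sketch of what that would look like once the XGBoost survival learners gain the marshal property (at the time of this thread they did not support it yet):

learner$marshal()              # convert the model into a serializable representation
saveRDS(learner, "learner.rds")

learner = readRDS("learner.rds")
learner$unmarshal()            # restore the original model objects
learner$predict(task)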
