Add functionality: allow sample_posterior_predictive to test MMM performance with test data #1268
nheusch-se started this conversation in Ideas
Replies: 1 comment · 9 replies
-
Hi, the […] Though I will create an issue for us to better clarify this and work with the […]. Let me know if I didn't answer your question!
-
Hi team,
since this is my first post, just let me say upfront: I love working with pymc-marketing; you've done an amazing job and it's a great package!
I've been trying to look at the predictive performance of an estimated MMM on a test set. In other words - after estimating the model on training data (spends/covariates and sales for 2020-2023), how good are the model's predicted sales for 2024 (when I give it the spends/covariates for 2024)? I know this is not necessarily the goal of causal modelling (after all, I'm after the "true" parameters for adstock and saturation), but in practice decent predictive performance can also be helpful to convince users of the MMM.
BaseMMM.sample_posterior_predictive already provides this functionality: to generate the predicted y's for my training period (2020-2023), I can use
```python
y_train_pred = mmm.sample_posterior_predictive(
    X_pred=X_train, extend_idata=True, include_last_observations=True
)
```
This is also very handy, because it adds the posterior_predictive group to the idata object. That's useful because I save the model along with its idata object (and can hence retrieve everything later on).
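For illustration, here's a minimal sketch of that workflow; the class name `MMM` and the file name are assumptions on my side, not taken from any specific version:

```python
from pymc_marketing.mmm import MMM  # class name assumed; use your MMM class

# The extended idata is serialized together with the model, so the
# training-period posterior predictive survives a save/load round trip.
mmm.save("mmm_2020_2023.nc")                 # hypothetical file name
mmm_reloaded = MMM.load("mmm_2020_2023.nc")
mmm_reloaded.idata.posterior_predictive      # training-period draws still there
```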
Now the problem: to generate the predicted y's for my test period (2024), I can use
```python
y_test_pred = mmm.sample_posterior_predictive(
    X_pred=X_test, extend_idata=True, include_last_observations=True
)
```
However, this overwrites the posterior predictive (for the training period) in the idata object. Hence, when saving my model, I now have to choose whether I want the posterior predictions for the training or the test period in my idata object. The alternative is to pass extend_idata=False; then my saved model keeps the posterior predictive for the training data, but has nothing for the test period at all.

This is a little annoying, because pymc itself already allows pm.sample_posterior_predictive(..., predictions=True). In that case, the test data (the X's for the test period and the posterior predictive draws) are added to the idata object without overwriting anything, as idata.predictions and idata.predictions_constant_data. I could then save my model and would have everything included.

It would be great to allow mmm.sample_posterior_predictive() to accept the option predictions=True. Currently, I can already pass it via **sample_posterior_predictive_kwargs, which gets forwarded to pymc.sample_posterior_predictive. However, this leads to an error: when predictions=True is passed through mmm.sample_posterior_predictive(), the method still tries to extract the "posterior_predictive" group from idata, whereas in this case it should extract idata.predictions. For the correct behaviour, predictions=True would hence need to be a 'native' parameter of mmm.sample_posterior_predictive that also switches the extraction to the "predictions" group.

I've gotten around this with some awkward monkey patching (a new function sample_posterior_predictive_test; sorry, it's a little verbose, and it would have been easier to just change sample_posterior_predictive to natively accept the predictions=True parameter, but this was my first go). It would be great to add this as a native feature of pymc-marketing. I'm happy to help; I hope this description makes sense to you in the first place. A rough sketch of the workaround is below.
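This is only a minimal sketch, assuming pymc-marketing internals that may change between versions (mmm.model, mmm.idata, and the private _data_setter helper); the function name sample_predictions is hypothetical:

```python
import pymc as pm

def sample_predictions(mmm, X_pred, **kwargs):
    """Out-of-sample draws stored under idata.predictions, leaving
    idata.posterior_predictive (training period) untouched."""
    mmm._data_setter(X_pred)  # swap the data containers to the test period
    with mmm.model:
        idata = pm.sample_posterior_predictive(
            mmm.idata,
            predictions=True,           # -> idata.predictions
            extend_inferencedata=True,  # attach the new groups in place
            **kwargs,
        )
    # Extract the "predictions" group instead of "posterior_predictive".
    return idata.predictions
```

Note that this skips the include_last_observations handling (appending the last training rows so adstock carries over into the prediction period), so it is a sketch rather than a drop-in replacement.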
To get, later on, the posterior predictive in the original scale, I can back-transform the draws; one possible approach is sketched below.
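A minimal sketch, assuming the target was scaled with the model's target transformer (get_target_transformer()) and that the out-of-sample draws live in mmm.idata.predictions under the variable name "y"; both are assumptions on my side:

```python
# Back-transform the scaled draws with the fitted target transformer.
transformer = mmm.get_target_transformer()
y_scaled = mmm.idata.predictions["y"]  # assumed dims: (chain, draw, date)
y_original = transformer.inverse_transform(
    y_scaled.values.reshape(-1, 1)     # sklearn transformers expect 2-D input
).reshape(y_scaled.shape)
```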