Add the feature contribution argument output as an option at predict #39

Fish-Soup · 2024-09-24T08:55:04Z

Add the option to call lightgbm.Booster.predict(..., pred_contrib=True).
This generates an output whose number of columns = the number of distribution arguments * (number of features + 1).

The output is converted to multi-index columns with two levels: distribution args and feature contributions (plus a Constant column).
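As a rough illustration of that reshaping, here is a minimal sketch using a stand-in array instead of a real model. The parameter and feature names are made up, the level names are the ones settled on later in this thread, and the layout (one contiguous block of feature columns plus a constant per distribution argument) is an assumption about the raw output, not something checked against the PR code.

```python
import numpy as np
import pandas as pd

# Hypothetical names, for illustration only.
param_names = ["loc", "scale"]
feature_names = ["x1", "x2", "x3"]
n_samples = 4

# Stand-in for the raw array returned with pred_contrib=True:
# number of columns = number of distribution args * (number of features + 1).
width = len(param_names) * (len(feature_names) + 1)
raw = np.arange(n_samples * width, dtype=float).reshape(n_samples, width)

# Assumed layout: for each distribution arg, its feature columns followed
# by one constant column.
columns = pd.MultiIndex.from_product(
    [param_names, feature_names + ["Constant"]],
    names=["parameters", "feature_contributions"],
)
contrib = pd.DataFrame(raw, columns=columns)

# Selecting on the first level gives the contributions for one parameter.
scale_contrib = contrib["scale"]
```

With this layout, contrib["scale"] has one column per feature plus the Constant column.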

A unit test is added for pred_contributions. It checks that summing all contributions for a parameter and applying the response function gives the same result as predicting the parameters directly.
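The idea behind that check can be sketched on hand-written numbers. Everything here is illustrative: the helper name parameters_from_contributions, the parameter names, and the choice of identity and exp as response functions are assumptions, not the code from the PR.

```python
import numpy as np
import pandas as pd

# Hand-written contributions for two parameters, two features and a constant.
columns = pd.MultiIndex.from_product(
    [["loc", "scale"], ["x1", "x2", "Constant"]],
    names=["parameters", "feature_contributions"],
)
contrib = pd.DataFrame(
    [[1.0, -0.5, 2.0, 0.2, 0.1, -0.3],
     [0.5, 0.5, 2.0, 0.4, 0.3, -0.3]],
    columns=columns,
)

# Assumed response functions: identity for loc, exp for scale.
response_fns = {"loc": lambda s: s, "scale": np.exp}


def parameters_from_contributions(contrib, response_fns):
    """Sum the contributions per parameter, then apply its response function."""
    out = {}
    for name, fn in response_fns.items():
        raw_score = contrib[name].sum(axis=1)  # features + Constant
        out[name] = fn(raw_score)
    return pd.DataFrame(out)


params = parameters_from_contributions(contrib, response_fns)
```

In a real test, params would then be compared (for example with np.testing.assert_allclose) against the output of predicting the parameters for the same rows.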

I also noticed that sampling is always applied when predicting, even when the requested output does not require it. This presumably makes predictions a little slower on larger data sets, so I moved the sampling code so that it is only called when required.
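A minimal sketch of that control flow, assuming a Gaussian with already-predicted loc and scale. The function names, the pred_type values and the defaults are hypothetical and only meant to show where the sampling call sits, not how the library implements it.

```python
import numpy as np


def draw_samples(loc, scale, n_samples, seed):
    """Draw n_samples per row from a normal distribution (illustrative)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc[:, None], scale[:, None], size=(loc.shape[0], n_samples))


def predict(loc, scale, pred_type="parameters", n_samples=1000,
            quantiles=(0.1, 0.9), seed=123):
    # Outputs that do not need samples return before any sampling happens.
    if pred_type == "parameters":
        return np.column_stack([loc, scale])

    # Sampling is only reached for the pred types that depend on it.
    if pred_type in ("samples", "quantiles"):
        samples = draw_samples(loc, scale, n_samples, seed)
        if pred_type == "samples":
            return samples
        return np.quantile(samples, quantiles, axis=1).T

    raise ValueError(f"Unknown pred_type: {pred_type!r}")


loc = np.array([0.0, 1.0, 2.0])
scale = np.array([1.0, 0.5, 2.0])
params_out = predict(loc, scale, pred_type="parameters")
samples_out = predict(loc, scale, pred_type="samples", n_samples=50)
quantiles_out = predict(loc, scale, pred_type="quantiles")
```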

StatMixedML · 2024-09-25T15:25:11Z

Thanks for opening the PR and for your interest in the project, very much appreciated!

I'd need some time, though, to look into it in detail.

May I ask you to also give an example of how to use and interpret it? That would help, thanks!

Fish-Soup · 2024-09-26T10:10:28Z

Hi, I added an example in the examples section. There is a lot more you can use the output for. At a very high level, it provides SHAP-like information, but directly from LightGBM's internal calculations. When a distribution_arg is used, we can also use it to get the actual contribution to the final parameter value.

Fish-Soup · 2024-10-03T09:06:34Z

I've added a little code to give the pandas columns a level name based on the pred_type argument. This is helpful when doing pandas operations like stack.

For example, when pred_type="quantiles", the pandas output columns will have the name "quantiles". This means we can call pred_samples.stack("quantiles") to create a multi-index series.
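To show what the level name buys you, here is a small sketch on a hand-built frame. The column labels and values are made up; only the column level name "quantiles" comes from the description above.

```python
import pandas as pd

# Stand-in for the output of predicting with pred_type="quantiles".
# The column labels are hypothetical; the level name is "quantiles".
pred_quantiles = pd.DataFrame(
    [[0.8, 1.9], [1.1, 2.4]],
    columns=pd.Index(["quant_0.1", "quant_0.9"], name="quantiles"),
)

# Because the columns carry a name, the level can be stacked by name,
# giving a Series indexed by (row, quantile).
stacked = pred_quantiles.stack("quantiles")
```

The resulting index has the names [None, "quantiles"], so the quantile level can also be referenced by name in later operations such as unstack or groupby.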

I've also changed the level names of the multi-index for pred_type="contributions" from ["distribution_args", "FeatureContributions"] to ["parameters", "feature_contributions"], to align with the pred_type naming convention.

StatMixedML · 2024-10-08T10:40:39Z

Thanks for your changes. I am currently occupied with the Hyper-Tree paper, so please do expect some delay in my review.

Fish-Soup · 2024-10-18T08:21:26Z

Is there anything I can do to help speed it along? The PR is unit tested and essentially just passes arguments to lightgbm.Booster and then reshapes the output.

SimonRobertPike added 2 commits September 23, 2024 15:00

updates for predict contributions

6f24901

update the test to check all response functions

023211d

Fish-Soup mentioned this pull request Sep 24, 2024

Support pred_contrib argument at predict #38

Open

add an example

98db010

SimonRobertPike added 3 commits September 26, 2024 11:34

add an example

9877944

change name of multi index to parameters to allign with pred_type arg…

b393dce

…ument

give columns level names for easier pandas manipulations

f1780b6
