Upcoming Insights Package #357

Closed · 6 tasks done · Scienfitz opened this issue Aug 30, 2024 · 9 comments
Labels: new feature (New functionality)

@Scienfitz (Collaborator) commented Aug 30, 2024

Update and to-dos, 22.11.2024

  • We would like to name the entire new package insights instead of diagnostics. At the heart of the new Insight class would be the surrogate model as an attribute, i.e. one would initialize it via the constructor like Insight(my_surrogate_model) or Insight(campaign.get_surrogate()).
  • Since the user often has either a campaign or a recommender, we would offer Insight.from_campaign() and Insight.from_recommender() methods, which are just wrappers that extract the surrogate model. Check out this design pattern used elsewhere, e.g. here.
  • The SHAP part would become a class SHAPInsight derived from the base class Insight. It would implement all logic to initialize shap and extract SHAP values correctly based on the surrogate model. We already have other classes in mind, such as FitInsight (providing traditional goodness-of-fit metrics) or OutlierInsight; the shap implementation does not have to deal with those, they are mentioned only to provide context for this design choice.
  • The shap class should additionally be configurable regarding which shap explainer is used (with the default being the kernel explainer, I guess?).
  • The shap class would also have one single .plot method. It would accept any of the valid plotters from the shap library, with a default choice set. From the current implementation it seems that most plots work out of the box, so the shap plotter object can simply be called; only for one plotter type some additional work had to be done. This could be solved via singledispatchmethod. The main point is that SHAPInsight has just one .plot(plot_type) method which executes the correct thing based on the requested plotter type (see the sketch after this list).
  • Not entirely sure, but with the new structure, perhaps the explainer and explanation structures become obsolete and are somewhat absorbed into the SHAPInsight class; make sure to check that.
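
For illustration, here is a rough sketch of how this design could look. Only the names Insight, SHAPInsight, from_campaign, from_recommender, .plot, and campaign.get_surrogate() come from the proposal above; everything else (the constructor arguments, using shap.Explainer as a stand-in default instead of the kernel explainer, and a simple string-based dispatch in place of singledispatchmethod) is an assumption made for the example, not the final implementation:

import shap


class Insight:
    """Base class: holds the surrogate model that insights are computed from."""

    def __init__(self, surrogate):
        self.surrogate = surrogate

    @classmethod
    def from_campaign(cls, campaign):
        # Thin convenience wrapper that only extracts the surrogate model.
        return cls(campaign.get_surrogate())

    @classmethod
    def from_recommender(cls, recommender):
        # Assumes the recommender exposes its last fitted surrogate in some way.
        return cls(recommender.get_surrogate())


class SHAPInsight(Insight):
    """SHAP-specific insight with a configurable explainer and a single plot entry point."""

    def __init__(self, surrogate, background_data, explainer_cls=shap.Explainer):
        super().__init__(surrogate)
        # Wrap the posterior mean so that shap sees a plain predictive model.
        model = lambda x: self.surrogate.posterior(x).mean
        self._explainer = explainer_cls(model, background_data)

    def explain(self, data):
        # Returns a shap Explanation object for the given data.
        return self._explainer(data)

    def plot(self, explanation, plot_type="bar"):
        # Single entry point dispatching to the requested shap plotter.
        plotters = {"bar": shap.plots.bar, "beeswarm": shap.plots.beeswarm}
        plotters[plot_type](explanation)

A from_campaign override on SHAPInsight would additionally need to pull the background data (e.g. campaign.measurements) out of the campaign; that detail is left out of the sketch.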

Original 30.06.2024: Opening this issue to collect and plan around the upcoming diagnostics package.

After #355 is merged, the last fitted surrogate model will be available. Since we deal with Bayesian models, our model exposes a posterior method; applying .mean turns this into a model essentially comparable to standard predictive models. Our last refactoring ensured that the input to that model can be in experimental representation, i.e. the same format as accepted by add_measurements, so it can include the unencoded labels etc.

Preliminary Example
@AdrianSosic already shared how to utilize this in the PR, for easier access I will copy the crucial part of his example here:

import shap

# Assuming we already have a campaign created and measurements added
data = campaign.measurements[[p.name for p in campaign.parameters]]
model = lambda x: campaign.get_surrogate().posterior(x).mean

# Apply SHAP
explainer = shap.Explainer(model, data)
shap_values = explainer(data)
shap.plots.bar(shap_values)

@brandon-holt @Alex6022 tagging you here so you have this simple example to work with as soon as #355 is merged. Questions or feedback on this application should ideally be collected here and not in the PR.

Turning this into a diagnostics package
Since the very start of this package we have had requests for diagnostics. SHAP is one of them, but more traditional users might also want to look at classical metrics such as goodness of fit.

Essentially, we are now proposing to turn the above code into a subpackage that can be used like this:

from baybe.diagnostics import FeatureImportance
from shap.explainers import PermutationExplainer

FeatureImportance(my_campaign, explainer=PermutationExplainer)  # produces the same bar plot / SHAP results as above

Note that SHAP explainers seem to have a common interface (model, data), which means we can allow any explainer importable via shap.explainers and shap.explainers.other. The latter even offers non-SHAP methods such as LIME and MAPLE via the same interface.
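
To make the "common interface" point concrete, a hypothetical loop over two explainers from the shap API could look like this (model and data as in the example further above; whether every explainer really works out of the box would still need to be verified):

import shap

data = campaign.measurements[[p.name for p in campaign.parameters]]
model = lambda x: campaign.get_surrogate().posterior(x).mean

# Any explainer constructed as explainer_cls(model, background_data) can be swapped in.
for explainer_cls in (shap.Explainer, shap.explainers.Permutation):
    explainer = explainer_cls(model, data)
    shap_values = explainer(data)
    shap.plots.bar(shap_values)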

Conversely, we could have things like

from baybe.diagnostics import LackOfFitTest

LackOfFitTest(my_campaign)

Open Questions
For @AdrianSosic, as he is going on a longer absence: it would be good to have your definitive decisions on these questions before then.

  1. Are you OK with the above structure? Do you propose any alternatives or see any issues?
  2. For instance, should the utilities already produce the plots, only provide SHAP values, or should plotting be an optional argument to the utility?
  3. Should the utilities accept campaigns, recommenders, or both?
  4. This requires additional dependencies, at the very least shap, but probably others for the non-shap methods (which might still come as part of the shap package, see above). I would group all of those into a new optional dependency group diagnostics, or do we need this more fine-grained, or should we even let it fail at runtime?
Scienfitz added the "new feature (New functionality)" label on Aug 30, 2024
@Alex6022 (Contributor) commented

I really like this approach. There could be multiple reasons why decoupling the plotting from the calculation of the SHAP values might be preferred:

  1. Especially for non-tree-based models and large parameter spaces, the calculations can be quite expensive. It might be beneficial to save the SHAP values, since they cannot otherwise be serialized as part of any other object.
  2. The SHAP package provides plenty of plotting options (e.g. beeswarm plots, but also dependence scatter plots for a single feature) between which the user could choose after the calculation. To improve usability, these plotting functions could also be wrapped within the diagnostics package (see the sketch below).
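
As a purely illustrative sketch of both points (compute the expensive values once, persist them, and choose the plot afterwards; explainer and data as in the example further above, and assuming the Explanation object pickles cleanly):

import pickle

import shap

explanation = explainer(data)  # the expensive step, done exactly once

# Persist the values so they can be reused in a later session.
with open("shap_values.pkl", "wb") as f:
    pickle.dump(explanation, f)

# Pick any of the shap plots afterwards, without recomputing.
shap.plots.beeswarm(explanation)
shap.plots.scatter(explanation[:, 0])  # dependence scatter plot for a single feature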

@AdrianSosic (Collaborator) commented

Thanks, @Scienfitz, for the nice summary! I think it is very difficult to provide definitive answers to all your questions at this point, since there are still so many unclear parts, like:

  • What basic questions should the package answer? Definitely feature importance, but what else? You mentioned model-fit-related things, which sounds reasonable. Other stuff?
  • Depending on how diverse these things are, should we potentially even split into different subpackages, like feature_importance, model_fit, ...?
  • How large is the overlap of the interface that different methods require? For example, I imagine that "black-box" feature analyzers like SHAP and permutation importance agree quite nicely. This might be very different for model-specific approaches like GP lengthscale analysis, though.

Anyway, only time will show what is needed, so let's just start with one or two methods in mind and adapt from there. So, regarding your questions:

  1. Seems reasonable for now, potentially with the addition of the subpackages I mentioned above. What needs to be fleshed out is how to provide (approach-specific?) context information. For instance, in my SHAP example from above, I used the same dataframe to create the Explainer and to evaluate it; these are generally two different things (see the sketch after this list).
  2. Agree with @Alex6022: I would always try to separate computation and visualization as much as possible. There are probably several options for how we could achieve this, e.g. by returning an object that offers a plot method.
  3. I'm in favor of always going down to the lowest level required (less coupling, more modular, better testability). In this case: feature importance is only relevant in the context of models, and for baybe this effectively means it should expect a surrogate model. We can then always easily build high-level access around it, like I did for the surrogate itself, i.e. something like def campaign.explain() that internally takes care of collecting the relevant bits and pieces.
  4. Yep, agree. Let's simply go with diagnostics for now and extend later if needed.
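
To illustrate point 1, here is a minimal sketch of how background data (used to build the explainer) and the data to be explained could be kept separate; new_candidates is a hypothetical dataframe of points of interest, everything else follows the example further above:

import shap

# Background / context data defining the reference distribution.
background = campaign.measurements[[p.name for p in campaign.parameters]]

# Hypothetical, separate data whose predictions we want to explain.
to_explain = new_candidates[[p.name for p in campaign.parameters]]

model = lambda x: campaign.get_surrogate().posterior(x).mean
explainer = shap.Explainer(model, background)  # built from the background data
explanation = explainer(to_explain)            # evaluated on different points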

@Alex6022 (Contributor) commented Oct 4, 2024

Dear @Scienfitz and @AdrianSosic,

I have taken my shot at implementing feature importance analysis in PR #391 for the new diagnostics package.

The workflow and the separation of computation and visualization closely follow the original SHAP package. Hence, plotting is implemented by passing the Explanation objects to functions. Alternatively, it could make sense to derive from the SHAP package's Explanation class in order to implement plotting methods on it.

I am looking forward to discussing this further. I hope this helps :)

@AVHopp (Collaborator) commented Oct 11, 2024

Hi @Alex6022, let me just quickly say "hi" as well and thank you for your work :) Since you, @Scienfitz, and @AdrianSosic have already been discussing things here and in #391, I will leave the majority of the discussion to you and step in if I feel I can contribute anything (or if an additional opinion is needed in case of disagreement).

@Scienfitz (Collaborator, Author) commented Oct 11, 2024

@Alex6022 thanks so much for providing the PR. @AdrianSosic and I could not yet agree on all aspects, in particular the design of the actual explainer and how to handle its variance regarding the different available shap explainers etc. Overall, we don't want to couple it to Campaign too strongly. I would advocate for a new class, like we did with MemoryInfo, which could be initialized from a campaign or recommender etc. and comes with properties and methods to produce all plots. Another option would be to provide a function which accepts a shap explainer object; the user then has to import whatever plot they want from shap to signal what our object should use.

But below please find other aspects that we saw that you could already work on:

  • Use a more generic name, diagnostics, for the new dependency group instead of shap, to later enable other explainers and/or content.
  • It seems you've added the imports correctly to _optional; however, you still import things like shap plainly in other files. Those imports now need to come from _optional.
  • We need to find a way to enable the shap explanation also for categorical parameters, or at least make it not crash (i.e. ignore them). This is currently crashing in the exp_rep.
  • Update the CHANGELOG under Unreleased/Added.
  • Add yourself to CONTRIBUTORS.
  • You created two new fixtures; however, you can simply parameterize your test with the existing fixtures parameter_names, target_names, etc. while your test requires the fixture campaign. It will automatically provide you with a campaign that adheres to your parameter and target requests; for an example, see here. You can also reuse the same test and parameterize over different explainers (i.e. shap and non-shap explainers; the list can be generated automatically so that all explainers are always tested, for an example see here).

PS: I might update this post with more boxes; you can simply tick them off to keep track for yourself and to signal progress to us.

@Alex6022 (Contributor) commented Oct 28, 2024


Hello @Scienfitz, thank you very much for this list.

I’ve addressed most of the points, except for the SHAP explanation error with categorical parameters, which I haven’t been able to replicate in a Python 3.12.4 environment with only baybe and shap installed. Could you let me know whether you get an error when running test_diagnostics.py within the current #391 PR?

Please also let me know if you have anything else to add to the list at the moment, I'd be happy to work on it.

@Scienfitz (Collaborator, Author) commented

@Alex6022 great, thanks for working through the list. Please forget my point about the exp rep; it wasn't the result of a test but a connection I made based on another post somewhere else (which might have turned out to be irrelevant).

I pushed some commits to your branch fixing minor things. I've also changed a few things in the test file; perhaps you can draw some inspiration from that regarding parameterization. I also left some more comments in a review.

I have two main questions to discuss with you:

  1. It seems only the kernel shap explainer works at the moment. Is there any reason for that? I see it's mostly because your signature enforces only the explainers that take data. What about the explainers that take masker instead of data? Can we make those work as well? Naively, I would expect that an agnostic explainer like permutation and others should in principle work if called correctly.

  2. I did not yet look deeply into the main functionality file. This is because we in the dev team still need to align on the structure. At the moment, functions like plot_bar and plot_waterfall seem rather pointless because they just wrap a single line that a user could also call with the return value from the utility. plot_scatter, on the other hand, seems to use more logic, so it is useful. There is nothing to change for you for now, and we will get back to you.

@Alex6022 (Contributor) commented

Dear @Scienfitz,

Thank you for the feedback; I have described the changes based on your comments in PR #391. Regarding the questions:

  1. Good point, we could definitely make more use of the different SHAP explainers within the related package. While some explainers accept masker instead of data, these arguments are effectively the same when a background data matrix rather than a 'masker' object is provided. Since BayBE exclusively works with tabular data, I believe we could disregard other types of maskers and hence use these arguments interchangeably. I have changed the implementation accordingly and included SHAP explainers with masker arguments in the tests (see the sketch after this list).
  2. These functions have now been removed. Do you believe the plotting functions from shap should still be part of the diagnostics.shap namespace? I look forward to hearing from you about the structure, based on the strategy for the diagnostics package.
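
A minimal sketch of the interchangeability mentioned in point 1, assuming a tabular background dataframe and the model wrapper from the earlier example:

import shap

bg = campaign.measurements[[p.name for p in campaign.parameters]]
model = lambda x: campaign.get_surrogate().posterior(x).mean

# Signature uses `data`: the classic kernel explainer.
kernel_explainer = shap.KernelExplainer(model, bg)

# Signature uses `masker`: passing the same dataframe works, since shap wraps
# tabular background data into a masker internally.
permutation_explainer = shap.explainers.Permutation(model, bg)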

Scienfitz changed the title from "Upcoming Diagnostics Package" to "Upcoming Insights Package" on Nov 22, 2024
@Scienfitz (Collaborator, Author) commented

Closed via #391.
