Shapley values #335
…yBE to generalize with different surrogates.
…022/baybe into feature/feature_importance
…ng simulations instead of plotting them.
… feature/feature_importance
Hi @Alex6022, it's absolutely great to see that you took the time to contribute to BayBE 😃🥇 In fact, the feature importance part has been requested for quite a while (see #78), and quite a few changes are currently on the way (i.e. a major refactoring of the surrogate modules) that prepare our code for proper integration of this feature (and many others). The SHAP integration was one of the first things I had planned for once the refactoring is completed. (Ping: @brandon-holt, who specifically asked about SHAP.)

That said, I'm glad to see that you already took the first step, and I'd be more than happy if we could finalize the PR together. However, I think a proper integration requires the refactoring to be finished first. So if you don't mind: let me finish the refactoring first and then ping you once it is done, so that we can proceed with your PR?

In the meantime, I can already highlight some points that we'd need to address with your PR before it can be merged. Perhaps, if you have the time, you can already start to work on it / think about the necessary design changes:
I can gladly give you guidance on individual points if you need more information 👍🏼 And I'll keep you updated on the ongoing refactoring work (the currently open PR is #325). Once again, thanks for your contribution!!
@AdrianSosic @Alex6022 Hey, just checking on the status of this! Also, if you could include a quick example of how to perform the SHAP analysis on a baybe surrogate, that would be much appreciated.
Hi @AdrianSosic, thank you for the great feedback! To respect the modularity of BayBE, implementing the SHAP calculation as a hook and then providing a plot utility probably makes a lot of sense. This should solve issues 4 and 5; would you agree? As you suggested, addressing the other points in detail may make more sense once the refactoring of the surrogate model has been completed. Please feel free to take your time for this; I am also on vacation until the end of August.

@brandon-holt, the current way to use my (arguably slightly hacked) implementation would be to call …

I really like the overall idea and vision for this package, so I am excited to contribute with the implementation. Looking forward to continuing this 😃!
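To make the hook idea mentioned above concrete, here is a minimal sketch of the general pattern in plain Python: a model exposes a registration point, and an analysis routine is attached from the outside instead of being built into the model class. `ToySurrogate`, `register_post_fit_hook`, and `capture_training_data` are hypothetical names chosen for illustration; they are assumptions and not BayBE's actual API.

```python
from typing import Callable, Dict, List

# Signature of a hook: it receives the fitted model and the training data.
Hook = Callable[["ToySurrogate", List[List[float]], List[float]], None]


class ToySurrogate:
    """Hypothetical stand-in for a surrogate model (not BayBE's class)."""

    def __init__(self) -> None:
        self.mean = 0.0
        self._post_fit_hooks: List[Hook] = []

    def register_post_fit_hook(self, hook: Hook) -> None:
        """Attach a callable that runs after every call to `fit`."""
        self._post_fit_hooks.append(hook)

    def fit(self, X: List[List[float]], y: List[float]) -> None:
        # Trivial "training": remember the mean of the targets.
        self.mean = sum(y) / len(y)
        # Run all registered hooks with the fitted model and its data.
        for hook in self._post_fit_hooks:
            hook(self, X, y)


captured: Dict[str, object] = {}


def capture_training_data(model: ToySurrogate, X, y) -> None:
    """Example hook: store what a later feature-importance analysis would need."""
    captured["model"] = model
    captured["X"] = X
    captured["y"] = y


surrogate = ToySurrogate()
surrogate.register_post_fit_hook(capture_training_data)
surrogate.fit([[0.0], [1.0]], [1.0, 3.0])
```

With this separation, the model code stays unaware of any explanation or plotting logic; a separate utility can later consume `captured` to compute and plot feature importances.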
@Alex6022 amazing, thank you!
Edit: It appears that you cannot reload a campaign object that was saved via pickle using an earlier version of baybe which doesn't include your injected code. The following error appears: … Would it be possible to update the code to handle this scenario? Perhaps some of the modularity changes suggested by @AdrianSosic would address this? Let me know your thoughts. These campaign objects took over a week to train, so recreating them every time there is a new version of baybe is not ideal.
Hi @brandon-holt, let me shed some light on what is happening behind the scenes:
Let me know if I can help you any further.
Hello @Alex6022, there was some movement on this issue and I wanted to reach out.
Do you think that's reasonable? Do you want to have a go at this? Even if not, your response here would be appreciated. I'd judge the effort to do this as fairly small, perhaps even less code than this PR here.
Dear @Scienfitz, I am afraid my previous response did not get through; hopefully this did not cause any inconvenience. I recently made a suggestion for this feature in PR #391, as mentioned in issue #357.
Introduced SHAP (SHapley Additive exPlanations) analysis of the surrogate model to analyze the feature importance of finished campaigns. This is especially interesting in combination with the molecular encodings that are already built into BayBE.
In a previous project from the AC-BO-Hackathon, different molecular encodings were tested to screen molecules for high corrosion inhibition. Analyzing the highly successful MORDRED campaign with the new SHAP functionality yields the following summary plot:

Besides the measurement parameters "Time_h" and "Salt_Concentrat_M", the Mordred-specific features "SMILES_MORDRED_NdS" and "SMILES_MORDRED_nS" suggest the importance of sulphur groups for corrosion inhibition. Interestingly, this is in agreement with previous literature in the field. I hope that this new feature will be of interest for many other applications in the future.
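For readers unfamiliar with what a Shapley value attributes to each feature, the following is a minimal, self-contained sketch that computes exact Shapley values for a toy model by enumerating all feature subsets. It is an illustration under stated assumptions, not the code in this PR: `toy_model`, the three generic features, and the single all-zero `background` point are hypothetical, and practical SHAP tooling typically approximates these values against a background dataset rather than enumerating subsets.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[List[float]], float],
    x: Sequence[float],
    background: Sequence[float],
) -> List[float]:
    """Exact Shapley values of `predict` at `x` relative to one background point."""
    n = len(x)

    def value(subset: frozenset) -> float:
        # Features in `subset` take their value from x, all others from background.
        z = [x[i] if i in subset else background[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical model: one additive term plus one pairwise interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, background)
print(phi)
```

In this toy case the additive term is credited entirely to the first feature, the interaction term is split equally between the second and third, and the values sum to the difference between the prediction at `x` and at the background point. The cost grows exponentially with the number of features, which is why approximations are used for encodings with many descriptors.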