Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reusable and NonReusable capability #592

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chengchingwen
Copy link

addressed in #591

cc: @oxinabox, @mzgubic

@codecov-commenter
Copy link

codecov-commenter commented Oct 23, 2022

Codecov Report

Base: 93.16% // Head: 93.16% // No change to project coverage 👍

Coverage data is based on head (7f47c2e) compared to base (9c8fcd2).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #592   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files          15       15           
  Lines         907      907           
=======================================
  Hits          845      845           
  Misses         62       62           
Impacted Files Coverage Δ
src/ChainRulesCore.jl 100.00% <ø> (ø)
src/config.jl 100.00% <ø> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@oxinabox
Copy link
Member

Before merging I want to see a PR on Zygote where this is hooked into the code for gradient vs jacobian.
So we know this isn't a feature that won't be used and will just go stale

@chengchingwen
Copy link
Author

Still working on that. I tried to hook that into Zygote with the idea of injecting the information into Context. However, it break some of the test of Zygote and I haven't figure out what was the cause. I would mark this PR as ready if I get things to work in Zygote.

p.s. if anyone have time and know how to make it work, feel free to take over.

@ToucheSir
Copy link
Contributor

ToucheSir commented Oct 28, 2022

Happy to help workshop the Zygote PR. I can't think of too many places where we'd be able to use it without adding new rules, however, given that the rules which would use it are likely to use mutation and there aren't many second-order rules for those rules.

@chengchingwen
Copy link
Author

I can't think of too many places where we'd be able to use it without adding new rules, however, given that the rules which would use it are likely to use mutation and there aren't many second-order rules for those rules.

We should be able to use it to add new optimized rules. Ideally the change shouldn't affect all the old rule and we could decide whether to add an optimized rule with mutation. But if we are adding those optimized rules, we also need to get the info about whether the pullback is execute inside an AD context to avoid breaking higher-order AD. Do we have a interface for second-order rules? It seems Zygote is using jacobian with gradient to compute second-order derivative and we can't specify a rule for second-order directly?

@ToucheSir
Copy link
Contributor

Do we have a interface for second-order rules?

I don't think so. This is something I'd really like to see in CRC, but it's not clear to me what a higher-order rules interface would look like. Diffractor appears to have something for this (usage example here?), but I don't quite understand how it works.

It seems Zygote is using jacobian with gradient to compute second-order derivative and we can't specify a rule for second-order directly?

AIUI Zygote only needs jacobian if you're calculating the full hessian. Otherwise nesting gradient should be enough. To your point though, there isn't any way to define a second order rule.

On possible ideas, I've been reading through https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml and their model looks the most similar to FluxML/NNlib.jl#434. As noted on that PR though, such a model might not work for all ChainRules-compatible ADs. Another way that works today would be to manually split out an augmented_primal and callable PullbackMemory like in https://juliadiff.org/ChainRulesCore.jl/stable/design/changing_the_primal.html, then define rules for each. This would require a decent amount more work and code, but it should work today with any AD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants