docs: Add intuitive explanation of JD #494
base: main
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
> This guide briefly explains what Jacobian descent is and on what kind of problems it can be used.
> For a more theoretical explanation, please read our article
Review comment: Maybe

Suggestion:
> For a detailed explanation, take a look at
> **Introduction**
>
> The goal of Jacobian descent is to train models with multiple conflicting losses. When you have
Review comment: Maybe?

Suggestion:
> The goal of Jacobian descent is to train models when considering multiple conflicting losses. When you have
> losses and then computing the gradient, so doing that is equivalent to doing gradient descent.
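The equivalence stated here follows from the linearity of differentiation: differentiating the averaged loss gives the same result as averaging the per-loss gradients (the rows of the Jacobian). A minimal sketch in plain Python, using made-up quadratic losses L_i(w) = (w - t_i)^2 rather than anything from the PR:

```python
# Illustrative quadratic losses L_i(w) = (w - t_i)^2 with gradients dL_i/dw = 2 * (w - t_i).
targets = [1.0, 3.0, 8.0]
w = 2.0

# Gradient of the averaged loss: d/dw mean_i (w - t_i)^2 = 2 * (w - mean(t)).
grad_of_mean = 2 * (w - sum(targets) / len(targets))

# Mean of the per-loss gradients, i.e. the average of the Jacobian's rows.
per_loss_grads = [2 * (w - t) for t in targets]
mean_of_grads = sum(per_loss_grads) / len(per_loss_grads)

print(grad_of_mean, mean_of_grads)  # both are -4.0
```

So applying Jacobian descent with the "average the rows" aggregator reduces exactly to gradient descent on the mean loss.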
> If you have two gradients with a negative inner product and quite different norms, their sum will
> have a negative inner product with the smallest gradient. So, given a sufficiently small learning
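This claim is easy to verify numerically. A small self-contained check with made-up gradients (the values are illustrative, chosen so that the norms differ a lot and the inner product is negative):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two gradients with a negative inner product and very different norms.
g1 = [10.0, 0.0]   # large gradient
g2 = [-1.0, 0.5]   # small gradient
assert dot(g1, g2) < 0  # the two objectives conflict

# Summing them, as plain gradient descent on the total loss would do:
s = [a + b for a, b in zip(g1, g2)]

# The sum is dominated by g1 and has a negative inner product with g2,
# so to first order a small step along -s *increases* the second loss.
print(dot(s, g2))  # -8.75
```

This is the situation where averaging or summing gradients sacrifices the objective with the smaller gradient, which is what a conflict-aware aggregator is meant to avoid.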
Review comment: pic or fake (Figure 1)
> are traditionally considered as single-objective can actually be seen as multi-objective. Here are a
> few examples:
>
> - We can consider separately the loss of each element in the mini-batch, instead of averaging them.
Review comment: I understand what you did there, I guess you want to avoid the awkwardness of SJD.

Suggestion:
> - We can consider separately the loss of each element in the dataset, instead of averaging them.
This PR also needs to improve the monitoring examples so that they are easier to understand for someone who is just getting started with torchjd, and so that they show:
TODO: