Conversation

@ValerianRey (Contributor) commented Dec 20, 2025

This also needs to improve the monitoring examples so that they're easier to understand for someone who is just getting started with torchjd, and so that they show (see the sketch after this list):

  • how to compute angles between gradients (or cosine similarities)
  • how to compute norm imbalance
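
For instance, a minimal sketch of both quantities in plain PyTorch (the model, data, and losses below are made up for illustration, and no torchjd-specific API is assumed):

```python
import torch

# Hypothetical two-loss setup, purely for illustration.
model = torch.nn.Linear(10, 2)
x = torch.randn(8, 10)
out = model(x)
loss1 = out[:, 0].pow(2).mean()
loss2 = (out[:, 1] - 1.0).pow(2).mean()

params = list(model.parameters())
g1 = torch.autograd.grad(loss1, params, retain_graph=True)
g2 = torch.autograd.grad(loss2, params)

# Flatten each per-loss gradient into a single vector.
v1 = torch.cat([g.reshape(-1) for g in g1])
v2 = torch.cat([g.reshape(-1) for g in g2])

# Cosine similarity and angle between the two gradients.
cos_sim = torch.dot(v1, v2) / (v1.norm() * v2.norm())
angle_deg = torch.rad2deg(torch.acos(cos_sim.clamp(-1.0, 1.0)))

# Norm imbalance: ratio of the larger to the smaller gradient norm.
norm_imbalance = torch.maximum(v1.norm(), v2.norm()) / torch.minimum(v1.norm(), v2.norm())

print(f"cosine similarity: {cos_sim.item():.3f} (angle: {angle_deg.item():.1f} deg)")
print(f"norm imbalance: {norm_imbalance.item():.3f}")
```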

TODO:

  • Add a link on the base page of TorchJD with some text such as "[blob] contains an introduction to Jacobian descent as well as information about when it can lead to improved results."

codecov bot commented Dec 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


================

This guide briefly explains what Jacobian descent is and on what kind of problems it can be used.
For a more theoretical explanation, please read our article
@PierreQuinton (Contributor) commented Dec 20, 2025

Maybe

Suggested change:
- For a more theoretical explanation, please read our article
+ For a detailed explanation, take a look at


**Introduction**

The goal of Jacobian descent is to train models with multiple conflicting losses. When you have
A contributor commented:

Maybe?

Suggested change:
- The goal of Jacobian descent is to train models with multiple conflicting losses. When you have
+ The goal of Jacobian descent is to train models when considering multiple conflicting losses. When you have

losses and then computing the gradient, so doing that is equivalent to doing gradient descent.
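
To make that equivalence concrete, a quick linearity check (plain PyTorch; the parameter and losses are made up for illustration):

```python
import torch

# Linearity of differentiation: grad(loss1 + loss2) == grad(loss1) + grad(loss2),
# so aggregating losses by summation reduces Jacobian descent to gradient descent.
w = torch.randn(3, requires_grad=True)
loss1 = (w ** 2).sum()
loss2 = (w - 1.0).pow(2).sum()

g_sum = torch.autograd.grad(loss1 + loss2, w, retain_graph=True)[0]
g1 = torch.autograd.grad(loss1, w, retain_graph=True)[0]
g2 = torch.autograd.grad(loss2, w)[0]

assert torch.allclose(g_sum, g1 + g2)
```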

If you have two gradients with a negative inner product and quite different norms, their sum will
have a negative inner product with the smallest gradient. So, given a sufficiently small learning
@PierreQuinton (Contributor) commented Dec 20, 2025

pic or fake (Figure 1)
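
A tiny numeric illustration of the quoted claim, with hypothetical 2-D gradients:

```python
import torch

# Two conflicting gradients with very different norms.
g1 = torch.tensor([10.0, 0.0])   # large gradient
g2 = torch.tensor([-1.0, 0.5])   # small gradient; <g1, g2> < 0

s = g1 + g2                      # summed update direction: [9.0, 0.5]
print(torch.dot(g1, g2).item())  # -10.0 -> the two gradients conflict
print(torch.dot(s, g2).item())   # -8.75 -> the sum still conflicts with g2,
                                 # so a small step along -s increases g2's loss
```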

are traditionally considered as single-objective can actually be seen as multi-objective. Here are a
few examples:

- We can consider separately the loss of each element in the mini-batch, instead of averaging them.
@PierreQuinton (Contributor) commented Dec 20, 2025

I understand what you did there; I guess you want to avoid the awkwardness of SJD.

Suggested change:
- - We can consider separately the loss of each element in the mini-batch, instead of averaging them.
+ - We can consider separately the loss of each element in the dataset, instead of averaging them.
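
For context, a hedged sketch of what this bullet means in code (plain PyTorch; the model and batch are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Made-up model and batch, for illustration only.
model = torch.nn.Linear(10, 3)
x = torch.randn(4, 10)
y = torch.randint(0, 3, (4,))

logits = model(x)

# One loss per element instead of a single averaged scalar.
losses = F.cross_entropy(logits, y, reduction="none")  # shape: (4,)

# Jacobian descent would aggregate the per-element gradients (rows of the
# Jacobian) instead of averaging the losses before differentiating.
print(losses)
```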
