
Questions about mask generation #12

Open
magehrig opened this issue Feb 14, 2022 · 2 comments
Comments

magehrig commented Feb 14, 2022

Hi @thomasverelst

Congrats, nice work! I have two questions out of curiosity:

  1. Forward pass: Why did you choose to sample from the Bernoulli distribution instead of the Gumbel-Softmax? To my knowledge, sampling from the Bernoulli distribution introduces a bias in the gradient estimation, which could make optimization trickier. I understand that you would not be able to use sparse convolutions during training, but I wonder if there is another reason.

  2. Have you tried annealing the temperature parameter to less than 1?
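For reference, here is a minimal NumPy sketch of the binary Gumbel-Softmax ("Gumbel-sigmoid") relaxation being discussed, contrasting the soft sample with the hard (thresholded) mask used by the straight-through variant. This is an illustrative sketch, not code from the DynConv repository; all names are hypothetical, and the autograd side of the straight-through trick is only described in comments since plain NumPy has no gradients.

```python
import numpy as np

def gumbel_sigmoid(logits, temperature=1.0, hard=False, rng=None):
    """Binary Gumbel-Softmax relaxation (illustrative sketch).

    Soft mode returns values in (0, 1). Hard mode thresholds the soft
    sample to {0, 1}; in an autograd framework, the straight-through
    trick would use the hard mask in the forward pass while letting
    gradients flow through the soft sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The difference of two Gumbel(0, 1) samples is Logistic(0, 1),
    # so we can add logistic noise directly to the logits.
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=np.shape(logits))
    logistic_noise = np.log(u) - np.log1p(-u)
    soft = 1.0 / (1.0 + np.exp(-(np.asarray(logits) + logistic_noise) / temperature))
    if hard:
        return (soft > 0.5).astype(soft.dtype)  # discrete {0, 1} mask
    return soft  # soft relaxation in (0, 1)

rng = np.random.default_rng(0)
logits = np.array([-2.0, 0.0, 2.0])
soft_mask = gumbel_sigmoid(logits, temperature=1.0, rng=rng)
hard_mask = gumbel_sigmoid(logits, temperature=1.0, hard=True, rng=rng)
```

Lowering `temperature` pushes the soft samples towards 0/1, which is what the temperature-annealing question refers to.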

thomasverelst (Owner) commented Feb 16, 2022

Hi!
I think you mean that it uses the straight-through version of the Gumbel-Softmax trick (the hard version). I did not ablate this thoroughly, but my initial results indicated slightly better performance for the hard straight-through version. The straight-through estimator indeed has bias, but the network's weights directly optimize for the sparse convolutions. I agree, though, that the soft Gumbel-Softmax with some temperature annealing towards 0 might improve training stability.
The best solution might be to weight spatial positions by their probabilities (i.e. soft attention), e.g. by using the soft Gumbel-Softmax and multiplying executed positions (those where prob_exec > 0.5) by (prob_exec - 0.5) * 2, both at training and inference time.
As I have more compute available nowadays, I might explore this over the summer while writing my PhD thesis.
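The weighting scheme described above can be sketched in a few lines of NumPy. This is a hypothetical illustration of the proposal, not code from the repository; the function name is made up.

```python
import numpy as np

def soft_attention_mask(exec_prob):
    """Weight executed positions by their rescaled execution probability.

    Positions with exec_prob > 0.5 are kept and scaled by
    (exec_prob - 0.5) * 2, which maps 0.5 -> 0 and 1.0 -> 1;
    all other positions are zeroed out, as in a hard mask.
    """
    exec_prob = np.asarray(exec_prob, dtype=float)
    keep = exec_prob > 0.5
    return np.where(keep, (exec_prob - 0.5) * 2.0, 0.0)

probs = np.array([0.2, 0.5, 0.6, 1.0])
mask = soft_attention_mask(probs)  # ~ [0.0, 0.0, 0.2, 1.0]
```

Because the weighting is identical at training and inference time, there is no train/test mismatch, at the cost of the executed positions no longer being plain binary.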

magehrig (Author) commented
Congrats on your upcoming PhD!
Yes, I would be interested in knowing whether you manage to get the non-straight-through version to work well.

Good luck
