Hi @thomasverelst, congrats, nice work! I have two questions out of curiosity:
Forward pass: Why did you choose to sample from the Bernoulli distribution instead of the Gumbel-Softmax? To my knowledge, sampling from the Bernoulli distribution introduces a bias into the gradient estimate, which could make optimization trickier. I understand that you would not be able to use sparse convolutions during training, but I wonder if there is another reason.
Have you tried annealing the temperature parameter to less than 1?
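(For context, here is a minimal sketch of the two sampling options the question contrasts, for a binary execution mask; the names and shapes are illustrative, not from this repo:)

```python
import torch

logits = torch.randn(2, 1, 8, 8, requires_grad=True)  # hypothetical mask logits

# Option A: hard Bernoulli sample with a straight-through gradient.
# The forward pass is exactly 0/1, but the backward pass pretends the
# sample was the probability itself, which biases the gradient.
probs = torch.sigmoid(logits)
hard = torch.bernoulli(probs.detach())
mask_st = hard + probs - probs.detach()  # straight-through estimator

# Option B: soft Gumbel-Sigmoid relaxation with temperature tau.
# Fully differentiable, but the mask is never exactly 0/1, so sparse
# convolutions cannot be exploited during training.
tau = 1.0
u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
noise = torch.log(u) - torch.log1p(-u)  # Logistic(0, 1) noise
mask_soft = torch.sigmoid((logits + noise) / tau)
```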
Hi!
I think you mean that it uses the straight-through (hard) version of the Gumbel-Softmax trick. I did not ablate this thoroughly, but my initial results indicated slightly better performance for the hard straight-through version. The straight-through estimator is indeed biased, but it means the network's weights are optimized directly for the sparse convolutions used at inference. I agree, though, that the soft Gumbel-Softmax with the temperature annealed towards 0 might improve training stability.
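A minimal sketch of what a straight-through (hard) binary Gumbel-Softmax and a temperature-annealing schedule could look like; this is my own illustration under those assumptions, not the repo's actual code:

```python
import torch

def gumbel_sigmoid(logits, tau=1.0, hard=True):
    # Binary Gumbel-Softmax: Logistic(0, 1) noise is the difference
    # of two independent Gumbel(0, 1) samples, so sample it directly.
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)
    soft = torch.sigmoid((logits + noise) / tau)
    if not hard:
        return soft  # soft relaxation: differentiable, never exactly 0/1
    hard_mask = (soft > 0.5).float()
    # Straight-through: hard 0/1 mask forward, soft gradient backward (biased)
    return hard_mask + soft - soft.detach()

def tau_schedule(step, total_steps, tau_start=1.0, tau_end=0.1):
    # Illustrative exponential decay of the temperature towards a small floor
    frac = min(step / total_steps, 1.0)
    return tau_start * (tau_end / tau_start) ** frac
```

With `hard=True`, the forward pass sees an exact 0/1 mask (so sparse execution is possible), while gradients still flow through the soft sigmoid.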
The best solution, though, might be to weight spatial positions by their probabilities (i.e. soft attention), e.g. by using the soft Gumbel-Softmax and multiplying executed positions (where `prob_exec > 0.5`) by `(prob_exec - 0.5) * 2`, both at training and inference time.
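A rough sketch of that weighting idea, assuming `prob_exec` holds per-position execution probabilities (all names here are illustrative):

```python
import torch

def soft_weighted_mask(prob_exec):
    # Positions with prob_exec > 0.5 are executed; the rest are skipped.
    executed = (prob_exec > 0.5).float()
    # Rescale probabilities in (0.5, 1] to soft weights in (0, 1].
    weight = (prob_exec - 0.5) * 2.0
    return executed * weight  # exactly 0 where skipped

prob_exec = torch.sigmoid(torch.randn(2, 1, 8, 8))   # hypothetical mask probabilities
features = torch.randn(2, 16, 8, 8)                  # hypothetical layer output
out = features * soft_weighted_mask(prob_exec)       # dense stand-in for sparse execution
```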
As I have more compute available nowadays, I might explore this over the summer while writing my PhD thesis.