
question about the sparsity_target #10

Open
albertszg opened this issue Dec 6, 2021 · 2 comments
Comments

@albertszg

Hello, this is brilliant work. I want to use the binary Gumbel-softmax in my own work, but I have run into some problems.
I applied the soft mask to the first layer only (the generated mask is applied to the features after the first layer), and I noticed a strange phenomenon: the Gumbel noise seems to influence training too much. When I plot the sparsity loss on its own, I usually cannot reach the sparsity target I set. Is this behavior expected?
temp=5.0: [screenshot of the sparsity loss curve; image not preserved]
temp=1.0: [screenshot of the sparsity loss curve; image not preserved]

@thomasverelst
Owner

Some things that could help convergence:

  • one of the Gumbel-softmax papers shows that, in the binary case, the temperature should be smaller than or equal to 1 for convergence, e.g. 0.66
  • lowering the learning rate (or using a separate, lower learning rate for the decision layers) might help
  • disabling weight decay on the decision layers might help

I'm not sure what your exact setup is, but make sure the implementation is correct so that gradients can backpropagate through the mask.
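For reference, the binary Gumbel-softmax relaxation being discussed can be sketched as below. This is a minimal NumPy illustration, not code from this repository: the function names, the single-logit parameterization (the "drop" logit fixed at 0), and the squared-error sparsity penalty are all my assumptions for illustration.

```python
import numpy as np

def sample_binary_gumbel_mask(logits, tau=0.66, rng=None):
    """Relaxed two-class Gumbel-softmax sample per logit (hypothetical sketch).

    logits: array of 'keep' logits (the 'drop' logit is fixed at 0 here).
    tau: temperature; <= 1 is suggested above for the binary case.
    Returns soft mask values in (0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Two independent Gumbel(0, 1) noise samples, one per class.
    g_keep = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    g_drop = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    # For two classes, the softmax collapses to a sigmoid of the difference.
    return 1.0 / (1.0 + np.exp(-((logits + g_keep - g_drop) / tau)))

def sparsity_loss(mask, target=0.5):
    # Simple squared penalty pushing the mean mask activity toward the target
    # (an assumed form; the repo's actual sparsity loss may differ).
    return (mask.mean() - target) ** 2
```

A lower `tau` makes the soft mask values cluster near 0 and 1 (closer to a hard binary decision) at the cost of noisier gradients, which is why the temperature choice interacts with convergence.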

@albertszg
Author

Thanks for your advice, I'll try these. I also found that adding a BatchNorm layer in the squeeze function works better.
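The BatchNorm-in-the-squeeze-function idea might look something like the following NumPy sketch. This is purely illustrative: the function name, the pool-then-BN-then-linear layout, and all parameter names are assumptions, not the repository's actual squeeze implementation.

```python
import numpy as np

def squeeze_logits(x, w, b, bn_gamma=1.0, bn_beta=0.0, eps=1e-5):
    """Hypothetical 'squeeze' decision branch: global average pool over the
    spatial dims, batch normalization over the batch (train-mode statistics),
    then a linear layer producing one gating logit per sample.

    x: feature map of shape (N, C, H, W); w: (C,) weights; b: scalar bias.
    Returns an (N,) array of gating logits.
    """
    pooled = x.mean(axis=(2, 3))  # (N, C) global average pool
    # BatchNorm over the batch dimension, per channel; normalizing here keeps
    # the logit scale stable, which plausibly explains the improvement noted above.
    mean = pooled.mean(axis=0, keepdims=True)
    var = pooled.var(axis=0, keepdims=True)
    normed = bn_gamma * (pooled - mean) / np.sqrt(var + eps) + bn_beta
    return normed @ w + b  # (N,) gating logits
```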
