
question about the sparsity_target #10

Open
albertszg opened this issue Dec 6, 2021 · 2 comments
Comments

@albertszg

Hello, this is brilliant work. I want to use the binary Gumbel-softmax in my own work, but I have run into some problems.
I applied the soft mask to the first layer only (the generated mask is applied to the features after the first layer), and I noticed a strange phenomenon: the Gumbel noise seems to influence training too much. When I plot the sparsity loss on its own, I usually cannot reach the sparsity target I set. Is this behavior expected?
temp=5.0: [screenshot of the sparsity loss curve; image not preserved]
temp=1.0: [screenshot of the sparsity loss curve; image not preserved]

@thomasverelst
Owner

Some things that could help convergence:

  • one of the Gumbel-softmax papers shows that, in the binary case, the temperature should be smaller than or equal to 1 for convergence, e.g. 0.66
  • lowering the learning rate (or using a separate, lower learning rate for the decision layers) might help
  • disabling weight decay on the decision layers might help

I'm not sure what your exact setup is, but make sure the implementation is correct so that gradients can backpropagate through the mask.
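For reference, the binary Gumbel-softmax relaxation being discussed can be sketched as below. This is a minimal NumPy illustration, not code from this repository: the function names, the single-logit parameterization (the "drop" logit fixed at 0), and the squared-error sparsity penalty are all my assumptions for illustration.

```python
import numpy as np

def sample_binary_gumbel_mask(logits, tau=0.66, rng=None):
    """Relaxed two-class Gumbel-softmax sample per logit (hypothetical sketch).

    logits: array of 'keep' logits (the 'drop' logit is fixed at 0 here).
    tau: temperature; <= 1 is suggested above for the binary case.
    Returns soft mask values in (0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Two independent Gumbel(0, 1) noise samples, one per class.
    g_keep = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    g_drop = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    # For two classes, the softmax collapses to a sigmoid of the difference.
    return 1.0 / (1.0 + np.exp(-((logits + g_keep - g_drop) / tau)))

def sparsity_loss(mask, target=0.5):
    # Simple squared penalty pushing the mean mask activity toward the target
    # (an assumed form; the repo's actual sparsity loss may differ).
    return (mask.mean() - target) ** 2
```

A lower `tau` makes the soft mask values cluster near 0 and 1 (closer to a hard binary decision) at the cost of noisier gradients, which is why the temperature choice interacts with convergence.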

@albertszg
Author

Thanks for your advice, I'll try these. I also found that adding a BatchNorm layer in the squeeze function works better.
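The BatchNorm-in-the-squeeze-function idea might look something like the following NumPy sketch. This is purely illustrative: the function name, the pool-then-BN-then-linear layout, and all parameter names are assumptions, not the repository's actual squeeze implementation.

```python
import numpy as np

def squeeze_logits(x, w, b, bn_gamma=1.0, bn_beta=0.0, eps=1e-5):
    """Hypothetical 'squeeze' decision branch: global average pool over the
    spatial dims, batch normalization over the batch (train-mode statistics),
    then a linear layer producing one gating logit per sample.

    x: feature map of shape (N, C, H, W); w: (C,) weights; b: scalar bias.
    Returns an (N,) array of gating logits.
    """
    pooled = x.mean(axis=(2, 3))  # (N, C) global average pool
    # BatchNorm over the batch dimension, per channel; normalizing here keeps
    # the logit scale stable, which plausibly explains the improvement noted above.
    mean = pooled.mean(axis=0, keepdims=True)
    var = pooled.var(axis=0, keepdims=True)
    normed = bn_gamma * (pooled - mean) / np.sqrt(var + eps) + bn_beta
    return normed @ w + b  # (N,) gating logits
```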
