Could you please explain the choice of STFT size 512? #27

Open
xdcesc opened this issue May 20, 2019 · 3 comments

Comments


xdcesc commented May 20, 2019

@LukasDrude Could you please explain why you chose an STFT size of 512 (with a shift of 128)? Is it related to the coherence bandwidth of the RIR?

@LukasDrude
Member

We tend to use WPE together with other components, e.g. beamforming. When doing so, we use parameters typical for that application.

In this example [1, 2] we use 512 as the window size. But we tend to check various sizes/shifts when performance is important.

In [2] we use it together with a beamformer. Since a size of 1024 and a shift of 256 worked better for beamforming on this dataset, we used these parameters. It's worth noting that all other parameters (minimum delay, ...) should ideally be checked as well, e.g. on the development set.

[1] https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Paper.pdf
[2] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8683294
[3] https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Paper.pdf

Author

xdcesc commented May 21, 2019

@LukasDrude Thanks for your reply. I ran some simulations with different echo lengths and DFT sizes. It is true that we need to check various DFT sizes to get optimal performance; for example, for an 800 ms echo, the best DFT window size is 1024. What confuses me is that a 2048-point DFT makes it worse. Considering the coherence bandwidth of the room impulse response, a larger DFT window size should not lead to performance degradation.

@LukasDrude
Member

@xdcesc I definitely recommend not tuning the DFT size to each individual utterance. We tend to set the parameters on the train or validation set and then keep those values for the test set.
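
A minimal sketch of that workflow, assuming you can supply a function that scores one configuration on the whole validation set. The name `dev_score`, its keyword arguments, the candidate values, and the fixed overlap factor are all hypothetical and only for illustration; they are not part of any library.

```python
from itertools import product


def select_config(dev_score, sizes=(512, 1024, 2048), delays=(1, 2, 3), overlap=4):
    """Pick one configuration for the whole validation set.

    dev_score is a hypothetical, user-supplied callable. It is assumed to
    run dereverberation on every validation utterance with the given
    settings and to return one averaged metric, where higher is better.
    """
    best_cfg, best_score = None, float("-inf")
    for size, delay in product(sizes, delays):
        # Assumption: the shift is tied to the size by a fixed overlap factor.
        cfg = {"size": size, "shift": size // overlap, "delay": delay}
        score = dev_score(**cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

The returned configuration is then frozen and reused unchanged on the test set, rather than being re-selected per utterance.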

In general, several effects come into play with the DFT size. If your DFT size is very large, you have very few time frames from which WPE can estimate the covariance matrix. You get a high frequency resolution, but that does not really help when the algorithm produces inaccurate estimates.
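
To make the trade-off concrete, here is a rough sketch that counts frames and frequency bins for a fixed-length signal. The 16 kHz sample rate, the 4 s duration, and the shift of size/4 are assumptions for illustration, and the frame count uses the plain no-padding formula, so an STFT implementation that pads the signal will give slightly different numbers.

```python
def stft_frame_count(n_samples, size, shift):
    """Number of full frames when the signal is not padded."""
    if n_samples < size:
        return 0
    return 1 + (n_samples - size) // shift


def stft_bin_count(size):
    """Number of non-redundant frequency bins for a real-valued signal."""
    return size // 2 + 1


sample_rate = 16000          # assumption, not stated in this thread
n_samples = 4 * sample_rate  # hypothetical 4 s utterance

for size in (512, 1024, 2048):
    shift = size // 4        # same overlap as the 512/128 and 1024/256 settings
    frames = stft_frame_count(n_samples, size, shift)
    bins_ = stft_bin_count(size)
    print(f"size={size:5d} shift={shift:4d} frames={frames:4d} bins={bins_:5d}")
```

Each doubling of the size roughly halves the number of frames available per frequency bin for the covariance estimate, while the number of bins to estimate roughly doubles.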

Also keep in mind that when you change the DFT size you basically have to retune all other parameters as well (e.g. the minimum delay, ...).
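
One reason for this is that frame-based parameters correspond to different time spans once the shift changes. The sketch below converts a delay between frames and milliseconds; the 16 kHz sample rate and the delay of 3 frames are illustrative assumptions, not values taken from this thread. The conversion only gives a starting point, and the result should still be checked on the development set.

```python
def frames_to_ms(frames, shift, sample_rate):
    """Time span covered by a number of frame shifts, in milliseconds."""
    return 1000.0 * frames * shift / sample_rate


def ms_to_frames(duration_ms, shift, sample_rate):
    """Nearest frame count for a time span, but never less than one frame."""
    return max(1, round(duration_ms * sample_rate / (1000.0 * shift)))


sample_rate = 16000  # assumption
delay_frames = 3     # illustrative delay at shift 128

# The same delay in frames covers twice the time when the shift doubles.
span_ms = frames_to_ms(delay_frames, shift=128, sample_rate=sample_rate)
for shift in (128, 256, 512):
    same_frames_ms = frames_to_ms(delay_frames, shift, sample_rate)
    rescaled = ms_to_frames(span_ms, shift, sample_rate)
    print(f"shift={shift}: {delay_frames} frames = {same_frames_ms} ms, "
          f"about {rescaled} frame(s) for {span_ms} ms")
```

The same reasoning applies to any other parameter that is counted in frames, such as the number of filter taps.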

@xdcesc xdcesc closed this as completed May 23, 2019
@xdcesc xdcesc reopened this May 23, 2019