Dear Author,
First of all, I would like to say that this work is a very good solution for low-data-resource KWS. Thank you for sharing the end-to-end training recipe for this novel idea.
I am interested to know whether this is a reliable solution for real-time speech command recognition on low-power embedded devices.
Another question I would like to ask is:
Could we use the embedding extractor model that was trained for the 10-way 15-shot experiment when deploying a 6-way keyword spotting pipeline?
Thanks,
Saikiran