I see that the w2v-conformer pre-trained model was trained on a multilingual dataset, but so far I have not found a multilingual training recipe or script.
One problem I have run into so far is how to choose the text modeling unit: should it be BPE, characters, or something else?
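
To make the question concrete, here is a minimal sketch of two candidate unit schemes applied to a mixed-script transcript. It is only an illustration, not the scheme the pre-trained model uses: the helper names (`is_cjk`, `char_units`, `mixed_units`) are hypothetical, the CJK check covers only the basic Unified Ideographs block, and in the mixed scheme the non-CJK words would still need to be split by a separately trained BPE model.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Rough check (assumption): basic CJK Unified Ideographs block only."""
    return "\u4e00" <= ch <= "\u9fff"


def char_units(text: str) -> List[str]:
    """Character-level units: every non-space character is one token."""
    return [ch for ch in text if not ch.isspace()]


def mixed_units(text: str) -> List[str]:
    """CJK characters become single units; other scripts are kept as
    whitespace-delimited words, which a BPE model could split further."""
    units: List[str] = []
    word: List[str] = []
    for ch in text:
        if is_cjk(ch) or ch.isspace():
            # A CJK character or a space ends any word in progress.
            if word:
                units.append("".join(word))
                word = []
            if is_cjk(ch):
                units.append(ch)
        else:
            word.append(ch)
    if word:
        units.append("".join(word))
    return units


sample = "我想听 hello world"
print(char_units(sample))   # one token per character, all scripts
print(mixed_units(sample))  # CJK characters plus whole Latin words
```

With pure character units, the Latin-script words turn into long token sequences, while the mixed scheme keeps them whole for later subword splitting. Which of these (or a single shared BPE vocabulary over all languages) is appropriate for fine-tuning this model is exactly what I am unsure about.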