This section covers frequently asked questions and will grow over time. We appreciate your feedback, which helps us continuously improve the toolkit.
To support an additional language, the following adjustments need to be made, depending on the task types that should be supported.
- Language model extensions
    - All
        - Check if the spaCy multi-language models support your language
        - If it is not supported, or you want a language-specific model, add the applicable model to the `spacy_model_lookup` object in `/code/helper.py`
    - Classification
        - Check if the DL architecture you want to use supports your language in its pre-trained multi-language models. Check here for details.
        - If it is not supported, or you want a language-specific model, add the applicable model to the `farm_model_lookup` object in `/code/helper.py`
    - NER
        - Check if the Flair multi-language NER model supports your language
        - In addition, you can train your own NER model using one of the models used for the sequence classification task
    - QA
        - Currently, the QA model is language agnostic. The pre-processing steps (incl. stopwords and tokenization) should be optimized on a per-language basis
- Tokenizer adjustments
    - The spaCy tokenizer is used for splitting text into tokens. See the spaCy model support above for appropriate approaches to language-specific tokenization. This includes lemmatization and stemming (if needed).
    - A stopword list can be added for the respective language by placing a UTF-8 encoded list of stopwords in a `.txt` file in the `/assets` folder. The naming is as follows: `stopwords-{language}.txt`. Examples for German and French are given.
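
The steps above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual code: the dictionary layout of `spacy_model_lookup` (language code mapped to model name, with `xx` as the multi-language fallback), the helper names `resolve_model`, `stopwords_path` and `load_stopwords`, the relative `assets` directory, and the one-stopword-per-line file layout are all assumptions made for this example. Check `/code/helper.py` for the real structure before adapting it.

```python
from pathlib import Path

# Hypothetical stand-in for the lookup object in /code/helper.py.
# Assumed layout: language code -> spaCy model name.
spacy_model_lookup = {
    "xx": "xx_ent_wiki_sm",   # spaCy multi-language model, used as fallback here
    "de": "de_core_news_sm",
}

# Register a language-specific model, using Italian as an example.
spacy_model_lookup["it"] = "it_core_news_sm"


def resolve_model(language, lookup):
    """Return the language-specific model, else the multi-language fallback."""
    return lookup.get(language, lookup["xx"])


def stopwords_path(language, assets_dir="assets"):
    """Build the path following the stopwords-{language}.txt naming scheme."""
    return Path(assets_dir) / f"stopwords-{language}.txt"


def load_stopwords(language, assets_dir="assets"):
    """Read a UTF-8 stopword file (assumed one word per line).

    Returns an empty set if no file exists for the language.
    """
    path = stopwords_path(language, assets_dir)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}
```

With this sketch, `resolve_model("it", spacy_model_lookup)` picks the newly registered Italian model, while a language without an entry falls back to the multi-language model. Whether a silent fallback or an explicit error is preferable depends on how strictly you want to enforce per-language support.
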
This documentation helps you connect your personal GitHub account to the Microsoft organization. Once the accounts are linked, you will be able to access Microsoft-internal GitHub repositories.
- Enable 2-factor authentication on your GitHub account (Instructions). You can use either SMS or app authentication. We recommend app authentication, either with Microsoft Authenticator on your corporate phone or with LastPass Authenticator, as SMS can be difficult when you are abroad. Other apps are also possible, but LastPass Authenticator is the only app that works without login.
- Save the reset codes so that you can reset the 2-factor authentication later if needed.
- Afterwards, you can link your GitHub account to the Microsoft organization here. You have to authorize Microsoft to access your GitHub account (e.g. Microsoft will ask you if your GitHub account uses 2-factor authentication).
- After successfully linking the accounts, you can join the GitHub organization "microsoft" in the "Available Microsoft GitHub organizations" section.
- Your GitHub profile should now list "microsoft" among your organizations.
- You can now access the Verseagility GitHub repository.