Hi, thank you for providing this library! I am interested in detecting the language of very short texts such as "capital Italy". With the other version of this library, i.e. https://github.com/shuyo/language-detection, I got quite good results. With this version the results are different. Is it a matter of configuration? Do you have any idea what the cause might be?
I use:
TextObjectFactory textObjectFactory = CommonTextObjectFactories.forDetectingShortCleanText();
Here are some examples that worked in the "previous version":
Persone nate a padova (Italian)
actors from canada (English)
attori canada (Italian)
Was ist die hauptstadt von kanada (German)
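For context on why queries like these are hard: as far as I understand, both libraries score character n-grams of the input against per-language profiles, so a two- or three-word query gives the detector very little evidence. The following is a minimal, self-contained sketch of that point, not code from either library. The class name `NgramCount`, the n-gram lengths 1 to 3, and the lowercasing step are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of language-detector: it only counts how many
// character n-grams a short query yields.
class NgramCount {

    // Collect every character n-gram of length minN..maxN (inclusive) from the text.
    // Lowercasing is an assumption of this sketch, not a claim about the library.
    static List<String> ngrams(String text, int minN, int maxN) {
        List<String> grams = new ArrayList<>();
        String s = text.toLowerCase(Locale.ROOT);
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                grams.add(s.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        String[] queries = {
            "capital Italy",
            "attori canada",
            "Was ist die hauptstadt von kanada"
        };
        for (String q : queries) {
            // Print the number of 1- to 3-grams available as evidence for each query.
            System.out.println(q + " -> " + ngrams(q, 1, 3).size() + " n-grams");
        }
    }
}
```

With so few n-grams, a handful of shared ones (for example those in "canada", which appears in both the English and the Italian query) can plausibly tip the result either way, which is why profile and configuration choices matter more for short inputs.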
In my version, I have added more short-text language resources and more short-text test data in DataLanguageDetectorImplTest.java. That test needs the TestNG framework to run (enable the useTestNG() test option and disable useJUnit() in build.gradle):
@DataProvider
protected Object[][] shortCleanTexts() {
    return new Object[][] {
        {"en", shortCleanText("This is some English text.")},
        {"fr", shortCleanText("Ceci est un texte français.")},
        {"nl", shortCleanText("Dit is een Nederlandse tekst.")},
        {"de", shortCleanText("Dies ist eine deutsche Text")},
        {"km", shortCleanText("សព្វវចនាធិប្បាយសេរីសម្រាប់អ្នកទាំងអស់គ្នា។" + "នៅក្នុងវិគីភីឌាភាសាខ្មែរឥឡូវនេះមាន ១១៩៨រូបភាព សមាជិក១៥៣៣៣នាក់ និងមាន៤៥៨៣អត្ថបទ។")},
        {"bg", shortCleanText("Европа не трябва да стартира нов конкурентен маратон и изход с приватизация")},
        {"it", shortCleanText("Persone nate a padova")},
        {"it", shortCleanText("attori canada")},
        {"de", shortCleanText("Was ist die hauptstadt von kanada")},
        {"pl", shortCleanText("I Kanadyjczycy")},
        {"en", shortCleanText("actors from Canada")},
    };
}