Short text recognition #65

Open
D063520 opened this issue Sep 27, 2016 · 1 comment

Comments

@D063520
D063520 commented Sep 27, 2016

Hi,

thank you for providing this library! I am interested in very short texts such as "capital Italy". With the other version of this library, https://github.com/shuyo/language-detection, I got quite good results. With this version I do not. Is it a matter of configuration? Do you have an idea what the cause could be?
I use:
TextObjectFactory textObjectFactory = CommonTextObjectFactories.forDetectingShortCleanText();

Here are some examples that worked with the previous version:

  • Persone nate a padova (italian)
  • actors from canada (english)
  • attori canada (italian)
  • Was ist die hauptstadt von kanada (german)
@eclectice

You can check my fork, to which I added a build.gradle for building the project as a pure Java library with Android Studio 2.2: https://github.com/eclectice/language-detector

In my fork, I added more short-text language resources and more short-text test data to DataLanguageDetectorImplTest.java. The test needs the TestNG framework to run, so enable useTestNG() and disable useJUnit() in build.gradle:

    @DataProvider
    protected Object[][] shortCleanTexts() {
        return new Object[][] {
                {"en", shortCleanText("This is some English text.")},
                {"fr", shortCleanText("Ceci est un texte français.")},
                {"nl", shortCleanText("Dit is een Nederlandse tekst.")},
                {"de", shortCleanText("Dies ist eine deutsche Text")},
                {"km", shortCleanText("សព្វវចនាធិប្បាយសេរីសម្រាប់អ្នកទាំងអស់គ្នា។" + "នៅក្នុងវិគីភីឌាភាសាខ្មែរឥឡូវនេះមាន ១១៩៨រូបភាព សមាជិក១៥៣៣៣នាក់ និងមាន៤៥៨៣អត្ថបទ។")},
                {"bg", shortCleanText("Европа не трябва да стартира нов конкурентен маратон и изход с приватизация")},
                {"it", shortCleanText("Persone nate a padova")},
                {"it", shortCleanText("attori canada")},
                {"de", shortCleanText("Was ist die hauptstadt von kanada")},
                {"pl", shortCleanText("I Kanadyjczycy")},
                {"en", shortCleanText("actors from Canada")},
        };
    }
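
For readers who want to spot-check a detector against the queries from this issue without setting up TestNG, the same expected-language/text table can be driven by a few lines of plain Java. This is only a sketch: `ShortTextSpotCheck`, `Case`, and `failures` are hypothetical names that belong to neither library, and the detector is passed in as a `Function<String, String>` placeholder that you would replace with a call into whichever library you are testing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ShortTextSpotCheck {

    /** One expected-language/text pair, mirroring a row of the data provider above. */
    public static final class Case {
        public final String expected;
        public final String text;

        public Case(String expected, String text) {
            this.expected = expected;
            this.text = text;
        }
    }

    /** The four queries reported in this issue, with the languages the reporter expects. */
    public static final List<Case> CASES = Arrays.asList(
            new Case("it", "Persone nate a padova"),
            new Case("en", "actors from canada"),
            new Case("it", "attori canada"),
            new Case("de", "Was ist die hauptstadt von kanada"));

    /** Returns the cases for which the detector's answer differs from the expected code. */
    public static List<Case> failures(Function<String, String> detector, List<Case> cases) {
        List<Case> failed = new ArrayList<>();
        for (Case c : cases) {
            if (!c.expected.equals(detector.apply(c.text))) {
                failed.add(c);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Placeholder detector (assumption): always answers "en".
        // Swap in a lambda that calls the real library and returns its language code.
        Function<String, String> detector = text -> "en";
        for (Case c : failures(detector, CASES)) {
            System.out.println("expected " + c.expected + " for: " + c.text);
        }
    }
}
```

Running the same table against both libraries makes it easy to see which of the reported queries regress, and whether a different text-object factory or profile set changes the outcome.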
