Check tokenization of emoji #37

@ziorufus

Test string: This 😈 is a devil emoji

{
  "docDate": "2022-01-27",
  "timings": "Annotation pipeline timing information:\nItalianTokenizerAnnotator: 0.0 sec.\nTOTAL: 0.0 sec. for 7 tokens at 200.0 tokens/sec.",
  "sentences": [
    {
      "index": 0,
      "characterOffsetBegin": 0,
      "characterOffsetEnd": 24,
      "text": "This 😈 is a devil emoji",
      "parse": "SENTENCE_SKIPPED_OR_UNPARSABLE",
      "tokens": [
        {
          "index": 1,
          "word": "This",
          "originalText": "This",
          "characterOffsetBegin": 0,
          "characterOffsetEnd": 4
        },
        {
          "index": 2,
          "word": "?",
          "originalText": "?",
          "characterOffsetBegin": 5,
          "characterOffsetEnd": 6
        },
        {
          "index": 3,
          "word": "?",
          "originalText": "?",
          "characterOffsetBegin": 6,
          "characterOffsetEnd": 7
        },
        {
          "index": 4,
          "word": "is",
          "originalText": "is",
          "characterOffsetBegin": 8,
          "characterOffsetEnd": 10
        },
        {
          "index": 5,
          "word": "a",
          "originalText": "a",
          "characterOffsetBegin": 11,
          "characterOffsetEnd": 12
        },
        {
          "index": 6,
          "word": "devil",
          "originalText": "devil",
          "characterOffsetBegin": 13,
          "characterOffsetEnd": 18
        },
        {
          "index": 7,
          "word": "emoji",
          "originalText": "emoji",
          "characterOffsetBegin": 19,
          "characterOffsetEnd": 24
        }
      ]
    }
  ]
}
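
A possible reading of the output above, not a confirmed diagnosis: the emoji 😈 (U+1F608) lies outside the Basic Multilingual Plane, so in UTF-16 it takes two code units (a surrogate pair). The two "?" tokens at offsets 5-6 and 6-7, together with a sentence end offset of 24 for a string of 23 code points, are consistent with the tokenizer walking the text one UTF-16 code unit at a time. The Python sketch below only reproduces that arithmetic; `utf16_units`, `is_surrogate` and `codepoint_spans` are hypothetical helper names and are not part of the tokenizer.

```python
# Sketch: compare UTF-16 code-unit offsets with code-point offsets
# for the test string from this issue. Helper names are hypothetical.
TEST = "This \U0001F608 is a devil emoji"


def utf16_units(text):
    """Return the text as a list of UTF-16 code units (integers)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_surrogate(unit):
    """True if the code unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


def codepoint_spans(text):
    """Return (char, begin, end) per code point, offsets in UTF-16 code units."""
    spans = []
    offset = 0
    for ch in text:
        # Code points above U+FFFF need two UTF-16 code units.
        width = 2 if ord(ch) > 0xFFFF else 1
        spans.append((ch, offset, offset + width))
        offset += width
    return spans


units = utf16_units(TEST)
print(len(TEST), len(units))          # code points vs. UTF-16 code units
print([hex(u) for u in units[5:7]])   # the two halves of the emoji
print(codepoint_spans(TEST)[5])       # the emoji as a single span
```

Under this reading, the expected result would be a single token covering offsets 5 to 7 rather than two one-unit tokens, each of which is an unpaired surrogate and therefore prints as "?".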
