
Use NLTK/CLTK-compliant corpus formatting #1


@thatbudakguy

See https://github.com/cltk/chinese_text_cbeta_01 for an example of a repo that contains source texts (in this case, TEI) alongside generated JSON that the CLTK corpus loader can consume.
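
As a rough illustration of the TEI-to-JSON step, here is a minimal sketch using only the Python standard library. It assumes TEI P5 input in the http://www.tei-c.org/ns/1.0 namespace. The function name tei_to_json and the output layout (a "title" string plus a "text" object keyed by paragraph index) are hypothetical and are not taken from chinese_text_cbeta_01 or the CLTK loader, so the real schema should be checked against that repo.

```python
import json
import xml.etree.ElementTree as ET

# TEI P5 namespace, mapped to a prefix for ElementTree path queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_json(tei_xml: str) -> str:
    """Convert a TEI document into a JSON string.

    Hypothetical output layout: {"title": ..., "text": {"0": ..., "1": ...}}.
    This is NOT the schema used by chinese_text_cbeta_01; adjust as needed.
    """
    root = ET.fromstring(tei_xml)

    # Title from the header, if present.
    title_el = root.find(
        ".//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS
    )
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    # Every <p> in the body, in document order, flattened to plain text.
    paragraphs = [
        "".join(p.itertext()).strip()
        for p in root.iterfind(".//tei:text/tei:body//tei:p", TEI_NS)
    ]

    doc = {"title": title, "text": {str(i): para for i, para in enumerate(paragraphs)}}
    # ensure_ascii=False keeps CJK characters readable in the output file.
    return json.dumps(doc, ensure_ascii=False, indent=2)
```

For example, a body with one top-level `<p>` and one `<p>` nested inside a `<div>` yields a "text" object with keys "0" and "1", in document order.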

See also https://docs.cltk.org/en/latest/importing_corpora.html#user-defined-distributed-corpora for how to add user-defined corpora to CLTK.

Finally, the NLTK CorpusReader API documentation: https://www.nltk.org/api/nltk.corpus.reader.html#module-nltk.corpus.reader
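
A corpus stored as one JSON file per text could be exposed through NLTK by subclassing CorpusReader. The sketch below assumes each file holds a JSON object with a "text" field (a hypothetical layout), and the class name JSONTextCorpusReader and its docs() method are made up for illustration. It relies only on the base class's constructor and its fileids() and open() methods.

```python
import json
import os
import tempfile

from nltk.corpus.reader.api import CorpusReader


class JSONTextCorpusReader(CorpusReader):
    """Hypothetical reader: one JSON object per file, body under "text"."""

    def _resolve(self, fileids):
        # Accept None (all files), a single fileid, or a list of fileids.
        if fileids is None:
            return self.fileids()
        if isinstance(fileids, str):
            return [fileids]
        return list(fileids)

    def docs(self, fileids=None):
        """Return the parsed JSON object for each requested file."""
        result = []
        for fileid in self._resolve(fileids):
            stream = self.open(fileid)  # decoded using the reader's encoding
            try:
                result.append(json.loads(stream.read()))
            finally:
                stream.close()
        return result

    def raw(self, fileids=None):
        """Return the "text" fields of the requested files, joined by newlines."""
        return "\n".join(doc["text"] for doc in self.docs(fileids))


if __name__ == "__main__":
    # Build a throwaway two-file corpus and read it back.
    root = tempfile.mkdtemp()
    for name, text in [("a.json", "alpha"), ("b.json", "beta")]:
        with open(os.path.join(root, name), "w", encoding="utf8") as fh:
            json.dump({"title": name, "text": text}, fh)
    reader = JSONTextCorpusReader(root, r".*\.json")
    print(reader.fileids())  # ['a.json', 'b.json']
    print(reader.raw())
```

Because the second constructor argument is treated as a regular expression over paths relative to the root, r".*\.json" also picks up JSON files in subdirectories.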

Labels: enhancement (New feature or request)