This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

Write a tidy_masc_data() function #8

Open
francojc opened this issue Dec 12, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@francojc
Contributor

This function will take the acquired MASC data (after using the get_compressed_data() function) and produce a curated dataset.

Set up the function so that it can be extended to different types of curation, but start with the XML for word-based data (`*-penn.xml`).

The signature might look something like this:

```r
tidy_masc_data <- function(source_dir, target_dir, type = "penn", force = FALSE) {
  # ...
}
```
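A minimal sketch of how that signature could be fleshed out. It assumes the curated output is one CSV per type in `target_dir` (the file name `masc_<type>.csv` is hypothetical) and that `force = FALSE` means "skip if that output already exists". The helper `curate_penn_file()` is also hypothetical and deliberately naive: it uses regular expressions and assumes each token's feature structure carries `<f name="string" value="..."/>` and `<f name="msd" value="..."/>`. A real implementation should parse the XML with a proper parser (e.g. xml2) and be checked against the actual MASC annotation layout.

```r
# Hypothetical sketch: only type = "penn" is implemented.
tidy_masc_data <- function(source_dir, target_dir, type = "penn", force = FALSE) {
  # Restrict `type` to the curation types implemented so far.
  type <- match.arg(type, choices = c("penn"))

  if (!dir.exists(source_dir)) {
    stop("`source_dir` does not exist: ", source_dir)
  }

  # Assumed output: one CSV per curation type inside `target_dir`.
  target_file <- file.path(target_dir, paste0("masc_", type, ".csv"))

  # Reuse existing output unless the caller asks to rebuild it.
  if (file.exists(target_file) && !force) {
    message("Curated data already exists at ", target_file,
            "; use force = TRUE to overwrite.")
    return(invisible(target_file))
  }

  # Per-type file-name pattern and per-file curation helper.
  pattern <- switch(type, penn = "-penn\\.xml$")
  curate_file <- switch(type, penn = curate_penn_file)

  files <- list.files(source_dir, pattern = pattern,
                      recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) {
    stop("No files matching '", pattern, "' found in ", source_dir)
  }

  # Curate each file, then stack the per-file data frames.
  curated <- do.call(rbind, lapply(files, curate_file))

  dir.create(target_dir, showWarnings = FALSE, recursive = TRUE)
  write.csv(curated, target_file, row.names = FALSE)
  invisible(target_file)
}

# Hypothetical, deliberately naive placeholder for the "penn" curator.
# ASSUMES one <fs>...</fs> feature structure per token, holding
# <f name="string" value="..."/> and <f name="msd" value="..."/>.
# XML entities (e.g. &amp;) are not decoded; use an XML parser in practice.
curate_penn_file <- function(path) {
  xml <- paste(readLines(path, warn = FALSE), collapse = " ")
  blocks <- regmatches(xml, gregexpr("<fs[^>]*>.*?</fs>", xml, perl = TRUE))[[1]]

  # Value of one named feature per block, NA when the feature is absent.
  feature <- function(name) {
    rx <- paste0('<f name="', name, '" value="([^"]*)"')
    vapply(blocks, function(b) {
      hit <- regmatches(b, regexec(rx, b))[[1]]
      if (length(hit) == 2) hit[2] else NA_character_
    }, character(1), USE.NAMES = FALSE)
  }

  data.frame(
    doc_id   = rep(sub("-penn\\.xml$", "", basename(path)), length(blocks)),
    token_id = seq_along(blocks),
    word     = feature("string"),
    pos      = feature("msd"),
    stringsAsFactors = FALSE
  )
}
```

Adding a new `type` in this layout means adding it to the `match.arg()` choices and giving both `switch()` calls a matching file pattern and curator.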

Later, the `type` argument can be extended to accept:

  • "text" (plain text files)
  • "ne" (named entities)
  • "shallow" (noun chunks)
  • etc.
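One way to keep `type` extensible is a small registry that maps each type to the file names it reads. In this sketch only the `"penn"` pattern comes from this issue; the `.txt`, `-ne.xml`, and `-nc.xml` suffixes for `"text"`, `"ne"`, and `"shallow"` are assumptions about MASC file naming and should be checked against the downloaded data. `masc_patterns` and `masc_files()` are hypothetical names.

```r
# Hypothetical registry: curation type -> file-name pattern it reads.
# Only "penn" comes from the issue; the other suffixes are assumptions.
masc_patterns <- c(
  penn    = "-penn\\.xml$",
  text    = "\\.txt$",
  ne      = "-ne\\.xml$",
  shallow = "-nc\\.xml$"
)

# List the source files for one registered type; match.arg() rejects
# any type that is not a name in `masc_patterns`.
masc_files <- function(source_dir, type = names(masc_patterns)) {
  type <- match.arg(type)
  list.files(source_dir, pattern = masc_patterns[[type]],
             recursive = TRUE, full.names = TRUE)
}
```

With this layout, supporting a new type is one entry in `masc_patterns` plus a curator for that format.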
@francojc added the enhancement (New feature or request) label Dec 12, 2023
@francojc self-assigned this Dec 12, 2023
Projects
None yet
Development

No branches or pull requests

1 participant