[Regex Error] Error in splitting titles and sections in Mimoracle corpora #39

Open
honghanhh opened this issue Oct 18, 2024 · 0 comments
Labels: prio/low, type/bug (Something isn't working as intended)

Description

Section extraction in Mimoracle contains errors that should be fixed if we plan to use this dataset later.
For example:

  • Titles and sections are mixed together
  • Wrong section_title values
  • section_content overlaps with content from other sections (e.g., check the date columns)
  • Duplicated section_title values within one document, even at consecutive positions in the document
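
To help reproduce two of the symptoms above (duplicated section_title values and overlapping section_content), here is a minimal Python sketch. It assumes the corpus can be loaded as a list of dicts with `section_title` and `section_content` keys plus a hypothetical `doc_id` key identifying the document; the helper names and the sample rows are made up for illustration and are not part of the Mimoracle code.

```python
from collections import Counter, defaultdict


def find_duplicate_titles(rows):
    """Return {doc_id: [section titles that occur more than once]}."""
    titles_by_doc = defaultdict(list)
    for row in rows:
        # "doc_id" is an assumed column name, not confirmed by the dataset.
        titles_by_doc[row["doc_id"]].append(row["section_title"])

    duplicates = {}
    for doc_id, titles in titles_by_doc.items():
        repeated = sorted(t for t, n in Counter(titles).items() if n > 1)
        if repeated:
            duplicates[doc_id] = repeated
    return duplicates


def find_overlapping_sections(rows):
    """Return (doc_id, title_a, title_b) when b's content is contained in a's."""
    sections_by_doc = defaultdict(list)
    for row in rows:
        sections_by_doc[row["doc_id"]].append(row)

    overlaps = []
    for doc_id, sections in sections_by_doc.items():
        for i, a in enumerate(sections):
            for j, b in enumerate(sections):
                if i == j:
                    continue
                content_b = b["section_content"].strip()
                # Plain substring containment is a crude overlap test.
                if content_b and content_b in a["section_content"]:
                    overlaps.append((doc_id, a["section_title"], b["section_title"]))
    return overlaps


# Made-up rows that imitate the reported symptoms.
rows = [
    {"doc_id": 1, "section_title": "History",
     "section_content": "Founded in 1900. Moved in 1950."},
    {"doc_id": 1, "section_title": "History",
     "section_content": "Moved in 1950."},
    {"doc_id": 1, "section_title": "Location",
     "section_content": "Located in Paris."},
]

print(find_duplicate_titles(rows))
print(find_overlapping_sections(rows))
```

Substring containment only catches verbatim overlap; partial overlaps, such as shared date fragments, would need a looser comparison.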
honghanhh added the prio/low and type/bug labels on Oct 18, 2024