Preserve Links and Text related to Public Dataset #53
Thanks @Precious-Macaulay
Alright
I just reviewed and tested this PR on a couple of PMCIDs. Here are the output files. These outputs work well for my use case and resolve the issue in #53. They also look fine in LabelBuddy. @adelavega and @jeromedockes, is there any additional testing you'd like to do, or are you happy to merge this PR into main?
Thank you very much, @Precious-Macaulay !
LGTM
See #51
I updated the text_extraction.xml stylesheet to keep external links without affecting readability. I also searched through more articles to find where text about public datasets tends to appear, and updated the stylesheet to handle those locations accordingly. I have attached a copy of the text I extracted from some of the articles I gathered, including the example from the issue.
text.csv
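To illustrate the kind of link-preserving behaviour described above, here is a minimal Python sketch. It is not the actual text_extraction.xml stylesheet. It assumes JATS-style input, where links are `<ext-link>` elements carrying an `xlink:href` attribute, and flattens an element to plain text while appending each link's URL in parentheses after its label. The function name `extract_text` and the "label (URL)" output convention are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET

# Fully qualified attribute name for xlink:href as ElementTree reports it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_text(elem):
    """Flatten an element to plain text, keeping ext-link URLs inline.

    Hypothetical helper: each <ext-link> becomes "label (URL)", unless the
    label already equals the URL, in which case it is emitted only once.
    """
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        if child.tag == "ext-link":
            label = "".join(child.itertext())
            href = child.get(XLINK_HREF)
            if href and href != label:
                parts.append(f"{label} ({href})")
            else:
                parts.append(label)
        else:
            # Recurse into other inline markup (e.g. <italic>, <bold>).
            parts.append(extract_text(child))
        # Text following a child belongs to the parent, so add it here.
        if child.tail:
            parts.append(child.tail)
    return "".join(parts)


sample = (
    '<p xmlns:xlink="http://www.w3.org/1999/xlink">Data are available on '
    '<ext-link ext-link-type="uri" xlink:href="https://example.org/dataset">'
    "OpenNeuro</ext-link>.</p>"
)
print(extract_text(ET.fromstring(sample)))
```

Keeping the URL next to its label, instead of dropping the link, is one way to preserve the dataset location without breaking the sentence for a human reader or an annotation tool.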