Annotations were made with the brat tool (http://brat.nlplab.org) and are provided in brat format.
The annotated data is in the data folder, which contains the brat configuration files, the annotation guidelines and the annotated sets. Sets 00, 01, 02, 03 and 04 were used for training and sets 05, 06 and 07 were used as test sets in our study. Each set contains text and annotation files. The name of each file indicates the MEDLINE citation that the text comes from. The pre-adjudicated folder contains the individual annotations from each of the five annotators before discussion.
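As a starting point for reading the annotation files, the following is a minimal Python sketch of a parser for text-bound ("T") lines in brat standoff format. It is not part of the original tooling. The names TextBound and parse_text_bound are hypothetical, and the set names in TRAIN_SETS and TEST_SETS simply mirror the split described above; the actual folder names and file extensions in this repository may differ, so check them before use.

```python
from typing import List, NamedTuple, Tuple

# Split described in this README; the on-disk folder names are an assumption.
TRAIN_SETS = ["00", "01", "02", "03", "04"]
TEST_SETS = ["05", "06", "07"]


class TextBound(NamedTuple):
    """One text-bound annotation (hypothetical helper type)."""
    ann_id: str
    label: str
    spans: List[Tuple[int, int]]  # character offsets into the text file
    text: str


def parse_text_bound(ann_content: str) -> List[TextBound]:
    """Parse text-bound (T) lines from the content of a brat annotation file.

    Other line types (relations, attributes, notes, ...) are skipped.
    """
    annotations = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line, ignore it
        ann_id, type_and_offsets, text = parts[0], parts[1], parts[2]
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations list several "start end" pairs joined by ";"
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        annotations.append(TextBound(ann_id, label, spans, text))
    return annotations
```

To use it, read an annotation file into a string (for example with pathlib.Path(...).read_text(encoding="utf-8")) and pass the string to parse_text_bound. The returned offsets index into the text file that has the same base name.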
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://creativecommons.org/licenses/by-nc-nd/4.0
Please cite our AMIA contribution if you use this data set. This publication contains additional information about the generation of the data set.
Antonio Jimeno Yepes, Andrew MacKinlay, Natalie Gunn, Christine Schieber, Noel Faux, Matthew Downton, Benjamin Goudey, Richard L. Martin. A hybrid approach for automated mutation annotation of the extended human mutation landscape in scientific literature. American Medical Informatics Association (AMIA) Symposium, 2018.