6 changes: 3 additions & 3 deletions data/xml/2024.acl.xml
@@ -11132,9 +11132,9 @@
<author><first>Rifki Afina</first><last>Putri</last><affiliation>Korea Advanced Institute of Science &amp; Technology</affiliation></author>
<author><first>Emmanuel</first><last>Dave</last><affiliation>Binus University</affiliation></author>
<author><first>Jhonson</first><last>Lee</last><affiliation>Tokopedia</affiliation></author>
<author><first>Nuur</first><last>Shadieq</last><affiliation>Binus University</affiliation></author>
<author><first>Wawan</first><last>Cenggoro</last><affiliation>Institut Teknologi Bandung</affiliation></author>
<author><first>Salsabil Maulana</first><last>Akbar</last><affiliation>Universitas Telkom</affiliation></author>
<author><first>Nuur</first><last>Shadieq</last><affiliation>Universitas Telkom</affiliation></author>
<author><first>Wawan</first><last>Cenggoro</last><affiliation>Binus University</affiliation></author>
<author><first>Salsabil Maulana</first><last>Akbar</last><affiliation>Institut Teknologi Bandung</affiliation></author>
Contributor Author

Issue of metadata correction: #6328
2024.acl-long.796
Paper page: https://aclanthology.org/2024.acl-long.796.pdf
Commit showing the bug effect: f88a83e

Only the affiliations needed to be changed.
Note: the affiliations don't always match what is shown in the PDF.

<author><first>Muhammad Ihza</first><last>Mahendra</last><affiliation>Universitas Telkom</affiliation></author>
<author><first>Dea Annisayanti</first><last>Putri</last><affiliation>Universitas Indonesia</affiliation></author>
<author><first>Bryan</first><last>Wilie</last><affiliation>Hong Kong University of Science and Technology</affiliation></author>
10 changes: 5 additions & 5 deletions data/xml/2025.arabicnlp.xml
@@ -384,8 +384,8 @@
<paper id="26">
<title>Mind the Gap: A Review of <fixed-case>A</fixed-case>rabic Post-Training Datasets and Their Limitations</title>
<author orcid="0009-0004-6842-1300"><first>Mohammed</first><last>Alkhowaiter</last><affiliation>Prince Sattam bin Abdulaziz University</affiliation></author>
<author><first>Norah</first><last>Alshahrani</last><affiliation>University of Bisha</affiliation></author>
<author><first>Saied</first><last>Alshahrani</last><affiliation>ASAS AI</affiliation></author>
<author><first>Norah</first><last>Alshahrani</last><affiliation>ASAS AI</affiliation></author>
<author><first>Saied</first><last>Alshahrani</last><affiliation>University of Bisha</affiliation></author>
Contributor Author

Issue of metadata correction: #6401
2025.arabicnlp-main.26
Paper page: https://aclanthology.org/2025.arabicnlp-main.26/
Commit showing the bug effect: db094a2

Only the affiliations needed to be changed.
The affiliations of Saied and Norah (same last name) were swapped.

<author><first>Reem I.</first><last>Masoud</last></author>
<author orcid="0000-0002-9914-915X"><first>Alaa</first><last>Alzahrani</last><affiliation>King Salman Global Academy for Arabic</affiliation></author>
<author orcid="0000-0002-5082-8565"><first>Deema</first><last>Alnuhait</last><affiliation>University of Illinois at Urbana-Champaign</affiliation></author>
@@ -2103,11 +2103,11 @@
</paper>
<paper id="133">
<title>Tokenizers United at <fixed-case>QIAS</fixed-case>-2025: <fixed-case>RAG</fixed-case>-Enhanced Question Answering for Islamic Studies by Integrating Semantic Retrieval with Generative Reasoning</title>
<author><first>Mohamed</first><last>Samy</last><affiliation>Institute</affiliation></author>
<author><first>Mayar</first><last>Boghdady</last></author>
<author><first>Mohamed</first><last>Samy</last></author>
<author><first>Mayar</first><last>Boghdady</last><affiliation>Institute</affiliation></author>
Contributor Author

Issue of metadata correction: #6335
2025.arabicnlp-sharedtasks.133
Paper page: https://aclanthology.org/2025.arabicnlp-sharedtasks.133/
Commit showing the bug effect: c8dfe7c

Only the affiliation was affected (meaningless "NA").

<author><first>Marwan</first><last>El Adawi</last></author>
<author><first>Mohamed</first><last>Nassar</last></author>
<author><first>Ensaf Hussein</first><last>Mohamed</last></author>
<author><first>Ensaf</first><last>Hussein</last></author>
Contributor Author

Noted another name inconsistency: the last author's name is Ensaf Hussein [according to the PDF](https://aclanthology.org/2025.arabicnlp-sharedtasks.133.pdf), but the metadata gives the last name as Mohamed. The metadata correction issue only updated authors_new, not the authors list itself. This author should probably also have name variants recorded: there are currently 3 author pages (Ensaf Hussein Mohamed, Ensaf Mohamed, Ensaf H. Mohamed) that could probably be merged, and there are several inconsistencies between metadata and PDFs. I haven't seen an author page request for this author yet.

<pages>960-965</pages>
<abstract/>
<url hash="1572525d">2025.arabicnlp-sharedtasks.133</url>
4 changes: 2 additions & 2 deletions data/xml/2025.emnlp.xml
@@ -21230,8 +21230,8 @@
<author orcid="0000-0003-0049-4808"><first>Aloka</first><last>Fernando</last><affiliation>University of Moratuwa</affiliation></author>
<author orcid="0000-0002-5361-4810"><first>Nisansa</first><last>de Silva</last><affiliation>University of Moratuwa</affiliation></author>
<author><first>Menan</first><last>Velayuthan</last><affiliation>University of Moratuwa</affiliation></author>
<author orcid="0000-0003-0701-0204"><first>Charitha</first><last>Rathnayake</last><affiliation>Massey University</affiliation></author>
<author><first>Surangika</first><last>Ranathunga</last><affiliation>University of Moratuwa</affiliation></author>
<author><first>Charitha</first><last>Rathnayake</last><affiliation>University of Moratuwa</affiliation></author>
<author orcid="0000-0003-0701-0204"><first>Surangika</first><last>Ranathunga</last><affiliation>Massey University</affiliation></author>
Contributor Author

Issue of metadata correction: #6422
2025.emnlp-main.1435
Paper page: https://aclanthology.org/2025.emnlp-main.1435/
Commit showing the bug effect: 7ba600b

The bug affected both affiliation and ORCID.
Check that the ORCID belongs to the correct person: https://orcid.org/0000-0003-0701-0204 (Ranathunga)
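
As a complement to the manual check, a small script can at least confirm that an ORCID iD in the metadata is well formed. This is only a sketch: it applies the ISO 7064 MOD 11-2 check digit that ORCID iDs carry, and the function names are illustrative, not part of the Anthology tooling. It cannot tell whether the iD belongs to the right author; that still needs a look at the ORCID record.

```python
import re

# Pattern for the hyphenated 16-character form, e.g. 0000-0003-0701-0204.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the iD has the expected shape and a valid check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    for candidate in ("0000-0003-0701-0204", "0000-0003-4607-936X"):
        print(candidate, is_well_formed_orcid(candidate))
```

A passing check only rules out typos and truncation; an iD that was attached to the wrong author, as in this bug, still passes.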

<pages>28252-28269</pages>
<abstract>Parallel Data Curation (PDC) techniques aim to filter out noisy parallel sentences from web-mined corpora. Ranking sentence pairs using similarity scores on sentence embeddings derived from Pre-trained Multilingual Language Models (multiPLMs) is the most common PDC technique. However, previous research has shown that the choice of the multiPLM significantly impacts the quality of the filtered parallel corpus, and the Neural Machine Translation (NMT) models trained using such data show a disparity across multiPLMs. This paper shows that this disparity is due to different multiPLMs being biased towards certain types of sentence pairs, which are treated as noise from an NMT point of view. We show that such noisy parallel sentences can be removed to a certain extent by employing a series of heuristics. The NMT models, trained using the curated corpus, lead to producing better results while minimizing the disparity across multiPLMs. We publicly release the source code and the curated datasets</abstract>
<url hash="7a30667d">2025.emnlp-main.1435</url>
4 changes: 2 additions & 2 deletions data/xml/2025.starsem.xml
@@ -222,10 +222,10 @@
<paper id="18">
<title>Latent Traits and Cross-Task Transfer: Deconstructing Dataset Interactions in <fixed-case>LLM</fixed-case> Fine-tuning</title>
<author><first>Shambhavi</first><last>Krishna</last><affiliation>University of Massachusetts at Amherst</affiliation></author>
<author orcid="0000-0003-4607-936X"><first>Atharva</first><last>Naik</last><affiliation>Department of Computer Science, University of Massachusetts at Amherst</affiliation></author>
<author><first>Atharva</first><last>Naik</last></author>
<author><first>Chaitali</first><last>Agarwal</last></author>
<author><first>Sudharshan</first><last>Govindan</last></author>
<author><first>Haw-Shiuan</first><last>Chang</last></author>
<author orcid="0000-0003-4607-936X"><first>Haw-Shiuan</first><last>Chang</last><affiliation>Department of Computer Science, University of Massachusetts at Amherst</affiliation></author>
Contributor Author

Issue of metadata correction: #6394
2025.starsem-1.18
Paper page: https://aclanthology.org/2025.starsem-1.18/
Commit showing the bug effect: ec5e183

The bug affected both ORCID and affiliation.
Confirmed that the ORCID is now correct for Haw-Shiuan Chang: https://orcid.org/0000-0003-4607-936X

<author><first>Taesung</first><last>Lee</last></author>
<pages>225-241</pages>
<abstract>Large language models are increasingly deployed across diverse applications. This often includes tasks LLMs have not encountered during training.This implies that enumerating and obtaining the high-quality training data for all tasks is infeasible. Thus, we often need to rely on transfer learning using datasets with different characteristics, and anticipate out-of-distribution requests.Motivated by this practical need, we propose an analysis framework, building a transfer learning matrix and dimensionality reduction, to dissect these cross-task interactions.We train and analyze 10 models to identify latent abilities (e.g., Reasoning, Sentiment Classification, NLU, Arithmetic)and discover the side effects of the transfer learning.Our findings reveal that performance improvements often defy explanations based on surface-level dataset similarity or source data quality. Instead, hidden statistical factors of the source dataset, such as class distribution and generation length proclivities, alongside specific linguistic features, are actually more influential.This work offers insights into the complex dynamics of transfer learning, paving the way for more predictable and effective LLM adaptation.</abstract>
24 changes: 12 additions & 12 deletions data/xml/2025.wmt.xml
@@ -1177,20 +1177,20 @@
<author><first>Dinesh</first><last>Tewari</last><affiliation>Google</affiliation></author>
<author><first>Baba Mamadi</first><last>Diane</last><affiliation>NKO USA INC</affiliation></author>
<author><first>Djibrila</first><last>Diane</last><affiliation>NKO USA INC</affiliation></author>
<author><first>Solo Farabado</first><last>Cissé</last><affiliation>Stanford University</affiliation></author>
<author><first>Koulako Moussa</first><last>Doumbouya</last><affiliation>NKO USA INC</affiliation></author>
<author><first>Solo Farabado</first><last>Cissé</last><affiliation>NKO USA INC</affiliation></author>
<author><first>Koulako Moussa</first><last>Doumbouya</last><affiliation>Stanford University</affiliation></author>
<author><first>Edoardo</first><last>Ferrante</last><affiliation>Conseggio pe-o patrimonio linguistico ligure</affiliation></author>
<author><first>Alessandro</first><last>Guasoni</last><affiliation>Conseggio pe-o patrimonio linguistico ligure</affiliation></author>
<author><first>Christopher</first><last>Homan</last><affiliation>Paair Institute</affiliation></author>
<author><first>Mamadou K.</first><last>Keita</last><affiliation>NIT, Arunachal Pradesh</affiliation></author>
<author><first>Sudhamoy</first><last>DebBarma</last><affiliation>tyvan.ru</affiliation></author>
<author><first>Ali</first><last>Kuzhuget</last><affiliation>Stanford University</affiliation></author>
<author><first>David</first><last>Anugraha</last><affiliation>Universitas Indonesia</affiliation></author>
<author><first>Muhammad Ravi</first><last>Shulthan Habibi</last><affiliation>University of Zurich</affiliation></author>
<author><first>Sina</first><last>Ahmadi</last><affiliation>Google</affiliation></author>
<author><first>Anthony</first><last>Munthali</last><affiliation>Google</affiliation></author>
<author><first>Jonathan Mingfei</first><last>Liu</last></author>
<author><first>Jonathan</first><last>Eng</last></author>
<author><first>Christopher</first><last>Homan</last></author>
<author><first>Mamadou K.</first><last>Keita</last><affiliation>Paair Institute</affiliation></author>
<author><first>Sudhamoy</first><last>DebBarma</last><affiliation>NIT, Arunachal Pradesh</affiliation></author>
<author><first>Ali</first><last>Kuzhuget</last><affiliation>tyvan.ru</affiliation></author>
<author><first>David</first><last>Anugraha</last><affiliation>Stanford University</affiliation></author>
<author><first>Muhammad Ravi</first><last>Shulthan Habibi</last><affiliation>Universitas Indonesia</affiliation></author>
<author><first>Sina</first><last>Ahmadi</last><affiliation>University of Zurich</affiliation></author>
<author><first>Anthony</first><last>Munthali</last></author>
<author><first>Jonathan Mingfei</first><last>Liu</last><affiliation>Google</affiliation></author>
<author><first>Jonathan</first><last>Eng</last><affiliation>Google</affiliation></author>
Contributor Author

Issue of metadata correction: #6448
2025.wmt-1.85
Paper page: https://aclanthology.org/2025.wmt-1.85/
Commit showing the bug effect: 53420bc

The bug only affected affiliations:

  • affiliations of two consecutive authors were swapped
  • affiliations were off by one
  • last two authors need to get back their affiliations from newly introduced authors before them
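
To review a shift like this without reading the diff line by line, the old and new `<author>` elements can be compared by name. The following is a rough sketch, not part of the Anthology tooling: the helper names are made up for illustration, the fragments are three authors copied from this hunk, and it assumes author names are unique within a paper.

```python
import xml.etree.ElementTree as ET


def affiliations(xml_fragment):
    """Map 'First Last' to the affiliation text (None if absent) per <author>."""
    # Wrap the fragment so it has a single root element.
    root = ET.fromstring(f"<paper>{xml_fragment}</paper>")
    result = {}
    for author in root.iter("author"):
        first = author.findtext("first", "")
        last = author.findtext("last", "")
        result[f"{first} {last}".strip()] = author.findtext("affiliation")
    return result


def changed_affiliations(old_fragment, new_fragment):
    """Return {name: (old_affiliation, new_affiliation)} for authors that differ."""
    old, new = affiliations(old_fragment), affiliations(new_fragment)
    return {
        name: (old.get(name), new[name])
        for name in new
        if old.get(name) != new[name]
    }


OLD = """
<author><first>Christopher</first><last>Homan</last><affiliation>Paair Institute</affiliation></author>
<author><first>Mamadou K.</first><last>Keita</last><affiliation>NIT, Arunachal Pradesh</affiliation></author>
<author><first>Sudhamoy</first><last>DebBarma</last><affiliation>tyvan.ru</affiliation></author>
"""

NEW = """
<author><first>Christopher</first><last>Homan</last></author>
<author><first>Mamadou K.</first><last>Keita</last><affiliation>Paair Institute</affiliation></author>
<author><first>Sudhamoy</first><last>DebBarma</last><affiliation>NIT, Arunachal Pradesh</affiliation></author>
"""

if __name__ == "__main__":
    for name, (before, after) in changed_affiliations(OLD, NEW).items():
        print(f"{name}: {before!r} -> {after!r}")
```

In the output, each author's new affiliation equals the previous author's old one, which is the off-by-one pattern described above.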

<pages>1103-1123</pages>
<abstract>We open-source SMOL (Set of Maximal Over-all Leverage), a suite of training data to un-lock machine translation for low-resource languages (LRLs). SMOL has been translated into123 under-resourced languages (125 language pairs), including many for which there exist no previous public resources, for a total of 6.1M translated tokens. SMOL comprises two sub-datasets, each carefully chosen for maximum impact given its size: SMOLSENT, a set of sentences chosen for broad unique token coverage, and SMOLDOC, a document-level source focusing on a broad topic coverage. They join the already released GATITOS for a trifecta of paragraph, sentence, and token-level content. We demonstrate that using SMOL to prompt or fine-tune Large Language Models yields robust chrF improvements. In addition to translation, we provide factuality ratings and rationales for all documents in SMOLDOC, yielding the first factuality datasets for most of these languages.</abstract>
<url hash="b069adea">2025.wmt-1.85</url>