4 changes: 2 additions & 2 deletions data/xml/2019.iwslt.xml
@@ -52,7 +52,7 @@
<author><first>Thai-Son</first><last>Nguyen</last></author>
<author><first>Thanh-Le</first><last>Ha</last></author>
<author><first>Juan</first><last>Hussain</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Jan</first><last>Niehues</last></author>
<author><first>Sebastian</first><last>Stüker</last></author>
<author><first>Alexander</first><last>Waibel</last></author>
@@ -164,7 +164,7 @@
</paper>
<paper id="13">
<title><fixed-case>KIT</fixed-case>’s Submission to the <fixed-case>IWSLT</fixed-case> 2019 Shared Task on Text Translation</title>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Alex</first><last>Waibel</last></author>
<abstract>In this paper, we describe KIT’s submission for the IWSLT 2019 shared task on text translation. Our system is based on the transformer model [1] using our in-house implementation. We augment the available training data using back-translation and employ fine-tuning for the final model. For our best results, we used a 12-layer transformer-big config- uration, achieving state-of-the-art results on the WMT2018 test set. We also experiment with student-teacher models to improve performance of smaller models.</abstract>
<url hash="a61bab92">2019.iwslt-1.13</url>
2 changes: 1 addition & 1 deletion data/xml/2020.eamt.xml
@@ -572,7 +572,7 @@
<author><first>Chiara</first><last>Canton</last></author>
<author><first>Ivan</first><last>Simonini</last></author>
<author><first>Thai-Son</first><last>Nguyen</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Sebastian</first><last>Stücker</last></author>
<author><first>Alex</first><last>Waibel</last></author>
<author><first>Barry</first><last>Haddow</last></author>
2 changes: 1 addition & 1 deletion data/xml/2020.iwltp.xml
@@ -139,7 +139,7 @@
<author><first>Adelheid</first><last>Glott</last></author>
<author><first>Sebastian</first><last>Stüker</last></author>
<author><first>Thai-Son</first><last>Nguyen</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Thanh-Le</first><last>Ha</last></author>
<author><first>Alex</first><last>Waibel</last></author>
<author><first>Barry</first><last>Haddow</last></author>
6 changes: 3 additions & 3 deletions data/xml/2020.iwslt.xml
@@ -93,7 +93,7 @@
<paper id="4">
<title><fixed-case>KIT</fixed-case>’s <fixed-case>IWSLT</fixed-case> 2020 <fixed-case>SLT</fixed-case> Translation System</title>
<author><first>Ngoc-Quan</first><last>Pham</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Tuan-Nam</first><last>Nguyen</last></author>
<author><first>Thanh-Le</first><last>Ha</last></author>
<author><first>Thai Son</first><last>Nguyen</last></author>
@@ -377,7 +377,7 @@
<author><first>Matúš</first><last>Žilinec</last></author>
<author><first>Ondřej</first><last>Bojar</last></author>
<author><first>Thai-Son</first><last>Nguyen</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Philip</first><last>Williams</last></author>
<author><first>Yuekun</first><last>Yao</last></author>
<pages>200–208</pages>
@@ -414,7 +414,7 @@
</paper>
<paper id="28">
<title>Towards Stream Translation: Adaptive Computation Time for Simultaneous Machine Translation</title>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Alexander</first><last>Waibel</last></author>
<pages>228–236</pages>
<abstract>Simultaneous machine translation systems rely on a policy to schedule read and write operations in order to begin translating a source sentence before it is complete. In this paper, we demonstrate the use of Adaptive Computation Time (ACT) as an adaptive, learned policy for simultaneous machine translation using the transformer model and as a more numerically stable alternative to Monotonic Infinite Lookback Attention (MILk). We achieve state-of-the-art results in terms of latency-quality tradeoffs. We also propose a method to use our model on unsegmented input, i.e. without sentence boundaries, simulating the condition of translating output from automatic speech recognition. We present first benchmark results on this task.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2021.eacl.xml
@@ -4363,7 +4363,7 @@
<author><first>Chiara</first><last>Canton</last></author>
<author><first>Ivan</first><last>Simonini</last></author>
<author><first>Thai-Son</first><last>Nguyen</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Sebastian</first><last>Stüker</last></author>
<author><first>Alex</first><last>Waibel</last></author>
<author><first>Barry</first><last>Haddow</last></author>
8 changes: 5 additions & 3 deletions data/xml/2021.iwslt.xml
@@ -200,13 +200,15 @@
</paper>
<paper id="13">
<title><fixed-case>KIT</fixed-case>’s <fixed-case>IWSLT</fixed-case> 2021 Offline Speech Translation System</title>
<author><first>Tuan Nam</first><last>Nguyen</last></author>
<author><first>Thai Son</first><last>Nguyen</last></author>
<author><first>Tuan-Nam</first><last>Nguyen</last></author>
<author><first>Thai-Son</first><last>Nguyen</last></author>
<author><first>Christian</first><last>Huber</last></author>
<author><first>Maximilian</first><last>Awiszus</last></author>
<author><first>Ngoc-Quan</first><last>Pham</last></author>
<author><first>Thanh-Le</first><last>Ha</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Sebastian</first><last>Stüker</last></author>
<author><first>Alexander</first><last>Waibel</last></author>
<pages>125–130</pages>
<abstract>This paper describes KIT’submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both cascaded condition and end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, our last year’s neural machine translation model was reused. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.</abstract>
<url hash="dac417e1">2021.iwslt-1.13</url>
Contributor Author:
I noticed one paper was missing two authors in its metadata compared to the paper itself, so I added them, and I also added missing hyphens to two other co-author names for the same paper. Compare:

2 changes: 1 addition & 1 deletion data/xml/2021.latechclfl.xml
@@ -144,7 +144,7 @@
</paper>
<paper id="11">
<title>Data-Driven Detection of General Chiasmi Using Lexical and Semantic Features</title>
<author><first>Felix</first><last>Schneider</last><affiliation>Friedrich Schiller University, Jena</affiliation></author>
<author id="felix-schneider-fsujena"><first>Felix</first><last>Schneider</last><affiliation>Friedrich Schiller University, Jena</affiliation></author>
<author><first>Phillip</first><last>Brandes</last></author>
<author><first>Björn</first><last>Barz</last></author>
<author><first>Sophie</first><last>Marshall</last></author>
2 changes: 1 addition & 1 deletion data/xml/2021.mtsummit.xml
@@ -321,7 +321,7 @@ Our models outperform massively multilingual models such as Google (<tex-math>+8
<author><first>Vojtěch</first><last>Srdečný</last></author>
<author><first>Rishu</first><last>Kumar</last></author>
<author><first>Otakar</first><last>Smrž</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Barry</first><last>Haddow</last></author>
<author><first>Phil</first><last>Williams</last></author>
<author><first>Chiara</first><last>Canton</last></author>
2 changes: 1 addition & 1 deletion data/xml/2022.mwe.xml
@@ -118,7 +118,7 @@
</paper>
<paper id="11">
<title>Metaphor Detection for Low Resource Languages: From Zero-Shot to Few-Shot Learning in <fixed-case>M</fixed-case>iddle <fixed-case>H</fixed-case>igh <fixed-case>G</fixed-case>erman</title>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-fsujena"><first>Felix</first><last>Schneider</last></author>
<author><first>Sven</first><last>Sickert</last></author>
<author><first>Phillip</first><last>Brandes</last></author>
<author><first>Sophie</first><last>Marshall</last></author>
Contributor Author:
See issue 4345: the user originally asked about this in a (still open) metadata correction issue that we might want to close. He tried to add an affiliation to one of his namesake’s papers, hoping to disambiguate that way. Should we close the open metadata correction issue, or do we ever ingest affiliations via metadata corrections?

Member:
I’m a bit on the fence on this one. I think the reason we record affiliations is that we sometimes get this data in ingestion materials anyway. However, we don’t currently use it for anything, or plan to, and we definitely don’t want to encourage users to submit metadata requests for this reason. So actually, I guess I’m tending towards “no”. :)

Member:
We do not use affiliations for matching. We collect them in the paper XML only for completeness, since they are sometimes provided to us, as @mbollmann mentions. If someone submits one, it doesn’t hurt to accept it, but we don’t want to encourage it. We even had a field exposing affiliations in the correction dialog and removed it.

This makes me wonder if we should just remove the affiliation field. A question for another day.

2 changes: 1 addition & 1 deletion data/xml/2022.sigul.xml
@@ -188,7 +188,7 @@
<paper id="17">
<title>Machine Translation from <fixed-case>S</fixed-case>tandard <fixed-case>G</fixed-case>erman to Alemannic Dialects</title>
<author><first>Louisa</first><last>Lambrecht</last></author>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Alexander</first><last>Waibel</last></author>
<pages>129–136</pages>
<abstract>Machine translation has been researched using deep neural networks in recent years. These networks require lots of data to learn abstract representations of the input stored in continuous vectors. Dialect translation has become more important since the advent of social media. In particular, when dialect speakers and standard language speakers no longer understand each other, machine translation is of rising concern. Usually, dialect translation is a typical low-resourced language setting facing data scarcity problems. Additionally, spelling inconsistencies due to varying pronunciations and the lack of spelling rules complicate translation. This paper presents the best-performing approaches to handle these problems for Alemannic dialects. The results show that back-translation and conditioning on dialectal manifestations achieve the most remarkable enhancement over the baseline. Using back-translation, a significant gain of +4.5 over the strong transformer baseline of 37.3 BLEU points is accomplished. Differentiating between several Alemannic dialects instead of treating Alemannic as one dialect leads to substantial improvements: Multi-dialectal translation surpasses the baseline on the dialectal test sets. However, training individual models outperforms the multi-dialectal approach. There, improvements range from 7.5 to 10.6 BLEU points over the baseline depending on the dialect.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2023.inlg.xml
@@ -698,7 +698,7 @@
</paper>
<paper id="14">
<title>Team Zoom @ <fixed-case>A</fixed-case>uto<fixed-case>M</fixed-case>in 2023: Utilizing Topic Segmentation And <fixed-case>LLM</fixed-case> Data Augmentation For Long-Form Meeting Summarization</title>
<author><first>Felix</first><last>Schneider</last></author>
<author id="felix-schneider-kit"><first>Felix</first><last>Schneider</last></author>
<author><first>Marco</first><last>Turchi</last></author>
<pages>101–107</pages>
<abstract>This paper describes Zoom’s submission to the Second Shared Task on Automatic Minuting at INLG 2023. We participated in Task A: generating abstractive summaries of meetings. Our final submission was a transformer model utilizing data from a similar domain and data augmentation by large language models, as well as content-based segmentation. The model produces summaries covering meeting topics and next steps and performs comparably to a large language model at a fraction of the cost. We also find that re-summarizing the summaries with the same model allows for an alternative, shorter summary.</abstract>
10 changes: 10 additions & 0 deletions data/yaml/name_variants.yaml
@@ -8952,6 +8952,16 @@
   - {first: Julian, last: Schlöder}
 - canonical: {first: Laurent, last: Schmitt}
   id: laurent-schmitt
+- canonical: {first: Felix, last: Schneider}
+  id: felix-schneider-fsujena
+  orcid: 0009-0008-9953-6695
+  degree: Friedrich-Schiller Universität Jena
+  comment: Uni Jena
+- canonical: {first: Felix, last: Schneider}
+  id: felix-schneider-kit
+  orcid: 0009-0006-5226-3023
+  degree: Karlsruhe Institute of Technology
+  comment: KIT
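With both entries in place, the bare name “Felix Schneider” now maps to two IDs, which is why each paper above needs an explicit id attribute. A minimal lookup sketch (assuming PyYAML; `ids_for` is a hypothetical helper, not the Anthology’s actual ingestion code):

```python
import yaml  # assumes PyYAML is installed

with open("data/yaml/name_variants.yaml", encoding="utf-8") as f:
    entries = yaml.safe_load(f)

def ids_for(first, last):
    """Collect every explicit ID registered for a given canonical name."""
    return [e["id"] for e in entries
            if e.get("canonical") == {"first": first, "last": last} and "id" in e]

matches = ids_for("Felix", "Schneider")
if len(matches) > 1:
    # Ambiguous: the XML has to say which person is meant, which is
    # exactly what the <author id="..."> attributes added in this PR do.
    print("ambiguous:", matches)
    # -> ambiguous: ['felix-schneider-fsujena', 'felix-schneider-kit']
```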
Contributor Author (@weissenh, Nov 19, 2025):
Should this person have id: felix-schneider without -kit?

Pro:

  • the user doesn’t have to change the link on their OpenReview profile
  • most papers belong to this user (12/14)
  • he is the issue submitter, so under “first come, first served” he could reserve the “default” ID

Con:

  • new papers (before the new author system is live) will likely get added to this person - someone who has already complained about papers showing up under their name that are not theirs...

Member:
> Should this person have id: felix-schneider without -kit?

I’d say “probably yes”, but I don’t fully remember if we ever made a final decision on our future ID policy; it doesn’t appear to be written down in the wiki, at least? @mjpost

> Con:
>
> • new papers (before the new author system is live) will likely get added to this person - someone who has already complained about papers showing up under their name that are not theirs...

  • I don’t think that’s correct: as soon as the name is ambiguous, manual intervention is required during ingestion; there’s no “default association” in that case.
  • I don’t think we should base any decisions on how the old system works.

Contributor Author:
> but I don’t fully remember if we ever made a final decision on our future ID policy, it doesn’t appear to be written down in the wiki at least?

From the author page plan: https://github.com/acl-org/acl-anthology/wiki/Author-Page-Plan#disambiguation (last sentence before the next section)

> This means that the first person to have an explicit ID created for their name will "lock in" that ID (e.g. yang-liu) to themselves, while other persons with the same name will need a disambiguator appended to it.

So I thought that maybe, since the KIT person was the first to ask, he could reserve this ID for himself? Normally, when dealing with author page requests right now, I need to reserve the simplest ID for the catch-all “May refer to several persons” case, because I can’t always fully disambiguate the name but can only single out one author from “the rest”. So right now the first person to ask often gets a more complicated ID - unless I can assign each paper to a specific person, as in this case.

> I don’t think that’s correct: as soon as the name is ambiguous, manual intervention is required during ingestion; there’s no “default association” in that case.

Interesting: there are some recent author page requests where, for an ambiguous name, a new paper got assigned to the catch-all (“May refer to several persons”) rather than to an existing, more specific ID (with the degree institution as a suffix), because ORCID matching isn’t enabled yet. However, I didn’t check when the new paper was ingested or what the ingestion script looked like at that point in time.
So I assumed that if there is a new “Felix Schneider” paper and a felix-schneider ID exists, that paper will get mapped to this ID, even when there is another “Felix Schneider” in the name variants. I agree that one shouldn’t rely too much on the old system’s logic when a new system is under way.

Member:
> So I thought that maybe, since the KIT person was the first to ask, he could reserve this ID for himself?

There’s definitely lots of discussion on this exact topic buried in the new-author-system mega-thread, which is why I pinged @mjpost in the hope that he remembers whether we took a decision on that :) (I don’t have time to dig it up right now.)

> I don’t think that’s correct: as soon as the name is ambiguous, manual intervention is required during ingestion; there’s no “default association” in that case.

> Interesting: there are some recent author page requests where, for an ambiguous name, a new paper got assigned to the catch-all (“May refer to several persons”) rather than to an existing, more specific ID (with the degree institution as a suffix), because ORCID matching isn’t enabled yet.

I don’t know the ingestion scripts super well either, but what I meant is that under the old system, IDs do not need to get written to the XML (by default) except in ambiguous cases, so when there’s ambiguity, some decision needs to be taken about which ID to choose. It may be that we used to default to the “catch-all” ID when there was no time to disambiguate manually. In any case, that’s the old system; let’s move on with the assumption that the new system will be in place for the next major ingestion.

Member:
Ack, sorry I missed this. This is my understanding: the first person to request an ID can claim it. As long as we have the ORCID iD, we can match in the future.
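
A minimal sketch of this “first come, first served” scheme (assumed logic only, not the Anthology’s actual tooling; `propose_id` and its parameters are hypothetical):

```python
import itertools
import re

def propose_id(first, last, taken, disambiguator=None):
    """The first requester locks in the bare slug (e.g. "yang-liu");
    later namesakes get a disambiguator appended (e.g. "-kit")."""
    base = re.sub(r"[^a-z0-9]+", "-", f"{first} {last}".lower()).strip("-")
    if base not in taken:
        return base  # first come, first served
    if disambiguator is not None and f"{base}-{disambiguator}" not in taken:
        return f"{base}-{disambiguator}"
    # Fall back to numeric suffixes if the institutional suffix is taken too.
    return next(f"{base}-{n}" for n in itertools.count(2)
                if f"{base}-{n}" not in taken)

taken = {"felix-schneider"}  # hypothetical: the bare ID is already claimed
print(propose_id("Felix", "Schneider", taken, "kit"))  # -> felix-schneider-kit
```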

Member:
So yes, I would recommend that we keep one person as the base case.

 - canonical: {first: René, last: Schneider}
   variants:
   - {first: Rene, last: Schneider}