4 changes: 2 additions & 2 deletions data/xml/2016.gwc.xml
@@ -113,7 +113,7 @@
<title><fixed-case>CILI</fixed-case>: the Collaborative Interlingual Index</title>
<author><first>Francis</first><last>Bond</last></author>
<author><first>Piek</first><last>Vossen</last></author>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<author><first>Christiane</first><last>Fellbaum</last></author>
<pages>50–57</pages>
<abstract>This paper introduces the motivation for and design of the Collaborative InterLingual Index (CILI). It is designed to enable coordination between multiple loosely coupled wordnet projects. The structure of the CILI is based on the InterLingual Index first proposed in the EuroWordNet project, with several pragmatic extensions: an explicit open license, definitions in English and links to wordnets in the Global Wordnet Grid.</abstract>
@@ -679,7 +679,7 @@
<title>Toward a truly multilingual <fixed-case>G</fixed-case>lobal<fixed-case>W</fixed-case>ordnet Grid</title>
<author><first>Piek</first><last>Vossen</last></author>
<author><first>Francis</first><last>Bond</last></author>
-<author><first>John</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<pages>424–431</pages>
<abstract>In this paper, we describe a new and improved Global Wordnet Grid that takes advantage of the Collaborative InterLingual Index (CILI). Currently, the Open Multilingual Wordnet has made many wordnets accessible as a single linked wordnet, but as it uses the Princeton Wordnet of English (PWN) as a pivot, it loses concepts that are not part of PWN. The technical solution to this, a central registry of concepts, as proposed in the EuroWordNet project through the InterLingual Index, has been known for many years. However, the practical issues of how to host this index and who decides what goes in remained unsolved. Inspired by current practice in the Semantic Web and the Linked Open Data community, we propose a way to solve this issue. In this paper we define the principles and protocols for contributing to the Grid. We tested them on two use cases, adding version 3.1 of the Princeton WordNet to a CILI based on 3.0 and adding the Open Dutch Wordnet, to validate the current setup. This paper aims to be a call for action that we hope will be further discussed and ultimately taken up by the whole wordnet community.</abstract>
<url hash="a32bf755">2016.gwc-1.59</url>
12 changes: 6 additions & 6 deletions data/xml/2018.gwc.xml
@@ -98,7 +98,7 @@
</paper>
<paper id="8">
<title>Mapping <fixed-case>W</fixed-case>ord<fixed-case>N</fixed-case>et Instances to <fixed-case>W</fixed-case>ikipedia</title>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<pages>61–68</pages>
<abstract>Lexical resources differ from encyclopaedic resources: the two represent distinct types of resource, covering general language and named entities respectively. However, many lexical resources, including Princeton WordNet, contain many proper nouns referring to named entities in the world, yet it is not possible or desirable for a lexical resource to cover all named entities that may reasonably occur in a text. In this paper, we propose that, instead of including synsets for instance concepts, PWN should provide links to Wikipedia articles describing the concept. In order to enable this, we have created a gold-quality mapping between all of the 7,742 instances in PWN and Wikipedia (where such a mapping is possible). As such, this resource aims to provide a gold standard for link discovery, while also allowing PWN to distinguish itself from other resources such as DBpedia or BabelNet. Moreover, this linking connects PWN to the Linguistic Linked Open Data cloud, thus creating a richer, more usable resource for natural language processing.</abstract>
<url hash="f49cf29b">2018.gwc-1.8</url>
@@ -121,7 +121,7 @@
<title>Improving Wordnets for Under-Resourced Languages Using Machine Translation</title>
<author><first>Bharathi Raja</first><last>Chakravarthi</last></author>
<author><first>Mihael</first><last>Arcan</last></author>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<pages>77–86</pages>
<abstract>Wordnets are extensively used in natural language processing, but the current approaches for manually building a wordnet from scratch involve large research groups working over a long period of time, which are typically not available for under-resourced languages. Even if wordnet-like resources are available for under-resourced languages, they are often not easily accessible, which can alter the results of applications using these resources. Our proposed method presents an expand approach for improving and generating wordnets with the help of machine translation. We apply our methods to improve and extend wordnets for the Dravidian languages, i.e., Tamil, Telugu and Kannada, which are severely under-resourced languages. We report evaluation results of the generated wordnet senses in terms of precision for these languages. In addition, we carried out a manual evaluation of the translations for the Tamil language, where we demonstrate that our approach can aid in improving wordnet resources for under-resourced Dravidian languages.</abstract>
<url hash="57927f05">2018.gwc-1.10</url>
@@ -459,8 +459,8 @@
</paper>
<paper id="40">
<title><fixed-case>ELEXIS</fixed-case> - a <fixed-case>E</fixed-case>uropean infrastructure fostering cooperation and information exchange among lexicographical research communities</title>
-<author><first>Bolette</first><last>Pedersen</last></author>
-<author><first>John</first><last>McCrae</last></author>
+<author><first>Bolette S.</first><last>Pedersen</last></author>
+<author id="john-philip-mccrae"><first>John</first><last>McCrae</last></author>
<author><first>Carole</first><last>Tiberius</last></author>
<author><first>Simon</first><last>Krek</last></author>
<pages>335–340</pages>
@@ -583,8 +583,8 @@
</paper>
<paper id="51">
<title>Towards a Crowd-Sourced <fixed-case>W</fixed-case>ord<fixed-case>N</fixed-case>et for Colloquial <fixed-case>E</fixed-case>nglish</title>
-<author><first>John P.</first><last>McCrae</last></author>
-<author><first>Ian</first><last>Wood</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
+<author><first>Ian D.</first><last>Wood</last></author>
<author><first>Amanda</first><last>Hicks</last></author>
<pages>401–406</pages>
<abstract>Princeton WordNet is one of the most widely-used resources for natural language processing, but is updated only infrequently and cannot keep up with the fast-changing usage of the English language on social media platforms such as Twitter. The Colloquial WordNet aims to provide an open platform whereby anyone can contribute, while still following the structure of WordNet. Crowd-sourced lexical resources often have significant quality issues, and as such care must be taken in the design of the interface to ensure quality. In this paper, we present the development of a platform that can be opened on the Web to any lexicographer who wishes to contribute to this resource, and the lexicographic methodology applied by this interface.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2019.gwc.xml
@@ -331,7 +331,7 @@
</paper>
<paper id="31">
<title><fixed-case>E</fixed-case>nglish <fixed-case>W</fixed-case>ord<fixed-case>N</fixed-case>et 2019 – An Open-Source <fixed-case>W</fixed-case>ord<fixed-case>N</fixed-case>et for <fixed-case>E</fixed-case>nglish</title>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<author><first>Alexandre</first><last>Rademaker</last></author>
<author><first>Francis</first><last>Bond</last></author>
<author><first>Ewa</first><last>Rudnicka</last></author>
2 changes: 1 addition & 1 deletion data/xml/2020.cogalex.xml
@@ -91,7 +91,7 @@
<paper id="8">
<title><fixed-case>C</fixed-case>og<fixed-case>AL</fixed-case>ex-<fixed-case>VI</fixed-case> Shared Task: Bidirectional Transformer based Identification of Semantic Relations</title>
<author><first>Saurav</first><last>Karmakar</last></author>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John</first><last>McCrae</last></author>
<pages>65–71</pages>
<abstract>This paper presents a bidirectional transformer based approach for recognising semantic relationships between a pair of words, as proposed by the CogALex VI shared task in 2020. The system works by employing BERT embeddings of the words and passing them through a tuned neural network to produce a learning model for the pair of words and their relationship. Afterwards, the same model is used to predict the relationship between unknown word pairs from the test set. CogALex VI's Subtask 1 is the identification of three specific categories of relationship between English word pairs, and the presented system addresses this subtask. The resulting relationships for the unknown word pairs are analysed here, showing balanced overall performance with some scope for improvement.</abstract>
<url hash="560a438b">2020.cogalex-1.8</url>
4 changes: 2 additions & 2 deletions data/xml/2020.coling.xml
@@ -1724,7 +1724,7 @@
<author><first>Rajdeep</first><last>Sarkar</last></author>
<author><first>Bharathi Raja</first><last>Chakravarthi</last></author>
<author><first>Theodorus</first><last>Fransen</last></author>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<pages>1606–1617</pages>
<abstract>Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines. Language identification is considered a solved task in many cases; however, in the case of very closely related languages, or in an unsupervised scenario (where the languages are not known in advance), performance is still poor. In this paper, we propose the Unsupervised Deep Language and Dialect Identification (UDLDI) method, which can simultaneously learn sentence embeddings and cluster assignments from short texts. The UDLDI model understands the sentence constructions of languages by applying attention to character relations, which helps to optimize the clustering of languages. We have performed our experiments on three short-text datasets for different language families, each consisting of closely related languages or dialects, with very minimal training sets. Our experimental evaluations on these datasets have shown significant improvement over state-of-the-art unsupervised methods, and our model has outperformed state-of-the-art LI and DI systems in supervised settings.</abstract>
<url hash="5a84e94b">2020.coling-main.141</url>
@@ -4467,7 +4467,7 @@
<author><first>Rajdeep</first><last>Sarkar</last></author>
<author><first>Koustava</first><last>Goswami</last></author>
<author><first>Mihael</first><last>Arcan</last></author>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<pages>4179–4189</pages>
<abstract>Conversational recommender systems focus on the task of suggesting products to users based on the conversation flow. Recently, the use of external knowledge in the form of knowledge graphs has been shown to improve performance in recommendation and dialogue systems. Information from knowledge graphs aids in enriching those systems by providing additional information such as closely related products and textual descriptions of the items. However, knowledge graphs are incomplete since they do not contain all factual information present on the web. Furthermore, when working in a specific domain, knowledge graphs in their entirety contribute extraneous information and noise. In this work, we study several subgraph construction methods and compare their performance across the recommendation task. We incorporate pre-trained embeddings from the subgraphs along with positional embeddings in our models. Extensive experiments show that our method has a relative improvement of at least 5.62% compared to the state-of-the-art on multiple metrics on the recommendation task.</abstract>
<url hash="1d52f878">2020.coling-main.369</url>
2 changes: 1 addition & 1 deletion data/xml/2020.figlang.xml
@@ -284,7 +284,7 @@
<paper id="22">
<title>Adaptation of Word-Level Benchmark Datasets for Relation-Level Metaphor Identification</title>
<author><first>Omnia</first><last>Zayed</last></author>
-<author><first>John Philip</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<author><first>Paul</first><last>Buitelaar</last></author>
<pages>154–164</pages>
<abstract>Metaphor processing and understanding have attracted the attention of many researchers recently, with an increasing number of computational approaches. A common factor among these approaches is utilising existing benchmark datasets for evaluation and comparisons. The availability, quality and size of the annotated data are among the main difficulties facing the growing research area of metaphor processing. The majority of current approaches pertaining to metaphor processing concentrate on word-level processing due to data availability. On the other hand, approaches that process metaphors on the relation level ignore the context where the metaphoric expression occurs, due to the nature and format of the available data. Word-level annotation is poorly grounded theoretically and is harder to use in downstream tasks such as metaphor interpretation. The conversion from word-level to relation-level annotation is non-trivial. In this work, we attempt to fill this research gap by adapting three benchmark datasets, namely the VU Amsterdam metaphor corpus, the TroFi dataset and the TSV dataset, to suit relation-level metaphor identification. We publish the adapted datasets to facilitate future research in relation-level metaphor processing.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2020.findings.xml
@@ -467,7 +467,7 @@
<paper id="36">
<title>Contextual Modulation for Relation-Level Metaphor Identification</title>
<author><first>Omnia</first><last>Zayed</last></author>
-<author><first>John P.</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<author><first>Paul</first><last>Buitelaar</last></author>
<pages>388–406</pages>
<abstract>Identifying metaphors in text is very challenging and requires comprehending the underlying comparison. The automation of this cognitive process has gained wide attention lately. However, the majority of existing approaches concentrate on word-level identification by treating the task as either single-word classification or sequential labelling without explicitly modelling the interaction between the metaphor components. On the other hand, while existing relation-level approaches implicitly model this interaction, they ignore the context where the metaphor occurs. In this work, we address these limitations by introducing a novel architecture for identifying relation-level metaphoric expressions of certain grammatical relations based on contextual modulation. In a methodology inspired by works in visual reasoning, our approach is based on conditioning the neural network computation on the deep contextualised features of the candidate expressions using feature-wise linear modulation. We demonstrate that the proposed architecture achieves state-of-the-art results on benchmark datasets. The proposed methodology is generic and could be applied to other textual classification problems that benefit from contextual interaction.</abstract>
6 changes: 3 additions & 3 deletions data/xml/2020.globalex.xml
@@ -5,7 +5,7 @@
<booktitle>Proceedings of the 2020 Globalex Workshop on Linked Lexicography</booktitle>
<editor><first>Ilan</first><last>Kernerman</last></editor>
<editor><first>Simon</first><last>Krek</last></editor>
-<editor><first>John P.</first><last>McCrae</last></editor>
+<editor id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></editor>
<editor><first>Jorge</first><last>Gracia</last></editor>
<editor><first>Sina</first><last>Ahmadi</last></editor>
<editor><first>Besim</first><last>Kabashi</last></editor>
@@ -29,7 +29,7 @@
<author><first>Anas Fahad</first><last>Khan</last></author>
<author><first>Sander</first><last>Stolk</last></author>
<author><first>Thierry</first><last>Declerck</last></author>
-<author><first>John Philip</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<pages>1–9</pages>
<abstract>The OntoLex vocabulary enjoys increasing popularity as a means of publishing lexical resources with RDF and as Linked Data. The recent publication of a new OntoLex module for lexicography, lexicog, reflects its increasing importance for digital lexicography. However, not all aspects of digital lexicography have been covered to the same extent. In particular, supplementary information drawn from corpora such as frequency information, links to attestations, and collocation data were considered to be beyond the scope of lexicog. Therefore, the OntoLex community has put forward a proposal for a novel module for frequency, attestation and corpus information (FrAC), which not only covers the requirements of digital lexicography, but also accommodates essential data structures for lexical information in natural language processing. This paper introduces the current state of the OntoLex-FrAC vocabulary, describes its structure, some selected use cases, elementary concepts and fundamental definitions, with a focus on frequency and attestations.</abstract>
<url hash="3d238906">2020.globalex-1.1</url>
@@ -172,7 +172,7 @@
</paper>
<paper id="15">
<title><fixed-case>NUIG</fixed-case> at <fixed-case>TIAD</fixed-case>: Combining Unsupervised <fixed-case>NLP</fixed-case> and Graph Metrics for Translation Inference</title>
-<author><first>John Philip</first><last>McCrae</last></author>
+<author id="john-philip-mccrae"><first>John P.</first><last>McCrae</last></author>
<author><first>Mihael</first><last>Arcan</last></author>
<pages>92–97</pages>
<abstract>In this paper, we present the NUIG system at the TIAD shared task. This system includes graph-based metrics calculated using novel algorithms, with an unsupervised document embedding tool called ONETA and an unsupervised multi-way neural machine translation method. The results are an improvement over our previous system and produce the highest precision among all systems in the task, as well as very competitive F-Measure results. Incorporating features from other systems should be easy in the framework we describe in this paper, suggesting this could very easily be extended to an even stronger result.</abstract>