@weissenh
Contributor

@weissenh weissenh commented Dec 4, 2025

Recording ORCID for this author in name_variants.yaml together with degree institution.

  • merge has already been done by metadata corrections: no need to record "Rifki Putri" as name variant?
  • ORCID found in XML and mentioned by issue submitter
  • degree found on orcid.org, homepage linked via issue submitter and supporting evidence from matching affiliation in XML
  • "Afina" is placed in the first name, as this is how it is done in the XML data


Closes #4082

Evidence that all 9 papers belong to same author (no namesake)

  • 5 papers come with an `orcid` attribute in the XML
  • 6 papers mention KAIST as affiliation in the XML
  • the remaining ones list an affiliation in the PDF that is consistent with the homepage/orcid.org (recently changed to Universitas Gadjah Mada)

- orcid found in XML and mentioned by issue submitter
- degree found on orcid.org, homepage and supporting evidence from matching affiliation in XML
@weissenh weissenh added this to the Author page backlog milestone Dec 4, 2025
@weissenh weissenh self-assigned this Dec 4, 2025
@github-actions

github-actions bot commented Dec 4, 2025

Build successful.

This preview will be removed when the branch is merged.

@weissenh weissenh requested a review from Azax4 December 7, 2025 12:43
@Azax4
Collaborator

Azax4 commented Dec 8, 2025

Hi @mjpost, did we decide to add an entry into name_variants just to store the ORCID and affiliation? Last I remember, we decided against it, but it makes sense to do so if we are planning to move to the new system soon.

The only drawback I can think of is we might end up misassigning papers for new authors with the same canonical name as an existing author.

If we are planning to do a one-time backfill using ORCIDs, that should also resolve the issue automatically when we do that, and we wouldn't need to add a separate entry.

If we want to store this information without affecting the current pipelines, maybe we can add this information to another file?

@weissenh
Contributor Author

weissenh commented Dec 8, 2025

[Azax4] did we decide to add an entry into name_variants just to store the ORCID and affiliation? Last I remember, we decided against it, but it makes sense to do so if we are planning to move to the new system soon.

Example of one author page request that was kept open with @mbollmann stating as reason on Oct 7:

[mbollmann] I will keep this issue open anyway so that we can record your ORCID, which will help reduce these problems in the future.

I believe there is another statement somewhere (I haven't found it on the spot) stating that the goal is to collect as many ORCIDs as possible. We have already started adding them anyway, so is there a reason to stop temporarily (except for the short period where we need to merge the transition branches to move to people.yaml fully)? Or would it help you, @mbollmann, to stop recording ORCIDs, as you seem to be struggling with ORCID-related problems right now that hinder the transition process? (cf. #5471 (comment))

@weissenh
Contributor Author

weissenh commented Dec 8, 2025

[Azax4] The only drawback I can think of is we might end up misassigning papers for new authors with the same canonical name as an existing author.

In an open PR I authored for another author page request, I voiced the same worry for a slightly different case (2 namesakes known): that papers newly ingested before the transition to the new representation might get assigned to the wrong person. @mbollmann responded:

  • I don’t think we should base any decisions on how the old system works.

I'd like to add that for this particular name (Rifki Afina Putri) I consider it rather unlikely that a namesake appears between now and the time the new author representation is live (which hopefully is not too far in the future, but no pressure from my side!).

@weissenh
Contributor Author

weissenh commented Dec 8, 2025

[Azax4] If we are planning to do a one-time backfill using ORCIDs, that should also resolve the issue automatically when we do that, and we wouldn't need to add a separate entry.

Not sure what you're referring to with "this information", but if you haven't seen it already, you might also read the last few comments towards the end of the cancelled "Script to transition metadata to new author representation" PR.

I don't think backfilling ORCIDs would bring us a degree institution for any author for free. The ORCID is already found once in the XML, so, if I recall correctly, the transition logic would assign all papers of Rifki Afina Putri the same id (which in turn is coupled with an ORCID in the new people.yaml). As far as I know, this is so that things do not break for most authors, whose names are not ambiguous at all and who will most likely not have every one of their papers tagged with an ORCID even if we backfilled ORCIDs (cf. https://github.com/acl-org/acl-anthology/wiki/Author-Page-Plan#proposed-name-resolution-logic ).
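As a rough illustration of the behaviour described in this comment (not the actual transition script, and not the full logic from the linked wiki page), the idea can be sketched as follows. The `Paper` class, the `assign_author_ids` function, the slug-style id, and the placeholder ORCID are all hypothetical; the only assumption carried over from the discussion is that an unambiguous name maps to a single id, and that an ORCID seen on any one of that name's papers is attached to that id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    paper_id: str                 # hypothetical identifier, for illustration only
    author_name: str              # name as it appears in the XML
    orcid: Optional[str] = None   # ORCID attribute, if the XML carries one


def assign_author_ids(papers, known_namesakes=frozenset()):
    """Map each paper_id to an (author_id, orcid) pair.

    Assumption: a name with no known namesakes resolves to one person,
    so an ORCID found on any single paper applies to all papers under
    that name. Names with known namesakes are left unresolved here.
    """
    # First pass: remember the first ORCID seen for each unambiguous name.
    orcid_by_name = {}
    for paper in papers:
        if paper.orcid and paper.author_name not in known_namesakes:
            orcid_by_name.setdefault(paper.author_name, paper.orcid)

    # Second pass: give every paper under an unambiguous name the same id.
    assignments = {}
    for paper in papers:
        if paper.author_name in known_namesakes:
            # Ambiguous name: no id is assigned in this sketch.
            assignments[paper.paper_id] = (None, paper.orcid)
        else:
            author_id = paper.author_name.lower().replace(" ", "-")
            assignments[paper.paper_id] = (
                author_id,
                orcid_by_name.get(paper.author_name),
            )
    return assignments
```

Under these assumptions, a paper without an ORCID attribute still ends up on the same id as the tagged papers, which is why recording the ORCID separately would not change the outcome for a name without namesakes.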

When deciding which author page request to deal with next, however, I can see the point of postponing all open requests that will be solved (or made a lot easier) by the transition to the ORCID-based system, for example merging two author pages with different name variants when papers under both variants are tagged with an ORCID and there are no namesakes.

@mbollmann
Member

Example of one author page request that was kept open with @mbollmann stating as reason on Oct 7:

[mbollmann] I will keep this issue open anyway so that we can record your ORCID, which will help reduce these problems in the future.

We didn’t have any ORCIDs in the XML yet at that point. If the person’s ORCID is in the XML, and there are no name variants to record, adding them to name_variants.yaml now will make no difference at all for the transition.

@mbollmann
Member

I don't think backfilling ORCIDs would bring us a degree institution for any author for free.

That is correct, but also, we’re only using that information for disambiguation purposes, mainly for the author ID. If there are no similarly-named authors, I don’t really feel strongly about adding this kind of information.

@mjpost
Member

mjpost commented Dec 13, 2025

Our goal is to collect ORCIDs and explicitly identify as many people as possible. Therefore, if someone provides one, we should create the entry in name_variants and secure their ID.

@mjpost mjpost merged commit 15f2189 into master Dec 13, 2025
4 checks passed
@mjpost mjpost deleted the author-page-rifki-afina-putri branch December 13, 2025 15:38

Development

Successfully merging this pull request may close these issues.

Author Metadata: Rifki Afina Putri
