search_taxa not handling cases where a taxa is flagged as having a homonym issue #200

Open
wcornwell opened this issue Jul 7, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@wcornwell

wcornwell commented Jul 7, 2023

Describe the bug
When one of the taxa has a homonym issue, search_taxa does not use the tibble input to return the correct taxa_id. The help file suggests that a tibble input is the right approach for this case, but it is not working for me.

galah version
1.5.2

To Reproduce

library(galah)
library(tibble)

search_taxa(tibble(genus = "Acanthocladium", class = "Equisetopsida"))

Expected behaviour
It should return the taxa_id for "Acanthocladium", which is the current name for a small daisy genus. The homonym issue is with a moss genus that was formerly also called "Acanthocladium".

I expected that including tibble(genus="Acanthocladium", class="Equisetopsida") would resolve the homonym issue and return the correct taxa_id.

Instead of the daisy genus, search_taxa returns the taxa_id for Equisetopsida, which leads to a large query that then crashes the API.

Apologies about the crashes, it took me a while to work out what was going on.

Additional context
This is related to #168 and #194

@wcornwell wcornwell added the bug Something isn't working label Jul 7, 2023
@wcornwell wcornwell changed the title search_taxa has search_taxa not handling cases where a taxa is flagged as having a homonym issue Jul 7, 2023
@daxkellie
Collaborator

Thanks for reaching out. I was able to replicate this error and there does appear to be something wrong with how search_taxa() prioritises higher rank information supplied in a tibble.

At this point, I'm not sure why this is, but I first wanted to offer a workaround:

Adding extra information such as authorship to your search can help return the correct result. On the ALA, the authorship of this name is F.Muell., and adding it to the text search returns the correct result:

library(galah)
library(tibble)

search_taxa("Acanthocladium F.Muell")
#> # A tibble: 1 × 13
#>   search_term      scientific_name scientific_name_auth…¹ taxon_concept_id rank 
#>   <chr>            <chr>           <chr>                  <chr>            <chr>
#> 1 Acanthocladium … Acanthocladium  F.Muell.               https://id.biod… genus
#> # ℹ abbreviated name: ¹​scientific_name_authorship
#> # ℹ 8 more variables: match_type <chr>, kingdom <chr>, phylum <chr>,
#> #   class <chr>, order <chr>, family <chr>, genus <chr>, issues <chr>

This also seems to return a suitably small count when used in a query:

taxa <- search_taxa("Acanthocladium F.Muell")

galah_call() |>
  identify(taxa) |>
  atlas_counts()
#> # A tibble: 1 × 1
#>   count
#>   <int>
#> 1   128
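
Because a failed match can silently resolve to a much higher taxon, it may be worth checking the result before passing it to identify(). The following is a minimal sketch, assuming the search_taxa() result has the scientific_name and rank columns shown in the output above. check_taxa_match() is a hypothetical helper, not part of galah.

```r
# Hypothetical helper (not part of galah): stop if the match returned by
# search_taxa() is not the single taxon that was asked for.
check_taxa_match <- function(taxa, expected_name, expected_rank) {
  ok <- nrow(taxa) == 1 &&
    identical(as.character(taxa$scientific_name), expected_name) &&
    identical(as.character(taxa$rank), expected_rank)
  if (!ok) {
    stop(
      "search_taxa() matched '", paste(taxa$scientific_name, collapse = ", "),
      "' (rank: ", paste(taxa$rank, collapse = ", "),
      "), not the ", expected_rank, " '", expected_name, "'.",
      call. = FALSE
    )
  }
  # Return the input invisibly so the check can sit inside a pipe.
  invisible(taxa)
}

# Intended usage (requires galah and network access):
# taxa <- search_taxa("Acanthocladium F.Muell")
# check_taxa_match(taxa, "Acanthocladium", "genus")
```

With a guard like this, a result that resolved to the class Equisetopsida would raise an error instead of launching a very large query.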

@wcornwell
Author

Great! thanks for the workaround!

@mjwestgate
Collaborator

A couple of points to confirm this problem. First, taxonomic search through BIE shows that the supplied hierarchy is present in ALA:
https://bie.ala.org.au/species/https://id.biodiversity.org.au/node/apni/2910163#classification

Second, galah is constructing the underlying API call correctly. Even when the requested taxonomic rank is passed to the API, a higher-taxon match is returned:
https://api.ala.org.au/namematching/api/searchByClassification?class=Equisetopsida&genus=Acanthocladium&rank=genus

{
  "success": true,
  "scientificName": "Equisetopsida",
  "scientificNameAuthorship": "C.Agardh",
  "taxonConceptID": "https://id.biodiversity.org.au/taxon/apni/51744350",
  "rank": "class",
  "rankID": 3000,
  "lft": 529696,
  "rgt": 627424,
  "matchType": "higherMatch",
  "nameType": "SCIENTIFIC",
  "kingdom": "Plantae",
  "kingdomID": "https://id.biodiversity.org.au/taxon/apni/51744352",
  "phylum": "Charophyta",
  "phylumID": "https://id.biodiversity.org.au/taxon/apni/51744351",
  "classs": "Equisetopsida",
  "classID": "https://id.biodiversity.org.au/taxon/apni/51744350",
  "speciesGroup": ["Plants"],
  "issues": ["homonym"]
}

Ergo I can only conclude that this is a problem with the name-matching algorithm, not with galah per se. Will bring it up with the relevant team at ALA.
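
The symptom in the response above can be detected mechanically: the request asked for rank=genus, but the reply has rank "class" and matchType "higherMatch". The following is a minimal sketch, assuming the JSON has already been parsed into an R list (for example with jsonlite::fromJSON()). is_higher_match() is a hypothetical name, and the list copies only a few fields from the response.

```r
# Hypothetical helper: TRUE when the name-matching reply resolved to a
# different rank than requested, or is flagged as a higher-taxon match.
is_higher_match <- function(resp, requested_rank) {
  !identical(resp$rank, requested_rank) ||
    identical(resp$matchType, "higherMatch")
}

# A subset of the fields from the response shown above.
resp <- list(
  scientificName = "Equisetopsida",
  rank = "class",
  matchType = "higherMatch",
  issues = list("homonym")
)

is_higher_match(resp, requested_rank = "genus")
#> [1] TRUE
```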
