Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rredlist: 'IUCN' Red List Client #663

Open
12 of 29 tasks
willgearty opened this issue Oct 4, 2024 · 52 comments
Open
12 of 29 tasks

rredlist: 'IUCN' Red List Client #663

willgearty opened this issue Oct 4, 2024 · 52 comments

Comments

@willgearty
Copy link

willgearty commented Oct 4, 2024

Submitting Author Name: William Gearty
Submitting Author Github Handle: @willgearty
Other Package Authors Github handles: (comma separated, delete if none) @sckott, @maelle
Repository: https://github.com/ropensci/rredlist
Version submitted:
Submission type: Standard
Editor: @robitalec
Reviewers: @stephhazlitt, @KevCaz

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: rredlist
Type: Package
Title: 'IUCN' Red List Client
Description: 'IUCN' Red List (<https://api.iucnredlist.org/>) client.
    The 'IUCN' Red List is a global list of threatened and endangered species.
    Functions cover all of the Red List 'API' routes. An 'API' key is required.
Version: 0.7.1.9000
Authors@R: 
    c(person(given = "William",
             family = "Gearty",
             role = c("aut", "cre"),
             email = "[email protected]"),
      person(given = "Scott",
             family = "Chamberlain",
             role = c("aut"),
             email = "[email protected]"),
      person(given = "rOpenSci",
             role = "fnd",
             comment = "https://ropensci.org/"),
      person(given = "Maëlle",
             family = "Salmon",
             role = "ctb",
             email = "[email protected]")
      )
License: MIT + file LICENSE
URL: https://docs.ropensci.org/rredlist/,
     https://github.com/ropensci/rredlist
BugReports: https://github.com/ropensci/rredlist/issues
Roxygen: list(markdown = TRUE)
Encoding: UTF-8
Language: en-US
Imports:
    crul (>= 0.3.8),
    jsonlite (>= 1.1),
    lifecycle
Suggests:
    spelling,
    testthat,
    vcr (>= 0.4.0),
    knitr,
    rmarkdown
RoxygenNote: 7.3.2
VignetteBuilder: knitr
X-schema.org-applicationCategory: Biodiversity
X-schema.org-keywords: IUCN, biodiversity, API, web-services, traits, habitat, species, conservation
X-schema.org-isPartOf: https://ropensci.org

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Package retrieves data from the IUCN Red List.

  • Who is the target audience and what are scientific applications of this package?

The IUCN Red List is used by a broad range of users, including biologists, ecologists, conservationists, activists, etc. It has many uses including

There are no other packages, to my knowledge, that perform the same function.

N/A

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

N/A

  • Explain reasons for any pkgcheck items which your package is unable to pass.

The pkgcheck package says my package only has 11.2% coverage, but my own coverage checks calculate the coverage to be ~94%.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?
    Package is already on CRAN.

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@ropensci-review-bot
Copy link
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Copy link
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Copy link
Collaborator

ropensci-review-bot commented Oct 4, 2024

Checks for rredlist (v0.7.1.9000)

git hash: 4fd92778

  • ✔️ Package is already on CRAN.
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 94.4%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 160
internal utils 128
internal rredlist 114
internal stats 3
imports jsonlite 1
imports crul NA
imports lifecycle NA
suggests spelling NA
suggests testthat NA
suggests vcr NA
suggests knitr NA
suggests rmarkdown NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (56), all (40), paste (24), parse (14), class (3), order (3), c (2), drop (2), if (2), is.null (2), lapply (2), subset (2), any (1), args (1), Filter (1), names (1), Negate (1), Sys.getenv (1), url (1), vapply (1)

utils

page (128)

rredlist

rr_GET (47), check_key (3), ct (2), rl_actions_ (2), rl_categories_ (2), rl_class_ (2), rl_comp_groups_ (2), rl_countries_ (2), rl_extinct_ (2), rl_extinct_wild_ (2), rl_family_ (2), rl_faos_ (2), rredlist_ua (2), assert_is (1), assert_n (1), assert_not_na (1), combine_assessments (1), err_catcher (1), page_assessments (1), release_questions (1), rl_actions (1), rl_api_version (1), rl_assessment (1), rl_assessment_ (1), rl_categories (1), rl_citation (1), rl_class (1), rl_common_names (1), rl_common_names_ (1), rl_comp_groups (1), rl_countries (1), rl_extinct (1), rl_extinct_wild (1), rl_family (1), rl_faos (1), rl_green (1), rl_green_ (1), rl_growth_forms_ (1), rl_habitats_ (1), rl_kingdom_ (1), rl_order_ (1), rl_parse (1), rl_phylum_ (1), rl_pop_trends_ (1), rl_realms_ (1), rl_research_ (1), rl_scopes_ (1), rl_sis (1), rl_species (1), rl_stresses_ (1), rl_systems_ (1), rl_threats_ (1), rl_use_and_trade_ (1), rr_base (1), space (1)

stats

family (3)

jsonlite

fromJSON (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 19 files) and
  • 2 authors
  • 2 vignettes
  • no internal data file
  • 3 imported packages
  • 84 exported functions (median 12 lines of code)
  • 112 non-exported functions in R (median 8 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 19 78.3
files_vignettes 3 89.3
files_tests 183 99.9
loc_R 899 62.7
loc_vignettes 206 47.9
loc_tests 1510 89.2
num_vignettes 2 85.5
n_fns_r 196 88.1
n_fns_r_exported 84 93.5
n_fns_r_not_exported 112 84.0
n_fns_per_file_r 5 73.3
num_params_per_fn 6 77.5
loc_per_fn_r 12 36.9
loc_per_fn_r_exp 12 28.3
loc_per_fn_r_not_exp 8 27.4
rel_whitespace_R 18 63.1
rel_whitespace_vignettes 50 62.2
rel_whitespace_tests 3 94.0
doclines_per_fn_exp 62 74.4
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 336 92.5

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-check.yml

GitHub Workflow Results

id name conclusion sha run_number date
11183436924 lint.yaml success 4fd927 43 2024-10-04
11183436931 R-CMD-check.yaml success 4fd927 47 2024-10-04
11183436929 revdep-check success 4fd927 38 2024-10-04
11183436923 test-coverage.yaml success 4fd927 43 2024-10-04

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 94.4

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found no issues with this package!


Package Versions

package version
pkgstats 0.1.6.17
pkgcheck 0.1.2.58


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@willgearty
Copy link
Author

I should mention that rredlist was originally developed by @sckott and is already a part of ropensci. However, the IUCN Red List API has recently changed dramatically, so I basically rewrote the entire package to work with the new API. I thought now was as good a time as any to have the package finally go through ropensci review (with the intent to submit to JOSS afterwards).

@adamhsparks adamhsparks self-assigned this Oct 15, 2024
@adamhsparks
Copy link
Member

@ropensci-review-bot assign @robitalec as editor

@ropensci-review-bot
Copy link
Collaborator

Assigned! @robitalec is now the editor

@robitalec
Copy link
Member

robitalec commented Oct 22, 2024

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
    • Is the case for the package well made?
    • Is the reference index page clear (grouped by topic if necessary)?
    • Are vignettes readable, sufficiently detailed and not just perfunctory?
  • Fit: The package meets criteria for fit and overlap.
  • Installation instructions: Are installation instructions clear enough for human users?
  • Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
  • Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
  • License: The package has a CRAN or OSI accepted license.
  • Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

Thanks for the submission @willgearty, I am keen to help on this review!

All above looks good to me. Some minor comments below, the authors could consider working on while we are looking for reviewers.

  • The README has a great introductory paragraph, installation instructions, and meta information. It could use a brief demonstration of usage, see here.

  • The introductory vignette covers installation, authentication, high level and low level interface and best practices. It might be helpful to have a general overview of the available functions or groups of functions in {rredlist}.

  • Good to see tests for {rredlist} use {vcr} for HTTP testing. In CONTRIBUTING.md, the authors could consider including a note about how testing is different in {rredlist} due to HTTP requests, maybe by linking to {vcr} and HTTP testing in R? This might help interested contributors to get familiar with HTTP testing.

  • One minor note from {goodpractice}:

    tests/testthat/test-rl_threats.R
    Line 82 avoid sapply(), it is not type safe.
          It might return a vector, or a list, depending on the
          input data. Consider using vapply() instead.
    

@robitalec
Copy link
Member

@ropensci-review-bot seeking reviewers

@ropensci-review-bot
Copy link
Collaborator

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/663_status.svg)](https://github.com/ropensci/software-review/issues/663)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

@robitalec
Copy link
Member

Oh @willgearty, one more thing. I missed this in the editor checks under Statistical Properties. I see doclines_per_fn_not_exp ("median number of lines of documentation for each non-exported R function") is 0. Looking through your R/ directory, I found these non-exported R functions without any documentation.

release.R:

  • release_questions

zzz.R:

  • ct
  • rredlist_ua
  • rr_GET
  • err_catcher
  • rl_parse
  • check_key
  • rr_base
  • space
  • assert_is
  • assert_n
  • assert_not_na
  • combine_assessments
  • page_assessments

I would suggest adding some internal documentation for anyone trying to debug an issue when using {rredlist}, or to help potential contributors understand how {rredlist} works. You can use the #' @noRd tag (roxygen2 help, dev guide) to prevent .Rd generation. I would think eg. rr_GET, rr_base, rredlist_ua could be good candidates for brief documentation given they are quite central to the package (network of calls). Thanks!

@willgearty
Copy link
Author

@robitalec thanks for flagging those issues. I believe I have addressed all of your concerns in preparation for review!

@robitalec
Copy link
Member

Thank you @willgearty for addressing those suggestions, it looks great!

(Sidebar, that is a neat custom chunk option {r output.lines=1:10}!)

@robitalec
Copy link
Member

@ropensci-review-bot assign @KevCaz as reviewer

@ropensci-review-bot
Copy link
Collaborator

@KevCaz added to the reviewers list. Review due date is 2024-11-14. Thanks @KevCaz for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@robitalec
Copy link
Member

@ropensci-review-bot assign @stephhazlitt as reviewer

@ropensci-review-bot
Copy link
Collaborator

@stephhazlitt added to the reviewers list. Review due date is 2024-11-19. Thanks @stephhazlitt for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@ropensci-review-bot
Copy link
Collaborator

@stephhazlitt: If you haven't done so, please fill this form for us to update our reviewers records.

@robitalec
Copy link
Member

Thanks to both @stephhazlitt and @KevCaz for agreeing to review {rredlist}!

Please let me know if you need anything as you work on your review.

@willgearty
Copy link
Author

Thank you both in advance for your reviews!

@KevCaz
Copy link
Member

KevCaz commented Nov 8, 2024

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • I know the names of all the authors through ROpenSci, but I don't know them well and we are not currently collaborating on any projects.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work.

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
    • See my suggestion below
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
    • NB: there are no example for the fuction ending with _, but I don't think there are needed, so all good for me.
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
    • NB: I haven't read any specific performance claims, but the way the main request functions are implemented look good to me.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
    • see my comments below.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
    • package name is good;
    • metadata are good;
    • all platforms are covered;
    • function and argument names are meaningful;
    • messages: there are two cat() used when getting data from different pages that can be switch to a progress bar instead (see below);
    • code style is good;
    • README is good
      • A way to retrieve the citation is mentioned, but the citation could be added directly.
    • URL are properly formatted throughout the documentation;
    • documentation is exhaustive (I fix minor typos in my PR);
    • examples are working locally;
    • deprecated function were carefully listed clearly mentioned;
    • the pkgdown website is working locally and the documentation is exhaustive;
    • Authoring is good;
    • MIT Licence;
    • package dependencies list is good (a minimum version for lifecycle could be specified);

Estimated hours spent reviewing: 7

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

In my option this new version of the rredlist package is well-designed, easy to use and has an exhaustive documentation. I truly value the effort put by @willgearty to support the version 4 of the API. Many thanks to all the package authors for your efforts in developing a robust R client for this web API.

I haven't identified any major issues in the base code, only suggestions, which I've categorized as 'Should Have' and 'Could Have' below. Please consider the 'Should Have' items as recommendations to address while you're actively working on the package, and the 'Could Have' items as optional improvements.

Note that I have submitted a PR (ropensci/rredlist#60) that fixes a few typos and avoid the use of \code{} in R file (given that the DESCRIPTION files includes Roxygen: list(markdown = TRUE)).

Should Have

Add a progress bar when there are data to retrieve from more than one page

Some requests require retrieving data from multiple pages, and a dot is printed for each page using cat(). According to rOpenSci guidelines, cat() should be avoided for displaying console messages. Instead, I believe a progress bar would be more user-friendly, as currently, users only see a series of dots being appended in the console and has no idea how long their request will take.

If I am not mistaking, there is a simple way to do that. In rr_GET(), the crul response you are using (object temp) includes the total number of entry and the total number of pages.

<crul response> 
        url: https://api.iucnredlist.org/api/v4/research/1_1?page=1
        request_headers: 
          User-Agent: r-curl/5.2.3 crul/1.5.0 rOpenSci(rredlist/0.7.1.9000)
          Accept-Encoding: gzip, deflate
          Accept: application/json, text/xml, application/xml, */*
          Authorization: L5rzX4M6mJHoH4RFYUxxVLLDorEJ7TTx5ktS
        response_headers: 
          status: HTTP/2 200
          cache-control: max-age=0, private, must-revalidate
          content-type: application/json
          current-page: 1
          etag: W/"057a8a5a91bd3e7b59ce06ecde0e7ee1"
          link: <https://api.iucnredlist.org/api/v4/research/1_1?page=1&per_pa
   ge=
      100>; rel="first", <https://api.iucnredlist.org/api/v4/research/1_1?page
   =2&
      per_page=100>; rel="next", <https://api.iucnredlist.org/api/v4/research/
   1_1
      ?page=275&per_page=100>; rel="last"
          page-items: 100
          total-count: 27426
          total-pages: 275
          x-request-id: 7aa92fbc-e51d-4449-96b9-0e6c10b2c9d4
          x-runtime: 0.385047
          content-length: 23573
          date: Fri, 08 Nov 2024 10:26:00 GMT
        params: 
          page: 1
        status: 200

temp$response_headers has all these details and rr_GET() could return a list including this

list(x = x, response_headers = temp$response_headers)

you would then be able to use the elements of the response headers (response_headers$page-items response_headers$total-count response_headers$total-pages) to create a nice looking progress bar (your own or one from cli, for instance).

One final thought: This made me consider the possibility of integrating page_assessments() directly within rr_GET(), since it is possible to loop over total-pages. Of course, that's entirely up to you, as I don't anticipate any significant performance impact.

Second vignette

The Get started vignette is excellent. I wish there had been a second vignette with practical examples to help answer specific applied questions. Here is one example: the vignette could show how to determine all species at risk in a given country and how to group them per status and order name (with a final barplot);

I believe this would be a great way to showcase what can be accomplished with this package and how to integrate it with other packages to answer specific questions. It could also serve as an opportunity to demonstrate how to work with complementary packages or to list blog posts and articles that use rredlist.

Could have

tests

There are tests for all functions to ensure they fail gracefully if the input is not in the correct format (see below for an example of this). However, I'm not sure if this level of testing is strictly necessary. Personally, I'd prefer to see the assert_ functions tested directly. But, I may not have fully considered or read enough about this topic, so please take my input with a grain of salt. That said, I would still suggest testing helper functions (functions in zzz.R ) directly, even though they are tested indirectly.

test_that("fails well", {
  skip_on_cran()

  expect_error(rl_categories(5), "code must be of class character")
  expect_error(rl_categories(list()), "code must be of class character")

  expect_error(rl_categories(key = 5), "key must be of class character")
  expect_error(rl_categories(key = matrix()), "key must be of class character")

  expect_error(rl_categories(parse = 5), "parse must be of class logical")
  expect_error(
    rl_categories(parse = matrix()),
    "parse must be of class logical"
  )

  expect_error(rl_categories(page = "next"), "page must be of class integer")
  expect_error(rl_categories(all = "yes"), "all must be of class logical")
  expect_error(rl_categories(quiet = "no"), "quiet must be of class logical")

  # lengths
  expect_error(rl_categories(page = 1:2), "page must be length 1")
})

Could have

  • Vignette: the section 'Overview of available functions' reads more like a list of features rather than a list of functions (which is already available on the pkgdown website). Perhaps it would be better to rewrite this and present it as a table.

  • for rl_citation() I have two suggestions:

    • you could add Accessed on [day month year] (e.g. Sys.time() |> format("%d %B %Y"))
    • you could add a bibtex entry
  @misc{IUCN2024,
  author       = {IUCN},
  title        = {The IUCN Red List of Threatened Species},
  year         = {2024},
  edition      = {Version [version]},
  howpublished = {\url{https://www.iucnredlist.org}},
  note         = {Accessed on [day month year]},
}
  • In R/rredlist-package.R is there supposed to be a specific refence in this sentence See key A IUCN API token. It might not be clear enough.

  • I would mention in the README that you are covering all the end points of the web API and I would add a link to its documentation https://api.iucnredlist.org/api-docs/index.html.

  • regarding the precompilation of the vignette, you could follow https://ropensci.org/blog/2019/12/08/precompute-vignettes/ and use .Rmd.origin. That said, what is currently done is equivalent, so it's up to you.

@robitalec
Copy link
Member

Thank you @KevCaz for your review.

Just a minor note, I see under "Could have" / "tests" two paragraphs with a few repeated sentences. When you have a moment, can you double check and possibly edit to ensure your comments about tests are clear for everyone?

@robitalec
Copy link
Member

@ropensci-review-bot submit review #663 (comment) time 7

@maelle
Copy link
Member

maelle commented Nov 14, 2024

@ropensci-review-bot assign @KevCaz as reviewer

@ropensci-review-bot
Copy link
Collaborator

@KevCaz added to the reviewers list. Review due date is 2024-12-05. Thanks @KevCaz for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@ropensci-review-bot
Copy link
Collaborator

@KevCaz: If you haven't done so, please fill this form for us to update our reviewers records.

@maelle
Copy link
Member

maelle commented Nov 14, 2024

So it was a glitch, sorry @KevCaz you are now properly registered!

@maelle
Copy link
Member

maelle commented Nov 14, 2024

@ropensci-review-bot submit review #663 (comment) time 7

@ropensci-review-bot
Copy link
Collaborator

Logged review for KevCaz (hours: 7)

@robitalec
Copy link
Member

Thank you @maelle!

@ropensci-review-bot
Copy link
Collaborator

📆 @stephhazlitt you have 2 days left before the due date for your review (2024-11-19).

@stephhazlitt
Copy link

rredlist Package Review (@stephhazlitt)

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: ~4

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments:

This is a nicely focused, tidy and well-documented data retrieval package that closely follows the rOpenSci packaging guide. In addition, the author has already implemented some incremental improvements from the rOpenSci editor (@robitalec) and community (@KevCaz) reviews. Apart from a few minor code-related comments, I have focused my review on my user experience and thinking of how analysts and researchers might interact with this software. I had fun playing with this package 🦋🐝🦆🌿! Thanks for the opportunity to review, I hope the comments are useful.

General Comments

  • Saving the API Key: I think for new users (and novice R users) a smooth path with the API key will be the "key" (pun intended). The rredlist::rl_use_iucn() helper fxn to get and save the key takes you to the web page and prints the instructions to save to the console—I missed the latter because I was whisked-away+distracted with the IUCN web page. I think it would a nice enhancement to mature this workflow with a fxn that wraps usethis and helps a user get the key sorted either in their .Rprofile (my usual workflow) or .Renviron (I gather their is no consensus re: a recommended approach). It might also be nice to alert users in the README/Vignette that they need to provide an email and other business information to the IUCN.
  • First Argument Options: I anticipate users often needing to look-up the first argument options for the various query fxns. It is mentioned in the .Rd docs that calling the fxn with no first argument does this for you (e.g., If not supplied, a list of all kingdom names will be returned.), however this could also be exampled in the README+vignette. I suspect this might be a common, interactive part of a typical workflow?
rl_kingdom()
$kingdom_names
[1] "ANIMALIA"  "CHROMISTA" "FUNGI"     "PLANTAE" 
  • An Even Tidier API?: I appreciate the different APIs, higher-level vs low-level. I wonder if a future enhancement could push the higher-level API to follow an even "tidier" approach. A list object, in my experience, is less data analysis-ready than anything data frame. Since the package already parses the most data-rich element of the list to a data frame, I wonder about taking it a bit further and adding the other list objects as metadata columns and returning a data frame. This would perhaps result in 3 APIs: a tidy one (returning a data frame), a list object (currently parse = FALSE) and the low level API (returning just the JSON). In the short-term, it might be useful to add some hints in the vignette about how to navigate the list objects (e.g., names(list) )?
  • Progress Bar: I think it would be useful to add in a progress bar for users in place of the .s—no matter the API used, the larger downloads take time, it is nice to have something to stare at.
  • Speed: The vignette mentions speed being one reason to leverage the lower-level API. Although not at all thorough, I did not find the low-level API significantly faster on the couple of large data pulls I tried.
tic()
ex1 <- rl_class(class = "Aves")
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
toc()
1420.385 sec elapsed

tic()
ex2 <- rl_class_(class = "Aves")
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
toc()
1267.238 sec elapsed

It might be worth sharing some benchmarking so users know where the time is used (download vs parse) and how or if they can further optimize data retrieval speed. For me, setting parse = FALSE felt like it saved a little time up front, but most of the time was the data transfer itself?

Minor Comments

  • It would be nice to see the production branch renamed following current community norms (master -> main)
  • The rl_sp_count() returns an integer not a list (need to override #' @template info for this one)
  • Would it be useful to embed a check-for-the-internet fxn and throw an informative error if not connected?
  • I am a bird nerd by training and immediately went to pull all the assessments for the Oystercatchers of the world. Is there a path to allow a user to pull all the records for a given Genus? (e.g., rl_species(genus = "Haematopus"))
  • I wonder about shifting the API versioning explanation to the end of the vignette, and then consolidate the sentence about not being able to get an API key for the older version?

@robitalec
Copy link
Member

Thank you @stephhazlitt for your review!

@robitalec
Copy link
Member

@ropensci-review-bot submit review #663 (comment) time 4

@ropensci-review-bot
Copy link
Collaborator

Logged review for stephhazlitt (hours: 4)

@robitalec
Copy link
Member

Hi @willgearty, we have the reviews in from @KevCaz and @stephhazlitt (thank you both).

Please feel free to ask the reviewers for any clarification needed on their recommendations.

Great work on the package, sounds like both reviewers found it useful, intuitive and well structured. Highlighting some recommendations across the reviews:

  • both reviewers mentioned improvements to the progress bar
  • @KevCaz had an interesting suggestion for an additional vignette demonstrating more applied questions one could answer with {rredlist}
  • @stephhazlitt suggested a {usethis} type API key storage
  • @stephhazlitt suggested an even tidier approach where data.frames are returned instead of lists ( - I agree that often folks seem to be less familiar working with lists than data.frames)

I will leave it there for you @willgearty to address the reviewers' comments! 🌻

@willgearty
Copy link
Author

Thank you @KevCaz and @stephhazlitt for your thoughtful reviews (and @robitalec for your very helpful summary)! I'll start working on addressing these recommendations as soon as possible.

@ropensci-review-bot
Copy link
Collaborator

@willgearty, @sckott, @maelle: please post your response with @ropensci-review-bot submit response <url to issue comment> if you haven't done so already (this is an automatic reminder).

Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html

@robitalec
Copy link
Member

Hello @willgearty, just checking in on the progress of your response to reviewers and potential changes.
Let us know if you need anything from us and if there are any clarifications needed from the reviews.
Thanks!

@willgearty
Copy link
Author

Thanks for the check-in @robitalec. With Thanksgiving last week, I haven't had a chance to work on any changes to the package, but I have read over the reviews and everything sounds reasonable and I do plan to implement nearly all of the suggestions hopefully over the next week.

@robitalec
Copy link
Member

Great, thanks for the update @willgearty!

@willgearty
Copy link
Author

willgearty commented Dec 15, 2024

Apologies for the delay, but I've now addressed nearly all of the review comments in this PR: ropensci/rredlist#62. I've summarized the changes below. A huge thank you again to the reviewers who both provided extremely useful suggestions, and please don't hesitate to let me know if you have any other comments!

@KevCaz's comments:

  • Progress bar: very useful suggestion; added in 50a1bf2 and improved in 718845b
    • I looked into moving page_assessments within rr_GET, but I'm not sure the benefits would be worth the effort
  • Second vignette: this was a great suggestion, and I've added it in c23f57d
  • test: I've added tests for the utility functions as suggested in 5b99b75
  • vignette overview of functions: I've renamed this section as suggested, but I'm not entirely sure what a table would look like for this. Let me know if you have more detail about what you were thinking here.
  • rl_citation(): this was a great suggestion, the output of rl_citation() now includes the accession date and a BibTex entry (0eb9b1e and 6a3e003)
  • rredlist-package.R: cleaned this up in 4a40ffb
  • README: I've added the suggested text in 64cb3b9
  • precompilation of vignettes: I prefer this setup because RStudio doesn't give the pretty styling of the markdown files if they aren't .Rmd files (e.g., if they end with .origin)

@stephhazlitt's comments:

  • rl_use_iucn(): thanks for the user insight on this! this function no longer opens the browser window automatically, instead providing step-by-step instructions for the user to follow (including linking to the URL for key signup and providing an example of how to edit the .Renviron file), implemented in 6e151ff and 1a3428e
  • first argument options: this is now mentioned in the new vignette (c23f57d)
  • even tidier API: I looked into this, but I'm not sure there is much more that can be data.frame-ified. I agree data.frames are more user-friendly, but often the different list elements have different types of data, numbers of columns, etc. Please do let me know if there were particular parts of the returned objects that would be useful to have data.frame-d. In the meantime, the new workflow vignette touches on how to interact with the list objects that are returned.
  • progress bar: since I had only ever tested examples with a small number of pages, I never noticed how problematic this could be! progress bar implemented in 50a1bf2 and improved in 718845b
  • speed: I've added a benchmark vignette in bbc6fd2; as you intuited, the vast majority of the time is spent downloading the data and parsing it takes relatively no time at all
  • production branch renamed
  • rl_sp_count(): docs updated in a6cbfe3
  • internet check: I added a check for the internet in c315fd1
  • genus query: as far as I can tell, there is currently no way to lookup a genus in the API
  • vignette layout: I've moved this part of the vignette to the end as suggested (2d91ac2)

Other notable changes

  • Switched to using HttpClient()$retry() in case of API timeout (a76f939)
  • Added an IUCN ggplot2 color scale (626e70f)
  • Check that all queries are not vectors (47da382)

@robitalec
Copy link
Member

robitalec commented Dec 20, 2024

Thank you @willgearty for addressing the reviewer comments and your detailed response here.

@stephhazlitt and @KevCaz, please use the reviewer approval template available here to respond to @willgearty's changes. Please feel free to ask for any clarification or discuss suggestions further.

I understand many folks are on holidays around this time, so please let me know if you won't be able to continue the review until after the new year.

Thanks everyone!

@stephhazlitt
Copy link

stephhazlitt commented Dec 23, 2024

Reviewer Response: Steph Hazlitt

Final approval (post-review)

Thank you for the changes+improvements @willgearty, I am sure all of your additional documentation will really support the users of this data retrieval package.

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: +1 final review; 5 total

Comments

Sharing a couple of hiccups I encountered using the ropensci-review branch.

I struggled to get the new progress bar working, getting the below error message with quiet = FALSE. I tried with the package installed from my local copy and with a fresh install from GitHub (e.g., remotes::install_github("ropensci/rredlist", ref = "ropensci-review")).

ropensci-review* > rredlist::rl_phylum(phylum = "Mollusca", quiet = FALSE)
Error in `"id" %in% names(args)`:
! Could not evaluate cli `{}` expression: `pb_name`.
Caused by error in `eval(expr, envir = envir)`:
! object 'pb_name' not found
Type .Last.error to see the more details.
Error in `(function (e) …`:
! Error in a deferred `on.exit()` clause
Caused by error in `"id" %in% names(args)`:
! Could not evaluate cli `{}` expression: `pb_name`.
Caused by error in `eval(expr, envir = envir)`:
! object 'pb_name' not found
Type .Last.error to see the more details.

When I run Check locally I am getting a note:

❯ checking files in ‘vignettes’ ... NOTE
  The following directory looks like a leftover from 'knitr':
    ‘figure’
  Please remove from your package.

0 errors ✔ | 0 warnings ✔ | 1 note ✖

R CMD check succeeded

I added the below to my local .Rbuildignore file and the note went away (as suggested here), in case helpful.

figure$
cache$

@willgearty
Copy link
Author

Thanks @stephhazlitt, I'll address both of those issues ASAP.

@willgearty
Copy link
Author

willgearty commented Jan 12, 2025

OK, I believe those issues should both be fixed now @stephhazlitt. @KevCaz please also let me know what you think of the updated package.

@robitalec
Copy link
Member

(Apologies for my delay here!)

Thank you @stephhazlitt for your final approval and @willgearty for addressing these related issues. @KevCaz, please let us know your thoughts on {rredlist} after @willgearty's changes after the reviews.

Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants