dataset: Create Data Frames that are Easier to Exchange and Reuse #553

Open
1 of 18 tasks
antaldaniel opened this issue Aug 15, 2022 · 106 comments

antaldaniel commented Aug 15, 2022

Submitting Author Name: Daniel Antal
Submitting Author Github Handle: @antaldaniel
Repository: https://github.com/dataobservatory-eu/dataset/
Version submitted: 0.1.7
Submission type: Standard
Editor: @annakrystalli
Reviewers: @msperlin, @romanflury

Due date for @msperlin: 2022-09-19

Due date for @romanflury: 2022-09-21

Archive: TBD
Version accepted: TBD
Language: en

  • Paste the full DESCRIPTION file inside a code block below:
Package: dataset
Title: Create Data Frames that are Easier to Exchange and Reuse
Date: 2022-08-19
Version: 0.1.7.3
Authors@R: 
    person(given = "Daniel", family = "Antal", 
           email = "[email protected]", 
           role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-7513-6760")
           )
Description: The aim of the 'dataset' package is to make tidy datasets easier to release, 
    exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, 
    well-described, interoperable datasets into release and reuse ready form. A subjective 
    interpretation of the  W3C  DataSet recommendation and the datacube model  <https://www.w3.org/TR/vocab-data-cube/>, 
    which is also used in the global Statistical Data and Metadata eXchange standards, 
    the application of the connected Dublin Core <https://www.dublincore.org/specifications/dublin-core/dcmi-terms/> 
    and DataCite <https://support.datacite.org/docs/datacite-metadata-schema-44/> standards 
    preferred by European open science repositories to improve the findability, accessibility,
    interoperability and reusability of the datasets.
License: GPL (>= 3)
URL: https://github.com/dataobservatory-eu/dataset
BugReports: https://github.com/dataobservatory-eu/dataset/issues
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1
Depends: 
    R (>= 2.10)
LazyData: true
Imports: 
    assertthat,
    ISOcodes,
    utils
Suggests: 
    covr,
    declared,
    dplyr,
    eurostat,
    here,
    kableExtra,
    knitr,
    rdflib,
    readxl,
    rmarkdown,
    spelling,
    statcodelists,
    testthat (>= 3.0.0),
    tidyr
VignetteBuilder: knitr
Config/testthat/edition: 3
Language: en-US

You can find the package website on dataset.dataobservatory.eu. The article Motivation: Make Tidy Datasets Easier to Release Exchange and Reuse will eventually be condensed into a JOSS paper. It has a major development dilemma.

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • [x ] data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    Open science repositories and analysts' computers are full of datasets that have no provenance, structural, or referential metadata. We believe that, whenever possible, metadata should be machine-recorded and should not be detached from the R object.
    There are several R packages that have overlapping goals or functionality with dataset, but they follow a different philosophy. When exporting to different files, the metadata should be written at export time, but no sooner, and preferably into the file that contains the data.

  • Who is the target audience and what are scientific applications of this package?

This package is intended to give a common foundation to the rOpenGov reproducible research packages. It mainly serves communities that want to reuse statistical data (using the SDMX statistical (meta)data exchange sources, like Eurostat, IMF, World Bank, OECD...) or release new datasets from primary social sciences data that can be integrated into an SDMX compatible API or placed on a knowledge graph. Our main aim is to provide a clear publication workflow to the European open science repository Zenodo, and clear serialization strategies to RDF application.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
    The dataspice package aims to create well-defined and referenced datasets, but follows a different schema and a different publication strategy. The dataset package follows the more restrictive W3C/SDMX "DataSet" definition within the datacube model, which is better suited to synchronizing with statistical data sources. Unlike dataset, dataspice relies on manual metadata entry from CSV files. (See the documentation of the dataspice package.)

The dataset package aims for a higher level of reproducibility and does not detach the metadata from the R object's attributes (it is intended to be used in other reproducible research packages that will directly record provenance and other transactional metadata into the attributes). We aim to bind dataspice and dataset together by creating export functions to CSV files that contain the same metadata that dataspice records. Generally, dataspice seems better suited to raw, observational data, while dataset is better suited to statistically processed data.
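
To make the idea of keeping metadata in attributes and writing it out only at export time more concrete, here is a minimal base R sketch. The helper name `write_csv_with_metadata()`, the `#`-prefixed header convention and the attribute names `Title` and `Creator` are assumptions made for illustration; they are not the API of dataset or dataspice.

```r
# Hypothetical helper (not part of dataset or dataspice): write selected
# attributes as '# key: value' lines, followed by the data as plain CSV.
write_csv_with_metadata <- function(x, file, keys = c("Title", "Creator")) {
  header <- vapply(keys, function(k) {
    paste0("# ", k, ": ", paste(attr(x, k), collapse = "; "))
  }, character(1))
  con <- file(file, open = "wt")
  on.exit(close(con))
  writeLines(header, con)
  write.csv(x, con, row.names = FALSE)
}

df <- data.frame(geo = c("NL", "BE"), value = c(1.5, 2.5))
attr(df, "Title") <- "Example indicator"
attr(df, "Creator") <- "Jane Doe"

tmp <- tempfile(fileext = ".csv")
write_csv_with_metadata(df, tmp)

# The metadata travels in the same file; the data part still reads as CSV
# once the '#' lines are treated as comments.
meta_lines <- grep("^# ", readLines(tmp), value = TRUE)
back <- read.csv(tmp, comment.char = "#")
```

The metadata stays on the object until the moment of export, and a reader that does not know the convention can still load the table by skipping comment lines.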

The intended use of dataset is to start correctly recording the referential, structural and provenance metadata retrieved by various reproducible science packages that interact with statistical data (such as the rOpenGov packages eurostat and iotables, or the oecd package).

Neither dataset nor dataspice is well suited to documenting social sciences survey data, which are usually held in datasets. Our aim is to connect dataset, declared and DDIwR to create such datasets with DDI codebook metadata. Together they will provide a stable new foundation for the retroharmonize package, which creates new, well-documented and harmonized statistical datasets from the observational datasets of social sciences surveys.

The zen4R package provides reproducible export functionality to the Zenodo open science repository. Interacting with zen4R may be intimidating for the casual R user, as it uses R6 classes. Our aim is to provide an export function that completely wraps the workings of zen4R when releasing the dataset.

In our experience, the tidy data standards make reuse more efficient by eliminating unnecessary data processing steps before analysis or placement in a relational database, while applying the DataSet definition and the datacube model together with the information science metadata standards makes reuse more efficient when exchanging and combining the data with data from other datasets.

Yes

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

  • Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • [x ] Do you intend for this package to go on CRAN? -> Yes, I started the CRAN publication process, but opted to stop and get feedback from rOpenSci first

  • Do you intend for this package to go on Bioconductor? -> Don't know.

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

  • [ x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
@ropensci-review-bot
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Collaborator

🚀

The following problem was found in your submission template:

  • URL = [https://repourl] is not valid
    The package could not be checked because of problems with the URL.
    Editors: Please ensure these problems are rectified, and then call @ropensci-review-bot check package.

👋

@antaldaniel antaldaniel changed the title Create Data Frames that are Easier to Exchange and Reuse datasetÉ Create Data Frames that are Easier to Exchange and Reuse Aug 15, 2022
@antaldaniel antaldaniel changed the title datasetÉ Create Data Frames that are Easier to Exchange and Reuse dataset: Create Data Frames that are Easier to Exchange and Reuse Aug 15, 2022
@adamhsparks
Member

Hi, @antaldaniel, could you please fix the repo URL by providing a link to the package’s repository, please? 🙏

@antaldaniel
Author

@adamhsparks Apologies for the original issue problem, I hope all is fine now. I added both the github repo and the package website url

@mpadge
Member

mpadge commented Aug 15, 2022

@antaldaniel Then you can start the checks yourself by calling @ropensci-review-bot check package

@antaldaniel
Author

@ropensci-review-bot check package

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for dataset (v0.1.7)

git hash: 2eb439b5

  • ✔️ Package name is available
  • ✖️ does not have a 'codemeta.json' file.
  • ✖️ does not have a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✖️ These functions do not have examples: [attributes_measures].
  • ✖️ Function names are duplicated in other packages
  • ✖️ Package has no continuous integration checks.
  • ✖️ Package coverage is 67.8% (should be at least 75%).
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 159 |
| internal | dataset | 79 |
| internal | stats | 4 |
| imports | utils | 4 |
| imports | rlang | 1 |
| imports | assertthat | NA |
| imports | ISOcodes | NA |
| suggests | declared | NA |
| suggests | dplyr | NA |
| suggests | eurostat | NA |
| suggests | here | NA |
| suggests | kableExtra | NA |
| suggests | knitr | NA |
| suggests | rdflib | NA |
| suggests | readxl | NA |
| suggests | rmarkdown | NA |
| suggests | spelling | NA |
| suggests | statcodelists | NA |
| suggests | testthat | NA |
| suggests | tidyr | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

names (26), data.frame (14), class (12), paste (9), rep (7), sapply (7), unlist (6), which (6), attr (5), lapply (5), length (5), ncol (5), subset (4), as.character (3), attributes (3), c (3), logical (3), seq_along (3), vapply (3), as.data.frame (2), as.numeric (2), cbind (2), file (2), inherits (2), matrix (2), nrow (2), round (2), args (1), date (1), deparse (1), for (1), gsub (1), ifelse (1), is.null (1), paste0 (1), rbind (1), tolower (1), union (1), unique (1), url (1), UseMethod (1)

dataset

dimensions (6), attributes_measures (5), measures (5), all_unique (3), dataset_title (3), related_item (3), creator (2), datacite (2), dataset (2), dataset_source (2), description (2), geolocation (2), identifier (2), language (2), metadata_header (2), publication_year (2), publisher (2), related_item_identifier (2), resource_type (2), add_date (1), add_relitem (1), arg.names (1), attributes_names (1), bibentry_dataset (1), datacite_add (1), dataset_download (1), dataset_download_csv (1), dataset_export (1), dataset_export_csv (1), dataset_local_id (1), dataset_title_create (1), dataset_uri (1), dimensions_names (1), document_package_used (1), dot.names (1), dublincore (1), dublincore_add (1), extract_year (1), is.dataset (1), measures_names (1), print (1), print.dataset (1), resource_type_general (1), rights (1), subject (1), time_var_guess (1), version (1)

stats

df (2), time (2)

utils

citation (1), object.size (1), read.csv (1), sessionInfo (1)

rlang

get_expr (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 26 files) and
  • 1 authors
  • 7 vignettes
  • no internal data file
  • 4 imported packages
  • 56 exported functions (median 10 lines of code)
  • 82 non-exported functions in R (median 15 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 26 | 87.0 | |
| files_vignettes | 7 | 98.5 | |
| files_tests | 27 | 97.6 | |
| loc_R | 1000 | 68.2 | |
| loc_vignettes | 676 | 84.7 | |
| loc_tests | 371 | 68.8 | |
| num_vignettes | 7 | 99.2 | TRUE |
| n_fns_r | 138 | 83.6 | |
| n_fns_r_exported | 56 | 89.5 | |
| n_fns_r_not_exported | 82 | 79.7 | |
| n_fns_per_file_r | 3 | 55.0 | |
| num_params_per_fn | 2 | 11.9 | |
| loc_per_fn_r | 15 | 46.1 | |
| loc_per_fn_r_exp | 10 | 22.2 | |
| loc_per_fn_r_not_exp | 15 | 49.5 | |
| rel_whitespace_R | 27 | 78.3 | |
| rel_whitespace_vignettes | 36 | 88.3 | |
| rel_whitespace_tests | 25 | 70.7 | |
| doclines_per_fn_exp | 39 | 48.6 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 103 | 79.7 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)


3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. no_description_date

Test coverage with covr

Package coverage: 67.81

The following files are not completely covered by tests:

| file | coverage |
|---|---|
| R/creator.R | 64.29% |
| R/datacite_attributes.R | 0% |
| R/datacite.R | 46.88% |
| R/dataset_uri.R | 0% |
| R/dataset.R | 48.36% |
| R/document_package_used.R | 0% |
| R/dublincore.R | 67.74% |
| R/publication_year.R | 55.56% |
| R/related_item.R | 66.67% |

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

| function | cyclocomplexity |
|---|---|
| datacite_add | 24 |
| dublincore_add | 23 |

Static code analyses with lintr

lintr found the following 383 potential issues:

| message | number of times |
|---|---|
| Avoid 1:ncol(...) expressions, use seq_len. | 4 |
| Avoid library() and require() calls in packages | 20 |
| Avoid using sapply, consider vapply instead, that's type safe | 4 |
| Lines should not be more than 80 characters. | 352 |
| Use <-, not =, for assignment. | 3 |
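
Two of these lints guard against edge cases that are easy to miss. The following base R sketch shows what goes wrong with `1:ncol(...)` and `sapply()` on empty input; the object names are illustrative and the code is not taken from the package.

```r
empty <- data.frame()

# 1:ncol(x) counts down when there are no columns: 1:0 is c(1, 0),
# so a for loop over it would run twice with invalid indices.
bad_index  <- 1:ncol(empty)
good_index <- seq_len(ncol(empty))   # integer(0): the loop body never runs

# sapply() changes its return type with the input; vapply() enforces it.
loose  <- sapply(list(), length)               # an empty list
strict <- vapply(list(), length, integer(1))   # an empty integer vector
```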


4. Other Checks

Details of other checks (click to open)

✖️ The following 10 function names are duplicated in other packages:

    • dataset from assemblerr, febr, robis
    • description from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
    • dimensions from gdalcubes, openeo, sp, tiledb
    • identifier from Ramble
    • is.dataset from crunch
    • language from sylly, wakefield
    • measures from greybox, mlr3measures, tsibble
    • size from acrt, BaseSet, container, crmPack, CVXR, datastructures, deal, disto, easyVerification, EFA.MRFA, flifo, gdalcubes, gWidgets2, hrt, iemisc, InDisc, kernlab, matlab2r, multiverse, optimbase, PopED, pracma, ramify, rEMM, rmonad, simplegraph, siren, tcltk2, UComp, unival, vampyr
    • subject from DGM, emayili, gmailr, sendgridr
    • version from BiocManager, garma, geoknife, mice, R6DS, rerddap, rsyncrosim, shiny.info, SMFilter


Package Versions

| package | version |
|---|---|
| pkgstats | 0.1.1.20 |
| pkgcheck | 0.1.0.3 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@adamhsparks
Member

Hi again, @antaldaniel. If you could please address the issues that the bot flagged with the ✖️, then I can proceed with your submission.

@antaldaniel
Author

antaldaniel commented Aug 17, 2022

Hi @adamhsparks I hope I managed to add these things, with the following exception.

✔️ does not have a 'codemeta.json' file -> added with codemetar.
✔️ does not have a 'contributing' file -> added CONTRIBUTING.md
✔️ These functions do not have examples: [attributes_measures]. -> added
✖️ Function names are duplicated in other packages

I tried to avoid duplications while keeping the rOpenSci guidelines on duplication in mind. At this point I do not see which names are duplicated, or whether there is any sensible way to resolve them.

Your guidelines state "Avoid function name conflicts with base packages or other popular ones (e.g. ggplot2, dplyr, magrittr, data.table)". The package currently has no name conflict with any package that I expect to be used together with it, and I do not know how to test for this. (Apologies if this is covered somewhere in 1.3 Package API.)
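
One way to check for such conflicts locally is to compare the candidate function names with the exports of the packages of interest. The sketch below is a minimal base R approach; `find_name_conflicts()` is a hypothetical helper, not part of pkgcheck or dataset, and it only works for packages that are installed.

```r
# Hypothetical helper: report which candidate names are already exported
# by each of the given (installed) packages.
find_name_conflicts <- function(fn_names, pkgs = c("base", "stats", "utils")) {
  conflicts <- lapply(pkgs, function(pkg) {
    intersect(fn_names, getNamespaceExports(pkg))
  })
  names(conflicts) <- pkgs
  # Keep only the packages where at least one name clashes
  Filter(length, conflicts)
}

candidates <- c("subject", "language", "attributes", "dimensions")
found <- find_name_conflicts(candidates)
found
```

Adding packages such as dplyr or data.table to `pkgs` extends the check to the popular packages named in the guidelines.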

✔️ Package has no continuous integration checks -> added
✖️ Package coverage is 67.8% (should be at least 75%)

I do not see a sensible way to achieve 75%+ codecov coverage with a metadata package that is at an early development stage and still has open development questions (see Motivation: Make Tidy Datasets Easier to Release Exchange and Reuse; hence the submission here before the first CRAN release). For example, in the target category, other metadata management packages sit below the coverage that dataset has before its first release: codemetar has 42% coverage and EML has 65%.

@mpadge
Member

mpadge commented Aug 17, 2022

@antaldaniel You may indeed ignore the "Function names are duplicated in other packages." That will soon be changed from a failing check (✖️) to an advisory note only. Sorry for any confusion there. @adamhsparks will comment further on the code coverage.

@antaldaniel
Author

@mpadge I cannot seem to find the output where this information comes from, but I think it is nevertheless a very useful reminder, and it would be good to see which conflicts your bot has found. Again, apologies if I am asking the obvious, but where can I check which duplicates your bot flagged?

@mpadge
Member

mpadge commented Aug 17, 2022

It's in the check results. Under "4. Other Checks", you'll see a "Details of other checks (click to open)". You can also generate those yourself by running:

library(pkgcheck)
checks <- pkgcheck("/<path>/<to>/<dataset-pkg>")
checks_md <- checks_to_markdown(checks, render = TRUE)

That will automatically open a HTML-rendered version of the checks, just like the above. You can use that repeatedly as you work through the issues highlighted above.

@antaldaniel
Author

@mpadge Oh, really, sorry for asking the obvious.

I would like to comment on the substance of the issue, then. The package aims to turn R objects into standard datasets (as defined by W3C and SDMX) by adding structural and referential metadata. Its main development question is whether the best way to do this is to create an S3 class or not (see the dilemma here).

In its current stage, it is a pseudo-object inherited from data.frame, but it can also be seen as a utility for any data.frame, tibble, or data.table (or similar tabular) R object. The functions that have duplicates in other packages follow a very simple naming convention. I think this is the cleanest API that I can come up with: for example, subject() gets the metadata attribute Subject and subject<-() sets it. As DataCite, Dublin Core and schema.org have dozens of potential attributes, the easiest approach to me is to use the attribute's name, in a slightly modified form, to set or get its value.

All these function names are lowercase, while they manipulate camelCase standard attributes. The exception is the SDMX attribute 'attribute', which would conflict with the base R 'attributes()' function.
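
The naming convention and the two design options (plain attributes on any data frame versus an S3 subclass) can be sketched in a few lines of base R. Everything here is illustrative: `new_dataset()` and the class name `dataset_sketch` are assumptions, and the accessors are a simplified stand-in rather than the package's actual implementation.

```r
# Getter: read the 'Subject' attribute of any object
subject <- function(x) attr(x, "Subject", exact = TRUE)

# Replacement function: lets users write subject(x) <- value
`subject<-` <- function(x, value) {
  attr(x, "Subject") <- value
  x
}

# Hypothetical constructor for the S3 option: a data.frame subclass
# that carries the same attribute.
new_dataset <- function(x, subject = NULL) {
  stopifnot(is.data.frame(x))
  attr(x, "Subject") <- subject
  class(x) <- c("dataset_sketch", class(x))
  x
}

df <- data.frame(geo = c("NL", "BE"), value = c(1.5, 2.5))
subject(df) <- "Demography"               # utility style: plain data.frame
ds <- new_dataset(df, subject = "Trade")  # S3 style: subclass of data.frame

subject(df)
subject(ds)
class(ds)
```

With the utility style the accessors work on any tabular object that supports attributes; the subclass additionally allows dedicated methods such as a print method, at the cost of having to maintain the class through data manipulation.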

@adamhsparks
Member

Hi @antaldaniel,
I can understand the difficulty in writing tests for such a non-standard package. But I've had a look at covr::report() for "dataobservatory-eu/dataset". I think that there is still low-hanging fruit here that can be covered to get your code-coverage up to 75% that we ask for.

For instance, Lines 40-43 are covered but Lines 44-45 aren't. These are seemingly the same except for checking on 2 or 3 letter ISO codes, unless I'm mistaken.

Likewise, the messages raised by the stop() calls in the same file aren't checked.

Could I ask that you have another look and see if you can't further improve the coverage a bit more?
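
As an illustration of the kind of low-hanging fruit meant here, the sketch below covers both code-length branches and the error message of a small validator. The function `normalize_language()` and its lookup table are hypothetical stand-ins, not the code in the package, and the checks use base R only.

```r
# Hypothetical stand-in for a function with 2- and 3-letter ISO branches
lang_codes <- data.frame(
  alpha2 = c("en", "fr"),
  alpha3 = c("eng", "fra"),
  stringsAsFactors = FALSE
)

normalize_language <- function(x) {
  x <- tolower(x)
  if (nchar(x) == 2 && x %in% lang_codes$alpha2) {
    x
  } else if (nchar(x) == 3 && x %in% lang_codes$alpha3) {
    lang_codes$alpha2[lang_codes$alpha3 == x]
  } else {
    stop("Language must be a valid ISO 639 code: ", x)
  }
}

# One check per branch, so every line is exercised
two_letter   <- normalize_language("en")
three_letter <- normalize_language("FRA")

# Capture the error message instead of letting the check abort
msg <- tryCatch(
  normalize_language("xx1"),
  error = function(e) conditionMessage(e)
)
```

In a testthat file the last check would typically be written as `expect_error(normalize_language("xx1"), "valid ISO 639")`.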

@antaldaniel
Author

Hi @adamhsparks I went up to 71.27%, but further changes are not very productive. I did not extensively cover two areas: the dataset() constructor itself, where I expect potentially breaking changes, and file I/O, where I would like to come up with a more general solution and also avoid tests being run on CRAN later. As the overwrite function and its messages account for most of the branches, this becomes a bit of a game with percentages, because the very same copied test runs again and again.

Do you have a good solution to include download and file I/O tests that run fast enough or cause no disruption when later run on CRAN?

@antaldaniel
Author

@adamhsparks I am now well above your threshold, and apologies for the trivial error. I wanted to omit some issues in the dataset() constructor, but I did not realize that the file still contained old code that had since been rewritten; the tests skipped it, of course, but it sat at the bottom of the file. The package is now 81.2% covered. I know that this has to improve, but I'd prefer to do it once some issues are resolved in a clear direction (see my comment above).

@adamhsparks
Member

Hi @antaldaniel, that's great to see. Thank you for rechecking everything and updating.

If you have tests that you feel are unconducive for CRAN, I'd just use (and do liberally use) skip_on_cran(). Reviewers should hopefully be able to help guide you on this more.
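
A minimal sketch of such a test, assuming testthat is installed: the file I/O happens in a temporary file, and `skip_on_cran()` keeps the test from running during CRAN checks. The helper `csv_roundtrip()` is hypothetical and only stands in for the package's export and import functions.

```r
library(testthat)

# Hypothetical helper standing in for an export/import pair
csv_roundtrip <- function(x) {
  tmp <- tempfile(fileext = ".csv")   # never write outside tempdir()
  write.csv(x, tmp, row.names = FALSE)
  out <- read.csv(tmp)
  unlink(tmp)
  out
}

test_that("a data frame survives a CSV round trip", {
  # Skipped on CRAN; runs where NOT_CRAN is "true", as devtools::test() sets it
  skip_on_cran()
  back <- csv_roundtrip(head(iris))
  expect_equal(dim(back), dim(head(iris)))
  expect_named(back, names(iris))
})
```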

@adamhsparks
Member

@ropensci-review-bot check package

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for dataset (v0.1.7.0002)

git hash: 93c03c54

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✖️ Function names are duplicated in other packages
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 82.1%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 147 |
| internal | dataset | 66 |
| internal | stats | 2 |
| imports | utils | 2 |
| imports | assertthat | NA |
| imports | ISOcodes | NA |
| suggests | covr | NA |
| suggests | declared | NA |
| suggests | dplyr | NA |
| suggests | eurostat | NA |
| suggests | here | NA |
| suggests | kableExtra | NA |
| suggests | knitr | NA |
| suggests | rdflib | NA |
| suggests | readxl | NA |
| suggests | rmarkdown | NA |
| suggests | spelling | NA |
| suggests | statcodelists | NA |
| suggests | testthat | NA |
| suggests | tidyr | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

names (21), class (12), data.frame (10), paste (9), vapply (9), rep (7), character (6), unlist (6), attr (5), lapply (5), length (5), ncol (5), subset (4), as.character (3), c (3), seq_along (3), as.data.frame (2), as.numeric (2), attributes (2), cbind (2), file (2), inherits (2), logical (2), matrix (2), nrow (2), round (2), which (2), date (1), for (1), ifelse (1), is.null (1), paste0 (1), rbind (1), seq_len (1), tolower (1), union (1), unique (1), url (1), UseMethod (1)

dataset

attributes_measures (5), dimensions (4), all_unique (3), dataset_title (3), measures (3), creator (2), datacite (2), dataset (2), dataset_source (2), description (2), geolocation (2), identifier (2), language (2), metadata_header (2), publication_year (2), publisher (2), related_item_identifier (2), resource_type (2), bibentry_dataset (1), datacite_add (1), dataset_download (1), dataset_download_csv (1), dataset_export (1), dataset_export_csv (1), dataset_local_id (1), dataset_title_create (1), dataset_uri (1), dublincore (1), dublincore_add (1), extract_year (1), is.dataset (1), print (1), print.dataset (1), related_item (1), resource_type_general (1), resource_type_general_allowed (1), rights (1), subject (1), time_var_guess (1), version (1)

stats

df (2)

utils

object.size (1), read.csv (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 24 files) and
  • 1 authors
  • 7 vignettes
  • no internal data file
  • 3 imported packages
  • 56 exported functions (median 10 lines of code)
  • 66 non-exported functions in R (median 15 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 24 | 85.5 | |
| files_vignettes | 7 | 98.5 | |
| files_tests | 28 | 97.7 | |
| loc_R | 889 | 64.9 | |
| loc_vignettes | 676 | 84.7 | |
| loc_tests | 432 | 72.0 | |
| num_vignettes | 7 | 99.2 | TRUE |
| n_fns_r | 122 | 81.1 | |
| n_fns_r_exported | 56 | 89.5 | |
| n_fns_r_not_exported | 66 | 74.6 | |
| n_fns_per_file_r | 3 | 54.4 | |
| num_params_per_fn | 2 | 11.9 | |
| loc_per_fn_r | 11 | 32.3 | |
| loc_per_fn_r_exp | 10 | 22.2 | |
| loc_per_fn_r_not_exp | 15 | 49.5 | |
| rel_whitespace_R | 27 | 75.4 | |
| rel_whitespace_vignettes | 36 | 88.3 | |
| rel_whitespace_tests | 28 | 76.4 | |
| doclines_per_fn_exp | 39 | 48.6 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 103 | 79.7 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

pkgcheck

GitHub Workflow Results

| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 2891146042 | pkgcheck | failure | 93c03c | 17 | 2022-08-19 |
| 2891146050 | test-coverage | success | 93c03c | 20 | 2022-08-19 |

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. no_description_date

Test coverage with covr

Package coverage: 82.12

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

| function | cyclocomplexity |
|---|---|
| datacite_add | 24 |
| dublincore_add | 23 |

Static code analyses with lintr

lintr found the following 370 potential issues:

| message | number of times |
|---|---|
| Avoid library() and require() calls in packages | 20 |
| Lines should not be more than 80 characters. | 350 |
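
For the remaining `library()` / `require()` lint, the usual pattern inside package code is to call functions with an explicit namespace and to guard packages listed in Suggests with `requireNamespace()`. The sketch below assumes a hypothetical helper `as_tibble_if_available()` and uses tibble purely as an example of an optional dependency; it does not claim to show where the flagged calls sit in the package.

```r
# Hypothetical helper: 'tibble' stands in for any package in Suggests
as_tibble_if_available <- function(x) {
  if (!requireNamespace("tibble", quietly = TRUE)) {
    # Optional dependency missing: fall back to the input unchanged
    return(x)
  }
  # Explicit namespace instead of library(tibble)
  tibble::as_tibble(x)
}

out <- as_tibble_if_available(data.frame(a = 1:3))
```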


4. Other Checks

Details of other checks (click to open)

✖️ The following 10 function names are duplicated in other packages:

    • dataset from assemblerr, febr, robis
    • description from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
    • dimensions from gdalcubes, openeo, sp, tiledb
    • identifier from Ramble
    • is.dataset from crunch
    • language from sylly, wakefield
    • measures from greybox, mlr3measures, tsibble
    • size from acrt, BaseSet, container, crmPack, CVXR, datastructures, deal, disto, easyVerification, EFA.MRFA, flifo, gdalcubes, gWidgets2, hrt, iemisc, InDisc, kernlab, matlab2r, multiverse, optimbase, PopED, pracma, ramify, rEMM, rmonad, simplegraph, siren, tcltk2, UComp, unival, vampyr
    • subject from DGM, emayili, gmailr, sendgridr
    • version from BiocManager, garma, geoknife, mice, R6DS, rerddap, rsyncrosim, shiny.info, SMFilter


Package Versions

| package | version |
|---|---|
| pkgstats | 0.1.1.20 |
| pkgcheck | 0.1.0.3 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@adamhsparks
Member

@ropensci-review-bot assign @melvidoni as editor

@ropensci-review-bot
Collaborator

Assigned! @melvidoni is now the editor

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for dataset (v0.3.1)

git hash: b1dca41e

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✖️ The following functions have no documented return values: [provenance, subsetting, var_labels, xsd_convert]
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✖️ These functions do not have examples: [dataset_to_triples].
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 79%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with 👀 may be optionally addressed.)
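
For the two failing checks above (undocumented return values and missing examples), here is a minimal sketch of the roxygen2 tags that usually satisfy them. The function `count_rows()` is hypothetical and is not part of the dataset package; only the `@return` and `@examples` tags matter here.

```r
#' Count the rows of a data frame
#'
#' @param x A data frame.
#' @return A single integer: the number of rows in `x`.
#' @examples
#' count_rows(iris)
#' @export
count_rows <- function(x) {
  # Fail early with a clear error if the input is not a data frame.
  stopifnot(is.data.frame(x))
  nrow(x)
}

# Quick check with a built-in dataset.
n_iris <- count_rows(iris)
```

After adding such tags, re-running `devtools::document()` regenerates the Rd files that the check inspects.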

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 312
internal dataset 178
internal graphics 6
imports assertthat 22
imports utils 11
imports stats 10
imports ISOcodes NA
suggests dataspice NA
suggests covr NA
suggests declared NA
suggests dplyr NA
suggests eurostat NA
suggests here NA
suggests kableExtra NA
suggests knitr NA
suggests rdflib NA
suggests readxl NA
suggests rmarkdown NA
suggests spelling NA
suggests statcodelists NA
suggests testthat NA
suggests tidyr NA
suggests tibble NA
suggests nycflights13 NA
suggests tsibble NA
suggests data.table NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

as.character (40), ifelse (40), is.null (38), list (30), c (16), data.frame (14), names (10), lapply (8), attr (7), paste0 (7), inherits (6), class (5), col (5), drop (4), invisible (4), seq_along (4), which (4), as.POSIXct (3), character (3), date (3), for (3), format (3), length (3), ncol (3), Sys.time (3), unlist (3), vapply (3), all (2), args (2), as.data.frame (2), as.numeric (2), dim (2), paste (2), rbind (2), round (2), substitute (2), t (2), url (2), with (2), apply (1), as.Date (1), cbind (1), comment (1), do.call (1), environment (1), get (1), if (1), max (1), nchar (1), new.env (1), range (1), rep (1), substr (1), switch (1), Sys.Date (1)

dataset

dataset_bibentry (28), dataset_title (10), dataset (8), rights (8), subject (8), creator (7), description (6), publisher (6), identifier (5), language (5), new_Subject (5), provenance (5), xsd_convert (5), DataStructure (4), convert_column (3), publication_year (3), as_bibentry (2), as_dublincore (2), dots_number (2), geolocation (2), get_type (2), getdata (2), idcol_find (2), is_person (2), is.dataset (2), provenance_add (2), related_item_identifier (2), size (2), subject_create (2), version (2), as_datacite (1), as_dataset (1), as_dataset.data.frame (1), datacite (1), dataset_download (1), dataset_download_csv (1), dataset_prov (1), dataset_title_create (1), dataset_to_triples (1), dataset_ttl_write (1), datasource_get (1), datasource_set (1), DataStructure_update (1), describe (1), describe.dataset (1), dublincore (1), get_prefix (1), get_resource_identifier (1), head.dataset (1), id_to_column (1), initialise_dsd (1), is.datacite (1), is.datacite.datacite (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), new_datacite (1), new_dataset (1), new_dublincore (1), old_function (1), print.dataset (1), related_item (1), set_var_labels (1), set_var_labels.dataset (1)

assertthat

assert_that (22)

utils

bibentry (3), data (2), person (2), citation (1), object.size (1), read.csv (1), tail (1)

stats

df (5), var (3), ar (1), family (1)

graphics

title (6)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 38 files) and
  • 1 authors
  • 12 vignettes
  • 3 internal data files
  • 4 imported packages
  • 81 exported functions (median 7 lines of code)
  • 117 non-exported functions in R (median 13 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 38 92.1
files_vignettes 12 99.5
files_tests 37 98.1
loc_R 1621 77.5
loc_vignettes 805 86.6
loc_tests 567 73.9
num_vignettes 12 99.7 TRUE
data_size_total 3007 63.3
data_size_median 578 60.0
n_fns_r 198 88.2
n_fns_r_exported 81 93.2
n_fns_r_not_exported 117 84.8
n_fns_per_file_r 3 56.2
num_params_per_fn 3 29.3
loc_per_fn_r 11 33.0
loc_per_fn_r_exp 7 14.4
loc_per_fn_r_not_exp 13 43.3
rel_whitespace_R 25 83.5
rel_whitespace_vignettes 36 89.5
rel_whitespace_tests 28 78.5
doclines_per_fn_exp 38 46.9
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 128 81.9


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

pkgcheck
R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
7677839674 pkgcheck failure b1dca4 126 2024-01-27
7677839676 R-CMD-check failure b1dca4 46 2024-01-27
7677839673 test-coverage failure b1dca4 129 2024-01-27

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. no_description_date

Test coverage with covr

Package coverage: 78.97

Cyclocomplexity with cyclocomp

The following function have cyclocomplexity >= 15:

function cyclocomplexity
[[.dataset 17

Static code analyses with lintr

lintr found no issues with this package!


4. Other Checks

Details of other checks (click to open)

✖️ The following 12 function names are duplicated in other packages:

    • dataset from assemblerr, febr, robis
    • describe from AzureVision, Bolstad2, describer, dlookr, explore, Hmisc, iBreakDown, ingredients, lambda.r, MSbox, onewaytests, prettyR, psych, psych, psyntur, questionr, radiant.data, RCPA3, Rlab, scan, scorecard, sylly, tidycomm
    • description from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
    • identifier from Ramble
    • is.dataset from crunch
    • language from sylly, wakefield
    • provenance from provenance
    • set_var_labels from xpose
    • size from acrt, BaseSet, container, crmPack, CVXR, datastructures, deal, disto, easyVerification, EFA.MRFA, flifo, gdalcubes, gWidgets2, hrt, iemisc, InDisc, kernlab, matlab2r, multiverse, optimbase, PopED, pracma, ramify, rEMM, rmonad, simplegraph, siren, tcltk2, UComp, unival, vampyr
    • subject from DGM, emayili, gmailr, sendgridr
    • var_labels from formatters, sjlabelled
    • version from BiocManager, garma, geoknife, mice, R6DS, rerddap, rsyncrosim, shiny.info, SMFilter
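
Name clashes like those listed above do not stop a package from working, but when two attached packages export the same name, the one attached last masks the other. Below is a minimal base R sketch of how a namespace-qualified call sidesteps the masking. The local `head()` definition is a purely illustrative stand-in for a clashing package function.

```r
# A user-level definition that shares its name with utils::head().
head <- function(x, ...) "masked"

x <- 1:10

# The unqualified call finds the definition above first.
masked_result <- head(x)

# The qualified call always reaches the version in the utils namespace.
qualified_result <- utils::head(x, 3)

# Remove the masking definition again.
rm(head)
```

The same `package::function()` pattern would let users pick between, say, two packages that both export `description()`.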


Package Versions

package version
pkgstats 0.2.0
pkgcheck 0.1.2.61


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@adamhsparks
Member

adamhsparks commented Oct 15, 2024

Hi @antaldaniel, just checking in, the editor checks indicate a few minor issues that could be addressed fairly easily I think. Are you in a position to fix these issues so we can resume this review?

@antaldaniel
Author

Yes, I am. Over the last few days I drew up a plan to improve this package and to add a package that builds on it for a specific use, because I think the package lacked the kind of mass use that would have generated interest and contributions. I will make these small changes, and I will also include a new conceptual vignette for review that better explains the mission statement.

@adamhsparks
Member

Great, thank you for the update, @antaldaniel!

@antaldaniel
Author

Last night I received a message from Emily that I cannot find here. Anyhow:

  1. The package was removed from CRAN because of a small documentation issue, but I thought this was a good time to revisit the concept, as rOpenSci normally suggests asking for feedback before a CRAN release.
  2. I split the package into two packages: one providing the basic classes and concepts, and one providing a practical use case.
  3. The package under review here remains dataset, which has been thoroughly rewritten in terms of S3 classes: https://github.com/antaldaniel/dataset
  4. These were the requirements that I set out before the rewrite
  5. The connecting package wbdataset.

In short, dataset aims to provide tidier datasets that conform to 3NF but have much richer semantics: their variables can have strict definitions from global namespaces and units of measure, and they are labelled. The dataset as a whole carries provenance information and all Dublin Core and DataCite variables.
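
As a rough illustration of what richer semantics at the variable level can mean, the base R sketch below attaches a label, a unit and a definition to a plain numeric vector. The attribute names (`label`, `unit`, `definition`), the values and the example URL are assumptions for illustration; this is not the package's own implementation.

```r
# A plain numeric vector with made-up GDP values.
gdp <- c(3897, 7365)

# Attach variable-level metadata as attributes. The attribute names and the
# definition URL are illustrative assumptions, not the package's own.
attr(gdp, "label") <- "Gross domestic product"
attr(gdp, "unit") <- "million euros"
attr(gdp, "definition") <- "http://example.com/definitions/gdp"

# The metadata can be read back from the object itself.
gdp_unit <- attr(gdp, "unit", exact = TRUE)

# Plain subsetting of a bare vector drops such attributes.
subset_unit <- attr(gdp[1], "unit", exact = TRUE)
```

Because `[` on a bare vector discards custom attributes, a dedicated S3 class with its own subsetting methods is one common way to keep them attached.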

What I tried to avoid is what dataspice does, namely detaching the metadata; here the metadata are all stored as attributes in the R object. I also want full interoperability with rOpenSci's rdflib package, because my datasets can be serialised to RDF in a very rich way and can, for example, be sent via API calls to statistical data catalogues such as the EU Open Data Portal.
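
To make the "metadata as attributes" idea concrete, here is a minimal base R sketch that stores a citation record on a data frame and shows that it survives a save and reload. The attribute name `dataset_bibentry` and all field values are assumptions for illustration, not a description of the package internals.

```r
# A small data frame with made-up values.
df <- data.frame(geo = c("AD", "LI"), gdp = c(3897, 7365))

# Store citation-style metadata on the object itself rather than in a
# separate file. The attribute name and all field values are hypothetical.
attr(df, "dataset_bibentry") <- utils::bibentry(
  bibtype = "Misc",
  title = "Example GDP Dataset",
  author = utils::person("Jane", "Doe"),
  year = "2024"
)

# Saving and reloading the object keeps the metadata attached.
tmp <- tempfile(fileext = ".rds")
saveRDS(df, tmp)
restored <- readRDS(tmp)
restored_title <- attr(restored, "dataset_bibentry")$title
unlink(tmp)
```

The trade-off is that the metadata only travels with R-native serialisations such as RDS; exporting to CSV would still need a separate step.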

@antaldaniel
Author

antaldaniel commented Dec 16, 2024

@adamhsparks @emilyriederer I think that the new version 0.3.3008 is ready for both conceptual and in-detail review.

@emilyriederer

@ropensci-review-bot check package

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for dataset (v0.3.3009)

git hash: eb1b88a6

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✖️ Package has no continuous integration checks.
  • ✔️ Package coverage is 83.6%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 263
internal dataset 174
internal stats 1
imports assertthat 15
imports utils 8
imports labelled 5
imports rlang 2
imports cli 1
imports haven 1
imports tibble 1
imports ISOcodes NA
imports methods NA
imports pillar NA
imports vctrs NA
suggests knitr NA
suggests rmarkdown NA
suggests spelling NA
suggests testthat NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

as.character (50), ifelse (43), is.null (41), list (22), c (7), lapply (7), data.frame (6), substr (6), inherits (5), names (5), paste0 (5), Sys.time (5), seq_along (4), which (4), date (3), for (3), format (3), invisible (3), length (3), Sys.Date (3), t (3), all (2), attr (2), character (2), class (2), drop (2), labels (2), nrow (2), units (2), vapply (2), with (2), args (1), as.data.frame (1), as.Date (1), as.POSIXct (1), cbind (1), do.call (1), double (1), if (1), nchar (1), ncol (1), rbind (1), version (1)

dataset

get_bibentry (22), dataset_title (9), rights (7), description (5), identifier (5), language (5), publisher (5), subject (5), create_bibentry (4), creator (4), get_type (4), new_Subject (4), convert_column (3), provenance (3), publication_year (3), var_definition (3), var_namespace (3), var_unit (3), as_dataset_df (2), as_dublincore (2), dataset_df (2), default_provenance (2), definition_attribute (2), geolocation (2), get_orcid (2), idcol_find (2), is_person (2), is.dataset_df (2), n_triple (2), n_triples (2), namespace_attribute (2), remove_null_elements (2), subject_create (2), unit_attribute (2), as_character (1), as_character.haven_labelled_defined (1), as_datacite (1), as_numeric (1), as_numeric.haven_labelled_defined (1), as.character.haven_labelled_defined (1), create_iri (1), datacite (1), dataset_to_triples (1), defined (1), dublincore (1), get_definition_attribute (1), get_namespace_attribute (1), get_unit_attribute (1), id_to_column (1), is_dataset_df (1), is.datacite (1), is.datacite.datacite (1), is.defined (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), label_attribute (1), names.dataset_df (1), new_datacite (1), new_datetime_defined (1), new_dublincore (1), new_labelled_defined (1), new_my_tibble (1), print.dataset_df (1), prov_author (1), set_bibentry (1), set_definition_attribute (1), set_namespace_attribute (1), set_unit_attribute (1), set_var_labels (1), summary.dataset_df (1), summary.haven_labelled_defined (1), tbl_sum.dataset_df (1), var_definition.default (1), var_label.dataset_df (1), var_label.defined (1), var_namespace.default (1), var_unit.default (1), vec_cast_named (1)

assertthat

assert_that (15)

utils

person (4), bibentry (3), citation (1)

labelled

var_label (4), to_labelled (1)

rlang

caller_env (1), env_is_user_facing (1)

cli

cat_line (1)

haven

labelled (1)

stats

df (1)

tibble

new_tibble (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 26 files) and
  • 1 authors
  • 4 vignettes
  • 1 internal data file
  • 11 imported packages
  • 88 exported functions (median 4 lines of code)
  • 136 non-exported functions in R (median 8 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 26 86.1
files_vignettes 4 93.2
files_tests 27 96.7
loc_R 1378 73.9
loc_vignettes 410 70.9
loc_tests 531 72.4
num_vignettes 4 94.6
data_size_total 2065 61.5
data_size_median 2065 67.9
n_fns_r 224 90.1
n_fns_r_exported 88 93.9
n_fns_r_not_exported 136 87.3
n_fns_per_file_r 5 73.0
num_params_per_fn 2 8.2
loc_per_fn_r 5 9.3
loc_per_fn_r_exp 4 5.7
loc_per_fn_r_not_exp 8 23.0
rel_whitespace_R 26 81.6
rel_whitespace_vignettes 35 73.6
rel_whitespace_tests 28 77.6
doclines_per_fn_exp 38 47.0
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 115 80.5


3. goodpractice and other checks

Details of goodpractice checks (click to open)


3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking Rd cross-references ... NOTE
    Package unavailable to check Rd xrefs: ‘tsibble’

R CMD check generated the following check_fails:

  1. no_description_date
  2. no_import_package_as_a_whole

Test coverage with covr

Package coverage: 83.56

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
new_datetime_defined 21
new_labelled_defined 21

Static code analyses with lintr

lintr found no issues with this package!


4. Other Checks

Details of other checks (click to open)

✖️ The following 8 function names are duplicated in other packages:

    • as_character from metan, radiant.data, retroharmonize, sjlabelled
    • as_numeric from descstat, metan, qdapRegex, radiant.data, retroharmonize, sjlabelled, zenplots
    • description from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
    • get_bibentry from eurostat
    • identifier from Ramble
    • language from sylly, wakefield
    • provenance from provenance
    • subject from DGM, emayili, gmailr, sendgridr


Package Versions

package version
pkgstats 0.2.0.48
pkgcheck 0.1.2.77


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@emilyriederer

Hi @antaldaniel - just to double check, is the right URL https://github.com/dataobservatory-eu/dataset/ or https://github.com/antaldaniel/dataset now? What will the final home of the main branch of the package be?

Let's make sure the top entry is pointing to the right URL.

If https://github.com/antaldaniel/dataset is the final home, can you please set up CI there also?

@antaldaniel
Author

Apologies @emilyriederer, I brought it back to the original repository, so it is https://github.com/dataobservatory-eu/dataset/. It uses the new rhubv2 setup, so it does have CI, but it seems that I sometimes get a false negative, as if there were none (I get it on the CI!)

@emilyriederer

@ropensci-review-bot check package

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for dataset (v0.3.3009)

git hash: eb1b88a6

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✖️ Package has no continuous integration checks.
  • ✔️ Package coverage is 83.6%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@emilyriederer

Hey @mpadge - any intuition for why the review bot isn't picking up on CI in this case? (Repo uses GitHub Actions)

I can try to dig into the bot code if it's a mystery but wanting to check if it's a known issue. Obviously I can see the CI checks so it's not a blocker for this review; just a curious phenomenon generally

@antaldaniel
Author

> Hey @mpadge - any intuition for why the review bot isn't picking up on CI in this case? (Repo uses GitHub Actions)
>
> I can try to dig into the bot code if it's a mystery but wanting to check if it's a known issue. Obviously I can see the CI checks so it's not a blocker for this review; just a curious phenomenon generally

I manage several packages on CRAN, and the new rhubv2 infrastructure was also a big challenge for me. I think it is a genuinely new CI infrastructure, so maybe it is not caught by your earlier workflow analysis. However, I think it is also very good, so perhaps it would make sense for R packages to rely only on that. It functions very well and provides tests on up to 30 scenarios with various combinations of old, new and devel versions of R across the various Windows, Mac and Linux distributions.

@emilyriederer

Thanks for the context, @antaldaniel! I'll escalate this if rhubv2 is the issue. On a preliminary look at the pkgcheck repo, however, I don't believe that's the case.

In the meantime, under the goodpractice results, I see that you have a few R CMD check issues. Could you please take a look at these? (particularly the fails)

> R CMD check with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)
>
> R CMD check generated the following note:
>
>   1. checking Rd cross-references ... NOTE
>     Package unavailable to check Rd xrefs: ‘tsibble’
>
> R CMD check generated the following check_fails:
>
>   1. no_description_date
>   2. no_import_package_as_a_whole
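
As a hedged illustration of what these two check_fails look for, the base R sketch below inspects a DESCRIPTION for a `Date` field and a NAMESPACE for whole-package `import()` directives. The file contents are hypothetical stand-ins, and the logic is a simplification rather than goodpractice's actual code.

```r
# Hypothetical DESCRIPTION and NAMESPACE contents, for illustration only.
description_text <- c(
  "Package: examplepkg",
  "Title: An Example",
  "Version: 0.1.0",
  "Date: 2024-01-27"
)
namespace_text <- c(
  "export(example_fn)",
  "import(rlang)",
  "importFrom(utils,bibentry)"
)

# read.dcf() parses the 'Field: value' format used by DESCRIPTION files.
con <- textConnection(description_text)
desc <- read.dcf(con)
close(con)

# no_description_date: flagged when a Date field is present.
has_date_field <- "Date" %in% colnames(desc)

# no_import_package_as_a_whole: flagged by import(pkg) directives;
# selective importFrom(pkg, fn) directives are fine.
whole_imports <- grep("^import\\(", namespace_text, value = TRUE)
```

In a roxygen2 workflow the usual remedies are to delete the `Date:` line and to replace `@import pkg` tags with `@importFrom pkg fn`.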

@emilyriederer

@antaldaniel - could you please also try adding a CI badge to your README? This may be the problem.

If Rhubv2 does not provide one, there are instructions to make your own here

@antaldaniel
Author

@emilyriederer I have hopefully made a well-functioning badge (it appears to work for me). I also ran goodpractice() locally, which I should have done before, and made some minor cosmetic documentation changes that were not flagged by the rhub checks (some functions and methods did not have return values; hopefully all of them do now). This is now version 0.3.4.

The issues flagged by goodpractice in fact seem to be devtools::document() issues that I reported. The tsibble example is no longer an issue. I ran goodpractice locally, and there are a few instances of bad practice; some are related to long lines in the documentation, which I will try to resolve after the review where possible (sometimes long URLs cause the problem and there is nothing to be done about it). There were also a few instances where = was accidentally used instead of the <- assignment; I fixed these.
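
For readers wondering why the `=` versus `<-` distinction is flagged at all, here is a small base R sketch of the one place where the two behave differently: inside a function call. The function name `demo_assignment()` is hypothetical.

```r
demo_assignment <- function() {
  # Inside a call, `=` only names an argument; no variable `x` is created.
  m1 <- mean(x = c(1, 2, 3))

  # `<-` inside a call creates `y` in the calling environment as a side
  # effect once the argument is evaluated.
  m2 <- mean(y <- c(4, 5, 6))

  c(
    x_created = exists("x", inherits = FALSE),
    y_created = exists("y", inherits = FALSE)
  )
}

result <- demo_assignment()
```

At the top level both operators assign, so the lint is largely about consistency; lintr's `assignment_linter` is the check that reports it.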

@emilyriederer

@ropensci-review-bot check package

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for dataset (v0.3.4)

git hash: 1040dd8f

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 83.5%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

(Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 212
internal dataset 169
internal graphics 9
internal stats 1
imports assertthat 18
imports utils 7
imports labelled 5
imports rlang 2
imports cli 1
imports haven 1
imports tibble 1
imports ISOcodes NA
imports methods NA
imports pillar NA
imports vctrs NA
suggests knitr NA
suggests rmarkdown NA
suggests spelling NA
suggests testthat NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

as.character (38), ifelse (32), is.null (30), list (16), c (7), lapply (7), data.frame (6), inherits (5), names (5), Sys.time (5), seq_along (4), substr (4), which (4), date (3), invisible (3), paste0 (3), t (3), with (3), all (2), attr (2), character (2), class (2), drop (2), for (2), format (2), labels (2), length (2), nrow (2), vapply (2), args (1), as.data.frame (1), as.Date (1), as.POSIXct (1), cbind (1), do.call (1), double (1), if (1), nchar (1), ncol (1), rbind (1), Sys.Date (1)

dataset

get_bibentry (24), dataset_title (9), subject (8), rights (6), new_Subject (5), creator (4), dataset_df (4), description (4), get_type (4), identifier (4), language (4), publisher (4), convert_column (3), provenance (3), publication_year (3), var_definition (3), var_namespace (3), var_unit (3), as_dataset_df (2), as_dublincore (2), default_provenance (2), definition_attribute (2), geolocation (2), get_orcid (2), idcol_find (2), is_person (2), is.dataset_df (2), n_triple (2), n_triples (2), namespace_attribute (2), new_my_tibble (2), unit_attribute (2), as_character (1), as_character.haven_labelled_defined (1), as_datacite (1), as_numeric (1), as_numeric.haven_labelled_defined (1), as.character.haven_labelled_defined (1), create_iri (1), datacite (1), dataset_to_triples (1), defined (1), dublincore (1), get_definition_attribute (1), get_namespace_attribute (1), get_unit_attribute (1), id_to_column (1), is_dataset_df (1), is.datacite (1), is.datacite.datacite (1), is.defined (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), label_attribute (1), names.dataset_df (1), new_datacite (1), new_datetime_defined (1), new_dublincore (1), new_labelled_defined (1), print.dataset_df (1), prov_author (1), set_definition_attribute (1), set_namespace_attribute (1), set_unit_attribute (1), set_var_labels (1), subject_create (1), summary.dataset_df (1), summary.haven_labelled_defined (1), tbl_sum.dataset_df (1), var_definition.default (1), var_label.dataset_df (1), var_label.defined (1), var_namespace.default (1), var_unit.default (1)

assertthat

assert_that (18)

graphics

title (9)

utils

person (4), bibentry (2), citation (1)

labelled

var_label (4), to_labelled (1)

rlang

caller_env (1), env_is_user_facing (1)

cli

cat_line (1)

haven

labelled (1)

stats

df (1)

tibble

new_tibble (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 26 files) and
  • 1 authors
  • 4 vignettes
  • 1 internal data file
  • 11 imported packages
  • 88 exported functions (median 4 lines of code)
  • 133 non-exported functions in R (median 8 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 26 | 86.1 | |
| files_vignettes | 4 | 93.2 | |
| files_tests | 27 | 96.7 | |
| loc_R | 1353 | 73.5 | |
| loc_vignettes | 449 | 73.4 | |
| loc_tests | 552 | 73.1 | |
| num_vignettes | 4 | 94.6 | |
| data_size_total | 2202 | 61.8 | |
| data_size_median | 2202 | 68.3 | |
| n_fns_r | 221 | 89.9 | |
| n_fns_r_exported | 88 | 93.9 | |
| n_fns_r_not_exported | 133 | 86.9 | |
| n_fns_per_file_r | 5 | 72.7 | |
| num_params_per_fn | 2 | 8.2 | |
| loc_per_fn_r | 5 | 9.3 | |
| loc_per_fn_r_exp | 4 | 8.3 | |
| loc_per_fn_r_not_exp | 8 | 23.0 | |
| rel_whitespace_R | 26 | 81.1 | |
| rel_whitespace_vignettes | 36 | 76.5 | |
| rel_whitespace_tests | 29 | 78.7 | |
| doclines_per_fn_exp | 55 | 68.0 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 122 | 81.4 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

rhub.yaml


3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fails:

  1. no_description_date
  2. no_import_package_as_a_whole

Test coverage with covr

Package coverage: 83.46

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

| function | cyclocomplexity |
|---|---|
| new_datetime_defined | 21 |
| new_labelled_defined | 21 |

Static code analyses with lintr

lintr found no issues with this package!


4. Other Checks

Details of other checks (click to open)

✖️ The following 9 function names are duplicated in other packages:

    • as_character from metan, radiant.data, retroharmonize, sjlabelled
    • as_numeric from descstat, metan, qdapRegex, radiant.data, retroharmonize, sjlabelled, zenplots
    • description from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
    • get_bibentry from eurostat
    • identifier from Ramble
    • is.defined from nonmemica
    • language from sylly, wakefield
    • provenance from provenance
    • subject from DGM, emayili, gmailr, sendgridr


Package Versions

| package | version |
|---|---|
| pkgstats | 0.2.0.48 |
| pkgcheck | 0.1.2.77 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@emilyriederer

Thanks @antaldaniel ! That seems to have fixed the CI issue.

I think we are in good shape to look for an editor. This may be slow these next few weeks (e.g. rOpenSci is actually closed to new submissions through year end) but we will be in touch soon.

@antaldaniel
Author

@emilyriederer Thank you very much. In the meantime, the package got back on CRAN, so I think it technically works. In my opinion, the real conceptual issue is how this package could be well integrated with some other rOpenSci packages, mainly rdflib (together they could form a very powerful use case, i.e., exporting any R dataset to RDF in a semantically rich way, making it truly interoperable), and perhaps with dataspice, as I think this package will solve, in a much more general and broader sense, what that package is aiming for.

I am looking forward to hearing from you, and I wish a very nice festive season to you and all rOpenSci reviewers.

@emilyriederer emilyriederer self-assigned this Jan 2, 2025