butterfly: Verification of continually updating timeseries data where we expect new values, but want to ensure previous data remains unchanged. #676
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type
🚀 Editor check started 👋
Checks for butterfly (v1.0.0), git hash: ed8586e4
(Checks marked with 👀 may be optionally addressed.)
Package License: MIT + file LICENSE
1. Package Dependencies
Details of Package Dependency Usage:
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Tallies of functions used in each package are listed below. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
- base: col (4), suppressMessages (2), by (1), character (1), cumsum (1), data.frame (1), list (1)
- butterfly: create_object_list (3), timeline_group (2), catch (1)
- dplyr: anti_join (1), bind_rows (1), case_when (1), semi_join (1)
- waldo: compare (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties:
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisation
Interactive network visualisation of calls between objects in the package.
3.
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
12088199123 | pages build and deployment | success | ded264 | 24 | 2024-11-29 |
12088183822 | pkgcheck | success | ed8586 | 8 | 2024-11-29 |
12088183810 | pkgdown.yaml | success | ed8586 | 43 | 2024-11-29 |
12088183807 | R-CMD-check.yaml | success | ed8586 | 45 | 2024-11-29 |
12088183819 | test-coverage.yaml | success | ed8586 | 46 | 2024-11-29 |
3b. goodpractice results
R CMD check with rcmdcheck
rcmdcheck found no errors, warnings, or notes
Test coverage with covr
Package coverage: 100%
Cyclocomplexity with cyclocomp
No functions have cyclocomplexity >= 15
Static code analyses with lintr
lintr found no issues with this package!
4. Other Checks
Details of other checks:
✖️ The following 3 function names are duplicated in other packages:
- catch from catch, gnn, promises, qrmtools, TULIP
- release from packager, simmer
- timeline from ndtv, timeline, trackeR
Package Versions
package | version |
---|---|
pkgstats | 0.2.0.48 |
pkgcheck | 0.1.2.77 |
Editor-in-Chief Instructions:
This package is in top shape and may be passed on to a handling editor
Hi @thomaszwagerman! Thanks for your submission. We're excited to move forward with this review, and it looks like the package is in great shape. I will begin to look for a handling editor.
@ropensci-review-bot assign @username as editor
@ropensci-review-bot assign @emilyriederer as editor
Assigned! @emilyriederer is now the editor
Hi again @thomaszwagerman! I'll also be the handling editor for this one. Looking forward to working with you!
Editor checks:
Editor comments
Overall, the package looks in great shape for review. The level of documentation is especially impressive; I particularly like the graphics. The repo also seems well organized, making the package easy to understand, install, and follow through its development. I marked the "Contributing information" box as complete since the package doesn't require any specific dev setup; however, you could consider adding explicit information on what types of contributions you welcome and how folks should engage (e.g. open an issue before beginning development? See example here).
@ropensci-review-bot help
Hello @emilyriederer, here are the things you can ask me to do:
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository: [![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/676_status.svg)](https://github.com/ropensci/software-review/issues/676) Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
Thank you for your comments @emilyriederer, a very good point regarding being explicit about contributions. I have updated the CONTRIBUTING.md using the example you provided, and made it more specific to the package (antarctica/butterfly#30). A small point of general clarification: before submitting for review I followed the contribution guide, which states it is compulsory to have CONTRIBUTING.md under either
@ropensci-review-bot assign @qdread as reviewer
@qdread added to the reviewers list. Review due date is 2025-01-04. Thanks @qdread for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@thomaszwagerman - let me introduce @qdread as our first reviewer! He's an expert reviewer with rOpenSci with lots of experience in longitudinal data. @qdread - regarding the change in your email, please use the form above to update!
@thomaszwagerman - great question and thanks for the catch! You're probably right, it's better in one of those locations to avoid confusion when building the R package (otherwise we can tell the build to ignore it).
@ropensci-review-bot assign @TheAnalyticalEdge as reviewer
@TheAnalyticalEdge added to the reviewers list. Review due date is 2025-01-06. Thanks @TheAnalyticalEdge for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@TheAnalyticalEdge: If you haven't done so, please fill in this form for us to update our reviewers records.
@thomaszwagerman - I'm delighted to announce @TheAnalyticalEdge as our second reviewer! @TheAnalyticalEdge and @qdread - Please do not stress about the dates the bot "autoassigned" you. As we all discussed offline, the end of December is quite busy for all with the holidays. We can easily bump the date when the time comes.
My review is below. Happy New Year to all! And specifically for @emilyriederer: Go Tar Heels!
Package Review
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing: 5
Review Comments
General comments
I enjoyed reviewing this package. Thanks for giving me the opportunity to take a look at it! Below I have written some elaborations on the checklist items that I did not mark complete, and point-by-point comments on different things I noticed while looking over the package. Most of these comments are suggestions for possible improvements rather than fatal flaws that must be fixed. In general, the package does a well-defined job, does it correctly, and documents itself well. One general comment I have is that code execution seemed fairly slow for the examples, even though they only use input datasets with a few dozen rows. I would be concerned that it would scale poorly to large datasets. It might be useful to test this with realistically large input datasets to see whether the performance is still acceptable.
Explanation of checklist items not marked complete
Specific comments: suggested improvements to the code
Specific comments: vignettes
Specific comments: function documentation
📆 @qdread you have 2 days left before the due date for your review (2025-01-04).
📆 @TheAnalyticalEdge you have 2 days left before the due date for your review (2025-01-06).
Submitting Author Name: Thomas Zwagerman
Submitting Author Github Handle: @thomaszwagerman
Repository: https://github.com/antarctica/butterfly
Version submitted: 1.0.0
Submission type: Standard
Editor: @emilyriederer
Reviewers: @qdread, @TheAnalyticalEdge
Due date for @qdread: 2025-01-04
Due date for @TheAnalyticalEdge: 2025-01-06
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
This package was written to handle the verification of continually updating time-series data, where we expect new values over time, but want to ensure previous data remains unchanged, and timesteps remain continuous. This package provides functionality that can be used as part of a data pipeline, to check and flag changes to previous data and prevent changes going unnoticed.
Researchers, data engineers, data stewards etc. We have implemented this package in an operational data pipeline, which extracts ERA5 data and performs some calculations to generate a new publishable dataset. ERA5 can be subject to retrospective changes, so to prevent these changes going unnoticed and affecting our dataset, we use butterfly to verify our data has not unexpectedly changed compared to previously published versions. New since pre-submission: this package also contains functionality to check the continuity of timeseries data (e.g. gaps caused by instrument failure), and to handle time series data with varying measurement frequencies.
(duplicated from pre-submission #665)
To my knowledge, there is no other package that handles comparing successive iterations of the same data, i.e. we want to verify there are no changes in previous data, but at the same time we do not want new data to throw an error and stop our pipeline.
waldo - butterfly uses waldo::compare() in every function to provide a report on differences. There is therefore significant overlap; however, butterfly builds on waldo by providing the functionality of comparing objects where we expect changes in some places (entirely new rows of data), but not in others (previously published data). butterfly also provides extra user feedback to clarify what it is and isn’t comparing, due to the nature of comparing only “matched” rows.
diffdf - similar to waldo, but specifically for data frames, diffdf provides the ability to compare data frames directly. diffdf::diffdf() could have been used in our case, but I prefer waldo’s more explicit and clear user feedback. That said, there is significant overlap in functionality: butterfly::loupe() and diffdf::diffdf_has_issues() both provide a TRUE/FALSE difference check, while diffdf::diffdf_issue_rows() and butterfly::catch() both return the rows where changes have occurred. However, diffdf lacks butterfly's flexibility to compare objects where we expect some changes, but not others.
assertr - assertr provides assertion functionality that can be used as part of a pipeline to test assertions on a particular dataset, but it does not offer tools for comparison. I would use the two hand-in-hand, but butterfly serves a comparison purpose.
daquiri - daquiri provides tools to check data quality and visually inspect timeseries data. It is a quality assurance package for timeseries, but does not have a comparison/verification purpose like butterfly.
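The "matched rows" comparison described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not butterfly's implementation: compare_matched_rows() is a hypothetical helper, both versions are assumed to share the same columns and a unique key column (here time), and values are compared exactly, whereas butterfly reports differences through waldo::compare() and supports a tolerance.

```r
# Hypothetical helper, NOT butterfly's API: compares two versions of a
# time series, checking only rows whose key exists in both versions.
compare_matched_rows <- function(old, new, by = "time") {
  # Rows in `new` whose key is absent from `old` are expected additions.
  is_new <- !(new[[by]] %in% old[[by]])

  # Join previous and current values on the key, for matched rows only.
  matched <- merge(old, new[!is_new, , drop = FALSE], by = by,
                   suffixes = c(".old", ".new"))

  changed <- rep(FALSE, nrow(matched))
  for (col in setdiff(names(old), by)) {
    a <- matched[[paste0(col, ".old")]]
    b <- matched[[paste0(col, ".new")]]
    # Exact comparison; a value that becomes (or stops being) NA counts
    # as a change, while NA in both versions does not.
    col_changed <- (a != b) | (is.na(a) != is.na(b))
    col_changed[is.na(col_changed)] <- FALSE
    changed <- changed | col_changed
  }

  list(
    unchanged = !any(changed),
    changed_rows = matched[changed, , drop = FALSE],
    new_rows = new[is_new, , drop = FALSE]
  )
}

old <- data.frame(time = as.Date("2024-01-01") + 0:2, value = c(1, 2, 3))
new <- data.frame(time = as.Date("2024-01-01") + 0:3, value = c(1, 2, 99, 4))

res <- compare_matched_rows(old, new)
res$unchanged     # FALSE: the 2024-01-03 value was rewritten
res$changed_rows  # the 2024-01-03 row, with value.old 3 and value.new 99
res$new_rows      # the 2024-01-04 row, which is expected
```

Rows whose key only appears in the new version are returned separately instead of being counted as differences, which is what lets a pipeline accept appended data while still flagging rewritten history.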
N/A
#665
Since pre-submission, two new functions have been added to the package: timeline() and timeline_group(). This has expanded the package to also deal with time series data from instruments/sensors, specifically those which are prone to error, or those which intentionally measure time-stamped data at variable frequencies. Further feedback on controlling tolerance was also incorporated.
Another pre-submission suggestion, on checking relationships between variables, was not implemented. I had a few passes at this, but I could not think of a generalised method of doing this effectively. I currently don't have a specific use case that suits it, but I would be open to revisiting it and to suggestions.
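A continuity check of the kind timeline() and timeline_group() are described as performing can be sketched as follows. This is a hypothetical helper, not butterfly's code: group_continuous() and its expected_lag argument (in seconds) are assumptions for illustration, and a single fixed threshold does not capture the variable measurement frequencies the package also handles.

```r
# Hypothetical helper, NOT butterfly::timeline_group(): assigns each
# timestamp to a continuous segment. A new segment starts whenever the gap
# to the previous timestamp exceeds `expected_lag` seconds.
group_continuous <- function(timestamps, expected_lag) {
  timestamps <- sort(timestamps)
  # Gap (in seconds) to the previous timestamp; the first has no predecessor.
  gaps <- c(0, diff(as.numeric(timestamps)))[seq_along(timestamps)]
  starts_new_segment <- gaps > expected_lag
  data.frame(
    timestamp = timestamps,
    segment = cumsum(starts_new_segment) + 1
  )
}

# One-minute readings with a gap (e.g. a sensor outage) after 00:02.
stamps <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + c(0, 60, 120, 600, 660)
out <- group_continuous(stamps, expected_lag = 60)
out$segment                              # 1 1 1 2 2
is_continuous <- max(out$segment) == 1   # FALSE
```

Counting segments gives a simple continuity flag: a single segment means no gap exceeded the expected interval, while each additional segment marks a break worth investigating.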
pkgcheck items which your package is unable to pass.
The only warning is "Function names are duplicated in other packages". Currently I've named functions in the way that made sense for this package, following the "butterfly" theme. However, I recognise that in specific cases this might not be ideal. I would be very open to feedback/guidance from reviewers on best practices for naming these (with a personal preference for sticking with the "butterfly" theme), perhaps by prefixing functions as bf_*()? In the documentation I've tried to refer to functions using butterfly::*().
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
Code of conduct