QuadratiK: A Collection of Methods Using Kernel-Based Quadratic Distances for Statistical Inference and Clustering #632
Comments
@ropensci-review-bot check srr |
'srr' standards compliance:
✔️ This package complies with > 50% of all standards and may be submitted. |
Thanks for the submission @giovsaraceno ! I'm getting some advice from the other editors about your question. One thing that would be really helpful - could you push up your documentation to a GitHub page? From the |
Hi @giovsaraceno, Mark here from the rOpenSci stats team to answer your question. We've done our best to clarify the role of Probability Distributions Standards:
So packages should generally fit within some main category, with Probability Distributions being an additional category. In your case, Dimensionality Reduction seems like the appropriate main category, but it seems like your package would also fit within Probability Distributions. Given that, the next step would be for you to estimate what proportion of those standards you think might apply to your package. Our general rule-of-thumb is that at least 50% should apply, but for Probability Distributions as an additional category, that figure may be lower. We are particularly keen to document compliance with this category, because it is where our standards have a large overlap with many core routines of the R language itself. As always, we encourage feedback on our standards, so please also feel very welcome to open issues in the Stats Software repository, or add comments or questions in the discussion pages. Thanks for your submission! |
Thanks @ldecicco-USGS for your guidance during this process. Following your suggestion, I've now pushed the documentation for the QuadratiK package to a GitHub page. You can find it displayed on the main page of the GitHub repository. Here's the direct link for easy access: QuadratiK package GitHub page. |
Hi Mark, Thank you for the additional clarification regarding the standards for Probability Distributions and their integration with other statistical software categories. Following your guidance, we have conducted a thorough review of the standards applicable to the Probability Distributions category in relation to our package. Based on our assessment, we found that the current version of our package satisfies 14% of the standards directly. Furthermore, we identified that an additional 36% of the standards could potentially apply to our package, but this would require us to make some enhancements, including the addition of checks and test code. We feel the remaining 50% of the standards are not applicable to our package. We are committed to improving our package and aim to fulfill the applicable standards. To this end, we plan to work on a separate branch dedicated to implementing these enhancements, with the goal of meeting 50% of the standards for the Probability Distributions category. Before proceeding, we would greatly appreciate your opinion on this plan. Thank you for your time and support. Giovanni |
Hi Mark, We addressed the enhancements we discussed, and our package now meets 50% of the standards for the Probability Distributions category. These updates are in the probability-distributions-standards branch of our repository. Thank you, Giovanni |
Hi Giovanni, your Those are very minor points which you may ignore for the moment if you'd like to get the review process started, or you could quickly address them straight away if you prefer. Either way, feel free to ask the bot to |
Hi, thank you for your suggestions on our compliance statements and testing practices. As for comparing results from different distributions, the rpkb function in our package provides options to generate random observations using three distinct algorithms based on different probability distributions. We've conducted tests to confirm that each method functions as intended. We also added a new vignette in which the methods are compared by graphically displaying the generated points. Is this what you are looking for? We're inclined to address them promptly. We would appreciate it if we could get an answer to the questions posed above so that we can start the review process. |
Sorry we didn't reply faster, @giovsaraceno. In, say, a single-variable distribution tests might include:
|
Thanks @noamross for your explanation. We have taken your suggestions into consideration and have implemented them accordingly. |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 The following problems were found in your submission template:
👋 |
Checks for QuadratiK (v1.0.0)
git hash: 21541a40
Important: All failing checks above must be addressed prior to proceeding (Checks marked with 👀 may be optionally addressed.)
Package License: GPL (>= 3)
1. rOpenSci Statistical Standards (
|
type | package | ncalls |
---|---|---|
internal | base | 382 |
internal | QuadratiK | 50 |
internal | utils | 10 |
internal | grDevices | 1 |
imports | stats | 29 |
imports | methods | 26 |
imports | sn | 14 |
imports | ggpp | 2 |
imports | cluster | 1 |
imports | mclust | 1 |
imports | moments | 1 |
imports | rrcov | 1 |
imports | clusterRepro | NA |
imports | doParallel | NA |
imports | foreach | NA |
imports | ggplot2 | NA |
imports | ggpubr | NA |
imports | MASS | NA |
imports | movMF | NA |
imports | mvtnorm | NA |
imports | Rcpp | NA |
imports | RcppEigen | NA |
imports | rgl | NA |
imports | rlecuyer | NA |
imports | Tinflex | NA |
suggests | knitr | NA |
suggests | rmarkdown | NA |
suggests | roxygen2 | NA |
suggests | testthat | NA |
linking_to | Rcpp | NA |
linking_to | RcppEigen | NA |
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base
list (46), data.frame (26), matrix (24), nrow (23), t (20), log (19), rep (19), ncol (18), c (14), numeric (12), for (11), sqrt (10), length (8), mean (8), as.numeric (6), return (6), sample (6), T (6), vapply (6), apply (5), as.factor (5), table (5), unique (5), as.vector (4), cumsum (4), exp (4), rbind (4), sum (4), as.matrix (3), kappa (3), lapply (3), lgamma (3), pi (3), q (3), replace (3), unlist (3), as.integer (2), diag (2), max (2), readline (2), rownames (2), rowSums (2), which (2), which.max (2), with (2), beta (1), colMeans (1), expand.grid (1), F (1), factor (1), if (1), levels (1), norm (1), rep.int (1), round (1), seq_len (1), subset (1)
QuadratiK
DOF (3), kbNormTest (3), normal_CV (3), C_d_lambda (2), compute_CV (2), cv_ksample (2), d2lpdf (2), dlpdf (2), lpdf (2), norm_vec (2), objective_norm (2), poisson_CV (2), rejvmf (2), sample_hypersphere (2), statPoissonUnif (2), compare_qq (1), compute_stats (1), computeKernelMatrix (1), computePoissonMatrix (1), dpkb (1), elbowMethod (1), generate_SN (1), NonparamCentering (1), objective_2 (1), objective_k (1), ParamCentering (1), pkbc_validation (1), rejacg (1), rejpsaw (1), select_h (1), stat_ksample_cpp (1), stat2sample (1)
stats
df (12), quantile (4), dist (2), rnorm (2), runif (2), aggregate (1), cov (1), D (1), qchisq (1), sd (1), sigma (1), uniroot (1)
methods
setMethod (12), setGeneric (8), new (3), setClass (3)
sn
rmsn (14)
utils
data (8), prompt (2)
ggpp
annotate (2)
cluster
silhouette (1)
grDevices
colorRampPalette (1)
mclust
adjustedRandIndex (1)
moments
skewness (1)
rrcov
PcaLocantore (1)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
3. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in C++ (17% in 2 files) and R (83% in 12 files)
- 4 authors
- 5 vignettes
- 1 internal data file
- 21 imported packages
- 24 exported functions (median 14 lines of code)
- 56 non-exported functions in R (median 16 lines of code)
- 16 R functions (median 13 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:
- loc = "Lines of Code"
- fn = "function"
- exp/not_exp = exported / not exported
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function.
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
measure | value | percentile | noteworthy |
---|---|---|---|
files_R | 12 | 65.5 | |
files_src | 2 | 79.1 | |
files_vignettes | 5 | 96.9 | |
files_tests | 10 | 90.7 | |
loc_R | 1408 | 76.6 | |
loc_src | 281 | 34.1 | |
loc_vignettes | 235 | 55.3 | |
loc_tests | 394 | 70.0 | |
num_vignettes | 5 | 97.9 | TRUE |
data_size_total | 11842 | 71.9 | |
data_size_median | 11842 | 80.1 | |
n_fns_r | 80 | 70.4 | |
n_fns_r_exported | 24 | 72.5 | |
n_fns_r_not_exported | 56 | 70.6 | |
n_fns_src | 16 | 40.4 | |
n_fns_per_file_r | 5 | 67.1 | |
n_fns_per_file_src | 8 | 69.1 | |
num_params_per_fn | 5 | 69.6 | |
loc_per_fn_r | 15 | 46.1 | |
loc_per_fn_r_exp | 14 | 35.1 | |
loc_per_fn_r_not_exp | 16 | 54.8 | |
loc_per_fn_src | 13 | 41.6 | |
rel_whitespace_R | 24 | 82.7 | |
rel_whitespace_src | 18 | 36.2 | |
rel_whitespace_vignettes | 16 | 29.2 | |
rel_whitespace_tests | 34 | 78.1 | |
doclines_per_fn_exp | 50 | 62.8 | |
doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
fn_call_network_size | 50 | 66.3 |
3a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
4. goodpractice and other checks
Details of goodpractice checks (click to open)
3a. Continuous Integration Badges
(There do not appear to be any)
GitHub Workflow Results
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
8851531581 | pages build and deployment | success | 21541a | 25 | 2024-04-26 |
8851531648 | pkgcheck | failure | 21541a | 60 | 2024-04-26 |
8851531643 | pkgdown | success | 21541a | 25 | 2024-04-26 |
8851531649 | R-CMD-check | success | 21541a | 83 | 2024-04-26 |
8851531642 | test-coverage | success | 21541a | 83 | 2024-04-26 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following warning:
- checking whether package ‘QuadratiK’ can be installed ... WARNING
Found the following significant warnings:
Warning: 'rgl.init' failed, running with 'rgl.useNULL = TRUE'.
See ‘/tmp/RtmpQrtXuf/file133861d90686/QuadratiK.Rcheck/00install.out’ for details.
R CMD check generated the following note:
- checking installed package size ... NOTE
installed size is 16.6Mb
sub-directories of 1Mb or more:
libs 15.0Mb
R CMD check generated the following check_fails:
- no_import_package_as_a_whole
- rcmdcheck_examples_run_without_warnings
- rcmdcheck_significant_compilation_warnings
- rcmdcheck_reasonable_installed_size
Test coverage with covr
Package coverage: 78.21
Cyclocomplexity with cyclocomp
The following function have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
select_h | 46 |
Static code analyses with lintr
lintr found the following 20 potential issues:
message | number of times |
---|---|
Avoid library() and require() calls in packages | 9 |
Lines should not be more than 80 characters. | 9 |
Use <-, not =, for assignment. | 2 |
5. Other Checks
Details of other checks (click to open)
✖️ Package contains the following unexpected files:
- src/RcppExports.o
- src/kernel_function.o
✖️ The following function name is duplicated in other packages:
- extract_stats from ggstatsplot
Package Versions
package | version |
---|---|
pkgstats | 0.1.3.13 |
pkgcheck | 0.1.2.21 |
srr | 0.1.2.9 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
We have solved all the marked items and we are now ready to request the automatic bot check. |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 The following problems were found in your submission template:
👋 |
Hi @jooolia, |
@jooolia The automated checks failed because of the issue linked above. @giovsaraceno When you've fixed this issue and confirmed that pkgcheck workflows once again succeed in your repo, please call |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 The following problems were found in your submission template:
👋 |
@mpadge I have modified the initial issue text by adding the version submitted and choosing the badge grade. Please let us know if anything else is needed. |
We have added the provided badge into the README file and added the NEWS.md file into the package. |
@ropensci-review-bot add @kasselhingee as reviewer |
@kasselhingee added to the reviewers list. Review due date is 2024-07-16. Thanks @kasselhingee for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more. |
@kasselhingee: If you haven't done so, please fill this form for us to update our reviewers records. |
It is working now. Thank you @mpadge ! |
Package Review
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing:
Review Comments
I'm only partially familiar with the area of spherical data. Both your package's tests of uniformity and clustering sound useful. The I found it really hard to understand your package initially. That was because most of the documentation ignores G1.3 on explaining statistical terms. For example, I jumped into the I would love it if your readme described the major benefits with more detail. For example, that More on following G1.3. There are many unexplained terms I've never heard of before, a few used differently to expectation, and others used vaguely. At times it felt like these were all the terms!
Package-Level Comments
Major Functionality Claims
The following appear to be met from the documentation and function results:
However, I haven't checked if they are implemented correctly and the package lacks tests to confirm results in simple situations. These claims seem unmet to me:
Statement of need in README
Installation
Dependencies
Parts of the Package
Vignettes
Function help
Examples
I get warnings from
Test Coverage
Packaging guidelines
More
|
Hello @kasselhingee All the points raised in this review will be addressed. I do have some questions associated with some of the points and I would greatly appreciate if you could please provide some guidance. Here are my questions:
Thank you again for your time and the helpful review. |
📆 @kasselhingee you have 2 days left before the due date for your review (2024-07-16). |
Hi @giovsaraceno , glad it was helpful.
|
Hello @kasselhingee,
Thank you very much for your detailed and constructive feedback. We acknowledge the complexity of the statistical methods implemented in the package, particularly for users who may not be familiar with spherical data analysis or the specific kernel-based techniques we have employed.
We have added an introduction of the PKB distributions in the README file and the help of the
The package documentation has been significantly enhanced in order to clarify the introduction and usage of the mentioned points. It now provides a more comprehensive overview, featuring a brief introduction to the methods, a clear explanation of the theoretical foundations, and a discussion of the advantages and appropriate use cases. Additionally, we have incorporated further tests in line with the reviewer's suggestions.
Please notice that we have removed the mentioned statement from the README file.
Thank you for identifying this bug. We have fixed this and now the scatter-plots are correctly generated.
In the README file, we have added an Installation section which includes the link for the Python package. It is implemented separately and it is now under review with pyOpenSci.
The dashboard is generated using the associated Python package. You can find the link with detailed instructions on how to access it in the Installation section of the README file.
The README file and the documentation for the main functions now include a statement highlighting the advantages of the proposed methods and the scenarios in which they are most appropriate.
Thank you for your feedback. We appreciate your acknowledgment of the peer-reviewed nature of the statistical methods included in the package. While the methods implemented have been extensively validated in the literature and are recognized for their contributions to the field, we believe the practical utility and ease of access these methods offer to users is a significant contribution. We welcome any further insights or suggestions you might have regarding the implementation or documentation of these methods.
We apologize for the inconvenience; it was not available at the time of the first submission. More information about the two- and k-sample tests can be found in the newly released arXiv pre-print
Please see our answer above where we state clearly that with the
Apologies, we hope it is clear now.
It tests whether the sample follows a multivariate normal distribution with a given mean vector and covariance matrix. If these parameters are not provided, the standard normal distribution is considered. We hope it is now clearly explained.
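As an illustration of how a given mean vector and covariance matrix relate to the standard normal case, the sketch below standardises a sample so that a test against N(mu, Sigma) becomes a test against N(0, I). This is a generic whitening step and an assumption made for illustration; it is not claimed to be what the package does internally, and `standardise_sample` is a hypothetical helper name.

```r
# Hypothetical helper (not part of QuadratiK): map a sample that is assumed
# to follow N(mu, Sigma) to one that should follow N(0, I) under the null.
standardise_sample <- function(x, mu, Sigma) {
  x <- as.matrix(x)
  # Upper-triangular Cholesky factor R with t(R) %*% R == Sigma
  R <- chol(Sigma)
  # Centre each row by mu, then right-multiply by the inverse of R
  centred <- sweep(x, 2, mu, "-")
  centred %*% solve(R)
}

# Simulated example: build x from known standard normal draws z
set.seed(1)
mu <- c(1, -2)
Sigma <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
z <- matrix(rnorm(2000), ncol = 2)
x <- sweep(z %*% chol(Sigma), 2, mu, "+")

# Standardising x with the true parameters recovers z up to rounding error
z_back <- standardise_sample(x, mu, Sigma)
```

When the mean vector and covariance matrix are left unspecified, no such transformation is applied in this sketch and the data are compared with the standard normal directly.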
We refer here to general distributions for the two and
Thank you for your feedback. We have improved the documentation to clearly outline the specific use cases for the package’s functionalities. Our goal is to make the package accessible to a broader audience, including those who may not be familiar with the associated papers. For those interested in a deeper understanding, we have included references to the original research.
The README file now has an Installation section with instructions for installing the CRAN version or the development version of the package.
The packages 'rlecuyer', 'doParallel', 'foreach' and 'parallel' are fundamental for performing the parallel computing in the
The packages 'mclust', 'clusterRepro' and 'cluster' are used for computing the ARI, IGP and ASW in the The packages 'Tinflex' and 'movMF' are used for the random generation of points from the Poisson kernel-based distribution through the The package 'ggpp' is no longer used. The packages 'rrcov', 'mvtnorm' and 'ggpubr' are now called according to the automatic check. They are not moved to the 'Suggests' section since they are used for the package's functionalities. There are also the packages 'sphunif' and 'circular' that appear not to be called according to the automatic check. This happens because they are used in the tests or the vignettes.
Thank you for the comment.
We checked that all the vignettes can be run locally.
Thank you for the comment. In the revised vignettes we added more details and it is not necessary that readers are familiar with the papers or the other vignettes.
The proposed test for uniformity and clustering algorithm are tailored for data points on the sphere
Corrected.
The chunk has been corrected using the function
Each cluster is represented by a so-called silhouette which is based on the comparison of its tightness and separation. The average silhouette width provides an evaluation of clustering validity, and might be used to select an ‘appropriate’ number of clusters.
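A minimal base-R sketch of the silhouette idea described above, for readers unfamiliar with it. The function name `silhouette_widths` and the toy data are hypothetical; the package's own documentation refers to the 'cluster' package for this computation, so this is only an illustration of the definition.

```r
# Hypothetical helper: silhouette width of each observation, given pairwise
# distances and cluster labels. Assumes at least two clusters are present.
silhouette_widths <- function(d, labels) {
  d <- as.matrix(d)
  n <- nrow(d)
  clusters <- unique(labels)
  vapply(seq_len(n), function(i) {
    own <- labels == labels[i]
    own[i] <- FALSE
    # Convention used here: a point alone in its cluster gets width 0
    if (!any(own)) return(0)
    # Tightness: mean distance to the other members of the same cluster
    a <- mean(d[i, own])
    # Separation: smallest mean distance to the members of another cluster
    b <- min(vapply(setdiff(clusters, labels[i]), function(k) {
      mean(d[i, labels == k])
    }, numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Toy data: two tight groups that are far apart
x <- rbind(c(0, 0), c(0, 1), c(10, 0), c(10, 1))
labels <- c(1, 1, 2, 2)
sil <- silhouette_widths(dist(x), labels)

# Average silhouette width (ASW): values close to 1 suggest well-separated clusters
asw <- mean(sil)
```

Computing `asw` for several candidate numbers of clusters and comparing the values is one way the ASW can guide the choice of the number of clusters.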
We have modified the
Apologies, now it is clearly stated. We avoided the term 'omnibus tests'.
This chunk shows the usage of
Thanks for this comment. We have modified the code such that the mentioned line can be run with the sample size considered in the example. However, this aspect is now mentioned in the help of the
Thanks, we have checked again the correct compilation of the remaining vignettes.
We have modified the help such that the corresponding references do not need to be read.
In the details section of the
We have improved the description of the function
The role of reference distribution parameters has been specified in the Details section.
Apologies. Our 2024 manuscript is now available on arXiv and we updated the corresponding references.
In the
The help of the
Thank you for the comment.
We have added the corresponding links.
Thank you for the suggestion. At this step we prefer to organize them separately.
We have added this information in the README file and the help of the
We added all the relevant information for introducing the clustering algorithm in the help documentation, including information about the initialization and stopping rule.
It is correct. It is now clearly explained in the help of the
We thank the reviewer for pointing this out. The ?plot.pkb help page was not accessible because of a missing tag for the 'roxygen2' package, which is used for building the documentation. Now, the help documentation for the
We have modified the plot function as suggested.
We thank the reviewer for pointing this out and we fixed the mentioned bug. The scatter plot displays data points in color and the color indicates the cluster membership of each point obtained for the indicated number of clusters. In the current version, if the number of clusters is not provided, the function shows the scatterplot for each possible value of clusters considered, as indicated in the arguments.
The missing information is now included in the details of the help documentation of the function, and it is also shown as a header in the created plots.
This information has been added in the Note section of the documentation of the
Corrected.
The function
Golzy and Markatou (2020) proposed an acceptance-rejection method for simulating data from a PKBD. Furthermore Sablica, Hornik and Leydold (2023) proposed new ways for simulating from the PKBD. We have checked the results in the case
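To make the acceptance-rejection idea concrete, here is a deliberately naive sketch that uses a uniform proposal on the sphere. It is not the algorithm of Golzy and Markatou (2020) or of Sablica, Hornik and Leydold (2023), and it is not the package's `rpkb`. It assumes the PKBD density is proportional to `(1 - rho^2) / ||x - rho * mu||^d` on the unit sphere in `d` coordinates; `rpkbd_naive` is a hypothetical name.

```r
# Naive acceptance-rejection sampler with a uniform proposal on the sphere.
# Illustration only: inefficient when rho is close to 1 or d is large.
rpkbd_naive <- function(n, mu, rho) {
  stopifnot(rho >= 0, rho < 1)
  mu <- mu / sqrt(sum(mu^2))
  d <- length(mu)
  out <- matrix(NA_real_, nrow = n, ncol = d)
  accepted <- 0
  while (accepted < n) {
    # Proposal: a standard normal vector scaled to unit length is uniform
    # on the sphere
    x <- rnorm(d)
    x <- x / sqrt(sum(x^2))
    # Ratio of the assumed density at x to its maximum (attained at x = mu)
    ratio <- ((1 - rho) / sqrt(sum((x - rho * mu)^2)))^d
    if (runif(1) <= ratio) {
      accepted <- accepted + 1
      out[accepted, ] <- x
    }
  }
  # Each row is one generated point on the unit sphere
  out
}

set.seed(123)
mu <- c(0, 0, 1)
pts <- rpkbd_naive(200, mu, rho = 0.5)
```

Under the assumed density, the expected acceptance rate of this sketch is `(1 - rho)^(d - 1) / (1 + rho)`, which is why more refined proposals such as those in the cited papers are preferable in practice.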
We have added a test where three well-separated clusters are generated. We test that the clustering algorithm correctly identifies the three clusters.
We also added tests for these functions.
We added a test checking that the plot method does not return any error or warning.
README file includes installation instructions, describes the main functionalities, and indicates the corresponding citations and references. There is also a link pointing to the introductory vignette of the package.
The Citation section has been added.
An introductory vignette has been added presenting the key features of the packages with simple examples and useful links.
Citations have been added.
Cross-links have been added
Yes, the pkgdown website can be accessed from the GitHub page of the package at the link https://giovsaraceno.github.io/QuadratiK-package/. This link is already present in the DESCRIPTION file and in the Citation section of the README file.
In the vignette which shows the usage of the test for uniformity on the sphere we added an example generating data from a multimodal distribution and we compared the obtained results with two tests from the literature.
The suggested command is run and the spelling errors are handled. |
Hi @giovsaraceno great to hear back from you. I've had a busy few weeks, but I hope to get something back to you by the end of next week. |
Hi @kasselhingee, thanks for the update and for letting me know. I look forward to hearing from you soon. |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 Editor check started 👋 |
Checks for QuadratiK (v1.1.2)
git hash: 32b794e7
Package License: GPL (>= 3)
1. rOpenSci Statistical Standards (
|
type | package | ncalls |
---|---|---|
internal | base | 414 |
internal | QuadratiK | 54 |
internal | utils | 13 |
internal | graphics | 6 |
internal | grDevices | 4 |
imports | stats | 31 |
imports | methods | 27 |
imports | sn | 8 |
imports | ggpubr | 3 |
imports | scatterplot3d | 2 |
imports | parallel | 1 |
imports | doParallel | 1 |
imports | foreach | 1 |
imports | moments | 1 |
imports | mvtnorm | 1 |
imports | rrcov | 1 |
imports | ggplot2 | NA |
imports | Rcpp | NA |
imports | RcppEigen | NA |
imports | rlecuyer | NA |
suggests | Tinflex | 3 |
suggests | cluster | 1 |
suggests | mclust | 1 |
suggests | movMF | 1 |
suggests | knitr | NA |
suggests | rmarkdown | NA |
suggests | roxygen2 | NA |
suggests | testthat | NA |
suggests | rgl | NA |
suggests | sphunif | NA |
suggests | circular | NA |
suggests | clusterRepro | NA |
linking_to | Rcpp | NA |
linking_to | RcppEigen | NA |
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base
list (55), c (33), matrix (22), nrow (21), t (21), data.frame (20), rep (20), log (19), ncol (19), for (12), sqrt (12), length (10), as.numeric (9), return (9), mean (8), numeric (8), T (7), unique (7), apply (6), det (6), sample (6), vapply (6), as.factor (5), table (5), as.vector (4), cumsum (4), exp (4), rbind (4), sum (4), as.matrix (3), diag (3), kappa (3), lgamma (3), pi (3), q (3), replace (3), unlist (3), max (2), readline (2), rowSums (2), which (2), which.max (2), beta (1), colMeans (1), expand.grid (1), factor (1), if (1), lapply (1), levels (1), norm (1), rep.int (1), round (1), seq_along (1), seq_len (1), subset (1), with (1)
QuadratiK
kbNormTest (4), compute_CV (3), normal_CV (3), C_d_lambda (2), cv_ksample (2), d2lpdf (2), dlpdf (2), DOF_norm (2), lpdf (2), norm_vec (2), objective_norm (2), poisson_CV (2), rejvmf (2), sample_hypersphere (2), statPoissonUnif (2), compare_qq (1), compute_stats (1), computeKernelMatrix (1), computePoissonMatrix (1), DOF (1), dpkb (1), elbowMethod (1), generate_SN (1), NonparamCentering (1), objective_2 (1), objective_k (1), ParamCentering (1), pkbc_validation (1), rejacg (1), rejpsaw (1), root_func (1), select_h (1), stat_ksample_cpp (1), stat2sample (1), var_norm (1)
stats
df (12), quantile (5), dist (2), qchisq (2), rnorm (2), runif (2), aggregate (1), cov (1), D (1), sd (1), sigma (1), uniroot (1)
methods
setMethod (12), setGeneric (8), new (3), setClass (3), is (1)
utils
data (11), prompt (2)
sn
rmsn (8)
graphics
par (6)
grDevices
colors (2), rainbow (2)
ggpubr
ggarrange (3)
Tinflex
Tinflex.sample (2), Tinflex.setup.C (1)
scatterplot3d
scatterplot3d (2)
cluster
silhouette (1)
doParallel
registerDoParallel (1)
foreach
foreach (1)
mclust
adjustedRandIndex (1)
moments
skewness (1)
movMF
rmovMF (1)
mvtnorm
rmvnorm (1)
parallel
makeCluster (1)
rrcov
PcaLocantore (1)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
3. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in C++ (20% in 2 files) and R (80% in 14 files)
- 4 authors
- 6 vignettes
- 3 internal data files
- 15 imported packages
- 28 exported functions (median 10 lines of code)
- 60 non-exported functions in R (median 14 lines of code)
- 20 R functions (median 13 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:
- loc = "Lines of Code"
- fn = "function"
- exp/not_exp = exported / not exported
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function.
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
measure | value | percentile | noteworthy |
---|---|---|---|
files_R | 14 | 68.7 | |
files_src | 2 | 79.5 | |
files_vignettes | 12 | 99.5 | |
files_tests | 10 | 87.4 | |
loc_R | 1529 | 76.0 | |
loc_src | 387 | 44.0 | |
loc_vignettes | 762 | 85.5 | |
loc_tests | 638 | 76.0 | |
num_vignettes | 6 | 97.6 | TRUE |
data_size_total | 77179 | 81.0 | |
data_size_median | 11842 | 79.6 | |
n_fns_r | 88 | 71.6 | |
n_fns_r_exported | 28 | 75.7 | |
n_fns_r_not_exported | 60 | 70.6 | |
n_fns_src | 20 | 54.9 | |
n_fns_per_file_r | 5 | 72.2 | |
n_fns_per_file_src | 10 | 80.5 | |
num_params_per_fn | 4 | 51.2 | |
loc_per_fn_r | 14 | 43.1 | |
loc_per_fn_r_exp | 10 | 25.3 | |
loc_per_fn_r_not_exp | 14 | 46.7 | |
loc_per_fn_src | 13 | 45.0 | |
rel_whitespace_R | 23 | 81.5 | |
rel_whitespace_src | 21 | 50.9 | |
rel_whitespace_vignettes | 31 | 85.9 | |
rel_whitespace_tests | 30 | 82.1 | |
doclines_per_fn_exp | 40 | 49.5 | |
doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
doclines_per_fn_src | 1 | 72.6 | |
fn_call_network_size | 57 | 67.7 |
3a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
4. goodpractice and other checks
Details of goodpractice checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
11571834514 | pages build and deployment | success | 32b794 | 145 | 2024-10-29 |
11571834630 | pkgcheck | failure | 32b794 | 182 | 2024-10-29 |
11571834646 | pkgdown | success | 32b794 | 147 | 2024-10-29 |
11571834640 | R-CMD-check | success | 32b794 | 205 | 2024-10-29 |
11571834627 | test-coverage | success | 32b794 | 205 | 2024-10-29 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following note:
- checking installed package size ... NOTE
installed size is 20.0Mb
sub-directories of 1Mb or more:
libs 18.1Mb
R CMD check generated the following check_fails:
- no_import_package_as_a_whole
- rcmdcheck_reasonable_installed_size
Test coverage with covr
Package coverage: 100
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
select_h | 47 |
pkbc_validation | 23 |
rpkb | 15 |
Static code analyses with lintr
lintr found no issues with this package!
Package Versions
package | version |
---|---|
pkgstats | 0.2.0.47 |
pkgcheck | 0.1.2.63 |
srr | 0.1.3.26 |
Editor-in-Chief Instructions:
This package is in top shape and may be passed on to a handling editor
Hi @giovsaraceno the package is much improved with much more information in the documentation. I noticed a few small things to change and some queries, but I do not need to see it again. Once the small changes have been made I recommend this package for Gold badge. The below is a report I generated with the help of
|
Hi @kasselhingee, Could you kindly provide information on the next steps in the review process? Thank you again for your time and effort in reviewing this package.
The indicated lines are now shorter than 80 characters. The function
Now
The dimension of the data matrix
One has been removed.
The packages
In general, the L2 transformation creates vectors of length 1 that lie on the unit sphere of dimension
“In many wireless applications, the relative signal strengths across routers are more relevant to the underlying spatial patterns and device positioning than the absolute magnitudes. Additionally, absolute signal strength can be affected by noise, device orientation or environmental factors. In this case, it is reasonable to consider the spherically transformed data points using L2 normalization. This transformation maps the data onto the surface of a 6-dimensional sphere, ensuring that each observation has a uniform length.”
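A short sketch of the L2 normalisation described above. The data are simulated and hypothetical: seven signal-strength columns are assumed only because a 6-dimensional sphere embedded in seven coordinates would imply them, and `l2_normalise` is an illustrative helper name.

```r
# Hypothetical signal-strength matrix: 5 observations from 7 routers
set.seed(42)
signal <- matrix(runif(35, min = -90, max = -30), nrow = 5, ncol = 7)

# L2 normalisation: divide each row by its Euclidean norm
l2_normalise <- function(x) {
  x <- as.matrix(x)
  x / sqrt(rowSums(x^2))
}

signal_sphere <- l2_normalise(signal)

# Every transformed observation now has length 1
row_norms <- sqrt(rowSums(signal_sphere^2))

# Rescaling the raw signals by a positive constant leaves the result
# unchanged, so only the relative strengths across routers are retained
same_after_rescaling <- isTRUE(all.equal(l2_normalise(2 * signal), signal_sphere))
```

The last check illustrates the point made in the quoted text: absolute magnitudes are discarded and only the direction of each observation is kept.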
We have changed the description of ASW as follows:
The two guidelines in the reported reference highlight two important aspects of using bootstrap for hypothesis testing.
This is true and the function rpkb() has been changed accordingly. It has also been specified that this function returns a data matrix where each row is a generated data point.
This has been changed throughout the package.
The value of
nClust has no default values now.
We have added an additional column indicating the corresponding number of clusters.
If two data points are identical, their cosine similarity is 1, which results in a contribution to WCSS, unlike Euclidean distance where identical points contribute 0. Thus, the WCSS calculated using cosine similarity increases with more clusters. Cosine similarity is a widely used measure when working with spherically transformed data, because it emphasizes the relative angular relationships between points, rather than their absolute distances. The use of cosine similarity aligns with the common interpretation of WCSS in elbow plots, where it increases with an increasing number of clusters, and the number of clusters showing the biggest difference in slope is suggested as the optimal number of clusters.
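The behaviour described above can be checked on a toy example. The sketch below assumes that the WCSS is the sum of cosine similarities between each point and the centroid of its cluster; that definition, the helper names and the data points are assumptions for illustration and are not taken from the package's implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def cosine_wcss(clusters):
    """Sum of cosine similarities between each point and its cluster centroid
    (an assumed definition, used only for this sketch)."""
    total = 0.0
    for points in clusters:
        center = centroid(points)
        total += sum(cosine_similarity(p, center) for p in points)
    return total

# Four made-up points on the unit circle.
points = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-0.6, 0.8]]

one_cluster = cosine_wcss([points])
two_clusters = cosine_wcss([points[:2], points[2:]])
four_clusters = cosine_wcss([[p] for p in points])  # each point is its own centroid
```

With each point in its own cluster, every similarity equals 1 and the total reaches its maximum (the number of points), so under this definition the measure grows as clusters are added, in contrast to a Euclidean WCSS, which shrinks toward 0.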
The ‘pkbc’ class has some methods defined for objects of this type. We think it is better to leave these functions the way they are, since the current version allows for easier future developments and extensions.
We followed this suggestion.
Removed |
I think the next step is for @emitanaka to look at my review and your response. |
I am travelling at the moment, so I will look at this on my return in the week starting 16th Dec. |
Hi @emitanaka, |
Thank you for the prompt @giovsaraceno and sorry I let it slip in my travel/leave |
@ropensci-review-bot add @emitanaka as reviewer |
@emitanaka added to the reviewers list. Review due date is 2025-02-05. Thanks @emitanaka for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more. |
@emitanaka: If you haven't done so, please fill this form for us to update our reviewers records. |
Submitting Author Giovanni Saraceno
Due date for @kasselhingee: 2024-07-16
Submitting Author Github Handle: @giovsaraceno
Other Package Authors Github handles: @rmj3197
Repository: https://github.com/giovsaraceno/QuadratiK-package
Version submitted: 1.1.1
Submission type: Stats
Badge grade: gold
Editor: @emitanaka
Reviewers: @kasselhingee, @emitanaka
Due date for @emitanaka: 2025-02-05
Archive: TBD
Version accepted: TBD
Scope
Data Lifecycle Packages
Statistical Packages
Bayesian and Monte Carlo Routines
Dimensionality Reduction, Clustering, and Unsupervised Learning
Machine Learning
Regression and Supervised Learning
Exploratory Data Analysis (EDA) and Summary Statistics
Spatial Analyses
Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
This category is the most suitable due to QuadratiK's clustering technique, specifically designed for spherical data. The package's clustering algorithm falls within the realm of unsupervised learning, where the focus is on identifying groupings in the data without pre-labeled categories. The two- and k-sample tests serve as additional tools for testing the differences between the identified groups.
Following the link https://stats-devguide.ropensci.org/standards.html, we noticed in the "Table of contents" that category 6.9 refers to Probability Distributions. We are unsure whether, and how, the package fits this category. Can you please advise?
Yes, we have incorporated documentation of standards into our QuadratiK package by utilizing the srr package, considering the categories "General" and "Dimensionality Reduction, Clustering, and Unsupervised Learning", in line with the recommendations provided in the rOpenSci Statistical Software Peer Review Guide.
The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions. Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data. Furthermore, this package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology. Moreover, its implementation in both R and Python broadens its accessibility, catering to a wide audience accustomed to these popular programming languages.
Yes, there are other R packages that address goodness-of-fit (GoF) testing and multivariate analysis. Notable among these is the energy package for energy statistics-based tests. The function kmmd in the kernlab package offers a kernel-based test with a similar mathematical formulation. The package sphunif provides all the tests for uniformity on the sphere available in the literature, including the test for uniformity based on the Poisson kernel. However, there are fundamental differences between the methods implemented in these packages and those offered in the QuadratiK package.
QuadratiK uniquely focuses on kernel-based quadratic distance methods for GoF testing, offering a comprehensive set of tools for one-sample, two-sample, and k-sample tests. This specialization provides more nuanced and robust methodologies for statistical analysis, especially in complex multivariate contexts. QuadratiK is optimized for high-dimensional datasets, employing efficient C++ implementations. This makes it particularly suitable for contemporary large-scale data analysis challenges. The package introduces advanced methods for kernel centering and critical value computation, as well as optimal tuning parameter selection based on midpower analysis. QuadratiK includes a unique clustering algorithm for spherical data. These innovations are not covered in other available packages. With implementations in both R and Python, QuadratiK appeals to a wider audience across different programming communities. We also provide a user-friendly dashboard application, which further enhances accessibility for users with varying levels of statistical and programming expertise.
In summary there are fundamental differences between QuadratiK and all existing R packages:
Yes, our package, QuadratiK, is compliant with the rOpenSci guidelines on Ethics, Data Privacy, and Human Subjects Research. We have carefully considered and adhered to ethical standards and data privacy laws relevant to our work.
Please see the question posed in the first bullet.