Database of all RcppExports.cpp files, to support RcppDeepState project.
Update compileAttributes-parse.R to count total usages of type across all Rcpp functions in CRAN packages, here are the top 100:
> arg.dt[, .(usages=.N), by=clean.type][order(-usages)][1:100]
clean.type usages
<char> <int>
1: int 10610
2: double 10460
3: Rcpp::NumericVector 9374
4: arma::mat 5317
5: arma::vec 4620
6: Rcpp::NumericMatrix 4062
7: bool 4058
8: SEXP 3826
9: Rcpp::List 2624
10: Rcpp::IntegerVector 2226
11: std::string 2109
12: unsigned int 1093
13: Rcpp::CharacterVector 971
14: Rcpp::IntegerMatrix 541
15: std::vector<double> 509
16: Eigen::VectorXd 491
17: Rcpp::DataFrame 439
18: arma::colvec 427
19: arma::uvec 416
20: Eigen::MatrixXd 415
21: long 400
22: std::vector<std::string> 394
23: Rcpp::RObject 282
24: Rcpp::String 280
25: std::size_t 276
26: std::vector<int> 258
27: arma::cube 257
28: Rcpp::Function 249
29: arma::uword 247
30: Rcpp::LogicalVector 227
31: Eigen::Map<Eigen::MatrixXd> 209
32: arma::ivec 202
33: float 165
34: Rcpp::S4 159
35: arma::sp_mat 151
36: Eigen::VectorXi 150
37: Rcpp::Nullable<Rcpp::NumericVector> 148
38: Rcpp::StringVector 141
39: unsigned 124
40: Rcpp::Environment 121
41: arma::rowvec 114
42: XPtrImage 99
43: Eigen::Map<Eigen::VectorXd> 96
44: Eigen::SparseMatrix<double> 83
45: arma::umat 77
46: char 65
47: Rcpp::DoubleVector 63
48: uint64 62
49: arma::field<arma::vec> 59
50: char* 57
51: uint 53
52: Rcpp::LogicalMatrix 51
53: Eigen::Matrix<double, Eigen::Dynamic, 1> 50
54: Rcpp::RawVector 49
55: std::vector<long> 49
56: char * 48
57: arma::cx_mat 43
58: arma::Mat<double> 41
59: Rcpp::XPtr<matrix4> 39
60: XPtrNode 38
61: arma::field<arma::mat> 37
62: Eigen::ArrayXd 36
63: std::vector<QuantLib::Date> 34
64: arma::imat 34
65: CV 33
66: Rcpp::CharacterMatrix 31
67: PyObjectRef 31
68: arma::ucube 27
69: arma::Col<unsigned> 26
70: std::vector<arma::mat> 25
71: arma::cx_cube 25
72: arma::field<arma::Cube<unsigned char>> 24
73: arma::Col<int> 23
74: uint32_t 23
75: QuantLib::Date 23
76: Rcpp::GenericVector 23
77: Rcpp::RawMatrix 22
78: std::vector<unsigned int> 21
79: Rcpp::Nullable<std::string> 21
80: XPtrMat 21
81: XPtrDoc 20
82: Rcpp::Nullable<Rcpp::IntegerVector> 19
83: Rcpp::Nullable<Rcpp::NumericMatrix> 18
84: DbResult* 18
85: unsigned long 17
86: std::vector<std::vector<std::string>> 17
87: Eigen::Map<Eigen::ArrayXd> 16
88: Rcpp::XPtr<DbConnectionPtr> 16
89: Rcpp::DateVector 16
90: RcppGSL::Matrix 16
91: std::vector<std::size_t> 16
92: std::vector<Rcpp::Environment> 16
93: ComplexVector 15
94: std::ostream* 15
95: std::vector<std::vector<int>> 14
96: std::vector<bool> 14
97: Rcpp::Nullable<Rcpp::CharacterVector> 14
98: std::vector<std::vector<double>> 13
99: arma::cx_vec 13
100: Symbol 13
clean.type usages
- checks-download.R downloads CRAN check pages from all packages under packages.
- checks-analyze.R parses “Additional issues” from the downloaded CRAN check pages. Will be useful when we compare RcppDeepState fuzz testing with standard CRAN tests, to answer the question, “how many more issues are we able to detect with fuzz testing, which we not already revealed using existing testing approaches?”
- sckott/cchecksapi#57 does list additional issues but does NOT yet implement search functionality so basically the same as asking CRAN directly (need to download N check pages where N is the number of packages).
> issue.dt[, .(pkgs=.N), by=type][order(pkgs)]
type pkgs
1: clang-ASAN 1
2: gcc-ASAN 1
3: MKL 1
4: ATLAS 1
5: OpenBLAS 2
6: clang11 2
7: noLD 3
8: LTO 5
9: rchk 7
10: gcc10 9
11: gcc-UBSAN 12
12: valgrind 18
13: clang-UBSAN 26
> type.dt[, .(pkgs=.N), by=type][order(pkgs)]
type pkgs
1: clang-ASAN 1
2: gcc-ASAN 1
3: gcc-UBSAN 12
4: valgrind 18
5: clang-UBSAN 26
> unique(type.dt$pkg)
[1] "AGread" "bigmemory" "BuyseTest"
[4] "cld2" "cld3" "compboost"
[7] "dggridR" "DStree" "fastAdaboost"
[10] "FLSSS" "FRegSigCom" "glamlasso"
[13] "glmmsr" "GMKMcharlie" "GreedySBTM"
[16] "iptools" "isotree" "kernelboot"
[19] "later" "lda.svi" "milr"
[22] "mined" "mixggm" "OneArmPhaseTwoStudy"
[25] "pdftools" "PP" "PRIMME"
[28] "protolite" "pts2polys" "r2sundials"
[31] "RcppDE" "Rdimtools" "Rdtq"
[34] "RMKL" "rTRNG" "sboost"
[37] "Scalelink" "scPDSI" "scrypt"
[40] "TDA" "tesseract" "TreeLS"
[43] "volesti"
>
Simplified pattern for matching a type in compileAttributes-parse.R, parseRcppExports function now self-contained.
- compileAttributes-untar.R untars the entire R package and then calls compileAttributes to generate a standard (easy to parse) RcppExports.cpp file.
- compileAttributes-parse.R does pretty much the same thing as packages-parse.R, but using the RcppExports.cpp file that we generated (instead of the file that was provided in the package source tar.gz file). The results below are similar to the previous results, but the numbers are a bit larger, which implies that we should run compileAttributes before parsing the RcppExports.cpp file.
> (some.types <- grep( + "SEXP|List", top10$clean.type, invert=TRUE, value=TRUE)) [1] "Rcpp::NumericVector" "Rcpp::NumericMatrix" "arma::mat" [4] "std::string" "Rcpp::CharacterVector" "int" [7] "Rcpp::IntegerVector" "double" > some.covered <- arg.counts[some.types, on="clean.type"][, .( + top10args=.N + ), by=.(pkg.dir, funName, args)][args==top10args][order(-args)] > some.covered[, .( + funs=.N, + pkgs=length(unique(pkg.dir)) + )] funs pkgs 1: 5952 1007 >
Also I checked to make sure that all funs/args are parsed using our regex, so we can be sure that the regex is sufficient (no need to improve any further).
> lines.dt[parsed<parameters] Empty data.table (0 rows and 3 cols): pkg.dir,parameters,parsed >
We are however unable to automatically fuzz the following 39 packages which use Rcpp, but do not use the export attribute, so there is no information about functions/args in the RcppExports.cpp file. Since this is a small minority of packages, it is acceptable to ignore these (we can require users of our software to use the Rcpp export attribute).
> lines.dt[parameters==0]
pkg.dir parameters parsed
1: compileAttributes/ANN2 0 0
2: compileAttributes/ConConPiWiFun 0 0
3: compileAttributes/CoxPlus 0 0
4: compileAttributes/DPP 0 0
5: compileAttributes/DiffusionRgqd 0 0
6: compileAttributes/DiffusionRimp 0 0
7: compileAttributes/DiffusionRjgqd 0 0
8: compileAttributes/FiRE 0 0
9: compileAttributes/FisPro 0 0
10: compileAttributes/GiRaF 0 0
11: compileAttributes/MADPop 0 0
12: compileAttributes/NPBayesImputeCat 0 0
13: compileAttributes/NlinTS 0 0
14: compileAttributes/OncoBayes2 0 0
15: compileAttributes/OneArmPhaseTwoStudy 0 0
16: compileAttributes/RBesT 0 0
17: compileAttributes/RcppBDT 0 0
18: compileAttributes/RcppCNPy 0 0
19: compileAttributes/RcppDL 0 0
20: compileAttributes/RcppHNSW 0 0
21: compileAttributes/RcppXsimd 0 0
22: compileAttributes/RcppXts 0 0
23: compileAttributes/YPPE 0 0
24: compileAttributes/bmlm 0 0
25: compileAttributes/cblasr 0 0
26: compileAttributes/cbq 0 0
27: compileAttributes/cccp 0 0
28: compileAttributes/compboost 0 0
29: compileAttributes/dggridR 0 0
30: compileAttributes/hsstan 0 0
31: compileAttributes/incgraph 0 0
32: compileAttributes/lm.br 0 0
33: compileAttributes/lolog 0 0
34: compileAttributes/multinet 0 0
35: compileAttributes/qmix 0 0
36: compileAttributes/randomUniformForest 0 0
37: compileAttributes/rrcovHD 0 0
38: compileAttributes/s2net 0 0
39: compileAttributes/wingui 0 0
pkg.dir parameters parsed
>
- packages-download.R downloads all CRAN packages which list Rcpp under LinkingTo.
- packages-untar.R extracts just the RcppExports.cpp file from each package tar.gz file. (these are copied to the packages directory in this github repo)
- input_parameter_parse.R was for experimenting with regex subroutines, but it only parses argument types (not functions) so it should no longer be used.
- packages-parse.R analyzes which types are used most frequently in R packages that use Rcpp:
The top 10 types are:
> (top10 <- arg.counts[args==1, .(
+ funs=.N,
+ pkgs=length(unique(pkg.dir))
+ ), by=clean.type][order(-funs)][1:10])
clean.type funs pkgs
1: SEXP 380 72
2: Rcpp::NumericVector 330 154
3: Rcpp::NumericMatrix 236 128
4: arma::mat 208 102
5: Rcpp::List 172 71
6: std::string 159 76
7: Rcpp::CharacterVector 112 51
8: int 108 60
9: Rcpp::IntegerVector 88 37
10: double 79 44
>
If we implement RcppDeepState_* random generation functions for each
of these ten types, then we will be able to automatically test this many
functions/packages:
> covered[, .( + funs=.N, + pkgs=length(unique(pkg.dir)) + )] funs pkgs 1: 7702 1132 >
If we only implement these 8 (easy) then we have this many:
> (some.types <- grep("SEXP|List", top10$clean.type, invert=TRUE, value=TRUE))
[1] "Rcpp::NumericVector" "Rcpp::NumericMatrix" "arma::mat"
[4] "std::string" "Rcpp::CharacterVector" "int"
[7] "Rcpp::IntegerVector" "double"
> some.covered <- arg.counts[some.types, on="clean.type"][, .(
+ top10args=.N
+ ), by=.(pkg.dir, funName, args)][args==top10args][order(-args)]
> some.covered[, .(
+ funs=.N,
+ pkgs=length(unique(pkg.dir))
+ )]
funs pkgs
1: 5838 995
>