Commit 2c36520 — Merge pull request #27 from Nowosad/cpp_extend

adds new cpp funs

Authored: Aug 20, 2021
2 parents: 775fb08 + 228e67f

27 files changed: +1257 −191 lines

.gitignore (+7)

@@ -0,0 +1,7 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
+src/*.o
+src/*.so
+src/*.dll
DESCRIPTION (+2 −1)

@@ -22,7 +22,8 @@ URL: https://github.com/drostlab/philentropy
 Suggests:
     testthat,
     knitr,
-    markdown
+    rmarkdown,
+    microbenchmark
 VignetteBuilder: knitr
 BugReports: https://github.com/drostlab/philentropy/issues
 RoxygenNote: 7.1.1

NAMESPACE (+3)

@@ -17,6 +17,9 @@ export(cosine_dist)
 export(czekanowski)
 export(dice_dist)
 export(dist.diversity)
+export(dist_many_many)
+export(dist_one_many)
+export(dist_one_one)
 export(distance)
 export(divergence_sq)
 export(estimate.probability)

NEWS.md (+2)

@@ -5,8 +5,10 @@
 - `distance()` and all other individual information theory functions
 receive a new argument `epsilon` with default value `epsilon = 0.00001` to treat cases where individual distance or similarity computations
 yield `x / 0` or `0 / 0`. Instead of a hard-coded epsilon, users can now set `epsilon` according to their input vectors. (Many thanks to Joshua McNeill #26 for this great question.)
+- three new functions `dist_one_one()`, `dist_one_many()`, and `dist_many_many()` are added. They are fairly flexible intermediaries between `distance()` and the single distance functions. `dist_one_one()` expects two vectors (probability density functions) and returns a single value. `dist_one_many()` expects one vector (a probability density function) and one matrix (a set of probability density functions) and returns a vector of values. `dist_many_many()` expects two matrices (two sets of probability density functions) and returns a matrix of values.
 
 ### Updates
+
 - the `dplyr` package dependency was removed and replaced by `poorman`
 due to the heavy dependency burden of `dplyr`, since `philentropy`
 only used `dplyr::between()`, which is now `poorman::between()` (Many thanks to Patrice Kiener for this suggestion)

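The three call shapes described in the NEWS entry (one-vs-one, one-vs-many, many-vs-many) can be sketched outside of R. This is a minimal, hypothetical C++ sketch in which `std::vector` stands in for Rcpp vectors and matrices, only the euclidean method is implemented, and all `sketch_*` names are illustrative rather than part of the package:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rows of a Mat are probability density functions, as in the package's
// matrix inputs.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// one PDF vs one PDF -> single value (cf. dist_one_one)
double sketch_one_one(const Vec& P, const Vec& Q) {
    double sum = 0.0;
    for (std::size_t i = 0; i < P.size(); ++i)
        sum += (P[i] - Q[i]) * (P[i] - Q[i]);
    return std::sqrt(sum);
}

// one PDF vs a set of PDFs -> vector of values (cf. dist_one_many)
Vec sketch_one_many(const Vec& P, const Mat& dists) {
    Vec out(dists.size());
    for (std::size_t i = 0; i < dists.size(); ++i)
        out[i] = sketch_one_one(P, dists[i]);
    return out;
}

// two sets of PDFs -> matrix of values (cf. dist_many_many)
Mat sketch_many_many(const Mat& dists1, const Mat& dists2) {
    Mat out(dists1.size(), Vec(dists2.size()));
    for (std::size_t i = 0; i < dists1.size(); ++i)
        for (std::size_t j = 0; j < dists2.size(); ++j)
            out[i][j] = sketch_one_one(dists1[i], dists2[j]);
    return out;
}
```

The actual exported functions additionally take `method`, `p`, `testNA`, `unit`, and `epsilon` arguments; the sketch only shows how the three return shapes relate to each other.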
R/KL.R (+1 −1)

@@ -50,7 +50,7 @@
 #' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
 #' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
 #' return negative values which are not defined and only occur due to the
-#' technical issues of computing \code{x / 0} or \code(0 / 0) cases.
+#' technical issues of computing x / 0 or 0 / 0 cases.
 #' @return The Kullback-Leibler divergence of probability vectors.
 #' @author Hajk-Georg Drost
 #' @seealso

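The documentation above describes the `epsilon` rule: ratios of the form x / 0 or 0 / 0 are replaced by `epsilon` instead of producing Inf/NaN, which would otherwise let some distance metrics return undefined negative values. A minimal sketch of such a guard (the helper name `safe_ratio` is hypothetical, not a philentropy function):

```cpp
// Illustrates the documented epsilon rule: a zero denominator makes the
// whole ratio collapse to epsilon, covering both x / 0 and 0 / 0.
double safe_ratio(double x, double y, double epsilon = 0.00001) {
    if (y == 0.0) return epsilon;
    return x / y;
}
```

Per the docs, smaller `epsilon` suits large, similar vectors with many zeros; larger values (e.g. 0.01) suit small or very divergent vectors.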
R/RcppExports.R (+251 −16)

Large diff not rendered by default.

R/distance.R (+1 −1)

@@ -37,7 +37,7 @@
 #' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
 #' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
 #' return negative values which are not defined and only occur due to the
-#' technical issues of computing \code{x / 0} or \code(0 / 0) cases.
+#' technical issues of computing x / 0 or 0 / 0 cases.
 #' @param est.prob method to estimate probabilities from input count vectors such as non-probability vectors. Default: \code{est.prob = NULL}. Options are:
 #' \itemize{
 #' \item \code{est.prob = "empirical"}: The relative frequencies of each vector are computed internally. For example an input matrix \code{rbind(1:10, 11:20)} will be transformed to a probability vector \code{rbind(1:10 / sum(1:10), 11:20 / sum(11:20))}

Generated documentation files (not rendered by default):

man/KL.Rd (+1 −1)
man/bhattacharyya.Rd (+17 −7)
man/dist_many_many.Rd (+60)
man/dist_one_many.Rd (+60)
man/dist_one_one.Rd (+59)
man/distance.Rd (+1 −1)
man/jeffreys.Rd (+16 −1)
man/kulczynski_d.Rd (+16 −1)
man/kullback_leibler_distance.Rd (+16 −1)
man/kumar_johnson.Rd (+16 −1)
man/neyman_chi_sq.Rd (+16 −1)
man/pearson_chi_sq.Rd (+16 −1)
man/taneja.Rd (+16 −1)

src/RcppExports.cpp (+131)

@@ -8,6 +8,11 @@
 
 using namespace Rcpp;
 
+#ifdef RCPP_USE_GLOBAL_ROSTREAM
+Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
+Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
+#endif
+
 // Ecpp
 double Ecpp(const Rcpp::NumericVector& P, Rcpp::String unit);
 RcppExport SEXP _philentropy_Ecpp(SEXP PSEXP, SEXP unitSEXP) {

@@ -165,6 +170,57 @@ BEGIN_RCPP
     return rcpp_result_gen;
 END_RCPP
 }
+// dist_one_one
+double dist_one_one(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, const Rcpp::String& method, const double& p, const bool& testNA, const Rcpp::String& unit, const double& epsilon);
+RcppExport SEXP _philentropy_dist_one_one(SEXP PSEXP, SEXP QSEXP, SEXP methodSEXP, SEXP pSEXP, SEXP testNASEXP, SEXP unitSEXP, SEXP epsilonSEXP) {
+BEGIN_RCPP
+    Rcpp::RObject rcpp_result_gen;
+    Rcpp::RNGScope rcpp_rngScope_gen;
+    Rcpp::traits::input_parameter< const Rcpp::NumericVector& >::type P(PSEXP);
+    Rcpp::traits::input_parameter< const Rcpp::NumericVector& >::type Q(QSEXP);
+    Rcpp::traits::input_parameter< const Rcpp::String& >::type method(methodSEXP);
+    Rcpp::traits::input_parameter< const double& >::type p(pSEXP);
+    Rcpp::traits::input_parameter< const bool& >::type testNA(testNASEXP);
+    Rcpp::traits::input_parameter< const Rcpp::String& >::type unit(unitSEXP);
+    Rcpp::traits::input_parameter< const double& >::type epsilon(epsilonSEXP);
+    rcpp_result_gen = Rcpp::wrap(dist_one_one(P, Q, method, p, testNA, unit, epsilon));
+    return rcpp_result_gen;
+END_RCPP
+}
+// dist_one_many
+Rcpp::NumericVector dist_one_many(const Rcpp::NumericVector& P, Rcpp::NumericMatrix dists, Rcpp::String method, double p, bool testNA, Rcpp::String unit, double epsilon);
+RcppExport SEXP _philentropy_dist_one_many(SEXP PSEXP, SEXP distsSEXP, SEXP methodSEXP, SEXP pSEXP, SEXP testNASEXP, SEXP unitSEXP, SEXP epsilonSEXP) {
+BEGIN_RCPP
+    Rcpp::RObject rcpp_result_gen;
+    Rcpp::RNGScope rcpp_rngScope_gen;
+    Rcpp::traits::input_parameter< const Rcpp::NumericVector& >::type P(PSEXP);
+    Rcpp::traits::input_parameter< Rcpp::NumericMatrix >::type dists(distsSEXP);
+    Rcpp::traits::input_parameter< Rcpp::String >::type method(methodSEXP);
+    Rcpp::traits::input_parameter< double >::type p(pSEXP);
+    Rcpp::traits::input_parameter< bool >::type testNA(testNASEXP);
+    Rcpp::traits::input_parameter< Rcpp::String >::type unit(unitSEXP);
+    Rcpp::traits::input_parameter< double >::type epsilon(epsilonSEXP);
+    rcpp_result_gen = Rcpp::wrap(dist_one_many(P, dists, method, p, testNA, unit, epsilon));
+    return rcpp_result_gen;
+END_RCPP
+}
+// dist_many_many
+Rcpp::NumericMatrix dist_many_many(Rcpp::NumericMatrix dists1, Rcpp::NumericMatrix dists2, Rcpp::String method, double p, bool testNA, Rcpp::String unit, double epsilon);
+RcppExport SEXP _philentropy_dist_many_many(SEXP dists1SEXP, SEXP dists2SEXP, SEXP methodSEXP, SEXP pSEXP, SEXP testNASEXP, SEXP unitSEXP, SEXP epsilonSEXP) {
+BEGIN_RCPP
+    Rcpp::RObject rcpp_result_gen;
+    Rcpp::RNGScope rcpp_rngScope_gen;
+    Rcpp::traits::input_parameter< Rcpp::NumericMatrix >::type dists1(dists1SEXP);
+    Rcpp::traits::input_parameter< Rcpp::NumericMatrix >::type dists2(dists2SEXP);
+    Rcpp::traits::input_parameter< Rcpp::String >::type method(methodSEXP);
+    Rcpp::traits::input_parameter< double >::type p(pSEXP);
+    Rcpp::traits::input_parameter< bool >::type testNA(testNASEXP);
+    Rcpp::traits::input_parameter< Rcpp::String >::type unit(unitSEXP);
+    Rcpp::traits::input_parameter< double >::type epsilon(epsilonSEXP);
+    rcpp_result_gen = Rcpp::wrap(dist_many_many(dists1, dists2, method, p, testNA, unit, epsilon));
+    return rcpp_result_gen;
+END_RCPP
+}
 // custom_log2
 double custom_log2(const double& x);
 RcppExport SEXP _philentropy_custom_log2(SEXP xSEXP) {

@@ -935,3 +991,78 @@ RcppExport SEXP _philentropy_RcppExport_registerCCallable() {
     R_RegisterCCallable("philentropy", "_philentropy_RcppExport_validate", (DL_FUNC)_philentropy_RcppExport_validate);
     return R_NilValue;
 }
+
+static const R_CallMethodDef CallEntries[] = {
+    {"_philentropy_Ecpp", (DL_FUNC) &_philentropy_Ecpp, 2},
+    {"_philentropy_JEcpp", (DL_FUNC) &_philentropy_JEcpp, 2},
+    {"_philentropy_CEcpp", (DL_FUNC) &_philentropy_CEcpp, 3},
+    {"_philentropy_MIcpp", (DL_FUNC) &_philentropy_MIcpp, 4},
+    {"_philentropy_pearson_corr_centred", (DL_FUNC) &_philentropy_pearson_corr_centred, 3},
+    {"_philentropy_pearson_corr_uncentred", (DL_FUNC) &_philentropy_pearson_corr_uncentred, 3},
+    {"_philentropy_squared_pearson_corr", (DL_FUNC) &_philentropy_squared_pearson_corr, 3},
+    {"_philentropy_DistMatrixWithoutUnitDF", (DL_FUNC) &_philentropy_DistMatrixWithoutUnitDF, 3},
+    {"_philentropy_DistMatrixMinkowskiMAT", (DL_FUNC) &_philentropy_DistMatrixMinkowskiMAT, 3},
+    {"_philentropy_DistMatrixWithoutUnitMAT", (DL_FUNC) &_philentropy_DistMatrixWithoutUnitMAT, 3},
+    {"_philentropy_DistMatrixWithUnitDF", (DL_FUNC) &_philentropy_DistMatrixWithUnitDF, 4},
+    {"_philentropy_DistMatrixWithUnitMAT", (DL_FUNC) &_philentropy_DistMatrixWithUnitMAT, 4},
+    {"_philentropy_dist_one_one", (DL_FUNC) &_philentropy_dist_one_one, 7},
+    {"_philentropy_dist_one_many", (DL_FUNC) &_philentropy_dist_one_many, 7},
+    {"_philentropy_dist_many_many", (DL_FUNC) &_philentropy_dist_many_many, 7},
+    {"_philentropy_custom_log2", (DL_FUNC) &_philentropy_custom_log2, 1},
+    {"_philentropy_custom_log10", (DL_FUNC) &_philentropy_custom_log10, 1},
+    {"_philentropy_euclidean", (DL_FUNC) &_philentropy_euclidean, 3},
+    {"_philentropy_manhattan", (DL_FUNC) &_philentropy_manhattan, 3},
+    {"_philentropy_minkowski", (DL_FUNC) &_philentropy_minkowski, 4},
+    {"_philentropy_chebyshev", (DL_FUNC) &_philentropy_chebyshev, 3},
+    {"_philentropy_sorensen", (DL_FUNC) &_philentropy_sorensen, 3},
+    {"_philentropy_gower", (DL_FUNC) &_philentropy_gower, 3},
+    {"_philentropy_soergel", (DL_FUNC) &_philentropy_soergel, 3},
+    {"_philentropy_kulczynski_d", (DL_FUNC) &_philentropy_kulczynski_d, 4},
+    {"_philentropy_canberra", (DL_FUNC) &_philentropy_canberra, 3},
+    {"_philentropy_lorentzian", (DL_FUNC) &_philentropy_lorentzian, 4},
+    {"_philentropy_intersection_dist", (DL_FUNC) &_philentropy_intersection_dist, 3},
+    {"_philentropy_wave_hedges", (DL_FUNC) &_philentropy_wave_hedges, 3},
+    {"_philentropy_czekanowski", (DL_FUNC) &_philentropy_czekanowski, 3},
+    {"_philentropy_motyka", (DL_FUNC) &_philentropy_motyka, 3},
+    {"_philentropy_tanimoto", (DL_FUNC) &_philentropy_tanimoto, 3},
+    {"_philentropy_ruzicka", (DL_FUNC) &_philentropy_ruzicka, 3},
+    {"_philentropy_inner_product", (DL_FUNC) &_philentropy_inner_product, 3},
+    {"_philentropy_harmonic_mean_dist", (DL_FUNC) &_philentropy_harmonic_mean_dist, 3},
+    {"_philentropy_cosine_dist", (DL_FUNC) &_philentropy_cosine_dist, 3},
+    {"_philentropy_kumar_hassebrook", (DL_FUNC) &_philentropy_kumar_hassebrook, 3},
+    {"_philentropy_jaccard", (DL_FUNC) &_philentropy_jaccard, 3},
+    {"_philentropy_dice_dist", (DL_FUNC) &_philentropy_dice_dist, 3},
+    {"_philentropy_fidelity", (DL_FUNC) &_philentropy_fidelity, 3},
+    {"_philentropy_bhattacharyya", (DL_FUNC) &_philentropy_bhattacharyya, 5},
+    {"_philentropy_hellinger", (DL_FUNC) &_philentropy_hellinger, 3},
+    {"_philentropy_matusita", (DL_FUNC) &_philentropy_matusita, 3},
+    {"_philentropy_squared_chord", (DL_FUNC) &_philentropy_squared_chord, 3},
+    {"_philentropy_squared_euclidean", (DL_FUNC) &_philentropy_squared_euclidean, 3},
+    {"_philentropy_pearson_chi_sq", (DL_FUNC) &_philentropy_pearson_chi_sq, 4},
+    {"_philentropy_neyman_chi_sq", (DL_FUNC) &_philentropy_neyman_chi_sq, 4},
+    {"_philentropy_squared_chi_sq", (DL_FUNC) &_philentropy_squared_chi_sq, 3},
+    {"_philentropy_prob_symm_chi_sq", (DL_FUNC) &_philentropy_prob_symm_chi_sq, 3},
+    {"_philentropy_divergence_sq", (DL_FUNC) &_philentropy_divergence_sq, 3},
+    {"_philentropy_clark_sq", (DL_FUNC) &_philentropy_clark_sq, 3},
+    {"_philentropy_additive_symm_chi_sq", (DL_FUNC) &_philentropy_additive_symm_chi_sq, 3},
+    {"_philentropy_kullback_leibler_distance", (DL_FUNC) &_philentropy_kullback_leibler_distance, 5},
+    {"_philentropy_jeffreys", (DL_FUNC) &_philentropy_jeffreys, 5},
+    {"_philentropy_k_divergence", (DL_FUNC) &_philentropy_k_divergence, 4},
+    {"_philentropy_topsoe", (DL_FUNC) &_philentropy_topsoe, 4},
+    {"_philentropy_jensen_shannon", (DL_FUNC) &_philentropy_jensen_shannon, 4},
+    {"_philentropy_jensen_difference", (DL_FUNC) &_philentropy_jensen_difference, 4},
+    {"_philentropy_taneja", (DL_FUNC) &_philentropy_taneja, 5},
+    {"_philentropy_kumar_johnson", (DL_FUNC) &_philentropy_kumar_johnson, 4},
+    {"_philentropy_avg", (DL_FUNC) &_philentropy_avg, 3},
+    {"_philentropy_as_matrix", (DL_FUNC) &_philentropy_as_matrix, 1},
+    {"_philentropy_as_data_frame", (DL_FUNC) &_philentropy_as_data_frame, 1},
+    {"_philentropy_sum_rcpp", (DL_FUNC) &_philentropy_sum_rcpp, 1},
+    {"_philentropy_est_prob_empirical", (DL_FUNC) &_philentropy_est_prob_empirical, 1},
+    {"_philentropy_RcppExport_registerCCallable", (DL_FUNC) &_philentropy_RcppExport_registerCCallable, 0},
+    {NULL, NULL, 0}
+};
+
+RcppExport void R_init_philentropy(DllInfo *dll) {
+    R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
+    R_useDynamicSymbols(dll, FALSE);
+}

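The `CallEntries` table and `R_init_philentropy` added above register each native routine with R by name, address, and argument count, terminated by a `{NULL, NULL, 0}` sentinel row. The same table-plus-sentinel idea can be sketched outside of R (all names here are hypothetical stand-ins; R's real `R_CallMethodDef` stores `SEXP`-based functions):

```cpp
#include <cstring>

// Simplified stand-in for a registered routine's signature.
typedef double (*Routine)(double, double);

static double add(double a, double b) { return a + b; }
static double mul(double a, double b) { return a * b; }

// Name, address, and arity per routine, like R_CallMethodDef.
struct Entry { const char* name; Routine fn; int nargs; };

static const Entry entries[] = {
    {"add", add, 2},
    {"mul", mul, 2},
    {nullptr, nullptr, 0}   // sentinel, like the {NULL, NULL, 0} row
};

// Walk the table until the sentinel; return nullptr for unknown names.
Routine lookup(const char* name) {
    for (const Entry* e = entries; e->name != nullptr; ++e)
        if (std::strcmp(e->name, name) == 0) return e->fn;
    return nullptr;
}
```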
src/dist_matrix.cpp (+230)

@@ -79,6 +79,7 @@ Rcpp::NumericMatrix DistMatrixWithoutUnitMAT(Rcpp::NumericMatrix dists, Rcpp::Fu
     return dist_matrix;
 }
 
+
 // @export
 // [[Rcpp::export]]
 Rcpp::NumericMatrix DistMatrixWithUnitDF(Rcpp::DataFrame distsDF, Rcpp::Function DistFunc, bool testNA, Rcpp::String unit){

@@ -129,4 +130,233 @@ Rcpp::NumericMatrix DistMatrixWithUnitMAT(Rcpp::NumericMatrix dists, Rcpp::Funct
     return dist_matrix;
 }
 
+//' @title Distances and Similarities between Two Probability Density Functions
+//' @description This function computes the distance/dissimilarity between two probability density functions.
+//' @param P a numeric vector storing the first distribution.
+//' @param Q a numeric vector storing the second distribution.
+//' @param method a character string indicating the distance measure that should be computed.
+//' @param p power of the Minkowski distance.
+//' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
+//' @param unit type of \code{log} function. Options are
+//' \itemize{
+//' \item \code{unit = "log"}
+//' \item \code{unit = "log2"}
+//' \item \code{unit = "log10"}
+//' }
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions, and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions very divergent then
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and only occur due to the
+//' technical issues of computing x / 0 or 0 / 0 cases.
+//' @return A single distance value.
+//' @author Hajk-Georg Drost
+//' @examples
+//' P <- 1:10 / sum(1:10)
+//' Q <- 20:29 / sum(20:29)
+//' dist_one_one(P, Q, method = "euclidean", testNA = FALSE)
+//' @export
+// [[Rcpp::export]]
+double dist_one_one(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, const Rcpp::String& method, const double& p = NA_REAL, const bool& testNA = true, const Rcpp::String& unit = "log", const double& epsilon = 0.00001){
+    double dist_value;
+    if (method == "euclidean"){
+        dist_value = euclidean(P, Q, testNA);
+    } else if (method == "manhattan") {
+        dist_value = manhattan(P, Q, testNA);
+    } else if (method == "minkowski") {
+        dist_value = minkowski(P, Q, p, testNA);
+    } else if (method == "chebyshev") {
+        dist_value = chebyshev(P, Q, testNA);
+    } else if (method == "sorensen") {
+        dist_value = sorensen(P, Q, testNA);
+    } else if (method == "gower") {
+        dist_value = gower(P, Q, testNA);
+    } else if (method == "soergel") {
+        dist_value = soergel(P, Q, testNA);
+    } else if (method == "kulczynski_d") {
+        dist_value = kulczynski_d(P, Q, testNA, epsilon);
+    } else if (method == "canberra") {
+        dist_value = canberra(P, Q, testNA);
+    } else if (method == "lorentzian") {
+        dist_value = lorentzian(P, Q, testNA, unit);
+    } else if (method == "intersection") {
+        dist_value = intersection_dist(P, Q, testNA);
+    } else if (method == "non-intersection") {
+        dist_value = 1.0 - intersection_dist(P, Q, testNA);
+    } else if (method == "wavehedges") {
+        dist_value = wave_hedges(P, Q, testNA);
+    } else if (method == "czekanowski") {
+        dist_value = czekanowski(P, Q, testNA);
+    } else if (method == "motyka") {
+        dist_value = motyka(P, Q, testNA);
+    } else if (method == "kulczynski_s") {
+        dist_value = 1.0 / kulczynski_d(P, Q, testNA, epsilon);
+    } else if (method == "tanimoto") {
+        dist_value = tanimoto(P, Q, testNA);
+    } else if (method == "ruzicka") {
+        dist_value = ruzicka(P, Q, testNA);
+    } else if (method == "inner_product") {
+        dist_value = inner_product(P, Q, testNA);
+    } else if (method == "harmonic_mean") {
+        dist_value = harmonic_mean_dist(P, Q, testNA);
+    } else if (method == "cosine") {
+        dist_value = cosine_dist(P, Q, testNA);
+    } else if (method == "hassebrook") {
+        dist_value = kumar_hassebrook(P, Q, testNA);
+    } else if (method == "jaccard") {
+        dist_value = jaccard(P, Q, testNA);
+    } else if (method == "dice") {
+        dist_value = dice_dist(P, Q, testNA);
+    } else if (method == "fidelity") {
+        dist_value = fidelity(P, Q, testNA);
+    } else if (method == "bhattacharyya") {
+        dist_value = bhattacharyya(P, Q, testNA, unit, epsilon);
+    } else if (method == "hellinger") {
+        dist_value = hellinger(P, Q, testNA);
+    } else if (method == "matusita") {
+        dist_value = matusita(P, Q, testNA);
+    } else if (method == "squared_chord") {
+        dist_value = squared_chord(P, Q, testNA);
+    } else if (method == "squared_euclidean") {
+        dist_value = squared_euclidean(P, Q, testNA);
+    } else if (method == "pearson") {
+        dist_value = pearson_chi_sq(P, Q, testNA, epsilon);
+    } else if (method == "neyman") {
+        dist_value = neyman_chi_sq(P, Q, testNA, epsilon);
+    } else if (method == "squared_chi") {
+        dist_value = squared_chi_sq(P, Q, testNA);
+    } else if (method == "prob_symm") {
+        dist_value = prob_symm_chi_sq(P, Q, testNA);
+    } else if (method == "divergence") {
+        dist_value = divergence_sq(P, Q, testNA);
+    } else if (method == "clark") {
+        dist_value = clark_sq(P, Q, testNA);
+    } else if (method == "additive_symm") {
+        dist_value = additive_symm_chi_sq(P, Q, testNA);
+    } else if (method == "kullback-leibler") {
+        dist_value = kullback_leibler_distance(P, Q, testNA, unit, epsilon);
+    } else if (method == "jeffreys") {
+        dist_value = jeffreys(P, Q, testNA, unit, epsilon);
+    } else if (method == "k_divergence") {
+        dist_value = k_divergence(P, Q, testNA, unit);
+    } else if (method == "topsoe") {
+        dist_value = topsoe(P, Q, testNA, unit);
+    } else if (method == "jensen_shannon"){
+        dist_value = jensen_shannon(P, Q, testNA, unit);
+    } else if (method == "jensen_difference") {
+        dist_value = jensen_difference(P, Q, testNA, unit);
+    } else if (method == "taneja") {
+        dist_value = taneja(P, Q, testNA, unit, epsilon);
+    } else if (method == "kumar-johnson") {
+        dist_value = kumar_johnson(P, Q, testNA, epsilon);
+    } else if (method == "avg") {
+        dist_value = avg(P, Q, testNA);
+    } else {
+        Rcpp::stop("Specified method is not implemented. Please consult getDistMethods().");
+    }
+    return dist_value;
+}
+
+//' @title Distances and Similarities between One and Many Probability Density Functions
+//' @description This function computes the distance/dissimilarity between one probability density function and a set of probability density functions.
+//' @param P a numeric vector storing the first distribution.
+//' @param dists a numeric matrix storing distributions in its rows.
+//' @param method a character string indicating the distance measure that should be computed.
+//' @param p power of the Minkowski distance.
+//' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
+//' @param unit type of \code{log} function. Options are
+//' \itemize{
+//' \item \code{unit = "log"}
+//' \item \code{unit = "log2"}
+//' \item \code{unit = "log10"}
+//' }
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions, and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions very divergent then
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and only occur due to the
+//' technical issues of computing x / 0 or 0 / 0 cases.
+//' @return A vector of distance values.
+//' @author Hajk-Georg Drost
+//' @examples
+//' set.seed(2020-08-20)
+//' P <- 1:10 / sum(1:10)
+//' M <- t(replicate(100, sample(1:10, size = 10) / 55))
+//' dist_one_many(P, M, method = "euclidean", testNA = FALSE)
+//' @export
+// [[Rcpp::export]]
+Rcpp::NumericVector dist_one_many(const Rcpp::NumericVector& P, Rcpp::NumericMatrix dists, Rcpp::String method, double p = NA_REAL, bool testNA = true, Rcpp::String unit = "log", double epsilon = 0.00001){
+    int nrows = dists.nrow();
+    Rcpp::NumericVector dist_values(nrows);
+    for (int i = 0; i < nrows; i++){
+        dist_values[i] = dist_one_one(P, dists(i, Rcpp::_), method, p, testNA, unit, epsilon);
+    }
+    return dist_values;
+}
+
+//' @title Distances and Similarities between Many Probability Density Functions
+//' @description This function computes the distance/dissimilarity between two sets of probability density functions.
+//' @param dists1 a numeric matrix storing distributions in its rows.
+//' @param dists2 a numeric matrix storing distributions in its rows.
+//' @param method a character string indicating the distance measure that should be computed.
+//' @param p power of the Minkowski distance.
+//' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
+//' @param unit type of \code{log} function. Options are
+//' \itemize{
+//' \item \code{unit = "log"}
+//' \item \code{unit = "log2"}
+//' \item \code{unit = "log10"}
+//' }
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions, and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions very divergent then
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and only occur due to the
+//' technical issues of computing x / 0 or 0 / 0 cases.
+//' @return A matrix of distance values.
+//' @author Hajk-Georg Drost
+//' @examples
+//' set.seed(2020-08-20)
+//' M1 <- t(replicate(10, sample(1:10, size = 10) / 55))
+//' M2 <- t(replicate(10, sample(1:10, size = 10) / 55))
+//' result <- dist_many_many(M1, M2, method = "euclidean", testNA = FALSE)
+//' @export
+// [[Rcpp::export]]
+Rcpp::NumericMatrix dist_many_many(Rcpp::NumericMatrix dists1, Rcpp::NumericMatrix dists2, Rcpp::String method, double p = NA_REAL, bool testNA = true, Rcpp::String unit = "log", double epsilon = 0.00001){
+    int nrows1 = dists1.nrow();
+    int nrows2 = dists2.nrow();
+    Rcpp::NumericMatrix dist_matrix(nrows1, nrows2);
+    for (int i = 0; i < nrows1; i++){
+        for (int j = 0; j < nrows2; j++){
+            dist_matrix(i, j) = dist_one_one(dists1(i, Rcpp::_), dists2(j, Rcpp::_), method, p, testNA, unit, epsilon);
+        }
+    }
+    return dist_matrix;
+}

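`dist_one_one` dispatches on the `method` string through a long if/else chain. An alternative design is a lookup table keyed by method name; the sketch below shows that pattern with plain `std::vector` in place of Rcpp types and only two methods implemented (the `dispatch` helper is hypothetical, not part of philentropy):

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using Vec = std::vector<double>;
using DistFn = std::function<double(const Vec&, const Vec&)>;

// Table-based alternative to the if/else dispatch chain: each method
// name maps to its distance function, and unknown names raise an error
// (cf. Rcpp::stop("Specified method is not implemented. ...")).
double dispatch(const Vec& P, const Vec& Q, const std::string& method) {
    static const std::unordered_map<std::string, DistFn> table = {
        {"euclidean", [](const Vec& p, const Vec& q) {
            double s = 0.0;
            for (std::size_t i = 0; i < p.size(); ++i)
                s += (p[i] - q[i]) * (p[i] - q[i]);
            return std::sqrt(s);
        }},
        {"manhattan", [](const Vec& p, const Vec& q) {
            double s = 0.0;
            for (std::size_t i = 0; i < p.size(); ++i)
                s += std::fabs(p[i] - q[i]);
            return s;
        }},
    };
    auto it = table.find(method);
    if (it == table.end())
        throw std::invalid_argument("Specified method is not implemented.");
    return it->second(P, Q);
}
```

A table keeps each method definition self-contained and makes lookup cost independent of the number of methods, at the price of indirection through `std::function`; for a fixed, moderate set of methods the chain used in the commit is equally reasonable.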
‎src/distances.h

+133-8
Original file line numberDiff line numberDiff line change
@@ -366,9 +366,23 @@ double soergel(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool
366366
//' @param P a numeric vector storing the first distribution.
367367
//' @param Q a numeric vector storing the second distribution.
368368
//' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
369+
//' @param epsilon epsilon a small value to address cases in the distance computation where division by zero occurs. In
370+
//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
371+
//' However, we recommend to choose a custom \code{epsilon} value depending on the size of the input vectors,
372+
//' the expected similarity between compared probability density functions and
373+
//' whether or not many 0 values are present within the compared vectors.
374+
//' As a rough rule of thumb we suggest that when dealing with very large
375+
//' input vectors which are very similar and contain many \code{0} values,
376+
//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
377+
//' whereas when vector sizes are small or distributions very divergent then
378+
//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
379+
//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
380+
//' return negative values which are not defined and only occur due to the
381+
//' technical issues of computing x / 0 or 0 / 0 cases.
369382
//' @author Hajk-Georg Drost
370383
//' @examples
371-
//' kulczynski_d(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
384+
//' kulczynski_d(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
385+
//' testNA = FALSE, epsilon = 0.00001)
372386
//' @export
373387
// [[Rcpp::export]]
374388
double kulczynski_d(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, double epsilon){
@@ -1067,14 +1081,41 @@ double fidelity(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool
 //' @param Q a numeric vector storing the second distribution.
 //' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
 //' @param unit type of \code{log} function. Options are
 //' \itemize{
 //' \item \code{unit = "log"}
 //' \item \code{unit = "log2"}
 //' \item \code{unit = "log10"}
 //' }
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions are very divergent,
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and occur only due to the
+//' technical issue of computing x / 0 or 0 / 0 cases.
 //' @author Hajk-Georg Drost
 //' @examples
-//' bhattacharyya(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
+//' bhattacharyya(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
+//'               unit = "log2", epsilon = 0.00001)
 //' @export
 // [[Rcpp::export]]
 double bhattacharyya(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, const Rcpp::String unit, double epsilon){
@@ -1225,9 +1266,23 @@ double squared_euclidean(const Rcpp::NumericVector& P, const Rcpp::NumericVector
 //' @param P a numeric vector storing the first distribution.
 //' @param Q a numeric vector storing the second distribution.
 //' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions are very divergent,
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and occur only due to the
+//' technical issue of computing x / 0 or 0 / 0 cases.
 //' @author Hajk-Georg Drost
 //' @examples
-//' pearson_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
+//' pearson_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
+//'                testNA = FALSE, epsilon = 0.00001)
 //' @export
 // [[Rcpp::export]]
 double pearson_chi_sq(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, double epsilon){
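To make the documented `epsilon` rule concrete, here is a hypothetical base-R sketch (an assumption about the behavior, not the package's actual C++ code): reading the docs as replacing a zero denominator with `epsilon` keeps a Pearson chi-squared style term `(P_i - Q_i)^2 / Q_i` finite when `Q_i = 0`.

```r
# Hypothetical sketch of the documented epsilon rule (assumed behavior,
# not philentropy's C++ implementation): replace a zero denominator by
# epsilon so x / 0 and 0 / 0 terms stay finite.
safe_ratio <- function(num, den, epsilon = 0.00001) {
  ifelse(den == 0, num / epsilon, num / den)
}

P <- c(0.5, 0.5, 0.0)
Q <- c(0.5, 0.0, 0.5)

# Pearson chi-squared style terms: (P_i - Q_i)^2 / Q_i
terms <- safe_ratio((P - Q)^2, Q)
d <- sum(terms)
is.finite(d)  # TRUE: the Q_i = 0 term no longer yields Inf/NaN
```

Without the guard, the second term would be `0.25 / 0 = Inf` and the sum would be infinite.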
@@ -1280,9 +1335,23 @@ double pearson_chi_sq(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q
 //' @param P a numeric vector storing the first distribution.
 //' @param Q a numeric vector storing the second distribution.
 //' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions are very divergent,
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and occur only due to the
+//' technical issue of computing x / 0 or 0 / 0 cases.
 //' @author Hajk-Georg Drost
 //' @examples
-//' neyman_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
+//' neyman_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
+//'               testNA = FALSE, epsilon = 0.00001)
 //' @export
 // [[Rcpp::export]]
 double neyman_chi_sq(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, double epsilon){
@@ -1541,9 +1610,23 @@ double additive_symm_chi_sq(const Rcpp::NumericVector& P, const Rcpp::NumericVec
 //' \item \code{unit = "log2"}
 //' \item \code{unit = "log10"}
 //' }
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions are very divergent,
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and occur only due to the
+//' technical issue of computing x / 0 or 0 / 0 cases.
 //' @author Hajk-Georg Drost
 //' @examples
-//' kullback_leibler_distance(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
+//' kullback_leibler_distance(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
+//'                           unit = "log2", epsilon = 0.00001)
 //' @export
 // [[Rcpp::export]]
 double kullback_leibler_distance(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, const Rcpp::String unit, double epsilon){
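The `unit` argument only rescales log-based measures. A small base-R check (the textbook Kullback-Leibler formula on zero-free vectors, not the package's epsilon-guarded implementation) shows that values in nats (`unit = "log"`) and bits (`unit = "log2"`) differ by the constant factor `log(2)`:

```r
# Base-R sketch (assumes zero-free probability vectors, so no epsilon
# handling is needed): KL divergence in nats vs. bits.
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
kl <- function(p, q, log_fun) sum(p * log_fun(p / q))

kl_nats <- kl(P, Q, log)    # corresponds to unit = "log"
kl_bits <- kl(P, Q, log2)   # corresponds to unit = "log2"
all.equal(kl_nats, kl_bits * log(2))  # TRUE
```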
@@ -1660,9 +1743,23 @@ double kullback_leibler_distance(const Rcpp::NumericVector& P, const Rcpp::Numer
 //' \item \code{unit = "log2"}
 //' \item \code{unit = "log10"}
 //' }
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions are very divergent,
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and occur only due to the
+//' technical issue of computing x / 0 or 0 / 0 cases.
 //' @author Hajk-Georg Drost
 //' @examples
-//' jeffreys(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
+//' jeffreys(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
+//'          unit = "log2", epsilon = 0.00001)
 //' @export
 // [[Rcpp::export]]
 double jeffreys(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, const Rcpp::String unit, double epsilon){
@@ -2344,9 +2441,23 @@ double jensen_difference(const Rcpp::NumericVector& P, const Rcpp::NumericVector
 //' \item \code{unit = "log2"}
 //' \item \code{unit = "log10"}
 //' }
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions are very divergent,
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and occur only due to the
+//' technical issue of computing x / 0 or 0 / 0 cases.
 //' @author Hajk-Georg Drost
 //' @examples
-//' taneja(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
+//' taneja(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE,
+//'        unit = "log2", epsilon = 0.00001)
 //' @export
 // [[Rcpp::export]]
 double taneja(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, const Rcpp::String unit, double epsilon){
@@ -2465,9 +2576,23 @@ double taneja(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool t
 //' @param P a numeric vector storing the first distribution.
 //' @param Q a numeric vector storing the second distribution.
 //' @param testNA a logical value indicating whether or not distributions shall be checked for \code{NA} values.
+//' @param epsilon a small value to address cases in the distance computation where division by zero occurs. In
+//' these cases, x / 0 or 0 / 0 will be replaced by \code{epsilon}. The default is \code{epsilon = 0.00001}.
+//' However, we recommend choosing a custom \code{epsilon} value depending on the size of the input vectors,
+//' the expected similarity between the compared probability density functions and
+//' whether or not many 0 values are present within the compared vectors.
+//' As a rough rule of thumb, we suggest that when dealing with very large
+//' input vectors which are very similar and contain many \code{0} values,
+//' the \code{epsilon} value should be set even smaller (e.g. \code{epsilon = 0.000000001}),
+//' whereas when vector sizes are small or distributions are very divergent,
+//' higher \code{epsilon} values may also be appropriate (e.g. \code{epsilon = 0.01}).
+//' Addressing this \code{epsilon} issue is important to avoid cases where distance metrics
+//' return negative values which are not defined and occur only due to the
+//' technical issue of computing x / 0 or 0 / 0 cases.
 //' @author Hajk-Georg Drost
 //' @examples
-//' kumar_johnson(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
+//' kumar_johnson(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
+//'               testNA = FALSE, epsilon = 0.00001)
 //' @export
 // [[Rcpp::export]]
 double kumar_johnson(const Rcpp::NumericVector& P, const Rcpp::NumericVector& Q, bool testNA, double epsilon){

‎src/philentropy_init.c

-146
This file was deleted.

‎tests/testthat/test-dist-functions.R

+52
@@ -0,0 +1,52 @@
context("Test dist_one_one, dist_one_many, dist_many_many ...")

set.seed(2020-08-20)
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
M1 <- t(replicate(10, sample(1:10, size = 10) / 55))
M2 <- t(replicate(20, sample(1:10, size = 10) / 55))

doo1 <- dist_one_one(P, Q, method = "euclidean", testNA = FALSE)
dom1 <- dist_one_many(P, M1, method = "euclidean", testNA = FALSE)
dmm1 <- dist_many_many(M1, M2, method = "euclidean", testNA = FALSE)

test_that("dist_one_one output structure is correct", {
  expect_type(doo1, "double")
  expect_length(doo1, 1)
})

test_that("dist_one_many output structure is correct", {
  expect_type(dom1, "double")
  expect_length(dom1, nrow(M1))
})

test_that("dist_many_many output structure is correct", {
  expect_type(dmm1, "double")
  expect_equal(dim(dmm1), c(nrow(M1), nrow(M2)))
})

doo2 <- euclidean(P, Q, FALSE)

test_that("dist_one_one output is correct", {
  expect_equal(doo1, doo2)
})

dom2 <- vector(length = nrow(M1))
for (i in seq_len(nrow(M1))){
  dom2[i] <- euclidean(P, M1[i, ], FALSE)
}

test_that("dist_one_many output is correct", {
  expect_equal(dom1, dom2)
})

dmm2 <- matrix(nrow = nrow(M1), ncol = nrow(M2))
for (i in seq_len(nrow(M1))){
  for (j in seq_len(nrow(M2))){
    dmm2[i, j] <- euclidean(M1[i, ], M2[j, ], FALSE)
  }
}

test_that("dist_many_many output is correct", {
  expect_equal(dmm1, dmm2)
})

‎vignettes/Distances.Rmd

+2-2
@@ -184,7 +184,7 @@ As you can see, although the `distance()` function is quite fast, the internal c
 
 The advantage of `distance()` is that it implements 46 distance measures based on base C++ functions that can be accessed individually by typing `philentropy::` and then `TAB`. In future versions of `philentropy` I will optimize the `distance()` function so that internal checks for data type correctness and correct input data will take less time than the base `dist()` function.
 
-
+<!--
 ## Detailed assessment of individual similarity and distance metrics
 
@@ -210,4 +210,4 @@ Euclid argued that the __shortest__ distance between two points is always a
 #### Chebyshev distance
 
 > $d = max | P_i - Q_i |$
-
+-->

‎vignettes/Many_Distances.Rmd

+132
@@ -0,0 +1,132 @@
---
title: "Comparing many probability density functions"
author: Jakub Nowosad
date: 2021-08-20
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Many_Distances}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

The **philentropy** package has several mechanisms to calculate distances between probability density functions.
The main one is the `distance()` function, which can compute 46 different distances/similarities between probability density functions (see `?philentropy::distance` and [a companion vignette](Distances.html) for details).
Alternatively, it is possible to call each distance/dissimilarity function directly.
For example, the `euclidean()` function computes the euclidean distance, while `jaccard()` computes the Jaccard distance.
The complete list of available distance measures can be obtained with the `philentropy::getDistMethods()` function.

Both of the above approaches have their pros and cons.
The `distance()` function is more flexible, as it allows users to use any distance measure and can return either a `matrix` or a `dist` object.
It also has several defensive programming checks implemented and is thus more appropriate for regular users.
Single distance functions, such as `euclidean()` or `jaccard()`, can on the other hand be slightly faster, as they directly call the underlying C++ code.

Now, we introduce three new low-level functions that are intermediaries between `distance()` and the single distance functions.
They are fairly flexible, allowing any implemented distance measure to be used, but are also usually faster than calling `distance()` (especially when many distances must be computed).
These functions are:

- `dist_one_one()` - expects two vectors (probability density functions), returns a single value
- `dist_one_many()` - expects one vector (a probability density function) and one matrix (a set of probability density functions), returns a vector of values
- `dist_many_many()` - expects two matrices (two sets of probability density functions), returns a matrix of values
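The three call shapes can be sketched with base-R stand-ins (hypothetical helper names, euclidean case only, not the package's C++ implementations):

```r
# Base-R stand-ins that mimic the *shapes* of the three functions:
# scalar, vector, matrix.
one_one   <- function(p, q) sqrt(sum((p - q)^2))
one_many  <- function(p, m) apply(m, 1, function(row) one_one(p, row))
many_many <- function(ma, mb) t(apply(ma, 1, function(row) one_many(row, mb)))

set.seed(1)
P  <- 1:10 / sum(1:10)
M1 <- t(replicate(4, sample(1:10) / 55))
M2 <- t(replicate(6, sample(1:10) / 55))

length(one_one(P, M1[1, ]))  # 1: a single value
length(one_many(P, M1))      # 4: one value per row of M1
dim(many_many(M1, M2))       # 4 6: rows of M1 x rows of M2
```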
Let's start testing them by attaching the **philentropy** package.

```{r}
library(philentropy)
```

## `dist_one_one()`

`dist_one_one()` is a lower-level equivalent of `distance()`.
However, instead of accepting a numeric `data.frame` or `matrix`, it expects two vectors representing probability density functions.
In this example, we create two vectors, `P` and `Q`.

```{r}
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
```

To calculate the euclidean distance between them, we can use several approaches: (a) the built-in R `dist()` function, (b) `philentropy::distance()`, (c) `philentropy::euclidean()`, or (d) the new `dist_one_one()`.

```{r}
# install.packages("microbenchmark")
microbenchmark::microbenchmark(
  dist(rbind(P, Q), method = "euclidean"),
  distance(rbind(P, Q), method = "euclidean", test.na = FALSE, mute.message = TRUE),
  euclidean(P, Q, FALSE),
  dist_one_one(P, Q, method = "euclidean", testNA = FALSE)
)
```

All of them return the same single value.
However, as you can see in the benchmark above, some are more flexible, and others are faster.

## `dist_one_many()`

The role of `dist_one_many()` is to calculate distances between one probability density function (in the form of a `vector`) and a set of probability density functions (as rows in a `matrix`).

First, let's create our example data.

```{r}
set.seed(2020-08-20)
P <- 1:10 / sum(1:10)
M <- t(replicate(100, sample(1:10, size = 10) / 55))
```

`P` is our input vector and `M` is our input matrix.

Distances between the `P` vector and the probability density functions in `M` can be calculated using several approaches.
For example, we could write a `for` loop (adding new code) or just use the existing `distance()` function and extract only one row (or column) from the results.
`dist_one_many()` performs this calculation directly: it goes through each row in `M` and calculates a given distance measure between `P` and the values in that row.

```{r}
# install.packages("microbenchmark")
microbenchmark::microbenchmark(
  as.matrix(dist(rbind(P, M), method = "euclidean"))[1, ][-1],
  distance(rbind(P, M), method = "euclidean", test.na = FALSE, mute.message = TRUE)[1, ][-1],
  dist_one_many(P, M, method = "euclidean", testNA = FALSE)
)
```

`dist_one_many()` returns a vector of values.
In this case, it is much faster than `distance()`, and visibly faster than `dist()`, while allowing more distance measures to be used.

## `dist_many_many()`

`dist_many_many()` calculates distances between two sets of probability density functions (as rows in two `matrix` objects).

Let's create two new `matrix` objects as example data.

```{r}
set.seed(2020-08-20)
M1 <- t(replicate(10, sample(1:10, size = 10) / 55))
M2 <- t(replicate(10, sample(1:10, size = 10) / 55))
```

`M1` is our first input matrix and `M2` is our second input matrix.
I am not aware of any function built into base R that calculates distances between the rows of two matrices, so to solve this problem we can create our own, `many_dists()`...

```{r}
many_dists <- function(m1, m2){
  r <- matrix(nrow = nrow(m1), ncol = nrow(m2))
  for (i in seq_len(nrow(m1))){
    for (j in seq_len(nrow(m2))){
      x <- rbind(m1[i, ], m2[j, ])
      r[i, j] <- distance(x, method = "euclidean", mute.message = TRUE)
    }
  }
  r
}
```

... and compare it to `dist_many_many()`.

```{r}
# install.packages("microbenchmark")
microbenchmark::microbenchmark(
  many_dists(M1, M2),
  dist_many_many(M1, M2, method = "euclidean", testNA = FALSE)
)
```

Both `many_dists()` and `dist_many_many()` return a matrix.
The benchmark above shows that `dist_many_many()` is about 30 times faster than our custom `many_dists()` approach.
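A closing usage note (again a base-R sketch with a hypothetical `euclid()` helper, not package code): the many-many case with both arguments equal yields the full pairwise self-distance matrix, which for a metric such as the euclidean distance is square and symmetric with a zero diagonal.

```r
# Pairwise self-distances: the many-many case with both inputs equal.
set.seed(1)
M <- t(replicate(5, sample(1:10) / 55))
euclid <- function(p, q) sqrt(sum((p - q)^2))  # hypothetical helper

n <- nrow(M)
D <- outer(seq_len(n), seq_len(n),
           Vectorize(function(i, j) euclid(M[i, ], M[j, ])))

all(D == t(D))     # TRUE: symmetric
all(diag(D) == 0)  # TRUE: zero self-distance
```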
