Commit 0221440

document details_hier_clust_stats
1 parent d11a762 commit 0221440

6 files changed: 227 additions & 2 deletions

R/hier_clust.R

Lines changed: 8 additions & 1 deletion
@@ -5,6 +5,12 @@
 #' `hier_clust()` defines a model that fits clusters based on a distance-based
 #' dendrogram
 #'
+#' There are different ways to fit this model, and the method of estimation is
+#' chosen by setting the model engine. The engine-specific pages for this model
+#' are listed below.
+#'
+#' - \link[=details_hier_clust_stats]{stats}
+#'
 #' @param mode A single character string for the type of model. The only
 #' possible value for this model is "partition".
 #' @param engine A single character string specifying what computational engine
@@ -23,7 +29,8 @@
 #' ## What does it mean to predict?
 #'
 #' To predict the cluster assignment for a new observation, we find the closest
-#' cluster. How we measure “closeness” is dependent on the specified type of linkage in the model:
+#' cluster. How we measure “closeness” is dependent on the specified type of
+#' linkage in the model:
 #'
 #' - *single linkage*: The new observation is assigned to the same cluster as
 #' its nearest observation from the training data.
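
The single-linkage prediction rule described in the documentation above can be sketched in base R. This is only an illustration under stated assumptions, not tidyclust's implementation: the helper `predict_single_linkage()` and the simulated `train` data are hypothetical names introduced here, and Euclidean distance is assumed.

```r
# Base R sketch of single-linkage prediction; not tidyclust's own code.
set.seed(1)
train <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# Fit a single-linkage dendrogram and cut it into two clusters.
hc <- stats::hclust(stats::dist(train), method = "single")
train_clusters <- stats::cutree(hc, k = 2)

# Hypothetical helper: give a new point the cluster label of its nearest
# training observation (Euclidean distance is an assumption of this sketch).
predict_single_linkage <- function(new_point, train, clusters) {
  dists <- sqrt(rowSums(sweep(train, 2, new_point)^2))
  unname(clusters[which.min(dists)])
}

new_obs <- c(4.8, 5.1)
predict_single_linkage(new_obs, train, train_clusters)
```

The new observation receives whatever label its nearest training row has, which is what single linkage implies when the "closest cluster" is defined by the minimum pairwise distance.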

R/hier_clust_stats.R

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
#' Hierarchical (Agglomerative) Clustering via stats
2+
#'
3+
#' [hier_clust()] creates Hierarchical (Agglomerative) Clustering model.
4+
#'
5+
#' @includeRmd man/rmd/hier_clust_stats.md details
6+
#'
7+
#' @name details_hier_clust_stats
8+
#' @keywords internal
9+
NULL
10+
11+
# See inst/README-DOCS.md for a description of how these files are processed

man/details_hier_clust_stats.Rd

Lines changed: 76 additions & 0 deletions
Generated file; diff not rendered by default.

man/hier_clust.Rd

Lines changed: 9 additions & 1 deletion
Generated file; diff not rendered by default.

man/rmd/hier_clust_stats.Rmd

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
```{r, child = "aaa.Rmd", include = FALSE}
2+
```
3+
4+
`r descr_models("hier_clust", "stats")`
5+
6+
## Tuning Parameters
7+
8+
```{r stats-param-info, echo = FALSE}
9+
defaults <-
10+
tibble::tibble(tidyclust = c("num_clusters"),
11+
default = c("no default"))
12+
13+
param <-
14+
hier_clust() %>%
15+
set_engine("stats") %>%
16+
set_mode("partition") %>%
17+
make_parameter_list(defaults)
18+
```
19+
20+
This model has `r nrow(param)` tuning parameters:
21+
22+
```{r stats-param-list, echo = FALSE, results = "asis"}
23+
param$item
24+
```
25+
26+
## Translation from tidyclust to the original package (partition)
27+
28+
```{r stats-cls}
29+
hier_clust(num_clusters = integer(1)) %>%
30+
set_engine("stats") %>%
31+
set_mode("partition") %>%
32+
translate_tidyclust()
33+
```
34+
35+
## Preprocessing requirements
36+
37+
```{r child = "template-makes-dummies.Rmd"}
38+
```
39+
40+
## References
41+
42+
- Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole. (S version.)
43+
44+
- Everitt, B. (1974). Cluster Analysis. London: Heinemann Educ. Books.
45+
46+
- Hartigan, J.A. (1975). Clustering Algorithms. New York: Wiley.
47+
48+
- Sneath, P. H. A. and R. R. Sokal (1973). Numerical Taxonomy. San Francisco: Freeman.
49+
50+
- Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press: New York.
51+
52+
- Gordon, A. D. (1999). Classification. Second Edition. London: Chapman and Hall / CRC
53+
54+
- Murtagh, F. (1985). “Multidimensional Clustering Algorithms”, in COMPSTAT Lectures 4. Wuerzburg: Physica-Verlag (for algorithmic details of algorithms used).
55+
56+
- McQuitty, L.L. (1966). Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data. Educational and Psychological Measurement, 26, 825–831. doi:10.1177/001316446602600402.
57+
58+
- Legendre, P. and L. Legendre (2012). Numerical Ecology, 3rd English ed. Amsterdam: Elsevier Science BV.
59+
60+
- Murtagh, Fionn and Legendre, Pierre (2014). Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? Journal of Classification, 31, 274–295. doi:10.1007/s00357-014-9161-z.
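
For orientation, the role of the `num_clusters` tuning parameter in this file can be sketched with base R alone. This is a rough approximation rather than the engine's actual code: it assumes the stats engine amounts to `stats::hclust()` followed by `stats::cutree()`, and it uses complete linkage, which is `hclust()`'s default method. The `mtcars` columns are an arbitrary choice for illustration.

```r
# Sketch only: assumes the stats engine boils down to hclust() + cutree().
x <- scale(mtcars[, c("mpg", "disp", "hp")])

# Build the dendrogram once, using complete linkage.
hc <- stats::hclust(stats::dist(x), method = "complete")

# num_clusters has no default, so the number of clusters must be supplied.
num_clusters <- 3L
assignments <- stats::cutree(hc, k = num_clusters)
table(assignments)

# The same dendrogram can be cut at several candidate values, as one
# might do when tuning num_clusters; the result has one column per k.
candidates <- stats::cutree(hc, k = 2:4)
head(candidates)
```

Because the dendrogram does not depend on `num_clusters`, changing the parameter only changes where the tree is cut, not how it is built.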

man/rmd/hier_clust_stats.md

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
2+
3+
4+
For this engine, there is a single mode: partition
5+
6+
## Tuning Parameters
7+
8+
9+
10+
This model has 1 tuning parameters:
11+
12+
- `num_clusters`: # Clusters (type: integer, default: no default)
13+
14+
## Translation from tidyclust to the original package (partition)
15+
16+
17+
```r
18+
hier_clust(num_clusters = integer(1)) %>%
19+
set_engine("stats") %>%
20+
set_mode("partition") %>%
21+
translate_tidyclust()
22+
```
23+
24+
```
25+
## Hierarchical Clustering Specification (partition)
26+
##
27+
## Main Arguments:
28+
## num_clusters = integer(1)
29+
## linkage_method = complete
30+
##
31+
## Computational engine: stats
32+
##
33+
## Model fit template:
34+
## tidyclust::.hier_clust_fit_stats(data = missing_arg(), num_clusters = integer(1),
35+
## linkage_method = "complete")
36+
```
37+
38+
## Preprocessing requirements
39+
40+
41+
Factor/categorical predictors need to be converted to numeric values (e.g., dummy or indicator variables) for this engine. When using the formula method via \\code{\\link[=fit.cluster_spec]{fit()}}, tidyclust will convert factor columns to indicators.
42+
43+
## References
44+
45+
- Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth & Brooks/Cole. (S version.)
46+
47+
- Everitt, B. (1974). Cluster Analysis. London: Heinemann Educ. Books.
48+
49+
- Hartigan, J.A. (1975). Clustering Algorithms. New York: Wiley.
50+
51+
- Sneath, P. H. A. and R. R. Sokal (1973). Numerical Taxonomy. San Francisco: Freeman.
52+
53+
- Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press: New York.
54+
55+
- Gordon, A. D. (1999). Classification. Second Edition. London: Chapman and Hall / CRC
56+
57+
- Murtagh, F. (1985). “Multidimensional Clustering Algorithms”, in COMPSTAT Lectures 4. Wuerzburg: Physica-Verlag (for algorithmic details of algorithms used).
58+
59+
- McQuitty, L.L. (1966). Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data. Educational and Psychological Measurement, 26, 825–831. doi:10.1177/001316446602600402.
60+
61+
- Legendre, P. and L. Legendre (2012). Numerical Ecology, 3rd English ed. Amsterdam: Elsevier Science BV.
62+
63+
- Murtagh, Fionn and Legendre, Pierre (2014). Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? Journal of Classification, 31, 274–295. doi:10.1007/s00357-014-9161-z.
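
The preprocessing requirement in this file (factor columns must become numeric indicators before distances can be computed) can be illustrated with base R. This is a hedged sketch: the data frame `df` is made up for the example, and `stats::model.matrix()` is used here as a stand-in, since the exact encoding tidyclust applies under the formula method is not shown in this commit.

```r
# Hypothetical data with one numeric and one categorical column.
df <- data.frame(
  size  = c(1.2, 3.4, 2.2, 5.1),
  color = factor(c("red", "blue", "red", "green"))
)

# Expand the factor into indicator columns; `- 1` drops the intercept so
# every level of `color` gets its own 0/1 column.
x <- stats::model.matrix(~ . - 1, data = df)
colnames(x)

# The resulting all-numeric matrix can be passed to dist() and hclust().
hc <- stats::hclust(stats::dist(x), method = "complete")
stats::cutree(hc, k = 2)
```

Passing `df` to `dist()` directly would not give a meaningful distance for the factor column, which is why the conversion step comes first.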
