This page gives an overview of the R packages I have written or contributed to. The hex icons link to the GitHub repositories, and a link to each package's documentation is also provided.
Topics: Multivariate linear models || Categorical data analysis || Data
Provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). The related ‘candisc’ package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables. Documentation: friendly.github.io/heplots
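As a rough illustration of the two matrices an HE plot displays, the base-R sketch below computes the hypothesis (H) and error (E) sums-of-squares-and-products matrices for a one-way MANOVA. It does not use the heplots API; the built-in `iris` data and the choice of two responses are assumptions made for the example.

```r
# Two responses and one factor: a one-way MANOVA fitted as a multivariate lm
Y   <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
mod <- lm(Y ~ Species, data = iris)

# E: SSP matrix of the residuals (error)
E <- crossprod(residuals(mod))

# H: SSP matrix of the fitted values about the grand means
# (the hypothesis that all Species means are equal)
grand <- matrix(colMeans(Y), nrow(Y), ncol(Y), byrow = TRUE)
H <- crossprod(fitted(mod) - grand)

# The total SSP matrix decomposes as H + E
T_ssp <- crossprod(Y - grand)
all.equal(H + E, T_ssp, check.attributes = FALSE)
```

An HE plot draws one ellipse for H and one for E in the space of the responses, so the relative size and orientation of the two matrices can be compared directly.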
Functions for computing and visualizing generalized canonical discriminant analyses and canonical correlation analysis for a multivariate linear model. Traditional canonical discriminant analysis is restricted to a one-way ‘MANOVA’ design and is equivalent to canonical correlation analysis between a set of quantitative response variables and a set of dummy variables coded from the factor variable. The ‘candisc’ package generalizes this to higher-way ‘MANOVA’ designs for all factors in a multivariate linear model, computing canonical scores and vectors for each term. The graphic functions provide low-rank (1D, 2D, 3D) visualizations of terms in an ‘mlm’ via the ‘plot.candisc’ and ‘heplot.candisc’ methods. Related plots are now provided for canonical correlation analysis when all predictors are quantitative. Documentation: friendly.github.io/candisc
Provides methods to calculate diagnostics for multicollinearity among predictors in a linear or generalized linear model. It also provides methods to visualize those diagnostics following Friendly & Kwan (2009), “Where’s Waldo? Visualizing Collinearity Diagnostics”, doi:10.1198/tast.2009.0012. These include a better tabular presentation of collinearity diagnostics that highlights the important numbers, a semi-graphic tableplot of the diagnostics to make warning and danger levels more salient, and a “collinearity biplot” of the smallest dimensions of predictor space, where collinearity is most apparent. Documentation: friendly.github.io/VisCollin
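For readers unfamiliar with the diagnostics themselves, here is a minimal base-R sketch of two of them: variance inflation factors and condition indices. It does not call VisCollin; the `mtcars` predictors are an arbitrary choice, and the condition indices use one common variant (eigenvalues of the predictor correlation matrix), which may differ in scaling from what the package reports.

```r
# Predictors only; the response plays no role in these diagnostics
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
R <- cor(X)

# Variance inflation factors: the diagonal of the inverse correlation matrix,
# equal to 1 / (1 - R_j^2) for each predictor regressed on the others
vif <- diag(solve(R))

# Condition indices: sqrt(largest eigenvalue / each eigenvalue)
ev <- eigen(R, symmetric = TRUE)$values
cond_index <- sqrt(max(ev) / ev)

round(vif, 2)
round(cond_index, 2)
```

The dimensions with the largest condition indices correspond to the smallest eigenvalues, which is why a biplot of the smallest dimensions shows where the collinearity lies.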
The genridge package introduces generalizations of the standard
univariate ridge trace plot used in ridge regression and related
methods. These graphical methods show both bias (actually, shrinkage)
and precision, by plotting the covariance ellipsoids of the estimated
coefficients, rather than just the estimates themselves. 2D and 3D
plotting methods are provided, both in the space of the predictor
variables and in the transformed space of the PCA/SVD of the
predictors.
Documentation:
friendly.github.io/genridge
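The idea of showing shrinkage and precision together can be sketched in base R without the package. The example below assumes the built-in `longley` data, four arbitrarily chosen predictors, and a hypothetical grid of ridge constants; it computes the ridge coefficients and their covariance matrix (up to the error variance) for each constant.

```r
# Standardized predictors and centered response, so no intercept is needed
X <- scale(as.matrix(longley[, c("GNP", "Unemployed", "Armed.Forces", "Population")]))
y <- longley$Employed - mean(longley$Employed)

lambdas <- c(0, 0.01, 0.1, 1, 10)   # illustrative ridge constants
XtX <- crossprod(X)
p   <- ncol(X)

fits <- lapply(lambdas, function(lam) {
  G    <- solve(XtX + lam * diag(p))
  beta <- drop(G %*% crossprod(X, y))   # ridge estimate
  V    <- G %*% XtX %*% G               # covariance of beta, up to sigma^2
  list(beta = beta, V = V)
})

# Shrinkage: length of the coefficient vector
coef_norm <- sapply(fits, function(f) sqrt(sum(f$beta^2)))
# Precision: log-determinant of the covariance matrix (ellipsoid volume)
log_det <- sapply(fits, function(f)
  as.numeric(determinant(f$V, logarithm = TRUE)$modulus))

data.frame(lambda = lambdas, coef_norm, log_det)
```

Both columns decrease as the ridge constant grows: the estimates move toward zero while their covariance ellipsoid shrinks, which is the trade-off that plotting the ellipsoids makes visible.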
Computes regression deletion diagnostics for multivariate linear models and provides some associated diagnostic plots. The diagnostic measures include hat-values (leverages), generalized Cook’s distance, and generalized squared ‘studentized’ residuals. Several types of plots to detect influential observations are provided. Documentation: friendly.github.io/mvinfluence
A collection of matrix functions for teaching and learning matrix linear algebra as used in multivariate statistical methods. Many of these functions are designed for tutorial purposes in learning matrix algebra ideas using R. In some cases, functions are provided for concepts available elsewhere in R, but where the function call or name is not obvious. In other cases, functions are provided to show or demonstrate an algorithm. In addition, a collection of functions is provided for drawing vector diagrams in 2D and 3D and for rendering matrix expressions and equations in LaTeX. Documentation: friendly.github.io/matlib
Represents generalized geometric ellipsoids with the “(U,D)” representation. It allows degenerate and/or unbounded ellipsoids, together with methods for linear and duality transformations, and for plotting. Thus ellipsoids are naturally extended to include lines, hyperplanes, points, cylinders, etc. This permits exploration of a variety of statistical issues that can be visualized using ellipsoids, as discussed by Friendly, Fox & Monette (2013), “Elliptical Insights: Understanding Statistical Methods Through Elliptical Geometry”, doi:10.1214/12-STS402. Documentation: friendly.github.io/gellipsoid
Carries out analyses of two-way tables with one observation per cell,
together with graphical displays for an additive fit and a diagnostic
plot for removable ‘non-additivity’ via a power transformation of the
response. It implements methods from Tukey’s Exploratory Data Analysis
(1977, ISBN 978-0201076165), including a 1-degree-of-freedom test for
row*column ‘non-additivity’ that is linear in the row and column effects.
Documentation:
friendly.github.io/twoway
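The 1-degree-of-freedom test can be written out in a few lines of base R. This sketch does not use the twoway package; it assumes the built-in `VADeaths` table as an example of a two-way layout with one observation per cell, and it follows the standard textbook form of the test and of the comparison-value diagnostic.

```r
y  <- VADeaths                       # 5 x 4 table, one value per cell
nr <- nrow(y); nc <- ncol(y)

# Additive fit: grand mean + row effect + column effect
grand <- mean(y)
a   <- rowMeans(y) - grand
b   <- colMeans(y) - grand
res <- y - (grand + outer(a, b, "+"))

# One-df sum of squares for non-additivity of the form a_i * b_j
ss_res    <- sum(res^2)
ss_nonadd <- sum(outer(a, b) * y)^2 / (sum(a^2) * sum(b^2))
df_rem    <- (nr - 1) * (nc - 1) - 1
F_stat    <- ss_nonadd / ((ss_res - ss_nonadd) / df_rem)
p_value   <- pf(F_stat, 1, df_rem, lower.tail = FALSE)

# Diagnostic: slope of residuals on comparison values a_i * b_j / grand;
# 1 - slope is the conventional suggestion for a power transformation
cv    <- outer(a, b) / grand
slope <- unname(coef(lm(c(res) ~ c(cv)))[2])
power <- 1 - slope

c(F = F_stat, p = p_value, power = power)
```

The same F statistic results from adding the product of the row and column effects as a single regressor to the additive model, which is one way to check the arithmetic.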
A ‘ggplot2’ based implementation of biplots, giving a representation of
a dataset in a two dimensional space accounting for the greatest
variance, together with variable vectors showing how the data variables
relate to this space. It provides a replacement for stats::biplot(), but
with many enhancements to control the analysis and graphical display. It
implements biplot and scree plot methods which can be used with the
results of prcomp(), princomp(), FactoMineR::PCA(), ade4::dudi.pca() or
MASS::lda() and can be customized using ‘ggplot2’ techniques.
Documentation:
friendly.github.io/ggbiplot
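The ingredients of a biplot can be seen with a short base-R sketch. It uses only `prcomp()` on the built-in `iris` measurements (an assumed example data set) and does not call any ggbiplot function: the observation points come from the first two component scores, and the variable vectors from the corresponding loadings.

```r
# PCA of the four standardized measurements
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Share of total variance captured by each component
var_prop <- pca$sdev^2 / sum(pca$sdev^2)

# Coordinates a 2D biplot is built from
scores  <- pca$x[, 1:2]          # one point per observation
vectors <- pca$rotation[, 1:2]   # one arrow per variable

# Rank-2 approximation of the standardized data implied by the biplot
approx2 <- scores %*% t(vectors)

round(cumsum(var_prop)[2], 3)    # variance accounted for in two dimensions
```

Because `pca` is an ordinary `prcomp` result, it is the kind of object the package's biplot and scree plot methods are described as accepting.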
Provides additional data sets, methods and documentation to complement
the ‘vcd’ package for Visualizing Categorical Data and the ‘gnm’ package
for Generalized Nonlinear Models. In particular, ‘vcdExtra’ extends
mosaic, assoc and sieve plots from ‘vcd’ to handle ‘glm()’ and ‘gnm()’
models and adds a 3D version in ‘mosaic3d’. Additionally, methods are
provided for comparing and visualizing lists of ‘glm’ and ‘loglm’
objects. This package is now a support package for the book, “Discrete
Data Analysis with R” by Michael Friendly and David Meyer.
Documentation:
friendly.github.io/vcdExtra
Provides functions for specifying and fitting nested dichotomy logistic
regression models for a multi-category response and methods for
summarising and plotting those models. Nested dichotomies are
statistically independent, and hence provide an additive decomposition
of tests for the overall ‘polytomous’ response. When the dichotomies
make sense substantively, this method can be a simpler alternative to
the standard ‘multinomial’ logistic model which compares response
categories to a reference level. See: J. Fox (2016), “Applied Regression
Analysis and Generalized Linear Models”, 3rd Ed., ISBN 1452205663.
Documentation:
friendly.github.io/nestedLogit
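To make the idea of nested dichotomies concrete, the sketch below fits them by hand with two ordinary `glm()` calls rather than with nestedLogit. Everything about the setup is an assumption for illustration: the built-in `mtcars` data, the three-category response `cyl`, the dichotomies {4} vs {6, 8} and then {6} vs {8}, and the single predictor `wt`.

```r
d <- transform(mtcars, cyl = factor(cyl))

# Dichotomy 1: 4 cylinders vs. more than 4, using all cars
m1 <- glm(I(cyl != "4") ~ wt, family = binomial, data = d)

# Dichotomy 2: 8 vs. 6 cylinders, using only cars with more than 4
m2 <- glm(I(cyl == "8") ~ wt, family = binomial, data = subset(d, cyl != "4"))

# The dichotomies are independent, so log-likelihoods (and deviances) add
loglik_total <- as.numeric(logLik(m1)) + as.numeric(logLik(m2))

# Category probabilities for a hypothetical car with wt = 3
new       <- data.frame(wt = 3)
p_gt4     <- unname(predict(m1, new, type = "response"))
p_8_given <- unname(predict(m2, new, type = "response"))
probs <- c("4" = 1 - p_gt4,
           "6" = p_gt4 * (1 - p_8_given),
           "8" = p_gt4 * p_8_given)
probs
```

Multiplying the conditional probabilities along each branch recovers probabilities for the three response categories, which is how the separate binary models combine into a model for the polytomous response.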
The ‘HistData’ package provides a collection of small data sets that are
interesting and important in the history of statistics and data
visualization. The goal of the package is to make these available, both
for instructional use and for historical research. Some of these present
interesting challenges for graphics or analysis in R. Documentation:
friendly.github.io/HistData/
Maps of France in 1830, multivariate datasets from A.-M. Guerry and
others, and statistical and graphic methods related to Guerry’s “Moral
Statistics of France”. The goal is to facilitate the exploration and
development of statistical and graphic methods for multivariate data in
a geospatial context of historical interest. Documentation:
https://friendly.github.io/Guerry
Collects several classical word pools used most often to provide lists
of words in psychological studies of learning and memory. It provides a
simple function, ‘pickList’, for selecting random samples of words within
given ranges. Documentation:
friendly.github.io/WordPools/
Provides the tables from the ‘Sean Lahman Baseball Database’ as a set of
R data.frames. It uses the data on pitching, hitting and fielding
performance and other tables from 1871 through 2023, as recorded in the
2024 version of the database. Documentation examples show how many
baseball questions can be investigated. Documentation:
cdalzell.github.io/Lahman
Generates a random quotation from a database of quotes on topics in
statistics, data visualization and science. Other functions allow
searching the quotes database by key-term tags or authors, or creating a
word cloud. The output is designed to be suitable for use at the
console, in R Markdown and LaTeX. Documentation:
rdrr.io/cran/statquotes/