update and edit paper
graysonwhite committed Jan 16, 2024
1 parent facf0b6 commit 252253b
Showing 5 changed files with 140 additions and 117 deletions.
64 changes: 35 additions & 29 deletions paper/paper.Rmd
@@ -33,57 +33,60 @@ plots that complies with the grammar of graphics [@wickham2010]. Further, `gglm`
utilizes the `broom` and `broom.mixed` R packages to provide support for
diagnostic plots produced from a variety of model object classes across a wide
variety of R packages [@broom; @broom.mixed]. A quartet of diagnostic plots can
be quickly created using `gglm`'s homonymous function, or through instructive
and intuitive layer functions added to a `ggplot2` object [@ggplot2].
be quickly created using `gglm`'s homonymous function, or plots can be created
individually through instructive and intuitive layer functions added to a
`ggplot2` object [@ggplot2].

# Statement of need
# Statement of Need

When scientists, statistical practitioners, students, and others implement
statistical models, it is of the utmost importance that the modeling assumptions
are verified through visual diagnostics in order to ensure valid statistical
inference. The R statistical software language provides a method for producing
diagnostic plots for linear model objects created with `stats::lm`, however
these plots are visually unappealing, inconsistent with diagnostic plots
produced for other R packages and model types, and out of place in modern
statistics and data science courses focused on learning R with the `tidyverse`
across other R packages and model types, and out of place in modern statistics
and data science courses focused on learning R with the `tidyverse`
[@tidyverse].

`gglm` addresses the described issues with current diagnostic plots in R by
providing a consistent interface for producing beautiful and publication-ready
diagnostic plots for a large variety of R packages and model types (linear
diagnostic plots across a large variety of R packages and model types (linear
models, linear mixed models, generalized linear mixed models, etc.). `gglm`
provides functionality to quickly produce four common diagnostic plots, similar
to `stats::plot.lm`, but produced by `ggplot2`. Further, `gglm` provides a suite
of layer functions adhering to the grammar of graphics which allow the user to
create and fine-tune their diagnostic plots through `ggplot2`'s intuitive
interface. The layer functions are particularly applicable in modern courses
teaching linear regression where students have already learned `ggplot2`, and in
particular they are used in Harvard University's introductory statistics course
[@mcconville2023]. Outside of educational benefits, `gglm` has potential to
allow researchers to more easily publish elegant diagnostic plots. `gglm` has
been downloaded from CRAN over 23,000 times as of January 2024.
teaching linear regression where students have already learned `ggplot2`. For
example, `gglm` and its layer functions are used in Harvard University's
introductory statistics course [@mcconville2023]. Outside of educational
benefits, `gglm` has potential to allow researchers to more easily publish
elegant diagnostic plots. `gglm` has been downloaded from CRAN over 23,000 times
as of January 2024.

# Usage and Philosophy
# Usage and Features

`gglm` has a simple philosophy for usage of the package: "be easy, intuitive,
and customizable". This philosophy comes about from the understanding that an
individual producing a diagnostic plot will be in one of two camps: 1) the
individual who wants an *easy* to use tool that allows them to quickly check
their model diagnostics, or 2) the individual who wants an *intuitive and
customizable* tool that allows them to look closely at their diagnostics for the
purposes of education, fine-tuning for publication, or other reasons. `gglm`
satisfies the individuals in both camps.
`gglm` achieves a balance in functionality by being both as easy to use as the
built-in `stats::plot.lm` method, yet still highly intuitive and customizable
for the curious user. `gglm` is designed with these traits in mind due to the
understanding that an individual producing a diagnostic plot will most likely be
in one of two camps: 1) the individual who wants an *easy* to use tool that
allows them to quickly check their model diagnostics, or 2) the individual who
wants an *intuitive and customizable* tool that allows them to look closely at
their diagnostics for the purposes of education, fine-tuning graphics for
publication, or other reasons. `gglm` satisfies the members of both camps.

The `gglm::gglm` function is made for folks in the first camp who are looking
a more aesthetically pleasing alternative to `stats::plot.lm`. In practice, the
process of using `gglm::gglm` is as simple as and more general than using
The `gglm::gglm` function is made for folks in the first camp who are looking
for a more aesthetically pleasing alternative to `stats::plot.lm`. In practice,
the process of using `gglm::gglm` is as simple as and more general than using
`stats::plot.lm`, with steps as follows:

+ fit a model of any class listed in `gglm::list_model_classes`,
+ call `gglm::gglm` on the saved model object.
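
The two steps above can be sketched as follows. This is a minimal illustration rather than code from the paper: the `mtcars` data set and the `mpg ~ wt` formula are assumptions chosen only because they ship with base R.

```r
# Minimal sketch of the two-step workflow. The data set and formula
# are illustrative assumptions, not taken from the paper.
library(gglm)

# Step 1: fit a model of a supported class (here, a plain `lm`).
fit <- lm(mpg ~ wt, data = mtcars)

# Step 2: call gglm() on the saved model object to draw the
# quartet of diagnostic plots.
gglm(fit)
```

Calling `gglm::list_model_classes()` beforehand is one way to confirm that a given model class is supported.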

The `gglm::stat_*` functions are thus for the individual in the second camp.
`gglm` provides seven functions of this sort, including those that produce the
The `gglm::stat_*` functions are thus for those in the second camp. `gglm`
provides seven functions of this sort, including those that produce the
following plots: Cook's distance by leverage, Cook's distance by observation
number, fitted values by residual values, normal QQ, residual histogram,
residual values by leverage, and scale by location. The steps to produce a
@@ -103,9 +106,12 @@ Functionality similar to that of `gglm`'s is provided by a variety of R
packages. As mentioned throughout, `stats` provides a `plot` method for
producing diagnostic plots for `lm` objects with base R graphics [@R]. Further,
`lindia` produces diagnostic plots for `lm` objects with `ggplot2` graphics, but
does not include functions that adhere with the grammar of graphics. Finally,
many packages provide methods for plotting diagnostics based on their own model
classes (see, e.g. `lme4::plot.merMod`), however these methods are do not have
consistent usage across packages [@lme4].
does not include functions that adhere with the grammar of graphics [@lindia].
Finally, many packages provide methods for plotting diagnostics based on their
own model classes (see, e.g. `lme4::plot.merMod`), however these methods do
not have consistent usage across packages [@lme4]. `gglm` hence addresses a
significant gap in functionality by creating a consistent framework for
producing diagnostic plots across R packages and model types while adhering to
the grammar of graphics.

# References
24 changes: 12 additions & 12 deletions paper/paper.log
@@ -1,4 +1,4 @@
This is XeTeX, Version 3.141592653-2.6-0.999995 (TeX Live 2023) (preloaded format=xelatex 2024.1.15) 15 JAN 2024 17:30
This is XeTeX, Version 3.141592653-2.6-0.999995 (TeX Live 2023) (preloaded format=xelatex 2024.1.15) 15 JAN 2024 21:40
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
@@ -924,8 +924,8 @@ Package hyperref Warning: Suppressing link with empty target on input line 271.

Package hyperref Warning: Suppressing link with empty target on input line 271.

File: /home/grayson/R/x86_64-pc-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png Graphic file (type bmp)
</home/grayson/R/x86_64-pc-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png>
File: /home/grayson/R/aarch64-unknown-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png Graphic file (type bmp)
</home/grayson/R/aarch64-unknown-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png>

Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) Make it at least 63.55022pt, for example:
@@ -934,12 +934,12 @@ Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) \addtolength{\topmargin}{-0.95425pt}.

LaTeX Font Info: Font shape `TU/lmss/m/it' in size <8> not available
(Font) Font shape `TU/lmss/m/sl' tried instead on input line 331.
(Font) Font shape `TU/lmss/m/sl' tried instead on input line 334.
[1

]
File: /home/grayson/R/x86_64-pc-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png Graphic file (type bmp)
</home/grayson/R/x86_64-pc-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png>
File: /home/grayson/R/aarch64-unknown-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png Graphic file (type bmp)
</home/grayson/R/aarch64-unknown-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png>

Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) Make it at least 63.55022pt, for example:
@@ -948,8 +948,8 @@ Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) \addtolength{\topmargin}{-0.95425pt}.

[2]
File: /home/grayson/R/x86_64-pc-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png Graphic file (type bmp)
</home/grayson/R/x86_64-pc-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png>
File: /home/grayson/R/aarch64-unknown-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png Graphic file (type bmp)
</home/grayson/R/aarch64-unknown-linux-gnu-library/4.3/rticles/rmarkdown/templates/joss/resources/JOSS-logo.png>

Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) Make it at least 63.55022pt, for example:
@@ -963,18 +963,18 @@ LaTeX2e <2023-11-01>
L3 programming layer <2024-01-04>
***********
Package rerunfilecheck Info: File `paper.out' has not changed.
(rerunfilecheck) Checksum: 0B3BF1C9D2BC7F6B8E6DB44814E9B8B7;651.
(rerunfilecheck) Checksum: 36E8A3BFD73D3865102046825CE75FCD;641.
Package logreq Info: Writing requests to 'paper.run.xml'.
\openout1 = `paper.run.xml'.

)
Here is how much of TeX's memory you used:
34349 strings out of 476822
703677 string characters out of 5804165
1944174 words of memory out of 5000000
703729 string characters out of 5804165
1943174 words of memory out of 5000000
55827 multiletter control sequences out of 15000+600000
564925 words of font info for 82 fonts, out of 8000000 for 9000
14 hyphenation exceptions out of 8191
84i,12n,87p,678b,850s stack positions out of 10000i,1000n,20000p,200000b,200000s
84i,13n,87p,678b,850s stack positions out of 10000i,1000n,20000p,200000b,200000s

Output written on paper.pdf (3 pages).
64 changes: 35 additions & 29 deletions paper/paper.md
@@ -33,57 +33,60 @@ plots that complies with the grammar of graphics [@wickham2010]. Further, `gglm`
utilizes the `broom` and `broom.mixed` R packages to provide support for
diagnostic plots produced from a variety of model object classes across a wide
variety of R packages [@broom; @broom.mixed]. A quartet of diagnostic plots can
be quickly created using `gglm`'s homonymous function, or through instructive
and intuitive layer functions added to a `ggplot2` object [@ggplot2].
be quickly created using `gglm`'s homonymous function, or plots can be created
individually through instructive and intuitive layer functions added to a
`ggplot2` object [@ggplot2].

# Statement of need
# Statement of Need

When scientists, statistical practitioners, students, and others implement
statistical models, it is of the utmost importance that the modeling assumptions
are verified through visual diagnostics in order to ensure valid statistical
inference. The R statistical software language provides a method for producing
diagnostic plots for linear model objects created with `stats::lm`, however
these plots are visually unappealing, inconsistent with diagnostic plots
produced for other R packages and model types, and out of place in modern
statistics and data science courses focused on learning R with the `tidyverse`
across other R packages and model types, and out of place in modern statistics
and data science courses focused on learning R with the `tidyverse`
[@tidyverse].

`gglm` addresses the described issues with current diagnostic plots in R by
providing a consistent interface for producing beautiful and publication-ready
diagnostic plots for a large variety of R packages and model types (linear
diagnostic plots across a large variety of R packages and model types (linear
models, linear mixed models, generalized linear mixed models, etc.). `gglm`
provides functionality to quickly produce four common diagnostic plots, similar
to `stats::plot.lm`, but produced by `ggplot2`. Further, `gglm` provides a suite
of layer functions adhering to the grammar of graphics which allow the user to
create and fine-tune their diagnostic plots through `ggplot2`'s intuitive
interface. The layer functions are particularly applicable in modern courses
teaching linear regression where students have already learned `ggplot2`, and in
particular they are used in Harvard University's introductory statistics course
[@mcconville2023]. Outside of educational benefits, `gglm` has potential to
allow researchers to more easily publish elegant diagnostic plots. `gglm` has
been downloaded from CRAN over 23,000 times as of January 2024.
teaching linear regression where students have already learned `ggplot2`. For
example, `gglm` and its layer functions are used in Harvard University's
introductory statistics course [@mcconville2023]. Outside of educational
benefits, `gglm` has potential to allow researchers to more easily publish
elegant diagnostic plots. `gglm` has been downloaded from CRAN over 23,000 times
as of January 2024.

# Usage and Philosophy
# Usage and Features

`gglm` has a simple philosophy for usage of the package: "be easy, intuitive,
and customizable". This philosophy comes about from the understanding that an
individual producing a diagnostic plot will be in one of two camps: 1) the
individual who wants an *easy* to use tool that allows them to quickly check
their model diagnostics, or 2) the individual who wants an *intuitive and
customizable* tool that allows them to look closely at their diagnostics for the
purposes of education, fine-tuning for publication, or other reasons. `gglm`
satisfies the individuals in both camps.
`gglm` achieves a balance in functionality by being both as easy to use as the
built-in `stats::plot.lm` method, yet still highly intuitive and customizable
for the curious user. `gglm` is designed with these traits in mind due to the
understanding that an individual producing a diagnostic plot will most likely be
in one of two camps: 1) the individual who wants an *easy* to use tool that
allows them to quickly check their model diagnostics, or 2) the individual who
wants an *intuitive and customizable* tool that allows them to look closely at
their diagnostics for the purposes of education, fine-tuning graphics for
publication, or other reasons. `gglm` satisfies the members of both camps.

The `gglm::gglm` function is made for folks in the first camp who are looking
a more aesthetically pleasing alternative to `stats::plot.lm`. In practice, the
process of using `gglm::gglm` is as simple as and more general than using
The `gglm::gglm` function is made for folks in the first camp who are looking
for a more aesthetically pleasing alternative to `stats::plot.lm`. In practice,
the process of using `gglm::gglm` is as simple as and more general than using
`stats::plot.lm`, with steps as follows:

+ fit a model of any class listed in `gglm::list_model_classes`,
+ call `gglm::gglm` on the saved model object.

The `gglm::stat_*` functions are thus for the individual in the second camp.
`gglm` provides seven functions of this sort, including those that produce the
The `gglm::stat_*` functions are thus for those in the second camp. `gglm`
provides seven functions of this sort, including those that produce the
following plots: Cook's distance by leverage, Cook's distance by observation
number, fitted values by residual values, normal QQ, residual histogram,
residual values by leverage, and scale by location. The steps to produce a
@@ -103,9 +106,12 @@ Functionality similar to that of `gglm`'s is provided by a variety of R
packages. As mentioned throughout, `stats` provides a `plot` method for
producing diagnostic plots for `lm` objects with base R graphics [@R]. Further,
`lindia` produces diagnostic plots for `lm` objects with `ggplot2` graphics, but
does not include functions that adhere with the grammar of graphics. Finally,
many packages provide methods for plotting diagnostics based on their own model
classes (see, e.g. `lme4::plot.merMod`), however these methods are do not have
consistent usage across packages [@lme4].
does not include functions that adhere with the grammar of graphics [@lindia].
Finally, many packages provide methods for plotting diagnostics based on their
own model classes (see, e.g. `lme4::plot.merMod`), however these methods do
not have consistent usage across packages [@lme4]. `gglm` hence addresses a
significant gap in functionality by creating a consistent framework for
producing diagnostic plots across R packages and model types while adhering to
the grammar of graphics.

# References
Binary file modified paper/paper.pdf
Binary file not shown.