Skip to content

Plot recipes #582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

irregular-rhomboid
Copy link

This PR builds on the discussion in #581 and provides summary plots of LinearModels for both Plots and Makie. The plots are based on those from R's plot.lm function. While one new lightweight dependency is needed for RecipesBase, the plots are implemented as extensions for StatsPlots and Makie.

Implemented features

  • Plots are implemented for the equivalent of R's Residual plot, Scale-Location plot, Q-Q plot, Residuals-Leverage plot and Cook's distance vs leverage plot. The interface for these functions is the same for both plotting packages.
  • A summary plot function lmplot displaying the residual plot, Q-Q plot, scale-location plot and residuals-leverage plot. The configurability of this function is limited, however.
  • The computation of the leverages has been factored out into a function in lm.jl.
  • For the StatsPlots extension, default values for the axis labels and title are provided. This is not available for the Makie extension, except in lmplot.
  • Basic docstrings are provided for all the new functions

Not implemented

  • Documentation in docs/. This seems like a good thing to add.
  • Tests (it is not clear how to test for both plotting packages at the same time)
  • Loess smoothing. This is a feature in the R plots which would require another dependency to implement, for unclear benefits.
  • Outlier labelling (see last image)

Examples

StatsPlots:

using GLM
using StatsPlots
using RDatasets

data = dataset("datasets", "trees")

l = lm(@formula(Height ~ Girth + Volume), data)

lmplot(l; size = (700,500))

plotlmJulia

Makie:

using GLM
using CairoMakie
using RDatasets

data = dataset("datasets", "trees")

l = lm(@formula(Height ~ Girth + Volume), data)

lmplot(l; size = (700,500))

makie

R equivalent:
lmplotsR

Copy link

codecov bot commented Jan 1, 2025

Codecov Report

Attention: Patch coverage is 95.33679% with 9 lines in your changes missing coverage. Please review.

Project coverage is 91.05%. Comparing base (89493a4) to head (ae37c1c).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
ext/StatsPlotsExt.jl 95.76% 5 Missing ⚠️
src/GLM.jl 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #582      +/-   ##
==========================================
+ Coverage   90.33%   91.05%   +0.72%     
==========================================
  Files           8       11       +3     
  Lines        1107     1297     +190     
==========================================
+ Hits         1000     1181     +181     
- Misses        107      116       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@csimal
Copy link

csimal commented Feb 26, 2025

@ajinkya-k @andreasnoack It would be good to get some feedback about this PR. I finally got time to write some tests, but ultimately plot recipes are hard to test beyond "does not catastrophically crash", and I'm not sure that I can test both the StatsPlots and Makie Recipes since they override the same function.

One option would be to put the recipes into separate mini packages rather than extensions.

@andreasnoack
Copy link
Member

Thanks for preparing this. It's really a great feature. Would it be possible to handle testing of both plotting frameworks by splitting the testing of them into separate CI jobs? Regarding the testing of the images then @jkrumbiegel might be able to share some ideas.

@irregular-rhomboid
Copy link
Author

Would it be possible to handle testing of both plotting frameworks by splitting the testing of them into separate CI jobs?

Good idea. Ii'll look into it.

@@ -98,6 +100,17 @@ module GLM
pivoted_qr!(A; kwargs...) = qr!(A, ColumnNorm(); kwargs...)
end

if !isdefined(Base, :get_extension)
using Requires
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should just raise the minimum Julia version rather than taking a dependency on Requires, especially since get_extension is 1.9 and 1.10 is the oldest supported release series.

@@ -0,0 +1,89 @@
module MakieRecipes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems vanishingly unlikely that someone would load both Makie and Plots in a single session. Rather than duplicating the function stubs for different plotting backends (of which there are more in the ecosystem than just the two implemented here), I think it would probably be fine to define the stubs just once in GLM, even though the methods in different extensions will conflict with one another. Or maybe that would cause other issues I can't think of offhand.

Plotting is a weird case where it feels like a great use case for package extension but in practice it seems hard to do well given how the package extension mechanism itself is designed. This is likely why there are packages like MixedModelsMakie that live separately from MixedModels and Makie. I kind of feel like would be a sensible approach for GLM as well.

Copy link
Author

@irregular-rhomboid irregular-rhomboid Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up making two small separate packages:

The Makie one is unfortunately close to GLMakie, which may not be desirable. I'm open to suggestions.

Currently, there is some code duplication for the standardizedresiduals and leverage functions. It's unclear whether they should be integrated into the main GLM codebase, since they would only be used by these two packages. If not, then this PR can simply be closed without merging.

For the same reason, I'm not sure defining function stubs in GLM is necessary at this point. Might as well just keep the plot packages as standalone, and maybe mention them in the GLM docs

To summarize the latest changes:

  • Moved the Plots and Makie recipes to dedicated packages, instead of extensions.
  • Bumped the Julia compat to 1.11

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plotting is a weird case where it feels like a great use case for package extension but in practice it seems hard to do well given how the package extension mechanism itself is designed.

@ararslan I'd like to understand the argument here. What is the specific issue with using package extensions for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the specific issue with using package extensions for this?

I wouldn't say it's an issue per se, perhaps more of an inconvenience? Package extensions are designed around the idea defining a method of a function in package A for a type in package B without A or B depending on one another directly. As a consequence of this design, package extensions can't have their own exports. The inconvenience for plotting is that plotting packages generally have separate functions for different kinds of plots rather than methods of some parent function, which means you have to maintain stub definitions for the plotting functions in one of the packages (e.g. as implemented in this PR), but there are multiple plotting packages with different, incompatible interfaces but often identically named functions, which then requires duplication of stub definitions (e.g. as implemented in this PR) if you want to avoid overwriting methods if multiple plotting packages get loaded.

It's also worth noting that other modeling packages could conceivably want to define methods of these functions for the model types they define, but if GLM owns the stubs, the extension for the other package has to be defined such that it's loaded for when the modeling package, GLM, and a plotting package are all loaded. It would also need to duplicate it by plotting backend.

It almost feels like it makes more sense for the plotting packages to own the function stubs, but I'm not sure the maintainers of those packages would necessarily agree, haha.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants