2 changes: 0 additions & 2 deletions .github/workflows/ci.yml
@@ -1,8 +1,6 @@
name: CI
on:
pull_request:
branches:
- master
push:
tags:
- '*'
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,8 @@

## Unreleased

- Added `aggregate` transformation for flexible data aggregation with automatic grouping, custom labels, and support for scalar and vector-valued aggregations [#696](https://github.com/MakieOrg/AlgebraOfGraphics.jl/pull/696).

## v0.11.9 - 2025-10-10

- Improved error message when two layers with incompatible continuous data are combined [#692](https://github.com/MakieOrg/AlgebraOfGraphics.jl/pull/692).
170 changes: 130 additions & 40 deletions docs/src/reference/analyses.md
@@ -10,172 +10,172 @@ EditURL = "analyses.jl"
histogram
```

````@example analyses
```@example analyses
using AlgebraOfGraphics, CairoMakie
set_aog_theme!()

df = (x=randn(5000), y=randn(5000), z=rand(["a", "b", "c"], 5000))
specs = data(df) * mapping(:x, layout=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
````
```

````@example analyses
```@example analyses
specs = data(df) * mapping(:x, dodge=:z, color=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
````
```

````@example analyses
```@example analyses
specs = data(df) * mapping(:x, stack=:z, color=:z) * histogram(bins=range(-2, 2, length=15))
draw(specs)
````
```

````@example analyses
```@example analyses
specs = data(df) *
mapping((:x, :z) => ((x, z) -> x + 5 * (z == "b")) => "new x", col=:z) *
histogram(datalimits=extrema, bins=20)
draw(specs, facet=(linkxaxes=:minimal,))
````
```

````@example analyses
```@example analyses
data(df) * mapping(:x, :y, layout=:z) * histogram(bins=15) |> draw
````
```

## Density

```@docs
AlgebraOfGraphics.density
```

````@example analyses
```@example analyses
df = (x=randn(5000) .+ repeat([0, 2, 4, 6], inner = 1250), y=randn(5000), z=repeat(["a", "b", "c", "d"], inner = 1250))
specs = data(df) * mapping(:x, layout=:z) * AlgebraOfGraphics.density()

draw(specs)
````
```

```@example analyses
data(df) * mapping(:x, layout=:z) * AlgebraOfGraphics.density(datalimits = (0, 8)) |> draw
```

````@example analyses
```@example analyses
draw(specs * visual(direction = :y))
````
```

````@example analyses
```@example analyses
specs = data(df) *
mapping((:x, :z) => ((x, z) -> x + 5 * (z ∈ ["b", "d"])) => "new x", layout=:z) *
AlgebraOfGraphics.density(datalimits=extrema)
draw(specs, facet=(linkxaxes=:minimal,))
````
```

````@example analyses
```@example analyses
data(df) * mapping(:x, :y, layout=:z) * AlgebraOfGraphics.density(npoints=50) |> draw
````
```

````@example analyses
```@example analyses
specs = data(df) * mapping(:x, :y, layout=:z) *
AlgebraOfGraphics.density(npoints=50) * visual(Surface)

draw(specs, axis=(type=Axis3, zticks=0:0.1:0.2, limits=(nothing, nothing, (0, 0.2))))
````
```

## Frequency

```@docs
frequency
```

````@example analyses
```@example analyses
df = (x=rand(["a", "b", "c"], 100), y=rand(["a", "b", "c"], 100), z=rand(["a", "b", "c"], 100))
specs = data(df) * mapping(:x, layout=:z) * frequency()
draw(specs)
````
```

````@example analyses
```@example analyses
specs = data(df) * mapping(:x, layout=:z, color=:y, stack=:y) * frequency()
draw(specs)
````
```

````@example analyses
```@example analyses
specs = data(df) * mapping(:x, :y, layout=:z) * frequency()
draw(specs)
````
```

## Expectation

```@docs
expectation
```

````@example analyses
```@example analyses
df = (x=rand(["a", "b", "c"], 100), y=rand(["a", "b", "c"], 100), z=rand(100), c=rand(["a", "b", "c"], 100))
specs = data(df) * mapping(:x, :z, layout=:c) * expectation()
draw(specs)
````
```

````@example analyses
```@example analyses
specs = data(df) * mapping(:x, :z, layout=:c, color=:y, dodge=:y) * expectation()
draw(specs)
````
```

````@example analyses
```@example analyses
specs = data(df) * mapping(:x, :y, :z, layout=:c) * expectation()
draw(specs)
````
```

## Linear

```@docs
linear
```

````@example analyses
```@example analyses
x = 1:0.05:10
a = rand(1:7, length(x))
y = 1.2 .* x .+ a .+ 0.5 .* randn.()
df = (; x, y, a)
specs = data(df) * mapping(:x, :y, color=:a => nonnumeric) * (linear() + visual(Scatter))
draw(specs)
````
```

## Smoothing

```@docs
smooth
```

````@example analyses
```@example analyses
x = 1:0.05:10
a = rand(1:7, length(x))
y = sin.(x) .+ a .+ 0.1 .* randn.()
df = (; x, y, a)
specs = data(df) * mapping(:x, :y, color=:a => nonnumeric) * (smooth() + visual(Scatter))
draw(specs)
````
```

## Contours

```@docs
contours
```

````@example analyses
```@example analyses
x = repeat(1:10, 10)
y = repeat(11:20, inner = 10)
z = sqrt.(x .* y)
df = (; x, y, z)
specs = data(df) * mapping(:x, :y, :z) * contours(levels = 8)
draw(specs)
````
```

````@example analyses
```@example analyses
x = repeat(1:10, 10)
y = repeat(11:20, inner = 10)
z = sqrt.(x .* y)
df = (; x, y, z)
specs = data(df) * mapping(:x, :y, :z) * contours(levels = 8, labels = true)
draw(specs)
````
```

## Filled Contours

@@ -205,3 +205,93 @@ specs = data(df) *
filled_contours(levels = [-Inf, 5, 8, 10, 12, 13, 14, Inf])
draw(specs, scales(Color = (; palette = clipped(from_continuous(:plasma), low = :cyan, high = :red))))
```

## Aggregate

```@docs
aggregate
```

The `aggregate` transformation allows you to perform flexible aggregations on your data.
All mapped columns that are not explicitly aggregated are automatically used for grouping.

This analysis layer is intended for aggregations that are needed only for a visualization. Otherwise, it may make more sense to compute the values in a separate data wrangling step and add a separate `data` layer.
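
As a rough mental model of the automatic grouping (a sketch in plain Julia, not the package's actual implementation): rows that share the same values in the non-aggregated columns form a group, and the aggregation function is applied to the aggregated column within each group. The helper `group_aggregate` and the toy data below are hypothetical and exist only for illustration.

```julia
using Statistics

# Hypothetical helper: collect `values` by their entry in `group_keys`,
# then reduce each group with `f`.
function group_aggregate(f, group_keys, values)
    groups = Dict{eltype(group_keys), Vector{eltype(values)}}()
    for (k, v) in zip(group_keys, values)
        push!(get!(groups, k, eltype(values)[]), v)
    end
    return Dict(k => f(vs) for (k, vs) in groups)
end

# Toy data, not the penguins dataset
species = ["Adelie", "Gentoo", "Adelie", "Gentoo"]
mass = [3700.0, 5000.0, 3900.0, 5200.0]

# One aggregated value per distinct species
result = group_aggregate(mean, species, mass)
```

In `aggregate` terms, `species` plays the role of the first positional mapping (used for grouping) and `mass` the role of the second (aggregated with `2 => mean`).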

### Basic Aggregation

Compute the mean body mass for each penguin species:

```@example analyses
using AlgebraOfGraphics
using Statistics

penguins = AlgebraOfGraphics.penguins()

data(penguins) *
mapping(:species, :body_mass_g) *
aggregate(2 => mean) *
visual(BarPlot) |> draw
```

### Multiple Grouping Dimensions

Group by both species and sex, computing mean body mass:

```@example analyses
data(penguins) *
mapping(:species, :body_mass_g, color = :sex, dodge = :sex) *
aggregate(2 => mean) *
visual(BarPlot) |> draw
```

### Aggregating Multiple Columns

Compute both the mean and the standard deviation of body mass, drawing the mean as bars and the standard deviation as error bars:

```@example analyses
data(penguins) *
mapping(:species, :body_mass_g) *
(
aggregate(2 => mean) * visual(BarPlot) +
aggregate(2 => mean, 2 => std => 3) * visual(Errorbars)
) |> draw
```

### Splitting Aggregation Results

Sometimes an aggregation function returns multiple values that
should form separate inputs for the subsequent visual. In this
case, you can assign a vector of accessor specifications via the pair syntax.
Each vector element must specify an accessor function (here `first` and `last`) and the mapping that the result should be assigned to, given either as an integer for a positional argument or as a symbol for a named argument.

```@example analyses
data(penguins) *
mapping(:species, :body_mass_g, color = :sex, dodge_x = :sex) *
aggregate(2 => extrema => [first => 2, last => 3]) *
visual(Rangebars, linewidth = 3) |> draw(scales(DodgeX = (; width = 0.2)))
```
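
The accessor pairs can be read as "apply this function to the aggregated value and store the result in that slot". The following plain-Julia sketch illustrates the idea for a single group; it is an assumption about the semantics described above, not the package's internal code, and the toy values are made up.

```julia
# Toy values for one group (hypothetical)
masses = [3700.0, 3900.0, 3550.0]

# The aggregation returns a tuple (minimum, maximum)
agg = extrema(masses)

# accessor => target positional slot, mirroring [first => 2, last => 3]
accessors = [first => 2, last => 3]

# Apply each accessor to the aggregated value and key the result by its target slot
outputs = Dict(target => accessor(agg) for (accessor, target) in accessors)
```

Here slot 2 receives the minimum and slot 3 the maximum, which matches the two range endpoints passed to `Rangebars` in the example.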

### Vector-Valued Aggregations

Aggregation functions can return vectors, which are automatically expanded into one output element per vector entry. If another aggregation in the same call returns scalars, each scalar is repeated to match the length of the vector aggregation. If multiple aggregations return vectors, all vectors in a given group must have the same length.

```@example analyses
# Get lower and upper quartiles as a vector
lower_upper_quartile(x) = quantile(x, [0.25, 0.75])

data(penguins) *
mapping(:species, :body_mass_g, color = :sex) *
aggregate(2 => lower_upper_quartile) *
visual(Scatter, markersize = 15) |> draw
```
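
The expansion rule for mixed scalar and vector results can be sketched in plain Julia for a single group. This is a hedged illustration of the behavior described above, with made-up values, and does not reproduce the package's implementation.

```julia
using Statistics

# Toy values for one group (hypothetical)
masses = [3000.0, 3500.0, 4000.0, 4500.0, 5000.0]

quartiles = quantile(masses, [0.25, 0.75])   # vector-valued aggregation, length 2
avg = mean(masses)                           # scalar aggregation

# Repeat the scalar so that both outputs have one entry per vector element
avg_expanded = fill(avg, length(quartiles))

# Each expanded row pairs one quartile with the repeated scalar
rows = collect(zip(quartiles, avg_expanded))
```

Two vector-valued aggregations of different lengths could not be zipped row by row in this way, which is why the lengths within a group must agree.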

### Custom Labels

Provide custom labels for aggregated outputs:

```@example analyses
data(penguins) *
mapping(:species, :body_mass_g) *
aggregate(2 => mean => "Average Mass (g)") *
visual(BarPlot) |> draw
```
4 changes: 3 additions & 1 deletion src/AlgebraOfGraphics.jl
@@ -35,7 +35,7 @@ export hideinnerdecorations!, deleteemptyaxes!
export Layer, Layers, ProcessedLayer, ProcessedLayers, zerolayer
export Entry, AxisEntries
export renamer, sorter, nonnumeric, verbatim, presorted
export density, histogram, linear, smooth, expectation, frequency, contours, filled_contours
export density, histogram, linear, smooth, expectation, frequency, contours, filled_contours, aggregate, highlight
export visual, data, geodata, dims, mapping
export datetimeticks
export draw, draw!
@@ -72,8 +72,10 @@ include("transformations/histogram.jl")
include("transformations/groupreduce.jl")
include("transformations/frequency.jl")
include("transformations/expectation.jl")
include("transformations/aggregate.jl")
include("transformations/contours.jl")
include("transformations/filled_contours.jl")
include("transformations/highlight.jl")
include("guides/guides.jl")
include("guides/legend.jl")
include("guides/colorbar.jl")
20 changes: 18 additions & 2 deletions src/algebra/layer.jl
@@ -169,8 +169,9 @@ end

unnest(vs::AbstractArray, indices) = map(k -> [el[k] for el in vs], indices)

unnest_arrays(vs) = unnest(vs, keys(first(vs)))
unnest_arrays(vs) = isempty(vs) ? [[]] : unnest(vs, keys(first(vs)))
function unnest_dictionaries(vs)
isempty(vs) && return Dictionary()
return Dictionary(Dict((k => [el[k] for el in vs] for k in collect(keys(first(vs))))))
end
slice(v, c) = map(el -> getnewindex(el, c), v)
@@ -192,6 +193,21 @@ function Base.map(f, processedlayer::ProcessedLayer)
return ProcessedLayer(processedlayer; positional, named)
end

function filtermap(f, processedlayer::ProcessedLayer)
axs = shape(processedlayer)
outputs = map(CartesianIndices(axs)) do c
return f(slice(processedlayer.positional, c), slice(processedlayer.named, c))
end
to_remove = outputs .== nothing
deleteat!(outputs, to_remove)
primary = map(processedlayer.primary) do value
value[.!to_remove]
end

positional, named = unnest_arrays(map(first, outputs)), unnest_dictionaries(map(last, outputs))
return ProcessedLayer(processedlayer; positional, named, primary)
end

## Get scales from a `ProcessedLayer`

function uniquevalues(v::AbstractArray)
@@ -200,7 +216,7 @@ function uniquevalues(v::AbstractArray)
return collect(uniquesorted(_v, perm))
end

to_label(label::AbstractString) = label
to_label(label) = label
to_label(labels::AbstractArray) = reduce(mergelabels, labels)

# merge dict2 into dict but translate keys first using remapdict