Commit e15ea0b

committed
init
0 parents  commit e15ea0b

27 files changed

+1915
-0
lines changed

.DS_Store

6 KB
Binary file not shown.

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2022 Martin Storath, Andreas Weinmann
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
# CSSD - Cubic smoothing splines for discontinuous signals
2+
3+
This is a reference implementation in Matlab for the algorithms described in the paper
4+
5+
**M. Storath, A. Weinmann, "Smoothing splines for discontinuous signals", 2022**
6+
7+
## Overview of main functionalities
8+
1. **cssd.m** computes a cubic smoothing spline with discontinuities (CSSD) for data (x,y). It is a solution of the following model of a smoothing spline $f$ with a-priori unknown discontinuities $J$
9+
10+
$$\min_{f, J} p \sum_{i=1}^N \left(\frac{y_i - f(x_i)}{\delta_i}\right)^2 + (1-p) \int_{[x_1, x_N] \setminus J} (f''(t))^2 dt + \gamma |J|.$$
11+
12+
where
13+
* $y_i = g(x_i) + \epsilon_i$ are samples of a piecewise smooth function $g$ at data sites $x_1, \ldots, x_N$, and $\delta_i$ is an estimate of the standard deviation of the error $\epsilon_i$
14+
* the minimum is taken over all possible sets of discontinuities between two data sites $J \subset [x_1, x_N]\setminus \{x_1, \ldots, x_N\}$
15+
and all functions $f$ that are twice continuously differentiable away from the discontinuities.
16+
* The model parameter $p \in (0, 1)$ controls the relative weight of the smoothness term (second term) and the data fidelity term.
17+
* The last term is a penalty for the number of discontinuities $|J|$ weighted by a parameter $\gamma > 0.$
18+
19+
2. **cssd_cv.m** automatically determines values for the model parameters $p$ and $\gamma$ based on K-fold cross validation.
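
As a consistency check on the notation (a sketch, not taken from the paper): if no discontinuities are allowed, $J = \emptyset$, the penalty $\gamma |J|$ vanishes and the model reduces to

$$\min_{f} \; p \sum_{i=1}^N w_i \left(y_i - f(x_i)\right)^2 + (1-p) \int_{x_1}^{x_N} (f''(t))^2 dt, \qquad w_i = \frac{1}{\delta_i^2},$$

which is the classical weighted cubic smoothing spline criterion, the one Matlab's csaps minimizes with weights $w_i$. This is the sense in which $\gamma = \infty$ recovers a classical smoothing spline.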
20+
21+
## Quickstart
22+
1. Execute "install_cssd.m" which adds the folder and all subfolders to the Matlab path.
23+
2. Execute any m-file from the demos folder
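
A direct call to cssd might look like the following minimal sketch. It assumes that install_cssd.m has been run and that the Curve Fitting Toolbox (which provides csaps) is available. The test signal, the parameter values, and all variable names are illustrative assumptions, not taken from the demos folder.

```matlab
% Minimal usage sketch (hypothetical data and parameter values).
rng(1);                                   % reproducible noise
x = sort(rand(100, 1));                   % random data sites in [0, 1]
g = @(t) sin(4*pi*t) + 2*(t > 0.5);       % hypothetical signal with one jump
sigma = 0.1;                              % assumed noise level
y = g(x) + sigma*randn(size(x));

p = 0.999;                                % assumed smoothing parameter
gamma = 5;                                % assumed discontinuity penalty
xx = linspace(0, 1, 500)';                % evaluation points
delta = sigma*ones(size(x));              % standard deviation estimate per site

out = cssd(x, y, p, gamma, xx, delta);

disp(out.discont)                         % detected discontinuity locations
plot(x, y, '.', xx, out.yy, '-')          % data and fitted CSSD
```

Suitable values of p and gamma depend on the data; cssd_cv.m is intended for selecting them automatically.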
24+
25+
## Examples
26+
27+
### Synthetic data
28+
<figure>
29+
<img src="images/Ex_Synthetic.png" alt="Synthetic signal" width="600"/>
30+
<figcaption>A synthetic signal is sampled at $N = 100$ random data sites $x_i$
31+
and corrupted by zero mean Gaussian noise with standard deviation
32+
$0.1.$
33+
The results of the discussed model are shown for $p=0.999$ and different parameters of $\gamma,$ where $\gamma=\infty$ corresponds to classical smoothing splines.
34+
The thick lines
35+
represent the results of the shown sample realization. The shaded areas depict the $2.5 \%$ to $97.5 \%$ (pointwise) quantiles of $1000$ realizations. The histograms under the plots show the frequency of the detected discontinuity locations over all realizations.
36+
</figcaption>
37+
</figure>
38+
39+
### Stock data
40+
<figure>
41+
<img src="images/Ex_Stock_CV.png" alt="Stock" width="600"/>
42+
<figcaption>The dots represent the logarithm of the closing prices of the Meta stock from May 18, 2012,
43+
to May 19, 2022. The curve represents the CSSD with parameters determined by K-fold CV ($p = 0.4702$, $\gamma = 0.0069$). The dashed vertical lines indicate the discontinuities of the CSSD, and the ticks correspond to the date before the discontinuity.
44+
</figcaption>
45+
</figure>
46+
47+
### Geyser data
48+
<figure>
49+
<img src="images/Ex_Geyser_CV.png" alt="Geyser" width="600"/>
50+
<figcaption>Fitting a CSSD to the Old Faithful data (circles):
51+
If the parameters are selected based on K-fold CV,
52+
we obtain a result without discontinuities which coincides with a classical smoothing spline (solid curve).
53+
Keeping the selected $p$-parameter and lowering the $\gamma$ parameter sufficiently gives a two-phase regression curve (dashed curves) with a breakpoint near $x = 3$ (dashed vertical line), and the two curve segments are nearly linear.
54+
Both of the above parameter sets yield better CV-scores than a linear model (dotted line).
55+
</figcaption>
56+
</figure>
57+
58+
## Reference
59+
60+
M. Storath, A. Weinmann, "Smoothing splines for discontinuous signals", 2022

cssd.m

Lines changed: 271 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,271 @@
1+
function output = cssd(x,y,p,gamma,xx,delta)
2+
%CSSD Cubic smoothing spline with discontinuities
3+
%
4+
% cssd(x, y, p, gamma, xx, delta) computes a cubic smoothing spline with discontinuities for the
5+
% given data (x,y). The data values may be scalars or vectors. Data points with the
6+
% same site are replaced by their (weighted) average as in the builtin csaps
7+
% function.
8+
%
9+
% Input
10+
% x: vector of data sites
11+
%
12+
% y: vector of the same length as x or matrix where y(:,i) is a data vector at site x(i)
13+
%
14+
% p: parameter between 0 and 1 that balances data fidelity (weight p) against the roughness
15+
% penalty (weight 1-p); low values result in smoother curves, p = 1 interpolates. Use CSSD_CV for automatic
16+
% selection.
17+
%
18+
% gamma: parameter between 0 and Infinity that weights the discontinuity
19+
% penalty (high values result in fewer discontinuities, gamma = Inf returns
20+
% a classical smoothing spline). Use CSSD_CV for automatic
21+
% selection.
22+
%
23+
% xx: (optional) evaluation points for the result
24+
%
25+
% delta: (optional) weights of the data sites. delta may be thought of as the
26+
% standard deviation of the error at site x_i. Should have the same size as x.
27+
% - Note: The Matlab built-in spline function csaps uses a different weight
28+
% convention (w). delta is related to Matlab's w by w = 1./delta.^2
29+
% - Note for vector-valued data: Weights are assumed to be identical over
30+
% vector-components. (Componentwise weights might be supported in a future version.)
31+
%
32+
% Output
33+
% output = cssd(...)
34+
% output.pp: ppform of a smoothing spline with discontinuities
35+
% output.yy: evaluation of the result at the points xx if xx is specified, otherwise empty
36+
% output.discont: locations of detected discontinuities, the locations are a
37+
% subset of the midpoints of the data sites x
38+
% output.interval_cell: a list of discrete indices between two discontinuities
39+
% output.pp_cell: a list of the cubic splines corresponding to the indices in interval_cell
40+
%
41+
% See also CSAPS, CSSD_CV
42+
43+
%%% BEGIN CHECK ARGUMENTS
44+
if nargin<5, xx = []; end
45+
if nargin<6, delta = []; end
46+
47+
if isempty(delta), delta = ones(size(x)); end
48+
49+
assert( (0 <= p) && (p <= 1), 'The p parameter must fulfill 0 <= p <= 1')
50+
assert( 0 <= gamma, 'The gamma parameter must fulfill 0 <= gamma')
51+
52+
% Matlab uses the parameter w which is related to delta of De Boor's book by w = 1./delta.^2
53+
w = 1./delta.^2;
54+
55+
% checks arguments and creates column vectors (chckxywp is Matlab built in)
56+
[xi,yi,~,wi] = chckxywp(x,y,2,w,p);
57+
deltai = sqrt(1./wi);
58+
59+
% Note: from now on we use the xi, yi, wi, deltai versions
60+
%%% END CHECK ARGUMENTS
61+
62+
[N,D] = size(yi);
63+
64+
% if gamma == Inf (discontinuity has infinite penalty), we may directly
65+
% compute a classical smoothing spline
66+
% also, if p == 1, we may directly compute an interpolating spline, no
67+
% matter how large gamma is (smoothness costs are equal to 0)
68+
if (gamma == Inf) || (p == 1)
69+
pp = csaps(xi,yi',p,[],wi);
70+
discont = [];
71+
interval_cell = {1:N};
72+
pp_cell = {fnxtr(pp,2)};
73+
else
74+
% F stores Bellman values
75+
F = zeros(N, 1);
76+
% partition: stores the optimal partition
77+
partition = zeros(N, 1);
78+
79+
%%% BEGIN PIECEWISE LINEAR CASE
80+
if p == 0 % the piecewise linear case
81+
B = [ones(N,1), xi]./deltai(:);
82+
rhs = yi./deltai(:);
83+
% precompute eps_1r for r=1,...,N
84+
A = [B, rhs];
85+
G = planerot(A(1:2,1));
86+
A(1:2, :) = G*A(1:2, :);
87+
eps_1r = 0;
88+
% loop starts from index three because eps_11 and eps_12 are zero
89+
for r=3:N
90+
G = planerot(A([1,r],1));
91+
A([1,r],:) = G * A([1,r],:);
92+
G = planerot(A([2,r],2));
93+
A([2,r],2:end) = G * A([2,r],2:end);
94+
eps_1r = eps_1r + sum(A(r, 3:end).^2);
95+
% store the eps_1r as the initial Bellman value corresponding to a
96+
% solution without discontinuities
97+
F(r) = eps_1r;
98+
end
99+
%%% BEGIN MAIN LOOP
100+
for rb=2:N
101+
102+
% best left bound (blb) initialized with 1 corresponding to interval 1:rb
103+
% corresponding Bellman value has been set in the precomputation
104+
blb = 1;
105+
106+
A = [B(1:rb,:), rhs(1:rb,:)];
107+
108+
eps_lr = 0;
109+
% the loop is performed in reverse order so that we may use pruning
110+
for lb = rb-1:-1:2
111+
if lb == rb-1
112+
G = planerot(A([end,end-1],1));
113+
A([end,end-1], :) = G*A([end,end-1], :);
114+
else
115+
G = planerot(A([end, lb],1));
116+
A([end,lb], :) = G * A([end,lb], :);
117+
G = planerot(A([end-1, lb],2));
118+
A([end-1,lb], 2:end) = G*A([end-1,lb], 2:end);
119+
eps_lr = eps_lr + sum(A(lb,3:end).^2);
120+
end
121+
122+
% check if setting a discontinuity between lb-1 and lb gives a
123+
% better energy
124+
candidate_value = F(lb-1) + gamma + eps_lr;
125+
if candidate_value < F(rb)
126+
F(rb ) = candidate_value;
127+
blb = lb;
128+
end
129+
% store the best left bound corresponding to the right bound rb
130+
partition( rb ) = blb-1;
131+
end
132+
end
133+
%%% END MAIN LOOP
134+
135+
%%% END PIECEWISE LINEAR CASE
136+
137+
138+
%%% BEGIN CSSD CASE
139+
else % this is the standard case: gamma > 0 and 0 < p < 1
140+
%%% BEGIN PRECOMPUTATIONS
141+
beta = sqrt(1-p);
142+
alpha = sqrt(p)./deltai;
143+
d = diff(xi); % xi is sorted ascendingly
144+
145+
% precompute eps_1r for r=1,...,N
146+
[eps_1r, R, z] = startEpsLR(yi(1:2,:), d(1), alpha(1:2), beta);
147+
% loop starts from index three because eps_11 and eps_12 are zero
148+
for r=3:N
149+
[eps_1r, R, z] = updateEpsLR(eps_1r, R, yi(r,:), d(r-1), z, alpha(r), beta);
150+
% store the eps_1r as the initial Bellman value corresponding to a
151+
% solution without discontinuities
152+
F(r) = eps_1r;
153+
end
154+
%%% END PRECOMPUTATIONS
155+
156+
%%% BEGIN MAIN LOOP
157+
for rb=2:N
158+
159+
% best left bound (blb) initialized with 1 corresponding to interval 1:rb
160+
% corresponding Bellman value has been set in the precomputation
161+
blb = 1;
162+
163+
% the loop is performed in reverse order so that we may use pruning
164+
for lb = rb-1:-1:2
165+
if lb == rb-1
166+
% get start configuration and store start state in R, z
167+
[eps_lr, R, z] = startEpsLR(yi([rb,rb-1],:), d(rb-1), alpha([rb,rb-1]), beta);
168+
else
169+
% perform fast energy update and store current state in R, z
170+
[eps_lr, R, z] = updateEpsLR(eps_lr, R, yi(lb,:), d(lb), z, alpha(lb), beta);
171+
end
172+
173+
% pruning to skip unreachable configurations
174+
% (if this condition is met the following if-condition can never
175+
% be fulfilled because eps_lr is monotonically increasing and F >= 0.)
176+
if (eps_lr + gamma) >= F(rb)
177+
break
178+
end
179+
180+
% check if setting a discontinuity between lb-1 and lb gives a
181+
% better energy
182+
candidate_value = F(lb-1) + gamma + eps_lr;
183+
if candidate_value < F(rb)
184+
F(rb ) = candidate_value;
185+
blb = lb;
186+
end
187+
188+
end
189+
% store the best left bound corresponding to the right bound rb
190+
partition( rb ) = blb-1;
191+
end
192+
%%% END MAIN LOOP
193+
194+
195+
end
196+
%%% END CSSD CASE
197+
198+
%%% BEGIN RECONSTRUCTION
199+
% the discontinuity locations are coded in the array 'partition'. The
200+
% vector [partition(rb)+1:rb] gives the indices between two
201+
% discontinuity locations. We start from behind with [partition(N)+1:N] and
202+
% successively compute the preceding intervals.
203+
rb = N;
204+
pp_cell = {};
205+
interval_cell = {};
206+
discont = [];
207+
upper_discont = xi(end) + 1;
208+
while rb > 0
209+
% partition(rb) stores corresponding optimal left bound lb
210+
lb = partition(rb);
211+
if lb == 0
212+
lower_discont = xi(1) - 1;
213+
else
214+
lower_discont = (xi(lb+1) + xi(lb)) /2;
215+
end
216+
interval = (lb+1) : rb;
217+
interval_cell{end+1} = interval; %#ok<AGROW> (runtime not critical in this part of the algorithm)
218+
if length(interval) == 1 % this case should happen rarely but may happen e.g. for data of uneven length and low gamma parameter
219+
ymtx = zeros(4,D);
220+
ymtx(4,:) = yi(interval, :); % constant piece: set only the constant coefficient (last column of ymtx')
221+
pp = ppmak([lower_discont, upper_discont], ymtx', D);
222+
else
223+
pp = csaps(xi(interval),yi(interval, :)', p, [], wi(interval));
224+
pp = linext_pp(pp, lower_discont, upper_discont);
225+
pp = embed_pptocubic(pp);
226+
end
227+
pp_cell{end+1} = pp; %#ok<AGROW> (runtime not critical in this part of the algorithm)
228+
% continue with next right bound
229+
rb = lb;
230+
upper_discont = lower_discont;
231+
discont(end+1) = lower_discont; %#ok<AGROW> (runtime not critical in this part of the algorithm)
232+
end
233+
%%% END RECONSTRUCTION
234+
235+
%%% BEGIN MAKE PP FORM
236+
pp_cell = flip(pp_cell); % the pp's were computed in reverse order which is fixed here
237+
interval_cell = flip(interval_cell);
238+
pp = merge_ppcell(pp_cell);
239+
%%% END MAKE PP FORM
240+
241+
discont = flip(discont(1:end-1))'; % the discontinuities were computed in reverse order which is fixed here
242+
243+
end
244+
245+
246+
%%% BEGIN SET OUTPUT
247+
output.pp = pp;
248+
output.discont = discont;
249+
output.interval_cell = interval_cell;
250+
output.pp_cell = pp_cell;
251+
output.discont_idx = zeros(numel(interval_cell)-1, 1);
252+
for i = 1:numel(output.discont_idx)
253+
output.discont_idx(i) = output.interval_cell{i}(end);
254+
end
255+
256+
if isempty(xx)
257+
output.yy = [];
258+
else
259+
output.yy = ppval(pp, xx);
260+
end
261+
262+
fun_cell = cell(numel(pp_cell),1);
263+
for i=1:numel(pp_cell)
264+
fun_cell{i} = @(xx) ppval(pp_cell{i}, xx);
265+
end
266+
output.pcw_fun = PcwFunReal([-Inf; discont(:); Inf], fun_cell);
267+
%%% END SET OUTPUT
268+
269+
end
270+
271+
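
The dynamic program in cssd.m can be illustrated with a much slower but self-contained sketch. It covers only the piecewise linear case (p = 0) with unit weights, recomputes every segment error from scratch with a least squares fit instead of the Givens rotation updates, and omits the pruning step. The function names pcw_linear_dp and line_fit_error are hypothetical and not part of the repository.

```matlab
function [discont_idx, F] = pcw_linear_dp(x, y, gamma)
% Naive O(N^3) sketch of the Bellman recursion for piecewise linear
% regression with jump penalty gamma (hypothetical helper, unit weights).
x = x(:); y = y(:);
N = numel(x);
F = zeros(N, 1);          % F(r): optimal energy for the data 1..r
partition = zeros(N, 1);  % partition(r): last index of the preceding segment
for rb = 1:N
    % initial value: one segment 1:rb without any discontinuity
    F(rb) = line_fit_error(x(1:rb), y(1:rb));
    % as in cssd.m, candidate segments lb:rb contain at least two points
    for lb = 2:rb-1
        candidate = F(lb-1) + gamma + line_fit_error(x(lb:rb), y(lb:rb));
        if candidate < F(rb)
            F(rb) = candidate;
            partition(rb) = lb - 1;
        end
    end
end
% backtracking: follow the stored left bounds from N down to 0
discont_idx = [];
rb = partition(N);
while rb > 0
    discont_idx(end+1) = rb; %#ok<AGROW>
    rb = partition(rb);
end
discont_idx = flip(discont_idx);
end

function err = line_fit_error(x, y)
% Squared residual of the least squares line through (x, y);
% zero for fewer than three points (distinct sites assumed).
if numel(x) < 3
    err = 0;
    return
end
A = [ones(numel(x), 1), x];
r = y - A * (A \ y);
err = sum(r.^2);
end
```

For example, `pcw_linear_dp(1:10, [1:5, 21:25], 1)` should report a single discontinuity after index 5, since two exact line segments plus one jump penalty cost less than any single line through all ten points.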
