mstorath
diff --git a/.DS_Store b/.DS_Store
6 KB
diff --git a/LICENSE b/LICENSE
Lines changed: 21 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 60 additions & 0 deletions
diff --git a/cssd.m b/cssd.m
Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2022 Martin Storath, Andreas Weinmann
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,60 @@
+# CSSD - Cubic smoothing splines for discontinuous signals
+
+This is a reference implementation in Matlab of the algorithms described in the paper
+
+**M. Storath, A. Weinmann, "Smoothing splines for discontinuous signals", 2022**
+
+## Overview of main functionalities
+1. **cssd.m** computes a cubic smoothing spline with discontinuities (CSSD) for data (x,y). The result is a minimizer of the following model of a smoothing spline $f$ with a priori unknown discontinuities $J$:
+
+  $$\min_{f, J} p \sum_{i=1}^N \left(\frac{y_i - f(x_i)}{\delta_i}\right)^2   +  (1-p) \int_{[x_1, x_N] \setminus J}   (f''(t))^2 dt	 + \gamma |J|.$$
+
+  where
+  * $y_i = g(x_i) + \epsilon_i$ are samples of a piecewise smooth function $g$ at data sites $x_1, \ldots, x_N$, and $\delta_i$ is an estimate of the standard deviation of the error $\epsilon_i$.
+  * The minimum is taken over all possible sets of discontinuities between two data sites $J \subset [x_1, x_N]\setminus \{x_1, \ldots, x_N\}$
+   and all functions $f$ that are twice continuously differentiable away from the discontinuities.
+  * The model parameter $p \in (0, 1)$ controls the relative weight of the data fidelity term (first term) and the smoothness term (second term): values of $p$ close to 1 favor closeness to the data, values close to 0 favor smoothness.
+  * The last term is a penalty for the number of discontinuities $|J|$ weighted by a parameter $\gamma > 0$.
+
+2. **cssd_cv.m** automatically determines values for the model parameters $p$ and $\gamma$ based on K-fold cross validation.
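+
+In `cssd.m`, the minimization over the discontinuity set $J$ is carried out by dynamic programming: for each right bound `rb`, the Bellman value `F(rb)` is the best energy over all left bounds `lb`, and the search stops once `eps_lr + gamma >= F(rb)`. The following self-contained sketch illustrates that recursion on toy data. It is a simplification and not the CSSD algorithm itself: each segment is fitted by its mean, so the segment error is a plain sum of squared deviations instead of a smoothing-spline energy. The data `y`, the value of `gamma`, and the helper `seg_err` are made up for illustration.
+
+```matlab
+% Simplified stand-in for the partition search in cssd.m (assumption: each
+% segment is fitted by its mean, NOT by a cubic smoothing spline).
+y = [0.1; -0.1; 0.0; 0.1; 2.0; 2.1; 1.9; 2.0];   % toy signal with one jump
+gamma = 0.5;                                      % jump penalty (illustrative value)
+N = numel(y);
+
+% cumulative sums give the squared error of any segment l:r in O(1)
+S1 = [0; cumsum(y)];
+S2 = [0; cumsum(y.^2)];
+seg_err = @(l, r) (S2(r+1) - S2(l)) - (S1(r+1) - S1(l)).^2 ./ (r - l + 1);
+
+F = zeros(N, 1);          % Bellman values
+partition = zeros(N, 1);  % partition(rb) = last index of the preceding segment
+for rb = 1:N
+    F(rb) = seg_err(1, rb);   % energy of 1:rb without any jump
+    blb = 1;                  % best left bound
+    for lb = rb:-1:2          % reverse order so that pruning is possible
+        eps_lr = seg_err(lb, rb);
+        % pruning: eps_lr does not decrease as lb decreases and F >= 0
+        if eps_lr + gamma >= F(rb)
+            break
+        end
+        candidate_value = F(lb-1) + gamma + eps_lr;
+        if candidate_value < F(rb)
+            F(rb) = candidate_value;
+            blb = lb;
+        end
+    end
+    partition(rb) = blb - 1;
+end
+
+% backtrack: collect the index of the last sample before each jump
+jump_idx = [];
+rb = N;
+while rb > 0
+    lb = partition(rb);
+    if lb > 0
+        jump_idx(end+1) = lb; %#ok<AGROW>
+    end
+    rb = lb;
+end
+jump_idx = flip(jump_idx);
+disp(jump_idx)
+```
+
+With these toy values the sketch reports a single jump after the fourth sample. In `cssd.m` the same loop structure is used, but `eps_lr` is the spline energy maintained by the fast updates `startEpsLR` and `updateEpsLR`.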
+
+## Quickstart
+1. Run `install_cssd.m`, which adds the folder and all subfolders to the Matlab path.
+2. Run any m-file from the demos folder.
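+
+A direct call might look like the sketch below. It relies only on the signature `cssd(x, y, p, gamma, xx, delta)` and the output fields `yy` and `discont` of `cssd.m`. The test signal `g`, the noise level, and the parameter values are hypothetical choices for illustration, and the call is guarded so that the snippet also runs when the toolbox is not on the path. It assumes that the spline functions used inside `cssd.m` (for example `csaps`) are available.
+
+```matlab
+% Usage sketch with made-up data; parameter values are illustrative, not tuned.
+rng(0);                                   % reproducible toy data
+N = 100;
+x = sort(rand(N, 1));                     % random data sites in [0, 1]
+g = @(t) sin(4*t) + 2*(t > 0.5);          % hypothetical piecewise smooth signal
+delta = 0.1 * ones(N, 1);                 % assumed standard deviation of the noise
+y = g(x) + delta .* randn(N, 1);
+
+p = 0.999;                                % weight of the data fidelity term
+gamma = 5;                                % penalty per discontinuity
+xx = linspace(0, 1, 500)';                % evaluation points
+
+if exist('cssd', 'file') == 2             % only call cssd if it is on the path
+    out = cssd(x, y, p, gamma, xx, delta);
+    plot(x, y, '.', xx, out.yy, '-');     % data and fitted CSSD
+    disp(out.discont');                   % detected discontinuity locations
+end
+```
+
+Setting `gamma = Inf` in this call returns a classical smoothing spline, which can serve as a baseline for comparison.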
+
+## Examples
+
+### Synthetic data
+<figure>
+  <img src="images/Ex_Synthetic.png" alt="Synthetic signal" width="600"/>
+  <figcaption>A synthetic signal is sampled at $N = 100$ random data sites $x_i$
+	and corrupted by zero-mean Gaussian noise with standard deviation $0.1$.
+	The results of the discussed model are shown for $p=0.999$ and different values of $\gamma$, where $\gamma=\infty$ corresponds to classical smoothing splines.
+	The thick lines represent the results for the shown sample realization. The shaded areas depict the $2.5 \%$ to $97.5 \%$ (pointwise) quantiles of $1000$ realizations. The histograms under the plots show the frequency of the detected discontinuity locations over all realizations.
+</figcaption>
+</figure>
+
+### Stock data
+<figure>
+  <img src="images/Ex_Stock_CV.png" alt="Stock" width="600"/>
+  <figcaption>The dots represent the logarithm of the closing prices of the Meta stock from May 18, 2012, 
+	to May 19, 2022. The curve represents the CSSD with parameters determined by K-fold CV ($p = 0.4702$,  $\gamma = 0.0069$). The dashed vertical lines indicate the discontinuities of the CSSD, and the ticks correspond to the date before the discontinuity.
+</figcaption>
+</figure>
+
+### Geyser data
+<figure>
+  <img src="images/Ex_Geyser_CV.png" alt="Geyser" width="600"/>
+  <figcaption>Fitting a CSSD to the Old Faithful data (circles):
+	If the parameters are selected by K-fold CV,
+	we obtain a result without discontinuities, which coincides with a classical smoothing spline (solid curve).
+	Keeping the selected $p$ parameter and lowering the $\gamma$ parameter sufficiently gives a two-phase regression curve (dashed curves) with a breakpoint near $x = 3$ (dashed vertical line); the two curve segments are nearly linear.
+	Both of the above parameter sets yield better CV scores than a linear model (dotted line).
+</figcaption>
+</figure>
+
+## Reference
+
+M. Storath, A. Weinmann, "Smoothing splines for discontinuous signals", 2022
@@ -0,0 +1,271 @@
+function output = cssd(x,y,p,gamma,xx,delta)
+%CSSD Cubic smoothing spline with discontinuities
+%
+%   cssd(x, y, p, gamma, xx, delta) computes a cubic smoothing spline with discontinuities for the
+%   given data (x,y). The data values may be scalars or vectors. Data points with the
+%   same site are replaced by their (weighted) average as in the builtin csaps
+%   function. 
+%
+% Input
+% x: vector of data sites
+%
+% y: vector of the same length as x, or matrix where y(:,i) is a data vector at site x(i)
+%
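+% p: parameter between 0 and 1 that balances data fidelity against the roughness
+% penalty (high values result in curves closer to the data; p = 1 gives an
+% interpolating spline, p = 0 a piecewise linear fit). Use CSSD_CV for automatic
+% selection.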
+%
+% gamma: parameter between 0 and Infinity that weights the discontinuity
+% penalty (high values result in fewer discontinuities, gamma = Inf returns
+% a classical smoothing spline). Use CSSD_CV for automatic
+% selection.
+%
+% xx: (optional) evaluation points for the result
+%
+% delta: (optional) weights of the data sites. delta may be thought of as the
+% standard deviation of the error at site x_i. Should have the same size as x.
+% - Note: The Matlab built-in spline function csaps uses a different weight
+% convention (w). delta is related to Matlab's w by w = 1./delta.^2
+% - Note for vector-valued data: Weights are assumed to be identical over
+% vector-components. (Componentwise weights might be supported in a future version.)
+%
+% Output
+% output = cssd(...)
+% output.pp: ppform of a smoothing spline with discontinuities
+% output.yy: evaluation of the result at the points xx if xx is specified,
+% otherwise empty
+% output.discont: locations of detected discontinuities, the locations are a
+% subset of the midpoints of the data sites x
+% output.discont_idx: for each discontinuity, the index of the last data site
+% to its left
+% output.interval_cell: a list of discrete indices between two discontinuities
+% output.pp_cell: a list of the cubic splines corresponding to the indices in interval_cell
+% output.pcw_fun: piecewise function (PcwFunReal) assembled from the splines in pp_cell
+%
+%   See also CSAPS, CSSD_CV
+
+%%% BEGIN CHECK ARGUMENTS
+if nargin<5, xx = []; end
+if nargin<6, delta = []; end
+
+if isempty(delta), delta = ones(size(x)); end
+
+assert( (0 <= p) && (p <= 1), 'The p parameter must fulfill 0 <= p <= 1')
+assert( 0 <= gamma, 'The gamma parameter must fulfill 0 <= gamma')
+
+% Matlab uses the parameter w which is related to delta of De Boor's book by w = 1./delta.^2
+w = 1./delta.^2;
+
+% checks arguments and creates column vectors (chckxywp is Matlab built in)
+[xi,yi,~,wi] = chckxywp(x,y,2,w,p);
+deltai = sqrt(1./wi);
+
+% Note: from now on we use the xi, yi, wi, deltai versions
+%%% END CHECK ARGUMENTS
+
+[N,D] = size(yi);
+
+% if gamma == Inf (discontinuity has infinite penalty), we may directly
+% compute a classical smoothing spline
+% also, if p == 1, we may directly compute an interpolating spline, no
+% matter how large gamma is (smoothness costs are equal to 0)
+if (gamma == Inf) || (p == 1)
+    pp = csaps(xi,yi',p,[],wi);
+    discont = [];
+    interval_cell = {1:N};
+    pp_cell = {fnxtr(pp,2)};
+else
+    % F stores Bellman values
+    F = zeros(N, 1);
+    % partition: stores the optimal partition
+    partition = zeros(N, 1);
+
+    %%% BEGIN PIECEWISE LINEAR CASE
+    if p == 0 % the piecewise linear case
+        B = [ones(N,1), xi]./deltai(:);
+        rhs = yi./deltai(:);
+        % precompute eps_1r for r=1,...,N
+        A = [B, rhs];
+        G = planerot(A(1:2,1));
+        A(1:2, :) = G*A(1:2, :);
+        eps_1r = 0;
+        % loop starts from index three because eps_11 and eps_12 are zero
+        for r=3:N
+            G = planerot(A([1,r],1));
+            A([1,r],:) = G * A([1,r],:);
+            G = planerot(A([2,r],2));
+            A([2,r],2:end) = G * A([2,r],2:end);
+            eps_1r = eps_1r + sum(A(r, 3:end).^2);
+            % store the eps_1r as the initial Bellman value corresponding to a
+            % solution without discontinuities
+            F(r) = eps_1r;
+        end
+        %%% BEGIN MAIN LOOP
+        for rb=2:N
+
+            % best left bound (blb) initialized with 1 corresponding to interval 1:rb
+            % corresponding Bellman value has been set in the precomputation
+            blb = 1;
+
+            A = [B(1:rb,:), rhs(1:rb,:)];
+
+            eps_lr = 0;
+            % the loop is performed in reverse order so that we may use pruning
+            for lb = rb-1:-1:2
+                if lb == rb-1
+                    G = planerot(A([end,end-1],1));
+                    A([end,end-1], :) = G*A([end,end-1], :);
+                else
+                    G = planerot(A([end, lb],1));
+                    A([end,lb], :) = G * A([end,lb], :);
+                    G = planerot(A([end-1, lb],2));
+                    A([end-1,lb], 2:end) = G*A([end-1,lb], 2:end);
+                    eps_lr = eps_lr + sum(A(lb,3:end).^2);
+                end
+
+                % check if setting a discontinuity between lb-1 and lb gives a
+                % better energy
+                candidate_value = F(lb-1) + gamma  + eps_lr;
+                if candidate_value < F(rb)
+                    F(rb ) = candidate_value;
+                    blb = lb;
+                end
+            end
+            % store the best left bound corresponding to the right bound rb
+            partition( rb ) = blb-1;
+        end
+        %%% END MAIN LOOP
+
+        %%% END PIECEWISE LINEAR CASE
+
+
+        %%% BEGIN CSSD CASE
+    else % this is the standard case: gamma > 0 and 0 < p < 1
+        %%% BEGIN PRECOMPUTATIONS
+        beta = sqrt(1-p);
+        alpha = sqrt(p)./deltai;
+        d = diff(xi); % xi is sorted ascendingly
+
+        % precompute eps_1r for r=1,...,N
+        [eps_1r, R, z] = startEpsLR(yi(1:2,:), d(1), alpha(1:2), beta);
+        % loop starts from index three because eps_11 and eps_12 are zero
+        for r=3:N
+            [eps_1r, R, z] = updateEpsLR(eps_1r, R, yi(r,:), d(r-1), z, alpha(r), beta);
+            % store the eps_1r as the initial Bellman value corresponding to a
+            % solution without discontinuities
+            F(r) = eps_1r;
+        end
+        %%% END PRECOMPUTATIONS
+
+        %%% BEGIN MAIN LOOP
+        for rb=2:N
+
+            % best left bound (blb) initialized with 1 corresponding to interval 1:rb
+            % corresponding Bellman value has been set in the precomputation
+            blb = 1;
+
+            % the loop is performed in reverse order so that we may use pruning
+            for lb = rb-1:-1:2
+                if lb == rb-1
+                    % get start configuration and store start state in R, z
+                    [eps_lr, R, z] = startEpsLR(yi([rb,rb-1],:), d(rb-1), alpha([rb,rb-1]), beta);
+                else
+                    % perform fast energy update and store current state in R, z
+                    [eps_lr, R, z] = updateEpsLR(eps_lr, R, yi(lb,:), d(lb), z, alpha(lb), beta);
+                end
+
+                % pruning to skip unreachable configurations
+                % (if this condition is met, the following if-condition can never
+                % be fulfilled because eps_lr is monotonically increasing and F >= 0.)
+                if (eps_lr + gamma) >= F(rb)
+                    break
+                end
+
+                % check if setting a discontinuity between lb-1 and lb gives a
+                % better energy
+                candidate_value = F(lb-1) + gamma  + eps_lr;
+                if candidate_value < F(rb)
+                    F(rb ) = candidate_value;
+                    blb = lb;
+                end
+
+            end
+            % store the best left bound corresponding to the right bound rb
+            partition( rb ) = blb-1;
+        end
+        %%% END MAIN LOOP
+
+
+    end
+    %%% END CSSD CASE
+
+    %%% BEGIN RECONSTRUCTION
+    % the discontinuity locations are coded in the array 'partition'. The
+    % vector [partition(rb)+1:rb] gives the data indices between two
+    % discontinuity locations. We start from the end with [partition(N)+1:N] and
+    % successively compute the preceding intervals.
+    rb = N;
+    pp_cell = {};
+    interval_cell = {};
+    discont = [];
+    upper_discont = xi(end) + 1;
+    while rb > 0
+        % partition(rb) stores corresponding optimal left bound lb
+        lb = partition(rb);
+        if lb == 0
+            lower_discont = xi(1) - 1;
+        else
+            lower_discont = (xi(lb+1) + xi(lb)) /2;
+        end
+        interval = (lb+1) : rb;
+        interval_cell{end+1} = interval; %#ok<AGROW> (runtime not critical in this part of the algorithm)
+        if length(interval) == 1 % this case should happen rarely but may happen e.g. for data of uneven length and low gamma parameter
+            % constant piece: only the constant coefficient (last one in ppform order) is nonzero
+            ymtx = zeros(4,D);
+            ymtx(4,:) = yi(interval, :);
+            pp = ppmak([lower_discont, upper_discont], ymtx', D);
+        else
+            pp = csaps(xi(interval),yi(interval, :)', p, [], wi(interval));
+            pp = linext_pp(pp, lower_discont, upper_discont);
+            pp = embed_pptocubic(pp);
+        end
+        pp_cell{end+1} = pp; %#ok<AGROW> (runtime not critical in this part of the algorithm)
+        % continue with next right bound
+        rb = lb;
+        upper_discont = lower_discont;
+        discont(end+1) = lower_discont; %#ok<AGROW> (runtime not critical in this part of the algorithm)
+    end
+    %%% END RECONSTRUCTION
+
+    %%% BEGIN MAKE PP FORM
+    pp_cell = flip(pp_cell); % the pp's were computed in reverse order which is fixed here
+    interval_cell = flip(interval_cell);
+    pp = merge_ppcell(pp_cell);
+    %%% END MAKE PP FORM
+
+    discont = flip(discont(1:end-1))'; % the discontinuities were computed in reverse order which is fixed here
+
+end
+
+
+%%% BEGIN SET OUTPUT
+output.pp = pp;
+output.discont = discont;
+output.interval_cell = interval_cell;
+output.pp_cell = pp_cell;
+output.discont_idx = zeros(numel(interval_cell)-1, 1);
+for i = 1:numel(output.discont_idx)
+    output.discont_idx(i) = output.interval_cell{i}(end);
+end
+
+if isempty(xx)
+    output.yy = [];
+else
+    output.yy = ppval(pp, xx);
+end
+
+fun_cell = cell(numel(pp_cell),1);
+for i=1:numel(pp_cell)
+    fun_cell{i} = @(xx) ppval(pp_cell{i}, xx);
+end
+output.pcw_fun = PcwFunReal([-Inf; discont(:); Inf], fun_cell);
+%%% END SET OUTPUT
+
+end
+
+