rewrite of the smooth continuum algorithm for speed and stability #186

moustakas · 2024-10-29T01:50:24Z

Partly motivated by #171, this PR contains a rewrite of the smooth continuum algorithm. Instead of relying on a median-smoothing of the residual spectrum in sliding bins, we now use scipy.interpolate.UnivariateSpline, which has the advantage of using the inverse variance as a statistical weight.
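As a rough illustration of the idea (not the code in this PR), here is a minimal sketch of an inverse-variance-weighted smoothing spline. The helper name `smooth_residual` and the toy data are assumptions made for the example; the only library call is `scipy.interpolate.UnivariateSpline`, whose weights multiply the residuals, so the sketch passes `sqrt(ivar)` rather than `ivar`.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smooth_residual(wave, resid, ivar, s=None):
    """Fit an inverse-variance-weighted smoothing spline to a residual spectrum.

    Hypothetical helper for illustration; not the fastspecfit implementation.
    """
    good = ivar > 0  # pixels with zero inverse variance are ignored
    # UnivariateSpline weights multiply the residuals, so pass 1/sigma = sqrt(ivar).
    weights = np.sqrt(ivar[good])
    # s=None lets scipy choose the smoothing factor (the number of points used).
    spline = UnivariateSpline(wave[good], resid[good], w=weights, k=3, s=s)
    return spline(wave)


# Toy residual: a slow ripple plus noise, with a few masked outliers.
rng = np.random.default_rng(0)
wave = np.linspace(3600.0, 9800.0, 500)
truth = 0.1 * np.sin((wave - 3600.0) / 1000.0)
sigma = 0.05
resid = truth + rng.normal(0.0, sigma, wave.size)
ivar = np.full(wave.size, 1.0 / sigma**2)
resid[100:105] += 10.0  # a badly subtracted feature ...
ivar[100:105] = 0.0     # ... flagged with zero inverse variance
smooth = smooth_residual(wave, resid, ivar)
```

Because the spline is not forced through every pixel, masked or noisy pixels do not drag the smooth model around the way they can with an interpolating spline.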

Here's one example that was previously quite problematic: main-dark-9196-39633010890377403.

New algorithm:

Old algorithm:

Note especially how the wings of the broad Halpha line are compromised by the old algorithm (currently in main) but not by the new one.

Replace interpolation spline fitting with a smoothed version that need not go through every point and that weights the importance of the points according to the local variability of the residual after removing the main fitted continuum model. The goal is to permit robust fitting while eliminating the need for very expensive median filtering afterwards; the new version is about 20x faster.

moustakas · 2024-10-29T12:42:49Z

For completeness, here are two other diagnostic plots which are related to the smooth-continuum modeling (for the example object shown at the top of this PR):

This first figure shows the initial line-fitting which is done in order to get a good estimate of the line-widths and, in particular, to determine if there are broad Balmer lines present. The line-widths, in turn, determine how aggressively to mask the data before continuum-fitting:

And the second figure shows the line-masking, which includes an effort to determine the signal-to-noise ratio of each line (poorly detected lines are not masked because they won't impact the continuum-fitting):
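To make the masking logic concrete, here is a minimal sketch under stated assumptions: the function name `build_line_mask`, the `nsigma` mask half-width, and the crude integrated signal-to-noise estimate are all hypothetical choices for illustration, and the input spectrum is assumed to be continuum-subtracted. Only the S/N > 3 threshold is taken from the commit messages in this PR.

```python
import numpy as np

C_LIGHT = 299792.458  # speed of light [km/s]


def build_line_mask(wave, flux, ivar, linewaves, linesigmas,
                    nsigma=3.0, snr_min=3.0):
    """Return a boolean array that is True for pixels to exclude from the fit.

    wave, flux, ivar : continuum-subtracted spectrum (observed frame)
    linewaves        : observed-frame line centers [Angstrom]
    linesigmas       : Gaussian line widths [km/s]
    """
    mask = np.zeros(wave.size, dtype=bool)
    for linewave, sigma_kms in zip(linewaves, linesigmas):
        # Broader lines get proportionally wider masks.
        halfwidth = nsigma * linewave * sigma_kms / C_LIGHT
        inline = (np.abs(wave - linewave) <= halfwidth) & (ivar > 0)
        if not inline.any():
            continue  # line falls outside the wavelength range
        # Crude integrated S/N: summed flux over the summed noise in quadrature.
        signal = flux[inline].sum()
        noise = np.sqrt((1.0 / ivar[inline]).sum())
        if signal / noise > snr_min:
            mask |= inline
    return mask


# Noise-free toy spectrum: one strong broad line and one undetected narrow line.
wave = np.arange(6000.0, 7000.0, 1.0)
ivar = np.ones_like(wave)
sigma_aa = 6300.0 * 1500.0 / C_LIGHT  # 1500 km/s expressed in Angstrom
flux = 50.0 * np.exp(-0.5 * ((wave - 6300.0) / sigma_aa) ** 2)
mask = build_line_mask(wave, flux, ivar, [6300.0, 6700.0], [1500.0, 100.0])
```

In this toy case the broad line at 6300 Angstrom is masked over a wide window, while the undetected line at 6700 Angstrom is left unmasked so its pixels still constrain the continuum.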

moustakas · 2024-10-29T14:55:15Z

In the example shown above, the data reduction is good; however, there are cases like the ones shown in desihub/desispec#2193 where there are large inter-camera breaks.

For example, fitting all three cameras simultaneously for the following object leads to unsatisfactory results:

fastspec /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/redrock-7-80605-thru20210205.fits \
  --targetids 39627676687795663 -o f.fits --debug-plots

Instead, with the updates to the code, each camera can be fitted independently, which yields nice results. (This all supposes, of course, that the redshift is correct, which in this case it is not; nevertheless, I think this is the behavior we want):
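A per-camera fit can be sketched roughly as follows. This is only an illustration under assumptions: the helper `smooth_per_camera`, the dictionary layout, the constant offsets, and the approximate camera wavelength ranges are all made up for the example and are not taken from the fastspecfit code.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smooth_per_camera(spectra):
    """Fit one weighted smoothing spline per camera.

    spectra maps a camera name to a (wave, resid, ivar) tuple of arrays.
    Hypothetical helper for illustration; not the fastspecfit implementation.
    """
    smooth = {}
    for camera, (wave, resid, ivar) in spectra.items():
        good = ivar > 0
        spline = UnivariateSpline(wave[good], resid[good],
                                  w=np.sqrt(ivar[good]), k=3)
        smooth[camera] = spline(wave)  # never evaluated across a camera break
    return smooth


# Toy residuals with a different constant offset in each camera, mimicking
# an inter-camera break. Wavelength ranges are approximate.
rng = np.random.default_rng(1)
sigma = 0.05
offsets = {'b': 0.0, 'r': 0.3, 'z': -0.2}
ranges = {'b': (3600.0, 5800.0), 'r': (5760.0, 7620.0), 'z': (7520.0, 9824.0)}
spectra = {}
for camera, (wmin, wmax) in ranges.items():
    wave = np.arange(wmin, wmax, 4.0)
    resid = offsets[camera] + rng.normal(0.0, sigma, wave.size)
    spectra[camera] = (wave, resid, np.full(wave.size, 1.0 / sigma**2))

smooth = smooth_per_camera(spectra)
```

Since each spline only sees one camera, a discontinuity between cameras cannot force a single smooth curve to bend through the break.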

And here's the final result / fit:

moustakas · 2024-10-30T12:49:45Z

I've also taken the opportunity in this PR to update how we estimate the "continuum fluxes", namely the fluxes at 1450, 1500, 1700, 2800, 3000, and 5100 Angstrom (rest) and at 1215.67, 3728.48, 4862.71, 5008.24, and 6564.6 Angstrom (i.e., the continuum level under Lyman-alpha, [OII], H-beta, [OIII] 5007, and H-alpha).

In main, we use a median-smooth, which is slow. Here, I've switched to computing sigma-clipped statistics and to fitting a simple line to the data. I've also added a new diagnostic plot (which revealed at least one bug: we were using the continuum model which included emission lines, rather than the line-free model):
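For readers who want the gist of "sigma-clip, then fit a line", here is a minimal NumPy-only sketch. The function names, the 50 Angstrom half-width window, and the clipping parameters are assumptions for illustration; the actual code in this PR may differ in all of these details.

```python
import numpy as np


def sigma_clip_mask(values, nsigma=3.0, maxiter=5):
    """Return a boolean mask of the values that survive iterative clipping."""
    keep = np.isfinite(values)
    for _ in range(maxiter):
        if keep.sum() < 3:
            break
        med = np.median(values[keep])
        std = np.std(values[keep])
        if std == 0:
            break
        new_keep = keep & (np.abs(values - med) <= nsigma * std)
        if new_keep.sum() == keep.sum():
            break  # converged: nothing new was clipped
        keep = new_keep
    return keep


def continuum_flux(wave, flux, ivar, center, halfwidth=50.0):
    """Estimate the continuum level at `center` from a clipped, weighted line fit."""
    inwin = (np.abs(wave - center) <= halfwidth) & (ivar > 0)
    if inwin.sum() < 3:
        return np.nan
    w, f, iv = wave[inwin], flux[inwin], ivar[inwin]
    keep = sigma_clip_mask(f)
    # np.polyfit weights multiply the residuals, so pass sqrt(ivar).
    # Fitting against (w - center) makes the intercept the flux at `center`.
    slope, intercept = np.polyfit(w[keep] - center, f[keep], 1,
                                  w=np.sqrt(iv[keep]))
    return intercept


# Toy rest-frame spectrum: a gently sloped continuum with three spikes.
wave = np.arange(5000.0, 5200.0, 1.0)
flux = 2.0 + 0.001 * (wave - 5100.0)
flux[[60, 100, 130]] += 50.0
ivar = np.ones_like(wave)
cflux = continuum_flux(wave, flux, ivar, center=5100.0)
```

In this toy case the three spikes are clipped on the first iteration, and the fitted intercept recovers the underlying continuum level of 2.0 at 5100 Angstrom even though one spike sits exactly at the target wavelength.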

Here's an example where the algorithm works quite nicely:

And here's an example where Lyman-alpha and [OII] aren't perfect because of the age of the stellar population; however, the error is <10% and probably adequate for this PR:

jdbuhler and others added 15 commits September 12, 2024 10:36

Merge branch 'smooth_cont' of https://github.com/desihub/fastspecfit into smooth_cont

e93dc5f

* simplify generation of complex log message

c602eb5

* tag a couple of np.arange() calls as producing floats, since they are immediately used for FP arithmetic

* whoops -- fix typo in log printing

fc19989

Rewrite some complex logging to avoid redundant code and potentially quadratic string concatenation

75c3db4

fix incorrect var name not caught in local testing

202af59

fixed conflicts with main

aae3bd7

bug in linemask qa

d9c80e9

move _niceline to qa as a utility

fab28d9

smarter line-masking

160154a

estimate the snr of every line in the wavelength range and only mask snr>3

b39c631

work on optimizing masking magic numbers

5102f49

remove debugging; better QA

b35c571

update change log [ci skip]

e0e0ab8

derive the smooth continuum correction per-camera

c5e8d38

moustakas added 4 commits October 29, 2024 08:57

compute smoothcorr stats as the mean of the smooth correction fractionally relative to the data, not the continuum model

c1da932

remove median-smoothing from continuum_flux code; bug fix (use no-lines template)

dfe9ab7

sigmaclip now returns mask

d9277a2

even smarter continuum_fluxes code; better QA

a83a7cb

update smoothcorr data model

9f9269c

moustakas merged commit 4f02981 into main Oct 30, 2024
12 checks passed

moustakas deleted the smooth_cont branch October 30, 2024 13:05
