Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite of the smooth continuum algorithm for speed and stability #186

Merged
merged 21 commits into from
Oct 30, 2024

Conversation

moustakas
Copy link
Member

@moustakas moustakas commented Oct 29, 2024

Partly motivated by #171, this PR contains a rewrite of the smooth continuum algorithm. Instead of relying on a median-smoothing of the residual spectrum in sliding bins, we now use scipy.interpolate.UnivariateSpline, which has the advantage of using the inverse variance as a statistical weight.

Here's one example that was previously quite problematic: main-dark-9196-39633010890377403.

New algorithm:

image

Old algorithm:

image

Note especially how the wings of the broad Halpha line are compromised in the current vs the new algorithm.

jdbuhler and others added 15 commits September 12, 2024 10:36
interpolation spline fitting with a smoothed version that need
not go through every point and weight the importance of the points
according to the local variability of the residual after removing
the main fitted continuum model.  The goal is to permit robust
fitting while eliminating the need for very expensive median filtering
afterwards -- the new version is about 20x faster.
interpolation spline fitting with a smoothed version that need
not go through every point and weight the importance of the points
according to the local variability of the residual after removing
the main fitted continuum model.  The goal is to permit robust
fitting while eliminating the need for very expensive median filtering
afterwards -- the new version is about 20x faster.
* tag a couple of np.arange() calls as producing floats,
  since they are immediately used for FP arithmetic
potentially quadratic string concatenation
@moustakas
Copy link
Member Author

For completeness, here are two other diagnostic plots which are related to the smooth-continuum modeling (for the example object shown at the top of this PR):

This first figure shows the initial line-fitting which is done in order to get a good estimate of the line-widths and, in particular, to determine if there are broad Balmer lines present. The line-widths, in turn, determine how aggressively to mask the data before continuum-fitting:

image

And the second figure shows the line-masking, which includes an effort to determine the signal-to-noise ratio of each line (poorly detected lines are not masked because they won't impact the continuum-fitting):

image

@moustakas
Copy link
Member Author

In the example shown above, the data reduction is good; however, there are cases like the ones shown in desihub/desispec#2193 where there are large inter-camera breaks.

For example, fitting all three cameras simultaneously for the following object leads to unsatisfactory results:

fastspec /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/80605/20210205/redrock-7-80605-thru20210205.fits \
  --targetids 39627676687795663 -o f.fits --debug-plots

image

Instead, with the updates to the code, each camera can be fitted independently, which yields nice results. (This all supposes, of course, that the redshift is correct, which, in this case it's not; nevertheless, I think this is the behavior we want):

image

And here's the final result / fit:

image

@moustakas
Copy link
Member Author

I've also taken the opportunity in this PR to update how we estimate the "continuum fluxes", namely the fluxes at 1450, 1500, 1700, 2800, 3000, and 5100 Angstrom (rest) and at 1215.67, 3728.48, 4862.71, 5008.24, and 6564.6 Angstrom (i.e., the continuum level under Lyman-alpha, [OII], H-beta, [OIII] 5007, and H-alpha).

In main, we use a median-smooth, which is slow. Here, I've switched to computing sigma-clipped statistics and to fitting a simple line to the data. I've also added a new diagnostic plot (which revealed at least one bug---we were using the continuum model which included emission lines, rather than the line-free model):

Here's an example where the algorithm works quite nicely:

image

And here's an example where Lyman-alpha and [OII] aren't perfect because of the age of the stellar population; however, the error is <10% and probably adequate for this PR:

image

@moustakas moustakas merged commit 4f02981 into main Oct 30, 2024
12 checks passed
@moustakas moustakas deleted the smooth_cont branch October 30, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant