Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legendre nnls merge with main #307

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

craigwarner-ufastro
Copy link
Contributor

Created this branch to sort out the differences between main and legendre_nnls. Many changed had been made to main since legendre_nnls was branched off and some conflicts had to be manually resolved, particularly with bands in archetypes.py.

craigwarner-ufastro and others added 9 commits February 27, 2024 13:48
vector (in tdata) so that NNLS can be used instead of BVLS.

Fixed bug when ValueError excepted in rebinning
negative legendre terms.  Refactored a little to clean up.

For archetypes with legendre terms, NNLS is now actually used with
additional terms of -1*each legendre term so that we can use the faster
NNLS to get the same result as BVLS.
because on CPU, the additional computation time for the larger array
sizes dominates any time savings of NNLS vs BVLS
negative legendre terms on the GPU in lieu of BVLS into
per_camera_coeff_with_least_square_batch.  Moved prior_on_coeffs into
zscan.py, called from within per_camera_coeff_with_least_square_batch.
Pass prior_sigma, a scalar, insted of the array from fitz to archetypes.

Bug fix on case where only some object types, not all, have an
archetype.
terms with 0 coefficients and make the coeffs negative for negative
terms.

Refactored to remove the last bit of the BVLS-NNLS trick from
archetypes.py and zfind.py by passing binned instead of tdata to
per_camera_coeff_with_least_square_batch.
involve using a list of bands instead of ncam to be more general.
because results were different and timing as well.
…re_nnls_merge_with_main

Resolved conflicts
@coveralls
Copy link

Coverage Status

coverage: 38.513% (-0.5%) from 39.022%
when pulling 425d3fc on legendre_nnls_merge_with_main
into ca78381 on main.

@craigwarner-ufastro
Copy link
Contributor Author

@sbailey
Filling in more context, this branch was created to rectify conflicts between legendre_nnls and main that had popped up due to changes to main after branching off legendre_nnls. The changes to the legendre_nnls branch included refactoring code such that everything related to the trick of using negative legendre terms and NNLS instead of BVLS was in per_camera_coeff_with_least_square_batch in zscan.py. Additionally, the output was changed to remove the 0 coefficients in the solution - when we create negative copies of the legendre terms and fit 2*nleg + narch terms, half of the coefficients in the fit are 0 - either the positive or negative version of each legendre term. If the negative legendre term has a non-zero coefficent, we multiply that coefficient by -1. Thus our output is exactly the same as if BVLS had been used.

When this was initially pushed out, there were conflicts, which I created a new branch to resolve.

  1. The addition by Abhijeet of
    bands (list): list of wavelength bands

instead of ncam as a parameter to per_camera_coeff_with_least_square_batch.

  1. Moving the prior calculation out of fitz.py and into zscan.py which is where we want it. The best way to proceed moving the legendre-specific manipulations out of archetypes is what I sent you last Monday, that instead of passing tdata to per_camera_coeff_with_least_square_batch, we pass binned to per_camera_coeff_with_least_square_batch and assemble the tdata array there. This consolidates everything related to the BVLS/NNLS trick within that single method per_camera_coeff_with_least_square_batch.

  2. The output COEFF should be collapsed to be the same size as if we had not added the negative legendre terms by removing the zero terms and if a negative term was nonzero, multiply that by -1.E.g. if we have a row [r_i, l1_c1_i, l2_c1_i, -l1_c1_i, -l2_c1_i, l1_c2_i, l2_c2_i, -l1_c2_i, -l2_c2_i, l1_c3_1, l2_c3_i, -l1_c3_i, -l2_c3_i]
    Where r_i is the rebinned data for that wavelength and ln_cm is the nth legendre term for the mth camera for wavelength index i and let's say it equals
    [1, 2, 0, 0, 3, 0, 0, 4, 5, 6, 7, 0, 0]
    then we want that row in the output to be collapsed to
    [1, 2, -3, -4, -5, 6, 7]

After attempting to resolve conflicts, and realizing I needed to update to a newer desi environment, everything ran happily but the results did not match those from legendre_nnls previously so I created this new branch to handle the merge.

After a lot of playing around with various branches in various desi environments, I have finally satisfied myself that everything is ok with my branch to merge into main. There were major output differences between running in environment 23.1 vs 24.4 AND major differences between my legendre_nnls and legendre_nnls_merge_with_main branches running under 24.4. I have finally concluded all of these are due to template changes and code changes unrelated to anything in my updates.If I run from the main branch without archetypes, I now get an identical output to legendre_nnls_merge_with_main. If I run with archetypes, I get an np.allclose agreement between them:

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/abhijeet-nnls-merge-244.fits /pscratch/sd/c/cdwarner/abhijeet-main-244.fits
col='Z'  ndiff=0  std=0.0
col='ZERR'  ndiff=0  std=0.0
col='CHI2'  ndiff=416  std=1.7036060042842667e-10

FYI here are the differences in my legendre_nnls branch with exact same code and environment 23.1 vs 24.4:

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/redrock-nnls-231.fits /pscratch/sd/c/cdwarner/redrock-nnls-244.fits 
col='Z'  ndiff=421  std=2.0492627134013167e-05
col='ZERR'  ndiff=446  std=4.637423724708316e-07
col='CHI2'  ndiff=412  std=42.88757255960495

And even bigger the difference between legendre_nnls and legendre_nnls_merge_with_main

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/redrock-nnls-244.fits /pscratch/sd/c/cdwarner/redrock-nnls-merge-244.fits 
col='Z'  ndiff=355  std=0.2950015931780382
col='ZERR'  ndiff=411  std=1.406718431201316e-05
col='CHI2'  ndiff=351  std=5.980265673243861e+83

Keep in mind that this is without archetypes! - so not touching anything I recently did. Clearly one file had a 9e99 flag set in one code version vs the other.With --archetypes,

cdwarner@nid002624:/global/cfs/cdirs/desi/users/cdwarner/code> python compare_redrock_output.py /pscratch/sd/c/cdwarner/abhijeet-nnls-244.fits /pscratch/sd/c/cdwarner/abhijeet-nnls-merge-244.fits 
col='Z'  ndiff=311  std=0.6753668041725044
col='ZERR'  ndiff=362  std=0.00024064833100371678
col='CHI2'  ndiff=447  std=3206.6975053943816

But again all of these differences seem to come from the templates and code independent of anything I did and the main branch seems to agree with my legendre_nnls_merge_with_main branch.

I did notice that there were changes in the templates between the various DESI environments and between the main/merged code versus legendre_nnls in the print statements on reading the templates.

Finally there is still a significant slowdown in the new version in the fine redshift scan that I am investigating that is independent of archetypes and any code I had touched in the legendre_nnls branch that I attribute to a change in templates as a null hypothesis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants