Optimizing auto params #651
Conversation
I can only strongly recommend merging setpts into the constructor at some point instead of introducing additional complications.
The only drawback I'm aware of is the potential cost of FFTW fetching an already stored FFT plan from its own plan cache, which is on the order of milliseconds. This cost is incurred when running NUFFTs with the same uniform grid but different (i.e. non-batchable) sets of nonuniform coordinates. It can be worked around by switching to ducc FFT.
Is this corner case really worth complicating life for almost all users?
Of course this is not a solution for 2.4, but rather for some future release 3.0.
Ironically, this very fact (FFTW's time to look up an existing plan can be milliseconds) is the reason we shifted to the guru/plan interface in 2020! It is true we now have plan-free ducc FFT, so we can start to think about the future, but it's not obvious how to change the constructor. Let's keep talking about it. I don't have the research bandwidth for FINUFFT to change fast, so we'll have to take it step by step...
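To make the tradeoff concrete, here is a minimal sketch (using the existing Python guru interface; the sizes N, M, and n_probs are illustrative, not from this thread) of the usage pattern the plan/setpts split is designed for: the FFT plan is paid once, while many different nonuniform point sets reuse it.

```python
# Sketch contrasting the one-shot simple interface with the guru interface,
# which amortizes FFT planning over many setpts calls on the same uniform grid.
import numpy as np
import finufft

N = 64          # number of modes (illustrative)
M = 1000        # nonuniform points per problem (illustrative)
n_probs = 50    # many small problems sharing the same uniform grid

rng = np.random.default_rng(0)

# Guru interface: the FFT plan is created once here...
plan = finufft.Plan(1, (N,), eps=1e-6, dtype=np.complex128)

for _ in range(n_probs):
    x = rng.uniform(0, 2 * np.pi, M)
    c = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    # ...and only the nonuniform points change between transforms.
    plan.setpts(x)
    f = plan.execute(c)

# Simple interface equivalent for a single problem
# (creates and destroys a plan internally on every call):
f_simple = finufft.nufft1d1(x, c, N, eps=1e-6)
```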
I agree that in 3.0 we could make more drastic decisions, i.e. defaulting to ducc and merging plan & setpts. Before defaulting to ducc I would like to take a pass over the 1D FFT to see if there is anything I can do to optimize it, but that might happen far in the future. For now, what would be the most sensible thing to do?
Independently of how this discussion progresses, I would be happy to look at the concrete problem parameters that demonstrated the FFTW overhead in 2020! Do you still have the information, @ahbarnett? Or is it recorded on GitHub?
@DiamonDinoia, if you really want to have a look at the FFT code, I'm happy to answer any questions you might have!
I think it is issue #27.
Perfect, thanks!
For now I propose to assume density = 1 (PR #652). Then we can refine the approach here for the next version.
density=1 is good for now; we can add the "M_hint" option for 2.5.
Re the slow FFTW plan lookup, see:
https://github.com/flatironinstitute/finufft/blob/master/perftest/manysmallprobs.cpp
including its comments...
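For reference, a small sketch of the quantity under discussion (an illustration only, not the library's internal code): the density only becomes known at setpts, which is why the plan stage currently has to assume density = 1.

```python
# Illustration: the "density" that the upsampfac heuristic would like to know.
import math

def density(M, n_modes):
    """Nonuniform points per uniform mode, e.g. M / (N1*N2*N3) in 3D."""
    return M / math.prod(n_modes)

# At plan time M is unknown, so the heuristic currently assumes density == 1;
# an M hint (or a future merged plan+setpts) would let it use the true value.
print(density(10000, (32, 32, 32)))   # ~0.3 for the script below
```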
This is a bit orthogonal, but I wanted to point it out nonetheless, and this might be a good place.

```python
# Script to demonstrate that upsampfac 1.25 is dangerous for
# single precision NUFFTs with small epsilon, and should therefore not be used
# as default
import numpy as np
import finufft

eps = 1e-6
x = np.random.uniform(0, 2*np.pi, 10000).astype(np.float32)
y = np.random.uniform(0, 2*np.pi, 10000).astype(np.float32)
z = np.random.uniform(0, 2*np.pi, 10000).astype(np.float32)
c = np.random.uniform(-0.5, 0.5, 10000).astype(np.float32) \
    + 1j * np.random.uniform(-0.5, 0.5, 10000).astype(np.float32)

# single precision plan
plan1 = finufft.Plan(1, (32, 32, 32), dtype=np.complex64, eps=eps, debug=1)
plan1.setpts(x, y, z)
f1 = plan1.execute(c)

# double precision plan
plan2 = finufft.Plan(1, (32, 32, 32), dtype=np.complex128, eps=eps, debug=1)
plan2.setpts(x.astype(np.float64), y.astype(np.float64), z.astype(np.float64))
f2 = plan2.execute(c.astype(np.complex128))

# compute L2 error between results
l2err = np.sqrt(np.sum(np.abs(f1-f2)**2)/np.sum(np.abs(f2)**2))
print("L2 error between single precision (default upsampfac) and double precision: ", l2err)

# single precision plan, upsampfac 2
plan3 = finufft.Plan(1, (32, 32, 32), dtype=np.complex64, eps=eps, debug=1, upsampfac=2)
plan3.setpts(x, y, z)
f3 = plan3.execute(c)

# compute L2 error between results
l2err = np.sqrt(np.sum(np.abs(f3-f2)**2)/np.sum(np.abs(f2)**2))
print("L2 error between single precision (upsampfac=2) and double precision: ", l2err)
```

Whether upsampfac=1.25 is good enough also depends on dimensionality (accuracy gets worse from 1D to 3D).
I agree. Upsampfac=1.25 does not let you get as close to eps_mach as 2.0 does. (Comparing to double precision is not really fair, since there are intrinsic limits due to accumulation of rounding error here; but comparing to 2.0 is fair.) E.g. with 2.4.0-rc1 (tweaking params in fullmathtest.m):
Requesting tol=1e-4 improves the bad errors there down to the 1e-4 level.
Note that usf=2.0 is limited in 1D by N1*eps_mach anyway. I propose to adjust the heuristics so that if tol is within about 2 digits of eps_mach, usf=2.0 is always enforced. This will come with a loss of speed and a growth of RAM, but the advanced user should be exploring the usf setting anyway. That should cover it for now, pending better kernel design, which I would like to have time to do...
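As an illustration of the proposed guard (my sketch of the rule described above, not the actual heuristic code; the factor of 100 stands in for "about 2 digits"):

```python
# Sketch of the proposed rule: if the requested tolerance is within ~2 digits of
# machine epsilon for the working precision, force upsampfac = 2.0; otherwise the
# usual speed-oriented heuristic (which may pick 1.25) applies.
import numpy as np

def choose_upsampfac(tol, real_dtype):
    eps_mach = np.finfo(real_dtype).eps      # ~1.2e-7 (float32), ~2.2e-16 (float64)
    if tol < 100 * eps_mach:                 # "within about 2 digits of eps_mach"
        return 2.0
    return 1.25                              # placeholder for the existing heuristic

print(choose_upsampfac(1e-6, np.float32))    # -> 2.0 (forces the safer upsampfac)
print(choose_upsampfac(1e-4, np.float32))    # -> 1.25
```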
I ran the double precision plan only to establish a kind of "ground truth" to compare the other results to. I admit I did that incorrectly - I should have used a much lower eps, e.g. plan2 = finufft.Plan(1, (32, 32, 32), dtype=np.complex128, eps=1e-12, debug=1). I just retried this, but the results didn't really change.
With regard to Marco's suggested M_hint logic, I propose to merge setting "-1" into "0", and have setpts then always recompute the usf heuristic given the M sent into setpts.
@mreineck found a bug in the previous implementation of the improved heuristic for choosing the upsampling factor.
Tuning the upsampling factor (for speed while meeting accuracy) requires knowing the density, i.e. the number of nonuniform points divided by the number of modes. This ratio captures the balance between spreading/interpolation and FFT time.
The problem is that the current guru interface receives the number of NU points only at setpts, not at make_plan. I believe this is a good design choice, so I am not advocating for a change. As @ahbarnett suggested, I think adding a new option in finufft_opts is a good approach.
Proposal: add hint_M to finufft_opts (or M_hint / nj_hint; naming suggestions welcome).
Explanation
The choice of the default value is intended to give the benefit of the tweak to the simple API, without requiring changes to the implementation of each function in each language wrapper.
The guru interface's use case, in its current implementation, is to pay NUFFT initialization and FFT planning only once. This proposal maintains those semantics by default, but moves some of the cost to setpts (although it is still paid only once). It also allows for more flexibility, enabling more control for advanced users.
PS
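To make the proposed flow concrete, here is a small sketch of how such a hint could feed the upsampling-factor choice; the function name, the fallback values, and the toy decision rule are illustrative assumptions, not FINUFFT's actual logic.

```python
# Illustration of the proposed flow (not FINUFFT's code): no hint means the plan
# stage falls back to density = 1; a positive hint lets the upsampfac heuristic
# use the true point/mode ratio already at plan time, and setpts can still
# recompute it once the real M is known.
import math

def pick_upsampfac(tol, n_modes, M=None):
    density = (M / math.prod(n_modes)) if M else 1.0   # assume density 1 if unknown
    # Toy rule: high accuracy or high density favors upsampfac 2.0.
    return 2.0 if (tol < 1e-9 or density > 10.0) else 1.25

n_modes = (256, 256)
usf_plan   = pick_upsampfac(1e-6, n_modes)                # at make_plan, no hint
usf_hinted = pick_upsampfac(1e-6, n_modes, M=10_000_000)  # with hint_M supplied
print(usf_plan, usf_hinted)   # 1.25 vs 2.0 under this toy rule
```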
I am testing if copilot spots the issue. Just seeing if it can review the code in any meaningful way.
UPDATE: it does not...