
Optimizing auto params #651

Draft: wants to merge 10 commits into base: master

Conversation

DiamonDinoia
Collaborator

@DiamonDinoia DiamonDinoia commented Apr 3, 2025

@mreineck found a bug in the previous implementation of the improved heuristic for choosing the upsampling factor.

Tuning the upsampling factor (for speed while meeting accuracy) requires knowing the density, i.e. the number of non-uniform points divided by the number of uniform modes. This ratio captures the balance between spreading/interpolation time and FFT time.
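
For concreteness, here is the quantity in question as a minimal Python sketch (variable names are illustrative, not FINUFFT internals):

import numpy as np

M = 100_000                # number of non-uniform points (only known at setpts)
N = (256, 256)             # uniform mode sizes (known at make_plan)
density = M / np.prod(N)   # ~1.53; spreading work scales with M, while FFT work
                           # scales with upsampfac**d * prod(N) * log(prod(N))
print(density)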

The problem is that the current guru interface requests the number of NU points only at setpts, not at make_plan. I believe this is a good design choice, so I am not advocating for a change.

As @ahbarnett suggested, I think adding a new option in finufft_opts is a good approach.

Proposal:

hint_M:  # or M_hint / nj_hint (naming suggestions welcome)

if opts.upsampfac != 0:
    # Plan as-is.
    # The upsampling factor never changes, since it is specified by the user.
elif opts.hint_M == 0:  # default
    # Delay determination of the upsampling factor, and initialization, until setpts.
    # After that the upsampling factor is fixed; further calls to setpts do not change it.
elif opts.hint_M == -1:  # aggressive (for advanced users!)
    # Determine the optimal upsampling factor on every call to setpts.
    # Useful when setpts is called multiple times with different numbers of NU points.
    # Only pays off if the transform speedup exceeds the cost of redoing FFT planning and initialization.
elif opts.hint_M > 0:  # fixed upsampling factor at planning
    # The user knows the number of NU points at planning time, so the upsampling factor is fully determined.
    # Initialization and FFT planning happen at make_plan; setpts does not change the upsampling factor.
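
For illustration, here is how the default and the fixed-at-planning cases might look from the Python wrapper. Note that hint_M is the proposed, not yet existing, option; everything else is the standard finufft API:

import numpy as np
import finufft

M = 100000
x = np.random.uniform(0, 2*np.pi, M)
y = np.random.uniform(0, 2*np.pi, M)

# default (hint_M == 0): under this proposal, usf would be chosen
# inside the first setpts call and then frozen
plan = finufft.Plan(1, (256, 256), eps=1e-6)
plan.setpts(x, y)

# proposed (hypothetical kwarg): M known up front, so usf is fully
# determined at planning; setpts does not change it
plan = finufft.Plan(1, (256, 256), eps=1e-6, hint_M=M)
plan.setpts(x, y)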

Explanation

The default value is chosen so that the simple API gets the benefit of the tweak without requiring changes to the implementation of each function in every language wrapper.

In its current implementation, the guru interface's use case is to pay NUFFT initialization and FFT planning only once. This proposal preserves those semantics by default, although it moves some of the cost to setpts (still paid only once). It also adds flexibility, giving advanced users more control.
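
For reference, the guru lifecycle whose semantics this preserves (standard finufft Python API; under the default above, the usf choice and the associated setup would simply happen inside the first setpts call):

import numpy as np
import finufft

plan = finufft.Plan(2, (64, 64), eps=1e-6)    # make_plan
x = np.random.uniform(0, 2*np.pi, 5000)
y = np.random.uniform(0, 2*np.pi, 5000)
plan.setpts(x, y)                             # one-time cost lands here by default
f = np.random.standard_normal((64, 64)) + 1j * np.random.standard_normal((64, 64))
c = plan.execute(f)                           # repeated executes amortize the setup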

PS

I am testing whether Copilot spots the issue, just to see if it can review the code in any meaningful way.
UPDATE: it does not...

@DiamonDinoia DiamonDinoia requested a review from Copilot April 3, 2025 15:46

@mreineck
Collaborator

mreineck commented Apr 3, 2025

I can only strongly recommend merging setpts into the constructor at some point instead of introducing additional complications.
The only drawback I'm aware of is the potential cost of FFTW to fetch an already stored FFT plan from its own plan cache, which is on the order of milliseconds. This cost is incurred when running NUFFTs with the same uniform grid, but different (i.e. non-batchable) sets of nonuniform coordinates. It can be worked around by switching to ducc FFT.
Is this corner case really worth complicating life for almost all users?

Of course this is not a solution for 2.4, but rather for some future release 3.0.

@ahbarnett
Collaborator

ahbarnett commented Apr 3, 2025 via email

@DiamonDinoia
Collaborator Author

I agree that in 3.0 we could make more drastic decisions, i.e. defaulting to ducc and merging plan & setpts.

Before defaulting to ducc, I would like to take a pass at the 1D FFT to see if there is anything I can do to optimize it.

Since that might only happen far in the future: for now, what would be the most sensible thing to do? Especially if we move to upsampfac being a free parameter, as it is in ducc.

@mreineck
Collaborator

mreineck commented Apr 3, 2025

Independently of how this discussion progresses, I would be happy to look at the concrete problem parameters that demonstrated the FFTW overhead back in 2020! Do you still have the information, @ahbarnett? Or is it recorded on GitHub?

@mreineck
Collaborator

mreineck commented Apr 3, 2025

@DiamonDinoia, if you really want to have a look at the FFT code, I'm happy to answer any questions you might have!

@DiamonDinoia
Collaborator Author

> Independently of how this discussion progresses, I would be happy to look at the concrete problem parameters that demonstrated the FFTW overhead back in 2020! Do you still have the information, @ahbarnett? Or is it recorded on GitHub?

I think it is issue #27.

@mreineck
Collaborator

mreineck commented Apr 3, 2025

Perfect, thanks!

@DiamonDinoia
Collaborator Author

For now I propose to assume density = 1 (PR #652). Then we can refine the approach here in the next version.

@ahbarnett
Collaborator

ahbarnett commented Apr 3, 2025 via email

@mreineck
Collaborator

mreineck commented Apr 8, 2025

This is a bit orthogonal, but I wanted to point it out nonetheless, and this might be a good place.
When choosing a good upsampfac, we also need to take into account the precision of the underlying data type. The snippet below demonstrates that an upsampfac of 1.25 does not work for epsilon=1e-6 in single precision, while it works perfectly well in double precision:

# Script to demonstrate that upsampfac 1.25 is dangerous for
# single precision NUFFTs with small epsilon, and should therefore not be used
# as default

import numpy as np
import finufft

eps = 1e-6

x = np.random.uniform(0,2*np.pi,10000).astype(np.float32)
y = np.random.uniform(0,2*np.pi,10000).astype(np.float32)
z = np.random.uniform(0,2*np.pi,10000).astype(np.float32)

c = np.random.uniform(-0.5, 0.5, 10000).astype(np.float32) \
    + 1j * np.random.uniform(-0.5, 0.5, 10000).astype(np.float32)

# single precision plan
plan1 = finufft.Plan(1, (32, 32, 32), dtype=np.complex64, eps=eps, debug=1)
plan1.setpts(x, y, z)
f1 = plan1.execute(c)

# double precision plan
plan2 = finufft.Plan(1, (32, 32, 32), dtype=np.complex128, eps=eps, debug=1)
plan2.setpts(x.astype(np.float64), y.astype(np.float64), z.astype(np.float64))
f2 = plan2.execute(c.astype(np.complex128))

# compute L2 error between results
l2err = np.sqrt(np.sum(np.abs(f1-f2)**2)/np.sum(np.abs(f2)**2))
print("L2 error between single precision (default upsampfac) and double precision: ", l2err)

# single precision plan, upsampfac 2
plan3 = finufft.Plan(1, (32, 32, 32), dtype=np.complex64, eps=eps, debug=1, upsampfac=2)
plan3.setpts(x, y, z)
f3 = plan3.execute(c)

# compute L2 error between results
l2err = np.sqrt(np.sum(np.abs(f3-f2)**2)/np.sum(np.abs(f2)**2))
print("L2 error between single precision (upsampfac=2) and double precision: ", l2err)

Whether upsampfac=1.25 is good enough also depends on dimensionality (accuracy gets worse from 1D to 3D).
I have some heuristics for this if you are interested.

@ahbarnett
Collaborator

ahbarnett commented May 1, 2025

I agree. Upsampfac=1.25 does not let you get as close to eps_mach as 2.0 does. (Comparing to double-prec is not really fair since there are intrinsic limits due to accumulation of rounding error here; but comparing to 2.0 is fair).

Eg with 2.4.0-rc1 (tweaking params in fullmathtest.m):

CPU    prec=single    tol=1e-06, M=1000, Ntot=1000, ntrans=3
Rel l2 errs:
    1D type 1:  0.000398
    1D type 2:  0.000443
    1D type 3:  0.00033
    2D type 1:  6.91e-06
    2D type 2:  6.26e-06
    2D type 3:  3.88e-05
    3D type 1:  2.5e-06
    3D type 2:  2.47e-06
    3D type 3:  0.000398

Requesting tol=1e-4 improves the bad errors there down to the 1e-4 level.
With usf=2.0:

CPU    prec=single    tol=1e-06, M=1000, Ntot=1000, ntrans=3
Rel l2 errs:
    1D type 1:  7.68e-05
    1D type 2:  8.28e-05
    1D type 3:  9.29e-05
    2D type 1:  4.34e-06
    2D type 2:  4.22e-06
    2D type 3:  7.36e-06
    3D type 1:  1.82e-06
    3D type 2:  1.91e-06
    3D type 3:  3.89e-06

Note that usf=2.0 is limited in 1D by N1*eps_mach anyway.
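
A quick sanity check of that floor, assuming it scales like N1 * eps_mach:

import numpy as np
print(1000 * np.finfo(np.float32).eps)   # ~1.2e-4, the same order as the 1D errors above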

I propose to adjust the heuristics so that if tol is within about 2 digits of eps_mach, usf=2.0 is always enforced. This costs some speed and extra RAM, but advanced users should be exploring the usf setting themselves anyway.
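
A minimal sketch of that guard, assuming "within about 2 digits" means tol < 100 * eps_mach (names are illustrative, not the FINUFFT API):

import numpy as np

def choose_upsampfac(tol, real_dtype, heuristic_usf):
    # force usf = 2.0 whenever tol is within ~2 digits of machine epsilon
    eps_mach = np.finfo(real_dtype).eps        # ~1.19e-7 single, ~2.22e-16 double
    if tol < 100 * eps_mach:
        return 2.0
    return heuristic_usf                       # otherwise keep the density-based choice

print(choose_upsampfac(1e-6, np.float32, 1.25))   # 2.0: too tight for 1.25 in single
print(choose_upsampfac(1e-6, np.float64, 1.25))   # 1.25: fine in double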

That should cover it for now, pending better kernel design, which I would like to have time to do...

@mreineck
Collaborator

mreineck commented May 1, 2025

I ran the double precision plan only to establish a kind of "ground truth" to compare the other results to. I admit I did that incorrectly; I should have used a much lower epsilon for that purpose, like so:

plan2 = finufft.Plan(1, (32, 32, 32), dtype=np.complex128, eps=1e-12, debug=1)

I just retried this, but the results didn't really change.

@ahbarnett
Collaborator

With regard to Marco's M_hint suggested logic, I propose to merge setting "-1" into "0", and have setpts then always recompute the usf heuristic from the M passed into setpts.
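
In the terms of the proposal above, the merged logic would look roughly like this (a sketch of this comment, not committed code):

if opts.upsampfac != 0:
    # user-specified: never changes
elif opts.hint_M > 0:
    # fully determined at make_plan; setpts leaves it alone
else:  # hint_M == 0, now the only automatic mode (former -1 folded in)
    # every setpts call recomputes the usf heuristic from the M it receives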

@ahbarnett ahbarnett added this to the 2.5 milestone May 27, 2025