
Optimizing auto params #651

Draft: wants to merge 10 commits into base: master

Conversation

DiamonDinoia
Collaborator

@DiamonDinoia DiamonDinoia commented Apr 3, 2025

@mreineck found a bug in the previous implementation of the improved heuristic for choosing the upsampling factor.

Tuning the upsampling factor (for speed while meeting accuracy) requires knowing the density, i.e. the number of non-uniform points divided by the number of uniform modes. This ratio captures the balance between spreading/interpolation time and FFT time.
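
For concreteness, here is the quantity in question as a minimal Python sketch (variable names are illustrative, not FINUFFT internals):

import numpy as np

M = 100_000                # number of non-uniform points (only known at setpts)
N = (256, 256)             # uniform mode sizes (known at make_plan)
density = M / np.prod(N)   # ~1.53; spreading work scales with M, while FFT work
                           # scales with upsampfac**d * prod(N) * log(prod(N))
print(density)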

The problem is that the current guru interface requests the number of NU points only at setpts, not at make_plan. I believe this is a good design choice, so I am not advocating for a change.

As @ahbarnett suggested, I think adding a new option in finufft_opts is a good approach.

Proposal:

hint_M:  # or M_hint / nj_hint (naming suggestions welcome)

if opts.upsampfac != 0:
    # Plan as-is.
    # The upsampling factor never changes, since it is specified by the user.
elif opts.hint_M == 0:  # default
    # Delay determination of the upsampling factor, and initialization, until setpts.
    # After that the upsampling factor is fixed; further calls to setpts do not change it.
elif opts.hint_M == -1:  # aggressive (for advanced users!)
    # Determine the optimal upsampling factor on every call to setpts.
    # Useful when setpts is called multiple times with different numbers of NU points.
    # Only pays off if the transform speedup exceeds the cost of redoing FFT planning and initialization.
elif opts.hint_M > 0:  # fixed upsampling factor at planning
    # The user knows the number of NU points at planning time, so the upsampling factor is fully determined.
    # Initialization and FFT planning happen at make_plan; setpts does not change the upsampling factor.
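
For illustration, here is how the default and the fixed-at-planning cases might look from the Python wrapper. Note that hint_M is the proposed, not yet existing, option; everything else is the standard finufft API:

import numpy as np
import finufft

M = 100000
x = np.random.uniform(0, 2*np.pi, M)
y = np.random.uniform(0, 2*np.pi, M)

# default (hint_M == 0): under this proposal, usf would be chosen
# inside the first setpts call and then frozen
plan = finufft.Plan(1, (256, 256), eps=1e-6)
plan.setpts(x, y)

# proposed (hypothetical kwarg): M known up front, so usf is fully
# determined at planning; setpts does not change it
plan = finufft.Plan(1, (256, 256), eps=1e-6, hint_M=M)
plan.setpts(x, y)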

Explanation

The default value is chosen so that the simple API gets the benefit of the tweak without requiring changes to the implementation of each function in every language wrapper.

In its current implementation, the guru interface's use case is to pay NUFFT initialization and FFT planning only once. This proposal preserves those semantics by default, although it moves some of the cost to setpts (still paid only once). It also adds flexibility, giving advanced users more control.
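
For reference, the guru lifecycle whose semantics this preserves (standard finufft Python API; under the default above, the usf choice and the associated setup would simply happen inside the first setpts call):

import numpy as np
import finufft

plan = finufft.Plan(2, (64, 64), eps=1e-6)    # make_plan
x = np.random.uniform(0, 2*np.pi, 5000)
y = np.random.uniform(0, 2*np.pi, 5000)
plan.setpts(x, y)                             # one-time cost lands here by default
f = np.random.standard_normal((64, 64)) + 1j * np.random.standard_normal((64, 64))
c = plan.execute(f)                           # repeated executes amortize the setup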

PS

I am testing whether Copilot spots the issue, just to see if it can review the code in any meaningful way.
UPDATE: it does not...

@DiamonDinoia DiamonDinoia requested a review from Copilot April 3, 2025 15:46

@mreineck
Collaborator

mreineck commented Apr 3, 2025

I can only strongly recommend merging setpts into the constructor at some point instead of introducing additional complications.
The only drawback I'm aware of is the potential cost of FFTW to fetch an already stored FFT plan from its own plan cache, which is on the order of milliseconds. This cost is incurred when running NUFFTs with the same uniform grid, but different (i.e. non-batchable) sets of nonuniform coordinates. It can be worked around by switching to ducc FFT.
Is this corner case really worth complicating life for almost all users?

Of course this is not a solution for 2.4, but rather for some future release 3.0.

@ahbarnett
Collaborator

ahbarnett commented Apr 3, 2025 via email

@DiamonDinoia
Collaborator Author

I agree that in 3.0 we could make more drastic decisions, i.e. defaulting to ducc and merging plan & setpts.

Before defaulting to ducc, I would like to take a pass at the 1D FFT to see if there is anything I can do to optimize it.

Since that might only happen far in the future: for now, what would be the most sensible thing to do? Especially if we move to upsampfac being a free parameter, as it is in ducc.

@mreineck
Collaborator

mreineck commented Apr 3, 2025

Independently of how this discussion progresses, I would be happy to look at the concrete problem parameters that demonstrated the FFTW overhead back in 2020! Do you still have the information, @ahbarnett? Or is it recorded on GitHub?

@mreineck
Collaborator

mreineck commented Apr 3, 2025

@DiamonDinoia, if you really want to have a look at the FFT code, I'm happy to answer any questions you might have!

@DiamonDinoia
Collaborator Author

> Independently of how this discussion progresses, I would be happy to look at the concrete problem parameters that demonstrated the FFTW overhead back in 2020! Do you still have the information, @ahbarnett? Or is it recorded on GitHub?

I think it is issue #27.

@mreineck
Collaborator

mreineck commented Apr 3, 2025

Perfect, thanks!

@DiamonDinoia
Collaborator Author

For now I propose to assume density = 1 (PR #652). Then we can refine the approach here in the next version.

@ahbarnett
Collaborator

ahbarnett commented Apr 3, 2025 via email

@mreineck
Collaborator

mreineck commented Apr 8, 2025

This is a bit orthogonal, but I wanted to point it out nonetheless, and this might be a good place.
When choosing a good upsampfac, we also need to take into account the precision of the underlying data type. The snippet below demonstrates that an upsampfac of 1.25 does not work for epsilon=1e-6 in single precision, while it works perfectly well in double precision:

# Script to demonstrate that upsampfac 1.25 is dangerous for
# single precision NUFFTs with small epsilon, and should therefore not be used
# as default

import numpy as np
import finufft

eps = 1e-6

x = np.random.uniform(0,2*np.pi,10000).astype(np.float32)
y = np.random.uniform(0,2*np.pi,10000).astype(np.float32)
z = np.random.uniform(0,2*np.pi,10000).astype(np.float32)

c = np.random.uniform(-0.5, 0.5, 10000).astype(np.float32) \
    + 1j * np.random.uniform(-0.5, 0.5, 10000).astype(np.float32)

# single precision plan
plan1 = finufft.Plan(1, (32, 32, 32), dtype=np.complex64, eps=eps, debug=1)
plan1.setpts(x, y, z)
f1 = plan1.execute(c)

# double precision plan
plan2 = finufft.Plan(1, (32, 32, 32), dtype=np.complex128, eps=eps, debug=1)
plan2.setpts(x.astype(np.float64), y.astype(np.float64), z.astype(np.float64))
f2 = plan2.execute(c.astype(np.complex128))

# compute L2 error between results
l2err = np.sqrt(np.sum(np.abs(f1-f2)**2)/np.sum(np.abs(f2)**2))
print("L2 error between single precision (default upsampfac) and double precision: ", l2err)

# single precision plan, upsampfac 2
plan3 = finufft.Plan(1, (32, 32, 32), dtype=np.complex64, eps=eps, debug=1, upsampfac=2)
plan3.setpts(x, y, z)
f3 = plan3.execute(c)

# compute L2 error between results
l2err = np.sqrt(np.sum(np.abs(f3-f2)**2)/np.sum(np.abs(f2)**2))
print("L2 error between single precision (upsampfac=2) and double precision: ", l2err)

Whether upsampfac=1.25 is good enough also depends on dimensionality (accuracy gets worse from 1D to 3D).
I have some heuristics for this if you are interested.

@ahbarnett
Collaborator

ahbarnett commented May 1, 2025

I agree. Upsampfac=1.25 does not let you get as close to eps_mach as 2.0 does. (Comparing to double-prec is not really fair since there are intrinsic limits due to accumulation of rounding error here; but comparing to 2.0 is fair).

Eg with 2.4.0-rc1 (tweaking params in fullmathtest.m):

CPU    prec=single    tol=1e-06, M=1000, Ntot=1000, ntrans=3
Rel l2 errs:
    1D type 1:  0.000398
    1D type 2:  0.000443
    1D type 3:  0.00033
    2D type 1:  6.91e-06
    2D type 2:  6.26e-06
    2D type 3:  3.88e-05
    3D type 1:  2.5e-06
    3D type 2:  2.47e-06
    3D type 3:  0.000398

Requesting tol=1e-4 improves the bad errors there down to the 1e-4 level.
With usf=2.0:

CPU    prec=single    tol=1e-06, M=1000, Ntot=1000, ntrans=3
Rel l2 errs:
    1D type 1:  7.68e-05
    1D type 2:  8.28e-05
    1D type 3:  9.29e-05
    2D type 1:  4.34e-06
    2D type 2:  4.22e-06
    2D type 3:  7.36e-06
    3D type 1:  1.82e-06
    3D type 2:  1.91e-06
    3D type 3:  3.89e-06

Note that usf=2.0 is limited in 1D by N1*eps_mach anyway.
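
A quick sanity check of that floor, assuming it scales like N1 * eps_mach:

import numpy as np
print(1000 * np.finfo(np.float32).eps)   # ~1.2e-4, the same order as the 1D errors above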

I propose to adjust the heuristics so that if tol is within about 2 digits of eps_mach, usf=2.0 is always enforced. This costs some speed and extra RAM, but advanced users should be exploring the usf setting themselves anyway.
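
A minimal sketch of that guard, assuming "within about 2 digits" means tol < 100 * eps_mach (names are illustrative, not the FINUFFT API):

import numpy as np

def choose_upsampfac(tol, real_dtype, heuristic_usf):
    # force usf = 2.0 whenever tol is within ~2 digits of machine epsilon
    eps_mach = np.finfo(real_dtype).eps        # ~1.19e-7 single, ~2.22e-16 double
    if tol < 100 * eps_mach:
        return 2.0
    return heuristic_usf                       # otherwise keep the density-based choice

print(choose_upsampfac(1e-6, np.float32, 1.25))   # 2.0: too tight for 1.25 in single
print(choose_upsampfac(1e-6, np.float64, 1.25))   # 1.25: fine in double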

That should cover it for now, pending better kernel design, which I would like to have time to do...

@mreineck
Collaborator

mreineck commented May 1, 2025

I ran the double precision plan only to establish a kind of "ground truth" to compare the other results to. I admit I did that incorrectly; I should have used a much lower epsilon for that purpose, like so:

plan2 = finufft.Plan(1, (32, 32, 32), dtype=np.complex128, eps=1e-12, debug=1)

I just retried this, but the results didn't really change.

@ahbarnett
Collaborator

With regard to Marco's M_hint suggested logic, I propose to merge setting "-1" into "0", and have setpts then always recompute the usf heuristic from the M passed into setpts.
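
In the terms of the proposal above, the merged logic would look roughly like this (a sketch of this comment, not committed code):

if opts.upsampfac != 0:
    # user-specified: never changes
elif opts.hint_M > 0:
    # fully determined at make_plan; setpts leaves it alone
else:  # hint_M == 0, now the only automatic mode (former -1 folded in)
    # every setpts call recomputes the usf heuristic from the M it receives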

@ahbarnett ahbarnett added this to the 2.5 milestone May 27, 2025