

ProfFan (Collaborator) commented Nov 1, 2025

#92 and the GCC 15 bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118634

On my AMD Ryzen 9 9950X3D:

numberOfProblems = 1000000
problemSize = 4
With 1 threads:
Without memory allocation, grain size = 1, time = 0.206183
Without memory allocation, grain size = 10, time = 0.170344
Without memory allocation, grain size = 100, time = 0.156581
Without memory allocation, grain size = 1000, time = 0.15655
With memory allocation, grain size = 1, time = 0.199588
With memory allocation, grain size = 10, time = 0.199542
With memory allocation, grain size = 100, time = 0.199536
With memory allocation, grain size = 1000, time = 0.199497

With 4 threads:
Without memory allocation, grain size = 1, time = 0.0410819
Without memory allocation, grain size = 10, time = 0.0399602
Without memory allocation, grain size = 100, time = 0.0404226
Without memory allocation, grain size = 1000, time = 0.0400035
With memory allocation, grain size = 1, time = 0.051057
With memory allocation, grain size = 10, time = 0.051184
With memory allocation, grain size = 100, time = 0.0512407
With memory allocation, grain size = 1000, time = 0.0512729

With 8 threads:
Without memory allocation, grain size = 1, time = 0.0251816
Without memory allocation, grain size = 10, time = 0.0230612
Without memory allocation, grain size = 100, time = 0.0242118
Without memory allocation, grain size = 1000, time = 0.0247101
With memory allocation, grain size = 1, time = 0.0273827
With memory allocation, grain size = 10, time = 0.0301596
With memory allocation, grain size = 100, time = 0.0287985
With memory allocation, grain size = 1000, time = 0.0293065
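To make the two knobs in this output concrete, here is a minimal sketch of a grain-size-chunked parallel loop, with and without a heap allocation per problem. It is not the benchmark's code: the function name `solveAll`, the toy matrix-vector workload, and the use of `std::thread` with an atomic work counter are all assumptions for illustration (a library such as TBB would typically take the grain size through a blocked range instead).

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: solve `numberOfProblems` independent toy problems of
// size `problemSize` on `numThreads` threads. Workers claim `grainSize`
// consecutive problems at a time from a shared atomic counter.
// If `allocate` is true, every problem gets a fresh heap buffer
// ("with memory allocation"); otherwise each thread reuses one scratch buffer.
double solveAll(std::size_t numberOfProblems, std::size_t problemSize,
                unsigned numThreads, std::size_t grainSize, bool allocate) {
  assert(grainSize > 0 && numThreads > 0);
  std::atomic<std::size_t> next{0};
  std::vector<double> results(numberOfProblems, 0.0);

  auto worker = [&]() {
    std::vector<double> scratch(problemSize);  // per-thread, allocated once
    for (;;) {
      // Claim the next chunk of `grainSize` problems.
      std::size_t begin = next.fetch_add(grainSize);
      if (begin >= numberOfProblems) break;
      std::size_t end = std::min(begin + grainSize, numberOfProblems);
      for (std::size_t i = begin; i < end; ++i) {
        std::vector<double> local;
        if (allocate) local.resize(problemSize);  // heap allocation per problem
        std::vector<double>& buf = allocate ? local : scratch;
        // Toy workload: y = A x with A(r, c) = r + c + i and x = all ones.
        double sum = 0.0;
        for (std::size_t r = 0; r < problemSize; ++r) {
          buf[r] = 0.0;
          for (std::size_t c = 0; c < problemSize; ++c)
            buf[r] += static_cast<double>(r + c + i);
          sum += buf[r];
        }
        results[i] = sum;  // each index is written by exactly one thread
      }
    }
  };

  std::vector<std::thread> pool;
  for (unsigned t = 0; t < numThreads; ++t) pool.emplace_back(worker);
  for (auto& th : pool) th.join();

  double total = 0.0;
  for (double v : results) total += v;  // sequential sum: deterministic result
  return total;
}
```

A smaller grain size balances load more finely but pays the counter and scheduling overhead more often; the allocating variant adds one heap allocation per problem, which is the cost the "with memory allocation" rows measure.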

Summary of results (speedup relative to the 1-thread run with the same allocation mode and grain size):
4 threads, without allocation, grain size = 1, speedup = 5.01883
4 threads, without allocation, grain size = 10, speedup = 4.26285
4 threads, without allocation, grain size = 100, speedup = 3.87361
4 threads, without allocation, grain size = 1000, speedup = 3.91341
4 threads, with allocation, grain size = 1, speedup = 3.90913
4 threads, with allocation, grain size = 10, speedup = 3.89853
4 threads, with allocation, grain size = 100, speedup = 3.8941
4 threads, with allocation, grain size = 1000, speedup = 3.89088
8 threads, without allocation, grain size = 1, speedup = 8.18785
8 threads, without allocation, grain size = 10, speedup = 7.38661
8 threads, without allocation, grain size = 100, speedup = 6.46715
8 threads, without allocation, grain size = 1000, speedup = 6.33548
8 threads, with allocation, grain size = 1, speedup = 7.28884
8 threads, with allocation, grain size = 10, speedup = 6.61622
8 threads, with allocation, grain size = 100, speedup = 6.9287
8 threads, with allocation, grain size = 1000, speedup = 6.80726
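The speedups above can be reproduced from the raw times as (1-thread time) / (n-thread time) for the same allocation mode and grain size. For example, 0.206183 / 0.0410819 ≈ 5.019 for 4 threads at grain size 1. This formula is inferred from the numbers rather than taken from the benchmark source, and the helper names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Speedup of an n-thread run over the 1-thread run with the same
// allocation mode and grain size (formula inferred from the output above).
double speedup(double singleThreadTime, double multiThreadTime) {
  assert(multiThreadTime > 0.0);
  return singleThreadTime / multiThreadTime;
}

// Parallel efficiency: speedup per thread (1.0 means ideal linear scaling).
double efficiency(double speedupValue, unsigned numThreads) {
  return speedupValue / static_cast<double>(numThreads);
}

// True if a recomputed value agrees with a reported one to within `tol`.
bool matchesReported(double computed, double reported, double tol = 1e-4) {
  return std::fabs(computed - reported) <= tol;
}
```

Because the baseline is per grain size, the grain-size-1 rows show efficiency above 1.0 (5.019 / 4 ≈ 1.25). This is likely because the 1-thread run at grain size 1 (0.206 s) is itself slower than at grain size 1000 (0.157 s), rather than true superlinear scaling.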

dellaert (Member) left a comment


Awesome!

ProfFan merged commit 663bdb6 into develop Nov 4, 2025
34 checks passed