Improve algorithm selection in fmpz_mat_rref #2144

Merged
merged 2 commits into from
Jan 10, 2025

fredrik-johansson
Collaborator

Should fix #2129.

Based on the following profiling data with randtest matrices. In each row, the initial number is the number of rows r. For c = 1, 2, ... columns we print f when fmpz_mat_fflu is faster and M when fmpz_mat_rref_mul is faster. The number on the right is the ratio c / r for the largest c where M wins (0.000 if M never wins).

We see that M wins essentially exactly when c <= r for r up to about 20, and in an extended triangular region for r <= 100. Presumably fflu still wins for r > 100 at ratios much larger than 2, but this doesn't seem particularly worth optimizing for (and the profiles take a long time to run).

bits = 10

   1 fff       0.000
   2 ffffff       0.000
   3 fffffff       0.000
   4 ffffffff       0.000
   5 fffffffff       0.000
   6 fffffMffff       1.000
   7 ffffffMffff       1.000
   8 fffffMMMffff       1.000
   9 fffffMMMMffff       1.000
  10 fffffMMMMMffff       1.000
  20 ffffMMMMMMMMMMMMMMMMffff       1.000
  30 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMffff       1.033
  40 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.025
  50 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMff       1.040
  60 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.117
  70 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMffM       1.200
  80 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM       1.262
  90 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfM       1.267
 100 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfMfMffff       1.880
 110 fffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfMMMfMMfMMMMfMfMfMMffMfMfMff       2.355
 120 ffffMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfMMMMMMMMMMMMfMMMMMMMMMMMMMMMMMfMMMMMfMMMfMMMMMMfMffMMfMMfMfffMffff       2.525

bits = 100

   1 fff       0.000
   2 fffff       0.000
   3 ffffff       0.000
   4 ffMMfff       1.000
   5 ffMMMfff       1.000
   6 fMMMMMfff       1.000
   7 ffMMMMMfff       1.000
   8 fMMMMMMMfff       1.000
   9 fMMMMMMMMfff       1.000
  10 fMMMMMMMMMfff       1.000
  20 fMMMMMMMMMMMMMMMMMMMfff       1.000
  30 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.033
  40 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.025
  50 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.080
  60 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMff       1.233
  70 MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfMfff       1.529
  80 MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfMMMMMMMMffMMMfff       1.712
  90 MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.856
 100 MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfMMMMMMMfff       2.210

bits = 1000

   1 fff       0.000
   2 fffff       0.000
   3 ffMMMMMMMMM       3.667
   4 fMMMfff       1.000
   5 fMMMMfff       1.000
   6 fMMMMMfff       1.000
   7 fMMMMMMfff       1.000
   8 fMMMMMMMfff       1.000
   9 fMMMMMMMMfff       1.000
  10 fMMMMMMMMMfff       1.000
  20 fMMMMMMMMMMMMMMMMMMMfff       1.000
  30 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.033
  40 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.150
  50 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.260
  60 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.300
  70 fMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMfff       1.314
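
For anyone re-running the profiles, the right-hand ratio column can be recomputed from each f/M string. The following is a minimal sketch, not code from this PR; the helper name `largest_m_ratio` is hypothetical, and it assumes character i of the string corresponds to c = i + 1, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of FLINT): given the row count r and a
   profile string of 'f'/'M' characters, where character i describes
   c = i + 1 columns, return c / r for the largest c marked 'M'.
   Returns 0.0 when 'M' never appears. */
static double largest_m_ratio(long r, const char *row)
{
    long largest_c = 0;
    size_t n = strlen(row);

    assert(r > 0);

    for (size_t i = 0; i < n; i++)
    {
        if (row[i] == 'M')
            largest_c = (long) i + 1;   /* keep the last column where M wins */
    }

    return (double) largest_c / (double) r;
}
```

For example, the r = 6 row at bits = 10 (`fffffMffff`) has its last M at c = 6, giving 6 / 6 = 1.000, and the r = 3 row at bits = 1000 (`ffMMMMMMMMM`) has its last M at c = 11, giving 11 / 3 = 3.667, matching the printed values.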

@fredrik-johansson
Collaborator Author

Old and new timings for the 10000-bit random matrices in the original issue:

r = 1 c = 1        old: 0.000002 seconds           new: 0.000001 seconds
r = 2 c = 2        old: 0.000010 seconds           new: 0.000010 seconds
r = 3 c = 3        old: 0.000089 seconds           new: 0.000087 seconds
r = 4 c = 4        old: 0.000376 seconds           new: 0.000207 seconds
r = 5 c = 5        old: 0.001129 seconds           new: 0.000007 seconds
r = 6 c = 6        old: 0.002702 seconds           new: 0.000009 seconds
r = 7 c = 7        old: 0.005647 seconds           new: 0.000011 seconds
r = 8 c = 8        old: 0.010621 seconds           new: 0.000023 seconds
r = 9 c = 9        old: 0.017647 seconds           new: 0.000022 seconds
r = 10 c = 10        old: 0.027957 seconds           new: 0.000024 seconds
r = 11 c = 11        old: 0.042708 seconds           new: 0.000090 seconds
r = 12 c = 12        old: 0.060910 seconds           new: 0.000034 seconds
r = 13 c = 13        old: 0.086553 seconds           new: 0.000042 seconds
r = 14 c = 14        old: 0.118287 seconds           new: 0.000039 seconds
r = 15 c = 15        old: 0.159068 seconds           new: 0.000055 seconds
r = 16 c = 16        old: 0.209279 seconds           new: 0.000080 seconds
r = 17 c = 17        old: 0.269945 seconds           new: 0.000074 seconds
r = 18 c = 18        old: 0.343680 seconds           new: 0.000103 seconds
r = 19 c = 19        old: 0.431984 seconds           new: 0.000114 seconds
r = 20 c = 20        old: 0.536749 seconds           new: 0.000139 seconds
r = 21 c = 21        old: 0.000057 seconds           new: 0.000059 seconds
r = 22 c = 22        old: 0.000065 seconds           new: 0.000065 seconds
r = 23 c = 23        old: 0.000071 seconds           new: 0.000071 seconds
r = 24 c = 24        old: 0.000077 seconds           new: 0.000076 seconds
r = 25 c = 25        old: 0.000083 seconds           new: 0.000083 seconds
r = 26 c = 26        old: 0.000091 seconds           new: 0.000089 seconds
r = 27 c = 27        old: 0.000097 seconds           new: 0.000098 seconds
r = 28 c = 28        old: 0.000103 seconds           new: 0.000103 seconds
r = 29 c = 29        old: 0.000112 seconds           new: 0.000111 seconds
r = 30 c = 30        old: 0.000119 seconds           new: 0.000118 seconds
r = 31 c = 31        old: 0.000140 seconds           new: 0.000128 seconds
r = 32 c = 32        old: 0.000145 seconds           new: 0.000135 seconds
r = 33 c = 33        old: 0.000144 seconds           new: 0.000143 seconds
r = 34 c = 34        old: 0.000197 seconds           new: 0.000175 seconds
r = 35 c = 35        old: 0.000162 seconds           new: 0.000161 seconds
r = 36 c = 36        old: 0.000171 seconds           new: 0.000169 seconds
r = 37 c = 37        old: 0.000174 seconds           new: 0.000174 seconds
r = 38 c = 38        old: 0.000191 seconds           new: 0.000190 seconds
r = 39 c = 39        old: 0.000194 seconds           new: 0.000193 seconds
r = 40 c = 40        old: 0.000204 seconds           new: 0.000208 seconds
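
To read the table as speedup factors, divide the old time by the new time. A minimal sketch follows; the function name `speedup` is only illustrative, and the inputs are assumed to be wall times in seconds as printed above.

```c
#include <assert.h>

/* Illustrative helper: speedup factor of the new code over the old code.
   Values above 1 mean the new code is faster; values near 1 mean no change. */
static double speedup(double old_seconds, double new_seconds)
{
    assert(new_seconds > 0.0);
    return old_seconds / new_seconds;
}
```

With the r = c = 20 row, `speedup(0.536749, 0.000139)` is a factor in the thousands, while the r = c = 21 row, `speedup(0.000057, 0.000059)`, comes out just under 1, so the timings there are unchanged up to measurement noise.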

@albinahlback
Collaborator

Very nice! Make FLINT fast again!

@fredrik-johansson merged commit a249a44 into flintlib:main on Jan 10, 2025
12 checks passed
@fredrik-johansson deleted the rref branch on January 25, 2025, 11:14

Successfully merging this pull request may close these issues.

Computing echelon form of smaller matrix takes longer than larger matrix?