@lzsaver commented Nov 17, 2025

Motivation and Context

#17563 speeds up system boot by running only the 256K benchmark during boot and deferring the other block sizes until the results are requested. However, for some reason, the 256K benchmark does not always run during boot. This patch forces the 256K benchmark to run on demand as well, which should resolve #17945.

Description

One of two things must be true: either the 256K benchmark is not run when the system boots, or its results are not taken into account afterwards. If the 256K benchmark is forced to run on demand, the data is displayed correctly.

It may not quite fit the original design, but it seems to solve the problem. @mcmilk probably has more thoughts on the topic.

However, there is an objection to the current design. If the 256K result obtained during boot is meant to be reused on demand, the values become inconsistent when the CPU frequency governor changes between boot and the on-demand run.

# rmmod zfs spl; cpupower frequency-set -g performance; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic              1271    1546    1620    1574    1572    1607    1404    1553
skein-generic               537     597     612     607     564     611     595     602
sha256-generic              162     177     179     177     180     180     178     179
sha256-x64                  267     300     306     303     307     306     304     305
sha256-ssse3                326     368     378     383     383     382     381     378
sha512-generic              267     329     339     342     343     339     336     340
sha512-x64                  400     458     474     459     481     480     477     477
blake3-generic              347     379     382     379     377     366     315     357
blake3-sse2                 458    1298    1397    1414    1414    1374    1131    1231
blake3-sse41                459    1463    1590    1631    1612    1598    1544    1602
# rmmod zfs spl; cpupower frequency-set -g powersave; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic               274     339     363     315     360     354     356     349
skein-generic               119     122     130     115     104     134     135     133
sha256-generic               34      39      36      39      36      37      39      39
sha256-x64                   60      60      66      67      64      64      67      66
sha256-ssse3                 59      80      83      84      83      83      84      79
sha512-generic               59      63      70      73      71      73      55      57
sha512-x64                   57     100      99      99     104      90     102     103
blake3-generic               76      82      82      79      75      76      73      82
blake3-sse2                  93     280     308     311     310     306     308     308
blake3-sse41                106     321     360     372     373     365     366     362

We probably do not want the 256K values measured under the performance governor (upper table) to end up in the powersave results (lower table).

An even more aggressive test would pin the CPU frequency to its maximum and then to its minimum:

# MAX=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)
# MIN=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq)
# rmmod zfs spl; cpupower frequency-set -d "${MAX}" -u "${MAX}"; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench
# rmmod zfs spl; cpupower frequency-set -d "${MIN}" -u "${MIN}"; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench

However, this is only an intuitive line of reasoning. It has not been tested yet because of the issue, which is probably more important.

How Has This Been Tested?

The patch was tested on top of three current branches. Where the 256K benchmark results had been all zeros, they display correctly after applying the patch. However, the scenario in which the 256K benchmark still runs during system boot and then runs again on demand was not tested, so this PR may not address the root cause of the problem. In any case, there is hope that this can be fixed without a major rewrite of the code.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@mcmilk (Contributor) left a comment

This will fix the cosmetic issue, but we will need to dig a bit deeper into why the bs256k variable is zero ;-)

@amotin (Member) left a comment

Generally I find it OK. I don't think the saving here is worth it if we may lose consistency. But before we merge this, I think it would be good to understand the originally reported problem, so that we do not hide it deeper and end up with a sub-optimal implementation selection.

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Nov 18, 2025
ZFS-CI-Type: linux
Signed-off-by: Alexx Saver <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Fix openzfs#17945
@mcmilk (Contributor) left a comment

I think we should rework chksum_init() and maybe other functions.
I will try to fix this as well.

@adamdmoss (Contributor) commented Nov 20, 2025

I think zfs_chksum.c has gotten out of control: I spent too long just grokking it (including the new state machine, yay), let alone finding the underlying bug. 😁

So, I think the underlying bug is that the optimization in chksum_benchit() that skips the 256K test assumes the caller passes in the same chksum_stat_t* as before, with cs->bs256k already populated. But we can see from chksum_benchmark() that the chksum_stat_data array is reallocated and re-zeroed on every chksum_benchmark() invocation, i.e. both at boot and (the first time) on demand. Whew.

I think the real fix is to wrap this code:

        /* count implementations */
        chksum_stat_cnt = 1;  /* edonr */
        chksum_stat_cnt += 1; /* skein */
        chksum_stat_cnt += sha256->getcnt();
        chksum_stat_cnt += sha512->getcnt();
        chksum_stat_cnt += blake3->getcnt();
        chksum_stat_data = kmem_zalloc(
            sizeof (chksum_stat_t) * chksum_stat_cnt, KM_SLEEP);

... in if (chksum_stat_limit == AT_STARTUP), so that the stat data is allocated and cleared exactly once, on the first run.

... but there is a lot of complication and code smell here just to optimize a function that will normally only be called once ever. IMVHO. (If it were up to me, I'd consider reverting the optimization.)

@lzsaver (Author) commented Nov 21, 2025

Let us first get the code into a working state, worthy of the 2.4 release. No surprises. No experiments.

@mcmilk, after that you can rework it according to a new design.

@adamdmoss, thanks. I will take a look.

@lzsaver lzsaver marked this pull request as draft November 21, 2025 21:00
@github-actions github-actions bot added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Nov 21, 2025
ZFS-CI-Type: linux
Signed-off-by: Alexx Saver <[email protected]>
Co-authored-by: Adam Moss <[email protected]>
@lzsaver lzsaver force-pushed the patch-2 branch 3 times, most recently from ca2011b to 950eb73 Compare November 22, 2025 01:50
@lzsaver (Author) commented Nov 22, 2025

Well, now it works.

# MIN=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq)
# cpupower frequency-set -d "${MIN}" -u "${MIN}"
# sleep 1m
# cat /proc/spl/kstat/zfs/chksum_bench

implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic               157     245     255     359    1634     339     356     357
skein-generic               118     131     132     135     618     132     136     131
sha256-generic               36      39      35      38     180      39      37      37
sha256-x64                   35      36      40      62     307      51      52      58
sha256-ssse3                 69      79      81      78     384      74      79      76
sha512-generic               47      60      47      50     342      55      63      70
sha512-x64                   59      71      74      82     479      86      90     100
blake3-generic               57      58      66      82     380      77      82      81
blake3-sse2                  97     288     301     313    1417     307     305     307
blake3-sse41                107     329     360     371    1679     367     363     365

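We may leave it as it is for now, that is, without taking the governor into account: the 256K column above still holds the values measured at boot, several times higher than the other columns measured at the minimum frequency.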
After the PR is accepted, we need to finalize the concept and simplify all this code.
At the moment, it even looks like we need to add locks here. Let us try to avoid that.


#define AT_STARTUP 0
#define AT_BENCHMARK 1
#define AT_DONE 2
Contributor

We can remove the AT_DONE state if each read of the benchmark file should produce new statistics.

Labels

Status: Work in Progress Not yet ready for general review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Faster checksum benchmark on system boot

4 participants