Bug Report: 7 out of 14 tests produce incorrect results (verified with os.urandom)
Summary
Multiple tests in nistrng v1.2.3 produce incorrect p-values for all inputs, including cryptographically secure random data (os.urandom). This was discovered by running the test suite against 5 different inputs as a cross-validation:
| Input | Expected | nistrng Result | Correct Result |
|---|---|---|---|
| os.urandom() (CSPRNG) | ~14/14 PASS | 7/14 | 14/14 PASS |
| AES-256 encrypted data | ~14/14 PASS | 7/14 | 14/14 PASS |
| Unencrypted JPEG | ~0/14 PASS | 0/14 | 0/14 PASS |
The 7 broken tests fail identically regardless of input data, producing the same wrong p-values for true random data as for encrypted data.
Environment
- nistrng version: 1.2.3
- Python: 3.14 (Windows)
- numpy: 2.x
- scipy: 1.17.1
- Test method: 10 samples of 1,000,000 bits each
Bug 1: Approximate Entropy — min and max swapped (CRITICAL)
File: sp800_22r1a/test_approximate_entropy.py, line 52
# Current (WRONG):
blocks_length: int = min(2, max(3, int(math.floor(math.log(bits.size, 2))) - 6))
# This ALWAYS returns 2, because:
# max(3, anything) >= 3
# min(2, 3+) = 2
For 1,000,000 bits: log2(1000000) = 19.9 -> 19 - 6 = 13 -> max(3, 13) = 13 -> min(2, 13) = 2
The test always runs with m=2 instead of the correct m=13. With only 4 possible 2-bit patterns, the test is meaningless for large inputs.
Suggested fix:
blocks_length: int = max(2, min(int(math.floor(math.log(bits.size, 2))) - 6, 13))
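The arithmetic above can be checked without installing the package. The following is a minimal sketch that copies the two expressions from this report into standalone functions; the function names are made up for illustration and are not part of nistrng.

```python
import math

def current_block_length(n_bits: int) -> int:
    """Expression quoted in this report, with min and max swapped."""
    return min(2, max(3, int(math.floor(math.log(n_bits, 2))) - 6))

def suggested_block_length(n_bits: int) -> int:
    """Suggested fix: clamp floor(log2(n)) - 6 to the range [2, 13]."""
    return max(2, min(int(math.floor(math.log(n_bits, 2))) - 6, 13))

for n_bits in (100, 1_000, 1_000_000):
    # Prints: 100 2 2, then 1000 2 3, then 1000000 2 13
    print(n_bits, current_block_length(n_bits), suggested_block_length(n_bits))
```

The current expression yields 2 for every input length, while the suggested one grows with the sequence length up to the cap of 13.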
Bug 2: Approximate Entropy — incorrect log divisor
File: sp800_22r1a/test_approximate_entropy.py, line 74
# Current (WRONG):
phi_m.append(numpy.sum(c_i[c_i > 0.0] * numpy.log((c_i[c_i > 0.0] / 10.0))))
# Should be (per NIST SP 800-22):
phi_m.append(numpy.sum(c_i[c_i > 0.0] * numpy.log(c_i[c_i > 0.0])))
The division by 10.0 has no basis in the NIST specification and corrupts the Phi-m statistic.
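One point worth checking before treating this as a root cause: if c_i holds relative frequencies that sum to 1 for both m and m+1, the divisor shifts every Phi value by the same constant ln(10), and that constant cancels in ApEn = Phi(m) - Phi(m+1). The sketch below illustrates this with a standalone reimplementation. It assumes c_i is normalized that way, which has not been verified against the nistrng source, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter

def pattern_frequencies(bits, m):
    """Relative frequencies of overlapping m-bit patterns (with wrap-around)."""
    n = len(bits)
    extended = bits + bits[:m - 1]
    counts = Counter(tuple(extended[i:i + m]) for i in range(n))
    return [count / n for count in counts.values()]

def phi(freqs, divisor=1.0):
    """Sum of c * ln(c / divisor); divisor=1.0 matches the NIST definition."""
    return sum(c * math.log(c / divisor) for c in freqs if c > 0.0)

rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(4096)]
m = 3
freqs_m = pattern_frequencies(bits, m)
freqs_m1 = pattern_frequencies(bits, m + 1)

phi_shift = phi(freqs_m) - phi(freqs_m, 10.0)            # equals ln(10)
apen_spec = phi(freqs_m) - phi(freqs_m1)                  # per specification
apen_div10 = phi(freqs_m, 10.0) - phi(freqs_m1, 10.0)     # with the extra divisor
print(phi_shift, apen_spec, apen_div10)
```

Under that assumption the divisor is still wrong for each individual Phi value, but the wrong p-values for this test would then come mainly from the block length issue in Bug 1.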
Bug 3: Serial Test — hardcoded pattern length
File: sp800_22r1a/test_serial.py, line 38
self._pattern_length: int = 4 # Hardcoded!
Per NIST SP 800-22, m should be chosen such that m < floor(log2(n)) - 2. For 1,000,000 bits, floor(log2(n)) = 19, so m must be below 17, i.e. at most 16. With m=4, the test only examines 16 possible patterns instead of up to 65,536, making it far too coarse to detect non-randomness.
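The bound can be computed from the input length instead of being hardcoded. This is a minimal sketch of that calculation; the function name is hypothetical and this is not a proposed patch for test_serial.py.

```python
def max_serial_pattern_length(n_bits: int) -> int:
    """Largest integer m that satisfies m < floor(log2(n)) - 2."""
    floor_log2 = n_bits.bit_length() - 1  # exact floor(log2(n)) for n >= 1
    return floor_log2 - 3                 # may be <= 0 for very short inputs

m_max = max_serial_pattern_length(1_000_000)
# Prints: 16 65536 16
print(m_max, 2 ** m_max, 2 ** 4)
```

Using int.bit_length() avoids floating-point rounding in the logarithm when n is an exact power of two.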
Bug 4: Cumulative Sums — int8 overflow
File: sp800_22r1a/test_cumulative_sums.py, lines 44-56
bits_copy: numpy.ndarray = bits.copy() # bits is int8 (-128..127)
bits_copy[bits_copy == 0] = -1
# ...
forward_sum += bits_copy[i] # Wraps around once the sum leaves the int8 range!
The cumulative sum of +1/-1 values stored as int8 wraps around as soon as the running sum leaves the int8 range (-128..127). That takes at least 128 additions, and over 1,000,000 bits it is practically certain to happen, so the forward_max/backward_max values come out completely wrong. The test always returns p = 1.0, which is a clear indicator of the bug.
Suggested fix: Convert to int32 or int64 before computation:
bits_copy = bits.copy().astype(numpy.int64)
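The effect of the wrap-around can be reproduced with the standard library alone. In this sketch, ctypes.c_int8 stands in for a numpy.int8 accumulator, on the assumption that both wrap in the same two's-complement way; the helper name is made up for illustration.

```python
import ctypes
import random

def running_extremes(steps, wrap_int8: bool):
    """Return (final_sum, max_abs_sum) for a walk of +1/-1 steps."""
    total = 0
    max_abs = 0
    for step in steps:
        total += step
        if wrap_int8:
            total = ctypes.c_int8(total).value  # emulate an 8-bit accumulator
        max_abs = max(max_abs, abs(total))
    return total, max_abs

monotone = [1] * 200
print(running_extremes(monotone, wrap_int8=True))   # (-56, 128)
print(running_extremes(monotone, wrap_int8=False))  # (200, 200)

rng = random.Random(0)
walk = [rng.choice((-1, 1)) for _ in range(1_000_000)]
_, wrapped_max = running_extremes(walk, wrap_int8=True)
_, true_max = running_extremes(walk, wrap_int8=False)
print(wrapped_max, true_max)
```

With an 8-bit accumulator the observed maximum can never exceed 128, whereas the true maximum excursion of a 1,000,000-step random walk is typically on the order of 1,000.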
Bug 5: Maurer's Universal — suspiciously constant p-values
Maurer's Universal test returns p-values of approximately 0.00978 for all inputs, including os.urandom(). Sample p-values from the 10 runs on os.urandom() data:
0.00980, 0.00996, 0.00976, 0.00978, 0.00964
This consistency across completely different inputs (random, AES-256, custom ciphers) strongly suggests a computational error in the test implementation.
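To put a number on "suspiciously constant": for a correctly implemented test, p-values on random input should be spread roughly uniformly over [0, 1], which has a standard deviation of 1/sqrt(12), about 0.289. The sketch below compares that with the spread of the values listed above. It is only a sanity check, not a formal uniformity test.

```python
import math
import statistics

# p-values reported above for Maurer's Universal test on os.urandom() data
reported = [0.00980, 0.00996, 0.00976, 0.00978, 0.00964]

uniform_std = 1 / math.sqrt(12)             # spread expected under uniformity
observed_std = statistics.pstdev(reported)  # spread actually observed

print(f"observed {observed_std:.6f} vs expected about {uniform_std:.3f}")
```

The observed spread is more than three orders of magnitude smaller than expected, which independent random inputs would essentially never produce.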
Bug 6: Random Excursion — incorrect pass/fail evaluation
The test produces reasonable-looking p-values (0.630, 0.683) but marks them as FAIL. A p-value of 0.63 should clearly be a PASS (threshold is 0.01). This appears to be a bug in how the result is evaluated.
Bug 7: DFT/Spectral and Linear Complexity
Both tests return p = 0.000000 for all inputs including os.urandom(), indicating fundamental implementation errors.
Verification Method
The bugs were discovered using a 5-way cross-validation approach:
- True random (os.urandom) — must pass all tests
- AES-256 (7-Zip encrypted JPEG) — must pass all tests
- Custom cipher A (Turbine V5) — expected to mostly pass
- Custom cipher B (SCHFM2) — expected to mostly pass
- Unencrypted JPEG — must fail all tests (control)
When true random data and AES-256 both fail the same 7 tests with identical p-values, the tests themselves are clearly broken.
A separate pure-Python implementation of the same NIST tests (Monobit, Block Frequency, Runs, Longest Run, Cumulative Sums, Approximate Entropy with m=10, Serial with m=16, Maurer's Universal) correctly returns PASS for all encrypted inputs and true random data, and FAIL for the unencrypted JPEG.
Impact
Any user relying on nistrng to evaluate encryption quality will get misleading results: 7 out of 14 tests always report FAIL regardless of input quality. This can lead to:
- Rejecting good algorithms based on false test failures
- Wasting development time trying to fix non-existent weaknesses
- False sense of confidence when all "working" tests pass
Recommendation
I would recommend adding a simple validation test to the test suite:
import os, numpy as np, nistrng
# Generate true random bits
bits = np.unpackbits(np.frombuffer(os.urandom(125000), dtype=np.uint8)).astype(np.int8)
# All tests should pass for true random data
for name, test in nistrng.SP800_22R1A_BATTERY.items():
    if test.is_eligible(bits):
        result = test.run(bits)[0]
        assert result.passed, f"{name} failed on true random data (p={result.score})"
This would have caught all 7 bugs immediately.
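One caveat for anyone turning this into a CI check: at a significance level of 0.01, even a correct implementation rejects a good sample about 1% of the time per test, so a single-sample assertion will fail occasionally. A more robust variant runs several samples and applies the pass-proportion criterion from NIST SP 800-22, Section 4.2.1. The sketch below only computes the thresholds. It treats the tests as independent, which is a simplification, and the function names are hypothetical.

```python
import math

def false_alarm_probability(n_tests: int, alpha: float = 0.01) -> float:
    """Chance that at least one of n independent tests rejects a good sample."""
    return 1.0 - (1.0 - alpha) ** n_tests

def min_pass_proportion(n_samples: int, alpha: float = 0.01) -> float:
    """Lower bound (1 - alpha) - 3 * sqrt(alpha * (1 - alpha) / n_samples)."""
    p_hat = 1.0 - alpha
    return p_hat - 3.0 * math.sqrt(p_hat * alpha / n_samples)

def required_passes(n_samples: int, alpha: float = 0.01) -> int:
    """Smallest number of passing samples that meets the proportion bound."""
    return math.ceil(min_pass_proportion(n_samples, alpha) * n_samples)

print(round(false_alarm_probability(14), 3))  # 0.131
print(round(min_pass_proportion(10), 4))      # 0.8956
print(required_passes(10))                    # 9
```

With 10 samples per test, as used in this report, requiring at least 9 passes per test keeps the check sensitive to the always-failing tests described above without making it flaky.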
Thank you for maintaining this package. I hope this report helps improve it!
While working on the analysis above, I needed a working test suite for my own project, so I put together a pure-Python implementation of the tests that were giving me trouble.
In case it's useful to anyone running into similar issues, I've published it here:
https://github.com/ReinhardJesolowitz24/py-nist-sp800-22
It covers 12 NIST SP 800-22 tests plus 4 supplementary tests, all validated against os.urandom(), AES-256, and a raw JPEG as negative control. Single file, no dependencies except optional numpy for the DFT test.
It's not meant as a replacement for this project — just a quick workaround until the issues here get addressed. Happy to help with fixes if that's preferred!