gh-139156: Use PyBytesWriter in UTF-32 encoder #139157

vstinner · 2025-09-19T11:24:53Z

Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.

Issue: Use PyBytesWriter in Unicode codecs #139156

vstinner · 2025-09-22T06:19:33Z

Benchmark ASCII characters:

import pyperf
runner = pyperf.Runner()
for size in (3, 100, 1000):
    runner.timeit(f'{size:,} chars',
        setup=f's="x"*{size}',
        stmt='s.encode("utf32")')

Result:

Benchmark	ref	pep782
100 chars	58.9 ns	61.5 ns: 1.04x slower
1,000 chars	185 ns	188 ns: 1.02x slower
Geometric mean	(ref)	1.02x slower

Benchmark hidden because not significant (1): 3 chars

vstinner · 2025-09-22T06:23:27Z

Benchmark UCS-4 characters:

import pyperf
runner = pyperf.Runner()
for size in (3, 100, 1000):
    runner.timeit(f'{size:,} UCS-4 chars',
        setup=f's=chr(0x10ffff) * {size}',
        stmt='s.encode("utf32")')

Result:

Benchmark	ref	pep782
3 UCS-4 chars	54.7 ns	63.9 ns: 1.17x slower
100 UCS-4 chars	99.4 ns	101 ns: 1.02x slower
1,000 UCS-4 chars	546 ns	542 ns: 1.01x faster
Geometric mean	(ref)	1.06x slower
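
The ratios and geometric mean in the table above can be reproduced from the raw timings. The sketch below is only an illustrative check, not pyperf's own code; the `timings` and `slowdown` names are made up for this example.

```python
import math

# (ref, pep782) timings in nanoseconds, copied from the UCS-4 table above.
timings = {
    "3 UCS-4 chars": (54.7, 63.9),
    "100 UCS-4 chars": (99.4, 101.0),
    "1,000 UCS-4 chars": (546.0, 542.0),
}

# A ratio above 1 means the pep782 build is slower than ref.
slowdown = {name: new / ref for name, (ref, new) in timings.items()}

# Geometric mean of the ratios: exp of the arithmetic mean of their logs.
geo_mean = math.exp(sum(math.log(r) for r in slowdown.values()) / len(slowdown))

for name, ratio in slowdown.items():
    ref, new = timings[name]
    print(f"{name}: {ratio:.2f}x ({new - ref:+.1f} ns)")
print(f"Geometric mean: {geo_mean:.2f}x")
```

For the "3 UCS-4 chars" row, the ratio rounds to 1.17 and the absolute difference is 9.2 ns.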

vstinner · 2025-09-22T09:50:48Z

cc @serhiy-storchaka

vstinner · 2025-09-22T11:00:14Z

The only significant difference is on "3 UCS-4 chars": 54.7 ns => 63.9 ns, 1.17x slower (+9.2 ns). That's the cost of the PyBytesWriter API abstraction.

I added a "fast path" for UCS-1, which is the most common case: it keeps PyBytes_FromStringAndSize(NULL, size), so there is no performance impact for such strings.
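
As a rough illustration of why a UCS-1 fast path can allocate the result in one shot: a string whose code points are all <= 0xFF contains no surrogates, so the UTF-32 output size is known exactly before encoding starts (a BOM plus 4 bytes per character). The pure-Python model below is a sketch under that assumption, not CPython's C implementation; `utf32_encode_ucs1` is a hypothetical name.

```python
import sys


def utf32_encode_ucs1(s: str) -> bytes:
    """Model of a fixed-size UTF-32 encode for strings with code points <= 0xFF.

    Hypothetical helper for illustration only; CPython does this in C.
    """
    if any(ord(ch) > 0xFF for ch in s):
        raise ValueError("this sketch only covers code points <= 0xFF")

    byteorder = sys.byteorder  # the "utf-32" codec writes native byte order
    # Exact output size is known upfront: 4 bytes for the BOM + 4 per character.
    out = bytearray(4 * (len(s) + 1))
    out[0:4] = (0xFEFF).to_bytes(4, byteorder)
    for i, ch in enumerate(s, start=1):
        out[4 * i:4 * i + 4] = ord(ch).to_bytes(4, byteorder)
    return bytes(out)


if __name__ == "__main__":
    sample = "x" * 3
    encoded = utf32_encode_ucs1(sample)
    print(len(encoded), encoded == sample.encode("utf-32"))
```

Because the buffer never needs to grow or shrink in this case, a single fixed-size allocation is enough, which is consistent with keeping PyBytes_FromStringAndSize(NULL, size) on that path.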

serhiy-storchaka

LGTM. 👍

vstinner · 2025-09-22T20:07:29Z

Merged, thanks for the review @serhiy-storchaka.

vstinner added the skip news label Sep 19, 2025

bedevere-app bot added the awaiting core review label Sep 19, 2025

bedevere-app bot mentioned this pull request Sep 19, 2025

Use PyBytesWriter in Unicode codecs #139156

Open

vstinner added 2 commits September 22, 2025 08:12

pythongh-139156: Use PyBytesWriter in UTF-32 encoder

120b75c

Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.

Add UCS1 fast path

9b5deeb

vstinner force-pushed the utf32 branch from e8435b2 to 9b5deeb Compare September 22, 2025 06:13

serhiy-storchaka approved these changes Sep 22, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Sep 22, 2025

vstinner added 2 commits September 22, 2025 14:28

Use PyBytesWriter_FinishWithPointer()

4a4b9ff

Cleanup

1c54b6a

vstinner enabled auto-merge (squash) September 22, 2025 12:38

vstinner merged commit 92ba2c9 into python:main Sep 22, 2025
77 of 79 checks passed

vstinner deleted the utf32 branch September 22, 2025 20:05

bedevere-app bot removed the awaiting merge label Sep 22, 2025
