gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C #139123

1st1 · 2025-09-18T13:14:06Z

The C implementation considerably boosts the performance of the key UUID operations:

------------------------------------
Operation                    Speedup
------------------------------------
uuid4() generation            15.01x
uuid7() generation            29.64x
UUID from string               6.76x
UUID from bytes                5.16x
str(uuid) conversion           6.66x
------------------------------------

Summary of changes:

The UUID type is reimplemented in C in its entirety.
The pure-Python implementation is kept around and is used if the C implementation isn't available for some reason.
Both implementations are tested extensively; additional tests are added to ensure that the C implementation of the type follows the pure Python implementation fully.
The Python implementation stores UUID values as int objects. The C implementation stores them as a uint8_t[16] array.
The C implementation supports unpickling of UUIDs created with Python 2 using protocols starting with 0.
The C implementation has a faster hash() implementation and also caches the computed hash value to speed up cases where UUIDs are used as set/dict keys.
The C implementation has a freelist to make new UUID object instantiation as fast as possible.
uuid4() and uuid7() are now implemented in C. The largest performance boost (10x) comes from overfetching entropy to decrease the number of _PyOS_URandom() calls. On its own it's a safe optimization, with the edge case that Unix fork needs to be handled explicitly. We do that by comparing the current PID to the PID recorded when the random buffer was populated.
Portions of the code come from my implementation of a faster UUID in gel-python [1]. I did use AI during the development, but basically had to rewrite the code it generated to be more idiomatic and efficient.
The benchmark can be found here [2].
This PR makes Python UUID operations as fast as they are in NodeJS and Bun runtimes.
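The entropy overfetching and fork check described in the summary can be sketched in Python roughly as follows. This is only an illustration of the idea, not the PR's C code: the class name `BufferedEntropy`, the buffer size, and the use of `os.urandom()` in place of `_PyOS_URandom()` are all assumptions made for the example.

```python
import os


class BufferedEntropy:
    """Hand out random bytes from a larger, prefetched buffer (hypothetical helper)."""

    def __init__(self, buffer_size: int = 4096) -> None:
        self._buffer_size = buffer_size
        self._buffer = b""
        self._index = 0
        self._pid = -1  # PID that filled the buffer; -1 forces the first refill

    def take(self, n: int) -> bytes:
        if n > self._buffer_size:
            # Requests larger than the buffer go straight to the OS.
            return os.urandom(n)
        pid = os.getpid()
        # Refill when the buffer is exhausted, or when the PID changed:
        # a forked child must not reuse bytes inherited from its parent.
        if pid != self._pid or self._index + n > len(self._buffer):
            self._buffer = os.urandom(self._buffer_size)
            self._index = 0
            self._pid = pid
        chunk = self._buffer[self._index:self._index + n]
        self._index += n
        return chunk
```

The sketch is not thread-safe; a real implementation would need a lock or critical section around `take()`.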

[1] https://github.com/MagicStack/py-pgproto/blob/b8109fb311a59f30f9947567a13508da9a776564/uuid.pyx

[2] https://gist.github.com/1st1/f03e816f34a61e4d46c78ff98baf4818

Issue: Implement UUID functions and type in C #139122

python-cla-bot · 2025-09-18T13:14:11Z

All commit authors signed the Contributor License Agreement.


1st1 · 2025-09-18T13:28:07Z

Note to reviewers:

Please compare uuid7 C and Python implementations side by side
Please review all of the bitshift/magic code closely (the Python implementation uses int, I'm using char[16], so the low-level operations are very different but have to be equivalent)
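To illustrate the kind of equivalence reviewers are asked to check, here is a small Python sketch that stamps the version 4 and RFC 4122 variant bits in two ways: once on a 128-bit int and once on a 16-byte buffer. The function names are hypothetical and the code is not taken from the PR; it only shows that the two representations must agree bit for bit.

```python
import os
import uuid


def stamp_v4_int(raw: bytes) -> int:
    """Set version/variant bits on the 128-bit integer form."""
    value = int.from_bytes(raw, "big")
    value &= ~(0xF << 76)  # clear the version nibble (bits 76..79)
    value |= 0x4 << 76     # version 4
    value &= ~(0x3 << 62)  # clear the variant bits (bits 62..63)
    value |= 0x2 << 62     # RFC 4122 variant (binary 10)
    return value


def stamp_v4_bytes(raw: bytes) -> bytes:
    """Set the same bits on the 16-byte array form."""
    buf = bytearray(raw)
    buf[6] = (buf[6] & 0x0F) | 0x40  # high nibble of byte 6 holds the version
    buf[8] = (buf[8] & 0x3F) | 0x80  # top two bits of byte 8 hold the variant
    return bytes(buf)


raw = os.urandom(16)
assert stamp_v4_int(raw) == int.from_bytes(stamp_v4_bytes(raw), "big")
```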

zooba · 2025-09-18T13:45:59Z

Modules/_uuidmodule.c

+
+    *counter = (((uint64_t)(high & 0x1FF) << 32) | (low >> 32)) & 0x1FFFFFFFFFF;
+    *tail = (uint32_t)low;
+    return 0;


I haven't tested thoroughly (mostly looking at GodBolt output), but this ought to generate more efficient code (and there might be more ways to generate the right values directly, I just didn't go that far):

    struct {
        uint16_t high;
        uint32_t low1;
        uint32_t low2;
    } rand_bytes;
    if (gen_random(state, (uint8_t*)&rand_bytes, sizeof(rand_bytes)) < 0) {
        return -1;
    }
    *counter = (((uint64_t)(rand_bytes.high & 0x1FF) << 32) | (rand_bytes.low1)) & 0x1FFFFFFFFFF;
    *tail = rand_bytes.low2;
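A rough Python model of the two ways of deriving `counter` and `tail` may help when comparing them. It assumes `high` is a 16-bit value and `low` a 64-bit value in the original diff (the declarations are not shown above), and it slices 10 random bytes explicitly instead of using a C struct. One thing worth checking in the C version: on most ABIs a struct laid out as `uint16_t, uint32_t, uint32_t` is likely to carry two padding bytes after `high`, so `sizeof` would be 12 rather than 10.

```python
import os

COUNTER_MASK = 0x1FFFFFFFFFF  # 41 bits, same mask as in the diff


def split_shift(high: int, low: int) -> tuple[int, int]:
    """Mirror the expression from the diff; `low` is assumed to be 64 bits wide."""
    counter = (((high & 0x1FF) << 32) | (low >> 32)) & COUNTER_MASK
    tail = low & 0xFFFFFFFF  # equivalent of the (uint32_t) cast
    return counter, tail


def split_fields(rand: bytes) -> tuple[int, int]:
    """Mirror the struct-based suggestion, reading three fields from 10 bytes."""
    high = int.from_bytes(rand[0:2], "little")
    low1 = int.from_bytes(rand[2:6], "little")
    low2 = int.from_bytes(rand[6:10], "little")
    counter = (((high & 0x1FF) << 32) | low1) & COUNTER_MASK
    return counter, low2


rand = os.urandom(10)
counter, tail = split_fields(rand)
assert counter < (1 << 41) and tail < (1 << 32)
```

Both variants draw 9 + 32 bits for the counter and 32 bits for the tail, so for random input the resulting distributions are the same.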

zooba · 2025-09-18T13:47:05Z

Modules/_uuidmodule.c

+    uint8_t bytes[16];
+    uint64_t timestamp_ms, counter;
+    uint32_t tail;


Ought to be able to do a similar struct/union trick here for faster conversion.

zooba · 2025-09-18T13:57:24Z

Modules/_uuidmodule.c

+        goto fail;
+    }
+
+    uuid_mod = PyImport_ImportModule("uuid");


Can we possibly simplify this whole concept (caching the values here) by pushing the queries down to a Python subclass of _uuid.UUID? So that the core create/parse/etc. functionality is entirely native, and the more esoteric queries are written in Python as class uuid.UUID(_uuid.UUID)? (Answer can be "no" if it's a bad idea for reasons I haven't considered)

Steve, I really tried hard to simplify it.

[1] is my attempt, where the C implementation defines the fundamental ops and the rest is kept in Python.

Despite me pushing it really hard and even implementing a freelist for Python-land classes (probably the first such perverse attempt in the stdlib), the performance of UUID instantiation takes a big hit. Benchmarks became 10-20% slower just because the UUID type inherits from a Python base.

Overall savings were about 300 lines of C. Which is considerable, but IMO making UUID as fast as we can is more important.

So let's keep this PR approach as is.

That said, good news! I figured out how to fully support pickle without my prior shenanigans with copyreg, so now the C implementation is a full drop-in replacement. 🚀

[1] https://github.com/1st1/cpython/blob/8a5877165e993afb2633cd48da5222326d3f6e0e/Modules/_uuidmodule.c#L4

picnixz · 2025-09-18T17:02:44Z

I would be happy to have a C implementation of UUID, but for reviewing purposes, may I suggest that we first focus on implementing the wrapper in C in one PR and have different PRs for each UUID? It would be much easier to focus on the algorithm under review.

picnixz

These are the first-round comments I have. I really want multiple PRs because the module becomes really huge and there are parts that I don't think we need to reimplement in C.

picnixz · 2025-09-18T17:07:25Z

Lib/uuid.py

@@ -219,13 +242,21 @@ def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
                raise ValueError('badly formed hexadecimal UUID string')
            int = int_(hex, 16)
        elif bytes_le is not None:
+            if not isinstance(bytes_le, bytes_):


This will make a behavioural change so let's not change this here (previously there would be an assertion error). Let's do it in a separate PR (and remove the assert at the same time)

I don't view this as a behavioral change. It was an error condition before, it's now handled properly. I wouldn't touch this code if I didn't have to re-implement it in C; re-implementing "assert" statement behavior is just counterproductive. I'd do that, if it was really a behavioral change, but I strongly believe it is not.
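For context on the disagreement: `assert` statements are stripped when Python runs with `-O`, so an explicit check behaves the same in every mode. The sketch below shows what an explicit `bytes_le` validation followed by the little-endian field swap could look like. The helper name and the choice of `TypeError` are assumptions for illustration; the thread does not say which exception the PR raises.

```python
import uuid


def bytes_le_to_bytes(bytes_le: object) -> bytes:
    """Validate a little-endian UUID buffer and return big-endian bytes (hypothetical helper)."""
    if not isinstance(bytes_le, bytes):
        # Assumption: TypeError is used here; the PR may raise something else.
        raise TypeError(f"bytes_le must be bytes, not {type(bytes_le).__name__}")
    if len(bytes_le) != 16:
        raise ValueError("bytes_le is not a 16-char string")
    # Reverse the first three fields (4, 2 and 2 bytes); the last 8 bytes stay as-is.
    return bytes_le[3::-1] + bytes_le[5:3:-1] + bytes_le[7:5:-1] + bytes_le[8:]


raw = bytes(range(16))
assert bytes_le_to_bytes(raw) == uuid.UUID(bytes_le=raw).bytes
```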

picnixz · 2025-09-18T17:22:04Z

Modules/_uuidmodule.c

+    }
+
+    int overflow;
+    uint64_t value = PyLong_AsLongLongAndOverflow(field, &overflow);


I think we have a converter to uint64

I don't see PyLong_AsUnsignedLongLongAndOverflow in the API.

It's probably named Uint64 something

I looked, I can't find it. If you have a concrete doc / code link, please share, but please don't make me fish for something you yourself are not sure exists :)

I apologize, I'm in the wrong here. Just saw the functions, I've no idea why I couldn't find them before. Sorry. I'll fix the code.

1st1 · 2025-09-19T10:48:34Z

I really want multiple PRs because the module becomes really huge and there are parts that I don't think we need to reimplement in C.

Sorry, I'm not going to do multiple PRs, I've no time for that. But I will address your comments here for sure!

picnixz · 2025-09-19T12:29:04Z

I would REALLY appreciate smaller PRs. Usually this is our (current) workflow, namely incremental changes. I think other core devs would agree with me here (@vstinner, @serhiy-storchaka and @encukou).

Note that whatever we do, this is a feature that will only be included in 3.15, so I can also make the small PRs once I am back at home. Changing 3.14 is no longer possible unless the RM gives their approval here (cc @hugovk, who could also say whether he prefers one or multiple PRs), as performance improvements are usually considered new features (especially if we are adding a C implementation).

1st1 · 2025-09-19T12:52:59Z

I would REALLY appreciate smaller PRs. Usually this is our (current) workflow, namely incremental changes. I think other core devs would agree with me here (@vstinner, @serhiy-storchaka and @encukou).

The current PR layout:

1. 90% is reimplementing the UUID type
2. 10% is the uuid4 and uuid7 implementations, which are isolated separate functions from (1); uuid4 is very tiny.

I'm usually also for making smaller PRs, but objectively there's no point for that right here; I don't see how reviewing this would be fundamentally any different if instead of one PR with 1700 lines in uuidmodule.c, you'd have 3 PRs, one with 1600 lines and another with like 100 lines, and yet another with 30. Reviewing time would be the same, no? What's the point?

If you absolutely insist I can do this, but tbqh I really don't understand why you're pushing for this so hard. It would create a lot of work for me and not necessarily make reviewers more happy.

Note that whatever we do, this is a feature that will only be included in 3.15, so I can also make the small PRs once I am back at home. Changing 3.14 is no longer possible unless the RM gives their approval here (cc @hugovk, who could also say whether he prefers one or multiple PRs), as performance improvements are usually considered new features (especially if we are adding a C implementation).

I don't think this is 3.14 material and I wasn't pushing for that. 3.15 is fine (unless people other than me want this to be in 3.14, in which case I won't say no).

That said I'd like to see my work through and would love to still merge this myself.

eendebakpt · 2025-09-19T13:16:31Z

Modules/_uuidmodule.c

+    uint64_t random_idx;
+    uint64_t random_last_pid;
+
+    // A freelist for uuid objects -- 15-20% performance boost.


There is a generic freelist implementation for python objects (see https://github.com/python/cpython/blob/7257b24140ac1b39fb8cfd4610134ec79575a396/Include/internal/pycore_freelist_state.h). Unless there are strong reasons not to, the uuid object should use the generic implementation.

Sadly I can't use that, as uuidobject.c is compiled as a shared lib.

That is a pity. I would suggest to leave out the freelist implementation in this PR. It makes the PR simpler (and therefore easier to review and accept). Even without the 10-20% performance gain from the freelist this PR is worthwhile. We can then reconsider the freelist in followup PRs.

1st1 · 2025-09-21T14:47:05Z

@picnixz I'm sorry for my snappiness (I blame 4 days spent deep in C coding! :)) I'm adding some basic machinery to test that the C implementations of the uuid() functions behave identically to the Python ones. Since this is even more extra code, it's now very reasonable to split this into two PRs: one for the base UUID type, another for uuid4() and uuid7(). I'll do that later.

picnixz · 2025-09-21T14:50:21Z

No problem! I also agree that sometimes splitting PRs is not always the best approach and thank you for your understanding!

I am on my way back to Switzerland, so I can review this PR closely starting tomorrow.

eendebakpt · 2025-09-21T18:31:09Z

Modules/_uuidmodule.c

+    uuidobject *self = NULL;
+    uuid_state *state = get_uuid_state_by_cls(type);
+
+    Py_BEGIN_CRITICAL_SECTION(type);


What if a subclass of UUID is created? Then both an exact UUID and the subclass can access the freelist concurrently.

The following minimal example with a subclass segfaults for me

    from uuid import UUID

    class U(UUID):
        pass

    a = UUID(int=10)
    print(a)
    b = U(int=10)
    print(b)
    print('-')

good catch, I'll fix tomorrow

eendebakpt · 2025-09-21T18:41:44Z

Modules/_uuidmodule.c

+    // There's a precedent with NodeJS doing exact same thing for
+    // improving performance of their UUID implementation.
+
+    // IMPORTANT: callers should have a critical section or a lock


I would remove the comment and rename the method to gen_random_lock_held (or a variation to indicate which lock is held).

In addition you could add inside the method an assert statement to verify the lock is indeed held. For example for critical sections there is _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED

A side effect of some compatibility fixes is that, with the new code, the C uuid7() is now 35x faster than pure Python (it used to be 30x).

hugovk · 2025-09-21T19:54:06Z

Note that whatever we do, this is a feature that will only be included in 3.15, so I can also make the small PRs once I am back at home. Changing 3.14 is no longer possible unless the RM gives their approval here (cc @hugovk, who could also say whether he prefers one or multiple PRs), as performance improvements are usually considered new features (especially if we are adding a C implementation).

I don't think this is 3.14 material and I wasn't pushing for that. 3.15 is fine (unless other people but me want this to be in 3.14, in which case I won't say no).

We're all in agreement this is for 3.15 👍

1st1 requested review from ambv, zooba and pablogsal September 18, 2025 13:14

1st1 requested review from ericsnowcurrently and ZeroIntensity as code owners September 18, 2025 13:14

bedevere-app bot added the awaiting core review label Sep 18, 2025

1st1 changed the title ~~The C implementation considerably boosts the performance of the key UUID~~ Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

1st1 force-pushed the uuid_push branch 2 times, most recently from 2d0a381 to c4363f8 Compare September 18, 2025 13:22

1st1 changed the title ~~Reimplement base UUID type, uuid4(), and uuid7() in C~~ #139122: Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

1st1 changed the title ~~#139122: Reimplement base UUID type, uuid4(), and uuid7() in C~~ #gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

bedevere-app bot mentioned this pull request Sep 18, 2025

Implement UUID functions and type in C #139122

Open

1st1 force-pushed the uuid_push branch 2 times, most recently from fa12192 to 4a5c2a2 Compare September 18, 2025 13:24

zooba reviewed Sep 18, 2025

StanFromIreland changed the title ~~#gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C~~ gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

picnixz reviewed Sep 18, 2025

1st1 force-pushed the uuid_push branch from 4a5c2a2 to d244f95 Compare September 19, 2025 10:47

1st1 force-pushed the uuid_push branch from d244f95 to da233b4 Compare September 19, 2025 10:56

1st1 force-pushed the uuid_push branch from da233b4 to ba9217e Compare September 19, 2025 10:59

1st1 added 4 commits September 19, 2025 12:20

Address @picnixz's review

6079afd

Regen the clinic files

2f76612

Clarify the type name

5496c4f

Fix news

7fbecbc

picnixz reviewed Sep 19, 2025

1st1 added 2 commits September 19, 2025 13:54

Trim the NEWS file down

8a02c85

Use PyObject* in getters

e51da7b

eendebakpt reviewed Sep 19, 2025

Codegen string literals and use them

68c324d

serhiy-storchaka self-requested a review September 21, 2025 16:52

eendebakpt reviewed Sep 21, 2025

1st1 added 2 commits September 21, 2025 19:51

Ensure total compatibility of C/Python implementations of uuid4 / uuid7

d657777

A side effect of some compatibility fixes is that, with the new code, the C uuid7() is now 35x faster than pure Python (it used to be 30x).

Stop importing unused SafeUUID members

47fac93

1st1 added 4 commits September 21, 2025 20:58

Drop SafeUUID.unknown

60c0a01

Drop more things that can be imported dynamically

e81f774

Remove circular import; import 'uuid' from '_uuid' lazily

f7c324d

Regen files

712eae3
