allow sorting keys on to_json and to_python by passing in sort_keys #1637

aezomz · 2025-02-15T19:16:37Z

Hello Pydantic Team! This is my first time contributing to a Rust and PyO3 related repo, and I am also new to Rust.
Do you think this PR makes sense?
I have been trying to use model_dump_json with sorted keys myself.

This feature should behave the same way as json.dumps(data, sort_keys=True).
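For reference, a minimal sketch of the standard-library behaviour being mirrored. The sample data is illustrative only (strings are used instead of bytes, since json.dumps cannot encode bytes by default):

```python
import json

data = {
    "field_b": 12,
    "field_a": "test",
    "field_c": {"mango": 2, "banana": 3, "apple": 1},
    "field_n": [{"d": 3, "b": 2, "a": 1}, 3],
}

# sort_keys=True sorts every dict the encoder meets, including dicts
# nested inside other dicts and inside lists.
sorted_json = json.dumps(data, sort_keys=True)
print(sorted_json)
# {"field_a": "test", "field_b": 12, "field_c": {"apple": 1, "banana": 3, "mango": 2}, "field_n": [{"a": 1, "b": 2, "d": 3}, 3]}
```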

Will sort from:
        {
            'field_123': b'test_123',
            'field_b': 12,
            'field_a': b'test',
            'field_c': {'mango': 2, 'banana': 3, 'apple': 1},
            'field_n': [
                {'mango': 3, 'banana': 2, 'apple': 1},
                [{'mango': 3, 'banana': 2, 'apple': 1}, {'d': 3, 'b': 2, 'a': 1}],
                3,
            ],
            'field_d': [
                {'d': 3, 'b': 2, 'a': {'nested3': 3, 'nested1': 1, 'nested2': 2}},
                [[{'mango': 3, 'banana': 2, 'apple': 1}], {'d': 3, 'b': 2, 'a': 1}],
                3,
            ],
            'field_none': None,
        }
To:
    assert s.to_python(m, exclude_none=True, sort_keys=True) == snapshot(
        {
            'field_123': b'test_123',
            'field_a': b'test',
            'field_b': 12,
            'field_c': {'apple': 1, 'banana': 3, 'mango': 2},
            'field_n': [
                {'apple': 1, 'banana': 2, 'mango': 3},
                [{'apple': 1, 'banana': 2, 'mango': 3}, {'a': 1, 'b': 2, 'd': 3}],
                3,
            ],
            'field_d': [
                {'a': {'nested1': 1, 'nested2': 2, 'nested3': 3}, 'b': 2, 'd': 3},
                [[{'apple': 1, 'banana': 2, 'mango': 3}], {'a': 1, 'b': 2, 'd': 3}],
                3,
            ],
        }
    )

Take note that:

field_d is an extra field and is still sorted
sorting is recursive for both schema-defined fields and extras
sorting also covers dictionaries inside lists and nested lists

Please let me know if I have missed anything else that sort_keys=True is supposed to do!

Thanks!

Change Summary

allow sorting keys on to_json and to_python by passing in sort_keys

Related issue number

Should fix pydantic/pydantic#7424.
A separate PR on the Python repo might also be needed; I still need to check.

Checklist

Unit tests for the changes exist
Documentation reflects the changes where applicable
Pydantic tests pass with this pydantic-core (except for expected changes)
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

codecov · 2025-02-15T19:19:04Z

Codecov Report

Attention: Patch coverage is 12.66234% with 269 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/serializers/fields.rs	0.71%	137 Missing and 1 partial ⚠️
src/serializers/type_serializers/dict.rs	15.51%	46 Missing and 3 partials ⚠️
src/serializers/computed_fields.rs	33.96%	33 Missing and 2 partials ⚠️
src/serializers/infer.rs	10.25%	31 Missing and 4 partials ⚠️
src/serializers/mod.rs	40.00%	6 Missing ⚠️
src/serializers/extra.rs	33.33%	4 Missing ⚠️
python/pydantic_core/core_schema.py	50.00%	0 Missing and 1 partial ⚠️
src/errors/validation_exception.rs	0.00%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

codspeed-hq · 2025-02-15T19:23:13Z

CodSpeed Performance Report

Merging #1637 will not alter performance

Comparing aezomz:allow_model_dump_sort_keys (ad37c91) with main (d9dacb0)

Summary

✅ 157 untouched benchmarks

aezomz · 2025-03-04T13:13:32Z

please review

adriangb

Main issue is perf regression

src/errors/validation_exception.rs

src/serializers/fields.rs

tests/serializers/test_model.py

aezomz · 2025-03-05T10:44:05Z

I separated the tests out and refactored the functions so they can be reused when sort_keys=true.
To keep the original benchmark performance, I added a simple bool check on sort_keys before calling expensive functions such as sorting.

Let me know what else I need to improve. Thanks

aezomz · 2025-03-18T15:23:13Z

please review, not sure how I can take it from here

adriangb

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.
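A rough Python sketch of how the three suggested modes could differ. The names SortMode and apply_sort_mode are hypothetical and not part of pydantic-core; string keys are assumed:

```python
from typing import Any, Literal

SortMode = Literal["recursive", "top-level", "unsorted"]  # hypothetical alias


def _sort_recursive(value: Any) -> Any:
    # Sort this dict's keys, then descend into nested dicts and lists.
    if isinstance(value, dict):
        return {k: _sort_recursive(v) for k, v in sorted(value.items(), key=lambda kv: kv[0])}
    if isinstance(value, list):
        return [_sort_recursive(v) for v in value]
    return value


def apply_sort_mode(value: Any, mode: SortMode) -> Any:
    if mode == "unsorted":
        return value
    if mode == "top-level":
        # Only the outermost dict is reordered; nested values are untouched.
        if isinstance(value, dict):
            return dict(sorted(value.items(), key=lambda kv: kv[0]))
        return value
    return _sort_recursive(value)


data = {"b": {"z": 1, "a": 2}, "a": 1}
top_level = apply_sort_mode(data, "top-level")
recursive = apply_sort_mode(data, "recursive")
```

Here top_level has its outer keys in the order a, b while the inner dict keeps z, a; recursive sorts both levels.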

src/serializers/fields.rs

aezomz · 2025-03-23T16:35:06Z

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

Added the different sort modes suggested above and updated the PR description.

aezomz · 2025-03-25T16:32:48Z

please review 👍

aezomz · 2025-03-28T17:39:02Z

please review 👍

python/pydantic_core/_pydantic_core.pyi

aezomz · 2025-04-01T16:07:37Z

please review 👍

There is one failing test, and it is unclear whether it is related to my change:
build-wasm-emscripten

Let me know if there are any other places I can optimize. Thanks!

aezomz · 2025-04-11T02:20:12Z

please review 👍

src/serializers/fields.rs

aezomz · 2025-04-15T14:40:20Z

please review 👍

aezomz · 2025-04-16T13:00:39Z

please review 👍

aezomz · 2025-04-25T19:02:38Z

@DouweM please review :)

DouweM

Thanks for your work on this @aezomz! I'm a Rust newbie as well, but I have a feeling we can significantly simplify this code by dropping the recursive dict sorting and moving this to the serializer for the dict type specifically. Can you give that a try please?

I also think it's worth seeing if we can reduce the duplication between the if sort_keys/else branches, but I know Rust typing may make that tricky...

Let me know if you get stuck, we can bring in some of our Rust experts!

src/serializers/fields.rs

DouweM

@aezomz Thanks, this looks a lot better already, but I have a feeling we can get rid of the recursive sorting entirely, if we do it at each level (model + dict) explicitly.

I'm running into the limits of my Rust skills though, and I'm not sure the data types involved actually allow us to do this.

@davidhewitt Would you mind having a Rusty eyed look for us? :)

DouweM · 2025-04-28T18:12:12Z

src/serializers/fields.rs

+                    if !exclude_default(value, &field_extra, serializer).map_err(py_err_se_err)? {
+                        // Get potentially sorted value
+                        if extra.sort_keys {
+                            let sorted_dict = sort_dict_recursive(value.py(), value).map_err(py_err_se_err)?;


Why do we still need to sort_dict_recursive here if we're already doing that in the dict serialization itself? (Same question for the 2 calls below)

I tried your recommendation and it doesn't work.

It only works when converting to Python types (to_python), since we wrote that serializing logic ourselves.
For to_json we use the serde package, if I am not wrong.
The code is separated into processing of defined fields and processing of non-defined fields, because we need to sort before handing off to the serde package.

This is based on my high-level understanding.

I am unconvinced we need the recursive sort here either. serde will serialize the content in the order we call .serialize_entry, I think we just need to make sure that everywhere we do that we're sorting first.

DouweM · 2025-04-28T18:13:47Z

src/serializers/type_serializers/dict.rs

-                        let value =
-                            value_serializer.to_python(&value, next_include.as_ref(), next_exclude.as_ref(), extra)?;
+                        let value = if extra.sort_keys {
+                            let sorted_value = sort_dict_recursive(py, &value)?;


Could we not do a recursive sort here, but sort new_dict specifically after we've built it here?

I think sorting new_dict at the end might be less efficient? It would require an extra iteration over all items. The current approach sorts values recursively as they are being processed, which should be more efficient since we are already iterating and setting key/value pairs.

From a quick scan, I think the implementation here would have exponential blowup, because for a dict-of-dicts, when serializing the top-level dict we'll call sort_dict_recursive, and then again we'll call it for each dict when we start serializing them below.

DouweM · 2025-04-28T18:15:04Z

src/serializers/type_serializers/dict.rs

-                        );
-                        map.serialize_entry(&key, &value_serialize)?;
+                        if extra.sort_keys {
+                            let sorted_dict = sort_dict_recursive(value.py(), &value).map_err(py_err_se_err)?;


Same as above -- I'd like to get rid of recursive sorting and do this after we build the map here. But I'm not 100% sure the data types here allow that...

We have serialize_pairs_python and serialize_pairs_json functions in infer.rs which are probably good starting points for code re-use which might lead to a solution.

Fully agree the recursive sort is questionable.

aezomz · 2025-05-16T13:34:09Z

src/serializers/fields.rs

+            ..*extra
+        };
+
+        let filter = self.filter.key_filter(key, include, exclude).map_err(py_err_se_err)?;


This checks the include and exclude passed in as parameters. Those fields are defined in the pydantic schema.

So this function handles the serialization or extra handling of those fields.

aezomz · 2025-05-16T13:36:50Z

src/serializers/fields.rs

@@ -444,8 +546,19 @@ impl TypeSerializer for GeneralFieldsSerializer {
                let filter = self.filter.key_filter(&key, include, exclude).map_err(py_err_se_err)?;
                if let Some((next_include, next_exclude)) = filter {
                    let output_key = infer_json_key(&key, extra).map_err(py_err_se_err)?;
-                    let s = SerializeInfer::new(&value, next_include.as_ref(), next_exclude.as_ref(), extra);
-                    map.serialize_entry(&output_key, &s)?;
+                    if extra.sort_keys {


This block, on the other hand, caters for extra fields (not defined in the pydantic schema), so we need to sort recursively for those as well.
Neither of them seems to use the dict.rs serializer, since the objective is to_json and they use the serde package directly...

https://github.com/pydantic/pydantic-core/pull/1637/files#diff-d32436e9ac9b3e5dfaf920749269f6cff3dae3b3e030561f9b9bf50447067450R540

aezomz · 2025-05-16T13:45:58Z

@DouweM can you help review again?
I didn't make many changes apart from a rebase and commit.
I have shared my comments on the concerns above.
Thanks!

davidhewitt

Sorry for the very slow review by me. Thank you for working on this.

I think this is a good starting point; however, there's a fair bit of work still needed to make this implementation both efficient and correct.

davidhewitt · 2025-05-27T17:07:33Z

src/serializers/type_serializers/dict.rs

+            sorted_dict.set_item(k, sorted_v)?;
+        }
+        Ok(sorted_dict.into_any())
+    } else if let Ok(list) = value.downcast::<PyList>() {


Why a branch for lists here?

davidhewitt · 2025-05-27T17:07:51Z

src/serializers/type_serializers/dict.rs

-                        let value =
-                            value_serializer.to_python(&value, next_include.as_ref(), next_exclude.as_ref(), extra)?;
+                        let value = if extra.sort_keys {
+                            let sorted_value = sort_dict_recursive(py, &value)?;


From a quick scan, I think the implementation here would have exponential blowup, because for a dict-of-dicts, when serializing the top-level dict we'll call sort_dict_recursive, and then again we'll call it for each dict when we start serializing them below.
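A toy Python model of the repeated work described here, not the actual Rust code. The function names are stand-ins, and the counts only show that nested dicts get sorted more than once; they do not characterise the real growth rate in pydantic-core:

```python
from operator import itemgetter

calls = {"naive": 0, "per_level": 0}
by_key = itemgetter(0)


def sort_dict_recursive(value):
    """Toy stand-in: sort a dict and every dict nested inside it."""
    if isinstance(value, dict):
        calls["naive"] += 1
        return {k: sort_dict_recursive(v) for k, v in sorted(value.items(), key=by_key)}
    return value


def serialize_naive(value):
    # Pre-sort each value recursively, then descend into it and repeat.
    if isinstance(value, dict):
        return {
            k: serialize_naive(sort_dict_recursive(v))
            for k, v in sorted(value.items(), key=by_key)
        }
    return value


def serialize_per_level(value):
    # Sort only the current dict's keys; nested dicts are sorted when reached.
    if isinstance(value, dict):
        calls["per_level"] += 1
        return {k: serialize_per_level(v) for k, v in sorted(value.items(), key=by_key)}
    return value


def nested(depth):
    value = 1
    for _ in range(depth):
        value = {"b": value, "a": 0}
    return value


data = nested(5)
naive_out = serialize_naive(data)
per_level_out = serialize_per_level(data)
print(calls)  # {'naive': 10, 'per_level': 5}
```

Both functions produce the same sorted output for this input, but the naive variant performs 10 nested-dict sorts for 5 levels of nesting, while the per-level variant sorts each of the 5 dicts once.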

davidhewitt · 2025-05-28T09:10:24Z

src/serializers/fields.rs

There's now a lot of code duplication in this file. It feels like what we really need is a for_all_fields function which takes a closure and calls it for each field to serialize in turn.

That function can take responsibility for, if sorting requested, collecting all fields into a vec and sorting them first. Otherwise can just do the serializing in a streaming fashion.
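The shape of that helper, sketched in Python rather than Rust. The signature is an assumption for illustration only; the real function would operate on the serializer's field types:

```python
from typing import Any, Callable, Iterable, Tuple


def for_all_fields(
    fields: Iterable[Tuple[str, Any]],
    sort_keys: bool,
    visit: Callable[[str, Any], None],
) -> None:
    """Call `visit` once per field, in sorted key order if requested."""
    if sort_keys:
        # Sorting needs every key up front, so buffer the fields first.
        fields = sorted(fields, key=lambda kv: kv[0])
    # Without sorting, fields are consumed lazily in their original order.
    for key, value in fields:
        visit(key, value)


def field_stream():
    # A generator stands in for fields produced one at a time.
    yield "b", 1
    yield "a", 2


sorted_order, streamed_order = [], []
for_all_fields(field_stream(), True, lambda k, v: sorted_order.append(k))
for_all_fields(field_stream(), False, lambda k, v: streamed_order.append(k))
```

The unsorted path never materialises the full field list, which keeps the existing streaming behaviour when sorting is off.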

davidhewitt · 2025-05-28T09:11:31Z

src/serializers/fields.rs

+                    if !exclude_default(value, &field_extra, serializer).map_err(py_err_se_err)? {
+                        // Get potentially sorted value
+                        if extra.sort_keys {
+                            let sorted_dict = sort_dict_recursive(value.py(), value).map_err(py_err_se_err)?;


I am unconvinced we need the recursive sort here either. serde will serialize the content in the order we call .serialize_entry, I think we just need to make sure that everywhere we do that we're sorting first.
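To illustrate the point in Python (a hypothetical toy writer, not serde or pydantic-core code): if every place that emits dict entries sorts its own keys first, the output is fully sorted without any recursive pre-sort of the values.

```python
import json


def write_json(value, sort_keys=False):
    if isinstance(value, dict):
        items = value.items()
        if sort_keys:
            # Sort only this level's keys before emitting entries.
            items = sorted(items, key=lambda kv: kv[0])
        # Entries appear in exactly the order we iterate, analogous to
        # calling serialize_entry once per pair.
        body = ", ".join(
            f"{json.dumps(k)}: {write_json(v, sort_keys)}" for k, v in items
        )
        return "{" + body + "}"
    if isinstance(value, list):
        return "[" + ", ".join(write_json(v, sort_keys) for v in value) + "]"
    return json.dumps(value)


data = {"b": [{"z": 1, "a": 2}], "a": {"y": None, "x": True}}
result = write_json(data, sort_keys=True)
print(result)
# {"a": {"x": true, "y": null}, "b": [{"a": 2, "z": 1}]}
```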

davidhewitt · 2025-05-28T09:12:11Z

src/serializers/infer.rs

There are places in this file where we serialize dicts, those presumably need sorting added (and tests). Best way to test those code paths is by using the serialize_as_any=True runtime flag.

Oh I see, I just realized that the fields.rs JSON serialization uses infer.rs... that's a great hint. I managed to implement it and will look at adding the tests. Thanks David.

davidhewitt · 2025-05-28T09:13:20Z

src/serializers/type_serializers/dict.rs

-                        );
-                        map.serialize_entry(&key, &value_serialize)?;
+                        if extra.sort_keys {
+                            let sorted_dict = sort_dict_recursive(value.py(), &value).map_err(py_err_se_err)?;


We have serialize_pairs_python and serialize_pairs_json functions in infer.rs which are probably good starting points for code re-use which might lead to a solution.

Fully agree the recursive sort is questionable.

davidhewitt · 2025-05-28T09:13:48Z

tests/serializers/test_model.py

We need to add tests for computed fields; at the moment the implementation neither supports them nor tests them.

aezomz · 2025-06-27T18:27:30Z

@davidhewitt please give it another review when you have the chance. Thanks!

aezomz force-pushed the allow_model_dump_sort_keys branch from 07a31f5 to 7222c8d Compare March 4, 2025 10:22

pydantic-hooky bot added the ready for review label Mar 4, 2025

pydantic-hooky bot assigned davidhewitt Mar 4, 2025

aezomz marked this pull request as ready for review March 4, 2025 13:19

adriangb reviewed Mar 4, 2025

zzstoatzz mentioned this pull request Mar 10, 2025

[wip] sort keys #1666

Closed

adriangb reviewed Mar 18, 2025

aezomz force-pushed the allow_model_dump_sort_keys branch from 0cf4b6d to 95f9329 Compare March 23, 2025 14:44

aezomz force-pushed the allow_model_dump_sort_keys branch from 95f9329 to c83a212 Compare March 28, 2025 17:28

samuelcolvin reviewed Mar 28, 2025

aezomz force-pushed the allow_model_dump_sort_keys branch from c83a212 to 0075a4b Compare March 31, 2025 17:30

aezomz force-pushed the allow_model_dump_sort_keys branch from dce2689 to 5ce696c Compare April 4, 2025 15:47

adriangb reviewed Apr 11, 2025

aezomz commented Apr 14, 2025

aezomz force-pushed the allow_model_dump_sort_keys branch from dfe7380 to 1b8d824 Compare April 14, 2025 17:48

aezomz requested a review from adriangb April 23, 2025 15:15

DouweM self-assigned this Apr 23, 2025

DouweM self-requested a review April 23, 2025 19:24

DouweM force-pushed the allow_model_dump_sort_keys branch from dc0ec59 to 11501f6 Compare April 25, 2025 21:58

DouweM requested changes Apr 25, 2025

DouweM unassigned davidhewitt Apr 25, 2025

aezomz force-pushed the allow_model_dump_sort_keys branch from 11501f6 to 200865b Compare April 26, 2025 10:29

DouweM requested changes Apr 28, 2025

DouweM assigned davidhewitt Apr 28, 2025

aezomz force-pushed the allow_model_dump_sort_keys branch from 53186da to 66c7f53 Compare May 16, 2025 13:25

aezomz commented May 16, 2025

aezomz force-pushed the allow_model_dump_sort_keys branch from 66c7f53 to 91e98f9 Compare May 16, 2025 13:44

aezomz requested review from DouweM and samuelcolvin May 16, 2025 13:46

davidhewitt requested changes May 28, 2025

pydantic-hooky bot added awaiting author revision and removed ready for review labels May 28, 2025

pydantic-hooky bot assigned aezomz and unassigned davidhewitt May 28, 2025

DouweM assigned davidhewitt and unassigned DouweM May 28, 2025

aezomz added 2 commits June 26, 2025 01:59

feat: allow sort_keys on to_json and to_python

1c62bf1

rebase and adapt to new compute field fn

ad37c91

aezomz force-pushed the allow_model_dump_sort_keys branch from 2efb8ae to ad37c91 Compare June 25, 2025 18:07

aezomz requested a review from davidhewitt June 25, 2025 18:19
