Skip to content

allow sorting keys on to_json and to_python by passing in sort_keys #1637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

aezomz
Copy link

@aezomz aezomz commented Feb 15, 2025

Hello Pydantic Team! This is my first time contributing to a Rust and Pyo3 related repo.
I am also new in Rust.
Do you think this PR will make sense?
Since I have been trying to do model_dump_json with sort keys too.

This feature should simulate the same as how we use json.dumps(data, sort_keys=True)

Will sort from:
        {
            'field_123': b'test_123',
            'field_b': 12,
            'field_a': b'test',
            'field_c': {'mango': 2, 'banana': 3, 'apple': 1},
            'field_n': [
                {'mango': 3, 'banana': 2, 'apple': 1},
                [{'mango': 3, 'banana': 2, 'apple': 1}, {'d': 3, 'b': 2, 'a': 1}],
                3,
            ],
            'field_d': [
                {'d': 3, 'b': 2, 'a': {'nested3': 3, 'nested1': 1, 'nested2': 2}},
                [[{'mango': 3, 'banana': 2, 'apple': 1}], {'d': 3, 'b': 2, 'a': 1}],
                3,
            ],
            'field_none': None,
        }
To:
    assert s.to_python(m, exclude_none=True, sort_keys=True) == snapshot(
        {
            'field_123': b'test_123',
            'field_a': b'test',
            'field_b': 12,
            'field_c': {'apple': 1, 'banana': 3, 'mango': 2},
            'field_n': [
                {'apple': 1, 'banana': 2, 'mango': 3},
                [{'apple': 1, 'banana': 2, 'mango': 3}, {'a': 1, 'b': 2, 'd': 3}],
                3,
            ],
            'field_d': [
                {'a': {'nested1': 1, 'nested2': 2, 'nested3': 3}, 'b': 2, 'd': 3},
                [[{'apple': 1, 'banana': 2, 'mango': 3}], {'a': 1, 'b': 2, 'd': 3}],
                3,
            ],
        }
    )

Take note that

  1. field_d is extra and still manage to sort
  2. sorting recursively for both defined schema and extras
  3. sorting including dictionary in array or nested array array

Please let me know if I miss out any other features that sort_keys=True is suppose to do!

Thanks!

Change Summary

allow sorting keys on to_json and to_python by passing in sort_keys

Related issue number

should fix pydantic/pydantic#7424
Might need to create another MR on Python repo though, need to check.

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

Copy link

codecov bot commented Feb 15, 2025

Codecov Report

Attention: Patch coverage is 85.77075% with 36 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/serializers/fields.rs 84.91% 35 Missing ⚠️
python/pydantic_core/core_schema.py 50.00% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Copy link

codspeed-hq bot commented Feb 15, 2025

CodSpeed Performance Report

Merging #1637 will not alter performance

Comparing aezomz:allow_model_dump_sort_keys (53186da) with main (334a9b0)

Summary

✅ 157 untouched benchmarks

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 07a31f5 to 7222c8d Compare March 4, 2025 10:22
@aezomz
Copy link
Author

aezomz commented Mar 4, 2025

please review

@aezomz aezomz marked this pull request as ready for review March 4, 2025 13:19
Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main issue is perf regression

@aezomz
Copy link
Author

aezomz commented Mar 5, 2025

I separated the test out, I also refactor the functions so we can reuse when sort_keys=true.
To keep the original perf benchmark, I have done a simple bool check on sort_keys before using expensive function like sorting.

Let me know what else I need to improve. Thanks

@zzstoatzz zzstoatzz mentioned this pull request Mar 10, 2025
@aezomz
Copy link
Author

aezomz commented Mar 18, 2025

please review, not sure how I can take it from here

Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 0cf4b6d to 95f9329 Compare March 23, 2025 14:44
@aezomz
Copy link
Author

aezomz commented Mar 23, 2025

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

Added different sort mode as above, updated the PR description.

@aezomz
Copy link
Author

aezomz commented Mar 25, 2025

please review 👍

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 95f9329 to c83a212 Compare March 28, 2025 17:28
@aezomz
Copy link
Author

aezomz commented Mar 28, 2025

please review 👍

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from c83a212 to 0075a4b Compare March 31, 2025 17:30
@aezomz
Copy link
Author

aezomz commented Apr 1, 2025

please review 👍

There is this test that is failing, unclear if its related to my change.
build-wasm-emscripten

Let me know any other places i can optimize. Thanks!

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from dce2689 to 5ce696c Compare April 4, 2025 15:47
@aezomz
Copy link
Author

aezomz commented Apr 11, 2025

please review 👍

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from dfe7380 to 1b8d824 Compare April 14, 2025 17:48
@aezomz
Copy link
Author

aezomz commented Apr 15, 2025

please review 👍

1 similar comment
@aezomz
Copy link
Author

aezomz commented Apr 16, 2025

please review 👍

@aezomz aezomz requested a review from adriangb April 23, 2025 15:15
@DouweM DouweM self-assigned this Apr 23, 2025
@DouweM DouweM self-requested a review April 23, 2025 19:24
@aezomz
Copy link
Author

aezomz commented Apr 25, 2025

@DouweM please review :)

@DouweM DouweM force-pushed the allow_model_dump_sort_keys branch from dc0ec59 to 11501f6 Compare April 25, 2025 21:58
Copy link
Contributor

@DouweM DouweM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @aezomz! I'm a Rust newbie as well, but I have a feeling we can significantly simplify this code by dropping the recursive dict sorting and moving this to the serializer for the dict type specifically. Can you give that a try please?

I also think it's worth seeing if we can reduce the duplication between the if sort_keys/else branches, but I know Rust typing may make that tricky...

Let me know if you get stuck, we can bring in some of our Rust experts!

@@ -229,6 +208,124 @@ impl GeneralFieldsSerializer {
}
}

fn sort_dict_recursive<'py>(py: Python<'py>, value: &Bound<'py, PyAny>) -> PyResult<Bound<'py, PyAny>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was suggested in a previous review, but instead of doing this recursively here, would it be an option to implement single-level sorting in this file and in src/serializers/type_serializers/dict.rs, so that it works on any dict serialization, not just when nested in a model/dataclass/typeddict?

That way, we may not need this recursive helper function anymore, and possibly can do away with process_field_entry_python entirely, since items.sort_by(|(a, _, _), (b, _, _)| a.cmp(b)) is already happening in main_serde_serialize and main_to_python.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for guiding me.
I did a refactor based on your advice!
However, i still have to add a few functions to ensure that we dont repeat our code too much.

@@ -242,24 +339,67 @@ impl GeneralFieldsSerializer {
// we don't both with `used_fields` here because on unions, `to_python(..., mode='json')` is used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment just above here says "we maintain the order of the input dict", that's not correct anymore!

if extra.exclude_none && value.is_none() {
if let Some(field) = op_field {
if field.required {
if !extra.sort_keys {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we swap these branches so we have the true case first followed by the false case? Same for the other instance of if !extra.sort_keys

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 11501f6 to 200865b Compare April 26, 2025 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sorting keys to model_dump_json
5 participants