[ty] Deterministic ordering of types #18091

Closed: sharkdp wanted to merge 3 commits from the david/stable-ordering branch

sharkdp
Contributor

@sharkdp sharkdp commented May 14, 2025

Summary

Our type ordering/normalization previously relied on PartialOrd/Ord instances that were auto-generated for salsa::interned structs. These implementations compare by salsa ID, which is not necessarily deterministic (depends on when a particular type has been interned, and can therefore depend on file checking order).

Besides not being deterministic, it was also incorrect. When comparing two equal unions A1 | B1 and A2 | B2, it was possible for one union to be ordered incorrectly (e.g. if both A1 and B1 were interned nominal instances), leading to incorrect type-relation results.
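The dependence on interning order can be illustrated with a toy interner. This is only a sketch under stated assumptions: `InternedId`, `Interner`, and `render_union` are hypothetical names, not salsa or ty APIs, and the derived `Ord` on the ID wrapper merely stands in for the auto-generated instances described above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an interned struct: only an ID is stored, and
/// the derived `Ord` compares IDs, i.e. the order in which values were interned.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct InternedId(u32);

/// Toy interner that hands out IDs in first-come, first-served order.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, InternedId>,
    names: Vec<String>,
}

impl Interner {
    fn intern(&mut self, name: &str) -> InternedId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = InternedId(self.names.len() as u32);
        self.ids.insert(name.to_string(), id);
        self.names.push(name.to_string());
        id
    }

    fn name(&self, id: InternedId) -> &str {
        &self.names[id.0 as usize]
    }
}

/// Interns `interning_order` first (simulating a file checking order), then
/// renders the union of `members` with elements sorted by the ID-based `Ord`.
fn render_union(interning_order: &[&str], members: &[&str]) -> String {
    let mut interner = Interner::default();
    for name in interning_order {
        interner.intern(name);
    }
    let mut elements: Vec<InternedId> = members.iter().map(|m| interner.intern(m)).collect();
    elements.sort(); // compares IDs, not the types themselves
    elements
        .iter()
        .map(|&id| interner.name(id))
        .collect::<Vec<_>>()
        .join(" | ")
}
```

In this sketch, the same union members render in a different order depending only on which name happened to be interned first, which is the kind of run-to-run variation the summary describes.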

Writing these ordering functions by hand is exhausting and error-prone. This is why I only went so far as to fix the linked bug. Much more work would be involved to implement this for structs like FunctionType or CallableType. We should probably look into auto-generating these methods somehow?
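To give a sense of what such a hand-written function involves, here is a minimal sketch on a heavily simplified, hypothetical `Ty` enum (not ty's actual `Type`). The methods in this PR additionally take a `db` argument, as in `ordering(self, db, other)`, because the data to compare lives in the salsa database; that lookup is left out here.

```rust
use std::cmp::Ordering;

/// Hypothetical, heavily simplified type representation.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ty {
    Never,
    Instance { class_name: String },
    Union(Vec<Ty>),
}

impl Ty {
    /// Rank of the variant, used to order values of different variants.
    fn variant_rank(&self) -> u8 {
        match self {
            Ty::Never => 0,
            Ty::Instance { .. } => 1,
            Ty::Union(_) => 2,
        }
    }

    /// Ordering that looks only at the content of the type, never at an
    /// interning ID, so the result is the same in every run.
    fn ordering(&self, other: &Ty) -> Ordering {
        match (self, other) {
            (Ty::Instance { class_name: a }, Ty::Instance { class_name: b }) => a.cmp(b),
            (Ty::Union(a), Ty::Union(b)) => {
                // Deep, element-wise comparison; shorter unions sort first on a tie.
                for (x, y) in a.iter().zip(b.iter()) {
                    let ord = x.ordering(y);
                    if ord != Ordering::Equal {
                        return ord;
                    }
                }
                a.len().cmp(&b.len())
            }
            // Different variants (or two `Never`s): fall back to the variant rank.
            _ => self.variant_rank().cmp(&other.variant_rank()),
        }
    }
}

/// Sorts a list of types using the content-based ordering.
fn sort_types(types: &mut [Ty]) {
    types.sort_by(|a, b| a.ordering(b));
}
```

Every variant and every nested field needs its own arm, which is why this gets tedious for larger structs.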

Related salsa discussion

fixes astral-sh/ty#369

Test Plan

Ran the following command for a while and saw no flaky behavior (on the MRE from the linked ticket):

while true; do ~/.cargo-target/release/ty check --output-format concise 2>&1 | rg '(Found|checks)'; done

@sharkdp sharkdp added the ty Multi-file analysis & type inference label May 14, 2025
Contributor

github-actions bot commented May 14, 2025

mypy_primer results

Changes were detected when running on open source projects
hydra-zen (https://github.com/mit-ll-responsible-ai/hydra-zen)
- error[invalid-return-type] src/hydra_zen/wrapper/_implementations.py:945:16: Return type does not match returned value: expected `DataClass_`, found `@Todo(unsupported type[X] special form) | (((...) -> Any) & dict[Unknown, Unknown]) | (DataClass_ & dict[Unknown, Unknown]) | (list[Any] & dict[Unknown, Unknown]) | dict[Any, Any] | (((...) -> Any) & list[Unknown]) | (DataClass_ & list[Unknown]) | list[Any] | (dict[Any, Any] & list[Unknown])`
+ error[invalid-return-type] src/hydra_zen/wrapper/_implementations.py:945:16: Return type does not match returned value: expected `DataClass_`, found `@Todo(unsupported type[X] special form) | (((...) -> Any) & dict[Unknown, Unknown]) | (DataClass_ & dict[Unknown, Unknown]) | (list[Any] & dict[Unknown, Unknown]) | dict[Any, Any] | (((...) -> Any) & list[Unknown]) | (DataClass_ & list[Unknown]) | list[Any]`
- error[type-assertion-failure] tests/annotations/declarations.py:956:5: Argument does not have asserted type `PBuilds[@Todo(Support for `typing.TypeAlias`)] | StdBuilds[@Todo(Support for `typing.TypeAlias`)]`
- error[type-assertion-failure] tests/annotations/declarations.py:961:5: Argument does not have asserted type `PBuilds[@Todo(Support for `typing.TypeAlias`)] | StdBuilds[@Todo(Support for `typing.TypeAlias`)]`
- error[type-assertion-failure] tests/annotations/declarations.py:969:5: Argument does not have asserted type `FullBuilds[@Todo(Support for `typing.TypeAlias`)] | PBuilds[@Todo(Support for `typing.TypeAlias`)] | StdBuilds[@Todo(Support for `typing.TypeAlias`)]`
- error[type-assertion-failure] tests/annotations/declarations.py:980:5: Argument does not have asserted type `FullBuilds[@Todo(Support for `typing.TypeAlias`)] | PBuilds[@Todo(Support for `typing.TypeAlias`)] | StdBuilds[@Todo(Support for `typing.TypeAlias`)]`
- Found 649 diagnostics
+ Found 645 diagnostics

@sharkdp sharkdp marked this pull request as ready for review May 14, 2025 11:56
@MichaReiser
Member

Hmm, this is indeed painful and also increases the work necessary to normalize types (because it's now necessary to perform a deep comparison).

> Our type ordering/normalization previously relied on PartialOrd/Ord instances that were auto-generated for salsa::interned structs. These implementations compare by salsa ID, which is not necessarily deterministic (depends on when a particular type has been interned, and can therefore depend on file checking order).

I wonder if this is actually a problem. I'm not sure if it will be when we start garbage collecting interned values but is it today?

@sharkdp
Contributor Author

sharkdp commented May 14, 2025

> I wonder if this is actually a problem. I'm not sure if it will be when we start garbage collecting interned values but is it today?

I don't understand? astral-sh/ty#369 (comment) definitely feels like an actual problem to me. It's not just non-deterministic, it's also incorrect (sometimes).

@sharkdp sharkdp force-pushed the david/stable-ordering branch from cd6a2d3 to 6e83ce9 Compare May 14, 2025 18:24
@sharkdp
Contributor Author

sharkdp commented May 14, 2025

> also increases the work necessary to normalize types (because it's now necessary to perform a deep comparison).

The benchmarks seem completely neutral: https://codspeed.io/astral-sh/ruff/branches/david%2Fstable-ordering

@AlexWaygood
Member

I'm not sure I fully understand yet whether the problem is:

  1. That the ordering isn't stable between runs, or
  2. That some part of ty incorrectly assumes that the ordering will be stable between runs.

@AlexWaygood
Member

> Besides not being deterministic, it was also incorrect. When comparing two equal unions A1 | B1 and A2 | B2, it was possible for one union to be ordered incorrectly (e.g. if both A1 and B1 were interned nominal instances), leading to incorrect type-relation results.

I don't fully understand this point. Could you possibly give an example?

@@ -149,6 +149,14 @@ impl<'db> ScopeId<'db> {
            NodeWithScopeKind::GeneratorExpression(_) => "<generator>",
        }
    }

    pub(crate) fn ordering(self, db: &'db dyn Db, other: Self) -> std::cmp::Ordering {
Member

I guess this would be tricky to enforce unless salsa avoids adding PartialOrd, Ord to the interned structs as pointed out by Micha and in the linked salsa discussion.

@@ -229,9 +232,15 @@ impl<'db> GenericContext<'db> {

        Specialization::new(db, self, expanded.into_boxed_slice())
    }

    pub(crate) fn ordering(self, db: &'db dyn Db, other: Self) -> Ordering {
        self.variables(db)
Member

Do we need to implement .ordering for TypeVarInstance as it's interned?

@MichaReiser
Member

> > I wonder if this is actually a problem. I'm not sure if it will be when we start garbage collecting interned values but is it today?
>
> I don't understand? astral-sh/ty#369 (comment) definitely feels like an actual problem to me. It's not just non-deterministic, it's also incorrect (sometimes).

I'm mainly questioning whether we need to override the ordering everywhere. We definitely should override it where it otherwise leads to incorrect ordering. I don't think we have to override it in cases where it's only about determinism, because IDs are then a very convenient and fast way to sort items arbitrarily (if the only goal is a fixed ordering rather than a specific semantic ordering).

It's not quite clear to me what you're fixing in this PR (at least one case seems semantic).

@sharkdp sharkdp marked this pull request as draft May 14, 2025 20:10
@carljm
Contributor

carljm commented May 15, 2025

I think our prior assumption was that ordering of types need only be deterministic within a given run of ty, because we should always normalize types as needed to ensure that type ordering differences don't have semantic effects. So if we fix the specific bugs where we fail to normalize or allow type ordering differences to make a semantic difference, that should make it unnecessary to establish a universally consistent type ordering.
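A minimal sketch of that idea, assuming a hypothetical `ElementId` wrapper and helper functions that are not part of ty: if both unions are normalized with the same in-run ordering before they are compared, the answer does not depend on which IDs the elements happened to receive.

```rust
/// Hypothetical interned element, compared by ID only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ElementId(u32);

/// Returns a normalized copy of a union's elements: sorted and deduplicated.
/// Any total order works here, as long as it is consistent within one run.
fn normalized(elements: &[ElementId]) -> Vec<ElementId> {
    let mut out = elements.to_vec();
    out.sort();
    out.dedup();
    out
}

/// Treats two unions as equivalent if their normalized element lists match.
fn unions_equivalent(a: &[ElementId], b: &[ElementId]) -> bool {
    normalized(a) == normalized(b)
}
```

For example, `unions_equivalent(&[x, y], &[y, x, y])` holds whether `x` was interned before `y` or after it; only the order of the normalized list changes between runs, not the result of the comparison.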

@sharkdp
Contributor Author

sharkdp commented May 15, 2025

> that should make it unnecessary to establish a universally consistent type ordering.

Until we implement persistent caching :-)

@MichaReiser
Member

> > that should make it unnecessary to establish a universally consistent type ordering.
>
> Until we implement persistent caching :-)

Haha, fair. Although salsa might solve this if the rematerialized structs retain their relative order.

@sharkdp sharkdp closed this May 15, 2025
Development

Successfully merging this pull request may close these issues.

Non-deterministic output on hydra-zen
5 participants