Skip to content

Conversation

@Soulthym
Copy link
Collaborator

@Soulthym Soulthym commented Jul 18, 2025

This PR aims to solve #334
It adds proper type annotations, checking, and inference

Mainly, it splits Scalar types values into 4 possibilities:

  • _: an Untyped (scalar) type hole, used to represent a ScalarType that is not yet known or defined
  • uint: a constant unsigned integer, aimed to be used only as an index and type for RangeExpression
  • felt: a single field element
  • bool: a boolean value formed from a felt value

Aggregate types have a new syntax:

  • ScalarType[len]: a Vector containing len scalars of type ScalarType, where len is a positive integer:

    • _[len]: a Vector of len scalar type holes
    • uint[len]: a Vector of len unsigned integers
    • felt[len]: a Vector of len field elements
    • bool[len]: a Vector of len boolean values
  • ScalarType[rows, cols]: a Matrix containing rows rows and cols columns of type ScalarType, where rows and cols are positive integers:

    • _[rows, cols]: a Matrix of rows rows and cols columns of scalar type holes
    • uint[rows, cols]: a Matrix of rows rows and cols columns of unsigned integers
    • felt[rows, cols]: a Matrix of rows rows and cols columns of field elements
    • bool[rows, cols]: a Matrix of rows rows and cols columns of boolean values

I am unsure of which syntax to use for Matrices. I have re-used the one used in the Display implementation.
An alternative would be: SCALAR_TYPE[[cols], rows], which has the advantage of matching our notation for Tables (Matrices of unknown row length).
Both are possible and not an issue to change.

It also adds optional type annotations in the parser for:

  • const expressions: const x: TYPE = EXPR, where TYPE is one of the above types.
  • let expressions: let x: TYPE = EXPR;, where TYPE is one of the above types.
  • functions: fn func(x: TYPE) -> TYPE, where TYPE is one of the above types.
  • evaluator: ev evaluator(SCALAR_TYPE[a, b]), where SCALAR_TYPE is one of the above ScalarTypes and a, b are column labels

Most of individual pieces are implemented, but rely on a few fixes and improvements to work properly together. Those will be merged as soonn as the underlying issues are resolved.
I've marked those as [WIP] below.

Check-list:

  • [WIP] Parse for type annotations: WIP
    • type annotations
    • let x: TYPE = EXPR;: still a bug with type update order, a fix is on the way using a ty field in all/most AST nodes
    • fn func(x: TYPE) -> TYPE: relies on the let expression fix
    • ev evaluator(SCALAR_TYPE[a, b]): relies on the let expression fix
  • [WIP] Local Type inference:
    • BinaryExpression: a few cases are still not handled properly, work in progress.
      • May require tweaks depending on test results.
    • Other types:
      • Inference
      • [WIP] Add a ty field to all AST nodes that can have a type to preserve infered types + impl TypeMut.
  • [WIP] Type checking:
    • Local type checking during inference
    • [WIP] New type checking pass:
      • Check that all indices are constant uint values
      • Track narrowed/widened types accross all their uses to raise a diagnostic when an incompatibility is reached.
  • [WIP]: Tests
    • [WIP]: Fix existing tests due to errors/limitations in the type inference pass
    • [WIP]: Update tests where needed
    • [WIP]: Type Inference tests on complex expressions (such as BinaryExpression)
  • [WIP]: Update documentation
  • [WIP]: Remaining items in Separation between Felt, Bool, and Uint types #423 (comment)
  • add new tests with the new features:
    • booleans
    • uints
    • various shapes
    • bin_ops
    • aggregates (vec/matrix/mixed-types of upcastable shapes)
      Leaving it as a Draft PR until the issues outlined above are resolved.

@bobbinth
Copy link
Contributor

const expressions: const x: TYPE = EXPR, where TYPE is one of the above types.

Would we be able to verify at compile time that const expressions match the declared type? I'm mainly thinking of declaring a constant to be a bool type.

let expressions: let x: TYPE = EXPR;, where TYPE is one of the above types.

Similar question to the one above, but also, if we are able to verify the type of EXPR, I'm assuming that we can infer it as well. In such a case, do we really need this syntax?

@Soulthym
Copy link
Collaborator Author

const expressions: const x: TYPE = EXPR, where TYPE is one of the above types.

Would we be able to verify at compile time that const expressions match the declared type? I'm mainly thinking of declaring a constant to be a bool type.

Yes it should be able to verify annotations, both on const and let expressions.

let expressions: let x: TYPE = EXPR;, where TYPE is one of the above types.

Similar question to the one above, but also, if we are able to verify the type of EXPR, I'm assuming that we can infer it as well. In such a case, do we really need this syntax?

I do agree in principle that we don't really need it.
However, my thinking was that having those annotations provided more benefits than not having them, mainly:

  • they make testing the inference logic a lot easier, where the tests can be self contained within a .air script. There's no need to explicitely test every type in its ast representation, only some cases to make sure inference code paths get triggered, the rest can be done "in-line" in test scripts.
  • they allow a greater expressivity of what should be a valid type, and hence more guardrails a module could put in place against miss-using it
  • they provide an escape hatch1, should the inference be wrong (or may be too restrictive/permissive) but still reach production. Code with added annotations should remain compatible with a future inference fix, without requiring modifications or new audits.

Footnotes

  1. Note about the escape hatch: you can already do it by declaring a function with your expression as body, and calling it inline where the expression used to be with its arguments that get checked against argument types. I find that to be somewhat of a hacky fix and prefer optional typing on let/const

@Soulthym
Copy link
Collaborator Author

Soulthym commented Jul 22, 2025

On second thought, I don't think inference can be done perfectly in all cases anyway without making the type system too complex.

Take the following example with 2 mutually exclusive flags a and b:

1: let a: bool = main[0];
2: let b: bool = main[1];
3: enf a^2 = a;
4: enf b^2 = b;
5: let c: TYPE = a + b;
6: enf c^2 = c;
7: let d: TYPE2 = a + b;

What should TYPE and TYPE2 be? They could be both felt and bool, depending on context: line 5 and 6 could come from a different module than line 7, after inlining. The types may not even be equal, where c: TYPE could be inferred to be a bool and d: TYPE2 would be a felt before CSE (which happens last, after type-inference).

I think ensuring inference is conservative when ambiguous + adding an explicit type cast via let annotations that respect subtyping rules under variance would let the user disambiguate in case both interpretations were valid.

Here that would mean infering TYPE = TYPE2 = felt (because (a: bool) + (b: bool) may be a felt iff a == 1 and b == 1), and let the user cast it down to a bool where appropriate, which is a subtype of felt under covariance, so the cast would be valid.

@bobbinth
Copy link
Contributor

I think ensuring inference is conservative when ambiguous + adding an explicit type cast via let annotations that respect subtyping rules under variance would let the user disambiguate in case both interpretations were valid.

I think allowing the user to specify the type may lead to some confusion because the user my assume that specifying the type somehow enforces this type. If we do this, we should make sure to report very clear warnings that we don't actually know if what the user has specified is correct.

Take the following example with 2 mutually exclusive flags a and b:

1: let a: bool = main[0];
2: let b: bool = main[1];
3: enf a^2 = a;
4: enf b^2 = b;
5: let c: TYPE = a + b;
6: enf c^2 = c;
7: let d: TYPE2 = a + b;

A couple of comments here:

  1. We should use a special is_bool(x) primitive to allow the user to "narrow" the type. Under the hood, this will be translated into enf x^2 = x constraint.
  2. The type of a main trace column should always be felt and the user should not be able to change it without an extra constraint. So, for example let a: bool = main[0]; should not be valid.

So, I would re-write the above example as:

1: let a = main[0];
2: let b = main[1];
3: is_bool(a);
4: is_bool(b);
5: let c: TYPE = a + b;
6: is_bool(c);
7: let d: TYPE2 = a + b;

What should TYPE and TYPE2 be? They could be both felt and bool, depending on context: line 5 and 6 could come from a different module than line 7, after inlining. The types may not even be equal, where c: TYPE could be inferred to be a bool and d: TYPE2 would be a felt before CSE (which happens last, after type-inference).

In the above, TYPE would be bool (because we have is_bool(c) and TYPE2 would be felt (because nothing is constraining it).

Is the issue here that is_bool(c) comes after let c = a + b? If so, I wonder if we could modify the way is_bool() works specifically for type inference purposes. For example, could we write the above as:

1: let a = is_bool(main[0]);
2: let b = is_bool(main[1]);
3: let c = a + b;
4: let c = is_bool(c); // not sure if we allow variable shadowing, if not, we could use a different name
5: let d = a + b;

The constraints generated would be exactly the same as in the original example, but here, is_bool() is still the only way to narrow the type from felt to bool, but from inference standpoint, it acts kind of like a function that takes an expression and returns a boolean value.

@Soulthym
Copy link
Collaborator Author

Soulthym commented Jul 23, 2025

Is the issue here that is_bool(c) comes after let c = a + b? If so, I wonder if we could modify the way is_bool() works specifically for type inference purposes.

The order wasn't the issue, the example was meant to show valid type conversions, and their consequence before and after CSE (mainly that d would be detected as equal to c after CSE, and hence could be casted both as a bool and felt, but not before CSE).

If we narrow it down to bool only in the presence of a is_bool(x), that should work aswell, and shouldn't require explicit casting via type annotations.

On the other points you've mentioned, I do agree with that design, and it shouldn't be too big of a change to incorporate in this PR.

@bobbinth
Copy link
Contributor

The order wasn't the issue, the example was meant to show valid type conversions, and their consequence before and after CSE (mainly that d would be detected as equal to c after CSE, and hence could be casted both as a bool and felt, but not before CSE).

Makes sense. I think the can always do is_bool(d) and then (I'm hoping), common sub-expression elimination would be able to remove the duplicate constraints.

If we narrow it down to bool only in the presence of a is_bool(x), that should work aswell, and shouldn't require explicit casting via type annotations.

The only thing I'm wondering is whether it should be just is_bool(x) (as I had in my example) or enf is_bool(x). The latter would be more consistent, I think.

@Soulthym Soulthym force-pushed the feat-numeric-types branch from 65d0047 to 3d22950 Compare July 29, 2025 13:38
@Soulthym Soulthym force-pushed the feat-numeric-types branch from 3d22950 to 7f30ca7 Compare July 29, 2025 13:46
@Soulthym
Copy link
Collaborator Author

Hey @bobbinth!

I have rebased on next, done and merged the necessary tweaks we've discussed to the type system itself under the typing crate.

I am still merging in the integration into the existing codebase, based on next. That part should come fairly soon.

When implementing binary operations, I've noticed a few cases that, while technically correct, lose some information that could be used to refine the type system in a couple cases.

I opened a separate issue #432 to track it for now, since that shouldn't affect the soundness, while only adding precision in certain cases.

Do let me know if you'd like us to include it (or parts of it) in this PR!

@bobbinth
Copy link
Contributor

I opened a separate issue #432 to track it for now, since that shouldn't affect the soundness, while only adding precision in certain cases.

Do let me know if you'd like us to include it (or parts of it) in this PR!

This PR is pretty big as is - so, I'd probably leave it for a follow-up (unless including it in this PR will simplify things somehow).

@Soulthym
Copy link
Collaborator Author

Soulthym commented Aug 1, 2025

Hey @bobbinth!

I have merged the changes up to date in the current branch. It is now integrated in the pipeline.

We currently have the following structure:
./typing: type-system crate
./typing/src/lib.rs: type-system traits and their implementations for most types.
./typing/src/types.rs: type-system structures.

The subtyping rules are as described in
https://github.com/massalabs/air-script/blob/e7fadd4cbcff36d745525921c7ff6ec9c3c33623/typing/src/lib.rs#L72-L312

And those for binary operations (=+-*^) in https://github.com/massalabs/air-script/blob/e7fadd4cbcff36d745525921c7ff6ec9c3c33623/typing/src/types.rs#L465-L668

Notes:

  • currently an unknown type or (unknown scalartype) is a subtype of all (scalar) types. The discussion in Refine type inference #432 seems to suggest that it should actually be a supertype.
  • binary ops have rules for inference that are more precise than the current implementation, details are in those impl's doc comments.
  • Some tests are currently broken, I'll investigate those early next week.
  • I have removed the ability to cast explicitly, there are no more annotations besides function parameters.

I'll update the task list in the top message of this thread.

Remaining task-list:

  • fix left-over TODOs I've added when I did the integration.
  • add the assert_bool primitive.
  • refactor remaining ty() methods as impls of Typing.
  • add a infer_and_update_ty in TypeMut trait, plus call it where appropriate. It should check subtyping rules or error if invalid.
  • fix or adapt the currently broken tests:
    • annotate function parameters
    • manual tweaks to expected Mir/Ast where the type is more refined.
    • if some still dont pass, make unknown types supertypes of all others and adapt ./typing tests, see if that improves things.
    • if there are still tests not passing, restore the "symmetric" behaviour of BinType (args must be widened to the lowest common supertype, usually felt or one of the operand's types).
  • add a type_check method to Typing so it can be called anywhere in the pipeline, or a type_check pass.
  • document all public interfaces and new assumptions (I've mostly only documented subtyping rules and inference rules).

@bobbinth bobbinth requested review from bitwalker and bobbinth August 7, 2025 17:48
@Soulthym Soulthym force-pushed the feat-numeric-types branch from 95f7393 to b2c75b6 Compare August 8, 2025 10:06
@adr1anh adr1anh self-requested a review August 22, 2025 12:47
Copy link
Collaborator

@bitwalker bitwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is still a number of stubbed out bits/TODOs, but I'm approving as-is for now, contingent on my couple suggestions around the typing crate rename and how we manage dependencies within the workspace.

@Soulthym
Copy link
Collaborator Author

Hey @bitwalker, thanks for the review!

I've implemented your suggestions in 9419d39, and renamed the directory accordingly to improve clarity.

@Soulthym
Copy link
Collaborator Author

Soulthym commented Sep 1, 2025

Hey!

We've been able to sync about how Separation between Felt, Bool, and Uint types (#423) and Allow support for computed indices (#444) interact, and we think the best way would be to:

  • merge the Allow support for computed indices PR in next. This one should be reviewed in priority, so that we can proceed with the following items.
  • merge next in Separation between Felt, Bool, and Uint types
  • Update Separation between Felt, Bool, and Uint types to add the relevants checks for computed indices types

In the meantime, I'm merging other missing features/integrations into Separation between Felt, Bool, and Uint types, so while I wont be blocked immediately waiting for Allow support for computed indices, I will be in a few days, once the remaining features are integrated.

@adr1anh
Copy link
Contributor

adr1anh commented Sep 2, 2025

Given the changes you are planning to make to this branch, are there portions that are reviewable now/are not expected to change much?

@Soulthym
Copy link
Collaborator Author

Soulthym commented Sep 2, 2025

Given the changes you are planning to make to this branch, are there portions that are reviewable now/are not expected to change much?

The next commit should be just that: while it is retro-compatible at the moment, there are still overlaps before I integrate it fully on most crates (ast, sema, mir, air-types) I'll ping you as soon as it's at that stage!

@Soulthym
Copy link
Collaborator Author

Soulthym commented Sep 19, 2025

Hey @adr1anh

The tests are currently failing due to the missing type-checking pass (all errors seem to stem from an error due to some types being None), otherwise most changes are fully merged in.

From this point, not much should change besides the new pass, a few tests, and the assert_bool handling once Mir::Cast is merged in.

I've updated the task lists above, for a quick recap here is what's left:

Leftover tasks:

  • type inference + check for uint indices pass (wip)
  • cast primitive in Mir (wip)
  • test new features (wip)
  • documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants