Reimplement Parser #2149

Open
wants to merge 99 commits into
base: main

@idavis idavis commented Feb 4, 2025

An intro to the PR with some useful links for the relevant sections:

This PR has two major components: a custom QASM3 parser, which replaces the IBM parser, and a refactored version of the compiler.

The custom QASM3 parser has three components:

  1. The raw lexer.
  2. The cooked lexer.
  3. Parser.

The refactored compiler has two components:

  4. Lowerer.
  5. Compiler.

Below is a more detailed overview of the purpose of each component:

The OpenQASM 3.0 grammar will be useful while reviewing sections 1-2
https://openqasm.com/grammar/index.html

  1. The raw lexer:
    Takes the source code as input and returns a stream of "raw tokens". You can think of these mostly as characters like '(' and type literals.

  2. The cooked lexer:
    Takes the raw token stream and returns a stream of cooked tokens. Cooked tokens have more knowledge about QASM3 syntax, and they make parsing easier.

  3. Parser:
    Takes the stream of cooked tokens as input and returns an AST. We call it the syntax AST, to differentiate it from the semantic AST, which will be introduced later.
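
To make the raw/cooked split concrete, here is a minimal sketch of the idea. It is not the code in this PR: the `RawToken` and `CookedToken` enums, the keyword list, and the `raw_lex` and `cook` functions are all hypothetical and cover only a tiny subset of QASM3. The raw pass only groups characters; the cooked pass drops whitespace, recognizes keywords, and fuses `=` `=` into a single `==` operator.

```rust
/// Raw tokens: little more than grouped characters (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
enum RawToken {
    Ident(String),
    Number(String),
    Punct(char),
    Whitespace,
}

/// Cooked tokens: aware of keywords and multi-character operators.
#[derive(Debug, Clone, PartialEq)]
enum CookedToken {
    Keyword(String),
    Ident(String),
    IntLiteral(i64),
    Op(String),
    Semicolon,
}

/// First pass: split the source into raw tokens without any language knowledge.
fn raw_lex(src: &str) -> Vec<RawToken> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let start = i;
        if c.is_whitespace() {
            while i < chars.len() && chars[i].is_whitespace() {
                i += 1;
            }
            out.push(RawToken::Whitespace);
        } else if c.is_ascii_alphabetic() || c == '_' {
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            out.push(RawToken::Ident(chars[start..i].iter().collect()));
        } else if c.is_ascii_digit() {
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            out.push(RawToken::Number(chars[start..i].iter().collect()));
        } else {
            i += 1;
            out.push(RawToken::Punct(c));
        }
    }
    out
}

/// Second pass: turn raw tokens into cooked tokens the parser can consume.
fn cook(raw: &[RawToken]) -> Vec<CookedToken> {
    // Assumed keyword subset, for illustration only.
    const KEYWORDS: [&str; 4] = ["bit", "qubit", "int", "def"];
    let mut out = Vec::new();
    let mut i = 0;
    while i < raw.len() {
        match &raw[i] {
            RawToken::Whitespace => {}
            RawToken::Ident(s) if KEYWORDS.contains(&s.as_str()) => {
                out.push(CookedToken::Keyword(s.clone()));
            }
            RawToken::Ident(s) => out.push(CookedToken::Ident(s.clone())),
            // A real lexer would report an error on overflow instead of using 0.
            RawToken::Number(s) => out.push(CookedToken::IntLiteral(s.parse().unwrap_or(0))),
            RawToken::Punct(';') => out.push(CookedToken::Semicolon),
            // Fuse two adjacent '=' raw tokens into one "==" operator.
            RawToken::Punct('=') if raw.get(i + 1) == Some(&RawToken::Punct('=')) => {
                out.push(CookedToken::Op("==".to_string()));
                i += 1;
            }
            RawToken::Punct(c) => out.push(CookedToken::Op(c.to_string())),
        }
        i += 1;
    }
    out
}
```

With this sketch, `cook(&raw_lex("bit x = 1;"))` yields a keyword, an identifier, an `=` operator, an integer literal, and a semicolon, which is the kind of stream a parser can pattern-match on directly.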

The grammar and specification of QASM3 don't fully agree, so we had to make some decisions while implementing the parser. They are documented in the Rust code. If you want to double check anything, the language spec will come in handy:
https://openqasm.com/language/index.html

  4. Lowerer:
    Takes the AST returned by the parser and performs semantic analysis on it. This is where we report any QASM3-related errors, like "functions can only be defined on the global scope." The lowerer returns a semantic AST. You will want to use the language spec to verify that the errors we check for are actually in the spec, and that we are not missing any checks.
    https://openqasm.com/language/index.html

  5. Compiler:
    Takes the semantic AST and compiles it to a Q# AST. This is where we report any Q#-related errors, like QASM3 features that are unsupported because they don't make sense in Q#. The compiler is a very straightforward mapping from the semantic AST to the Q# AST, since most of the heavy lifting is done during lowering.
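
As an illustration of the kind of check that belongs in the lowerer, here is a minimal sketch of the "functions can only be defined on the global scope" rule. The `Stmt` and `SemanticError` types and the `check_defs` function are hypothetical and far simpler than the AST in this PR; the actual lowerer tracks scopes through a symbol table rather than a boolean flag.

```rust
/// Hypothetical, heavily simplified syntax AST.
#[derive(Debug)]
enum Stmt {
    /// A function definition with a nested body.
    Def { name: String, body: Vec<Stmt> },
    /// Any nested scope, such as a block or loop body.
    Block(Vec<Stmt>),
    /// Every other statement kind, ignored by this check.
    Other,
}

#[derive(Debug, PartialEq)]
enum SemanticError {
    /// Carries the name of the offending function.
    DefNotInGlobalScope(String),
}

/// Walks the statements and records an error for every `def`
/// that appears anywhere other than the global scope.
fn check_defs(stmts: &[Stmt], is_global: bool, errors: &mut Vec<SemanticError>) {
    for stmt in stmts {
        match stmt {
            Stmt::Def { name, body } => {
                if !is_global {
                    errors.push(SemanticError::DefNotInGlobalScope(name.clone()));
                }
                // Anything inside a function body is no longer global.
                check_defs(body, false, errors);
            }
            Stmt::Block(inner) => check_defs(inner, false, errors),
            Stmt::Other => {}
        }
    }
}
```

Collecting errors into a vector instead of returning on the first one mirrors the description above: the lowerer reports the QASM3-related errors and still produces a semantic AST for the compiler to consume.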

Recommended sections for reviewers:

qsc_qasm3/src/lib.rs (Entry point and general structure)
Mine, Stefan

qsc_qasm3/src/parser/completion.rs (Completion)
qsc_qasm3/src/io.rs (IO)
Mine

qsc_qasm3/src/lex.rs (Raw and Cooked lexer)
Scott

qsc_qasm3/src/parser.rs (Parser)
qsc_qasm3/src/parser/ast.rs (QASM3 AST, double check against grammar & Spec)
Scott

qsc_qasm3/src/oqasm_helpers.rs
qsc_qasm3/src/semantic/types.rs (QASM3 types)
qsc_qasm3/src/semantic.rs (Lowerer entry point)
qsc_qasm3/src/lowerer.rs
This requires QASM3 specific knowledge. We can use as many eyes as possible.
Dmitry, Scott

qsc_qasm3/src/stdlib.rs
qsc_qasm3/src/stdlib/angle.rs
qsc_qasm3/src/stdlib/QasmStdrs
Dmitry

qsc_qasm3/src/types.rs (Types)
qsc_qasm3/src/ast_builder.rs
qsc_qasm3/src/compiler.rs (Compiler)
qsc_qasm3/src/runtime.rs (Runtime Features)
Stefan

fuzz/ (Testing)
Mine, Stefan

pip/src/ (Interop)
qsc/src/
qsc_codegen/src/
Mine, Stefan

.github/fuzz (GitHub pipeline)
Ian, Stefan

Tracking items

Lexing

  • Basic raw tokens
  • Basic cooked tokens
  • Cook pragma and annotation

Parsing

Lowering

Compiling

Fit and Finish

Postponed

  • Update ast_builder calls with module names instead of old ns names
  • bit x = 0; bit y = ~x; is valid QASM3, but it gets compiled to let x = Zero; let y = ~~~x; which is invalid Q# code, since the Result type in Q# doesn't support unary bitwise negation. The same applies to every other operation that bit should support; Q# Result only supports equality.
  • We are currently not enforcing that QASM3 uint values must be non-negative, since they are compiled to Q# Int, which is signed. We might need a UInt type similar to the Angle type introduced in Add angle support to the new compiler #2267 to be able to enforce this constraint.
  • uint 63 bitness with 64 bigint. Separate ops for uint?
  • Lower Ident as Rc<Symbol> and not as SymbolId to avoid an unnecessary SymbolTable lookup? In that way we don't even need to pass the symbol table to the compiler.
  • Use the formal parameters' SymbolIds when creating the function type, so that we can give the user a better error message when they pass an argument that fails implicit casting to a function. There is a catch: we need to insert the function symbol into the symbol table before pushing the scope where the formal parameters live, but we need the SymbolIds of the formal parameters to construct the function symbol.
  • Create qubit cleanup calls that can be conditionally added to the end of the program during compilation (not done for fragments). See Qubit release calls #2281.
  • Every time we allow an arithmetic lint in compiler/qsc_qasm3/src/semantic/ast/const_eval.rs we need to issue the same lint in Q#.
  • Double check cast from int to uint and cast from uint to int in const evaluator. Both types are represented as i64 so there is nothing to do, which is confusing.
  • Profile the usage of Box<[Box<T>]> with iai-callgrind in a large OpenQASM3 sample to verify that it is actually faster than using Vec. Even though Box uses less stack space, it reduces cache locality, because you need to jump around in memory to read contiguous elements of a list. I suspect that what we really want to use is Box<[T]>.
  • Input decls are pushed to the symbol table, but should not be in the stmts list. This may be an issue for tooling as there isn't a way to have a forward declared variable in Q#. compiler/qsc_qasm3/src/compiler.rs::QasmCompiler::compile_output_decl_stmt.
  • See the comment on compiler/qsc_qasm3/src/compiler.rs::QasmCompiler::create_entry_item saying "This can create a collision on multiple compiles when interactive. We also have issues with the new entry point inference logic."
  • Minimize the number of errors reported in the compiler. There are many situations where we report duplicated or unnecessary extra errors that could make it harder for the user to fix the actual problem. For example, see the test named fuzzer_issue_2294. The one valuable error there is the one saying "undefined symbol _".
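
For the uint items above, one possible shape for a range-checked wrapper is sketched below. This is only an assumption about how such a type could look, not the design chosen in this PR or in #2267: the `UInt` struct, the `CastError` enum, and the 63-bit width limit (taken from the "uint 63 bitness" note) are all hypothetical.

```rust
/// Hypothetical unsigned integer backed by an i64, as in the const evaluator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UInt {
    value: i64,
    width: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum CastError {
    Negative(i64),
    UnsupportedWidth(u32),
    Overflow { value: i64, width: u32 },
}

impl UInt {
    /// Assumed limit: only 63 bits of an i64 are usable for non-negative values.
    const MAX_WIDTH: u32 = 63;

    /// Casts a signed value to a `width`-bit unsigned value, rejecting
    /// negative inputs and values that do not fit in `width` bits.
    fn from_int(value: i64, width: u32) -> Result<Self, CastError> {
        if width == 0 || width > Self::MAX_WIDTH {
            return Err(CastError::UnsupportedWidth(width));
        }
        if value < 0 {
            return Err(CastError::Negative(value));
        }
        // Computed in u64 so that width == 63 does not overflow the shift result.
        let max = (1u64 << width) - 1;
        if value as u64 > max {
            return Err(CastError::Overflow { value, width });
        }
        Ok(UInt { value, width })
    }
}
```

Making the int-to-uint cast an explicit, fallible function would also address the confusion noted above, where both types share the i64 representation and the cast silently does nothing.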

@idavis idavis requested a review from orpuente-MS February 4, 2025 02:11
@idavis idavis self-assigned this Feb 4, 2025
@idavis idavis force-pushed the feature/qasm3 branch 2 times, most recently from eb82b24 to b803573 Compare March 18, 2025 21:48
@idavis idavis force-pushed the feature/qasm3 branch 2 times, most recently from 1ad7321 to 313b7af Compare March 27, 2025 16:28
@idavis idavis marked this pull request as ready for review April 9, 2025 21:48
Remove TODO comments from code and add them as Postponed items to the
tracking PR.