@wpbonelli
Contributor

@wpbonelli wpbonelli commented Mar 9, 2025

Draft lark parser for MF6 input files. It uses LALR and refuses the temptation to let ambiguity creep in. TBD whether it is better to keep the parser minimal and do more in transformation, or to let the parser do more work. In the latter case we may need to give it definition info to handle the looser varieties of list-based input.

  • base grammar
  • component grammar generation
  • successfully parse all input files
  • implement a transformer
  • basic benchmarking

@wpbonelli
Contributor Author

wpbonelli commented Jun 26, 2025

I did some homework.

To go fast we should push as much knowledge as possible into the parser. Why?

  • A more generic parser needs a smarter transformer, and the transformer will be slower. A smarter parser returns a more structured parse tree which needs less transformation.
  • A keyword-agnostic parser needs more manual validation after load. A keyword-aware parser validates keywords as part of parsing, which is faster, and it can give nice error messages for unexpected keywords.

Tentative plan:

  • define a base grammar that knows about field formats
  • define a component-specific grammar template (Jinja?) importing the base grammar
  • for each MF6 component, generate a block- and keyword-aware grammar

This way we get a readable/diffable/unambiguous grammar for each component. Parsers will use LALR so they will be fast. We avoid complicated transformations, and we no longer need to manually check that blocks only contain expected variables.
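A rough sketch of the generation step, to make the plan concrete. Everything here is an assumption for illustration: the base terminals, the shape of the component spec, and the rule naming are hypothetical, and plain string formatting stands in for Jinja so the sketch has no dependencies. It also ignores comments and list-based input.

```python
from typing import Dict, Optional

# Hypothetical base grammar: terminals shared by every component.
# Terminal names and patterns are illustrative, not the real MF6 field formats.
BASE_GRAMMAR = r"""
INTEGER: /[+-]?\d+/
_NL: /(\r?\n[ \t]*)+/
%import common.WS_INLINE
%ignore WS_INLINE
""".strip()

# Hypothetical component definition:
# block name -> {keyword: value terminal, or None for a bare keyword}.
EXAMPLE_SPEC: Dict[str, Dict[str, Optional[str]]] = {
    "options": {"PRINT_INPUT": None, "SAVE_FLOWS": None},
    "dimensions": {"MAXBOUND": "INTEGER"},
}


def render_block(block: str, keywords: Dict[str, Optional[str]]) -> str:
    """Render the rules for one block and the keywords allowed inside it."""
    tag = block.upper()
    field_rules = [f"{block}_{kw.lower()}" for kw in keywords]
    lines = [
        # BEGIN <BLOCK> ... END <BLOCK>, matched case-insensitively.
        f'{block}: "BEGIN"i "{tag}"i _NL {block}_field* "END"i "{tag}"i _NL',
        # Only the keywords listed in the spec are accepted in this block.
        f"{block}_field: " + " | ".join(field_rules),
    ]
    for rule, (kw, value) in zip(field_rules, keywords.items()):
        body = f'"{kw}"i' + (f" {value}" if value else "")
        lines.append(f"{rule}: {body} _NL")
    return "\n".join(lines)


def render_component_grammar(spec: Dict[str, Dict[str, Optional[str]]]) -> str:
    """Combine block rules with the base grammar into one grammar text."""
    header = ["start: _NL? block+", "block: " + " | ".join(spec)]
    blocks = [render_block(name, kws) for name, kws in spec.items()]
    return "\n".join(header + blocks + [BASE_GRAMMAR]) + "\n"


if __name__ == "__main__":
    print(render_component_grammar(EXAMPLE_SPEC))
```

The generated text is meant to be handed to lark, e.g. `Lark(grammar, parser="lalr")`. Because each block rule only lists its own keywords, an unexpected keyword should surface as a parse error rather than needing a separate check after load.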

And down the road, if for whatever reason we want to distribute parsers without depending on lark, we could generate a standalone parser for each grammar. But I see no reason to do that now.
