Initial cleanup and refactoring of the parser #286

Merged
shonfeder merged 31 commits into master from shon/parser-refactoring/pass-1
Sep 12, 2022

Conversation

shonfeder
Collaborator

@shonfeder shonfeder commented Sep 3, 2022

Context

To help clear the way for #223, we've agreed that some substantial refactoring of the parser passes should be undertaken to make it less unwieldy to extend and reason about (following the principle of "first make the change easy, then make the change").

This work is largely just preparatory, and the result of "active reading", which is adding comments, cleanup, and simplification as I study the existing parser. I'll resume this work later in the week.

Reviewing

There was some churn in the changes, but review may still benefit from skimming each commit before a final review, to see how the changes evolved.

I'd particularly like the review to ensure that the changes I'm proposing actually make the code more readable and easier to reason about. I don't want to be introducing changes for their own sake, so if you think any of my changes end up just as complicated or hard to reason about as what they are replacing (including the naming suggestions), please raise the alarm!

Thanks in advance for the review!

@shonfeder shonfeder force-pushed the shon/parser-refactoring/pass-1 branch from f5b759c to aa8fbb0 Compare September 3, 2022 03:43
src/parser.ml Outdated
@shonfeder shonfeder force-pushed the shon/parser-refactoring/pass-1 branch 2 times, most recently from cfef1b1 to 2dd3255 Compare September 5, 2022 03:45
The `last` and `tail` functions were each two functions combined into
one, differentiated by a `~rev` flag. This imposes unnecessary overhead
when trying to read the code (IMO). This change replaces the flag-based
usage with two declaratively named functions.

We also rename the `ws` function to the more accurate `trim_ws`, since
it trims white space, and make the flag a boolean, rather than an
optional unit.
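
As a rough illustration of the flag-to-named-function change described in this commit message, here is a minimal sketch on plain strings. The names `tail_flagged` and `drop_last`, and the exact trimming semantics, are assumptions made for illustration; they are not the definitions in `parser.ml`.

```ocaml
(* Hypothetical "before": one function, two behaviours behind a flag. *)
let tail_flagged ?(rev = false) n s =
  let len = String.length s in
  let n = max 0 (min n len) in
  if rev then String.sub s 0 (len - n) else String.sub s n (len - n)

(* Hypothetical "after": the name says which end is affected. *)
let drop n s =
  let len = String.length s in
  let n = max 0 (min n len) in
  String.sub s n (len - n)

let drop_last n s =
  let len = String.length s in
  let n = max 0 (min n len) in
  String.sub s 0 (len - n)

let is_ws = function ' ' | '\t' | '\n' | '\r' -> true | _ -> false

(* Boolean flag with a default, so call sites read [trim_ws ~rev:true s]
   rather than the optional-unit style [ws ~rev:() s]. *)
let trim_ws ?(rev = false) s =
  let len = String.length s in
  if rev then begin
    (* Trim trailing whitespace. *)
    let rec stop i = if i > 0 && is_ws s.[i - 1] then stop (i - 1) else i in
    String.sub s 0 (stop len)
  end else begin
    (* Trim leading whitespace. *)
    let rec start i = if i < len && is_ws s.[i] then start (i + 1) else i in
    let i = start 0 in
    String.sub s i (len - i)
  end
```

With the named versions, a call such as `drop_last 2 s` can be read without looking up what the flag means.
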
Afaik, `drop` is the usual name for this function.

Also adds validation logic to ensure the slice cannot exceed the bounds
of the underlying string.
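
A minimal sketch of what a bounds-checked slice with `drop` might look like. The module name `Slice`, the record fields, and the error messages are hypothetical; the real `StrSlice` module may be structured differently.

```ocaml
module Slice : sig
  type t
  val of_string : ?off:int -> ?len:int -> string -> t
  val to_string : t -> string
  val length : t -> int
  val drop : int -> t -> t
end = struct
  (* A view into [base]: [len] characters starting at offset [off]. *)
  type t = { base : string; off : int; len : int }

  let of_string ?(off = 0) ?len base =
    let len =
      match len with Some l -> l | None -> String.length base - off
    in
    (* Reject any slice that would reach outside the underlying string. *)
    if off < 0 || len < 0 || off + len > String.length base then
      invalid_arg "Slice.of_string: slice exceeds bounds of base string";
    { base; off; len }

  let to_string { base; off; len } = String.sub base off len
  let length s = s.len

  (* Drop the first [n] characters, stopping at the end of the slice. *)
  let drop n s =
    if n < 0 then invalid_arg "Slice.drop: negative count";
    let n = min n s.len in
    { s with off = s.off + n; len = s.len - n }
end
```

Because the type is abstract, `of_string` is the only way to build a slice in this sketch, so the bounds check cannot be bypassed.
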
Helps reduce cognitive overload in trying to grapple with `parser.ml` and
gives a more accurate name to the module.
The aim is to enable the reader to see what we are testing for at a
glance, to reduce the cognitive load of reading each line.
Replace logic with drop_while and separate into three distinct functions.
- Use uncons to simplify logic
- Separate helper function def from case analysis
- Simplify failure logic
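
To make the `uncons` and `drop_while` commit messages above concrete, here is a small sketch over a hypothetical slice record. The type, field names, and the `drop_spaces` helper are assumptions; the commits do not show which three functions the logic was split into.

```ocaml
(* Hypothetical slice: [len] characters of [base] starting at [off]. *)
type slice = { base : string; off : int; len : int }

let of_string base = { base; off = 0; len = String.length base }
let to_string s = String.sub s.base s.off s.len

(* Split a slice into its first character and the rest, if there is one. *)
let uncons s =
  if s.len = 0 then None
  else Some (s.base.[s.off], { s with off = s.off + 1; len = s.len - 1 })

(* Drop leading characters for as long as the predicate [p] holds. *)
let rec drop_while p s =
  match uncons s with
  | Some (c, rest) when p c -> drop_while p rest
  | Some _ | None -> s

(* A helper defined separately from any case analysis that uses it. *)
let drop_spaces s = drop_while (fun c -> c = ' ') s
```

Matching on `uncons` handles the empty case and the non-empty case in one place, which avoids a separate length check before each index access.
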
@shonfeder shonfeder force-pushed the shon/parser-refactoring/pass-1 branch from 2dd3255 to 75805d8 Compare September 5, 2022 03:45
@shonfeder
Collaborator Author

It's a long weekend in Canada, so I'll keep doing the cleanup here through tomorrow, then I'll offer up whatever I have done by EOD for review :)

@shonfeder shonfeder force-pushed the shon/parser-refactoring/pass-1 branch from 67204db to 03cb3fb Compare September 5, 2022 17:38
Allows us to use more recent stdlib methods while maintaining
compat with older versions of the compiler.
@shonfeder shonfeder force-pushed the shon/parser-refactoring/pass-1 branch from db0e092 to 28b38e5 Compare September 5, 2022 23:51
These will enable cleaner logic in the parsing
Using a fold we can further simplify the logic, and ditch the explicit loop.
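
The commit does not show which loop was replaced, so the following is only a generic sketch of the loop-to-fold idea, using a hypothetical character-counting function. It goes through `String.to_seq` and `Seq.fold_left` because `String.fold_left` only exists from OCaml 4.13.

```ocaml
(* Explicit loop with a mutable counter. *)
let count_char_loop c s =
  let n = ref 0 in
  for i = 0 to String.length s - 1 do
    if s.[i] = c then incr n
  done;
  !n

(* Same computation as a fold: the accumulator replaces the mutable state. *)
let count_char_fold c s =
  Seq.fold_left
    (fun n c' -> if c' = c then n + 1 else n)
    0 (String.to_seq s)
```
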
@shonfeder
Collaborator Author

That's what I'll be getting done this weekend! Merging will be blocked until I sort out thierry-martinez/stdcompat#26, but this should be ready for review (mainly just to keep it from getting too long).

I'll resume this superficial pass of parsing later in the week, or on the weekend at the latest. I think I'm forming some ideas for more substantial refactoring, and I'll open issues to plan those out as they become more formed.

@shonfeder shonfeder self-assigned this Sep 6, 2022
@shonfeder shonfeder changed the title WIP: Initial cleanup and refactoring of the parser Initial cleanup and refactoring of the parser Sep 6, 2022
@shonfeder shonfeder marked this pull request as ready for review September 6, 2022 02:20
src/block.ml Outdated
@@ -1,5 +1,5 @@
 open Ast
-module Sub = Parser.Sub
+module Sub = StrSlice
Collaborator Author

I'm not entirely sure what I should do about this naming. Is Sub a good name for the module? I find it cryptic and inaccurate, under the view that it's not really about substrings, per se, but about slices over a base. But maybe that's more of an implementation detail? Really, it's more like an alternative representation of strings based on the view of a slice...

WDYT?

Collaborator

I find StrSlice a better name than Parser.Sub.

Collaborator Author

Cool. I think I do too. I dropped the Sub alias.

@shonfeder
Collaborator Author

Interesting! It looks like some of our custom Compat functions are compatible with 4.14, but break behavior related to unicode on older versions of OCaml! I'll have to restore our Compat, probably as an overlay on top of Stdcompat, and just for whichever function is doing the magic here.

src/parser.ml Outdated
@@ -139,24 +100,23 @@ end = struct

type 'a t = state -> 'a

let ensure_chars_remain st = if st.pos >= String.length st.str then raise Fail
Collaborator

Nitpick, but ensure_chars_remain to me sounds like it prevents chars from being removed whereas it is more like "ensure there are still chars". Can't think of an actual good name. Admittedly not great suggestions:

  • ensure_remaining_chars
  • ensure_not_at_end
  • ensure_end_not_reached
  • fail_if_no_more_chars

Collaborator Author

I can see the potential for confusion. Since it's only being used in 2 places, maybe best just to remove this? It also lets us avoid conditional statements and use conditional expressions instead. WDYT? 6611e25
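
For readers following along, the difference between the guard statement and an inlined conditional expression can be sketched as below. The `state` record and the `peek` functions are assumptions for illustration; only `ensure_chars_remain` and `Fail` appear in the diff above.

```ocaml
exception Fail

(* Assumed shape of the parser state, based on the fields used in the diff. *)
type state = { str : string; mutable pos : int }

(* Guard statement: checked for its effect, then the real work follows. *)
let ensure_chars_remain st = if st.pos >= String.length st.str then raise Fail

let peek_with_guard st =
  ensure_chars_remain st;
  st.str.[st.pos]

(* Conditional expression: the check and the result live in one expression. *)
let peek st =
  if st.pos >= String.length st.str then raise Fail else st.str.[st.pos]
```

Both versions raise `Fail` at the end of input; the second simply has no separately named helper to misread.
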

src/parser.ml Outdated
Comment on lines 196 to 197
| Lsetext_heading of int * int
(** the level of the heading and how long the underline marker is *)
Collaborator

Should we consider inline records here if they don't make later code more verbose?

Collaborator Author

Good idea!
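
A minimal sketch of the inline-record suggestion, for comparison with the tuple payload. The field names `level` and `len`, the `_t` suffix on the tuple variant, and the helper functions are hypothetical; the thread does not show the final definition.

```ocaml
(* Tuple payload: the reader must remember which int is which. *)
type line_tuple = Lsetext_heading_t of int * int

let level_of_tuple (Lsetext_heading_t (level, _)) = level

(* Inline record payload: the field names carry the documentation. *)
type line = Lsetext_heading of { level : int; len : int }

let describe = function
  | Lsetext_heading { level; len } ->
      Printf.sprintf "setext heading, level %d, underline length %d" level len
```

With field punning, pattern matches on the record form stay about as short as the tuple form, which speaks to the concern about later code becoming more verbose.
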

@tatchi
Collaborator

tatchi commented Sep 9, 2022

> Interesting! It looks like some of our custom Compat functions are compatible with 4.14, but break behavior related to unicode on older versions of OCaml! I'll have to restore our Compat, probably as an overlay on top of Stdcompat, and just for whichever function is doing the magic here.

Another option could be to use the Uutf.Buffer.add_utf_8 function instead (and get rid of the Compat module). See my commit

@shonfeder
Collaborator Author

That's perfect. Thanks for the fix, @tatchi!

@shonfeder shonfeder force-pushed the shon/parser-refactoring/pass-1 branch from d7dd919 to b3cb00c Compare September 12, 2022 02:35
@@ -24,7 +24,8 @@ Additionally, OMD implements a few Github markdown features, an
extension mechanism, and some other features. Note that the opam
package installs both the OMD library and the command line tool `omd`.")
(tags (org:ocamllabs org:mirage))
-(depends (ocaml (>= 4.05))
+(depends (ocaml (>= 4.08))
Collaborator Author

Note that I'm suggesting raising the minimum OCaml version here. I think this library is high-level enough, and has few enough dependencies, that we don't need to weigh ourselves down trying to maintain backwards compatibility with versions that much more central libraries have already dropped support for.

@shonfeder
Collaborator Author

Thanks to @tatchi's suggestion and my suggestion to increase our minimum OCaml version to 4.08 (for the time being at least) I think this should be passing all CI now and ready for another review.

Collaborator

@tatchi tatchi left a comment

Nice refactor 👏 Thanks for doing that!

@shonfeder shonfeder force-pushed the shon/parser-refactoring/pass-1 branch from b3cb00c to 612f35a Compare September 12, 2022 13:00
@shonfeder shonfeder enabled auto-merge September 12, 2022 13:08
@shonfeder shonfeder merged commit 7e78fec into master Sep 12, 2022
@shonfeder shonfeder deleted the shon/parser-refactoring/pass-1 branch September 12, 2022 21:55