Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor AST/Environment #3

Merged
merged 5 commits into from
Mar 16, 2021
Merged

Refactor AST/Environment #3

merged 5 commits into from
Mar 16, 2021

Conversation

markhalliwell
Copy link
Member

@colinodell this is more or less what I was thinking. It would be nice if there were a common namespace that both projects could inherit from, that way other libraries could pass along their own Node objects. As it stands now, I think this may be the direction I want to go in, cause it would allow us to create a TwemojiExtension that adds a custom renderer for all \UnicornFail\Emoji\Node\Inline\Unicode objects.

@markhalliwell markhalliwell added the enhancement New feature or request label Sep 21, 2020
Copy link
Collaborator

@colinodell colinodell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be able to review this in-depth until Friday or Saturday at the earliest, but I do have some initial, high-level thoughts.

src/Node/Node.php Outdated Show resolved Hide resolved
src/Input/Input.php Outdated Show resolved Hide resolved
src/Output/RenderedContentInterface.php Outdated Show resolved Hide resolved
src/Parser/Parser.php Outdated Show resolved Hide resolved
src/Renderer/Inline/TextRenderer.php Outdated Show resolved Hide resolved
@markhalliwell
Copy link
Member Author

@colinodell any further thoughts on this?

@colinodell
Copy link
Collaborator

@colinodell any further thoughts on this?

My apologies! I still owe you a review and feedback on everything else here - I'll try to get that done this weekend. I appreciate the ping!

@colinodell
Copy link
Collaborator

I'll give you some initial, high-level feedback now :)

At this point, I don't have a strong desire to split parts of league/commonmark into separate packages. The only parts I'd be somewhat inclined to split out would be the configuration and event systems, as they are very general-purpose. I don't envision splitting the AST or environment-specific code out as most of that is tightly coupled to the league/commonmark engine.

In terms of parsing input, I'm thinking about this mostly in the context of how it would fit integrate with league/commonmark. In my mind there are two possible ways you might parse emoji:

  • Making a custom inline parser to handle things like :alias: shortcode input
  • Making a custom event observer to check for emoji characters or emoticons inside of parsed Text elements

In the first case, your parser tells the engine to stop at any : - you then try to parse the next few characters as an alias. In the second case, you're looking for any Unicode characters that are valid emoji or emoticons that match a specific sequence of characters.

I believe the following hold true for both cases:

  • You don't need to worry about splitting things into lines - if there's a \n character somewhere it simply fails to match either :alias: or an emoji character. Therefore, \n behaves just like any other character and doesn't need special treatment.
  • If you find an :alias:, emoji, or emoticon you'll be splitting that out into an Emoji node of some kind. Everything else will be left as Text. Therefore you don't need a nested AST implementation, just the ability to split strings into Emoji or Text and put those into some kind of flat, linear list.

For those reasons, I don't think your input parsing approach or AST has to align with league/commonmark. You can still use some kind of AST and/or node types internally, though. Emoji nodes in a league/commonmark extension could still directly embed objects from this library in the nodes' properties if desired.

I have not reviewed the environment stuff here, and I'm sure I missed a few points you mentioned above, so I'll circle back to those (and any other thoughts you have) later this week.

@markhalliwell
Copy link
Member Author

A lot of this work was based on the previous conversations we had about extracting non-commonmark specific code. Things, like environment and config, are certainly not commonmark specific.

In any case, if there's no interest in sharing code then I can certainly simplify it to be less abstract here. My goal was only to stop having to constantly create many of these common requirements. In hindsight and given the amount of time that has passed, it probably makes more sense to (at least now) to not try and change everything.

Re: parsing/line splitting, the primary goal of this particular project is to be a standalone implementation for parsing and converting emojis, emoticons and shortcodes. This has nothing really to do with commonmark and users may need/want to parse a multiline document on its own.

While we could parse things in commonmark at a character level, I believe it's just easier to register DocumentParsedEvent in commonmark and then walk over all the text elements found (as thephpleague/commonmark#546 shows). Given the various types of emojis, emoticons and shortcodes that need to be supported, commonmark doesn't need to know how to parse these; just integrate with this implementation and let it do its own thing.

@markhalliwell
Copy link
Member Author

I've gone ahead and extracted the configuration stuff into its own repo/project and added you as an admin:

https://github.com/unicorn-fail/configuration

Once you deem that it's ready, it should be relatively easy to transfer into the league namespace.

@markhalliwell
Copy link
Member Author

I'm starting to pair down the environment stuff and went ahead and added a Twemoji extension so we can use a real use case scenario to help build it out to determine what is actually needed. Needless to say, this is still a WIP, so I'll ping you once its ready again so you don't accidentally waste your time reviewing.

@markhalliwell markhalliwell force-pushed the ast-environment branch 3 times, most recently from b53d161 to 95d4d91 Compare March 11, 2021 22:56
@markhalliwell
Copy link
Member Author

markhalliwell commented Mar 16, 2021

All tests/coverage is 100% again locally. Need to figure out what's going on with GH actions. I think it may be due to a PHP version/serialization discrepancy (pretty sure the archives need to be serialized in PHP 7.2 as things changed in 7.4).

@colinodell I know this is a lot of code to review, so I'm thinking I'm going to go ahead and merge it and we can follow-up with smaller issues/PRs moving forward?

edit: I re-read a lot of your comments and made adjustments accordingly to simplify and "flatten" the AST. There's still a lot of boiler code, but that's mainly due to the necessity for extensions (Twemoji).

@markhalliwell markhalliwell changed the title WIP: Basic implementation of commonmark’s AST & Environment code Refactor AST/Environment Mar 16, 2021
@markhalliwell markhalliwell merged commit 3903cc8 into latest Mar 16, 2021
@markhalliwell markhalliwell deleted the ast-environment branch March 16, 2021 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants