Refactor AST/Environment #3

markhalliwell · 2020-09-21T01:27:59Z

@colinodell this is more or less what I was thinking. It would be nice if there were a common namespace that both projects could inherit from, that way other libraries could pass along their own Node objects. As it stands now, I think this may be the direction I want to go in, cause it would allow us to create a TwemojiExtension that adds a custom renderer for all \UnicornFail\Emoji\Node\Inline\Unicode objects.

colinodell

I won't be able to review this in-depth until Friday or Saturday at the earliest, but I do have some initial, high-level thoughts.

src/Node/Node.php

src/Input/Input.php

src/Output/RenderedContentInterface.php

src/Parser/Parser.php

src/Renderer/Inline/TextRenderer.php

src/Environment/AbstractConfigurableEnvironment.php

src/Environment/Environment.php

markhalliwell · 2021-03-01T22:42:04Z

@colinodell any further thoughts on this?

colinodell · 2021-03-01T23:07:10Z

@colinodell any further thoughts on this?

My apologies! I still owe you a review and feedback on everything else here - I'll try to get that done this weekend. I appreciate the ping!

colinodell · 2021-03-01T23:46:32Z

I'll give you some initial, high-level feedback now :)

At this point, I don't have a strong desire to split parts of league/commonmark into separate packages. The only parts I'd be somewhat inclined to split out would be the configuration and event systems, as they are very general-purpose. I don't envision splitting the AST or environment-specific code out as most of that is tightly coupled to the league/commonmark engine.

In terms of parsing input, I'm thinking about this mostly in the context of how it would fit integrate with league/commonmark. In my mind there are two possible ways you might parse emoji:

Making a custom inline parser to handle things like :alias: shortcode input
Making a custom event observer to check for emoji characters or emoticons inside of parsed Text elements

In the first case, your parser tells the engine to stop at any : - you then try to parse the next few characters as an alias. In the second case, you're looking for any Unicode characters that are valid emoji or emoticons that match a specific sequence of characters.

I believe the following hold true for both cases:

You don't need to worry about splitting things into lines - if there's a \n character somewhere it simply fails to match either :alias: or an emoji character. Therefore, \n behaves just like any other character and doesn't need special treatment.
If you find an :alias:, emoji, or emoticon you'll be splitting that out into an Emoji node of some kind. Everything else will be left as Text. Therefore you don't need a nested AST implementation, just the ability to split strings into Emoji or Text and put those into some kind of flat, linear list.

For those reasons, I don't think your input parsing approach or AST has to align with league/commonmark. You can still use some kind of AST and/or node types internally, though. Emoji nodes in a league/commonmark extension could still directly embed objects from this library in the nodes' properties if desired.

I have not reviewed the environment stuff here, and I'm sure I missed a few points you mentioned above, so I'll circle back to those (and any other thoughts you have) later this week.

markhalliwell · 2021-03-02T21:01:59Z

A lot of this work was based on the previous conversations we had about extracting non-commonmark specific code. Things, like environment and config, are certainly not commonmark specific.

In any case, if there's no interest in sharing code then I can certainly simplify it to be less abstract here. My goal was only to stop having to constantly create many of these common requirements. In hindsight and given the amount of time that has passed, it probably makes more sense to (at least now) to not try and change everything.

Re: parsing/line splitting, the primary goal of this particular project is to be a standalone implementation for parsing and converting emojis, emoticons and shortcodes. This has nothing really to do with commonmark and users may need/want to parse a multiline document on its own.

While we could parse things in commonmark at a character level, I believe it's just easier to register DocumentParsedEvent in commonmark and then walk over all the text elements found (as thephpleague/commonmark#546 shows). Given the various types of emojis, emoticons and shortcodes that need to be supported, commonmark doesn't need to know how to parse these; just integrate with this implementation and let it do its own thing.

markhalliwell · 2021-03-03T00:17:32Z

I've gone ahead and extracted the configuration stuff into its own repo/project and added you as an admin:

https://github.com/unicorn-fail/configuration

Once you deem that it's ready, it should be relatively easy to transfer into the league namespace.

Signed-off-by: Mark Carver <[email protected]>

markhalliwell · 2021-03-05T22:47:14Z

I'm starting to pair down the environment stuff and went ahead and added a Twemoji extension so we can use a real use case scenario to help build it out to determine what is actually needed. Needless to say, this is still a WIP, so I'll ping you once its ready again so you don't accidentally waste your time reviewing.

…case

…deredContent

markhalliwell · 2021-03-16T15:48:15Z

All tests/coverage is 100% again locally. Need to figure out what's going on with GH actions. I think it may be due to a PHP version/serialization discrepancy (pretty sure the archives need to be serialized in PHP 7.2 as things changed in 7.4).

@colinodell I know this is a lot of code to review, so I'm thinking I'm going to go ahead and merge it and we can follow-up with smaller issues/PRs moving forward?

edit: I re-read a lot of your comments and made adjustments accordingly to simplify and "flatten" the AST. There's still a lot of boiler code, but that's mainly due to the necessity for extensions (Twemoji).

markhalliwell requested a review from colinodell September 21, 2020 01:27

markhalliwell added the enhancement New feature or request label Sep 21, 2020

colinodell reviewed Sep 21, 2020

View reviewed changes

markhalliwell commented Sep 21, 2020

View reviewed changes

src/Environment/AbstractConfigurableEnvironment.php Outdated Show resolved Hide resolved

markhalliwell commented Sep 21, 2020

View reviewed changes

src/Environment/Environment.php Outdated Show resolved Hide resolved

markhalliwell and others added 2 commits March 3, 2021 15:20

WIP: Basic implementation of commonmark’s AST & Environment code

3f34a1a

Signed-off-by: Mark Carver <[email protected]>

Replace configuration with league/configuration

63b164b

markhalliwell force-pushed the ast-environment branch from 49e530d to 63b164b Compare March 3, 2021 21:21

markhalliwell force-pushed the ast-environment branch 3 times, most recently from b53d161 to 95d4d91 Compare March 11, 2021 22:56

Start to simplify environment and add twemoji extension for real use …

daa46e9

…case

markhalliwell force-pushed the ast-environment branch from 95d4d91 to daa46e9 Compare March 12, 2021 23:00

Simplify Document and Node, remove unnecessary Input, NodeWalker, Ren…

f8f8662

…deredContent

markhalliwell force-pushed the ast-environment branch from 07e3c17 to f8f8662 Compare March 16, 2021 15:31

markhalliwell requested a review from colinodell March 16, 2021 15:44

markhalliwell changed the title ~~WIP: Basic implementation of commonmark’s AST & Environment code~~ Refactor AST/Environment Mar 16, 2021

Rebuild dataset archives and fix composer.lock to work with PHP 7.2+

c2e4fb1

markhalliwell merged commit 3903cc8 into latest Mar 16, 2021

markhalliwell deleted the ast-environment branch March 16, 2021 20:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor AST/Environment #3

Refactor AST/Environment #3

markhalliwell commented Sep 21, 2020

colinodell left a comment

markhalliwell commented Mar 1, 2021

colinodell commented Mar 1, 2021

colinodell commented Mar 1, 2021

markhalliwell commented Mar 2, 2021

markhalliwell commented Mar 3, 2021

markhalliwell commented Mar 5, 2021

markhalliwell commented Mar 16, 2021 •

edited

Loading

Refactor AST/Environment #3

Refactor AST/Environment #3

Conversation

markhalliwell commented Sep 21, 2020

colinodell left a comment

Choose a reason for hiding this comment

markhalliwell commented Mar 1, 2021

colinodell commented Mar 1, 2021

colinodell commented Mar 1, 2021

markhalliwell commented Mar 2, 2021

markhalliwell commented Mar 3, 2021

markhalliwell commented Mar 5, 2021

markhalliwell commented Mar 16, 2021 • edited Loading

markhalliwell commented Mar 16, 2021 •

edited

Loading