-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for GitHub-Flavoured Markdown tables #292
Support for GitHub-Flavoured Markdown tables #292
Conversation
These four new commits do the following:
I hope this is a useful contribution! |
Thank you very much for the contribution: It is most welcome! At a superficial survey, the extensions look great, and the additions to the |
I think following the lead of pandoc is a good choice, unless there's compelling reasons to differ :)
These aren't used for anything in Omd. I think they were only meant for debugging and possibly for interop with some other libraries? Thank for including a s-expr conversion! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally had a chance to review properly. Thanks for this lovely addition!
I left a few suggestions and questions. I should have bandwidth on Sunday to make the changes myself. But let me know if you'd rather make them yourself or if you find any of my suggestions objectionable.
@@ -41,6 +50,7 @@ module Make (C : BlockContent) = struct | |||
| Code_block of 'attr * string * string | |||
| Html_block of 'attr * string | |||
| Definition_list of 'attr * 'attr def_elt list | |||
| Table of 'attr * ('attr C.t * cell_alignment) list * 'attr C.t list list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's worth making the last two fields an inline record to identify for readers the header from the rows... not sure if that's worth it, but at least a comment to explain would be helpful.
(* Makes sure that there are the same number of delimiters | ||
as headers. See | ||
https://github.github.com/gfm/#example-203 *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for the helpful comments and references!
src/parser.ml
Outdated
@@ -755,25 +756,48 @@ let indented_code ind s = | |||
if indent s + ind < 4 then raise Fail; | |||
Lindented_code (StrSlice.offset (4 - ind) s) | |||
|
|||
(* A sequence of cell contents separated by unescaped '|' | |||
characters. *) | |||
let table_row pipe_prefix s = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a labeled arg for pipe_prefix
will help communicate the meaning of this combinator at call sites. E.g., table_row ~pipe_prefix:true
.
@@ -15,13 +15,23 @@ and inline = function | |||
| Text (_, s) -> Atom s | |||
| Emph (_, il) -> List [ Atom "emph"; inline il ] | |||
| Strong (_, il) -> List [ Atom "strong"; inline il ] | |||
| Code _ -> Atom "code" | |||
| Code _ -> Atom "code" (* FIXME: this seems broken? *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted in #293. Thanks!
src/strSlice.ml
Outdated
let take_n n s = | ||
if n < 0 then invalid_arg "take_n"; | ||
let len = min n s.len in | ||
{ s with len } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other take
also takes n
. I wonder if the name here might be clearer as take_prefix
or maybe even prefix length:int s
?
| `normal, c when c = sep -> Some (idx - s.off) | ||
| `normal, _ -> loop (idx + 1) `normal | ||
| `escape, _ -> loop (idx + 1) `normal | ||
| `verbatim_open n, '`' -> loop (idx + 1) (`verbatim_open (n + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really helpful use of polymorphic variants to document and track the state 👍
let trim s = | ||
let is_whitespace = function | ||
| ' ' | '\t' | '\010' .. '\013' -> true | ||
| _ -> false | ||
in | ||
drop_while is_whitespace (drop_last_while is_whitespace s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary for this pr but: tho it's only a few lines of code, but I still wonder if it's worth trying to share the function defined at https://github.com/ocaml/omd/blob/7b59e5926a1f811e1d89b4b37b163a459692d308/src/parser.ml#L11-L13
Could mean adding a util module or....
src/strSlice.mli
Outdated
@@ -13,6 +13,12 @@ val index : (char -> bool) -> t -> int option | |||
(** [index c s] is [Some i] where [i] is the index of the character in [s] for | |||
which [f] is first true, or [None] if [f] holds for no characters in [s]. *) | |||
|
|||
val index_unescaped : char -> t -> int option | |||
(** [index_unescaped c s] is [Some i] where [i] index of the first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(** [index_unescaped c s] is [Some i] where [i] index of the first | |
(** [index_unescaped c s] is [Some i] where [i] is the index of the first |
src/strSlice.mli
Outdated
(** [take_n n s] returns the slice consisting of the first [n] | ||
characters of [s]. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be consistent with the "equational idiom" used in the rest of the docs for this module:
(** [take_n n s] returns the slice consisting of the first [n] | |
characters of [s]. *) | |
(** [take_n n s] is the slice consisting of the first [n] | |
characters of [s]. *) |
</tr> | ||
</thead> | ||
</table> | ||
```````````````````````````````` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding such good test coverage for this feature! 🙏
support more github flavoury tables fix HTML generation for tables Switch to using slices for table lines
- Ignore `|` characters within code spans for the purposes of spotting table cell delimiters. Correctly allow for code spans containing `` ` `` characters. - Disallow tables with no columns. - Allow rows with no `|` characters, including rows that look like setext headings (sequences of `=` or `-`)
- Add a new kind of `element_type` to the HTML trees for the table elements `table`, `thead`, `tbody`, and `tr`. - This makes the HTML output match newline placement used the GFM specification.
New tests for the GFM-style table support, in the same style as the existing tests, comparing HTML output. 1. Excerpt the GFM specification's table examples, with one exception in the treatment of escape characters inside code spans (noted in the test file). 2. Twelve additional tests for various corner cases.
b40cc11
to
543418e
Compare
Thanks a lot for the fantastic work on this! @shonfeder do you think we could cut a release of omd? Having this would solve a rather annoying issue in ocaml.org: ocaml/ocaml.org#59 |
Hi @tmattio! I’ll make time to cut a new alpha release next week. :) |
@tmattio a bit delayed, but see ocaml/opam-repository#22654 |
Thanks a lot @shonfeder! |
CHANGES: - Expose the HTML escape function `htmlentities` (ocaml-community/omd#295 @cuihtlauac) - Support generation of identifiers in headers (ocaml-community/omd#294, @tatchi) - Support GitHub-Flavoured Markdown tables (ocaml-community/omd#292, @bobatkey) - Update parser to support CommonMark Spec 0.30 (ocaml-community/omd#266, @SquidDev) - Preserve the order of input files in the HTML output to stdout (ocaml-community/omd#258, @patricoferris) - Fix all deviations from CommonMark Spec 0.30 (ocaml-community/omd#284, ocaml-community/omd#283, ocaml-community/omd#278, ocaml-community/omd#277, ocaml-community/omd#269, @tatchi)
First cut of support for GitHub-Flavoured Markdown (GFM) tables. No tests yet, but I thought I'd submit a preliminary pull request to make sure that there was interest. This handles simple tables that always have headers, and only allow inline content in cells, and no multi-column or multi-row cells.
All of the existing tests pass.
Addresses part of issue #205 .
GFM-style tables are documented here: https://github.github.com/gfm/#tables-extension- . The code so far seems to work for all the examples in that document, but I haven't made a proper test suite yet, mostly because I don't yet understand how the testing works. I've noticed that there are a few differences between the GFM documentation and Pandoc's
gfm
implementation too.I added some functions to the
StrSlice
module to make the parsing of tables easier.There are choices in how to represent the cell alignment in the HTML output. I went for the one chosen by Pandoc (CSS style information), but the
align="left/right/center"
used in the GFM document might be better.I made up a converstion to Sexprs. I also noticed that inline code and images are not properly converted into Sexprs. What uses this translation?
I'm not sure if I'm handling link definitions properly in the case of lines at the end of blocks that look like they might be table headers, but aren't (line 80 in
block_parser.ml
).