diff --git a/.travis.yml b/.travis.yml index 208fd25b..b0b525a5 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,18 +1,17 @@ -# Travis CI (http://travis-ci.org/) is a continuous integration service for -# open source projects. This file configures it to run unit tests for -# blackfriday. - +sudo: false language: go - go: - - 1.2 - - 1.3 - - 1.4 - - 1.5 - + - "1.10.x" + - "1.11.x" + - tip +matrix: + fast_finish: true + allow_failures: + - go: tip install: - - go get -d -t -v ./... - - go build -v ./... - + - # Do nothing. This is needed to prevent default install action "go get -t -v ./..." from happening here (we want it to happen inside script step). script: - - go test -v ./... + - go get -t -v ./... + - diff -u <(echo -n) <(gofmt -d -s .) + - go tool vet . + - go test -v ./... diff --git a/README.md b/README.md index 7b979700..d9c08a22 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,6 @@ -Blackfriday [![Build Status](https://travis-ci.org/russross/blackfriday.svg?branch=master)](https://travis-ci.org/russross/blackfriday) +Blackfriday +[![Build Status][BuildV2SVG]][BuildV2URL] +[![PkgGoDev][PkgGoDevV2SVG]][PkgGoDevV2URL] =========== Blackfriday is a [Markdown][1] processor implemented in [Go][2]. It @@ -8,7 +10,7 @@ punctuation substitutions, etc.), and it is safe for all utf-8 (unicode) input. HTML output is currently supported, along with Smartypants -extensions. An experimental LaTeX output engine is also included. +extensions. It started as a translation from C of [Sundown][3]. @@ -16,63 +18,98 @@ It started as a translation from C of [Sundown][3]. Installation ------------ -Blackfriday is compatible with Go 1. If you are using an older -release of Go, consider using v1.1 of blackfriday, which was based -on the last stable release of Go prior to Go 1. You can find it as a -tagged commit on github. +Blackfriday is compatible with modern Go releases in module mode. +With Go installed: -With Go 1 and git installed: + go get github.com/russross/blackfriday/v2 - go get github.com/russross/blackfriday +will resolve and add the package to the current development module, +then build and install it. Alternatively, you can achieve the same +if you import it in a package: -will download, compile, and install the package into your `$GOPATH` -directory hierarchy. Alternatively, you can achieve the same if you -import it into a project: - - import "github.com/russross/blackfriday" + import "github.com/russross/blackfriday/v2" and `go get` without parameters. +Legacy GOPATH mode is unsupported. + + +Versions +-------- + +Currently maintained and recommended version of Blackfriday is `v2`. It's being +developed on its own branch: https://github.com/russross/blackfriday/tree/v2 and the +documentation is available at +https://pkg.go.dev/github.com/russross/blackfriday/v2. + +It is `go get`-able in module mode at `github.com/russross/blackfriday/v2`. + +Version 2 offers a number of improvements over v1: + +* Cleaned up API +* A separate call to [`Parse`][4], which produces an abstract syntax tree for + the document +* Latest bug fixes +* Flexibility to easily add your own rendering extensions + +Potential drawbacks: + +* Our benchmarks show v2 to be slightly slower than v1. Currently in the + ballpark of around 15%. +* API breakage. If you can't afford modifying your code to adhere to the new API + and don't care too much about the new features, v2 is probably not for you. +* Several bug fixes are trailing behind and still need to be forward-ported to + v2. See issue [#348](https://github.com/russross/blackfriday/issues/348) for + tracking. + +If you are still interested in the legacy `v1`, you can import it from +`github.com/russross/blackfriday`. Documentation for the legacy v1 can be found +here: https://pkg.go.dev/github.com/russross/blackfriday. + + Usage ----- -For basic usage, it is as simple as getting your input into a byte -slice and calling: +For the most sensible markdown processing, it is as simple as getting your input +into a byte slice and calling: - output := blackfriday.MarkdownBasic(input) +```go +output := blackfriday.Run(input) +``` -This renders it with no extensions enabled. To get a more useful -feature set, use this instead: +Your input will be parsed and the output rendered with a set of most popular +extensions enabled. If you want the most basic feature set, corresponding with +the bare Markdown specification, use: - output := blackfriday.MarkdownCommon(input) +```go +output := blackfriday.Run(input, blackfriday.WithNoExtensions()) +``` ### Sanitize untrusted content Blackfriday itself does nothing to protect against malicious content. If you are -dealing with user-supplied markdown, we recommend running blackfriday's output -through HTML sanitizer such as -[Bluemonday](https://github.com/microcosm-cc/bluemonday). +dealing with user-supplied markdown, we recommend running Blackfriday's output +through HTML sanitizer such as [Bluemonday][5]. -Here's an example of simple usage of blackfriday together with bluemonday: +Here's an example of simple usage of Blackfriday together with Bluemonday: -``` go +```go import ( "github.com/microcosm-cc/bluemonday" - "github.com/russross/blackfriday" + "github.com/russross/blackfriday/v2" ) // ... -unsafe := blackfriday.MarkdownCommon(input) +unsafe := blackfriday.Run(input) html := bluemonday.UGCPolicy().SanitizeBytes(unsafe) ``` ### Custom options -If you want to customize the set of options, first get a renderer -(currently either the HTML or LaTeX output engines), then use it to -call the more general `Markdown` function. For examples, see the -implementations of `MarkdownBasic` and `MarkdownCommon` in -`markdown.go`. +If you want to customize the set of options, use `blackfriday.WithExtensions`, +`blackfriday.WithRenderer` and `blackfriday.WithRefOverride`. + +### `blackfriday-tool` You can also check out `blackfriday-tool` for a more complete example of how to use it. Download and install it using: @@ -84,7 +121,7 @@ markdown file using a standalone program. You can also browse the source directly on github if you are just looking for some example code: -* +* Note that if you have not already done so, installing `blackfriday-tool` will be sufficient to download and install @@ -93,6 +130,22 @@ installed in `$GOPATH/bin`. This is a statically-linked binary that can be copied to wherever you need it without worrying about dependencies and library versions. +### Sanitized anchor names + +Blackfriday includes an algorithm for creating sanitized anchor names +corresponding to a given input text. This algorithm is used to create +anchors for headings when `AutoHeadingIDs` extension is enabled. The +algorithm has a specification, so that other packages can create +compatible anchor names and links to those anchors. + +The specification is located at https://pkg.go.dev/github.com/russross/blackfriday/v2#hdr-Sanitized_Anchor_Names. + +[`SanitizedAnchorName`](https://pkg.go.dev/github.com/russross/blackfriday/v2#SanitizedAnchorName) exposes this functionality, and can be used to +create compatible links to the anchor names generated by blackfriday. +This algorithm is also implemented in a small standalone package at +[`github.com/shurcooL/sanitized_anchor_name`](https://pkg.go.dev/github.com/shurcooL/sanitized_anchor_name). It can be useful for clients +that want a small package and don't need full functionality of blackfriday. + Features -------- @@ -114,7 +167,7 @@ All features of Sundown are supported, including: know and send me the input that does it. NOTE: "safety" in this context means *runtime safety only*. In order to - protect yourself agains JavaScript injection in untrusted content, see + protect yourself against JavaScript injection in untrusted content, see [this example](https://github.com/russross/blackfriday#sanitize-untrusted-content). * **Fast processing**. It is fast enough to render on-demand in @@ -160,7 +213,7 @@ implements the following extensions: and supply a language (to make syntax highlighting simple). Just mark it like this: - ``` go + ```go func getTrue() bool { return true } @@ -169,12 +222,21 @@ implements the following extensions: You can use 3 or more backticks to mark the beginning of the block, and the same number to mark the end of the block. + To preserve classes of fenced code blocks while using the bluemonday + HTML sanitizer, use the following policy: + + ```go + p := bluemonday.UGCPolicy() + p.AllowAttrs("class").Matching(regexp.MustCompile("^language-[a-zA-Z0-9]+$")).OnElements("code") + html := p.SanitizeBytes(unsafe) + ``` + * **Definition lists**. A simple definition list is made of a single-line term followed by a colon and the definition for that term. Cat : Fluffy animal everyone likes - + Internet : Vector of transmission for pictures of cats @@ -185,7 +247,7 @@ implements the following extensions: end of the document. A footnote looks like this: This is a footnote.[^1] - + [^1]: the footnote text. * **Autolinking**. Blackfriday can find URLs that have not been @@ -194,10 +256,8 @@ implements the following extensions: * **Strikethrough**. Use two tildes (`~~`) to mark text that should be crossed out. -* **Hard line breaks**. With this extension enabled (it is off by - default in the `MarkdownBasic` and `MarkdownCommon` convenience - functions), newlines in the input translate into line breaks in - the output. +* **Hard line breaks**. With this extension enabled newlines in the input + translate into line breaks in the output. This extension is off by default. * **Smart quotes**. Smartypants-style punctuation substitution is supported, turning normal double- and single-quote marks into @@ -222,9 +282,9 @@ Other renderers Blackfriday is structured to allow alternative rendering engines. Here are a few of note: -* [github_flavored_markdown](https://godoc.org/github.com/shurcooL/github_flavored_markdown): +* [github_flavored_markdown](https://pkg.go.dev/github.com/shurcooL/github_flavored_markdown): provides a GitHub Flavored Markdown renderer with fenced code block - highlighting, clickable header anchor links. + highlighting, clickable heading anchor links. It's not customizable, and its goal is to produce HTML output equivalent to the [GitHub Markdown API endpoint](https://developer.github.com/v3/markdown/#render-a-markdown-document-in-raw-mode), @@ -233,25 +293,28 @@ are a few of note: * [markdownfmt](https://github.com/shurcooL/markdownfmt): like gofmt, but for markdown. -* LaTeX output: renders output as LaTeX. This is currently part of the - main Blackfriday repository, but may be split into its own project - in the future. If you are interested in owning and maintaining the - LaTeX output component, please be in touch. +* [LaTeX output](https://gitlab.com/ambrevar/blackfriday-latex): + renders output as LaTeX. - It renders some basic documents, but is only experimental at this - point. In particular, it does not do any inline escaping, so input - that happens to look like LaTeX code will be passed through without - modification. +* [bfchroma](https://github.com/Depado/bfchroma/): provides convenience + integration with the [Chroma](https://github.com/alecthomas/chroma) code + highlighting library. bfchroma is only compatible with v2 of Blackfriday and + provides a drop-in renderer ready to use with Blackfriday, as well as + options and means for further customization. +* [Blackfriday-Confluence](https://github.com/kentaro-m/blackfriday-confluence): provides a [Confluence Wiki Markup](https://confluence.atlassian.com/doc/confluence-wiki-markup-251003035.html) renderer. -Todo +* [Blackfriday-Slack](https://github.com/karriereat/blackfriday-slack): converts markdown to slack message style + + +TODO ---- * More unit testing -* Improve unicode support. It does not understand all unicode +* Improve Unicode support. It does not understand all Unicode rules (about what constitutes a letter, a punctuation symbol, etc.), so it may fail to detect word boundaries correctly in - some instances. It is safe on all utf-8 input. + some instances. It is safe on all UTF-8 input. License @@ -260,6 +323,13 @@ License [Blackfriday is distributed under the Simplified BSD License](LICENSE.txt) - [1]: http://daringfireball.net/projects/markdown/ "Markdown" - [2]: http://golang.org/ "Go Language" + [1]: https://daringfireball.net/projects/markdown/ "Markdown" + [2]: https://golang.org/ "Go Language" [3]: https://github.com/vmg/sundown "Sundown" + [4]: https://pkg.go.dev/github.com/russross/blackfriday/v2#Parse "Parse func" + [5]: https://github.com/microcosm-cc/bluemonday "Bluemonday" + + [BuildV2SVG]: https://travis-ci.org/russross/blackfriday.svg?branch=v2 + [BuildV2URL]: https://travis-ci.org/russross/blackfriday + [PkgGoDevV2SVG]: https://pkg.go.dev/badge/github.com/russross/blackfriday/v2 + [PkgGoDevV2URL]: https://pkg.go.dev/github.com/russross/blackfriday/v2 diff --git a/block.go b/block.go index b5b08411..5c9a5882 100644 --- a/block.go +++ b/block.go @@ -15,18 +15,26 @@ package blackfriday import ( "bytes" + "html" + "regexp" + "strings" + "unicode" +) + +const ( + charEntity = "&(?:#x[a-f0-9]{1,8}|#[0-9]{1,8}|[a-z][a-z0-9]{1,31});" + escapable = "[!\"#$%&'()*+,./:;<=>?@[\\\\\\]^_`{|}~-]" +) - "github.com/shurcooL/sanitized_anchor_name" +var ( + reBackslashOrAmp = regexp.MustCompile("[\\&]") + reEntityOrEscapedChar = regexp.MustCompile("(?i)\\\\" + escapable + "|" + charEntity) ) // Parse block-level data. // Note: this function and many that it calls assume that // the input buffer ends with a newline. -func (p *parser) block(out *bytes.Buffer, data []byte) { - if len(data) == 0 || data[len(data)-1] != '\n' { - panic("block input is missing terminating newline") - } - +func (p *Markdown) block(data []byte) { // this is called recursively: enforce a maximum depth if p.nesting >= p.maxNesting { return @@ -35,14 +43,14 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // parse out one block-level construct at a time for len(data) > 0 { - // prefixed header: + // prefixed heading: // - // # Header 1 - // ## Header 2 + // # Heading 1 + // ## Heading 2 // ... - // ###### Header 6 - if p.isPrefixHeader(data) { - data = data[p.prefixHeader(out, data):] + // ###### Heading 6 + if p.isPrefixHeading(data) { + data = data[p.prefixHeading(data):] continue } @@ -52,7 +60,7 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // ... // if data[0] == '<' { - if i := p.html(out, data, true); i > 0 { + if i := p.html(data, true); i > 0 { data = data[i:] continue } @@ -63,9 +71,9 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // % stuff // % more stuff // % even more stuff - if p.flags&EXTENSION_TITLEBLOCK != 0 { + if p.extensions&Titleblock != 0 { if data[0] == '%' { - if i := p.titleBlock(out, data, true); i > 0 { + if i := p.titleBlock(data, true); i > 0 { data = data[i:] continue } @@ -87,7 +95,7 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // return b // } if p.codePrefix(data) > 0 { - data = data[p.code(out, data):] + data = data[p.code(data):] continue } @@ -101,8 +109,8 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // return n * fact(n-1) // } // ``` - if p.flags&EXTENSION_FENCED_CODE != 0 { - if i := p.fencedCode(out, data, true); i > 0 { + if p.extensions&FencedCode != 0 { + if i := p.fencedCodeBlock(data, true); i > 0 { data = data[i:] continue } @@ -116,9 +124,9 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // or // ______ if p.isHRule(data) { - p.r.HRule(out) + p.addBlock(HorizontalRule, nil) var i int - for i = 0; data[i] != '\n'; i++ { + for i = 0; i < len(data) && data[i] != '\n'; i++ { } data = data[i:] continue @@ -129,7 +137,7 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // > A big quote I found somewhere // > on the web if p.quotePrefix(data) > 0 { - data = data[p.quote(out, data):] + data = data[p.quote(data):] continue } @@ -139,8 +147,8 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // ------|-----|--------- // Bob | 31 | 555-1234 // Alice | 27 | 555-4321 - if p.flags&EXTENSION_TABLES != 0 { - if i := p.table(out, data); i > 0 { + if p.extensions&Tables != 0 { + if i := p.table(data); i > 0 { data = data[i:] continue } @@ -153,7 +161,7 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // // also works with + or - if p.uliPrefix(data) > 0 { - data = data[p.list(out, data, 0):] + data = data[p.list(data, 0):] continue } @@ -162,7 +170,7 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // 1. Item 1 // 2. Item 2 if p.oliPrefix(data) > 0 { - data = data[p.list(out, data, LIST_TYPE_ORDERED):] + data = data[p.list(data, ListTypeOrdered):] continue } @@ -174,55 +182,62 @@ func (p *parser) block(out *bytes.Buffer, data []byte) { // // Term 2 // : Definition c - if p.flags&EXTENSION_DEFINITION_LISTS != 0 { + if p.extensions&DefinitionLists != 0 { if p.dliPrefix(data) > 0 { - data = data[p.list(out, data, LIST_TYPE_DEFINITION):] + data = data[p.list(data, ListTypeDefinition):] continue } } // anything else must look like a normal paragraph - // note: this finds underlined headers, too - data = data[p.paragraph(out, data):] + // note: this finds underlined headings, too + data = data[p.paragraph(data):] } p.nesting-- } -func (p *parser) isPrefixHeader(data []byte) bool { +func (p *Markdown) addBlock(typ NodeType, content []byte) *Node { + p.closeUnmatchedBlocks() + container := p.addChild(typ, 0) + container.content = content + return container +} + +func (p *Markdown) isPrefixHeading(data []byte) bool { if data[0] != '#' { return false } - if p.flags&EXTENSION_SPACE_HEADERS != 0 { + if p.extensions&SpaceHeadings != 0 { level := 0 - for level < 6 && data[level] == '#' { + for level < 6 && level < len(data) && data[level] == '#' { level++ } - if data[level] != ' ' { + if level == len(data) || data[level] != ' ' { return false } } return true } -func (p *parser) prefixHeader(out *bytes.Buffer, data []byte) int { +func (p *Markdown) prefixHeading(data []byte) int { level := 0 - for level < 6 && data[level] == '#' { + for level < 6 && level < len(data) && data[level] == '#' { level++ } i := skipChar(data, level, ' ') end := skipUntilChar(data, i, '\n') skip := end id := "" - if p.flags&EXTENSION_HEADER_IDS != 0 { + if p.extensions&HeadingIDs != 0 { j, k := 0, 0 - // find start/end of header id + // find start/end of heading id for j = i; j < end-1 && (data[j] != '{' || data[j+1] != '#'); j++ { } for k = j + 1; k < end && data[k] != '}'; k++ { } - // extract header id iff found + // extract heading id iff found if j < end && k < end { id = string(data[j+2 : k]) end = j @@ -242,45 +257,41 @@ func (p *parser) prefixHeader(out *bytes.Buffer, data []byte) int { end-- } if end > i { - if id == "" && p.flags&EXTENSION_AUTO_HEADER_IDS != 0 { - id = sanitized_anchor_name.Create(string(data[i:end])) - } - work := func() bool { - p.inline(out, data[i:end]) - return true + if id == "" && p.extensions&AutoHeadingIDs != 0 { + id = SanitizedAnchorName(string(data[i:end])) } - p.r.Header(out, work, level, id) + block := p.addBlock(Heading, data[i:end]) + block.HeadingID = id + block.Level = level } return skip } -func (p *parser) isUnderlinedHeader(data []byte) int { - // test of level 1 header +func (p *Markdown) isUnderlinedHeading(data []byte) int { + // test of level 1 heading if data[0] == '=' { i := skipChar(data, 1, '=') i = skipChar(data, i, ' ') - if data[i] == '\n' { + if i < len(data) && data[i] == '\n' { return 1 - } else { - return 0 } + return 0 } - // test of level 2 header + // test of level 2 heading if data[0] == '-' { i := skipChar(data, 1, '-') i = skipChar(data, i, ' ') - if data[i] == '\n' { + if i < len(data) && data[i] == '\n' { return 2 - } else { - return 0 } + return 0 } return 0 } -func (p *parser) titleBlock(out *bytes.Buffer, data []byte, doRender bool) int { +func (p *Markdown) titleBlock(data []byte, doRender bool) int { if data[0] != '%' { return 0 } @@ -294,12 +305,17 @@ func (p *parser) titleBlock(out *bytes.Buffer, data []byte, doRender bool) int { } data = bytes.Join(splitData[0:i], []byte("\n")) - p.r.TitleBlock(out, data) - - return len(data) + consumed := len(data) + data = bytes.TrimPrefix(data, []byte("% ")) + data = bytes.Replace(data, []byte("\n% "), []byte("\n"), -1) + block := p.addBlock(Heading, data) + block.Level = 1 + block.IsTitleblock = true + + return consumed } -func (p *parser) html(out *bytes.Buffer, data []byte, doRender bool) int { +func (p *Markdown) html(data []byte, doRender bool) int { var i, j int // identify the opening tag @@ -311,12 +327,12 @@ func (p *parser) html(out *bytes.Buffer, data []byte, doRender bool) int { // handle special cases if !tagfound { // check for an HTML comment - if size := p.htmlComment(out, data, doRender); size > 0 { + if size := p.htmlComment(data, doRender); size > 0 { return size } // check for an
tag - if size := p.htmlHr(out, data, doRender); size > 0 { + if size := p.htmlHr(data, doRender); size > 0 { return size } @@ -391,15 +407,20 @@ func (p *parser) html(out *bytes.Buffer, data []byte, doRender bool) int { for end > 0 && data[end-1] == '\n' { end-- } - p.r.BlockHtml(out, data[:end]) + finalizeHTMLBlock(p.addBlock(HTMLBlock, data[:end])) } return i } +func finalizeHTMLBlock(block *Node) { + block.Literal = block.content + block.content = nil +} + // HTML comment, lax form -func (p *parser) htmlComment(out *bytes.Buffer, data []byte, doRender bool) int { - i := p.inlineHtmlComment(out, data) +func (p *Markdown) htmlComment(data []byte, doRender bool) int { + i := p.inlineHTMLComment(data) // needs to end with a blank line if j := p.isEmpty(data[i:]); j > 0 { size := i + j @@ -409,7 +430,8 @@ func (p *parser) htmlComment(out *bytes.Buffer, data []byte, doRender bool) int for end > 0 && data[end-1] == '\n' { end-- } - p.r.BlockHtml(out, data[:end]) + block := p.addBlock(HTMLBlock, data[:end]) + finalizeHTMLBlock(block) } return size } @@ -417,7 +439,10 @@ func (p *parser) htmlComment(out *bytes.Buffer, data []byte, doRender bool) int } // HR, which is the only self-closing block tag considered -func (p *parser) htmlHr(out *bytes.Buffer, data []byte, doRender bool) int { +func (p *Markdown) htmlHr(data []byte, doRender bool) int { + if len(data) < 4 { + return 0 + } if data[0] != '<' || (data[1] != 'h' && data[1] != 'H') || (data[2] != 'r' && data[2] != 'R') { return 0 } @@ -425,13 +450,11 @@ func (p *parser) htmlHr(out *bytes.Buffer, data []byte, doRender bool) int { // not an
tag after all; at least not a valid one return 0 } - i := 3 - for data[i] != '>' && data[i] != '\n' { + for i < len(data) && data[i] != '>' && data[i] != '\n' { i++ } - - if data[i] == '>' { + if i < len(data) && data[i] == '>' { i++ if j := p.isEmpty(data[i:]); j > 0 { size := i + j @@ -441,18 +464,17 @@ func (p *parser) htmlHr(out *bytes.Buffer, data []byte, doRender bool) int { for end > 0 && data[end-1] == '\n' { end-- } - p.r.BlockHtml(out, data[:end]) + finalizeHTMLBlock(p.addBlock(HTMLBlock, data[:end])) } return size } } - return 0 } -func (p *parser) htmlFindTag(data []byte) (string, bool) { +func (p *Markdown) htmlFindTag(data []byte) (string, bool) { i := 0 - for isalnum(data[i]) { + for i < len(data) && isalnum(data[i]) { i++ } key := string(data[:i]) @@ -462,9 +484,11 @@ func (p *parser) htmlFindTag(data []byte) (string, bool) { return "", false } -func (p *parser) htmlFindEnd(tag string, data []byte) int { +func (p *Markdown) htmlFindEnd(tag string, data []byte) int { // assume data[0] == '<' && data[1] == '/' already tested - + if tag == "hr" { + return 2 + } // check if tag is a match closetag := []byte("") if !bytes.HasPrefix(data, closetag) { @@ -484,7 +508,7 @@ func (p *parser) htmlFindEnd(tag string, data []byte) int { return i } - if p.flags&EXTENSION_LAX_HTML_BLOCKS != 0 { + if p.extensions&LaxHTMLBlocks != 0 { return i } if skip = p.isEmpty(data[i:]); skip == 0 { @@ -495,7 +519,7 @@ func (p *parser) htmlFindEnd(tag string, data []byte) int { return i + skip } -func (p *parser) isEmpty(data []byte) int { +func (*Markdown) isEmpty(data []byte) int { // it is okay to call isEmpty on an empty buffer if len(data) == 0 { return 0 @@ -507,10 +531,13 @@ func (p *parser) isEmpty(data []byte) int { return 0 } } - return i + 1 + if i < len(data) && data[i] == '\n' { + i++ + } + return i } -func (p *parser) isHRule(data []byte) bool { +func (*Markdown) isHRule(data []byte) bool { i := 0 // skip up to three spaces @@ -526,7 +553,7 @@ func (p *parser) isHRule(data []byte) bool { // the whole line must be the char or whitespace n := 0 - for data[i] != '\n' { + for i < len(data) && data[i] != '\n' { switch { case data[i] == c: n++ @@ -539,21 +566,23 @@ func (p *parser) isHRule(data []byte) bool { return n >= 3 } -func (p *parser) isFencedCode(data []byte, syntax **string, oldmarker string) (skip int, marker string) { +// isFenceLine checks if there's a fence line (e.g., ``` or ``` go) at the beginning of data, +// and returns the end index if so, or 0 otherwise. It also returns the marker found. +// If info is not nil, it gets set to the syntax specified in the fence line. +func isFenceLine(data []byte, info *string, oldmarker string) (end int, marker string) { i, size := 0, 0 - skip = 0 // skip up to three spaces for i < len(data) && i < 3 && data[i] == ' ' { i++ } - if i >= len(data) { - return - } // check for the marker characters: ~ or ` + if i >= len(data) { + return 0, "" + } if data[i] != '~' && data[i] != '`' { - return + return 0, "" } c := data[i] @@ -564,90 +593,96 @@ func (p *parser) isFencedCode(data []byte, syntax **string, oldmarker string) (s i++ } - if i >= len(data) { - return - } - // the marker char must occur at least 3 times if size < 3 { - return + return 0, "" } marker = string(data[i-size : i]) // if this is the end marker, it must match the beginning marker if oldmarker != "" && marker != oldmarker { - return + return 0, "" } - if syntax != nil { - syn := 0 + // TODO(shurcooL): It's probably a good idea to simplify the 2 code paths here + // into one, always get the info string, and discard it if the caller doesn't care. + if info != nil { + infoLength := 0 i = skipChar(data, i, ' ') if i >= len(data) { - return + if i == len(data) { + return i, marker + } + return 0, "" } - syntaxStart := i + infoStart := i if data[i] == '{' { i++ - syntaxStart++ + infoStart++ for i < len(data) && data[i] != '}' && data[i] != '\n' { - syn++ + infoLength++ i++ } if i >= len(data) || data[i] != '}' { - return + return 0, "" } // strip all whitespace at the beginning and the end // of the {} block - for syn > 0 && isspace(data[syntaxStart]) { - syntaxStart++ - syn-- + for infoLength > 0 && isspace(data[infoStart]) { + infoStart++ + infoLength-- } - for syn > 0 && isspace(data[syntaxStart+syn-1]) { - syn-- + for infoLength > 0 && isspace(data[infoStart+infoLength-1]) { + infoLength-- } - i++ + i = skipChar(data, i, ' ') } else { - for i < len(data) && !isspace(data[i]) { - syn++ + for i < len(data) && !isverticalspace(data[i]) { + infoLength++ i++ } } - language := string(data[syntaxStart : syntaxStart+syn]) - *syntax = &language + *info = strings.TrimSpace(string(data[infoStart : infoStart+infoLength])) } - i = skipChar(data, i, ' ') - if i >= len(data) || data[i] != '\n' { - return + if i == len(data) { + return i, marker } - - skip = i + 1 - return + if i > len(data) || data[i] != '\n' { + return 0, "" + } + return i + 1, marker // Take newline into account. } -func (p *parser) fencedCode(out *bytes.Buffer, data []byte, doRender bool) int { - var lang *string - beg, marker := p.isFencedCode(data, &lang, "") +// fencedCodeBlock returns the end index if data contains a fenced code block at the beginning, +// or 0 otherwise. It writes to out if doRender is true, otherwise it has no side effects. +// If doRender is true, a final newline is mandatory to recognize the fenced code block. +func (p *Markdown) fencedCodeBlock(data []byte, doRender bool) int { + var info string + beg, marker := isFenceLine(data, &info, "") if beg == 0 || beg >= len(data) { return 0 } + fenceLength := beg - 1 var work bytes.Buffer + work.Write([]byte(info)) + work.WriteByte('\n') for { // safe to assume beg < len(data) // check for the end of the code block - fenceEnd, _ := p.isFencedCode(data[beg:], nil, marker) + fenceEnd, _ := isFenceLine(data[beg:], nil, marker) if fenceEnd != 0 { beg += fenceEnd break @@ -668,30 +703,57 @@ func (p *parser) fencedCode(out *bytes.Buffer, data []byte, doRender bool) int { beg = end } - syntax := "" - if lang != nil { - syntax = *lang - } - if doRender { - p.r.BlockCode(out, work.Bytes(), syntax) + block := p.addBlock(CodeBlock, work.Bytes()) // TODO: get rid of temp buffer + block.IsFenced = true + block.FenceLength = fenceLength + finalizeCodeBlock(block) } return beg } -func (p *parser) table(out *bytes.Buffer, data []byte) int { - var header bytes.Buffer - i, columns := p.tableHeader(&header, data) +func unescapeChar(str []byte) []byte { + if str[0] == '\\' { + return []byte{str[1]} + } + return []byte(html.UnescapeString(string(str))) +} + +func unescapeString(str []byte) []byte { + if reBackslashOrAmp.Match(str) { + return reEntityOrEscapedChar.ReplaceAllFunc(str, unescapeChar) + } + return str +} + +func finalizeCodeBlock(block *Node) { + if block.IsFenced { + newlinePos := bytes.IndexByte(block.content, '\n') + firstLine := block.content[:newlinePos] + rest := block.content[newlinePos+1:] + block.Info = unescapeString(bytes.Trim(firstLine, "\n")) + block.Literal = rest + } else { + block.Literal = block.content + } + block.content = nil +} + +func (p *Markdown) table(data []byte) int { + table := p.addBlock(Table, nil) + i, columns := p.tableHeader(data) if i == 0 { + p.tip = table.Parent + table.Unlink() return 0 } - var body bytes.Buffer + p.addBlock(TableBody, nil) for i < len(data) { pipes, rowStart := 0, i - for ; data[i] != '\n'; i++ { + for ; i < len(data) && data[i] != '\n'; i++ { if data[i] == '|' { pipes++ } @@ -703,12 +765,12 @@ func (p *parser) table(out *bytes.Buffer, data []byte) int { } // include the newline in data sent to tableRow - i++ - p.tableRow(&body, data[rowStart:i], columns, false) + if i < len(data) && data[i] == '\n' { + i++ + } + p.tableRow(data[rowStart:i], columns, false) } - p.r.Table(out, header.Bytes(), body.Bytes(), columns) - return i } @@ -721,10 +783,10 @@ func isBackslashEscaped(data []byte, i int) bool { return backslashes&1 == 1 } -func (p *parser) tableHeader(out *bytes.Buffer, data []byte) (size int, columns []int) { +func (p *Markdown) tableHeader(data []byte) (size int, columns []CellAlignFlags) { i := 0 colCount := 1 - for i = 0; data[i] != '\n'; i++ { + for i = 0; i < len(data) && data[i] != '\n'; i++ { if data[i] == '|' && !isBackslashEscaped(data, i) { colCount++ } @@ -736,7 +798,11 @@ func (p *parser) tableHeader(out *bytes.Buffer, data []byte) (size int, columns } // include the newline in the data sent to tableRow - header := data[:i+1] + j := i + if j < len(data) && data[j] == '\n' { + j++ + } + header := data[:j] // column count ignores pipes at beginning or end of line if data[0] == '|' { @@ -746,7 +812,7 @@ func (p *parser) tableHeader(out *bytes.Buffer, data []byte) (size int, columns colCount-- } - columns = make([]int, colCount) + columns = make([]CellAlignFlags, colCount) // move on to the header underline i++ @@ -762,27 +828,29 @@ func (p *parser) tableHeader(out *bytes.Buffer, data []byte) (size int, columns // each column header is of form: / *:?-+:? *|/ with # dashes + # colons >= 3 // and trailing | optional on last column col := 0 - for data[i] != '\n' { + for i < len(data) && data[i] != '\n' { dashes := 0 if data[i] == ':' { i++ - columns[col] |= TABLE_ALIGNMENT_LEFT + columns[col] |= TableAlignmentLeft dashes++ } - for data[i] == '-' { + for i < len(data) && data[i] == '-' { i++ dashes++ } - if data[i] == ':' { + if i < len(data) && data[i] == ':' { i++ - columns[col] |= TABLE_ALIGNMENT_RIGHT + columns[col] |= TableAlignmentRight dashes++ } - for data[i] == ' ' { + for i < len(data) && data[i] == ' ' { i++ } - + if i == len(data) { + return + } // end of column test is messy switch { case dashes < 3: @@ -793,12 +861,12 @@ func (p *parser) tableHeader(out *bytes.Buffer, data []byte) (size int, columns // marker found, now skip past trailing whitespace col++ i++ - for data[i] == ' ' { + for i < len(data) && data[i] == ' ' { i++ } // trailing junk found after last column - if col >= colCount && data[i] != '\n' { + if col >= colCount && i < len(data) && data[i] != '\n' { return } @@ -819,27 +887,31 @@ func (p *parser) tableHeader(out *bytes.Buffer, data []byte) (size int, columns return } - p.tableRow(out, header, columns, true) - size = i + 1 + p.addBlock(TableHead, nil) + p.tableRow(header, columns, true) + size = i + if size < len(data) && data[size] == '\n' { + size++ + } return } -func (p *parser) tableRow(out *bytes.Buffer, data []byte, columns []int, header bool) { +func (p *Markdown) tableRow(data []byte, columns []CellAlignFlags, header bool) { + p.addBlock(TableRow, nil) i, col := 0, 0 - var rowWork bytes.Buffer if data[i] == '|' && !isBackslashEscaped(data, i) { i++ } for col = 0; col < len(columns) && i < len(data); col++ { - for data[i] == ' ' { + for i < len(data) && data[i] == ' ' { i++ } cellStart := i - for (data[i] != '|' || isBackslashEscaped(data, i)) && data[i] != '\n' { + for i < len(data) && (data[i] != '|' || isBackslashEscaped(data, i)) && data[i] != '\n' { i++ } @@ -848,42 +920,33 @@ func (p *parser) tableRow(out *bytes.Buffer, data []byte, columns []int, header // skip the end-of-cell marker, possibly taking us past end of buffer i++ - for cellEnd > cellStart && data[cellEnd-1] == ' ' { + for cellEnd > cellStart && cellEnd-1 < len(data) && data[cellEnd-1] == ' ' { cellEnd-- } - var cellWork bytes.Buffer - p.inline(&cellWork, data[cellStart:cellEnd]) - - if header { - p.r.TableHeaderCell(&rowWork, cellWork.Bytes(), columns[col]) - } else { - p.r.TableCell(&rowWork, cellWork.Bytes(), columns[col]) - } + cell := p.addBlock(TableCell, data[cellStart:cellEnd]) + cell.IsHeader = header + cell.Align = columns[col] } // pad it out with empty columns to get the right number for ; col < len(columns); col++ { - if header { - p.r.TableHeaderCell(&rowWork, nil, columns[col]) - } else { - p.r.TableCell(&rowWork, nil, columns[col]) - } + cell := p.addBlock(TableCell, nil) + cell.IsHeader = header + cell.Align = columns[col] } // silently ignore rows with too many cells - - p.r.TableRow(out, rowWork.Bytes()) } // returns blockquote prefix length -func (p *parser) quotePrefix(data []byte) int { +func (p *Markdown) quotePrefix(data []byte) int { i := 0 - for i < 3 && data[i] == ' ' { + for i < 3 && i < len(data) && data[i] == ' ' { i++ } - if data[i] == '>' { - if data[i+1] == ' ' { + if i < len(data) && data[i] == '>' { + if i+1 < len(data) && data[i+1] == ' ' { return i + 2 } return i + 1 @@ -893,7 +956,7 @@ func (p *parser) quotePrefix(data []byte) int { // blockquote ends with at least one blank line // followed by something without a blockquote prefix -func (p *parser) terminateBlockquote(data []byte, beg, end int) bool { +func (p *Markdown) terminateBlockquote(data []byte, beg, end int) bool { if p.isEmpty(data[beg:]) <= 0 { return false } @@ -904,7 +967,8 @@ func (p *parser) terminateBlockquote(data []byte, beg, end int) bool { } // parse a blockquote fragment -func (p *parser) quote(out *bytes.Buffer, data []byte) int { +func (p *Markdown) quote(data []byte) int { + block := p.addBlock(BlockQuote, nil) var raw bytes.Buffer beg, end := 0, 0 for beg < len(data) { @@ -912,9 +976,9 @@ func (p *parser) quote(out *bytes.Buffer, data []byte) int { // Step over whole lines, collecting them. While doing that, check for // fenced code and if one's found, incorporate it altogether, // irregardless of any contents inside it - for data[end] != '\n' { - if p.flags&EXTENSION_FENCED_CODE != 0 { - if i := p.fencedCode(out, data[end:], false); i > 0 { + for end < len(data) && data[end] != '\n' { + if p.extensions&FencedCode != 0 { + if i := p.fencedCodeBlock(data[end:], false); i > 0 { // -1 to compensate for the extra end++ after the loop: end += i - 1 break @@ -922,44 +986,53 @@ func (p *parser) quote(out *bytes.Buffer, data []byte) int { } end++ } - end++ - + if end < len(data) && data[end] == '\n' { + if end+1 < len(data) && data[end+1] == '\n' { + beg += p.quotePrefix(data[beg:]) + raw.Write(data[beg:end]) + break + } else { + end++ + } + } if pre := p.quotePrefix(data[beg:]); pre > 0 { // skip the prefix beg += pre } else if p.terminateBlockquote(data, beg, end) { break } - // this line is part of the blockquote raw.Write(data[beg:end]) beg = end } - - var cooked bytes.Buffer - p.block(&cooked, raw.Bytes()) - p.r.BlockQuote(out, cooked.Bytes()) + p.block(raw.Bytes()) + p.finalize(block) return end } // returns prefix length for block code -func (p *parser) codePrefix(data []byte) int { - if data[0] == ' ' && data[1] == ' ' && data[2] == ' ' && data[3] == ' ' { +func (p *Markdown) codePrefix(data []byte) int { + if len(data) >= 1 && data[0] == '\t' { + return 1 + } + if len(data) >= 4 && data[0] == ' ' && data[1] == ' ' && data[2] == ' ' && data[3] == ' ' { return 4 } return 0 } -func (p *parser) code(out *bytes.Buffer, data []byte) int { +func (p *Markdown) code(data []byte) int { var work bytes.Buffer i := 0 for i < len(data) { beg := i - for data[i] != '\n' { + for i < len(data) && data[i] != '\n' { + i++ + } + if i < len(data) && data[i] == '\n' { i++ } - i++ blankline := p.isEmpty(data[beg:i]) > 0 if pre := p.codePrefix(data[beg:i]); pre > 0 { @@ -970,7 +1043,7 @@ func (p *parser) code(out *bytes.Buffer, data []byte) int { break } - // verbatim copy to the working buffeu + // verbatim copy to the working buffer if blankline { work.WriteByte('\n') } else { @@ -990,122 +1063,195 @@ func (p *parser) code(out *bytes.Buffer, data []byte) int { work.WriteByte('\n') - p.r.BlockCode(out, work.Bytes(), "") + block := p.addBlock(CodeBlock, work.Bytes()) // TODO: get rid of temp buffer + block.IsFenced = false + finalizeCodeBlock(block) return i } // returns unordered list item prefix -func (p *parser) uliPrefix(data []byte) int { +func (p *Markdown) uliPrefix(data []byte) int { i := 0 - // start with up to 3 spaces - for i < 3 && data[i] == ' ' { + for i < len(data) && i < 3 && data[i] == ' ' { i++ } - - // need a *, +, or - followed by a space + if i >= len(data)-1 { + return 0 + } + // need one of {'*', '+', '-'} followed by a space or a tab if (data[i] != '*' && data[i] != '+' && data[i] != '-') || - data[i+1] != ' ' { + (data[i+1] != ' ' && data[i+1] != '\t') { return 0 } return i + 2 } // returns ordered list item prefix -func (p *parser) oliPrefix(data []byte) int { +func (p *Markdown) oliPrefix(data []byte) int { i := 0 // start with up to 3 spaces - for i < 3 && data[i] == ' ' { + for i < 3 && i < len(data) && data[i] == ' ' { i++ } // count the digits start := i - for data[i] >= '0' && data[i] <= '9' { + for i < len(data) && data[i] >= '0' && data[i] <= '9' { i++ } + if start == i || i >= len(data)-1 { + return 0 + } - // we need >= 1 digits followed by a dot and a space - if start == i || data[i] != '.' || data[i+1] != ' ' { + // we need >= 1 digits followed by a dot and a space or a tab + if data[i] != '.' || !(data[i+1] == ' ' || data[i+1] == '\t') { return 0 } return i + 2 } // returns definition list item prefix -func (p *parser) dliPrefix(data []byte) int { +func (p *Markdown) dliPrefix(data []byte) int { + if len(data) < 2 { + return 0 + } i := 0 - - // need a : followed by a spaces - if data[i] != ':' || data[i+1] != ' ' { + // need a ':' followed by a space or a tab + if data[i] != ':' || !(data[i+1] == ' ' || data[i+1] == '\t') { return 0 } - for data[i] == ' ' { + for i < len(data) && data[i] == ' ' { i++ } return i + 2 } // parse ordered or unordered list block -func (p *parser) list(out *bytes.Buffer, data []byte, flags int) int { +func (p *Markdown) list(data []byte, flags ListType) int { i := 0 - flags |= LIST_ITEM_BEGINNING_OF_LIST - work := func() bool { - for i < len(data) { - skip := p.listItem(out, data[i:], &flags) - i += skip + flags |= ListItemBeginningOfList + block := p.addBlock(List, nil) + block.ListFlags = flags + block.Tight = true - if skip == 0 || flags&LIST_ITEM_END_OF_LIST != 0 { - break - } - flags &= ^LIST_ITEM_BEGINNING_OF_LIST + for i < len(data) { + skip := p.listItem(data[i:], &flags) + if flags&ListItemContainsBlock != 0 { + block.ListData.Tight = false } - return true + i += skip + if skip == 0 || flags&ListItemEndOfList != 0 { + break + } + flags &= ^ListItemBeginningOfList } - p.r.List(out, work, flags) + above := block.Parent + finalizeList(block) + p.tip = above return i } +// Returns true if the list item is not the same type as its parent list +func (p *Markdown) listTypeChanged(data []byte, flags *ListType) bool { + if p.dliPrefix(data) > 0 && *flags&ListTypeDefinition == 0 { + return true + } else if p.oliPrefix(data) > 0 && *flags&ListTypeOrdered == 0 { + return true + } else if p.uliPrefix(data) > 0 && (*flags&ListTypeOrdered != 0 || *flags&ListTypeDefinition != 0) { + return true + } + return false +} + +// Returns true if block ends with a blank line, descending if needed +// into lists and sublists. +func endsWithBlankLine(block *Node) bool { + // TODO: figure this out. Always false now. + for block != nil { + // if block.lastLineBlank { + // return true + // } + t := block.Type + if t == List || t == Item { + block = block.LastChild + } else { + break + } + } + return false +} + +func finalizeList(block *Node) { + block.open = false + item := block.FirstChild + for item != nil { + // check for non-final list item ending with blank line: + if endsWithBlankLine(item) && item.Next != nil { + block.ListData.Tight = false + break + } + // recurse into children of list item, to see if there are spaces + // between any of them: + subItem := item.FirstChild + for subItem != nil { + if endsWithBlankLine(subItem) && (item.Next != nil || subItem.Next != nil) { + block.ListData.Tight = false + break + } + subItem = subItem.Next + } + item = item.Next + } +} + // Parse a single list item. // Assumes initial prefix is already removed if this is a sublist. -func (p *parser) listItem(out *bytes.Buffer, data []byte, flags *int) int { +func (p *Markdown) listItem(data []byte, flags *ListType) int { // keep track of the indentation of the first line itemIndent := 0 - for itemIndent < 3 && data[itemIndent] == ' ' { - itemIndent++ + if data[0] == '\t' { + itemIndent += 4 + } else { + for itemIndent < 3 && data[itemIndent] == ' ' { + itemIndent++ + } } + var bulletChar byte = '*' i := p.uliPrefix(data) if i == 0 { i = p.oliPrefix(data) + } else { + bulletChar = data[i-2] } if i == 0 { i = p.dliPrefix(data) // reset definition term flag if i > 0 { - *flags &= ^LIST_TYPE_TERM + *flags &= ^ListTypeTerm } } if i == 0 { - // if in defnition list, set term flag and continue - if *flags&LIST_TYPE_DEFINITION != 0 { - *flags |= LIST_TYPE_TERM + // if in definition list, set term flag and continue + if *flags&ListTypeDefinition != 0 { + *flags |= ListTypeTerm } else { return 0 } } // skip leading whitespace on first line - for data[i] == ' ' { + for i < len(data) && data[i] == ' ' { i++ } // find the end of the line line := i - for i > 0 && data[i-1] != '\n' { + for i > 0 && i < len(data) && data[i-1] != '\n' { i++ } @@ -1119,13 +1265,14 @@ func (p *parser) listItem(out *bytes.Buffer, data []byte, flags *int) int { // process the following lines containsBlankLine := false sublist := 0 + codeBlockMarker := "" gatherlines: for line < len(data) { i++ // find the end of this line - for data[i-1] != '\n' { + for i < len(data) && data[i-1] != '\n' { i++ } @@ -1139,11 +1286,39 @@ gatherlines: // calculate the indentation indent := 0 - for indent < 4 && line+indent < i && data[line+indent] == ' ' { - indent++ + indentIndex := 0 + if data[line] == '\t' { + indentIndex++ + indent += 4 + } else { + for indent < 4 && line+indent < i && data[line+indent] == ' ' { + indent++ + indentIndex++ + } } - chunk := data[line+indent : i] + chunk := data[line+indentIndex : i] + + if p.extensions&FencedCode != 0 { + // determine if in or out of codeblock + // if in codeblock, ignore normal list processing + _, marker := isFenceLine(chunk, nil, codeBlockMarker) + if marker != "" { + if codeBlockMarker == "" { + // start of codeblock + codeBlockMarker = marker + } else { + // end of codeblock. + codeBlockMarker = "" + } + } + // we are in a codeblock, write line, and continue + if codeBlockMarker != "" || marker != "" { + raw.Write(data[line+indentIndex : i]) + line = i + continue gatherlines + } + } // evaluate how this line fits in switch { @@ -1152,109 +1327,111 @@ gatherlines: p.oliPrefix(chunk) > 0 || p.dliPrefix(chunk) > 0: - if containsBlankLine { - *flags |= LIST_ITEM_CONTAINS_BLOCK - } - // to be a nested list, it must be indented more - // if not, it is the next item in the same list + // if not, it is either a different kind of list + // or the next item in the same list if indent <= itemIndent { + if p.listTypeChanged(chunk, flags) { + *flags |= ListItemEndOfList + } else if containsBlankLine { + *flags |= ListItemContainsBlock + } + break gatherlines } + if containsBlankLine { + *flags |= ListItemContainsBlock + } + // is this the first item in the nested list? if sublist == 0 { sublist = raw.Len() } - // is this a nested prefix header? - case p.isPrefixHeader(chunk): - // if the header is not indented, it is not nested in the list + // is this a nested prefix heading? + case p.isPrefixHeading(chunk): + // if the heading is not indented, it is not nested in the list // and thus ends the list if containsBlankLine && indent < 4 { - *flags |= LIST_ITEM_END_OF_LIST + *flags |= ListItemEndOfList break gatherlines } - *flags |= LIST_ITEM_CONTAINS_BLOCK + *flags |= ListItemContainsBlock // anything following an empty line is only part // of this item if it is indented 4 spaces // (regardless of the indentation of the beginning of the item) case containsBlankLine && indent < 4: - if *flags&LIST_TYPE_DEFINITION != 0 && i < len(data)-1 { + if *flags&ListTypeDefinition != 0 && i < len(data)-1 { // is the next item still a part of this list? next := i - for data[next] != '\n' { + for next < len(data) && data[next] != '\n' { next++ } for next < len(data)-1 && data[next] == '\n' { next++ } if i < len(data)-1 && data[i] != ':' && data[next] != ':' { - *flags |= LIST_ITEM_END_OF_LIST + *flags |= ListItemEndOfList } } else { - *flags |= LIST_ITEM_END_OF_LIST + *flags |= ListItemEndOfList } break gatherlines // a blank line means this should be parsed as a block case containsBlankLine: raw.WriteByte('\n') - *flags |= LIST_ITEM_CONTAINS_BLOCK + *flags |= ListItemContainsBlock } - // if this line was preceeded by one or more blanks, + // if this line was preceded by one or more blanks, // re-introduce the blank into the buffer if containsBlankLine { containsBlankLine = false raw.WriteByte('\n') - } // add the line into the working buffer without prefix - raw.Write(data[line+indent : i]) + raw.Write(data[line+indentIndex : i]) line = i } rawBytes := raw.Bytes() + block := p.addBlock(Item, nil) + block.ListFlags = *flags + block.Tight = false + block.BulletChar = bulletChar + block.Delimiter = '.' // Only '.' is possible in Markdown, but ')' will also be possible in CommonMark + // render the contents of the list item - var cooked bytes.Buffer - if *flags&LIST_ITEM_CONTAINS_BLOCK != 0 && *flags&LIST_TYPE_TERM == 0 { + if *flags&ListItemContainsBlock != 0 && *flags&ListTypeTerm == 0 { // intermediate render of block item, except for definition term if sublist > 0 { - p.block(&cooked, rawBytes[:sublist]) - p.block(&cooked, rawBytes[sublist:]) + p.block(rawBytes[:sublist]) + p.block(rawBytes[sublist:]) } else { - p.block(&cooked, rawBytes) + p.block(rawBytes) } } else { // intermediate render of inline item if sublist > 0 { - p.inline(&cooked, rawBytes[:sublist]) - p.block(&cooked, rawBytes[sublist:]) + child := p.addChild(Paragraph, 0) + child.content = rawBytes[:sublist] + p.block(rawBytes[sublist:]) } else { - p.inline(&cooked, rawBytes) + child := p.addChild(Paragraph, 0) + child.content = rawBytes } } - - // render the actual list item - cookedBytes := cooked.Bytes() - parsedEnd := len(cookedBytes) - - // strip trailing newlines - for parsedEnd > 0 && cookedBytes[parsedEnd-1] == '\n' { - parsedEnd-- - } - p.r.ListItem(out, cookedBytes[:parsedEnd], *flags) - return line } // render a single paragraph that has already been parsed out -func (p *parser) renderParagraph(out *bytes.Buffer, data []byte) { +func (p *Markdown) renderParagraph(data []byte) { if len(data) == 0 { return } @@ -1265,27 +1442,29 @@ func (p *parser) renderParagraph(out *bytes.Buffer, data []byte) { beg++ } + end := len(data) // trim trailing newline - end := len(data) - 1 + if data[len(data)-1] == '\n' { + end-- + } // trim trailing spaces for end > beg && data[end-1] == ' ' { end-- } - work := func() bool { - p.inline(out, data[beg:end]) - return true - } - p.r.Paragraph(out, work) + p.addBlock(Paragraph, data[beg:end]) } -func (p *parser) paragraph(out *bytes.Buffer, data []byte) int { +func (p *Markdown) paragraph(data []byte) int { // prev: index of 1st char of previous line // line: index of 1st char of current line // i: index of cursor/end of current line var prev, line, i int - + tabSize := TabSizeDefault + if p.extensions&TabSizeEight != 0 { + tabSize = TabSizeDouble + } // keep going until we find something to mark the end of the paragraph for i < len(data) { // mark the beginning of the current line @@ -1293,24 +1472,32 @@ func (p *parser) paragraph(out *bytes.Buffer, data []byte) int { current := data[i:] line = i + // did we find a reference or a footnote? If so, end a paragraph + // preceding it and report that we have consumed up to the end of that + // reference: + if refEnd := isReference(p, current, tabSize); refEnd > 0 { + p.renderParagraph(data[:i]) + return i + refEnd + } + // did we find a blank line marking the end of the paragraph? if n := p.isEmpty(current); n > 0 { // did this blank line followed by a definition list item? - if p.flags&EXTENSION_DEFINITION_LISTS != 0 { + if p.extensions&DefinitionLists != 0 { if i < len(data)-1 && data[i+1] == ':' { - return p.list(out, data[prev:], LIST_TYPE_DEFINITION) + return p.list(data[prev:], ListTypeDefinition) } } - p.renderParagraph(out, data[:i]) + p.renderParagraph(data[:i]) return i + n } - // an underline under some text marks a header, so our paragraph ended on prev line + // an underline under some text marks a heading, so our paragraph ended on prev line if i > 0 { - if level := p.isUnderlinedHeader(current); level > 0 { + if level := p.isUnderlinedHeading(current); level > 0 { // render the paragraph - p.renderParagraph(out, data[:prev]) + p.renderParagraph(data[:prev]) // ignore leading and trailing whitespace eol := i - 1 @@ -1321,24 +1508,17 @@ func (p *parser) paragraph(out *bytes.Buffer, data []byte) int { eol-- } - // render the header - // this ugly double closure avoids forcing variables onto the heap - work := func(o *bytes.Buffer, pp *parser, d []byte) func() bool { - return func() bool { - pp.inline(o, d) - return true - } - }(out, p, data[prev:eol]) - id := "" - if p.flags&EXTENSION_AUTO_HEADER_IDS != 0 { - id = sanitized_anchor_name.Create(string(data[prev:eol])) + if p.extensions&AutoHeadingIDs != 0 { + id = SanitizedAnchorName(string(data[prev:eol])) } - p.r.Header(out, work, level, id) + block := p.addBlock(Heading, data[prev:eol]) + block.Level = level + block.HeadingID = id // find the end of the underline - for data[i] != '\n' { + for i < len(data) && data[i] != '\n' { i++ } return i @@ -1346,53 +1526,93 @@ func (p *parser) paragraph(out *bytes.Buffer, data []byte) int { } // if the next line starts a block of HTML, then the paragraph ends here - if p.flags&EXTENSION_LAX_HTML_BLOCKS != 0 { - if data[i] == '<' && p.html(out, current, false) > 0 { + if p.extensions&LaxHTMLBlocks != 0 { + if data[i] == '<' && p.html(current, false) > 0 { // rewind to before the HTML block - p.renderParagraph(out, data[:i]) + p.renderParagraph(data[:i]) return i } } - // if there's a prefixed header or a horizontal rule after this, paragraph is over - if p.isPrefixHeader(current) || p.isHRule(current) { - p.renderParagraph(out, data[:i]) + // if there's a prefixed heading or a horizontal rule after this, paragraph is over + if p.isPrefixHeading(current) || p.isHRule(current) { + p.renderParagraph(data[:i]) return i } // if there's a fenced code block, paragraph is over - if p.flags&EXTENSION_FENCED_CODE != 0 { - if p.fencedCode(out, current, false) > 0 { - p.renderParagraph(out, data[:i]) + if p.extensions&FencedCode != 0 { + if p.fencedCodeBlock(current, false) > 0 { + p.renderParagraph(data[:i]) return i } } // if there's a definition list item, prev line is a definition term - if p.flags&EXTENSION_DEFINITION_LISTS != 0 { + if p.extensions&DefinitionLists != 0 { if p.dliPrefix(current) != 0 { - return p.list(out, data[prev:], LIST_TYPE_DEFINITION) + ret := p.list(data[prev:], ListTypeDefinition) + return ret } } // if there's a list after this, paragraph is over - if p.flags&EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK != 0 { + if p.extensions&NoEmptyLineBeforeBlock != 0 { if p.uliPrefix(current) != 0 || p.oliPrefix(current) != 0 || p.quotePrefix(current) != 0 || p.codePrefix(current) != 0 { - p.renderParagraph(out, data[:i]) + p.renderParagraph(data[:i]) return i } } // otherwise, scan to the beginning of the next line - for data[i] != '\n' { - i++ + nl := bytes.IndexByte(data[i:], '\n') + if nl >= 0 { + i += nl + 1 + } else { + i += len(data[i:]) } + } + + p.renderParagraph(data[:i]) + return i +} + +func skipChar(data []byte, start int, char byte) int { + i := start + for i < len(data) && data[i] == char { i++ } + return i +} - p.renderParagraph(out, data[:i]) +func skipUntilChar(text []byte, start int, char byte) int { + i := start + for i < len(text) && text[i] != char { + i++ + } return i } + +// SanitizedAnchorName returns a sanitized anchor name for the given text. +// +// It implements the algorithm specified in the package comment. +func SanitizedAnchorName(text string) string { + var anchorName []rune + futureDash := false + for _, r := range text { + switch { + case unicode.IsLetter(r) || unicode.IsNumber(r): + if futureDash && len(anchorName) > 0 { + anchorName = append(anchorName, '-') + } + futureDash = false + anchorName = append(anchorName, unicode.ToLower(r)) + default: + futureDash = true + } + } + return string(anchorName) +} diff --git a/block_test.go b/block_test.go index b33c2578..ed854670 100644 --- a/block_test.go +++ b/block_test.go @@ -18,66 +18,8 @@ import ( "testing" ) -func runMarkdownBlockWithRenderer(input string, extensions int, renderer Renderer) string { - return string(Markdown([]byte(input), renderer, extensions)) -} - -func runMarkdownBlock(input string, extensions int) string { - htmlFlags := 0 - htmlFlags |= HTML_USE_XHTML - - renderer := HtmlRenderer(htmlFlags, "", "") - - return runMarkdownBlockWithRenderer(input, extensions, renderer) -} - -func runnerWithRendererParameters(parameters HtmlRendererParameters) func(string, int) string { - return func(input string, extensions int) string { - htmlFlags := 0 - htmlFlags |= HTML_USE_XHTML - - renderer := HtmlRendererWithParameters(htmlFlags, "", "", parameters) - - return runMarkdownBlockWithRenderer(input, extensions, renderer) - } -} - -func doTestsBlock(t *testing.T, tests []string, extensions int) { - doTestsBlockWithRunner(t, tests, extensions, runMarkdownBlock) -} - -func doTestsBlockWithRunner(t *testing.T, tests []string, extensions int, runner func(string, int) string) { - // catch and report panics - var candidate string - defer func() { - if err := recover(); err != nil { - t.Errorf("\npanic while processing [%#v]: %s\n", candidate, err) - } - }() - - for i := 0; i+1 < len(tests); i += 2 { - input := tests[i] - candidate = input - expected := tests[i+1] - actual := runner(candidate, extensions) - if actual != expected { - t.Errorf("\nInput [%#v]\nExpected[%#v]\nActual [%#v]", - candidate, expected, actual) - } - - // now test every substring to stress test bounds checking - if !testing.Short() { - for start := 0; start < len(input); start++ { - for end := start + 1; end <= len(input); end++ { - candidate = input[start:end] - _ = runMarkdownBlock(candidate, extensions) - } - } - } - } -} - func TestPrefixHeaderNoExtensions(t *testing.T) { + t.Parallel() var tests = []string{ "# Header 1\n", "

Header 1

\n", @@ -147,6 +89,7 @@ func TestPrefixHeaderNoExtensions(t *testing.T) { } func TestPrefixHeaderSpaceExtension(t *testing.T) { + t.Parallel() var tests = []string{ "# Header 1\n", "

Header 1

\n", @@ -203,10 +146,11 @@ func TestPrefixHeaderSpaceExtension(t *testing.T) { "\n", } - doTestsBlock(t, tests, EXTENSION_SPACE_HEADERS) + doTestsBlock(t, tests, SpaceHeadings) } func TestPrefixHeaderIdExtension(t *testing.T) { + t.Parallel() var tests = []string{ "# Header 1 {#someid}\n", "

Header 1

\n", @@ -263,10 +207,11 @@ func TestPrefixHeaderIdExtension(t *testing.T) { "\n", } - doTestsBlock(t, tests, EXTENSION_HEADER_IDS) + doTestsBlock(t, tests, HeadingIDs) } func TestPrefixHeaderIdExtensionWithPrefixAndSuffix(t *testing.T) { + t.Parallel() var tests = []string{ "# header 1 {#someid}\n", "

header 1

\n", @@ -306,15 +251,20 @@ func TestPrefixHeaderIdExtensionWithPrefixAndSuffix(t *testing.T) { "

Nested header

\n\n\n", } - parameters := HtmlRendererParameters{ - HeaderIDPrefix: "PRE:", - HeaderIDSuffix: ":POST", + parameters := HTMLRendererParameters{ + HeadingIDPrefix: "PRE:", + HeadingIDSuffix: ":POST", } - doTestsBlockWithRunner(t, tests, EXTENSION_HEADER_IDS, runnerWithRendererParameters(parameters)) + doTestsParam(t, tests, TestParams{ + extensions: HeadingIDs, + HTMLFlags: UseXHTML, + HTMLRendererParameters: parameters, + }) } func TestPrefixAutoHeaderIdExtension(t *testing.T) { + t.Parallel() var tests = []string{ "# Header 1\n", "

Header 1

\n", @@ -362,10 +312,11 @@ func TestPrefixAutoHeaderIdExtension(t *testing.T) { "# Header\n\n# Header 1\n\n# Header\n\n# Header", "

Header

\n\n

Header 1

\n\n

Header

\n\n

Header

\n", } - doTestsBlock(t, tests, EXTENSION_AUTO_HEADER_IDS) + doTestsBlock(t, tests, AutoHeadingIDs) } func TestPrefixAutoHeaderIdExtensionWithPrefixAndSuffix(t *testing.T) { + t.Parallel() var tests = []string{ "# Header 1\n", "

Header 1

\n", @@ -414,23 +365,116 @@ func TestPrefixAutoHeaderIdExtensionWithPrefixAndSuffix(t *testing.T) { "

Header

\n\n

Header 1

\n\n

Header

\n\n

Header

\n", } - parameters := HtmlRendererParameters{ - HeaderIDPrefix: "PRE:", - HeaderIDSuffix: ":POST", + parameters := HTMLRendererParameters{ + HeadingIDPrefix: "PRE:", + HeadingIDSuffix: ":POST", } - doTestsBlockWithRunner(t, tests, EXTENSION_AUTO_HEADER_IDS, runnerWithRendererParameters(parameters)) + doTestsParam(t, tests, TestParams{ + extensions: AutoHeadingIDs, + HTMLFlags: UseXHTML, + HTMLRendererParameters: parameters, + }) +} + +func TestPrefixHeaderLevelOffset(t *testing.T) { + t.Parallel() + var offsetTests = []struct { + offset int + tests []string + }{{ + offset: 0, + tests: []string{ + "# Header 1\n", + "

Header 1

\n", + + "## Header 2\n", + "

Header 2

\n", + + "### Header 3\n", + "

Header 3

\n", + + "#### Header 4\n", + "

Header 4

\n", + + "##### Header 5\n", + "
Header 5
\n", + + "###### Header 6\n", + "
Header 6
\n", + + "####### Header 7\n", + "
# Header 7
\n", + }, + }, { + offset: 1, + tests: []string{ + "# Header 1\n", + "

Header 1

\n", + + "## Header 2\n", + "

Header 2

\n", + + "### Header 3\n", + "

Header 3

\n", + + "#### Header 4\n", + "
Header 4
\n", + + "##### Header 5\n", + "
Header 5
\n", + + "###### Header 6\n", + "
Header 6
\n", + + "####### Header 7\n", + "
# Header 7
\n", + }, + }, { + offset: -1, + tests: []string{ + "# Header 1\n", + "

Header 1

\n", + + "## Header 2\n", + "

Header 2

\n", + + "### Header 3\n", + "

Header 3

\n", + + "#### Header 4\n", + "

Header 4

\n", + + "##### Header 5\n", + "

Header 5

\n", + + "###### Header 6\n", + "
Header 6
\n", + + "####### Header 7\n", + "
# Header 7
\n", + }, + }} + for _, offsetTest := range offsetTests { + offset := offsetTest.offset + tests := offsetTest.tests + doTestsParam(t, tests, TestParams{ + HTMLRendererParameters: HTMLRendererParameters{HeadingLevelOffset: offset}, + }) + } } func TestPrefixMultipleHeaderExtensions(t *testing.T) { + t.Parallel() var tests = []string{ "# Header\n\n# Header {#header}\n\n# Header 1", "

Header

\n\n

Header

\n\n

Header 1

\n", } - doTestsBlock(t, tests, EXTENSION_AUTO_HEADER_IDS|EXTENSION_HEADER_IDS) + doTestsBlock(t, tests, AutoHeadingIDs|HeadingIDs) } func TestUnderlineHeaders(t *testing.T) { + t.Parallel() var tests = []string{ "Header 1\n========\n", "

Header 1

\n", @@ -481,6 +525,7 @@ func TestUnderlineHeaders(t *testing.T) { } func TestUnderlineHeadersAutoIDs(t *testing.T) { + t.Parallel() var tests = []string{ "Header 1\n========\n", "

Header 1

\n", @@ -527,10 +572,11 @@ func TestUnderlineHeadersAutoIDs(t *testing.T) { "Header 1\n========\n\nHeader 1\n========\n", "

Header 1

\n\n

Header 1

\n", } - doTestsBlock(t, tests, EXTENSION_AUTO_HEADER_IDS) + doTestsBlock(t, tests, AutoHeadingIDs) } func TestHorizontalRule(t *testing.T) { + t.Parallel() var tests = []string{ "-\n", "

-

\n", @@ -596,6 +642,7 @@ func TestHorizontalRule(t *testing.T) { } func TestUnorderedList(t *testing.T) { + t.Parallel() var tests = []string{ "* Hello\n", "\n", @@ -707,6 +754,7 @@ func TestUnorderedList(t *testing.T) { } func TestOrderedList(t *testing.T) { + t.Parallel() var tests = []string{ "1. Hello\n", "
    \n
  1. Hello
  2. \n
\n", @@ -803,6 +851,7 @@ func TestOrderedList(t *testing.T) { } func TestDefinitionList(t *testing.T) { + t.Parallel() var tests = []string{ "Term 1\n: Definition a\n", "
\n
Term 1
\n
Definition a
\n
\n", @@ -901,10 +950,23 @@ func TestDefinitionList(t *testing.T) { "\n" + "\n

Text 2

\n", } - doTestsBlock(t, tests, EXTENSION_DEFINITION_LISTS) + doTestsBlock(t, tests, DefinitionLists) +} + +func TestConsecutiveLists(t *testing.T) { + t.Parallel() + var tests = []string{ + "1. Hello\n\n* Hello\n\nTerm 1\n: Definition a\n", + "
    \n
  1. Hello
  2. \n
\n\n\n\n
\n
Term 1
\n
Definition a
\n
\n", + + "1. Not nested\n2. ordered list\n\n\t1. nested\n\t2. ordered list\n\n\t* nested\n\t* unordered list\n* Not nested\n* unordered list", + "
    \n
  1. Not nested

  2. \n\n
  3. ordered list

    \n\n
      \n
    1. nested
    2. \n
    3. ordered list
    4. \n
    \n\n
      \n
    • nested
    • \n
    • unordered list
    • \n
  4. \n
\n\n\n", + } + doTestsBlock(t, tests, DefinitionLists) } func TestPreformattedHtml(t *testing.T) { + t.Parallel() var tests = []string{ "
\n", "
\n", @@ -958,6 +1020,7 @@ func TestPreformattedHtml(t *testing.T) { } func TestPreformattedHtmlLax(t *testing.T) { + t.Parallel() var tests = []string{ "Paragraph\n
\nHere? >&<\n
\n", "

Paragraph

\n\n
\nHere? >&<\n
\n", @@ -977,14 +1040,18 @@ func TestPreformattedHtmlLax(t *testing.T) { "Paragraph\n\n
\nHow about here? >&<\n
\n\nAnd here?\n", "

Paragraph

\n\n
\nHow about here? >&<\n
\n\n

And here?

\n", } - doTestsBlock(t, tests, EXTENSION_LAX_HTML_BLOCKS) + doTestsBlock(t, tests, LaxHTMLBlocks) } func TestFencedCodeBlock(t *testing.T) { + t.Parallel() var tests = []string{ "``` go\nfunc foo() bool {\n\treturn true;\n}\n```\n", "
func foo() bool {\n\treturn true;\n}\n
\n", + "``` go foo bar\nfunc foo() bool {\n\treturn true;\n}\n```\n", + "
func foo() bool {\n\treturn true;\n}\n
\n", + "``` c\n/* special & char < > \" escaping */\n```\n", "
/* special & char < > " escaping */\n
\n", @@ -1062,11 +1129,18 @@ func TestFencedCodeBlock(t *testing.T) { "Some text before a fenced code block\n``` oz\ncode blocks breakup paragraphs\n```\nSome text in between\n``` oz\nmultiple code blocks work okay\n```\nAnd some text after a fenced code block", "

Some text before a fenced code block

\n\n
code blocks breakup paragraphs\n
\n\n

Some text in between

\n\n
multiple code blocks work okay\n
\n\n

And some text after a fenced code block

\n", + + "```\n[]:()\n```\n", + "
[]:()\n
\n", + + "```\n[]:()\n[]:)\n[]:(\n[]:x\n[]:testing\n[:testing\n\n[]:\nlinebreak\n[]()\n\n[]:\n[]()\n```", + "
[]:()\n[]:)\n[]:(\n[]:x\n[]:testing\n[:testing\n\n[]:\nlinebreak\n[]()\n\n[]:\n[]()\n
\n", } - doTestsBlock(t, tests, EXTENSION_FENCED_CODE) + doTestsBlock(t, tests, FencedCode) } func TestFencedCodeInsideBlockquotes(t *testing.T) { + t.Parallel() cat := func(s ...string) string { return strings.Join(s, "\n") } var tests = []string{ cat("> ```go", @@ -1141,6 +1215,31 @@ continues

woo.

+`, + // ------------------------------------------- + cat("> foo", + "> ", + "> quote", + "", + "> foo2", + "> ", + "> quote2", + "", + "continues", + ""), + `
+

foo

+ +

quote

+
+ +
+

foo2

+ +

quote2

+
+ +

continues

`, } @@ -1175,10 +1274,11 @@ okay tests = append(tests, forms[0], want) tests = append(tests, forms[1], want) - doTestsBlock(t, tests, EXTENSION_FENCED_CODE) + doTestsBlock(t, tests, FencedCode) } func TestTable(t *testing.T) { + t.Parallel() var tests = []string{ "a | b\n---|---\nc | d\n", "\n\n\n\n\n\n\n\n" + @@ -1222,10 +1322,11 @@ func TestTable(t *testing.T) { "a|b\\|c|d\n---|---|---\nf|g\\|h|i\n", "
ab
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ab|cd
fg|hi
\n", } - doTestsBlock(t, tests, EXTENSION_TABLES) + doTestsBlock(t, tests, Tables) } func TestUnorderedListWith_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { + t.Parallel() var tests = []string{ "* Hello\n", "\n", @@ -1333,10 +1434,11 @@ func TestUnorderedListWith_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { "* List\n\n * sublist\n\n normal text\n\n * another sublist\n", "\n", } - doTestsBlock(t, tests, EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK) + doTestsBlock(t, tests, NoEmptyLineBeforeBlock) } func TestOrderedList_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { + t.Parallel() var tests = []string{ "1. Hello\n", "
    \n
  1. Hello
  2. \n
\n", @@ -1429,14 +1531,18 @@ func TestOrderedList_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { "1. numbers\n1. are ignored\n", "
    \n
  1. numbers
  2. \n
  3. are ignored
  4. \n
\n", } - doTestsBlock(t, tests, EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK) + doTestsBlock(t, tests, NoEmptyLineBeforeBlock) } func TestFencedCodeBlock_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { + t.Parallel() var tests = []string{ "``` go\nfunc foo() bool {\n\treturn true;\n}\n```\n", "
func foo() bool {\n\treturn true;\n}\n
\n", + "``` go foo bar\nfunc foo() bool {\n\treturn true;\n}\n```\n", + "
func foo() bool {\n\treturn true;\n}\n
\n", + "``` c\n/* special & char < > \" escaping */\n```\n", "
/* special & char < > " escaping */\n
\n", @@ -1500,10 +1606,52 @@ func TestFencedCodeBlock_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { " ``` oz\nleading spaces\n ```\n", "
``` oz\n
\n\n

leading spaces

\n\n
```\n
\n", } - doTestsBlock(t, tests, EXTENSION_FENCED_CODE|EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK) + doTestsBlock(t, tests, FencedCode|NoEmptyLineBeforeBlock) +} + +func TestListWithFencedCodeBlock(t *testing.T) { + t.Parallel() + var tests = []string{ + "1. one\n\n ```\n code\n ```\n\n2. two\n", + "
    \n
  1. one

    \n\n
    code\n
  2. \n\n
  3. two

  4. \n
\n", + // https://github.com/russross/blackfriday/issues/239 + "1. one\n\n ```\n - code\n ```\n\n2. two\n", + "
    \n
  1. one

    \n\n
    - code\n
  2. \n\n
  3. two

  4. \n
\n", + } + doTestsBlock(t, tests, FencedCode) +} + +func TestListWithMalformedFencedCodeBlock(t *testing.T) { + t.Parallel() + // Ensure that in the case of an unclosed fenced code block in a list, + // no source gets ommitted (even if it is malformed). + // See russross/blackfriday#372 for context. + var tests = []string{ + "1. one\n\n ```\n code\n\n2. two\n", + "
    \n
  1. one\n```\ncode\n2. two
  2. \n
\n", + + "1. one\n\n ```\n - code\n\n2. two\n", + "
    \n
  1. one\n```\n- code\n2. two
  2. \n
\n", + } + doTestsBlock(t, tests, FencedCode) +} + +func TestListWithFencedCodeBlockNoExtensions(t *testing.T) { + t.Parallel() + // If there is a fenced code block in a list, and FencedCode is not set, + // lists should be processed normally. + var tests = []string{ + "1. one\n\n ```\n code\n ```\n\n2. two\n", + "
    \n
  1. one

    \n\n

    \ncode\n

  2. \n\n
  3. two

  4. \n
\n", + + "1. one\n\n ```\n - code\n ```\n\n2. two\n", + "
    \n
  1. one

    \n\n

    ```

    \n\n
      \n
    • code\n```
    • \n
  2. \n\n
  3. two

  4. \n
\n", + } + doTestsBlock(t, tests, 0) } func TestTitleBlock_EXTENSION_TITLEBLOCK(t *testing.T) { + t.Parallel() var tests = []string{ "% Some title\n" + "% Another title line\n" + @@ -1511,13 +1659,14 @@ func TestTitleBlock_EXTENSION_TITLEBLOCK(t *testing.T) { "

" + "Some title\n" + "Another title line\n" + - "Yep, more here too\n" + - "

", + "Yep, more here too" + + "\n", } - doTestsBlock(t, tests, EXTENSION_TITLEBLOCK) + doTestsBlock(t, tests, Titleblock) } func TestBlockComments(t *testing.T) { + t.Parallel() var tests = []string{ "Some text\n\n\n", "

Some text

\n\n\n", @@ -1530,3 +1679,265 @@ func TestBlockComments(t *testing.T) { } doTestsBlock(t, tests, 0) } + +func TestTOC(t *testing.T) { + t.Parallel() + var tests = []string{ + "# Title\n\n##Subtitle1\n\n##Subtitle2", + //"\n\n

Title

\n\n

Subtitle1

\n\n

Subtitle2

\n", + ` + +

Title

+ +

Subtitle1

+ +

Subtitle2

+`, + + "# Title\n\n##Subtitle\n\n#Title2", + //"\n\n

Title

\n\n

Subtitle

\n\n

Title2

\n", + ` + +

Title

+ +

Subtitle

+ +

Title2

+`, + + "## Subtitle\n\n# Title", + ` + +

Subtitle

+ +

Title

+`, + + "# Title 1\n\n## Subtitle 1\n\n### Subsubtitle 1\n\n# Title 2\n\n### Subsubtitle 2", + ` + +

Title 1

+ +

Subtitle 1

+ +

Subsubtitle 1

+ +

Title 2

+ +

Subsubtitle 2

+`, + + "# Title with `code`", + ` + +

Title with code

+`, + + // Trigger empty TOC + "#", + "", + } + doTestsParam(t, tests, TestParams{ + HTMLFlags: UseXHTML | TOC, + }) +} + +func TestCompletePage(t *testing.T) { + t.Parallel() + var tests = []string{ + "*foo*", + ` + + + + + + + + +

foo

+ + + +`, + } + doTestsParam(t, tests, TestParams{HTMLFlags: UseXHTML | CompletePage}) +} + +func TestIsFenceLine(t *testing.T) { + t.Parallel() + tests := []struct { + data []byte + infoRequested bool + wantEnd int + wantMarker string + wantInfo string + }{ + { + data: []byte("```"), + wantEnd: 3, + wantMarker: "```", + }, + { + data: []byte("```\nstuff here\n"), + wantEnd: 4, + wantMarker: "```", + }, + { + data: []byte("```\nstuff here\n"), + infoRequested: true, + wantEnd: 4, + wantMarker: "```", + }, + { + data: []byte("stuff here\n```\n"), + wantEnd: 0, + }, + { + data: []byte("```"), + infoRequested: true, + wantEnd: 3, + wantMarker: "```", + }, + { + data: []byte("``` go"), + infoRequested: true, + wantEnd: 6, + wantMarker: "```", + wantInfo: "go", + }, + { + data: []byte("``` go foo bar"), + infoRequested: true, + wantEnd: 14, + wantMarker: "```", + wantInfo: "go foo bar", + }, + { + data: []byte("``` go foo bar "), + infoRequested: true, + wantEnd: 16, + wantMarker: "```", + wantInfo: "go foo bar", + }, + } + + for _, test := range tests { + var info *string + if test.infoRequested { + info = new(string) + } + end, marker := isFenceLine(test.data, info, "```") + if got, want := end, test.wantEnd; got != want { + t.Errorf("got end %v, want %v", got, want) + } + if got, want := marker, test.wantMarker; got != want { + t.Errorf("got marker %q, want %q", got, want) + } + if test.infoRequested { + if got, want := *info, test.wantInfo; got != want { + t.Errorf("got info string %q, want %q", got, want) + } + } + } +} + +func TestSanitizedAnchorName(t *testing.T) { + tests := []struct { + text string + want string + }{ + { + text: "This is a header", + want: "this-is-a-header", + }, + { + text: "This is also a header", + want: "this-is-also-a-header", + }, + { + text: "main.go", + want: "main-go", + }, + { + text: "Article 123", + want: "article-123", + }, + { + text: "<- Let's try this, shall we?", + want: "let-s-try-this-shall-we", + }, + { + text: " ", + want: "", + }, + { + text: "Hello, 世界", + want: "hello-世界", + }, + } + for _, test := range tests { + if got := SanitizedAnchorName(test.text); got != test.want { + t.Errorf("SanitizedAnchorName(%q):\ngot %q\nwant %q", test.text, got, test.want) + } + } +} diff --git a/doc.go b/doc.go new file mode 100644 index 00000000..57ff152a --- /dev/null +++ b/doc.go @@ -0,0 +1,46 @@ +// Package blackfriday is a markdown processor. +// +// It translates plain text with simple formatting rules into an AST, which can +// then be further processed to HTML (provided by Blackfriday itself) or other +// formats (provided by the community). +// +// The simplest way to invoke Blackfriday is to call the Run function. It will +// take a text input and produce a text output in HTML (or other format). +// +// A slightly more sophisticated way to use Blackfriday is to create a Markdown +// processor and to call Parse, which returns a syntax tree for the input +// document. You can leverage Blackfriday's parsing for content extraction from +// markdown documents. You can assign a custom renderer and set various options +// to the Markdown processor. +// +// If you're interested in calling Blackfriday from command line, see +// https://github.com/russross/blackfriday-tool. +// +// Sanitized Anchor Names +// +// Blackfriday includes an algorithm for creating sanitized anchor names +// corresponding to a given input text. This algorithm is used to create +// anchors for headings when AutoHeadingIDs extension is enabled. The +// algorithm is specified below, so that other packages can create +// compatible anchor names and links to those anchors. +// +// The algorithm iterates over the input text, interpreted as UTF-8, +// one Unicode code point (rune) at a time. All runes that are letters (category L) +// or numbers (category N) are considered valid characters. They are mapped to +// lower case, and included in the output. All other runes are considered +// invalid characters. Invalid characters that precede the first valid character, +// as well as invalid character that follow the last valid character +// are dropped completely. All other sequences of invalid characters +// between two valid characters are replaced with a single dash character '-'. +// +// SanitizedAnchorName exposes this functionality, and can be used to +// create compatible links to the anchor names generated by blackfriday. +// This algorithm is also implemented in a small standalone package at +// github.com/shurcooL/sanitized_anchor_name. It can be useful for clients +// that want a small package and don't need full functionality of blackfriday. +package blackfriday + +// NOTE: Keep Sanitized Anchor Name algorithm in sync with package +// github.com/shurcooL/sanitized_anchor_name. +// Otherwise, users of sanitized_anchor_name will get anchor names +// that are incompatible with those generated by blackfriday. diff --git a/entities.go b/entities.go new file mode 100644 index 00000000..a2c3edb6 --- /dev/null +++ b/entities.go @@ -0,0 +1,2236 @@ +package blackfriday + +// Extracted from https://html.spec.whatwg.org/multipage/entities.json +var entities = map[string]bool{ + "Æ": true, + "Æ": true, + "&": true, + "&": true, + "Á": true, + "Á": true, + "Ă": true, + "Â": true, + "Â": true, + "А": true, + "𝔄": true, + "À": true, + "À": true, + "Α": true, + "Ā": true, + "⩓": true, + "Ą": true, + "𝔸": true, + "⁡": true, + "Å": true, + "Å": true, + "𝒜": true, + "≔": true, + "Ã": true, + "Ã": true, + "Ä": true, + "Ä": true, + "∖": true, + "⫧": true, + "⌆": true, + "Б": true, + "∵": true, + "ℬ": true, + "Β": true, + "𝔅": true, + "𝔹": true, + "˘": true, + "ℬ": true, + "≎": true, + "Ч": true, + "©": true, + "©": true, + "Ć": true, + "⋒": true, + "ⅅ": true, + "ℭ": true, + "Č": true, + "Ç": true, + "Ç": true, + "Ĉ": true, + "∰": true, + "Ċ": true, + "¸": true, + "·": true, + "ℭ": true, + "Χ": true, + "⊙": true, + "⊖": true, + "⊕": true, + "⊗": true, + "∲": true, + "”": true, + "’": true, + "∷": true, + "⩴": true, + "≡": true, + "∯": true, + "∮": true, + "ℂ": true, + "∐": true, + "∳": true, + "⨯": true, + "𝒞": true, + "⋓": true, + "≍": true, + "ⅅ": true, + "⤑": true, + "Ђ": true, + "Ѕ": true, + "Џ": true, + "‡": true, + "↡": true, + "⫤": true, + "Ď": true, + "Д": true, + "∇": true, + "Δ": true, + "𝔇": true, + "´": true, + "˙": true, + "˝": true, + "`": true, + "˜": true, + "⋄": true, + "ⅆ": true, + "𝔻": true, + "¨": true, + "⃜": true, + "≐": true, + "∯": true, + "¨": true, + "⇓": true, + "⇐": true, + "⇔": true, + "⫤": true, + "⟸": true, + "⟺": true, + "⟹": true, + "⇒": true, + "⊨": true, + "⇑": true, + "⇕": true, + "∥": true, + "↓": true, + "⤓": true, + "⇵": true, + "̑": true, + "⥐": true, + "⥞": true, + "↽": true, + "⥖": true, + "⥟": true, + "⇁": true, + "⥗": true, + "⊤": true, + "↧": true, + "⇓": true, + "𝒟": true, + "Đ": true, + "Ŋ": true, + "Ð": true, + "Ð": true, + "É": true, + "É": true, + "Ě": true, + "Ê": true, + "Ê": true, + "Э": true, + "Ė": true, + "𝔈": true, + "È": true, + "È": true, + "∈": true, + "Ē": true, + "◻": true, + "▫": true, + "Ę": true, + "𝔼": true, + "Ε": true, + "⩵": true, + "≂": true, + "⇌": true, + "ℰ": true, + "⩳": true, + "Η": true, + "Ë": true, + "Ë": true, + "∃": true, + "ⅇ": true, + "Ф": true, + "𝔉": true, + "◼": true, + "▪": true, + "𝔽": true, + "∀": true, + "ℱ": true, + "ℱ": true, + "Ѓ": true, + ">": true, + ">": true, + "Γ": true, + "Ϝ": true, + "Ğ": true, + "Ģ": true, + "Ĝ": true, + "Г": true, + "Ġ": true, + "𝔊": true, + "⋙": true, + "𝔾": true, + "≥": true, + "⋛": true, + "≧": true, + "⪢": true, + "≷": true, + "⩾": true, + "≳": true, + "𝒢": true, + "≫": true, + "Ъ": true, + "ˇ": true, + "^": true, + "Ĥ": true, + "ℌ": true, + "ℋ": true, + "ℍ": true, + "─": true, + "ℋ": true, + "Ħ": true, + "≎": true, + "≏": true, + "Е": true, + "IJ": true, + "Ё": true, + "Í": true, + "Í": true, + "Î": true, + "Î": true, + "И": true, + "İ": true, + "ℑ": true, + "Ì": true, + "Ì": true, + "ℑ": true, + "Ī": true, + "ⅈ": true, + "⇒": true, + "∬": true, + "∫": true, + "⋂": true, + "⁣": true, + "⁢": true, + "Į": true, + "𝕀": true, + "Ι": true, + "ℐ": true, + "Ĩ": true, + "І": true, + "Ï": true, + "Ï": true, + "Ĵ": true, + "Й": true, + "𝔍": true, + "𝕁": true, + "𝒥": true, + "Ј": true, + "Є": true, + "Х": true, + "Ќ": true, + "Κ": true, + "Ķ": true, + "К": true, + "𝔎": true, + "𝕂": true, + "𝒦": true, + "Љ": true, + "<": true, + "<": true, + "Ĺ": true, + "Λ": true, + "⟪": true, + "ℒ": true, + "↞": true, + "Ľ": true, + "Ļ": true, + "Л": true, + "⟨": true, + "←": true, + "⇤": true, + "⇆": true, + "⌈": true, + "⟦": true, + "⥡": true, + "⇃": true, + "⥙": true, + "⌊": true, + "↔": true, + "⥎": true, + "⊣": true, + "↤": true, + "⥚": true, + "⊲": true, + "⧏": true, + "⊴": true, + "⥑": true, + "⥠": true, + "↿": true, + "⥘": true, + "↼": true, + "⥒": true, + "⇐": true, + "⇔": true, + "⋚": true, + "≦": true, + "≶": true, + "⪡": true, + "⩽": true, + "≲": true, + "𝔏": true, + "⋘": true, + "⇚": true, + "Ŀ": true, + "⟵": true, + "⟷": true, + "⟶": true, + "⟸": true, + "⟺": true, + "⟹": true, + "𝕃": true, + "↙": true, + "↘": true, + "ℒ": true, + "↰": true, + "Ł": true, + "≪": true, + "⤅": true, + "М": true, + " ": true, + "ℳ": true, + "𝔐": true, + "∓": true, + "𝕄": true, + "ℳ": true, + "Μ": true, + "Њ": true, + "Ń": true, + "Ň": true, + "Ņ": true, + "Н": true, + "​": true, + "​": true, + "​": true, + "​": true, + "≫": true, + "≪": true, + " ": true, + "𝔑": true, + "⁠": true, + " ": true, + "ℕ": true, + "⫬": true, + "≢": true, + "≭": true, + "∦": true, + "∉": true, + "≠": true, + "≂̸": true, + "∄": true, + "≯": true, + "≱": true, + "≧̸": true, + "≫̸": true, + "≹": true, + "⩾̸": true, + "≵": true, + "≎̸": true, + "≏̸": true, + "⋪": true, + "⧏̸": true, + "⋬": true, + "≮": true, + "≰": true, + "≸": true, + "≪̸": true, + "⩽̸": true, + "≴": true, + "⪢̸": true, + "⪡̸": true, + "⊀": true, + "⪯̸": true, + "⋠": true, + "∌": true, + "⋫": true, + "⧐̸": true, + "⋭": true, + "⊏̸": true, + "⋢": true, + "⊐̸": true, + "⋣": true, + "⊂⃒": true, + "⊈": true, + "⊁": true, + "⪰̸": true, + "⋡": true, + "≿̸": true, + "⊃⃒": true, + "⊉": true, + "≁": true, + "≄": true, + "≇": true, + "≉": true, + "∤": true, + "𝒩": true, + "Ñ": true, + "Ñ": true, + "Ν": true, + "Œ": true, + "Ó": true, + "Ó": true, + "Ô": true, + "Ô": true, + "О": true, + "Ő": true, + "𝔒": true, + "Ò": true, + "Ò": true, + "Ō": true, + "Ω": true, + "Ο": true, + "𝕆": true, + "“": true, + "‘": true, + "⩔": true, + "𝒪": true, + "Ø": true, + "Ø": true, + "Õ": true, + "Õ": true, + "⨷": true, + "Ö": true, + "Ö": true, + "‾": true, + "⏞": true, + "⎴": true, + "⏜": true, + "∂": true, + "П": true, + "𝔓": true, + "Φ": true, + "Π": true, + "±": true, + "ℌ": true, + "ℙ": true, + "⪻": true, + "≺": true, + "⪯": true, + "≼": true, + "≾": true, + "″": true, + "∏": true, + "∷": true, + "∝": true, + "𝒫": true, + "Ψ": true, + """: true, + """: true, + "𝔔": true, + "ℚ": true, + "𝒬": true, + "⤐": true, + "®": true, + "®": true, + "Ŕ": true, + "⟫": true, + "↠": true, + "⤖": true, + "Ř": true, + "Ŗ": true, + "Р": true, + "ℜ": true, + "∋": true, + "⇋": true, + "⥯": true, + "ℜ": true, + "Ρ": true, + "⟩": true, + "→": true, + "⇥": true, + "⇄": true, + "⌉": true, + "⟧": true, + "⥝": true, + "⇂": true, + "⥕": true, + "⌋": true, + "⊢": true, + "↦": true, + "⥛": true, + "⊳": true, + "⧐": true, + "⊵": true, + "⥏": true, + "⥜": true, + "↾": true, + "⥔": true, + "⇀": true, + "⥓": true, + "⇒": true, + "ℝ": true, + "⥰": true, + "⇛": true, + "ℛ": true, + "↱": true, + "⧴": true, + "Щ": true, + "Ш": true, + "Ь": true, + "Ś": true, + "⪼": true, + "Š": true, + "Ş": true, + "Ŝ": true, + "С": true, + "𝔖": true, + "↓": true, + "←": true, + "→": true, + "↑": true, + "Σ": true, + "∘": true, + "𝕊": true, + "√": true, + "□": true, + "⊓": true, + "⊏": true, + "⊑": true, + "⊐": true, + "⊒": true, + "⊔": true, + "𝒮": true, + "⋆": true, + "⋐": true, + "⋐": true, + "⊆": true, + "≻": true, + "⪰": true, + "≽": true, + "≿": true, + "∋": true, + "∑": true, + "⋑": true, + "⊃": true, + "⊇": true, + "⋑": true, + "Þ": true, + "Þ": true, + "™": true, + "Ћ": true, + "Ц": true, + " ": true, + "Τ": true, + "Ť": true, + "Ţ": true, + "Т": true, + "𝔗": true, + "∴": true, + "Θ": true, + "  ": true, + " ": true, + "∼": true, + "≃": true, + "≅": true, + "≈": true, + "𝕋": true, + "⃛": true, + "𝒯": true, + "Ŧ": true, + "Ú": true, + "Ú": true, + "↟": true, + "⥉": true, + "Ў": true, + "Ŭ": true, + "Û": true, + "Û": true, + "У": true, + "Ű": true, + "𝔘": true, + "Ù": true, + "Ù": true, + "Ū": true, + "_": true, + "⏟": true, + "⎵": true, + "⏝": true, + "⋃": true, + "⊎": true, + "Ų": true, + "𝕌": true, + "↑": true, + "⤒": true, + "⇅": true, + "↕": true, + "⥮": true, + "⊥": true, + "↥": true, + "⇑": true, + "⇕": true, + "↖": true, + "↗": true, + "ϒ": true, + "Υ": true, + "Ů": true, + "𝒰": true, + "Ũ": true, + "Ü": true, + "Ü": true, + "⊫": true, + "⫫": true, + "В": true, + "⊩": true, + "⫦": true, + "⋁": true, + "‖": true, + "‖": true, + "∣": true, + "|": true, + "❘": true, + "≀": true, + " ": true, + "𝔙": true, + "𝕍": true, + "𝒱": true, + "⊪": true, + "Ŵ": true, + "⋀": true, + "𝔚": true, + "𝕎": true, + "𝒲": true, + "𝔛": true, + "Ξ": true, + "𝕏": true, + "𝒳": true, + "Я": true, + "Ї": true, + "Ю": true, + "Ý": true, + "Ý": true, + "Ŷ": true, + "Ы": true, + "𝔜": true, + "𝕐": true, + "𝒴": true, + "Ÿ": true, + "Ж": true, + "Ź": true, + "Ž": true, + "З": true, + "Ż": true, + "​": true, + "Ζ": true, + "ℨ": true, + "ℤ": true, + "𝒵": true, + "á": true, + "á": true, + "ă": true, + "∾": true, + "∾̳": true, + "∿": true, + "â": true, + "â": true, + "´": true, + "´": true, + "а": true, + "æ": true, + "æ": true, + "⁡": true, + "𝔞": true, + "à": true, + "à": true, + "ℵ": true, + "ℵ": true, + "α": true, + "ā": true, + "⨿": true, + "&": true, + "&": true, + "∧": true, + "⩕": true, + "⩜": true, + "⩘": true, + "⩚": true, + "∠": true, + "⦤": true, + "∠": true, + "∡": true, + "⦨": true, + "⦩": true, + "⦪": true, + "⦫": true, + "⦬": true, + "⦭": true, + "⦮": true, + "⦯": true, + "∟": true, + "⊾": true, + "⦝": true, + "∢": true, + "Å": true, + "⍼": true, + "ą": true, + "𝕒": true, + "≈": true, + "⩰": true, + "⩯": true, + "≊": true, + "≋": true, + "'": true, + "≈": true, + "≊": true, + "å": true, + "å": true, + "𝒶": true, + "*": true, + "≈": true, + "≍": true, + "ã": true, + "ã": true, + "ä": true, + "ä": true, + "∳": true, + "⨑": true, + "⫭": true, + "≌": true, + "϶": true, + "‵": true, + "∽": true, + "⋍": true, + "⊽": true, + "⌅": true, + "⌅": true, + "⎵": true, + "⎶": true, + "≌": true, + "б": true, + "„": true, + "∵": true, + "∵": true, + "⦰": true, + "϶": true, + "ℬ": true, + "β": true, + "ℶ": true, + "≬": true, + "𝔟": true, + "⋂": true, + "◯": true, + "⋃": true, + "⨀": true, + "⨁": true, + "⨂": true, + "⨆": true, + "★": true, + "▽": true, + "△": true, + "⨄": true, + "⋁": true, + "⋀": true, + "⤍": true, + "⧫": true, + "▪": true, + "▴": true, + "▾": true, + "◂": true, + "▸": true, + "␣": true, + "▒": true, + "░": true, + "▓": true, + "█": true, + "=⃥": true, + "≡⃥": true, + "⌐": true, + "𝕓": true, + "⊥": true, + "⊥": true, + "⋈": true, + "╗": true, + "╔": true, + "╖": true, + "╓": true, + "═": true, + "╦": true, + "╩": true, + "╤": true, + "╧": true, + "╝": true, + "╚": true, + "╜": true, + "╙": true, + "║": true, + "╬": true, + "╣": true, + "╠": true, + "╫": true, + "╢": true, + "╟": true, + "⧉": true, + "╕": true, + "╒": true, + "┐": true, + "┌": true, + "─": true, + "╥": true, + "╨": true, + "┬": true, + "┴": true, + "⊟": true, + "⊞": true, + "⊠": true, + "╛": true, + "╘": true, + "┘": true, + "└": true, + "│": true, + "╪": true, + "╡": true, + "╞": true, + "┼": true, + "┤": true, + "├": true, + "‵": true, + "˘": true, + "¦": true, + "¦": true, + "𝒷": true, + "⁏": true, + "∽": true, + "⋍": true, + "\": true, + "⧅": true, + "⟈": true, + "•": true, + "•": true, + "≎": true, + "⪮": true, + "≏": true, + "≏": true, + "ć": true, + "∩": true, + "⩄": true, + "⩉": true, + "⩋": true, + "⩇": true, + "⩀": true, + "∩︀": true, + "⁁": true, + "ˇ": true, + "⩍": true, + "č": true, + "ç": true, + "ç": true, + "ĉ": true, + "⩌": true, + "⩐": true, + "ċ": true, + "¸": true, + "¸": true, + "⦲": true, + "¢": true, + "¢": true, + "·": true, + "𝔠": true, + "ч": true, + "✓": true, + "✓": true, + "χ": true, + "○": true, + "⧃": true, + "ˆ": true, + "≗": true, + "↺": true, + "↻": true, + "®": true, + "Ⓢ": true, + "⊛": true, + "⊚": true, + "⊝": true, + "≗": true, + "⨐": true, + "⫯": true, + "⧂": true, + "♣": true, + "♣": true, + ":": true, + "≔": true, + "≔": true, + ",": true, + "@": true, + "∁": true, + "∘": true, + "∁": true, + "ℂ": true, + "≅": true, + "⩭": true, + "∮": true, + "𝕔": true, + "∐": true, + "©": true, + "©": true, + "℗": true, + "↵": true, + "✗": true, + "𝒸": true, + "⫏": true, + "⫑": true, + "⫐": true, + "⫒": true, + "⋯": true, + "⤸": true, + "⤵": true, + "⋞": true, + "⋟": true, + "↶": true, + "⤽": true, + "∪": true, + "⩈": true, + "⩆": true, + "⩊": true, + "⊍": true, + "⩅": true, + "∪︀": true, + "↷": true, + "⤼": true, + "⋞": true, + "⋟": true, + "⋎": true, + "⋏": true, + "¤": true, + "¤": true, + "↶": true, + "↷": true, + "⋎": true, + "⋏": true, + "∲": true, + "∱": true, + "⌭": true, + "⇓": true, + "⥥": true, + "†": true, + "ℸ": true, + "↓": true, + "‐": true, + "⊣": true, + "⤏": true, + "˝": true, + "ď": true, + "д": true, + "ⅆ": true, + "‡": true, + "⇊": true, + "⩷": true, + "°": true, + "°": true, + "δ": true, + "⦱": true, + "⥿": true, + "𝔡": true, + "⇃": true, + "⇂": true, + "⋄": true, + "⋄": true, + "♦": true, + "♦": true, + "¨": true, + "ϝ": true, + "⋲": true, + "÷": true, + "÷": true, + "÷": true, + "⋇": true, + "⋇": true, + "ђ": true, + "⌞": true, + "⌍": true, + "$": true, + "𝕕": true, + "˙": true, + "≐": true, + "≑": true, + "∸": true, + "∔": true, + "⊡": true, + "⌆": true, + "↓": true, + "⇊": true, + "⇃": true, + "⇂": true, + "⤐": true, + "⌟": true, + "⌌": true, + "𝒹": true, + "ѕ": true, + "⧶": true, + "đ": true, + "⋱": true, + "▿": true, + "▾": true, + "⇵": true, + "⥯": true, + "⦦": true, + "џ": true, + "⟿": true, + "⩷": true, + "≑": true, + "é": true, + "é": true, + "⩮": true, + "ě": true, + "≖": true, + "ê": true, + "ê": true, + "≕": true, + "э": true, + "ė": true, + "ⅇ": true, + "≒": true, + "𝔢": true, + "⪚": true, + "è": true, + "è": true, + "⪖": true, + "⪘": true, + "⪙": true, + "⏧": true, + "ℓ": true, + "⪕": true, + "⪗": true, + "ē": true, + "∅": true, + "∅": true, + "∅": true, + " ": true, + " ": true, + " ": true, + "ŋ": true, + " ": true, + "ę": true, + "𝕖": true, + "⋕": true, + "⧣": true, + "⩱": true, + "ε": true, + "ε": true, + "ϵ": true, + "≖": true, + "≕": true, + "≂": true, + "⪖": true, + "⪕": true, + "=": true, + "≟": true, + "≡": true, + "⩸": true, + "⧥": true, + "≓": true, + "⥱": true, + "ℯ": true, + "≐": true, + "≂": true, + "η": true, + "ð": true, + "ð": true, + "ë": true, + "ë": true, + "€": true, + "!": true, + "∃": true, + "ℰ": true, + "ⅇ": true, + "≒": true, + "ф": true, + "♀": true, + "ffi": true, + "ff": true, + "ffl": true, + "𝔣": true, + "fi": true, + "fj": true, + "♭": true, + "fl": true, + "▱": true, + "ƒ": true, + "𝕗": true, + "∀": true, + "⋔": true, + "⫙": true, + "⨍": true, + "½": true, + "½": true, + "⅓": true, + "¼": true, + "¼": true, + "⅕": true, + "⅙": true, + "⅛": true, + "⅔": true, + "⅖": true, + "¾": true, + "¾": true, + "⅗": true, + "⅜": true, + "⅘": true, + "⅚": true, + "⅝": true, + "⅞": true, + "⁄": true, + "⌢": true, + "𝒻": true, + "≧": true, + "⪌": true, + "ǵ": true, + "γ": true, + "ϝ": true, + "⪆": true, + "ğ": true, + "ĝ": true, + "г": true, + "ġ": true, + "≥": true, + "⋛": true, + "≥": true, + "≧": true, + "⩾": true, + "⩾": true, + "⪩": true, + "⪀": true, + "⪂": true, + "⪄": true, + "⋛︀": true, + "⪔": true, + "𝔤": true, + "≫": true, + "⋙": true, + "ℷ": true, + "ѓ": true, + "≷": true, + "⪒": true, + "⪥": true, + "⪤": true, + "≩": true, + "⪊": true, + "⪊": true, + "⪈": true, + "⪈": true, + "≩": true, + "⋧": true, + "𝕘": true, + "`": true, + "ℊ": true, + "≳": true, + "⪎": true, + "⪐": true, + ">": true, + ">": true, + "⪧": true, + "⩺": true, + "⋗": true, + "⦕": true, + "⩼": true, + "⪆": true, + "⥸": true, + "⋗": true, + "⋛": true, + "⪌": true, + "≷": true, + "≳": true, + "≩︀": true, + "≩︀": true, + "⇔": true, + " ": true, + "½": true, + "ℋ": true, + "ъ": true, + "↔": true, + "⥈": true, + "↭": true, + "ℏ": true, + "ĥ": true, + "♥": true, + "♥": true, + "…": true, + "⊹": true, + "𝔥": true, + "⤥": true, + "⤦": true, + "⇿": true, + "∻": true, + "↩": true, + "↪": true, + "𝕙": true, + "―": true, + "𝒽": true, + "ℏ": true, + "ħ": true, + "⁃": true, + "‐": true, + "í": true, + "í": true, + "⁣": true, + "î": true, + "î": true, + "и": true, + "е": true, + "¡": true, + "¡": true, + "⇔": true, + "𝔦": true, + "ì": true, + "ì": true, + "ⅈ": true, + "⨌": true, + "∭": true, + "⧜": true, + "℩": true, + "ij": true, + "ī": true, + "ℑ": true, + "ℐ": true, + "ℑ": true, + "ı": true, + "⊷": true, + "Ƶ": true, + "∈": true, + "℅": true, + "∞": true, + "⧝": true, + "ı": true, + "∫": true, + "⊺": true, + "ℤ": true, + "⊺": true, + "⨗": true, + "⨼": true, + "ё": true, + "į": true, + "𝕚": true, + "ι": true, + "⨼": true, + "¿": true, + "¿": true, + "𝒾": true, + "∈": true, + "⋹": true, + "⋵": true, + "⋴": true, + "⋳": true, + "∈": true, + "⁢": true, + "ĩ": true, + "і": true, + "ï": true, + "ï": true, + "ĵ": true, + "й": true, + "𝔧": true, + "ȷ": true, + "𝕛": true, + "𝒿": true, + "ј": true, + "є": true, + "κ": true, + "ϰ": true, + "ķ": true, + "к": true, + "𝔨": true, + "ĸ": true, + "х": true, + "ќ": true, + "𝕜": true, + "𝓀": true, + "⇚": true, + "⇐": true, + "⤛": true, + "⤎": true, + "≦": true, + "⪋": true, + "⥢": true, + "ĺ": true, + "⦴": true, + "ℒ": true, + "λ": true, + "⟨": true, + "⦑": true, + "⟨": true, + "⪅": true, + "«": true, + "«": true, + "←": true, + "⇤": true, + "⤟": true, + "⤝": true, + "↩": true, + "↫": true, + "⤹": true, + "⥳": true, + "↢": true, + "⪫": true, + "⤙": true, + "⪭": true, + "⪭︀": true, + "⤌": true, + "❲": true, + "{": true, + "[": true, + "⦋": true, + "⦏": true, + "⦍": true, + "ľ": true, + "ļ": true, + "⌈": true, + "{": true, + "л": true, + "⤶": true, + "“": true, + "„": true, + "⥧": true, + "⥋": true, + "↲": true, + "≤": true, + "←": true, + "↢": true, + "↽": true, + "↼": true, + "⇇": true, + "↔": true, + "⇆": true, + "⇋": true, + "↭": true, + "⋋": true, + "⋚": true, + "≤": true, + "≦": true, + "⩽": true, + "⩽": true, + "⪨": true, + "⩿": true, + "⪁": true, + "⪃": true, + "⋚︀": true, + "⪓": true, + "⪅": true, + "⋖": true, + "⋚": true, + "⪋": true, + "≶": true, + "≲": true, + "⥼": true, + "⌊": true, + "𝔩": true, + "≶": true, + "⪑": true, + "↽": true, + "↼": true, + "⥪": true, + "▄": true, + "љ": true, + "≪": true, + "⇇": true, + "⌞": true, + "⥫": true, + "◺": true, + "ŀ": true, + "⎰": true, + "⎰": true, + "≨": true, + "⪉": true, + "⪉": true, + "⪇": true, + "⪇": true, + "≨": true, + "⋦": true, + "⟬": true, + "⇽": true, + "⟦": true, + "⟵": true, + "⟷": true, + "⟼": true, + "⟶": true, + "↫": true, + "↬": true, + "⦅": true, + "𝕝": true, + "⨭": true, + "⨴": true, + "∗": true, + "_": true, + "◊": true, + "◊": true, + "⧫": true, + "(": true, + "⦓": true, + "⇆": true, + "⌟": true, + "⇋": true, + "⥭": true, + "‎": true, + "⊿": true, + "‹": true, + "𝓁": true, + "↰": true, + "≲": true, + "⪍": true, + "⪏": true, + "[": true, + "‘": true, + "‚": true, + "ł": true, + "<": true, + "<": true, + "⪦": true, + "⩹": true, + "⋖": true, + "⋋": true, + "⋉": true, + "⥶": true, + "⩻": true, + "⦖": true, + "◃": true, + "⊴": true, + "◂": true, + "⥊": true, + "⥦": true, + "≨︀": true, + "≨︀": true, + "∺": true, + "¯": true, + "¯": true, + "♂": true, + "✠": true, + "✠": true, + "↦": true, + "↦": true, + "↧": true, + "↤": true, + "↥": true, + "▮": true, + "⨩": true, + "м": true, + "—": true, + "∡": true, + "𝔪": true, + "℧": true, + "µ": true, + "µ": true, + "∣": true, + "*": true, + "⫰": true, + "·": true, + "·": true, + "−": true, + "⊟": true, + "∸": true, + "⨪": true, + "⫛": true, + "…": true, + "∓": true, + "⊧": true, + "𝕞": true, + "∓": true, + "𝓂": true, + "∾": true, + "μ": true, + "⊸": true, + "⊸": true, + "⋙̸": true, + "≫⃒": true, + "≫̸": true, + "⇍": true, + "⇎": true, + "⋘̸": true, + "≪⃒": true, + "≪̸": true, + "⇏": true, + "⊯": true, + "⊮": true, + "∇": true, + "ń": true, + "∠⃒": true, + "≉": true, + "⩰̸": true, + "≋̸": true, + "ʼn": true, + "≉": true, + "♮": true, + "♮": true, + "ℕ": true, + " ": true, + " ": true, + "≎̸": true, + "≏̸": true, + "⩃": true, + "ň": true, + "ņ": true, + "≇": true, + "⩭̸": true, + "⩂": true, + "н": true, + "–": true, + "≠": true, + "⇗": true, + "⤤": true, + "↗": true, + "↗": true, + "≐̸": true, + "≢": true, + "⤨": true, + "≂̸": true, + "∄": true, + "∄": true, + "𝔫": true, + "≧̸": true, + "≱": true, + "≱": true, + "≧̸": true, + "⩾̸": true, + "⩾̸": true, + "≵": true, + "≯": true, + "≯": true, + "⇎": true, + "↮": true, + "⫲": true, + "∋": true, + "⋼": true, + "⋺": true, + "∋": true, + "њ": true, + "⇍": true, + "≦̸": true, + "↚": true, + "‥": true, + "≰": true, + "↚": true, + "↮": true, + "≰": true, + "≦̸": true, + "⩽̸": true, + "⩽̸": true, + "≮": true, + "≴": true, + "≮": true, + "⋪": true, + "⋬": true, + "∤": true, + "𝕟": true, + "¬": true, + "¬": true, + "∉": true, + "⋹̸": true, + "⋵̸": true, + "∉": true, + "⋷": true, + "⋶": true, + "∌": true, + "∌": true, + "⋾": true, + "⋽": true, + "∦": true, + "∦": true, + "⫽⃥": true, + "∂̸": true, + "⨔": true, + "⊀": true, + "⋠": true, + "⪯̸": true, + "⊀": true, + "⪯̸": true, + "⇏": true, + "↛": true, + "⤳̸": true, + "↝̸": true, + "↛": true, + "⋫": true, + "⋭": true, + "⊁": true, + "⋡": true, + "⪰̸": true, + "𝓃": true, + "∤": true, + "∦": true, + "≁": true, + "≄": true, + "≄": true, + "∤": true, + "∦": true, + "⋢": true, + "⋣": true, + "⊄": true, + "⫅̸": true, + "⊈": true, + "⊂⃒": true, + "⊈": true, + "⫅̸": true, + "⊁": true, + "⪰̸": true, + "⊅": true, + "⫆̸": true, + "⊉": true, + "⊃⃒": true, + "⊉": true, + "⫆̸": true, + "≹": true, + "ñ": true, + "ñ": true, + "≸": true, + "⋪": true, + "⋬": true, + "⋫": true, + "⋭": true, + "ν": true, + "#": true, + "№": true, + " ": true, + "⊭": true, + "⤄": true, + "≍⃒": true, + "⊬": true, + "≥⃒": true, + ">⃒": true, + "⧞": true, + "⤂": true, + "≤⃒": true, + "<⃒": true, + "⊴⃒": true, + "⤃": true, + "⊵⃒": true, + "∼⃒": true, + "⇖": true, + "⤣": true, + "↖": true, + "↖": true, + "⤧": true, + "Ⓢ": true, + "ó": true, + "ó": true, + "⊛": true, + "⊚": true, + "ô": true, + "ô": true, + "о": true, + "⊝": true, + "ő": true, + "⨸": true, + "⊙": true, + "⦼": true, + "œ": true, + "⦿": true, + "𝔬": true, + "˛": true, + "ò": true, + "ò": true, + "⧁": true, + "⦵": true, + "Ω": true, + "∮": true, + "↺": true, + "⦾": true, + "⦻": true, + "‾": true, + "⧀": true, + "ō": true, + "ω": true, + "ο": true, + "⦶": true, + "⊖": true, + "𝕠": true, + "⦷": true, + "⦹": true, + "⊕": true, + "∨": true, + "↻": true, + "⩝": true, + "ℴ": true, + "ℴ": true, + "ª": true, + "ª": true, + "º": true, + "º": true, + "⊶": true, + "⩖": true, + "⩗": true, + "⩛": true, + "ℴ": true, + "ø": true, + "ø": true, + "⊘": true, + "õ": true, + "õ": true, + "⊗": true, + "⨶": true, + "ö": true, + "ö": true, + "⌽": true, + "∥": true, + "¶": true, + "¶": true, + "∥": true, + "⫳": true, + "⫽": true, + "∂": true, + "п": true, + "%": true, + ".": true, + "‰": true, + "⊥": true, + "‱": true, + "𝔭": true, + "φ": true, + "ϕ": true, + "ℳ": true, + "☎": true, + "π": true, + "⋔": true, + "ϖ": true, + "ℏ": true, + "ℎ": true, + "ℏ": true, + "+": true, + "⨣": true, + "⊞": true, + "⨢": true, + "∔": true, + "⨥": true, + "⩲": true, + "±": true, + "±": true, + "⨦": true, + "⨧": true, + "±": true, + "⨕": true, + "𝕡": true, + "£": true, + "£": true, + "≺": true, + "⪳": true, + "⪷": true, + "≼": true, + "⪯": true, + "≺": true, + "⪷": true, + "≼": true, + "⪯": true, + "⪹": true, + "⪵": true, + "⋨": true, + "≾": true, + "′": true, + "ℙ": true, + "⪵": true, + "⪹": true, + "⋨": true, + "∏": true, + "⌮": true, + "⌒": true, + "⌓": true, + "∝": true, + "∝": true, + "≾": true, + "⊰": true, + "𝓅": true, + "ψ": true, + " ": true, + "𝔮": true, + "⨌": true, + "𝕢": true, + "⁗": true, + "𝓆": true, + "ℍ": true, + "⨖": true, + "?": true, + "≟": true, + """: true, + """: true, + "⇛": true, + "⇒": true, + "⤜": true, + "⤏": true, + "⥤": true, + "∽̱": true, + "ŕ": true, + "√": true, + "⦳": true, + "⟩": true, + "⦒": true, + "⦥": true, + "⟩": true, + "»": true, + "»": true, + "→": true, + "⥵": true, + "⇥": true, + "⤠": true, + "⤳": true, + "⤞": true, + "↪": true, + "↬": true, + "⥅": true, + "⥴": true, + "↣": true, + "↝": true, + "⤚": true, + "∶": true, + "ℚ": true, + "⤍": true, + "❳": true, + "}": true, + "]": true, + "⦌": true, + "⦎": true, + "⦐": true, + "ř": true, + "ŗ": true, + "⌉": true, + "}": true, + "р": true, + "⤷": true, + "⥩": true, + "”": true, + "”": true, + "↳": true, + "ℜ": true, + "ℛ": true, + "ℜ": true, + "ℝ": true, + "▭": true, + "®": true, + "®": true, + "⥽": true, + "⌋": true, + "𝔯": true, + "⇁": true, + "⇀": true, + "⥬": true, + "ρ": true, + "ϱ": true, + "→": true, + "↣": true, + "⇁": true, + "⇀": true, + "⇄": true, + "⇌": true, + "⇉": true, + "↝": true, + "⋌": true, + "˚": true, + "≓": true, + "⇄": true, + "⇌": true, + "‏": true, + "⎱": true, + "⎱": true, + "⫮": true, + "⟭": true, + "⇾": true, + "⟧": true, + "⦆": true, + "𝕣": true, + "⨮": true, + "⨵": true, + ")": true, + "⦔": true, + "⨒": true, + "⇉": true, + "›": true, + "𝓇": true, + "↱": true, + "]": true, + "’": true, + "’": true, + "⋌": true, + "⋊": true, + "▹": true, + "⊵": true, + "▸": true, + "⧎": true, + "⥨": true, + "℞": true, + "ś": true, + "‚": true, + "≻": true, + "⪴": true, + "⪸": true, + "š": true, + "≽": true, + "⪰": true, + "ş": true, + "ŝ": true, + "⪶": true, + "⪺": true, + "⋩": true, + "⨓": true, + "≿": true, + "с": true, + "⋅": true, + "⊡": true, + "⩦": true, + "⇘": true, + "⤥": true, + "↘": true, + "↘": true, + "§": true, + "§": true, + ";": true, + "⤩": true, + "∖": true, + "∖": true, + "✶": true, + "𝔰": true, + "⌢": true, + "♯": true, + "щ": true, + "ш": true, + "∣": true, + "∥": true, + "­": true, + "­": true, + "σ": true, + "ς": true, + "ς": true, + "∼": true, + "⩪": true, + "≃": true, + "≃": true, + "⪞": true, + "⪠": true, + "⪝": true, + "⪟": true, + "≆": true, + "⨤": true, + "⥲": true, + "←": true, + "∖": true, + "⨳": true, + "⧤": true, + "∣": true, + "⌣": true, + "⪪": true, + "⪬": true, + "⪬︀": true, + "ь": true, + "/": true, + "⧄": true, + "⌿": true, + "𝕤": true, + "♠": true, + "♠": true, + "∥": true, + "⊓": true, + "⊓︀": true, + "⊔": true, + "⊔︀": true, + "⊏": true, + "⊑": true, + "⊏": true, + "⊑": true, + "⊐": true, + "⊒": true, + "⊐": true, + "⊒": true, + "□": true, + "□": true, + "▪": true, + "▪": true, + "→": true, + "𝓈": true, + "∖": true, + "⌣": true, + "⋆": true, + "☆": true, + "★": true, + "ϵ": true, + "ϕ": true, + "¯": true, + "⊂": true, + "⫅": true, + "⪽": true, + "⊆": true, + "⫃": true, + "⫁": true, + "⫋": true, + "⊊": true, + "⪿": true, + "⥹": true, + "⊂": true, + "⊆": true, + "⫅": true, + "⊊": true, + "⫋": true, + "⫇": true, + "⫕": true, + "⫓": true, + "≻": true, + "⪸": true, + "≽": true, + "⪰": true, + "⪺": true, + "⪶": true, + "⋩": true, + "≿": true, + "∑": true, + "♪": true, + "¹": true, + "¹": true, + "²": true, + "²": true, + "³": true, + "³": true, + "⊃": true, + "⫆": true, + "⪾": true, + "⫘": true, + "⊇": true, + "⫄": true, + "⟉": true, + "⫗": true, + "⥻": true, + "⫂": true, + "⫌": true, + "⊋": true, + "⫀": true, + "⊃": true, + "⊇": true, + "⫆": true, + "⊋": true, + "⫌": true, + "⫈": true, + "⫔": true, + "⫖": true, + "⇙": true, + "⤦": true, + "↙": true, + "↙": true, + "⤪": true, + "ß": true, + "ß": true, + "⌖": true, + "τ": true, + "⎴": true, + "ť": true, + "ţ": true, + "т": true, + "⃛": true, + "⌕": true, + "𝔱": true, + "∴": true, + "∴": true, + "θ": true, + "ϑ": true, + "ϑ": true, + "≈": true, + "∼": true, + " ": true, + "≈": true, + "∼": true, + "þ": true, + "þ": true, + "˜": true, + "×": true, + "×": true, + "⊠": true, + "⨱": true, + "⨰": true, + "∭": true, + "⤨": true, + "⊤": true, + "⌶": true, + "⫱": true, + "𝕥": true, + "⫚": true, + "⤩": true, + "‴": true, + "™": true, + "▵": true, + "▿": true, + "◃": true, + "⊴": true, + "≜": true, + "▹": true, + "⊵": true, + "◬": true, + "≜": true, + "⨺": true, + "⨹": true, + "⧍": true, + "⨻": true, + "⏢": true, + "𝓉": true, + "ц": true, + "ћ": true, + "ŧ": true, + "≬": true, + "↞": true, + "↠": true, + "⇑": true, + "⥣": true, + "ú": true, + "ú": true, + "↑": true, + "ў": true, + "ŭ": true, + "û": true, + "û": true, + "у": true, + "⇅": true, + "ű": true, + "⥮": true, + "⥾": true, + "𝔲": true, + "ù": true, + "ù": true, + "↿": true, + "↾": true, + "▀": true, + "⌜": true, + "⌜": true, + "⌏": true, + "◸": true, + "ū": true, + "¨": true, + "¨": true, + "ų": true, + "𝕦": true, + "↑": true, + "↕": true, + "↿": true, + "↾": true, + "⊎": true, + "υ": true, + "ϒ": true, + "υ": true, + "⇈": true, + "⌝": true, + "⌝": true, + "⌎": true, + "ů": true, + "◹": true, + "𝓊": true, + "⋰": true, + "ũ": true, + "▵": true, + "▴": true, + "⇈": true, + "ü": true, + "ü": true, + "⦧": true, + "⇕": true, + "⫨": true, + "⫩": true, + "⊨": true, + "⦜": true, + "ϵ": true, + "ϰ": true, + "∅": true, + "ϕ": true, + "ϖ": true, + "∝": true, + "↕": true, + "ϱ": true, + "ς": true, + "⊊︀": true, + "⫋︀": true, + "⊋︀": true, + "⫌︀": true, + "ϑ": true, + "⊲": true, + "⊳": true, + "в": true, + "⊢": true, + "∨": true, + "⊻": true, + "≚": true, + "⋮": true, + "|": true, + "|": true, + "𝔳": true, + "⊲": true, + "⊂⃒": true, + "⊃⃒": true, + "𝕧": true, + "∝": true, + "⊳": true, + "𝓋": true, + "⫋︀": true, + "⊊︀": true, + "⫌︀": true, + "⊋︀": true, + "⦚": true, + "ŵ": true, + "⩟": true, + "∧": true, + "≙": true, + "℘": true, + "𝔴": true, + "𝕨": true, + "℘": true, + "≀": true, + "≀": true, + "𝓌": true, + "⋂": true, + "◯": true, + "⋃": true, + "▽": true, + "𝔵": true, + "⟺": true, + "⟷": true, + "ξ": true, + "⟸": true, + "⟵": true, + "⟼": true, + "⋻": true, + "⨀": true, + "𝕩": true, + "⨁": true, + "⨂": true, + "⟹": true, + "⟶": true, + "𝓍": true, + "⨆": true, + "⨄": true, + "△": true, + "⋁": true, + "⋀": true, + "ý": true, + "ý": true, + "я": true, + "ŷ": true, + "ы": true, + "¥": true, + "¥": true, + "𝔶": true, + "ї": true, + "𝕪": true, + "𝓎": true, + "ю": true, + "ÿ": true, + "ÿ": true, + "ź": true, + "ž": true, + "з": true, + "ż": true, + "ℨ": true, + "ζ": true, + "𝔷": true, + "ж": true, + "⇝": true, + "𝕫": true, + "𝓏": true, + "‍": true, + "‌": true, +} diff --git a/esc.go b/esc.go new file mode 100644 index 00000000..6ab60102 --- /dev/null +++ b/esc.go @@ -0,0 +1,70 @@ +package blackfriday + +import ( + "html" + "io" +) + +var htmlEscaper = [256][]byte{ + '&': []byte("&"), + '<': []byte("<"), + '>': []byte(">"), + '"': []byte("""), +} + +func escapeHTML(w io.Writer, s []byte) { + escapeEntities(w, s, false) +} + +func escapeAllHTML(w io.Writer, s []byte) { + escapeEntities(w, s, true) +} + +func escapeEntities(w io.Writer, s []byte, escapeValidEntities bool) { + var start, end int + for end < len(s) { + escSeq := htmlEscaper[s[end]] + if escSeq != nil { + isEntity, entityEnd := nodeIsEntity(s, end) + if isEntity && !escapeValidEntities { + w.Write(s[start : entityEnd+1]) + start = entityEnd + 1 + } else { + w.Write(s[start:end]) + w.Write(escSeq) + start = end + 1 + } + } + end++ + } + if start < len(s) && end <= len(s) { + w.Write(s[start:end]) + } +} + +func nodeIsEntity(s []byte, end int) (isEntity bool, endEntityPos int) { + isEntity = false + endEntityPos = end + 1 + + if s[end] == '&' { + for endEntityPos < len(s) { + if s[endEntityPos] == ';' { + if entities[string(s[end:endEntityPos+1])] { + isEntity = true + break + } + } + if !isalnum(s[endEntityPos]) && s[endEntityPos] != '&' && s[endEntityPos] != '#' { + break + } + endEntityPos++ + } + } + + return isEntity, endEntityPos +} + +func escLink(w io.Writer, text []byte) { + unesc := html.UnescapeString(string(text)) + escapeHTML(w, []byte(unesc)) +} diff --git a/esc_test.go b/esc_test.go new file mode 100644 index 00000000..fe318f70 --- /dev/null +++ b/esc_test.go @@ -0,0 +1,49 @@ +package blackfriday + +import ( + "bytes" + "testing" +) + +func TestEsc(t *testing.T) { + t.Parallel() + tests := []string{ + "abc", "abc", + "a&c", "a&c", + "<", "<", + "[]:<", "[]:<", + "Hello |" + processingInstruction = "[<][?].*?[?][>]" + singleQuotedValue = "'[^']*'" + tagName = "[A-Za-z][A-Za-z0-9-]*" + unquotedValue = "[^\"'=<>`\\x00-\\x20]+" ) -type HtmlRendererParameters struct { +// HTMLRendererParameters is a collection of supplementary parameters tweaking +// the behavior of various parts of HTML renderer. +type HTMLRendererParameters struct { // Prepend this text to each relative URL. AbsolutePrefix string // Add this text to each footnote anchor, to ensure uniqueness. @@ -65,34 +83,38 @@ type HtmlRendererParameters struct { // HTML_FOOTNOTE_RETURN_LINKS flag is enabled. If blank, the string // [return] is used. FootnoteReturnLinkContents string - // If set, add this text to the front of each Header ID, to ensure + // If set, add this text to the front of each Heading ID, to ensure // uniqueness. - HeaderIDPrefix string - // If set, add this text to the back of each Header ID, to ensure uniqueness. - HeaderIDSuffix string + HeadingIDPrefix string + // If set, add this text to the back of each Heading ID, to ensure uniqueness. + HeadingIDSuffix string + // Increase heading levels: if the offset is 1,

becomes

etc. + // Negative offset is also valid. + // Resulting levels are clipped between 1 and 6. + HeadingLevelOffset int + + Title string // Document title (used if CompletePage is set) + CSS string // Optional CSS file URL (used if CompletePage is set) + Icon string // Optional icon file URL (used if CompletePage is set) + + Flags HTMLFlags // Flags allow customizing this renderer's behavior } -// Html is a type that implements the Renderer interface for HTML output. +// HTMLRenderer is a type that implements the Renderer interface for HTML output. // -// Do not create this directly, instead use the HtmlRenderer function. -type Html struct { - flags int // HTML_* options - closeTag string // how to end singleton tags: either " />" or ">" - title string // document title - css string // optional css file url (used with HTML_COMPLETE_PAGE) +// Do not create this directly, instead use the NewHTMLRenderer function. +type HTMLRenderer struct { + HTMLRendererParameters - parameters HtmlRendererParameters + closeTag string // how to end singleton tags: either " />" or ">" - // table of contents data - tocMarker int - headerCount int - currentLevel int - toc *bytes.Buffer + // Track heading IDs to prevent ID collision in a single generation. + headingIDs map[string]int - // Track header IDs to prevent ID collision in a single generation. - headerIDs map[string]int + lastOutputLen int + disableTags int - smartypants *smartypantsRenderer + sr *SPRenderer } const ( @@ -100,850 +122,831 @@ const ( htmlClose = ">" ) -// HtmlRenderer creates and configures an Html object, which +// NewHTMLRenderer creates and configures an HTMLRenderer object, which // satisfies the Renderer interface. -// -// flags is a set of HTML_* options ORed together. -// title is the title of the document, and css is a URL for the document's -// stylesheet. -// title and css are only used when HTML_COMPLETE_PAGE is selected. -func HtmlRenderer(flags int, title string, css string) Renderer { - return HtmlRendererWithParameters(flags, title, css, HtmlRendererParameters{}) -} - -func HtmlRendererWithParameters(flags int, title string, - css string, renderParameters HtmlRendererParameters) Renderer { +func NewHTMLRenderer(params HTMLRendererParameters) *HTMLRenderer { // configure the rendering engine closeTag := htmlClose - if flags&HTML_USE_XHTML != 0 { + if params.Flags&UseXHTML != 0 { closeTag = xhtmlClose } - if renderParameters.FootnoteReturnLinkContents == "" { - renderParameters.FootnoteReturnLinkContents = `[return]` + if params.FootnoteReturnLinkContents == "" { + // U+FE0E is VARIATION SELECTOR-15. + // It suppresses automatic emoji presentation of the preceding + // U+21A9 LEFTWARDS ARROW WITH HOOK on iOS and iPadOS. + params.FootnoteReturnLinkContents = "↩\ufe0e" } - return &Html{ - flags: flags, - closeTag: closeTag, - title: title, - css: css, - parameters: renderParameters, - - headerCount: 0, - currentLevel: 0, - toc: new(bytes.Buffer), + return &HTMLRenderer{ + HTMLRendererParameters: params, - headerIDs: make(map[string]int), + closeTag: closeTag, + headingIDs: make(map[string]int), - smartypants: smartypants(flags), + sr: NewSmartypantsRenderer(params.Flags), } } -// Using if statements is a bit faster than a switch statement. As the compiler -// improves, this should be unnecessary this is only worthwhile because -// attrEscape is the single largest CPU user in normal use. -// Also tried using map, but that gave a ~3x slowdown. -func escapeSingleChar(char byte) (string, bool) { - if char == '"' { - return """, true - } - if char == '&' { - return "&", true - } - if char == '<' { - return "<", true - } - if char == '>' { - return ">", true - } - return "", false +func isHTMLTag(tag []byte, tagname string) bool { + found, _ := findHTMLTagPos(tag, tagname) + return found } -func attrEscape(out *bytes.Buffer, src []byte) { - org := 0 - for i, ch := range src { - if entity, ok := escapeSingleChar(ch); ok { - if i > org { - // copy all the normal characters since the last escape - out.Write(src[org:i]) - } - org = i + 1 - out.WriteString(entity) +// Look for a character, but ignore it when it's in any kind of quotes, it +// might be JavaScript +func skipUntilCharIgnoreQuotes(html []byte, start int, char byte) int { + inSingleQuote := false + inDoubleQuote := false + inGraveQuote := false + i := start + for i < len(html) { + switch { + case html[i] == char && !inSingleQuote && !inDoubleQuote && !inGraveQuote: + return i + case html[i] == '\'': + inSingleQuote = !inSingleQuote + case html[i] == '"': + inDoubleQuote = !inDoubleQuote + case html[i] == '`': + inGraveQuote = !inGraveQuote } + i++ } - if org < len(src) { - out.Write(src[org:]) - } + return start } -func entityEscapeWithSkip(out *bytes.Buffer, src []byte, skipRanges [][]int) { - end := 0 - for _, rang := range skipRanges { - attrEscape(out, src[end:rang[0]]) - out.Write(src[rang[0]:rang[1]]) - end = rang[1] +func findHTMLTagPos(tag []byte, tagname string) (bool, int) { + i := 0 + if i < len(tag) && tag[0] != '<' { + return false, -1 } - attrEscape(out, src[end:]) -} - -func (options *Html) GetFlags() int { - return options.flags -} - -func (options *Html) TitleBlock(out *bytes.Buffer, text []byte) { - text = bytes.TrimPrefix(text, []byte("% ")) - text = bytes.Replace(text, []byte("\n% "), []byte("\n"), -1) - out.WriteString("

") - out.Write(text) - out.WriteString("\n

") -} - -func (options *Html) Header(out *bytes.Buffer, text func() bool, level int, id string) { - marker := out.Len() - doubleSpace(out) + i++ + i = skipSpace(tag, i) - if id == "" && options.flags&HTML_TOC != 0 { - id = fmt.Sprintf("toc_%d", options.headerCount) + if i < len(tag) && tag[i] == '/' { + i++ } - if id != "" { - id = options.ensureUniqueHeaderID(id) - - if options.parameters.HeaderIDPrefix != "" { - id = options.parameters.HeaderIDPrefix + id + i = skipSpace(tag, i) + j := 0 + for ; i < len(tag); i, j = i+1, j+1 { + if j >= len(tagname) { + break } - if options.parameters.HeaderIDSuffix != "" { - id = id + options.parameters.HeaderIDSuffix + if strings.ToLower(string(tag[i]))[0] != tagname[j] { + return false, -1 } - - out.WriteString(fmt.Sprintf("", level, id)) - } else { - out.WriteString(fmt.Sprintf("", level)) } - tocMarker := out.Len() - if !text() { - out.Truncate(marker) - return + if i == len(tag) { + return false, -1 } - // are we building a table of contents? - if options.flags&HTML_TOC != 0 { - options.TocHeaderWithAnchor(out.Bytes()[tocMarker:], level, id) + rightAngle := skipUntilCharIgnoreQuotes(tag, i, '>') + if rightAngle >= i { + return true, rightAngle } - out.WriteString(fmt.Sprintf("\n", level)) + return false, -1 } -func (options *Html) BlockHtml(out *bytes.Buffer, text []byte) { - if options.flags&HTML_SKIP_HTML != 0 { - return +func skipSpace(tag []byte, i int) int { + for i < len(tag) && isspace(tag[i]) { + i++ } - - doubleSpace(out) - out.Write(text) - out.WriteByte('\n') -} - -func (options *Html) HRule(out *bytes.Buffer) { - doubleSpace(out) - out.WriteString("") - } else { - out.WriteString("\">") + // link begin with '/' but not '//', the second maybe a protocol relative link + if len(link) >= 2 && link[0] == '/' && link[1] != '/' { + return true } - attrEscape(out, text) - out.WriteString("\n") -} - -func (options *Html) BlockQuote(out *bytes.Buffer, text []byte) { - doubleSpace(out) - out.WriteString("
\n") - out.Write(text) - out.WriteString("
\n") -} - -func (options *Html) Table(out *bytes.Buffer, header []byte, body []byte, columnData []int) { - doubleSpace(out) - out.WriteString("\n\n") - out.Write(header) - out.WriteString("\n\n\n") - out.Write(body) - out.WriteString("\n
\n") -} - -func (options *Html) TableRow(out *bytes.Buffer, text []byte) { - doubleSpace(out) - out.WriteString("\n") - out.Write(text) - out.WriteString("\n\n") -} - -func (options *Html) TableHeaderCell(out *bytes.Buffer, text []byte, align int) { - doubleSpace(out) - switch align { - case TABLE_ALIGNMENT_LEFT: - out.WriteString("") - case TABLE_ALIGNMENT_RIGHT: - out.WriteString("") - case TABLE_ALIGNMENT_CENTER: - out.WriteString("") - default: - out.WriteString("") + // only the root '/' + if len(link) == 1 && link[0] == '/' { + return true } - out.Write(text) - out.WriteString("") -} - -func (options *Html) TableCell(out *bytes.Buffer, text []byte, align int) { - doubleSpace(out) - switch align { - case TABLE_ALIGNMENT_LEFT: - out.WriteString("") - case TABLE_ALIGNMENT_RIGHT: - out.WriteString("") - case TABLE_ALIGNMENT_CENTER: - out.WriteString("") - default: - out.WriteString("") + // current directory : begin with "./" + if bytes.HasPrefix(link, []byte("./")) { + return true } - out.Write(text) - out.WriteString("") -} - -func (options *Html) Footnotes(out *bytes.Buffer, text func() bool) { - out.WriteString("
\n") - options.HRule(out) - options.List(out, text, LIST_TYPE_ORDERED) - out.WriteString("
\n") -} + // parent directory : begin with "../" + if bytes.HasPrefix(link, []byte("../")) { + return true + } -func (options *Html) FootnoteItem(out *bytes.Buffer, name, text []byte, flags int) { - if flags&LIST_ITEM_CONTAINS_BLOCK != 0 || flags&LIST_ITEM_BEGINNING_OF_LIST != 0 { - doubleSpace(out) - } - slug := slugify(name) - out.WriteString(`
  • `) - out.Write(text) - if options.flags&HTML_FOOTNOTE_RETURN_LINKS != 0 { - out.WriteString(` `) - out.WriteString(options.parameters.FootnoteReturnLinkContents) - out.WriteString(``) - } - out.WriteString("
  • \n") + return false } -func (options *Html) List(out *bytes.Buffer, text func() bool, flags int) { - marker := out.Len() - doubleSpace(out) +func (r *HTMLRenderer) ensureUniqueHeadingID(id string) string { + for count, found := r.headingIDs[id]; found; count, found = r.headingIDs[id] { + tmp := fmt.Sprintf("%s-%d", id, count+1) - if flags&LIST_TYPE_DEFINITION != 0 { - out.WriteString("
    ") - } else if flags&LIST_TYPE_ORDERED != 0 { - out.WriteString("
      ") - } else { - out.WriteString("
        ") - } - if !text() { - out.Truncate(marker) - return - } - if flags&LIST_TYPE_DEFINITION != 0 { - out.WriteString("
    \n") - } else if flags&LIST_TYPE_ORDERED != 0 { - out.WriteString("\n") - } else { - out.WriteString("\n") + if _, tmpFound := r.headingIDs[tmp]; !tmpFound { + r.headingIDs[id] = count + 1 + id = tmp + } else { + id = id + "-1" + } } -} -func (options *Html) ListItem(out *bytes.Buffer, text []byte, flags int) { - if (flags&LIST_ITEM_CONTAINS_BLOCK != 0 && flags&LIST_TYPE_DEFINITION == 0) || - flags&LIST_ITEM_BEGINNING_OF_LIST != 0 { - doubleSpace(out) + if _, found := r.headingIDs[id]; !found { + r.headingIDs[id] = 0 } - if flags&LIST_TYPE_TERM != 0 { - out.WriteString("
    ") - } else if flags&LIST_TYPE_DEFINITION != 0 { - out.WriteString("
    ") - } else { - out.WriteString("
  • ") - } - out.Write(text) - if flags&LIST_TYPE_TERM != 0 { - out.WriteString("\n") - } else if flags&LIST_TYPE_DEFINITION != 0 { - out.WriteString("
  • \n") - } else { - out.WriteString("\n") - } -} - -func (options *Html) Paragraph(out *bytes.Buffer, text func() bool) { - marker := out.Len() - doubleSpace(out) - out.WriteString("

    ") - if !text() { - out.Truncate(marker) - return - } - out.WriteString("

    \n") + return id } -func (options *Html) AutoLink(out *bytes.Buffer, link []byte, kind int) { - skipRanges := htmlEntity.FindAllIndex(link, -1) - if options.flags&HTML_SAFELINK != 0 && !isSafeLink(link) && kind != LINK_TYPE_EMAIL { - // mark it but don't link it if it is not a safe link: no smartypants - out.WriteString("") - entityEscapeWithSkip(out, link, skipRanges) - out.WriteString("") - return +func (r *HTMLRenderer) addAbsPrefix(link []byte) []byte { + if r.AbsolutePrefix != "" && isRelativeLink(link) && link[0] != '.' { + newDest := r.AbsolutePrefix + if link[0] != '/' { + newDest += "/" + } + newDest += string(link) + return []byte(newDest) } + return link +} - out.WriteString(" 0 { - out.WriteString(fmt.Sprintf("\" rel=\"%s", strings.Join(relAttrs, " "))) + if flags&NoopenerLinks != 0 { + val = append(val, "noopener") } - - // blank target only add to external link - if options.flags&HTML_HREF_TARGET_BLANK != 0 && !isRelativeLink(link) { - out.WriteString("\" target=\"_blank") + if flags&HrefTargetBlank != 0 { + attrs = append(attrs, "target=\"_blank\"") } - - out.WriteString("\">") - - // Pretty print: if we get an email address as - // an actual URI, e.g. `mailto:foo@bar.com`, we don't - // want to print the `mailto:` prefix - switch { - case bytes.HasPrefix(link, []byte("mailto://")): - attrEscape(out, link[len("mailto://"):]) - case bytes.HasPrefix(link, []byte("mailto:")): - attrEscape(out, link[len("mailto:"):]) - default: - entityEscapeWithSkip(out, link, skipRanges) + if len(val) == 0 { + return attrs } - - out.WriteString("") + attr := fmt.Sprintf("rel=%q", strings.Join(val, " ")) + return append(attrs, attr) } -func (options *Html) CodeSpan(out *bytes.Buffer, text []byte) { - out.WriteString("") - attrEscape(out, text) - out.WriteString("") +func isMailto(link []byte) bool { + return bytes.HasPrefix(link, []byte("mailto:")) } -func (options *Html) DoubleEmphasis(out *bytes.Buffer, text []byte) { - out.WriteString("") - out.Write(text) - out.WriteString("") -} - -func (options *Html) Emphasis(out *bytes.Buffer, text []byte) { - if len(text) == 0 { - return +func needSkipLink(flags HTMLFlags, dest []byte) bool { + if flags&SkipLinks != 0 { + return true } - out.WriteString("") - out.Write(text) - out.WriteString("") + return flags&Safelink != 0 && !isSafeLink(dest) && !isMailto(dest) } -func (options *Html) maybeWriteAbsolutePrefix(out *bytes.Buffer, link []byte) { - if options.parameters.AbsolutePrefix != "" && isRelativeLink(link) && link[0] != '.' { - out.WriteString(options.parameters.AbsolutePrefix) - if link[0] != '/' { - out.WriteByte('/') - } - } +func isSmartypantable(node *Node) bool { + pt := node.Parent.Type + return pt != Link && pt != CodeBlock && pt != Code } -func (options *Html) Image(out *bytes.Buffer, link []byte, title []byte, alt []byte) { - if options.flags&HTML_SKIP_IMAGES != 0 { - return - } - - out.WriteString("\"") 0 { - attrEscape(out, alt) +func appendLanguageAttr(attrs []string, info []byte) []string { + if len(info) == 0 { + return attrs } - if len(title) > 0 { - out.WriteString("\" title=\"") - attrEscape(out, title) + endOfLang := bytes.IndexAny(info, "\t ") + if endOfLang < 0 { + endOfLang = len(info) } - - out.WriteByte('"') - out.WriteString(options.closeTag) -} - -func (options *Html) LineBreak(out *bytes.Buffer) { - out.WriteString("") - attrEscape(out, content) - out.WriteString("") - return - } - - if options.flags&HTML_SAFELINK != 0 && !isSafeLink(link) { - // write the link text out but don't link it, just mark it with typewriter font - out.WriteString("") - attrEscape(out, content) - out.WriteString("") - return - } - - out.WriteString(" 0 { - out.WriteString("\" title=\"") - attrEscape(out, title) - } - var relAttrs []string - if options.flags&HTML_NOFOLLOW_LINKS != 0 && !isRelativeLink(link) { - relAttrs = append(relAttrs, "nofollow") - } - if options.flags&HTML_NOREFERRER_LINKS != 0 && !isRelativeLink(link) { - relAttrs = append(relAttrs, "noreferrer") - } - if len(relAttrs) > 0 { - out.WriteString(fmt.Sprintf("\" rel=\"%s", strings.Join(relAttrs, " "))) - } - - // blank target only add to external link - if options.flags&HTML_HREF_TARGET_BLANK != 0 && !isRelativeLink(link) { - out.WriteString("\" target=\"_blank") +func (r *HTMLRenderer) tag(w io.Writer, name []byte, attrs []string) { + w.Write(name) + if len(attrs) > 0 { + w.Write(spaceBytes) + w.Write([]byte(strings.Join(attrs, " "))) } + w.Write(gtBytes) + r.lastOutputLen = 1 +} - out.WriteString("\">") - out.Write(content) - out.WriteString("") - return +func footnoteRef(prefix string, node *Node) []byte { + urlFrag := prefix + string(slugify(node.Destination)) + anchor := fmt.Sprintf(`%d`, urlFrag, node.NoteID) + return []byte(fmt.Sprintf(`%s`, urlFrag, anchor)) } -func (options *Html) RawHtmlTag(out *bytes.Buffer, text []byte) { - if options.flags&HTML_SKIP_HTML != 0 { - return - } - if options.flags&HTML_SKIP_STYLE != 0 && isHtmlTag(text, "style") { - return - } - if options.flags&HTML_SKIP_LINKS != 0 && isHtmlTag(text, "a") { - return - } - if options.flags&HTML_SKIP_IMAGES != 0 && isHtmlTag(text, "img") { - return - } - out.Write(text) +func footnoteItem(prefix string, slug []byte) []byte { + return []byte(fmt.Sprintf(`
  • `, prefix, slug)) } -func (options *Html) TripleEmphasis(out *bytes.Buffer, text []byte) { - out.WriteString("") - out.Write(text) - out.WriteString("") +func footnoteReturnLink(prefix, returnLink string, slug []byte) []byte { + const format = ` %s` + return []byte(fmt.Sprintf(format, prefix, slug, returnLink)) } -func (options *Html) StrikeThrough(out *bytes.Buffer, text []byte) { - out.WriteString("") - out.Write(text) - out.WriteString("") +func itemOpenCR(node *Node) bool { + if node.Prev == nil { + return false + } + ld := node.Parent.ListData + return !ld.Tight && ld.ListFlags&ListTypeDefinition == 0 } -func (options *Html) FootnoteRef(out *bytes.Buffer, ref []byte, id int) { - slug := slugify(ref) - out.WriteString(``) - out.WriteString(strconv.Itoa(id)) - out.WriteString(``) +func skipParagraphTags(node *Node) bool { + grandparent := node.Parent.Parent + if grandparent == nil || grandparent.Type != List { + return false + } + tightOrTerm := grandparent.Tight || node.Parent.ListFlags&ListTypeTerm != 0 + return grandparent.Type == List && tightOrTerm } -func (options *Html) Entity(out *bytes.Buffer, entity []byte) { - out.Write(entity) +func cellAlignment(align CellAlignFlags) string { + switch align { + case TableAlignmentLeft: + return "left" + case TableAlignmentRight: + return "right" + case TableAlignmentCenter: + return "center" + default: + return "" + } } -func (options *Html) NormalText(out *bytes.Buffer, text []byte) { - if options.flags&HTML_USE_SMARTYPANTS != 0 { - options.Smartypants(out, text) +func (r *HTMLRenderer) out(w io.Writer, text []byte) { + if r.disableTags > 0 { + w.Write(htmlTagRe.ReplaceAll(text, []byte{})) } else { - attrEscape(out, text) + w.Write(text) } + r.lastOutputLen = len(text) } -func (options *Html) Smartypants(out *bytes.Buffer, text []byte) { - smrt := smartypantsData{false, false} +func (r *HTMLRenderer) cr(w io.Writer) { + if r.lastOutputLen > 0 { + r.out(w, nlBytes) + } +} - // first do normal entity escaping - var escaped bytes.Buffer - attrEscape(&escaped, text) - text = escaped.Bytes() +var ( + nlBytes = []byte{'\n'} + gtBytes = []byte{'>'} + spaceBytes = []byte{' '} +) - mark := 0 - for i := 0; i < len(text); i++ { - if action := options.smartypants[text[i]]; action != nil { - if i > mark { - out.Write(text[mark:i]) - } +var ( + brTag = []byte("
    ") + brXHTMLTag = []byte("
    ") + emTag = []byte("") + emCloseTag = []byte("") + strongTag = []byte("") + strongCloseTag = []byte("") + delTag = []byte("") + delCloseTag = []byte("") + ttTag = []byte("") + ttCloseTag = []byte("") + aTag = []byte("") + preTag = []byte("
    ")
    +	preCloseTag        = []byte("
    ") + codeTag = []byte("") + codeCloseTag = []byte("") + pTag = []byte("

    ") + pCloseTag = []byte("

    ") + blockquoteTag = []byte("
    ") + blockquoteCloseTag = []byte("
    ") + hrTag = []byte("
    ") + hrXHTMLTag = []byte("
    ") + ulTag = []byte("
      ") + ulCloseTag = []byte("
    ") + olTag = []byte("
      ") + olCloseTag = []byte("
    ") + dlTag = []byte("
    ") + dlCloseTag = []byte("
    ") + liTag = []byte("
  • ") + liCloseTag = []byte("
  • ") + ddTag = []byte("
    ") + ddCloseTag = []byte("
    ") + dtTag = []byte("
    ") + dtCloseTag = []byte("
    ") + tableTag = []byte("") + tableCloseTag = []byte("
    ") + tdTag = []byte("") + thTag = []byte("") + theadTag = []byte("") + theadCloseTag = []byte("") + tbodyTag = []byte("") + tbodyCloseTag = []byte("") + trTag = []byte("") + trCloseTag = []byte("") + h1Tag = []byte("") + h2Tag = []byte("") + h3Tag = []byte("") + h4Tag = []byte("") + h5Tag = []byte("") + h6Tag = []byte("") + + footnotesDivBytes = []byte("\n
    \n\n") + footnotesCloseDivBytes = []byte("\n
    \n") +) - previousChar := byte(0) - if i > 0 { - previousChar = text[i-1] - } - i += action(out, &smrt, previousChar, text[i:]) - mark = i + 1 - } +func headingTagsFromLevel(level int) ([]byte, []byte) { + if level <= 1 { + return h1Tag, h1CloseTag } - - if mark < len(text) { - out.Write(text[mark:]) + switch level { + case 2: + return h2Tag, h2CloseTag + case 3: + return h3Tag, h3CloseTag + case 4: + return h4Tag, h4CloseTag + case 5: + return h5Tag, h5CloseTag } + return h6Tag, h6CloseTag } -func (options *Html) DocumentHeader(out *bytes.Buffer) { - if options.flags&HTML_COMPLETE_PAGE == 0 { - return - } - - ending := "" - if options.flags&HTML_USE_XHTML != 0 { - out.WriteString("\n") - out.WriteString("\n") - ending = " /" +func (r *HTMLRenderer) outHRTag(w io.Writer) { + if r.Flags&UseXHTML == 0 { + r.out(w, hrTag) } else { - out.WriteString("\n") - out.WriteString("\n") - } - out.WriteString("\n") - out.WriteString(" ") - options.NormalText(out, []byte(options.title)) - out.WriteString("\n") - out.WriteString(" \n") - out.WriteString(" \n") - if options.css != "" { - out.WriteString(" \n") - } - out.WriteString("\n") - out.WriteString("\n") - - options.tocMarker = out.Len() + r.out(w, hrXHTMLTag) + } } -func (options *Html) DocumentFooter(out *bytes.Buffer) { - // finalize and insert the table of contents - if options.flags&HTML_TOC != 0 { - options.TocFinalize() - - // now we have to insert the table of contents into the document - var temp bytes.Buffer - - // start by making a copy of everything after the document header - temp.Write(out.Bytes()[options.tocMarker:]) - - // now clear the copied material from the main output buffer - out.Truncate(options.tocMarker) - - // corner case spacing issue - if options.flags&HTML_COMPLETE_PAGE != 0 { - out.WriteByte('\n') +// RenderNode is a default renderer of a single node of a syntax tree. For +// block nodes it will be called twice: first time with entering=true, second +// time with entering=false, so that it could know when it's working on an open +// tag and when on close. It writes the result to w. +// +// The return value is a way to tell the calling walker to adjust its walk +// pattern: e.g. it can terminate the traversal by returning Terminate. Or it +// can ask the walker to skip a subtree of this node by returning SkipChildren. +// The typical behavior is to return GoToNext, which asks for the usual +// traversal to the next node. +func (r *HTMLRenderer) RenderNode(w io.Writer, node *Node, entering bool) WalkStatus { + attrs := []string{} + switch node.Type { + case Text: + if r.Flags&Smartypants != 0 { + var tmp bytes.Buffer + escapeHTML(&tmp, node.Literal) + r.sr.Process(w, tmp.Bytes()) + } else { + if node.Parent.Type == Link { + escLink(w, node.Literal) + } else { + escapeHTML(w, node.Literal) + } } - - // insert the table of contents - out.WriteString("\n") - - // corner case spacing issue - if options.flags&HTML_COMPLETE_PAGE == 0 && options.flags&HTML_OMIT_CONTENTS == 0 { - out.WriteByte('\n') + case Softbreak: + r.cr(w) + // TODO: make it configurable via out(renderer.softbreak) + case Hardbreak: + if r.Flags&UseXHTML == 0 { + r.out(w, brTag) + } else { + r.out(w, brXHTMLTag) } - - // write out everything that came after it - if options.flags&HTML_OMIT_CONTENTS == 0 { - out.Write(temp.Bytes()) + r.cr(w) + case Emph: + if entering { + r.out(w, emTag) + } else { + r.out(w, emCloseTag) } - } - - if options.flags&HTML_COMPLETE_PAGE != 0 { - out.WriteString("\n\n") - out.WriteString("\n") - } - -} - -func (options *Html) TocHeaderWithAnchor(text []byte, level int, anchor string) { - for level > options.currentLevel { - switch { - case bytes.HasSuffix(options.toc.Bytes(), []byte("\n")): - // this sublist can nest underneath a header - size := options.toc.Len() - options.toc.Truncate(size - len("\n")) - - case options.currentLevel > 0: - options.toc.WriteString("
  • ") + case Strong: + if entering { + r.out(w, strongTag) + } else { + r.out(w, strongCloseTag) } - if options.toc.Len() > 0 { - options.toc.WriteByte('\n') + case Del: + if entering { + r.out(w, delTag) + } else { + r.out(w, delCloseTag) } - options.toc.WriteString("
      \n") - options.currentLevel++ - } - - for level < options.currentLevel { - options.toc.WriteString("
    ") - if options.currentLevel > 1 { - options.toc.WriteString("
  • \n") + case HTMLSpan: + if r.Flags&SkipHTML != 0 { + break } - options.currentLevel-- - } - - options.toc.WriteString("
  • ") - options.headerCount++ - - options.toc.Write(text) - - options.toc.WriteString("
  • \n") -} - -func (options *Html) TocHeader(text []byte, level int) { - options.TocHeaderWithAnchor(text, level, "") -} - -func (options *Html) TocFinalize() { - for options.currentLevel > 1 { - options.toc.WriteString("\n") - options.currentLevel-- - } - - if options.currentLevel > 0 { - options.toc.WriteString("\n") - } -} - -func isHtmlTag(tag []byte, tagname string) bool { - found, _ := findHtmlTagPos(tag, tagname) - return found -} - -// Look for a character, but ignore it when it's in any kind of quotes, it -// might be JavaScript -func skipUntilCharIgnoreQuotes(html []byte, start int, char byte) int { - inSingleQuote := false - inDoubleQuote := false - inGraveQuote := false - i := start - for i < len(html) { - switch { - case html[i] == char && !inSingleQuote && !inDoubleQuote && !inGraveQuote: - return i - case html[i] == '\'': - inSingleQuote = !inSingleQuote - case html[i] == '"': - inDoubleQuote = !inDoubleQuote - case html[i] == '`': - inGraveQuote = !inGraveQuote + r.out(w, node.Literal) + case Link: + // mark it but don't link it if it is not a safe link: no smartypants + dest := node.LinkData.Destination + if needSkipLink(r.Flags, dest) { + if entering { + r.out(w, ttTag) + } else { + r.out(w, ttCloseTag) + } + } else { + if entering { + dest = r.addAbsPrefix(dest) + var hrefBuf bytes.Buffer + hrefBuf.WriteString("href=\"") + escLink(&hrefBuf, dest) + hrefBuf.WriteByte('"') + attrs = append(attrs, hrefBuf.String()) + if node.NoteID != 0 { + r.out(w, footnoteRef(r.FootnoteAnchorPrefix, node)) + break + } + attrs = appendLinkAttrs(attrs, r.Flags, dest) + if len(node.LinkData.Title) > 0 { + var titleBuff bytes.Buffer + titleBuff.WriteString("title=\"") + escapeHTML(&titleBuff, node.LinkData.Title) + titleBuff.WriteByte('"') + attrs = append(attrs, titleBuff.String()) + } + r.tag(w, aTag, attrs) + } else { + if node.NoteID != 0 { + break + } + r.out(w, aCloseTag) + } } - i++ - } - return start -} - -func findHtmlTagPos(tag []byte, tagname string) (bool, int) { - i := 0 - if i < len(tag) && tag[0] != '<' { - return false, -1 - } - i++ - i = skipSpace(tag, i) - - if i < len(tag) && tag[i] == '/' { - i++ - } - - i = skipSpace(tag, i) - j := 0 - for ; i < len(tag); i, j = i+1, j+1 { - if j >= len(tagname) { + case Image: + if r.Flags&SkipImages != 0 { + return SkipChildren + } + if entering { + dest := node.LinkData.Destination + dest = r.addAbsPrefix(dest) + if r.disableTags == 0 { + //if options.safe && potentiallyUnsafe(dest) { + //out(w, ``)
+				//} else {
+				r.out(w, []byte(`<img src=`)) + } + } + case Code: + r.out(w, codeTag) + escapeAllHTML(w, node.Literal) + r.out(w, codeCloseTag) + case Document: + break + case Paragraph: + if skipParagraphTags(node) { break } - - if strings.ToLower(string(tag[i]))[0] != tagname[j] { - return false, -1 + if entering { + // TODO: untangle this clusterfuck about when the newlines need + // to be added and when not. + if node.Prev != nil { + switch node.Prev.Type { + case HTMLBlock, List, Paragraph, Heading, CodeBlock, BlockQuote, HorizontalRule: + r.cr(w) + } + } + if node.Parent.Type == BlockQuote && node.Prev == nil { + r.cr(w) + } + r.out(w, pTag) + } else { + r.out(w, pCloseTag) + if !(node.Parent.Type == Item && node.Next == nil) { + r.cr(w) + } } + case BlockQuote: + if entering { + r.cr(w) + r.out(w, blockquoteTag) + } else { + r.out(w, blockquoteCloseTag) + r.cr(w) + } + case HTMLBlock: + if r.Flags&SkipHTML != 0 { + break + } + r.cr(w) + r.out(w, node.Literal) + r.cr(w) + case Heading: + headingLevel := r.HTMLRendererParameters.HeadingLevelOffset + node.Level + openTag, closeTag := headingTagsFromLevel(headingLevel) + if entering { + if node.IsTitleblock { + attrs = append(attrs, `class="title"`) + } + if node.HeadingID != "" { + id := r.ensureUniqueHeadingID(node.HeadingID) + if r.HeadingIDPrefix != "" { + id = r.HeadingIDPrefix + id + } + if r.HeadingIDSuffix != "" { + id = id + r.HeadingIDSuffix + } + attrs = append(attrs, fmt.Sprintf(`id="%s"`, id)) + } + r.cr(w) + r.tag(w, openTag, attrs) + } else { + r.out(w, closeTag) + if !(node.Parent.Type == Item && node.Next == nil) { + r.cr(w) + } + } + case HorizontalRule: + r.cr(w) + r.outHRTag(w) + r.cr(w) + case List: + openTag := ulTag + closeTag := ulCloseTag + if node.ListFlags&ListTypeOrdered != 0 { + openTag = olTag + closeTag = olCloseTag + } + if node.ListFlags&ListTypeDefinition != 0 { + openTag = dlTag + closeTag = dlCloseTag + } + if entering { + if node.IsFootnotesList { + r.out(w, footnotesDivBytes) + r.outHRTag(w) + r.cr(w) + } + r.cr(w) + if node.Parent.Type == Item && node.Parent.Parent.Tight { + r.cr(w) + } + r.tag(w, openTag[:len(openTag)-1], attrs) + r.cr(w) + } else { + r.out(w, closeTag) + //cr(w) + //if node.parent.Type != Item { + // cr(w) + //} + if node.Parent.Type == Item && node.Next != nil { + r.cr(w) + } + if node.Parent.Type == Document || node.Parent.Type == BlockQuote { + r.cr(w) + } + if node.IsFootnotesList { + r.out(w, footnotesCloseDivBytes) + } + } + case Item: + openTag := liTag + closeTag := liCloseTag + if node.ListFlags&ListTypeDefinition != 0 { + openTag = ddTag + closeTag = ddCloseTag + } + if node.ListFlags&ListTypeTerm != 0 { + openTag = dtTag + closeTag = dtCloseTag + } + if entering { + if itemOpenCR(node) { + r.cr(w) + } + if node.ListData.RefLink != nil { + slug := slugify(node.ListData.RefLink) + r.out(w, footnoteItem(r.FootnoteAnchorPrefix, slug)) + break + } + r.out(w, openTag) + } else { + if node.ListData.RefLink != nil { + slug := slugify(node.ListData.RefLink) + if r.Flags&FootnoteReturnLinks != 0 { + r.out(w, footnoteReturnLink(r.FootnoteAnchorPrefix, r.FootnoteReturnLinkContents, slug)) + } + } + r.out(w, closeTag) + r.cr(w) + } + case CodeBlock: + attrs = appendLanguageAttr(attrs, node.Info) + r.cr(w) + r.out(w, preTag) + r.tag(w, codeTag[:len(codeTag)-1], attrs) + escapeAllHTML(w, node.Literal) + r.out(w, codeCloseTag) + r.out(w, preCloseTag) + if node.Parent.Type != Item { + r.cr(w) + } + case Table: + if entering { + r.cr(w) + r.out(w, tableTag) + } else { + r.out(w, tableCloseTag) + r.cr(w) + } + case TableCell: + openTag := tdTag + closeTag := tdCloseTag + if node.IsHeader { + openTag = thTag + closeTag = thCloseTag + } + if entering { + align := cellAlignment(node.Align) + if align != "" { + attrs = append(attrs, fmt.Sprintf(`align="%s"`, align)) + } + if node.Prev == nil { + r.cr(w) + } + r.tag(w, openTag, attrs) + } else { + r.out(w, closeTag) + r.cr(w) + } + case TableHead: + if entering { + r.cr(w) + r.out(w, theadTag) + } else { + r.out(w, theadCloseTag) + r.cr(w) + } + case TableBody: + if entering { + r.cr(w) + r.out(w, tbodyTag) + // XXX: this is to adhere to a rather silly test. Should fix test. + if node.FirstChild == nil { + r.cr(w) + } + } else { + r.out(w, tbodyCloseTag) + r.cr(w) + } + case TableRow: + if entering { + r.cr(w) + r.out(w, trTag) + } else { + r.out(w, trCloseTag) + r.cr(w) + } + default: + panic("Unknown node type " + node.Type.String()) } - - if i == len(tag) { - return false, -1 - } - - rightAngle := skipUntilCharIgnoreQuotes(tag, i, '>') - if rightAngle > i { - return true, rightAngle - } - - return false, -1 -} - -func skipUntilChar(text []byte, start int, char byte) int { - i := start - for i < len(text) && text[i] != char { - i++ - } - return i -} - -func skipSpace(tag []byte, i int) int { - for i < len(tag) && isspace(tag[i]) { - i++ - } - return i + return GoToNext } -func skipChar(data []byte, start int, char byte) int { - i := start - for i < len(data) && data[i] == char { - i++ +// RenderHeader writes HTML document preamble and TOC if requested. +func (r *HTMLRenderer) RenderHeader(w io.Writer, ast *Node) { + r.writeDocumentHeader(w) + if r.Flags&TOC != 0 { + r.writeTOC(w, ast) } - return i } -func doubleSpace(out *bytes.Buffer) { - if out.Len() > 0 { - out.WriteByte('\n') +// RenderFooter writes HTML document footer. +func (r *HTMLRenderer) RenderFooter(w io.Writer, ast *Node) { + if r.Flags&CompletePage == 0 { + return } + io.WriteString(w, "\n\n\n") } -func isRelativeLink(link []byte) (yes bool) { - // a tag begin with '#' - if link[0] == '#' { - return true - } - - // link begin with '/' but not '//', the second maybe a protocol relative link - if len(link) >= 2 && link[0] == '/' && link[1] != '/' { - return true - } - - // only the root '/' - if len(link) == 1 && link[0] == '/' { - return true - } - - // current directory : begin with "./" - if bytes.HasPrefix(link, []byte("./")) { - return true +func (r *HTMLRenderer) writeDocumentHeader(w io.Writer) { + if r.Flags&CompletePage == 0 { + return } - - // parent directory : begin with "../" - if bytes.HasPrefix(link, []byte("../")) { - return true + ending := "" + if r.Flags&UseXHTML != 0 { + io.WriteString(w, "\n") + io.WriteString(w, "\n") + ending = " /" + } else { + io.WriteString(w, "\n") + io.WriteString(w, "\n") } + io.WriteString(w, "\n") + io.WriteString(w, " ") + if r.Flags&Smartypants != 0 { + r.sr.Process(w, []byte(r.Title)) + } else { + escapeHTML(w, []byte(r.Title)) + } + io.WriteString(w, "\n") + io.WriteString(w, " \n") + io.WriteString(w, " \n") + if r.CSS != "" { + io.WriteString(w, " \n") + } + if r.Icon != "" { + io.WriteString(w, " \n") + } + io.WriteString(w, "\n") + io.WriteString(w, "\n\n") +} + +func (r *HTMLRenderer) writeTOC(w io.Writer, ast *Node) { + buf := bytes.Buffer{} + + inHeading := false + tocLevel := 0 + headingCount := 0 + + ast.Walk(func(node *Node, entering bool) WalkStatus { + if node.Type == Heading && !node.HeadingData.IsTitleblock { + inHeading = entering + if entering { + node.HeadingID = fmt.Sprintf("toc_%d", headingCount) + if node.Level == tocLevel { + buf.WriteString("\n\n
  • ") + } else if node.Level < tocLevel { + for node.Level < tocLevel { + tocLevel-- + buf.WriteString("
  • \n") + } + buf.WriteString("\n\n
  • ") + } else { + for node.Level > tocLevel { + tocLevel++ + buf.WriteString("\n
      \n
    • ") + } + } + + fmt.Fprintf(&buf, ``, headingCount) + headingCount++ + } else { + buf.WriteString("") + } + return GoToNext + } - return false -} + if inHeading { + return r.RenderNode(&buf, node, entering) + } -func (options *Html) ensureUniqueHeaderID(id string) string { - for count, found := options.headerIDs[id]; found; count, found = options.headerIDs[id] { - tmp := fmt.Sprintf("%s-%d", id, count+1) + return GoToNext + }) - if _, tmpFound := options.headerIDs[tmp]; !tmpFound { - options.headerIDs[id] = count + 1 - id = tmp - } else { - id = id + "-1" - } + for ; tocLevel > 0; tocLevel-- { + buf.WriteString("
    • \n
    ") } - if _, found := options.headerIDs[id]; !found { - options.headerIDs[id] = 0 + if buf.Len() > 0 { + io.WriteString(w, "\n") } - - return id + r.lastOutputLen = buf.Len() } diff --git a/inline.go b/inline.go index a28bb660..d45bd941 100644 --- a/inline.go +++ b/inline.go @@ -22,6 +22,23 @@ import ( var ( urlRe = `((https?|ftp):\/\/|\/)[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+` anchorRe = regexp.MustCompile(`^(]+")?\s?>` + urlRe + `<\/a>)`) + + // https://www.w3.org/TR/html5/syntax.html#character-references + // highest unicode code point in 17 planes (2^20): 1,114,112d = + // 7 dec digits or 6 hex digits + // named entity references can be 2-31 characters with stuff like < + // at one end and ∳ at the other. There + // are also sometimes numbers at the end, although this isn't inherent + // in the specification; there are never numbers anywhere else in + // current character references, though; see ¾ and ▒, etc. + // https://www.w3.org/TR/html5/syntax.html#named-character-references + // + // entity := "&" (named group | number ref) ";" + // named group := [a-zA-Z]{2,31}[0-9]{0,2} + // number ref := "#" (dec ref | hex ref) + // dec ref := [0-9]{1,7} + // hex ref := ("x" | "X") [0-9a-fA-F]{1,6} + htmlEntityRe = regexp.MustCompile(`&([a-zA-Z]{2,31}[0-9]{0,2}|#([0-9]{1,7}|[xX][0-9a-fA-F]{1,6}));`) ) // Functions to parse text within a block @@ -29,87 +46,89 @@ var ( // data is the complete block being rendered // offset is the number of valid chars before the current cursor -func (p *parser) inline(out *bytes.Buffer, data []byte) { - // this is called recursively: enforce a maximum depth - if p.nesting >= p.maxNesting { +func (p *Markdown) inline(currBlock *Node, data []byte) { + // handlers might call us recursively: enforce a maximum depth + if p.nesting >= p.maxNesting || len(data) == 0 { return } p.nesting++ - - i, end := 0, 0 - for i < len(data) { - // copy inactive chars into the output - for end < len(data) && p.inlineCallback[data[end]] == nil { - end++ - } - - p.r.NormalText(out, data[i:end]) - - if end >= len(data) { - break - } - i = end - - // call the trigger + beg, end := 0, 0 + for end < len(data) { handler := p.inlineCallback[data[end]] - if consumed := handler(p, out, data, i); consumed == 0 { - // no action from the callback; buffer the byte for later - end = i + 1 + if handler != nil { + if consumed, node := handler(p, data, end); consumed == 0 { + // No action from the callback. + end++ + } else { + // Copy inactive chars into the output. + currBlock.AppendChild(text(data[beg:end])) + if node != nil { + currBlock.AppendChild(node) + } + // Skip past whatever the callback used. + beg = end + consumed + end = beg + } } else { - // skip past whatever the callback used - i += consumed - end = i + end++ } } - + if beg < len(data) { + if data[end-1] == '\n' { + end-- + } + currBlock.AppendChild(text(data[beg:end])) + } p.nesting-- } // single and double emphasis parsing -func emphasis(p *parser, out *bytes.Buffer, data []byte, offset int) int { +func emphasis(p *Markdown, data []byte, offset int) (int, *Node) { data = data[offset:] c := data[0] - ret := 0 if len(data) > 2 && data[1] != c { // whitespace cannot follow an opening emphasis; // strikethrough only takes two characters '~~' if c == '~' || isspace(data[1]) { - return 0 + return 0, nil } - if ret = helperEmphasis(p, out, data[1:], c); ret == 0 { - return 0 + ret, node := helperEmphasis(p, data[1:], c) + if ret == 0 { + return 0, nil } - return ret + 1 + return ret + 1, node } if len(data) > 3 && data[1] == c && data[2] != c { if isspace(data[2]) { - return 0 + return 0, nil } - if ret = helperDoubleEmphasis(p, out, data[2:], c); ret == 0 { - return 0 + ret, node := helperDoubleEmphasis(p, data[2:], c) + if ret == 0 { + return 0, nil } - return ret + 2 + return ret + 2, node } if len(data) > 4 && data[1] == c && data[2] == c && data[3] != c { if c == '~' || isspace(data[3]) { - return 0 + return 0, nil } - if ret = helperTripleEmphasis(p, out, data, 3, c); ret == 0 { - return 0 + ret, node := helperTripleEmphasis(p, data, 3, c) + if ret == 0 { + return 0, nil } - return ret + 3 + return ret + 3, node } - return 0 + return 0, nil } -func codeSpan(p *parser, out *bytes.Buffer, data []byte, offset int) int { +func codeSpan(p *Markdown, data []byte, offset int) (int, *Node) { data = data[offset:] nb := 0 @@ -131,7 +150,7 @@ func codeSpan(p *parser, out *bytes.Buffer, data []byte, offset int) int { // no matching delimiter? if i < nb && end >= len(data) { - return 0 + return 0, nil } // trim outside whitespace @@ -147,39 +166,36 @@ func codeSpan(p *parser, out *bytes.Buffer, data []byte, offset int) int { // render the code span if fBegin != fEnd { - p.r.CodeSpan(out, data[fBegin:fEnd]) + code := NewNode(Code) + code.Literal = data[fBegin:fEnd] + return end, code } - return end - + return end, nil } // newline preceded by two spaces becomes
    -// newline without two spaces works when EXTENSION_HARD_LINE_BREAK is enabled -func lineBreak(p *parser, out *bytes.Buffer, data []byte, offset int) int { - // remove trailing spaces from out - outBytes := out.Bytes() - end := len(outBytes) - eol := end - for eol > 0 && outBytes[eol-1] == ' ' { - eol-- - } - out.Truncate(eol) - - precededByTwoSpaces := offset >= 2 && data[offset-2] == ' ' && data[offset-1] == ' ' - precededByBackslash := offset >= 1 && data[offset-1] == '\\' // see http://spec.commonmark.org/0.18/#example-527 - precededByBackslash = precededByBackslash && p.flags&EXTENSION_BACKSLASH_LINE_BREAK != 0 - - // should there be a hard line break here? - if p.flags&EXTENSION_HARD_LINE_BREAK == 0 && !precededByTwoSpaces && !precededByBackslash { - return 0 +func maybeLineBreak(p *Markdown, data []byte, offset int) (int, *Node) { + origOffset := offset + for offset < len(data) && data[offset] == ' ' { + offset++ + } + + if offset < len(data) && data[offset] == '\n' { + if offset-origOffset >= 2 { + return offset - origOffset + 1, NewNode(Hardbreak) + } + return offset - origOffset, nil } + return 0, nil +} - if precededByBackslash && eol > 0 { - out.Truncate(eol - 1) +// newline without two spaces works when HardLineBreak is enabled +func lineBreak(p *Markdown, data []byte, offset int) (int, *Node) { + if p.extensions&HardLineBreak != 0 { + return 1, NewNode(Hardbreak) } - p.r.LineBreak(out) - return 1 + return 0, nil } type linkType int @@ -198,27 +214,43 @@ func isReferenceStyleLink(data []byte, pos int, t linkType) bool { return pos < len(data)-1 && data[pos] == '[' && data[pos+1] != '^' } +func maybeImage(p *Markdown, data []byte, offset int) (int, *Node) { + if offset < len(data)-1 && data[offset+1] == '[' { + return link(p, data, offset) + } + return 0, nil +} + +func maybeInlineFootnote(p *Markdown, data []byte, offset int) (int, *Node) { + if offset < len(data)-1 && data[offset+1] == '[' { + return link(p, data, offset) + } + return 0, nil +} + // '[': parse a link or an image or a footnote -func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { +func link(p *Markdown, data []byte, offset int) (int, *Node) { // no links allowed inside regular links, footnote, and deferred footnotes if p.insideLink && (offset > 0 && data[offset-1] == '[' || len(data)-1 > offset && data[offset+1] == '^') { - return 0 + return 0, nil } var t linkType switch { // special case: ![^text] == deferred footnote (that follows something with // an exclamation point) - case p.flags&EXTENSION_FOOTNOTES != 0 && len(data)-1 > offset && data[offset+1] == '^': + case p.extensions&Footnotes != 0 && len(data)-1 > offset && data[offset+1] == '^': t = linkDeferredFootnote // ![alt] == image - case offset > 0 && data[offset-1] == '!': + case offset >= 0 && data[offset] == '!': t = linkImg + offset++ // ^[text] == inline footnote // [^refId] == deferred footnote - case p.flags&EXTENSION_FOOTNOTES != 0: - if offset > 0 && data[offset-1] == '^' { + case p.extensions&Footnotes != 0: + if offset >= 0 && data[offset] == '^' { t = linkInlineFootnote + offset++ } else if len(data)-1 > offset && data[offset+1] == '^' { t = linkDeferredFootnote } @@ -231,7 +263,7 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { var ( i = 1 - noteId int + noteID int title, link, altContent []byte textHasNl = false ) @@ -246,7 +278,7 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { case data[i] == '\n': textHasNl = true - case data[i-1] == '\\': + case isBackslashEscaped(data, i): continue case data[i] == '[': @@ -261,11 +293,12 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { } if i >= len(data) { - return 0 + return 0, nil } txtE := i i++ + var footnoteNode *Node // skip any amount of whitespace or newline // (this is much more lax than original markdown syntax) @@ -301,7 +334,7 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { } if i >= len(data) { - return 0 + return 0, nil } linkE := i @@ -326,7 +359,7 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { } if i >= len(data) { - return 0 + return 0, nil } // skip whitespace after title @@ -378,7 +411,7 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { i++ } if i >= len(data) { - return 0 + return 0, nil } linkE := i @@ -408,7 +441,7 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { // find the reference with matching id lr, ok := p.getRef(string(id)) if !ok { - return 0 + return 0, nil } // keep link and title from reference @@ -445,9 +478,10 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { } } + footnoteNode = NewNode(Item) if t == linkInlineFootnote { // create a new reference - noteId = len(p.notes) + 1 + noteID = len(p.notes) + 1 var fragment []byte if len(id) > 0 { @@ -458,14 +492,15 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { } copy(fragment, slugify(id)) } else { - fragment = append([]byte("footnote-"), []byte(strconv.Itoa(noteId))...) + fragment = append([]byte("footnote-"), []byte(strconv.Itoa(noteID))...) } ref := &reference{ - noteId: noteId, + noteID: noteID, hasBlock: false, link: fragment, title: id, + footnote: footnoteNode, } p.notes = append(p.notes, ref) @@ -476,11 +511,12 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { // find the reference with matching id lr, ok := p.getRef(string(id)) if !ok { - return 0 + return 0, nil } if t == linkDeferredFootnote { - lr.noteId = len(p.notes) + 1 + lr.noteID = len(p.notes) + 1 + lr.footnote = footnoteNode p.notes = append(p.notes, lr) } @@ -488,27 +524,13 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { link = lr.link // if inline footnote, title == footnote contents title = lr.title - noteId = lr.noteId + noteID = lr.noteID } // rewind the whitespace i = txtE + 1 } - // build content: img alt is escaped, link content is parsed - var content bytes.Buffer - if txtE > 1 { - if t == linkImg { - content.Write(data[1:txtE]) - } else { - // links cannot contain other links, so turn off link parsing temporarily - insideLink := p.insideLink - p.insideLink = true - p.inline(&content, data[1:txtE]) - p.insideLink = insideLink - } - } - var uLink []byte if t == linkNormal || t == linkImg { if len(link) > 0 { @@ -518,49 +540,54 @@ func link(p *parser, out *bytes.Buffer, data []byte, offset int) int { } // links need something to click on and somewhere to go - if len(uLink) == 0 || (t == linkNormal && content.Len() == 0) { - return 0 + if len(uLink) == 0 || (t == linkNormal && txtE <= 1) { + return 0, nil } } // call the relevant rendering function + var linkNode *Node switch t { case linkNormal: + linkNode = NewNode(Link) + linkNode.Destination = normalizeURI(uLink) + linkNode.Title = title if len(altContent) > 0 { - p.r.Link(out, uLink, title, altContent) + linkNode.AppendChild(text(altContent)) } else { - p.r.Link(out, uLink, title, content.Bytes()) + // links cannot contain other links, so turn off link parsing + // temporarily and recurse + insideLink := p.insideLink + p.insideLink = true + p.inline(linkNode, data[1:txtE]) + p.insideLink = insideLink } case linkImg: - outSize := out.Len() - outBytes := out.Bytes() - if outSize > 0 && outBytes[outSize-1] == '!' { - out.Truncate(outSize - 1) - } - - p.r.Image(out, uLink, title, content.Bytes()) + linkNode = NewNode(Image) + linkNode.Destination = uLink + linkNode.Title = title + linkNode.AppendChild(text(data[1:txtE])) + i++ - case linkInlineFootnote: - outSize := out.Len() - outBytes := out.Bytes() - if outSize > 0 && outBytes[outSize-1] == '^' { - out.Truncate(outSize - 1) + case linkInlineFootnote, linkDeferredFootnote: + linkNode = NewNode(Link) + linkNode.Destination = link + linkNode.Title = title + linkNode.NoteID = noteID + linkNode.Footnote = footnoteNode + if t == linkInlineFootnote { + i++ } - p.r.FootnoteRef(out, link, noteId) - - case linkDeferredFootnote: - p.r.FootnoteRef(out, link, noteId) - default: - return 0 + return 0, nil } - return i + return i, linkNode } -func (p *parser) inlineHtmlComment(out *bytes.Buffer, data []byte) int { +func (p *Markdown) inlineHTMLComment(data []byte) int { if len(data) < 5 { return 0 } @@ -579,44 +606,75 @@ func (p *parser) inlineHtmlComment(out *bytes.Buffer, data []byte) int { return i + 1 } +func stripMailto(link []byte) []byte { + if bytes.HasPrefix(link, []byte("mailto://")) { + return link[9:] + } else if bytes.HasPrefix(link, []byte("mailto:")) { + return link[7:] + } else { + return link + } +} + +// autolinkType specifies a kind of autolink that gets detected. +type autolinkType int + +// These are the possible flag values for the autolink renderer. +const ( + notAutolink autolinkType = iota + normalAutolink + emailAutolink +) + // '<' when tags or autolinks are allowed -func leftAngle(p *parser, out *bytes.Buffer, data []byte, offset int) int { +func leftAngle(p *Markdown, data []byte, offset int) (int, *Node) { data = data[offset:] - altype := LINK_TYPE_NOT_AUTOLINK - end := tagLength(data, &altype) - if size := p.inlineHtmlComment(out, data); size > 0 { + altype, end := tagLength(data) + if size := p.inlineHTMLComment(data); size > 0 { end = size } if end > 2 { - if altype != LINK_TYPE_NOT_AUTOLINK { + if altype != notAutolink { var uLink bytes.Buffer unescapeText(&uLink, data[1:end+1-2]) if uLink.Len() > 0 { - p.r.AutoLink(out, uLink.Bytes(), altype) + link := uLink.Bytes() + node := NewNode(Link) + node.Destination = link + if altype == emailAutolink { + node.Destination = append([]byte("mailto:"), link...) + } + node.AppendChild(text(stripMailto(link))) + return end, node } } else { - p.r.RawHtmlTag(out, data[:end]) + htmlTag := NewNode(HTMLSpan) + htmlTag.Literal = data[:end] + return end, htmlTag } } - return end + return end, nil } // '\\' backslash escape var escapeChars = []byte("\\`*_{}[]()#+-.!:|&<>~") -func escape(p *parser, out *bytes.Buffer, data []byte, offset int) int { +func escape(p *Markdown, data []byte, offset int) (int, *Node) { data = data[offset:] if len(data) > 1 { + if p.extensions&BackslashLineBreak != 0 && data[1] == '\n' { + return 2, NewNode(Hardbreak) + } if bytes.IndexByte(escapeChars, data[1]) < 0 { - return 0 + return 0, nil } - p.r.NormalText(out, data[1:2]) + return 2, text(data[1:2]) } - return 2 + return 2, nil } func unescapeText(ob *bytes.Buffer, src []byte) { @@ -642,7 +700,7 @@ func unescapeText(ob *bytes.Buffer, src []byte) { // '&' escaped when it doesn't belong to an entity // valid entities are assumed to be anything matching &#?[A-Za-z0-9]+; -func entity(p *parser, out *bytes.Buffer, data []byte, offset int) int { +func entity(p *Markdown, data []byte, offset int) (int, *Node) { data = data[offset:] end := 1 @@ -658,25 +716,70 @@ func entity(p *parser, out *bytes.Buffer, data []byte, offset int) int { if end < len(data) && data[end] == ';' { end++ // real entity } else { - return 0 // lone '&' + return 0, nil // lone '&' } - p.r.Entity(out, data[:end]) + ent := data[:end] + // undo & escaping or it will be converted to &amp; by another + // escaper in the renderer + if bytes.Equal(ent, []byte("&")) { + ent = []byte{'&'} + } - return end + return end, text(ent) } func linkEndsWithEntity(data []byte, linkEnd int) bool { - entityRanges := htmlEntity.FindAllIndex(data[:linkEnd], -1) + entityRanges := htmlEntityRe.FindAllIndex(data[:linkEnd], -1) return entityRanges != nil && entityRanges[len(entityRanges)-1][1] == linkEnd } -func autoLink(p *parser, out *bytes.Buffer, data []byte, offset int) int { - // quick check to rule out most false hits on ':' - if p.insideLink || len(data) < offset+3 || data[offset+1] != '/' || data[offset+2] != '/' { - return 0 +// hasPrefixCaseInsensitive is a custom implementation of +// strings.HasPrefix(strings.ToLower(s), prefix) +// we rolled our own because ToLower pulls in a huge machinery of lowercasing +// anything from Unicode and that's very slow. Since this func will only be +// used on ASCII protocol prefixes, we can take shortcuts. +func hasPrefixCaseInsensitive(s, prefix []byte) bool { + if len(s) < len(prefix) { + return false + } + delta := byte('a' - 'A') + for i, b := range prefix { + if b != s[i] && b != s[i]+delta { + return false + } } + return true +} + +var protocolPrefixes = [][]byte{ + []byte("http://"), + []byte("https://"), + []byte("ftp://"), + []byte("file://"), + []byte("mailto:"), +} + +const shortestPrefix = 6 // len("ftp://"), the shortest of the above +func maybeAutoLink(p *Markdown, data []byte, offset int) (int, *Node) { + // quick check to rule out most false hits + if p.insideLink || len(data) < offset+shortestPrefix { + return 0, nil + } + for _, prefix := range protocolPrefixes { + endOfHead := offset + 8 // 8 is the len() of the longest prefix + if endOfHead > len(data) { + endOfHead = len(data) + } + if hasPrefixCaseInsensitive(data[offset:endOfHead], prefix) { + return autoLink(p, data, offset) + } + } + return 0, nil +} + +func autoLink(p *Markdown, data []byte, offset int) (int, *Node) { // Now a more expensive check to see if we're not inside an anchor element anchorStart := offset offsetFromAnchor := 0 @@ -687,8 +790,9 @@ func autoLink(p *parser, out *bytes.Buffer, data []byte, offset int) int { anchorStr := anchorRe.Find(data[anchorStart:]) if anchorStr != nil { - out.Write(anchorStr[offsetFromAnchor:]) - return len(anchorStr) - offsetFromAnchor + anchorClose := NewNode(HTMLSpan) + anchorClose.Literal = anchorStr[offsetFromAnchor:] + return len(anchorStr) - offsetFromAnchor, anchorClose } // scan backward for a word boundary @@ -697,14 +801,14 @@ func autoLink(p *parser, out *bytes.Buffer, data []byte, offset int) int { rewind++ } if rewind > 6 { // longest supported protocol is "mailto" which has 6 letters - return 0 + return 0, nil } origData := data data = data[offset-rewind:] if !isSafeLink(data) { - return 0 + return 0, nil } linkEnd := 0 @@ -781,19 +885,17 @@ func autoLink(p *parser, out *bytes.Buffer, data []byte, offset int) int { } } - // we were triggered on the ':', so we need to rewind the output a bit - if out.Len() >= rewind { - out.Truncate(len(out.Bytes()) - rewind) - } - var uLink bytes.Buffer unescapeText(&uLink, data[:linkEnd]) if uLink.Len() > 0 { - p.r.AutoLink(out, uLink.Bytes(), LINK_TYPE_NORMAL) + node := NewNode(Link) + node.Destination = uLink.Bytes() + node.AppendChild(text(uLink.Bytes())) + return linkEnd, node } - return linkEnd - rewind + return linkEnd, nil } func isEndOfLink(char byte) bool { @@ -826,17 +928,17 @@ func isSafeLink(link []byte) bool { } // return the length of the given tag, or 0 is it's not valid -func tagLength(data []byte, autolink *int) int { +func tagLength(data []byte) (autolink autolinkType, end int) { var i, j int // a valid tag can't be shorter than 3 chars if len(data) < 3 { - return 0 + return notAutolink, 0 } // begins with a '<' optionally followed by '/', followed by letter or number if data[0] != '<' { - return 0 + return notAutolink, 0 } if data[1] == '/' { i = 2 @@ -845,11 +947,11 @@ func tagLength(data []byte, autolink *int) int { } if !isalnum(data[i]) { - return 0 + return notAutolink, 0 } // scheme test - *autolink = LINK_TYPE_NOT_AUTOLINK + autolink = notAutolink // try to find the beginning of an URI for i < len(data) && (isalnum(data[i]) || data[i] == '.' || data[i] == '+' || data[i] == '-') { @@ -858,21 +960,20 @@ func tagLength(data []byte, autolink *int) int { if i > 1 && i < len(data) && data[i] == '@' { if j = isMailtoAutoLink(data[i:]); j != 0 { - *autolink = LINK_TYPE_EMAIL - return i + j + return emailAutolink, i + j } } if i > 2 && i < len(data) && data[i] == ':' { - *autolink = LINK_TYPE_NORMAL + autolink = normalAutolink i++ } // complete autolink test: no whitespace or ' or " switch { case i >= len(data): - *autolink = LINK_TYPE_NOT_AUTOLINK - case *autolink != 0: + autolink = notAutolink + case autolink != notAutolink: j = i for i < len(data) { @@ -887,24 +988,20 @@ func tagLength(data []byte, autolink *int) int { } if i >= len(data) { - return 0 + return autolink, 0 } if i > j && data[i] == '>' { - return i + 1 + return autolink, i + 1 } // one of the forbidden chars has been found - *autolink = LINK_TYPE_NOT_AUTOLINK - } - - // look for something looking like a tag end - for i < len(data) && data[i] != '>' { - i++ + autolink = notAutolink } - if i >= len(data) { - return 0 + i += bytes.IndexByte(data[i:], '>') + if i < 0 { + return autolink, 0 } - return i + 1 + return autolink, i + 1 } // look for the address part of a mail autolink and '>' @@ -928,9 +1025,8 @@ func isMailtoAutoLink(data []byte) int { case '>': if nb == 1 { return i + 1 - } else { - return 0 } + return 0 default: return 0 } @@ -993,9 +1089,8 @@ func helperFindEmphChar(data []byte, c byte) int { if data[i] != '[' && data[i] != '(' { // not a link if tmpI > 0 { return tmpI - } else { - continue } + continue } cc := data[i] i++ @@ -1014,7 +1109,7 @@ func helperFindEmphChar(data []byte, c byte) int { return 0 } -func helperEmphasis(p *parser, out *bytes.Buffer, data []byte, c byte) int { +func helperEmphasis(p *Markdown, data []byte, c byte) (int, *Node) { i := 0 // skip one symbol if coming from emph3 @@ -1025,11 +1120,11 @@ func helperEmphasis(p *parser, out *bytes.Buffer, data []byte, c byte) int { for i < len(data) { length := helperFindEmphChar(data[i:], c) if length == 0 { - return 0 + return 0, nil } i += length if i >= len(data) { - return 0 + return 0, nil } if i+1 < len(data) && data[i+1] == c { @@ -1039,52 +1134,46 @@ func helperEmphasis(p *parser, out *bytes.Buffer, data []byte, c byte) int { if data[i] == c && !isspace(data[i-1]) { - if p.flags&EXTENSION_NO_INTRA_EMPHASIS != 0 { + if p.extensions&NoIntraEmphasis != 0 { if !(i+1 == len(data) || isspace(data[i+1]) || ispunct(data[i+1])) { continue } } - var work bytes.Buffer - p.inline(&work, data[:i]) - p.r.Emphasis(out, work.Bytes()) - return i + 1 + emph := NewNode(Emph) + p.inline(emph, data[:i]) + return i + 1, emph } } - return 0 + return 0, nil } -func helperDoubleEmphasis(p *parser, out *bytes.Buffer, data []byte, c byte) int { +func helperDoubleEmphasis(p *Markdown, data []byte, c byte) (int, *Node) { i := 0 for i < len(data) { length := helperFindEmphChar(data[i:], c) if length == 0 { - return 0 + return 0, nil } i += length if i+1 < len(data) && data[i] == c && data[i+1] == c && i > 0 && !isspace(data[i-1]) { - var work bytes.Buffer - p.inline(&work, data[:i]) - - if work.Len() > 0 { - // pick the right renderer - if c == '~' { - p.r.StrikeThrough(out, work.Bytes()) - } else { - p.r.DoubleEmphasis(out, work.Bytes()) - } + nodeType := Strong + if c == '~' { + nodeType = Del } - return i + 2 + node := NewNode(nodeType) + p.inline(node, data[:i]) + return i + 2, node } i++ } - return 0 + return 0, nil } -func helperTripleEmphasis(p *parser, out *bytes.Buffer, data []byte, offset int, c byte) int { +func helperTripleEmphasis(p *Markdown, data []byte, offset int, c byte) (int, *Node) { i := 0 origData := data data = data[offset:] @@ -1092,7 +1181,7 @@ func helperTripleEmphasis(p *parser, out *bytes.Buffer, data []byte, offset int, for i < len(data) { length := helperFindEmphChar(data[i:], c) if length == 0 { - return 0 + return 0, nil } i += length @@ -1104,30 +1193,36 @@ func helperTripleEmphasis(p *parser, out *bytes.Buffer, data []byte, offset int, switch { case i+2 < len(data) && data[i+1] == c && data[i+2] == c: // triple symbol found - var work bytes.Buffer - - p.inline(&work, data[:i]) - if work.Len() > 0 { - p.r.TripleEmphasis(out, work.Bytes()) - } - return i + 3 + strong := NewNode(Strong) + em := NewNode(Emph) + strong.AppendChild(em) + p.inline(em, data[:i]) + return i + 3, strong case (i+1 < len(data) && data[i+1] == c): // double symbol found, hand over to emph1 - length = helperEmphasis(p, out, origData[offset-2:], c) + length, node := helperEmphasis(p, origData[offset-2:], c) if length == 0 { - return 0 - } else { - return length - 2 + return 0, nil } + return length - 2, node default: // single symbol found, hand over to emph2 - length = helperDoubleEmphasis(p, out, origData[offset-1:], c) + length, node := helperDoubleEmphasis(p, origData[offset-1:], c) if length == 0 { - return 0 - } else { - return length - 1 + return 0, nil } + return length - 1, node } } - return 0 + return 0, nil +} + +func text(s []byte) *Node { + node := NewNode(Text) + node.Literal = s + return node +} + +func normalizeURI(s []byte) []byte { + return s // TODO: implement } diff --git a/inline_test.go b/inline_test.go index d793e019..5a91d8aa 100644 --- a/inline_test.go +++ b/inline_test.go @@ -20,91 +20,8 @@ import ( "strings" ) -func runMarkdownInline(input string, opts Options, htmlFlags int, params HtmlRendererParameters) string { - opts.Extensions |= EXTENSION_AUTOLINK - opts.Extensions |= EXTENSION_STRIKETHROUGH - - htmlFlags |= HTML_USE_XHTML - - renderer := HtmlRendererWithParameters(htmlFlags, "", "", params) - - return string(MarkdownOptions([]byte(input), renderer, opts)) -} - -func doTestsInline(t *testing.T, tests []string) { - doTestsInlineParam(t, tests, Options{}, 0, HtmlRendererParameters{}) -} - -func doLinkTestsInline(t *testing.T, tests []string) { - doTestsInline(t, tests) - - prefix := "http://localhost" - params := HtmlRendererParameters{AbsolutePrefix: prefix} - transformTests := transformLinks(tests, prefix) - doTestsInlineParam(t, transformTests, Options{}, 0, params) - doTestsInlineParam(t, transformTests, Options{}, commonHtmlFlags, params) -} - -func doSafeTestsInline(t *testing.T, tests []string) { - doTestsInlineParam(t, tests, Options{}, HTML_SAFELINK, HtmlRendererParameters{}) - - // All the links in this test should not have the prefix appended, so - // just rerun it with different parameters and the same expectations. - prefix := "http://localhost" - params := HtmlRendererParameters{AbsolutePrefix: prefix} - transformTests := transformLinks(tests, prefix) - doTestsInlineParam(t, transformTests, Options{}, HTML_SAFELINK, params) -} - -func doTestsInlineParam(t *testing.T, tests []string, opts Options, htmlFlags int, - params HtmlRendererParameters) { - // catch and report panics - var candidate string - /* - defer func() { - if err := recover(); err != nil { - t.Errorf("\npanic while processing [%#v] (%v)\n", candidate, err) - } - }() - */ - - for i := 0; i+1 < len(tests); i += 2 { - input := tests[i] - candidate = input - expected := tests[i+1] - actual := runMarkdownInline(candidate, opts, htmlFlags, params) - if actual != expected { - t.Errorf("\nInput [%#v]\nExpected[%#v]\nActual [%#v]", - candidate, expected, actual) - } - - // now test every substring to stress test bounds checking - if !testing.Short() { - for start := 0; start < len(input); start++ { - for end := start + 1; end <= len(input); end++ { - candidate = input[start:end] - _ = runMarkdownInline(candidate, opts, htmlFlags, params) - } - } - } - } -} - -func transformLinks(tests []string, prefix string) []string { - newTests := make([]string, len(tests)) - anchorRe := regexp.MustCompile(`nothing inline

    \n", @@ -161,6 +78,7 @@ func TestEmphasis(t *testing.T) { } func TestReferenceOverride(t *testing.T) { + t.Parallel() var tests = []string{ "test [ref1][]\n", "

    test ref1

    \n", @@ -183,11 +101,11 @@ func TestReferenceOverride(t *testing.T) { "test [ref5][]\n", "

    test Moo

    \n", } - doTestsInlineParam(t, tests, Options{ - ReferenceOverride: func(reference string) (rv *Reference, overridden bool) { + doTestsInlineParam(t, tests, TestParams{ + referenceOverride: func(reference string) (rv *Reference, overridden bool) { switch reference { case "ref1": - // just an overriden reference exists without definition + // just an overridden reference exists without definition return &Reference{ Link: "http://www.ref1.com/", Title: "Reference 1"}, true @@ -214,10 +132,12 @@ func TestReferenceOverride(t *testing.T) { }, true } return nil, false - }}, 0, HtmlRendererParameters{}) + }, + }) } func TestStrong(t *testing.T) { + t.Parallel() var tests = []string{ "nothing inline\n", "

    nothing inline

    \n", @@ -277,6 +197,7 @@ func TestStrong(t *testing.T) { } func TestEmphasisMix(t *testing.T) { + t.Parallel() var tests = []string{ "***triple emphasis***\n", "

    triple emphasis

    \n", @@ -306,6 +227,7 @@ func TestEmphasisMix(t *testing.T) { } func TestEmphasisLink(t *testing.T) { + t.Parallel() var tests = []string{ "[first](before) *text[second] (inside)text* [third](after)\n", "

    first textsecondtext third

    \n", @@ -323,6 +245,7 @@ func TestEmphasisLink(t *testing.T) { } func TestStrikeThrough(t *testing.T) { + t.Parallel() var tests = []string{ "nothing inline\n", "

    nothing inline

    \n", @@ -352,6 +275,7 @@ func TestStrikeThrough(t *testing.T) { } func TestCodeSpan(t *testing.T) { + t.Parallel() var tests = []string{ "`source code`\n", "

    source code

    \n", @@ -390,6 +314,7 @@ func TestCodeSpan(t *testing.T) { } func TestLineBreak(t *testing.T) { + t.Parallel() var tests = []string{ "this line \nhas a break\n", "

    this line
    \nhas a break

    \n", @@ -424,12 +349,12 @@ func TestLineBreak(t *testing.T) { "this has an \nextra space\n", "

    this has an
    \nextra space

    \n", } - doTestsInlineParam(t, tests, Options{ - Extensions: EXTENSION_BACKSLASH_LINE_BREAK}, - 0, HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{ + extensions: BackslashLineBreak}) } func TestInlineLink(t *testing.T) { + t.Parallel() var tests = []string{ "[foo](/bar/)\n", "

    foo

    \n", @@ -547,6 +472,7 @@ func TestInlineLink(t *testing.T) { } func TestRelAttrLink(t *testing.T) { + t.Parallel() var nofollowTests = []string{ "[foo](http://bar.com/foo/)\n", "

    foo

    \n", @@ -566,8 +492,9 @@ func TestRelAttrLink(t *testing.T) { "[foo](../bar)\n", "

    foo

    \n", } - doTestsInlineParam(t, nofollowTests, Options{}, HTML_SAFELINK|HTML_NOFOLLOW_LINKS, - HtmlRendererParameters{}) + doTestsInlineParam(t, nofollowTests, TestParams{ + HTMLFlags: Safelink | NofollowLinks, + }) var noreferrerTests = []string{ "[foo](http://bar.com/foo/)\n", @@ -576,21 +503,35 @@ func TestRelAttrLink(t *testing.T) { "[foo](/bar/)\n", "

    foo

    \n", } - doTestsInlineParam(t, noreferrerTests, Options{}, HTML_SAFELINK|HTML_NOREFERRER_LINKS, - HtmlRendererParameters{}) + doTestsInlineParam(t, noreferrerTests, TestParams{ + HTMLFlags: Safelink | NoreferrerLinks, + }) + + var noopenerTests = []string{ + "[foo](http://bar.com/foo/)\n", + "

    foo

    \n", + + "[foo](/bar/)\n", + "

    foo

    \n", + } + doTestsInlineParam(t, noopenerTests, TestParams{ + HTMLFlags: Safelink | NoopenerLinks, + }) - var nofollownoreferrerTests = []string{ + var nofollownoreferrernoopenerTests = []string{ "[foo](http://bar.com/foo/)\n", - "

    foo

    \n", + "

    foo

    \n", "[foo](/bar/)\n", "

    foo

    \n", } - doTestsInlineParam(t, nofollownoreferrerTests, Options{}, HTML_SAFELINK|HTML_NOFOLLOW_LINKS|HTML_NOREFERRER_LINKS, - HtmlRendererParameters{}) + doTestsInlineParam(t, nofollownoreferrernoopenerTests, TestParams{ + HTMLFlags: Safelink | NofollowLinks | NoreferrerLinks | NoopenerLinks, + }) } func TestHrefTargetBlank(t *testing.T) { + t.Parallel() var tests = []string{ // internal link "[foo](/bar/)\n", @@ -614,10 +555,13 @@ func TestHrefTargetBlank(t *testing.T) { "[foo](http://example.com)\n", "

    foo

    \n", } - doTestsInlineParam(t, tests, Options{}, HTML_SAFELINK|HTML_HREF_TARGET_BLANK, HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{ + HTMLFlags: Safelink | HrefTargetBlank, + }) } func TestSafeInlineLink(t *testing.T) { + t.Parallel() var tests = []string{ "[foo](/bar/)\n", "

    foo

    \n", @@ -651,6 +595,7 @@ func TestSafeInlineLink(t *testing.T) { } func TestReferenceLink(t *testing.T) { + t.Parallel() var tests = []string{ "[link][ref]\n", "

    [link][ref]

    \n", @@ -681,11 +626,15 @@ func TestReferenceLink(t *testing.T) { "[ref]\n [ref]: ../url/ \"title\"\n", "

    ref

    \n", + + "[link][ref]\n [ref]: /url/", + "

    link

    \n", } doLinkTestsInline(t, tests) } func TestTags(t *testing.T) { + t.Parallel() var tests = []string{ "a tag\n", "

    a tag

    \n", @@ -703,6 +652,7 @@ func TestTags(t *testing.T) { } func TestAutoLink(t *testing.T) { + t.Parallel() var tests = []string{ "http://foo.com/\n", "

    http://foo.com/

    \n", @@ -808,21 +758,25 @@ func TestAutoLink(t *testing.T) { "http://foo.com/viewtopic.php?param="18"", "

    http://foo.com/viewtopic.php?param="18"

    \n", + + "https://fancy.com\n", + "

    https://fancy.com

    \n", } doLinkTestsInline(t, tests) } var footnoteTests = []string{ "testing footnotes.[^a]\n\n[^a]: This is the note\n", - `

    testing footnotes.1

    + `

    testing footnotes.1

    +

      -
    1. This is the note -
    2. +
    3. This is the note
    +
    `, @@ -838,9 +792,10 @@ var footnoteTests = []string{ No longer in the footnote `, - `

    testing long1 notes.

    + `

    testing long1 notes.

    No longer in the footnote

    +

    @@ -854,9 +809,9 @@ No longer in the footnote some code

    -

    Paragraph 3

    -
  • +

    Paragraph 3

    + `, @@ -874,26 +829,28 @@ what happens here [note]: /link/c `, - `

    testing1 multiple2 notes.

    + `

    testing1 multiple2 notes.

    omg

    what happens here

    +

      -
    1. this is note c -
    2. -
    3. this is note d -
    4. +
    5. this is note c
    6. + +
    7. this is note d
    +
    `, "testing inline^[this is the note] notes.\n", - `

    testing inline1 notes.

    + `

    testing inline1 notes.

    +

    @@ -901,11 +858,13 @@ what happens here
    1. this is the note
    +
    `, "testing multiple[^1] types^[inline note] of notes[^2]\n\n[^2]: the second deferred note\n[^1]: the first deferred note\n\n\twhich happens to be a block\n", - `

    testing multiple1 types2 of notes3

    + `

    testing multiple1 types2 of notes3

    +

    @@ -913,12 +872,13 @@ what happens here
    1. the first deferred note

      -

      which happens to be a block

      -
    2. +

      which happens to be a block

      +
    3. inline note
    4. -
    5. the second deferred note -
    6. + +
    7. the second deferred note
    +
    `, @@ -928,7 +888,8 @@ what happens here may be multiple paragraphs. `, - `

    This is a footnote12

    + `

    This is a footnote12

    +

    @@ -936,21 +897,22 @@ what happens here
    1. the footnote text.

      -

      may be multiple paragraphs.

      -
    2. +

      may be multiple paragraphs.

      +
    3. and this is an inline footnote
    +
    `, "empty footnote[^]\n\n[^]: fn text", - "

    empty footnote1

    \n
    \n\n
    \n\n
      \n
    1. fn text\n
    2. \n
    \n
    \n", + "

    empty footnote1

    \n\n
    \n\n
    \n\n
      \n
    1. fn text
    2. \n
    \n\n
    \n", "Some text.[^note1]\n\n[^note1]: fn1", - "

    Some text.1

    \n
    \n\n
    \n\n
      \n
    1. fn1\n
    2. \n
    \n
    \n", + "

    Some text.1

    \n\n
    \n\n
    \n\n
      \n
    1. fn1
    2. \n
    \n\n
    \n", "Some text.[^note1][^note2]\n\n[^note1]: fn1\n[^note2]: fn2\n", - "

    Some text.12

    \n
    \n\n
    \n\n
      \n
    1. fn1\n
    2. \n
    3. fn2\n
    4. \n
    \n
    \n", + "

    Some text.12

    \n\n
    \n\n
    \n\n
      \n
    1. fn1
    2. \n\n
    3. fn2
    4. \n
    \n\n
    \n", `Bla bla [^1] [WWW][w3] @@ -958,15 +920,16 @@ what happens here [w3]: http://www.w3.org/ `, - `

    Bla bla 1 WWW

    + `

    Bla bla 1 WWW

    +

      -
    1. This is a footnote -
    2. +
    3. This is a footnote
    +
    `, @@ -974,24 +937,35 @@ what happens here [^fn1]: Fine print `, - `

    This is exciting!1

    + `

    This is exciting!1

    +

      -
    1. Fine print -
    2. +
    3. Fine print
    +
    `, + + `This text does not reference a footnote. + +[^footnote]: But it has a footnote! And it gets omitted. +`, + "

    This text does not reference a footnote.

    \n", } func TestFootnotes(t *testing.T) { - doTestsInlineParam(t, footnoteTests, Options{Extensions: EXTENSION_FOOTNOTES}, 0, HtmlRendererParameters{}) + t.Parallel() + doTestsInlineParam(t, footnoteTests, TestParams{ + extensions: Footnotes, + }) } func TestFootnotesWithParameters(t *testing.T) { + t.Parallel() tests := make([]string, len(footnoteTests)) prefix := "testPrefix" @@ -1008,15 +982,20 @@ func TestFootnotesWithParameters(t *testing.T) { tests[i] = test } - params := HtmlRendererParameters{ + params := HTMLRendererParameters{ FootnoteAnchorPrefix: prefix, FootnoteReturnLinkContents: returnText, } - doTestsInlineParam(t, tests, Options{Extensions: EXTENSION_FOOTNOTES}, HTML_FOOTNOTE_RETURN_LINKS, params) + doTestsInlineParam(t, tests, TestParams{ + extensions: Footnotes, + HTMLFlags: FootnoteReturnLinks, + HTMLRendererParameters: params, + }) } func TestNestedFootnotes(t *testing.T) { + t.Parallel() var tests = []string{ `Paragraph.[^fn1] @@ -1025,25 +1004,26 @@ func TestNestedFootnotes(t *testing.T) { [^fn2]: Obelisk`, - `

    Paragraph.1

    + `

    Paragraph.1

    +

      -
    1. Asterisk2 -
    2. -
    3. Obelisk -
    4. +
    5. Asterisk2
    6. + +
    7. Obelisk
    +
    `, } - doTestsInlineParam(t, tests, Options{Extensions: EXTENSION_FOOTNOTES}, 0, - HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{extensions: Footnotes}) } func TestInlineComments(t *testing.T) { + t.Parallel() var tests = []string{ "Hello \nrhubarb\n", "

    blahblah\n\nrhubarb

    \n", } - doTestsInlineParam(t, tests, Options{}, HTML_USE_SMARTYPANTS|HTML_SMARTYPANTS_DASHES, HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{HTMLFlags: Smartypants | SmartypantsDashes}) } func TestSmartDoubleQuotes(t *testing.T) { + t.Parallel() var tests = []string{ "this should be normal \"quoted\" text.\n", "

    this should be normal “quoted” text.

    \n", @@ -1081,10 +1062,24 @@ func TestSmartDoubleQuotes(t *testing.T) { "two pair of \"some\" quoted \"text\".\n", "

    two pair of “some” quoted “text”.

    \n"} - doTestsInlineParam(t, tests, Options{}, HTML_USE_SMARTYPANTS, HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{HTMLFlags: Smartypants}) +} + +func TestSmartDoubleQuotesNBSP(t *testing.T) { + t.Parallel() + var tests = []string{ + "this should be normal \"quoted\" text.\n", + "

    this should be normal “ quoted ” text.

    \n", + "this \" single double\n", + "

    this “  single double

    \n", + "two pair of \"some\" quoted \"text\".\n", + "

    two pair of “ some ” quoted “ text ”.

    \n"} + + doTestsInlineParam(t, tests, TestParams{HTMLFlags: Smartypants | SmartypantsQuotesNBSP}) } func TestSmartAngledDoubleQuotes(t *testing.T) { + t.Parallel() var tests = []string{ "this should be angled \"quoted\" text.\n", "

    this should be angled «quoted» text.

    \n", @@ -1093,17 +1088,31 @@ func TestSmartAngledDoubleQuotes(t *testing.T) { "two pair of \"some\" quoted \"text\".\n", "

    two pair of «some» quoted «text».

    \n"} - doTestsInlineParam(t, tests, Options{}, HTML_USE_SMARTYPANTS|HTML_SMARTYPANTS_ANGLED_QUOTES, HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{HTMLFlags: Smartypants | SmartypantsAngledQuotes}) +} + +func TestSmartAngledDoubleQuotesNBSP(t *testing.T) { + t.Parallel() + var tests = []string{ + "this should be angled \"quoted\" text.\n", + "

    this should be angled « quoted » text.

    \n", + "this \" single double\n", + "

    this «  single double

    \n", + "two pair of \"some\" quoted \"text\".\n", + "

    two pair of « some » quoted « text ».

    \n"} + + doTestsInlineParam(t, tests, TestParams{HTMLFlags: Smartypants | SmartypantsAngledQuotes | SmartypantsQuotesNBSP}) } func TestSmartFractions(t *testing.T) { + t.Parallel() var tests = []string{ "1/2, 1/4 and 3/4; 1/4th and 3/4ths\n", "

    ½, ¼ and ¾; ¼th and ¾ths

    \n", "1/2/2015, 1/4/2015, 3/4/2015; 2015/1/2, 2015/1/4, 2015/3/4.\n", "

    1/2/2015, 1/4/2015, 3/4/2015; 2015/1/2, 2015/1/4, 2015/3/4.

    \n"} - doTestsInlineParam(t, tests, Options{}, HTML_USE_SMARTYPANTS, HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{HTMLFlags: Smartypants}) tests = []string{ "1/2, 2/3, 81/100 and 1000000/1048576.\n", @@ -1111,10 +1120,11 @@ func TestSmartFractions(t *testing.T) { "1/2/2015, 1/4/2015, 3/4/2015; 2015/1/2, 2015/1/4, 2015/3/4.\n", "

    1/2/2015, 1/4/2015, 3/4/2015; 2015/1/2, 2015/1/4, 2015/3/4.

    \n"} - doTestsInlineParam(t, tests, Options{}, HTML_USE_SMARTYPANTS|HTML_SMARTYPANTS_FRACTIONS, HtmlRendererParameters{}) + doTestsInlineParam(t, tests, TestParams{HTMLFlags: Smartypants | SmartypantsFractions}) } func TestDisableSmartDashes(t *testing.T) { + t.Parallel() doTestsInlineParam(t, []string{ "foo - bar\n", "

    foo - bar

    \n", @@ -1122,7 +1132,7 @@ func TestDisableSmartDashes(t *testing.T) { "

    foo -- bar

    \n", "foo --- bar\n", "

    foo --- bar

    \n", - }, Options{}, 0, HtmlRendererParameters{}) + }, TestParams{}) doTestsInlineParam(t, []string{ "foo - bar\n", "

    foo – bar

    \n", @@ -1130,7 +1140,7 @@ func TestDisableSmartDashes(t *testing.T) { "

    foo — bar

    \n", "foo --- bar\n", "

    foo —– bar

    \n", - }, Options{}, HTML_USE_SMARTYPANTS|HTML_SMARTYPANTS_DASHES, HtmlRendererParameters{}) + }, TestParams{HTMLFlags: Smartypants | SmartypantsDashes}) doTestsInlineParam(t, []string{ "foo - bar\n", "

    foo - bar

    \n", @@ -1138,8 +1148,7 @@ func TestDisableSmartDashes(t *testing.T) { "

    foo – bar

    \n", "foo --- bar\n", "

    foo — bar

    \n", - }, Options{}, HTML_USE_SMARTYPANTS|HTML_SMARTYPANTS_LATEX_DASHES|HTML_SMARTYPANTS_DASHES, - HtmlRendererParameters{}) + }, TestParams{HTMLFlags: Smartypants | SmartypantsLatexDashes | SmartypantsDashes}) doTestsInlineParam(t, []string{ "foo - bar\n", "

    foo - bar

    \n", @@ -1147,7 +1156,61 @@ func TestDisableSmartDashes(t *testing.T) { "

    foo -- bar

    \n", "foo --- bar\n", "

    foo --- bar

    \n", - }, Options{}, - HTML_USE_SMARTYPANTS|HTML_SMARTYPANTS_LATEX_DASHES, - HtmlRendererParameters{}) + }, TestParams{HTMLFlags: Smartypants | SmartypantsLatexDashes}) +} + +func TestSkipLinks(t *testing.T) { + t.Parallel() + doTestsInlineParam(t, []string{ + "[foo](gopher://foo.bar)", + "

    foo

    \n", + + "[foo](mailto://bar/)\n", + "

    foo

    \n", + }, TestParams{ + HTMLFlags: SkipLinks, + }) +} + +func TestSkipImages(t *testing.T) { + t.Parallel() + doTestsInlineParam(t, []string{ + "![foo](/bar/)\n", + "

    \n", + }, TestParams{ + HTMLFlags: SkipImages, + }) +} + +func TestUseXHTML(t *testing.T) { + t.Parallel() + doTestsParam(t, []string{ + "---", + "
    \n", + }, TestParams{}) + doTestsParam(t, []string{ + "---", + "
    \n", + }, TestParams{HTMLFlags: UseXHTML}) +} + +func TestSkipHTML(t *testing.T) { + t.Parallel() + doTestsParam(t, []string{ + "
    \n\ntext\n\n
    the form
    ", + "

    text

    \n\n

    the form

    \n", + + "text inline html more text", + "

    text inline html more text

    \n", + }, TestParams{HTMLFlags: SkipHTML}) +} + +func BenchmarkSmartDoubleQuotes(b *testing.B) { + params := TestParams{HTMLFlags: Smartypants} + params.extensions |= Autolink | Strikethrough + params.HTMLFlags |= UseXHTML + + for i := 0; i < b.N; i++ { + runMarkdown("this should be normal \"quoted\" text.\n", params) + } } diff --git a/latex.go b/latex.go deleted file mode 100644 index 70705aa9..00000000 --- a/latex.go +++ /dev/null @@ -1,332 +0,0 @@ -// -// Blackfriday Markdown Processor -// Available at http://github.com/russross/blackfriday -// -// Copyright © 2011 Russ Ross . -// Distributed under the Simplified BSD License. -// See README.md for details. -// - -// -// -// LaTeX rendering backend -// -// - -package blackfriday - -import ( - "bytes" -) - -// Latex is a type that implements the Renderer interface for LaTeX output. -// -// Do not create this directly, instead use the LatexRenderer function. -type Latex struct { -} - -// LatexRenderer creates and configures a Latex object, which -// satisfies the Renderer interface. -// -// flags is a set of LATEX_* options ORed together (currently no such options -// are defined). -func LatexRenderer(flags int) Renderer { - return &Latex{} -} - -func (options *Latex) GetFlags() int { - return 0 -} - -// render code chunks using verbatim, or listings if we have a language -func (options *Latex) BlockCode(out *bytes.Buffer, text []byte, lang string) { - if lang == "" { - out.WriteString("\n\\begin{verbatim}\n") - } else { - out.WriteString("\n\\begin{lstlisting}[language=") - out.WriteString(lang) - out.WriteString("]\n") - } - out.Write(text) - if lang == "" { - out.WriteString("\n\\end{verbatim}\n") - } else { - out.WriteString("\n\\end{lstlisting}\n") - } -} - -func (options *Latex) TitleBlock(out *bytes.Buffer, text []byte) { - -} - -func (options *Latex) BlockQuote(out *bytes.Buffer, text []byte) { - out.WriteString("\n\\begin{quotation}\n") - out.Write(text) - out.WriteString("\n\\end{quotation}\n") -} - -func (options *Latex) BlockHtml(out *bytes.Buffer, text []byte) { - // a pretty lame thing to do... - out.WriteString("\n\\begin{verbatim}\n") - out.Write(text) - out.WriteString("\n\\end{verbatim}\n") -} - -func (options *Latex) Header(out *bytes.Buffer, text func() bool, level int, id string) { - marker := out.Len() - - switch level { - case 1: - out.WriteString("\n\\section{") - case 2: - out.WriteString("\n\\subsection{") - case 3: - out.WriteString("\n\\subsubsection{") - case 4: - out.WriteString("\n\\paragraph{") - case 5: - out.WriteString("\n\\subparagraph{") - case 6: - out.WriteString("\n\\textbf{") - } - if !text() { - out.Truncate(marker) - return - } - out.WriteString("}\n") -} - -func (options *Latex) HRule(out *bytes.Buffer) { - out.WriteString("\n\\HRule\n") -} - -func (options *Latex) List(out *bytes.Buffer, text func() bool, flags int) { - marker := out.Len() - if flags&LIST_TYPE_ORDERED != 0 { - out.WriteString("\n\\begin{enumerate}\n") - } else { - out.WriteString("\n\\begin{itemize}\n") - } - if !text() { - out.Truncate(marker) - return - } - if flags&LIST_TYPE_ORDERED != 0 { - out.WriteString("\n\\end{enumerate}\n") - } else { - out.WriteString("\n\\end{itemize}\n") - } -} - -func (options *Latex) ListItem(out *bytes.Buffer, text []byte, flags int) { - out.WriteString("\n\\item ") - out.Write(text) -} - -func (options *Latex) Paragraph(out *bytes.Buffer, text func() bool) { - marker := out.Len() - out.WriteString("\n") - if !text() { - out.Truncate(marker) - return - } - out.WriteString("\n") -} - -func (options *Latex) Table(out *bytes.Buffer, header []byte, body []byte, columnData []int) { - out.WriteString("\n\\begin{tabular}{") - for _, elt := range columnData { - switch elt { - case TABLE_ALIGNMENT_LEFT: - out.WriteByte('l') - case TABLE_ALIGNMENT_RIGHT: - out.WriteByte('r') - default: - out.WriteByte('c') - } - } - out.WriteString("}\n") - out.Write(header) - out.WriteString(" \\\\\n\\hline\n") - out.Write(body) - out.WriteString("\n\\end{tabular}\n") -} - -func (options *Latex) TableRow(out *bytes.Buffer, text []byte) { - if out.Len() > 0 { - out.WriteString(" \\\\\n") - } - out.Write(text) -} - -func (options *Latex) TableHeaderCell(out *bytes.Buffer, text []byte, align int) { - if out.Len() > 0 { - out.WriteString(" & ") - } - out.Write(text) -} - -func (options *Latex) TableCell(out *bytes.Buffer, text []byte, align int) { - if out.Len() > 0 { - out.WriteString(" & ") - } - out.Write(text) -} - -// TODO: this -func (options *Latex) Footnotes(out *bytes.Buffer, text func() bool) { - -} - -func (options *Latex) FootnoteItem(out *bytes.Buffer, name, text []byte, flags int) { - -} - -func (options *Latex) AutoLink(out *bytes.Buffer, link []byte, kind int) { - out.WriteString("\\href{") - if kind == LINK_TYPE_EMAIL { - out.WriteString("mailto:") - } - out.Write(link) - out.WriteString("}{") - out.Write(link) - out.WriteString("}") -} - -func (options *Latex) CodeSpan(out *bytes.Buffer, text []byte) { - out.WriteString("\\texttt{") - escapeSpecialChars(out, text) - out.WriteString("}") -} - -func (options *Latex) DoubleEmphasis(out *bytes.Buffer, text []byte) { - out.WriteString("\\textbf{") - out.Write(text) - out.WriteString("}") -} - -func (options *Latex) Emphasis(out *bytes.Buffer, text []byte) { - out.WriteString("\\textit{") - out.Write(text) - out.WriteString("}") -} - -func (options *Latex) Image(out *bytes.Buffer, link []byte, title []byte, alt []byte) { - if bytes.HasPrefix(link, []byte("http://")) || bytes.HasPrefix(link, []byte("https://")) { - // treat it like a link - out.WriteString("\\href{") - out.Write(link) - out.WriteString("}{") - out.Write(alt) - out.WriteString("}") - } else { - out.WriteString("\\includegraphics{") - out.Write(link) - out.WriteString("}") - } -} - -func (options *Latex) LineBreak(out *bytes.Buffer) { - out.WriteString(" \\\\\n") -} - -func (options *Latex) Link(out *bytes.Buffer, link []byte, title []byte, content []byte) { - out.WriteString("\\href{") - out.Write(link) - out.WriteString("}{") - out.Write(content) - out.WriteString("}") -} - -func (options *Latex) RawHtmlTag(out *bytes.Buffer, tag []byte) { -} - -func (options *Latex) TripleEmphasis(out *bytes.Buffer, text []byte) { - out.WriteString("\\textbf{\\textit{") - out.Write(text) - out.WriteString("}}") -} - -func (options *Latex) StrikeThrough(out *bytes.Buffer, text []byte) { - out.WriteString("\\sout{") - out.Write(text) - out.WriteString("}") -} - -// TODO: this -func (options *Latex) FootnoteRef(out *bytes.Buffer, ref []byte, id int) { - -} - -func needsBackslash(c byte) bool { - for _, r := range []byte("_{}%$&\\~#") { - if c == r { - return true - } - } - return false -} - -func escapeSpecialChars(out *bytes.Buffer, text []byte) { - for i := 0; i < len(text); i++ { - // directly copy normal characters - org := i - - for i < len(text) && !needsBackslash(text[i]) { - i++ - } - if i > org { - out.Write(text[org:i]) - } - - // escape a character - if i >= len(text) { - break - } - out.WriteByte('\\') - out.WriteByte(text[i]) - } -} - -func (options *Latex) Entity(out *bytes.Buffer, entity []byte) { - // TODO: convert this into a unicode character or something - out.Write(entity) -} - -func (options *Latex) NormalText(out *bytes.Buffer, text []byte) { - escapeSpecialChars(out, text) -} - -// header and footer -func (options *Latex) DocumentHeader(out *bytes.Buffer) { - out.WriteString("\\documentclass{article}\n") - out.WriteString("\n") - out.WriteString("\\usepackage{graphicx}\n") - out.WriteString("\\usepackage{listings}\n") - out.WriteString("\\usepackage[margin=1in]{geometry}\n") - out.WriteString("\\usepackage[utf8]{inputenc}\n") - out.WriteString("\\usepackage{verbatim}\n") - out.WriteString("\\usepackage[normalem]{ulem}\n") - out.WriteString("\\usepackage{hyperref}\n") - out.WriteString("\n") - out.WriteString("\\hypersetup{colorlinks,%\n") - out.WriteString(" citecolor=black,%\n") - out.WriteString(" filecolor=black,%\n") - out.WriteString(" linkcolor=black,%\n") - out.WriteString(" urlcolor=black,%\n") - out.WriteString(" pdfstartview=FitH,%\n") - out.WriteString(" breaklinks=true,%\n") - out.WriteString(" pdfauthor={Blackfriday Markdown Processor v") - out.WriteString(VERSION) - out.WriteString("}}\n") - out.WriteString("\n") - out.WriteString("\\newcommand{\\HRule}{\\rule{\\linewidth}{0.5mm}}\n") - out.WriteString("\\addtolength{\\parskip}{0.5\\baselineskip}\n") - out.WriteString("\\parindent=0pt\n") - out.WriteString("\n") - out.WriteString("\\begin{document}\n") -} - -func (options *Latex) DocumentFooter(out *bytes.Buffer) { - out.WriteString("\n\\end{document}\n") -} diff --git a/markdown.go b/markdown.go index 746905bf..58d2e453 100644 --- a/markdown.go +++ b/markdown.go @@ -1,231 +1,200 @@ -// // Blackfriday Markdown Processor // Available at http://github.com/russross/blackfriday // // Copyright © 2011 Russ Ross . // Distributed under the Simplified BSD License. // See README.md for details. -// - -// -// -// Markdown parsing and processing -// -// -// Blackfriday markdown processor. -// -// Translates plain text with simple formatting rules into HTML or LaTeX. package blackfriday import ( "bytes" "fmt" + "io" "strings" "unicode/utf8" ) -const VERSION = "1.4" +// +// Markdown parsing and processing +// + +// Version string of the package. Appears in the rendered document when +// CompletePage flag is on. +const Version = "2.0" + +// Extensions is a bitwise or'ed collection of enabled Blackfriday's +// extensions. +type Extensions int // These are the supported markdown parsing extensions. // OR these values together to select multiple extensions. const ( - EXTENSION_NO_INTRA_EMPHASIS = 1 << iota // ignore emphasis markers inside words - EXTENSION_TABLES // render tables - EXTENSION_FENCED_CODE // render fenced code blocks - EXTENSION_AUTOLINK // detect embedded URLs that are not explicitly marked - EXTENSION_STRIKETHROUGH // strikethrough text using ~~test~~ - EXTENSION_LAX_HTML_BLOCKS // loosen up HTML block parsing rules - EXTENSION_SPACE_HEADERS // be strict about prefix header rules - EXTENSION_HARD_LINE_BREAK // translate newlines into line breaks - EXTENSION_TAB_SIZE_EIGHT // expand tabs to eight spaces instead of four - EXTENSION_FOOTNOTES // Pandoc-style footnotes - EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK // No need to insert an empty line to start a (code, quote, ordered list, unordered list) block - EXTENSION_HEADER_IDS // specify header IDs with {#id} - EXTENSION_TITLEBLOCK // Titleblock ala pandoc - EXTENSION_AUTO_HEADER_IDS // Create the header ID from the text - EXTENSION_BACKSLASH_LINE_BREAK // translate trailing backslashes into line breaks - EXTENSION_DEFINITION_LISTS // render definition lists - - commonHtmlFlags = 0 | - HTML_USE_XHTML | - HTML_USE_SMARTYPANTS | - HTML_SMARTYPANTS_FRACTIONS | - HTML_SMARTYPANTS_DASHES | - HTML_SMARTYPANTS_LATEX_DASHES - - commonExtensions = 0 | - EXTENSION_NO_INTRA_EMPHASIS | - EXTENSION_TABLES | - EXTENSION_FENCED_CODE | - EXTENSION_AUTOLINK | - EXTENSION_STRIKETHROUGH | - EXTENSION_SPACE_HEADERS | - EXTENSION_HEADER_IDS | - EXTENSION_BACKSLASH_LINE_BREAK | - EXTENSION_DEFINITION_LISTS + NoExtensions Extensions = 0 + NoIntraEmphasis Extensions = 1 << iota // Ignore emphasis markers inside words + Tables // Render tables + FencedCode // Render fenced code blocks + Autolink // Detect embedded URLs that are not explicitly marked + Strikethrough // Strikethrough text using ~~test~~ + LaxHTMLBlocks // Loosen up HTML block parsing rules + SpaceHeadings // Be strict about prefix heading rules + HardLineBreak // Translate newlines into line breaks + TabSizeEight // Expand tabs to eight spaces instead of four + Footnotes // Pandoc-style footnotes + NoEmptyLineBeforeBlock // No need to insert an empty line to start a (code, quote, ordered list, unordered list) block + HeadingIDs // specify heading IDs with {#id} + Titleblock // Titleblock ala pandoc + AutoHeadingIDs // Create the heading ID from the text + BackslashLineBreak // Translate trailing backslashes into line breaks + DefinitionLists // Render definition lists + + CommonHTMLFlags HTMLFlags = UseXHTML | Smartypants | + SmartypantsFractions | SmartypantsDashes | SmartypantsLatexDashes + + CommonExtensions Extensions = NoIntraEmphasis | Tables | FencedCode | + Autolink | Strikethrough | SpaceHeadings | HeadingIDs | + BackslashLineBreak | DefinitionLists ) -// These are the possible flag values for the link renderer. -// Only a single one of these values will be used; they are not ORed together. -// These are mostly of interest if you are writing a new output format. -const ( - LINK_TYPE_NOT_AUTOLINK = iota - LINK_TYPE_NORMAL - LINK_TYPE_EMAIL -) +// ListType contains bitwise or'ed flags for list and list item objects. +type ListType int // These are the possible flag values for the ListItem renderer. // Multiple flag values may be ORed together. // These are mostly of interest if you are writing a new output format. const ( - LIST_TYPE_ORDERED = 1 << iota - LIST_TYPE_DEFINITION - LIST_TYPE_TERM - LIST_ITEM_CONTAINS_BLOCK - LIST_ITEM_BEGINNING_OF_LIST - LIST_ITEM_END_OF_LIST + ListTypeOrdered ListType = 1 << iota + ListTypeDefinition + ListTypeTerm + + ListItemContainsBlock + ListItemBeginningOfList // TODO: figure out if this is of any use now + ListItemEndOfList ) +// CellAlignFlags holds a type of alignment in a table cell. +type CellAlignFlags int + // These are the possible flag values for the table cell renderer. // Only a single one of these values will be used; they are not ORed together. // These are mostly of interest if you are writing a new output format. const ( - TABLE_ALIGNMENT_LEFT = 1 << iota - TABLE_ALIGNMENT_RIGHT - TABLE_ALIGNMENT_CENTER = (TABLE_ALIGNMENT_LEFT | TABLE_ALIGNMENT_RIGHT) + TableAlignmentLeft CellAlignFlags = 1 << iota + TableAlignmentRight + TableAlignmentCenter = (TableAlignmentLeft | TableAlignmentRight) ) // The size of a tab stop. const ( - TAB_SIZE_DEFAULT = 4 - TAB_SIZE_EIGHT = 8 + TabSizeDefault = 4 + TabSizeDouble = 8 ) // blockTags is a set of tags that are recognized as HTML block tags. // Any of these can be included in markdown text without special escaping. var blockTags = map[string]struct{}{ - "blockquote": struct{}{}, - "del": struct{}{}, - "div": struct{}{}, - "dl": struct{}{}, - "fieldset": struct{}{}, - "form": struct{}{}, - "h1": struct{}{}, - "h2": struct{}{}, - "h3": struct{}{}, - "h4": struct{}{}, - "h5": struct{}{}, - "h6": struct{}{}, - "iframe": struct{}{}, - "ins": struct{}{}, - "math": struct{}{}, - "noscript": struct{}{}, - "ol": struct{}{}, - "pre": struct{}{}, - "p": struct{}{}, - "script": struct{}{}, - "style": struct{}{}, - "table": struct{}{}, - "ul": struct{}{}, + "blockquote": {}, + "del": {}, + "div": {}, + "dl": {}, + "fieldset": {}, + "form": {}, + "h1": {}, + "h2": {}, + "h3": {}, + "h4": {}, + "h5": {}, + "h6": {}, + "iframe": {}, + "ins": {}, + "math": {}, + "noscript": {}, + "ol": {}, + "pre": {}, + "p": {}, + "script": {}, + "style": {}, + "table": {}, + "ul": {}, // HTML5 - "address": struct{}{}, - "article": struct{}{}, - "aside": struct{}{}, - "canvas": struct{}{}, - "figcaption": struct{}{}, - "figure": struct{}{}, - "footer": struct{}{}, - "header": struct{}{}, - "hgroup": struct{}{}, - "main": struct{}{}, - "nav": struct{}{}, - "output": struct{}{}, - "progress": struct{}{}, - "section": struct{}{}, - "video": struct{}{}, + "address": {}, + "article": {}, + "aside": {}, + "canvas": {}, + "figcaption": {}, + "figure": {}, + "footer": {}, + "header": {}, + "hgroup": {}, + "main": {}, + "nav": {}, + "output": {}, + "progress": {}, + "section": {}, + "video": {}, } -// Renderer is the rendering interface. -// This is mostly of interest if you are implementing a new rendering format. -// -// When a byte slice is provided, it contains the (rendered) contents of the -// element. +// Renderer is the rendering interface. This is mostly of interest if you are +// implementing a new rendering format. // -// When a callback is provided instead, it will write the contents of the -// respective element directly to the output buffer and return true on success. -// If the callback returns false, the rendering function should reset the -// output buffer as though it had never been called. -// -// Currently Html and Latex implementations are provided +// Only an HTML implementation is provided in this repository, see the README +// for external implementations. type Renderer interface { - // block-level callbacks - BlockCode(out *bytes.Buffer, text []byte, lang string) - BlockQuote(out *bytes.Buffer, text []byte) - BlockHtml(out *bytes.Buffer, text []byte) - Header(out *bytes.Buffer, text func() bool, level int, id string) - HRule(out *bytes.Buffer) - List(out *bytes.Buffer, text func() bool, flags int) - ListItem(out *bytes.Buffer, text []byte, flags int) - Paragraph(out *bytes.Buffer, text func() bool) - Table(out *bytes.Buffer, header []byte, body []byte, columnData []int) - TableRow(out *bytes.Buffer, text []byte) - TableHeaderCell(out *bytes.Buffer, text []byte, flags int) - TableCell(out *bytes.Buffer, text []byte, flags int) - Footnotes(out *bytes.Buffer, text func() bool) - FootnoteItem(out *bytes.Buffer, name, text []byte, flags int) - TitleBlock(out *bytes.Buffer, text []byte) - - // Span-level callbacks - AutoLink(out *bytes.Buffer, link []byte, kind int) - CodeSpan(out *bytes.Buffer, text []byte) - DoubleEmphasis(out *bytes.Buffer, text []byte) - Emphasis(out *bytes.Buffer, text []byte) - Image(out *bytes.Buffer, link []byte, title []byte, alt []byte) - LineBreak(out *bytes.Buffer) - Link(out *bytes.Buffer, link []byte, title []byte, content []byte) - RawHtmlTag(out *bytes.Buffer, tag []byte) - TripleEmphasis(out *bytes.Buffer, text []byte) - StrikeThrough(out *bytes.Buffer, text []byte) - FootnoteRef(out *bytes.Buffer, ref []byte, id int) - - // Low-level callbacks - Entity(out *bytes.Buffer, entity []byte) - NormalText(out *bytes.Buffer, text []byte) - - // Header and footer - DocumentHeader(out *bytes.Buffer) - DocumentFooter(out *bytes.Buffer) - - GetFlags() int + // RenderNode is the main rendering method. It will be called once for + // every leaf node and twice for every non-leaf node (first with + // entering=true, then with entering=false). The method should write its + // rendition of the node to the supplied writer w. + RenderNode(w io.Writer, node *Node, entering bool) WalkStatus + + // RenderHeader is a method that allows the renderer to produce some + // content preceding the main body of the output document. The header is + // understood in the broad sense here. For example, the default HTML + // renderer will write not only the HTML document preamble, but also the + // table of contents if it was requested. + // + // The method will be passed an entire document tree, in case a particular + // implementation needs to inspect it to produce output. + // + // The output should be written to the supplied writer w. If your + // implementation has no header to write, supply an empty implementation. + RenderHeader(w io.Writer, ast *Node) + + // RenderFooter is a symmetric counterpart of RenderHeader. + RenderFooter(w io.Writer, ast *Node) } // Callback functions for inline parsing. One such function is defined // for each character that triggers a response when parsing inline data. -type inlineParser func(p *parser, out *bytes.Buffer, data []byte, offset int) int - -// Parser holds runtime state used by the parser. -// This is constructed by the Markdown function. -type parser struct { - r Renderer - refOverride ReferenceOverrideFunc - refs map[string]*reference - inlineCallback [256]inlineParser - flags int - nesting int - maxNesting int - insideLink bool +type inlineParser func(p *Markdown, data []byte, offset int) (int, *Node) + +// Markdown is a type that holds extensions and the runtime state used by +// Parse, and the renderer. You can not use it directly, construct it with New. +type Markdown struct { + renderer Renderer + referenceOverride ReferenceOverrideFunc + refs map[string]*reference + inlineCallback [256]inlineParser + extensions Extensions + nesting int + maxNesting int + insideLink bool // Footnotes need to be ordered as well as available to quickly check for // presence. If a ref is also a footnote, it's stored both in refs and here // in notes. Slice is nil if footnotes not enabled. notes []*reference + + doc *Node + tip *Node // = doc + oldTip *Node + lastMatchedContainer *Node // = doc + allClosed bool } -func (p *parser) getRef(refid string) (ref *reference, found bool) { - if p.refOverride != nil { - r, overridden := p.refOverride(refid) +func (p *Markdown) getRef(refid string) (ref *reference, found bool) { + if p.referenceOverride != nil { + r, overridden := p.referenceOverride(refid) if overridden { if r == nil { return nil, false @@ -233,7 +202,7 @@ func (p *parser) getRef(refid string) (ref *reference, found bool) { return &reference{ link: []byte(r.Link), title: []byte(r.Title), - noteId: 0, + noteID: 0, hasBlock: false, text: []byte(r.Text)}, true } @@ -243,6 +212,36 @@ func (p *parser) getRef(refid string) (ref *reference, found bool) { return ref, found } +func (p *Markdown) finalize(block *Node) { + above := block.Parent + block.open = false + p.tip = above +} + +func (p *Markdown) addChild(node NodeType, offset uint32) *Node { + return p.addExistingChild(NewNode(node), offset) +} + +func (p *Markdown) addExistingChild(node *Node, offset uint32) *Node { + for !p.tip.canContain(node.Type) { + p.finalize(p.tip) + } + p.tip.AppendChild(node) + p.tip = node + return node +} + +func (p *Markdown) closeUnmatchedBlocks() { + if !p.allClosed { + for p.oldTip != p.lastMatchedContainer { + parent := p.oldTip.Parent + p.finalize(p.oldTip) + p.oldTip = parent + } + p.allClosed = true + } +} + // // // Public interface @@ -267,102 +266,27 @@ type Reference struct { // See the documentation in Options for more details on use-case. type ReferenceOverrideFunc func(reference string) (ref *Reference, overridden bool) -// Options represents configurable overrides and callbacks (in addition to the -// extension flag set) for configuring a Markdown parse. -type Options struct { - // Extensions is a flag set of bit-wise ORed extension bits. See the - // EXTENSION_* flags defined in this package. - Extensions int - - // ReferenceOverride is an optional function callback that is called every - // time a reference is resolved. - // - // In Markdown, the link reference syntax can be made to resolve a link to - // a reference instead of an inline URL, in one of the following ways: - // - // * [link text][refid] - // * [refid][] - // - // Usually, the refid is defined at the bottom of the Markdown document. If - // this override function is provided, the refid is passed to the override - // function first, before consulting the defined refids at the bottom. If - // the override function indicates an override did not occur, the refids at - // the bottom will be used to fill in the link details. - ReferenceOverride ReferenceOverrideFunc -} - -// MarkdownBasic is a convenience function for simple rendering. -// It processes markdown input with no extensions enabled. -func MarkdownBasic(input []byte) []byte { - // set up the HTML renderer - htmlFlags := HTML_USE_XHTML - renderer := HtmlRenderer(htmlFlags, "", "") - - // set up the parser - return MarkdownOptions(input, renderer, Options{Extensions: 0}) -} - -// Call Markdown with most useful extensions enabled -// MarkdownCommon is a convenience function for simple rendering. -// It processes markdown input with common extensions enabled, including: -// -// * Smartypants processing with smart fractions and LaTeX dashes -// -// * Intra-word emphasis suppression -// -// * Tables -// -// * Fenced code blocks -// -// * Autolinking -// -// * Strikethrough support -// -// * Strict header parsing -// -// * Custom Header IDs -func MarkdownCommon(input []byte) []byte { - // set up the HTML renderer - renderer := HtmlRenderer(commonHtmlFlags, "", "") - return MarkdownOptions(input, renderer, Options{ - Extensions: commonExtensions}) -} - -// Markdown is the main rendering function. -// It parses and renders a block of markdown-encoded text. -// The supplied Renderer is used to format the output, and extensions dictates -// which non-standard extensions are enabled. -// -// To use the supplied Html or LaTeX renderers, see HtmlRenderer and -// LatexRenderer, respectively. -func Markdown(input []byte, renderer Renderer, extensions int) []byte { - return MarkdownOptions(input, renderer, Options{ - Extensions: extensions}) -} - -// MarkdownOptions is just like Markdown but takes additional options through -// the Options struct. -func MarkdownOptions(input []byte, renderer Renderer, opts Options) []byte { - // no point in parsing if we can't render - if renderer == nil { - return nil +// New constructs a Markdown processor. You can use the same With* functions as +// for Run() to customize parser's behavior and the renderer. +func New(opts ...Option) *Markdown { + var p Markdown + for _, opt := range opts { + opt(&p) } - - extensions := opts.Extensions - - // fill in the render structure - p := new(parser) - p.r = renderer - p.flags = extensions - p.refOverride = opts.ReferenceOverride p.refs = make(map[string]*reference) p.maxNesting = 16 p.insideLink = false - + docNode := NewNode(Document) + p.doc = docNode + p.tip = docNode + p.oldTip = docNode + p.lastMatchedContainer = docNode + p.allClosed = true // register inline parsers + p.inlineCallback[' '] = maybeLineBreak p.inlineCallback['*'] = emphasis p.inlineCallback['_'] = emphasis - if extensions&EXTENSION_STRIKETHROUGH != 0 { + if p.extensions&Strikethrough != 0 { p.inlineCallback['~'] = emphasis } p.inlineCallback['`'] = codeSpan @@ -371,115 +295,166 @@ func MarkdownOptions(input []byte, renderer Renderer, opts Options) []byte { p.inlineCallback['<'] = leftAngle p.inlineCallback['\\'] = escape p.inlineCallback['&'] = entity - - if extensions&EXTENSION_AUTOLINK != 0 { - p.inlineCallback[':'] = autoLink - } - - if extensions&EXTENSION_FOOTNOTES != 0 { + p.inlineCallback['!'] = maybeImage + p.inlineCallback['^'] = maybeInlineFootnote + if p.extensions&Autolink != 0 { + p.inlineCallback['h'] = maybeAutoLink + p.inlineCallback['m'] = maybeAutoLink + p.inlineCallback['f'] = maybeAutoLink + p.inlineCallback['H'] = maybeAutoLink + p.inlineCallback['M'] = maybeAutoLink + p.inlineCallback['F'] = maybeAutoLink + } + if p.extensions&Footnotes != 0 { p.notes = make([]*reference, 0) } - - first := firstPass(p, input) - second := secondPass(p, first) - return second + return &p } -// first pass: -// - extract references -// - expand tabs -// - normalize newlines -// - copy everything else -func firstPass(p *parser, input []byte) []byte { - var out bytes.Buffer - tabSize := TAB_SIZE_DEFAULT - if p.flags&EXTENSION_TAB_SIZE_EIGHT != 0 { - tabSize = TAB_SIZE_EIGHT - } - beg, end := 0, 0 - lastFencedCodeBlockEnd := 0 - for beg < len(input) { // iterate over lines - if end = isReference(p, input[beg:], tabSize); end > 0 { - beg += end - } else { // skip to the next line - end = beg - for end < len(input) && input[end] != '\n' && input[end] != '\r' { - end++ - } +// Option customizes the Markdown processor's default behavior. +type Option func(*Markdown) - if p.flags&EXTENSION_FENCED_CODE != 0 { - // track fenced code block boundaries to suppress tab expansion - // inside them: - if beg >= lastFencedCodeBlockEnd { - if i := p.fencedCode(&out, input[beg:], false); i > 0 { - lastFencedCodeBlockEnd = beg + i - } - } - } - - // add the line body if present - if end > beg { - if end < lastFencedCodeBlockEnd { // Do not expand tabs while inside fenced code blocks. - out.Write(input[beg:end]) - } else { - expandTabs(&out, input[beg:end], tabSize) - } - } - out.WriteByte('\n') - - if end < len(input) && input[end] == '\r' { - end++ - } - if end < len(input) && input[end] == '\n' { - end++ - } - - beg = end - } +// WithRenderer allows you to override the default renderer. +func WithRenderer(r Renderer) Option { + return func(p *Markdown) { + p.renderer = r } +} - // empty input? - if out.Len() == 0 { - out.WriteByte('\n') +// WithExtensions allows you to pick some of the many extensions provided by +// Blackfriday. You can bitwise OR them. +func WithExtensions(e Extensions) Option { + return func(p *Markdown) { + p.extensions = e } - - return out.Bytes() } -// second pass: actual rendering -func secondPass(p *parser, input []byte) []byte { - var output bytes.Buffer - - p.r.DocumentHeader(&output) - p.block(&output, input) - - if p.flags&EXTENSION_FOOTNOTES != 0 && len(p.notes) > 0 { - p.r.Footnotes(&output, func() bool { - flags := LIST_ITEM_BEGINNING_OF_LIST - for i := 0; i < len(p.notes); i += 1 { - ref := p.notes[i] - var buf bytes.Buffer - if ref.hasBlock { - flags |= LIST_ITEM_CONTAINS_BLOCK - p.block(&buf, ref.title) - } else { - p.inline(&buf, ref.title) - } - p.r.FootnoteItem(&output, ref.link, buf.Bytes(), flags) - flags &^= LIST_ITEM_BEGINNING_OF_LIST | LIST_ITEM_CONTAINS_BLOCK - } - - return true +// WithNoExtensions turns off all extensions and custom behavior. +func WithNoExtensions() Option { + return func(p *Markdown) { + p.extensions = NoExtensions + p.renderer = NewHTMLRenderer(HTMLRendererParameters{ + Flags: HTMLFlagsNone, }) } +} - p.r.DocumentFooter(&output) - - if p.nesting != 0 { - panic("Nesting level did not end at zero") +// WithRefOverride sets an optional function callback that is called every +// time a reference is resolved. +// +// In Markdown, the link reference syntax can be made to resolve a link to +// a reference instead of an inline URL, in one of the following ways: +// +// * [link text][refid] +// * [refid][] +// +// Usually, the refid is defined at the bottom of the Markdown document. If +// this override function is provided, the refid is passed to the override +// function first, before consulting the defined refids at the bottom. If +// the override function indicates an override did not occur, the refids at +// the bottom will be used to fill in the link details. +func WithRefOverride(o ReferenceOverrideFunc) Option { + return func(p *Markdown) { + p.referenceOverride = o } +} - return output.Bytes() +// Run is the main entry point to Blackfriday. It parses and renders a +// block of markdown-encoded text. +// +// The simplest invocation of Run takes one argument, input: +// output := Run(input) +// This will parse the input with CommonExtensions enabled and render it with +// the default HTMLRenderer (with CommonHTMLFlags). +// +// Variadic arguments opts can customize the default behavior. Since Markdown +// type does not contain exported fields, you can not use it directly. Instead, +// use the With* functions. For example, this will call the most basic +// functionality, with no extensions: +// output := Run(input, WithNoExtensions()) +// +// You can use any number of With* arguments, even contradicting ones. They +// will be applied in order of appearance and the latter will override the +// former: +// output := Run(input, WithNoExtensions(), WithExtensions(exts), +// WithRenderer(yourRenderer)) +func Run(input []byte, opts ...Option) []byte { + r := NewHTMLRenderer(HTMLRendererParameters{ + Flags: CommonHTMLFlags, + }) + optList := []Option{WithRenderer(r), WithExtensions(CommonExtensions)} + optList = append(optList, opts...) + parser := New(optList...) + ast := parser.Parse(input) + var buf bytes.Buffer + parser.renderer.RenderHeader(&buf, ast) + ast.Walk(func(node *Node, entering bool) WalkStatus { + return parser.renderer.RenderNode(&buf, node, entering) + }) + parser.renderer.RenderFooter(&buf, ast) + return buf.Bytes() +} + +// Parse is an entry point to the parsing part of Blackfriday. It takes an +// input markdown document and produces a syntax tree for its contents. This +// tree can then be rendered with a default or custom renderer, or +// analyzed/transformed by the caller to whatever non-standard needs they have. +// The return value is the root node of the syntax tree. +func (p *Markdown) Parse(input []byte) *Node { + p.block(input) + // Walk the tree and finish up some of unfinished blocks + for p.tip != nil { + p.finalize(p.tip) + } + // Walk the tree again and process inline markdown in each block + p.doc.Walk(func(node *Node, entering bool) WalkStatus { + if node.Type == Paragraph || node.Type == Heading || node.Type == TableCell { + p.inline(node, node.content) + node.content = nil + } + return GoToNext + }) + p.parseRefsToAST() + return p.doc +} + +func (p *Markdown) parseRefsToAST() { + if p.extensions&Footnotes == 0 || len(p.notes) == 0 { + return + } + p.tip = p.doc + block := p.addBlock(List, nil) + block.IsFootnotesList = true + block.ListFlags = ListTypeOrdered + flags := ListItemBeginningOfList + // Note: this loop is intentionally explicit, not range-form. This is + // because the body of the loop will append nested footnotes to p.notes and + // we need to process those late additions. Range form would only walk over + // the fixed initial set. + for i := 0; i < len(p.notes); i++ { + ref := p.notes[i] + p.addExistingChild(ref.footnote, 0) + block := ref.footnote + block.ListFlags = flags | ListTypeOrdered + block.RefLink = ref.link + if ref.hasBlock { + flags |= ListItemContainsBlock + p.block(ref.title) + } else { + p.inline(block, ref.title) + } + flags &^= ListItemBeginningOfList | ListItemContainsBlock + } + above := block.Parent + finalizeList(block) + p.tip = above + block.Walk(func(node *Node, entering bool) WalkStatus { + if node.Type == Paragraph || node.Type == Heading { + p.inline(node, node.content) + node.content = nil + } + return GoToNext + }) } // @@ -505,24 +480,62 @@ func secondPass(p *parser, input []byte) []byte { // [^note]: This is the explanation. // // Footnotes should be placed at the end of the document in an ordered list. -// Inline footnotes such as: +// Finally, there are inline footnotes such as: // -// Inline footnotes^[Not supported.] also exist. +// Inline footnotes^[Also supported.] provide a quick inline explanation, +// but are rendered at the bottom of the document. // -// are not yet supported. -// References are parsed and stored in this struct. +// reference holds all information necessary for a reference-style links or +// footnotes. +// +// Consider this markdown with reference-style links: +// +// [link][ref] +// +// [ref]: /url/ "tooltip title" +// +// It will be ultimately converted to this HTML: +// +//

    link

    +// +// And a reference structure will be populated as follows: +// +// p.refs["ref"] = &reference{ +// link: "/url/", +// title: "tooltip title", +// } +// +// Alternatively, reference can contain information about a footnote. Consider +// this markdown: +// +// Text needing a footnote.[^a] +// +// [^a]: This is the note +// +// A reference structure will be populated as follows: +// +// p.refs["a"] = &reference{ +// link: "a", +// title: "This is the note", +// noteID: , +// } +// +// TODO: As you can see, it begs for splitting into two dedicated structures +// for refs and for footnotes. type reference struct { link []byte title []byte - noteId int // 0 if not a footnote ref + noteID int // 0 if not a footnote ref hasBlock bool - text []byte + footnote *Node // a link to the Item node within a list of footnotes + + text []byte // only gets populated by refOverride feature with Reference.Text } func (r *reference) String() string { - return fmt.Sprintf("{link: %q, title: %q, text: %q, noteId: %d, hasBlock: %v}", - r.link, r.title, r.text, r.noteId, r.hasBlock) + return fmt.Sprintf("{link: %q, title: %q, text: %q, noteID: %d, hasBlock: %v}", + r.link, r.title, r.text, r.noteID, r.hasBlock) } // Check whether or not data starts with a reference link. @@ -530,7 +543,7 @@ func (r *reference) String() string { // (in the render struct). // Returns the number of bytes to skip to move past it, // or zero if the first line is not a reference. -func isReference(p *parser, data []byte, tabSize int) int { +func isReference(p *Markdown, data []byte, tabSize int) int { // up to 3 optional leading spaces if len(data) < 4 { return 0 @@ -540,18 +553,18 @@ func isReference(p *parser, data []byte, tabSize int) int { i++ } - noteId := 0 + noteID := 0 // id part: anything but a newline between brackets if data[i] != '[' { return 0 } i++ - if p.flags&EXTENSION_FOOTNOTES != 0 { + if p.extensions&Footnotes != 0 { if i < len(data) && data[i] == '^' { // we can set it to anything here because the proper noteIds will // be assigned later during the second pass. It just has to be != 0 - noteId = 1 + noteID = 1 i++ } } @@ -563,7 +576,11 @@ func isReference(p *parser, data []byte, tabSize int) int { return 0 } idEnd := i - + // footnotes can have empty ID, like this: [^], but a reference can not be + // empty like this: []. Break early if it's not a footnote and there's no ID + if noteID == 0 && idOffset == idEnd { + return 0 + } // spacer: colon (space | tab)* newline? (space | tab)* i++ if i >= len(data) || data[i] != ':' { @@ -594,7 +611,7 @@ func isReference(p *parser, data []byte, tabSize int) int { hasBlock bool ) - if p.flags&EXTENSION_FOOTNOTES != 0 && noteId != 0 { + if p.extensions&Footnotes != 0 && noteID != 0 { linkOffset, linkEnd, raw, hasBlock = scanFootnote(p, data, i, tabSize) lineEnd = linkEnd } else { @@ -607,11 +624,11 @@ func isReference(p *parser, data []byte, tabSize int) int { // a valid ref has been found ref := &reference{ - noteId: noteId, + noteID: noteID, hasBlock: hasBlock, } - if noteId > 0 { + if noteID > 0 { // reusing the link field for the id since footnotes don't have links ref.link = data[idOffset:idEnd] // if footnote, it's not really a title, it's the contained text @@ -629,7 +646,7 @@ func isReference(p *parser, data []byte, tabSize int) int { return lineEnd } -func scanLinkRef(p *parser, data []byte, i int) (linkOffset, linkEnd, titleOffset, titleEnd, lineEnd int) { +func scanLinkRef(p *Markdown, data []byte, i int) (linkOffset, linkEnd, titleOffset, titleEnd, lineEnd int) { // link: whitespace-free sequence, optionally between angle brackets if data[i] == '<' { i++ @@ -638,9 +655,6 @@ func scanLinkRef(p *parser, data []byte, i int) (linkOffset, linkEnd, titleOffse for i < len(data) && data[i] != ' ' && data[i] != '\t' && data[i] != '\n' && data[i] != '\r' { i++ } - if i == len(data) { - return - } linkEnd = i if data[linkOffset] == '<' && data[linkEnd-1] == '>' { linkOffset++ @@ -700,13 +714,13 @@ func scanLinkRef(p *parser, data []byte, i int) (linkOffset, linkEnd, titleOffse return } -// The first bit of this logic is the same as (*parser).listItem, but the rest +// The first bit of this logic is the same as Parser.listItem, but the rest // is much simpler. This function simply finds the entire block and shifts it // over by one tab if it is indeed a block (just returns the line if it's not). // blockEnd is the end of the section in the input buffer, and contents is the // extracted text that was shifted over one tab. It will need to be rendered at // the end of the document. -func scanFootnote(p *parser, data []byte, i, indentSize int) (blockStart, blockEnd int, contents []byte, hasBlock bool) { +func scanFootnote(p *Markdown, data []byte, i, indentSize int) (blockStart, blockEnd int, contents []byte, hasBlock bool) { if i == 0 || len(data) == 0 { return } @@ -799,7 +813,17 @@ func ispunct(c byte) bool { // Test if a character is a whitespace character. func isspace(c byte) bool { - return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v' + return ishorizontalspace(c) || isverticalspace(c) +} + +// Test if a character is a horizontal whitespace character. +func ishorizontalspace(c byte) bool { + return c == ' ' || c == '\t' +} + +// Test if a character is a vertical character. +func isverticalspace(c byte) bool { + return c == '\n' || c == '\r' || c == '\f' || c == '\v' } // Test if a character is letter. diff --git a/markdown_test.go b/markdown_test.go new file mode 100644 index 00000000..078dcda3 --- /dev/null +++ b/markdown_test.go @@ -0,0 +1,39 @@ +// +// Blackfriday Markdown Processor +// Available at http://github.com/russross/blackfriday +// +// Copyright © 2011 Russ Ross . +// Distributed under the Simplified BSD License. +// See README.md for details. +// + +// +// Unit tests for full document parsing and rendering +// + +package blackfriday + +import "testing" + +func TestDocument(t *testing.T) { + t.Parallel() + var tests = []string{ + // Empty document. + "", + "", + + " ", + "", + + // This shouldn't panic. + // https://github.com/russross/blackfriday/issues/172 + "[]:<", + "

    []:<

    \n", + + // This shouldn't panic. + // https://github.com/russross/blackfriday/issues/173 + " [", + "

    [

    \n", + } + doTests(t, tests) +} diff --git a/node.go b/node.go new file mode 100644 index 00000000..04e6050c --- /dev/null +++ b/node.go @@ -0,0 +1,360 @@ +package blackfriday + +import ( + "bytes" + "fmt" +) + +// NodeType specifies a type of a single node of a syntax tree. Usually one +// node (and its type) corresponds to a single markdown feature, e.g. emphasis +// or code block. +type NodeType int + +// Constants for identifying different types of nodes. See NodeType. +const ( + Document NodeType = iota + BlockQuote + List + Item + Paragraph + Heading + HorizontalRule + Emph + Strong + Del + Link + Image + Text + HTMLBlock + CodeBlock + Softbreak + Hardbreak + Code + HTMLSpan + Table + TableCell + TableHead + TableBody + TableRow +) + +var nodeTypeNames = []string{ + Document: "Document", + BlockQuote: "BlockQuote", + List: "List", + Item: "Item", + Paragraph: "Paragraph", + Heading: "Heading", + HorizontalRule: "HorizontalRule", + Emph: "Emph", + Strong: "Strong", + Del: "Del", + Link: "Link", + Image: "Image", + Text: "Text", + HTMLBlock: "HTMLBlock", + CodeBlock: "CodeBlock", + Softbreak: "Softbreak", + Hardbreak: "Hardbreak", + Code: "Code", + HTMLSpan: "HTMLSpan", + Table: "Table", + TableCell: "TableCell", + TableHead: "TableHead", + TableBody: "TableBody", + TableRow: "TableRow", +} + +func (t NodeType) String() string { + return nodeTypeNames[t] +} + +// ListData contains fields relevant to a List and Item node type. +type ListData struct { + ListFlags ListType + Tight bool // Skip

    s around list item data if true + BulletChar byte // '*', '+' or '-' in bullet lists + Delimiter byte // '.' or ')' after the number in ordered lists + RefLink []byte // If not nil, turns this list item into a footnote item and triggers different rendering + IsFootnotesList bool // This is a list of footnotes +} + +// LinkData contains fields relevant to a Link node type. +type LinkData struct { + Destination []byte // Destination is what goes into a href + Title []byte // Title is the tooltip thing that goes in a title attribute + NoteID int // NoteID contains a serial number of a footnote, zero if it's not a footnote + Footnote *Node // If it's a footnote, this is a direct link to the footnote Node. Otherwise nil. +} + +// CodeBlockData contains fields relevant to a CodeBlock node type. +type CodeBlockData struct { + IsFenced bool // Specifies whether it's a fenced code block or an indented one + Info []byte // This holds the info string + FenceChar byte + FenceLength int + FenceOffset int +} + +// TableCellData contains fields relevant to a TableCell node type. +type TableCellData struct { + IsHeader bool // This tells if it's under the header row + Align CellAlignFlags // This holds the value for align attribute +} + +// HeadingData contains fields relevant to a Heading node type. +type HeadingData struct { + Level int // This holds the heading level number + HeadingID string // This might hold heading ID, if present + IsTitleblock bool // Specifies whether it's a title block +} + +// Node is a single element in the abstract syntax tree of the parsed document. +// It holds connections to the structurally neighboring nodes and, for certain +// types of nodes, additional information that might be needed when rendering. +type Node struct { + Type NodeType // Determines the type of the node + Parent *Node // Points to the parent + FirstChild *Node // Points to the first child, if any + LastChild *Node // Points to the last child, if any + Prev *Node // Previous sibling; nil if it's the first child + Next *Node // Next sibling; nil if it's the last child + + Literal []byte // Text contents of the leaf nodes + + HeadingData // Populated if Type is Heading + ListData // Populated if Type is List + CodeBlockData // Populated if Type is CodeBlock + LinkData // Populated if Type is Link + TableCellData // Populated if Type is TableCell + + content []byte // Markdown content of the block nodes + open bool // Specifies an open block node that has not been finished to process yet +} + +// NewNode allocates a node of a specified type. +func NewNode(typ NodeType) *Node { + return &Node{ + Type: typ, + open: true, + } +} + +func (n *Node) String() string { + ellipsis := "" + snippet := n.Literal + if len(snippet) > 16 { + snippet = snippet[:16] + ellipsis = "..." + } + return fmt.Sprintf("%s: '%s%s'", n.Type, snippet, ellipsis) +} + +// Unlink removes node 'n' from the tree. +// It panics if the node is nil. +func (n *Node) Unlink() { + if n.Prev != nil { + n.Prev.Next = n.Next + } else if n.Parent != nil { + n.Parent.FirstChild = n.Next + } + if n.Next != nil { + n.Next.Prev = n.Prev + } else if n.Parent != nil { + n.Parent.LastChild = n.Prev + } + n.Parent = nil + n.Next = nil + n.Prev = nil +} + +// AppendChild adds a node 'child' as a child of 'n'. +// It panics if either node is nil. +func (n *Node) AppendChild(child *Node) { + child.Unlink() + child.Parent = n + if n.LastChild != nil { + n.LastChild.Next = child + child.Prev = n.LastChild + n.LastChild = child + } else { + n.FirstChild = child + n.LastChild = child + } +} + +// InsertBefore inserts 'sibling' immediately before 'n'. +// It panics if either node is nil. +func (n *Node) InsertBefore(sibling *Node) { + sibling.Unlink() + sibling.Prev = n.Prev + if sibling.Prev != nil { + sibling.Prev.Next = sibling + } + sibling.Next = n + n.Prev = sibling + sibling.Parent = n.Parent + if sibling.Prev == nil { + sibling.Parent.FirstChild = sibling + } +} + +// IsContainer returns true if 'n' can contain children. +func (n *Node) IsContainer() bool { + switch n.Type { + case Document: + fallthrough + case BlockQuote: + fallthrough + case List: + fallthrough + case Item: + fallthrough + case Paragraph: + fallthrough + case Heading: + fallthrough + case Emph: + fallthrough + case Strong: + fallthrough + case Del: + fallthrough + case Link: + fallthrough + case Image: + fallthrough + case Table: + fallthrough + case TableHead: + fallthrough + case TableBody: + fallthrough + case TableRow: + fallthrough + case TableCell: + return true + default: + return false + } +} + +// IsLeaf returns true if 'n' is a leaf node. +func (n *Node) IsLeaf() bool { + return !n.IsContainer() +} + +func (n *Node) canContain(t NodeType) bool { + if n.Type == List { + return t == Item + } + if n.Type == Document || n.Type == BlockQuote || n.Type == Item { + return t != Item + } + if n.Type == Table { + return t == TableHead || t == TableBody + } + if n.Type == TableHead || n.Type == TableBody { + return t == TableRow + } + if n.Type == TableRow { + return t == TableCell + } + return false +} + +// WalkStatus allows NodeVisitor to have some control over the tree traversal. +// It is returned from NodeVisitor and different values allow Node.Walk to +// decide which node to go to next. +type WalkStatus int + +const ( + // GoToNext is the default traversal of every node. + GoToNext WalkStatus = iota + // SkipChildren tells walker to skip all children of current node. + SkipChildren + // Terminate tells walker to terminate the traversal. + Terminate +) + +// NodeVisitor is a callback to be called when traversing the syntax tree. +// Called twice for every node: once with entering=true when the branch is +// first visited, then with entering=false after all the children are done. +type NodeVisitor func(node *Node, entering bool) WalkStatus + +// Walk is a convenience method that instantiates a walker and starts a +// traversal of subtree rooted at n. +func (n *Node) Walk(visitor NodeVisitor) { + w := newNodeWalker(n) + for w.current != nil { + status := visitor(w.current, w.entering) + switch status { + case GoToNext: + w.next() + case SkipChildren: + w.entering = false + w.next() + case Terminate: + return + } + } +} + +type nodeWalker struct { + current *Node + root *Node + entering bool +} + +func newNodeWalker(root *Node) *nodeWalker { + return &nodeWalker{ + current: root, + root: root, + entering: true, + } +} + +func (nw *nodeWalker) next() { + if (!nw.current.IsContainer() || !nw.entering) && nw.current == nw.root { + nw.current = nil + return + } + if nw.entering && nw.current.IsContainer() { + if nw.current.FirstChild != nil { + nw.current = nw.current.FirstChild + nw.entering = true + } else { + nw.entering = false + } + } else if nw.current.Next == nil { + nw.current = nw.current.Parent + nw.entering = false + } else { + nw.current = nw.current.Next + nw.entering = true + } +} + +func dump(ast *Node) { + fmt.Println(dumpString(ast)) +} + +func dumpR(ast *Node, depth int) string { + if ast == nil { + return "" + } + indent := bytes.Repeat([]byte("\t"), depth) + content := ast.Literal + if content == nil { + content = ast.content + } + result := fmt.Sprintf("%s%s(%q)\n", indent, ast.Type, content) + for n := ast.FirstChild; n != nil; n = n.Next { + result += dumpR(n, depth+1) + } + return result +} + +func dumpString(ast *Node) string { + return dumpR(ast, 0) +} diff --git a/ref_test.go b/ref_test.go index 770439cf..a2e689ca 100644 --- a/ref_test.go +++ b/ref_test.go @@ -19,59 +19,8 @@ import ( "testing" ) -func runMarkdownReference(input string, flag int) string { - renderer := HtmlRenderer(0, "", "") - return string(Markdown([]byte(input), renderer, flag)) -} - -func doTestsReference(t *testing.T, files []string, flag int) { - // catch and report panics - var candidate string - defer func() { - if err := recover(); err != nil { - t.Errorf("\npanic while processing [%#v]\n", candidate) - } - }() - - for _, basename := range files { - filename := filepath.Join("testdata", basename+".text") - inputBytes, err := ioutil.ReadFile(filename) - if err != nil { - t.Errorf("Couldn't open '%s', error: %v\n", filename, err) - continue - } - input := string(inputBytes) - - filename = filepath.Join("testdata", basename+".html") - expectedBytes, err := ioutil.ReadFile(filename) - if err != nil { - t.Errorf("Couldn't open '%s', error: %v\n", filename, err) - continue - } - expected := string(expectedBytes) - - // fmt.Fprintf(os.Stderr, "processing %s ...", filename) - actual := string(runMarkdownReference(input, flag)) - if actual != expected { - t.Errorf("\n [%#v]\nExpected[%#v]\nActual [%#v]", - basename+".text", expected, actual) - } - // fmt.Fprintf(os.Stderr, " ok\n") - - // now test every prefix of every input to check for - // bounds checking - if !testing.Short() { - start, max := 0, len(input) - for end := start + 1; end <= max; end++ { - candidate = input[start:end] - // fmt.Fprintf(os.Stderr, " %s %d:%d/%d\n", filename, start, end, max) - _ = runMarkdownReference(candidate, flag) - } - } - } -} - func TestReference(t *testing.T) { + t.Parallel() files := []string{ "Amps and angle encoding", "Auto links", @@ -100,6 +49,7 @@ func TestReference(t *testing.T) { } func TestReference_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { + t.Parallel() files := []string{ "Amps and angle encoding", "Auto links", @@ -124,5 +74,53 @@ func TestReference_EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK(t *testing.T) { "Tabs", "Tidyness", } - doTestsReference(t, files, EXTENSION_NO_EMPTY_LINE_BEFORE_BLOCK) + doTestsReference(t, files, NoEmptyLineBeforeBlock) +} + +// benchResultAnchor is an anchor variable to store the result of a benchmarked +// code so that compiler could never optimize away the call to runMarkdown() +var benchResultAnchor string + +func BenchmarkReference(b *testing.B) { + params := TestParams{extensions: CommonExtensions} + files := []string{ + "Amps and angle encoding", + "Auto links", + "Backslash escapes", + "Blockquotes with code blocks", + "Code Blocks", + "Code Spans", + "Hard-wrapped paragraphs with list-like lines", + "Horizontal rules", + "Inline HTML (Advanced)", + "Inline HTML (Simple)", + "Inline HTML comments", + "Links, inline style", + "Links, reference style", + "Links, shortcut references", + "Literal quotes in titles", + "Markdown Documentation - Basics", + "Markdown Documentation - Syntax", + "Nested blockquotes", + "Ordered and unordered lists", + "Strong and em together", + "Tabs", + "Tidyness", + } + var tests []string + for _, basename := range files { + filename := filepath.Join("testdata", basename+".text") + inputBytes, err := ioutil.ReadFile(filename) + if err != nil { + b.Errorf("Couldn't open '%s', error: %v\n", filename, err) + continue + } + tests = append(tests, string(inputBytes)) + } + b.ResetTimer() + for n := 0; n < b.N; n++ { + for _, test := range tests { + benchResultAnchor = runMarkdown(test, params) + } + } } diff --git a/smartypants.go b/smartypants.go index eeffa5e1..3a220e94 100644 --- a/smartypants.go +++ b/smartypants.go @@ -17,11 +17,14 @@ package blackfriday import ( "bytes" + "io" ) -type smartypantsData struct { +// SPRenderer is a struct containing state of a Smartypants renderer. +type SPRenderer struct { inSingleQuote bool inDoubleQuote bool + callbacks [256]smartCallback } func wordBoundary(c byte) bool { @@ -39,7 +42,7 @@ func isdigit(c byte) bool { return c >= '0' && c <= '9' } -func smartQuoteHelper(out *bytes.Buffer, previousChar byte, nextChar byte, quote byte, isOpen *bool) bool { +func smartQuoteHelper(out *bytes.Buffer, previousChar byte, nextChar byte, quote byte, isOpen *bool, addNBSP bool) bool { // edge of the buffer is likely to be a tag that we don't get to see, // so we treat it like text sometimes @@ -96,6 +99,12 @@ func smartQuoteHelper(out *bytes.Buffer, previousChar byte, nextChar byte, quote *isOpen = false } + // Note that with the limited lookahead, this non-breaking + // space will also be appended to single double quotes. + if addNBSP && !*isOpen { + out.WriteString(" ") + } + out.WriteByte('&') if *isOpen { out.WriteByte('l') @@ -104,10 +113,15 @@ func smartQuoteHelper(out *bytes.Buffer, previousChar byte, nextChar byte, quote } out.WriteByte(quote) out.WriteString("quo;") + + if addNBSP && *isOpen { + out.WriteString(" ") + } + return true } -func smartSingleQuote(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartSingleQuote(out *bytes.Buffer, previousChar byte, text []byte) int { if len(text) >= 2 { t1 := tolower(text[1]) @@ -116,7 +130,7 @@ func smartSingleQuote(out *bytes.Buffer, smrt *smartypantsData, previousChar byt if len(text) >= 3 { nextChar = text[2] } - if smartQuoteHelper(out, previousChar, nextChar, 'd', &smrt.inDoubleQuote) { + if smartQuoteHelper(out, previousChar, nextChar, 'd', &r.inDoubleQuote, false) { return 1 } } @@ -141,7 +155,7 @@ func smartSingleQuote(out *bytes.Buffer, smrt *smartypantsData, previousChar byt if len(text) > 1 { nextChar = text[1] } - if smartQuoteHelper(out, previousChar, nextChar, 's', &smrt.inSingleQuote) { + if smartQuoteHelper(out, previousChar, nextChar, 's', &r.inSingleQuote, false) { return 0 } @@ -149,7 +163,7 @@ func smartSingleQuote(out *bytes.Buffer, smrt *smartypantsData, previousChar byt return 0 } -func smartParens(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartParens(out *bytes.Buffer, previousChar byte, text []byte) int { if len(text) >= 3 { t1 := tolower(text[1]) t2 := tolower(text[2]) @@ -174,7 +188,7 @@ func smartParens(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, te return 0 } -func smartDash(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartDash(out *bytes.Buffer, previousChar byte, text []byte) int { if len(text) >= 2 { if text[1] == '-' { out.WriteString("—") @@ -191,7 +205,7 @@ func smartDash(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text return 0 } -func smartDashLatex(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartDashLatex(out *bytes.Buffer, previousChar byte, text []byte) int { if len(text) >= 3 && text[1] == '-' && text[2] == '-' { out.WriteString("—") return 2 @@ -205,13 +219,13 @@ func smartDashLatex(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, return 0 } -func smartAmpVariant(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte, quote byte) int { +func (r *SPRenderer) smartAmpVariant(out *bytes.Buffer, previousChar byte, text []byte, quote byte, addNBSP bool) int { if bytes.HasPrefix(text, []byte(""")) { nextChar := byte(0) if len(text) >= 7 { nextChar = text[6] } - if smartQuoteHelper(out, previousChar, nextChar, quote, &smrt.inDoubleQuote) { + if smartQuoteHelper(out, previousChar, nextChar, quote, &r.inDoubleQuote, addNBSP) { return 5 } } @@ -224,15 +238,18 @@ func smartAmpVariant(out *bytes.Buffer, smrt *smartypantsData, previousChar byte return 0 } -func smartAmp(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { - return smartAmpVariant(out, smrt, previousChar, text, 'd') -} +func (r *SPRenderer) smartAmp(angledQuotes, addNBSP bool) func(*bytes.Buffer, byte, []byte) int { + var quote byte = 'd' + if angledQuotes { + quote = 'a' + } -func smartAmpAngledQuote(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { - return smartAmpVariant(out, smrt, previousChar, text, 'a') + return func(out *bytes.Buffer, previousChar byte, text []byte) int { + return r.smartAmpVariant(out, previousChar, text, quote, addNBSP) + } } -func smartPeriod(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartPeriod(out *bytes.Buffer, previousChar byte, text []byte) int { if len(text) >= 3 && text[1] == '.' && text[2] == '.' { out.WriteString("…") return 2 @@ -247,13 +264,13 @@ func smartPeriod(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, te return 0 } -func smartBacktick(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartBacktick(out *bytes.Buffer, previousChar byte, text []byte) int { if len(text) >= 2 && text[1] == '`' { nextChar := byte(0) if len(text) >= 3 { nextChar = text[2] } - if smartQuoteHelper(out, previousChar, nextChar, 'd', &smrt.inDoubleQuote) { + if smartQuoteHelper(out, previousChar, nextChar, 'd', &r.inDoubleQuote, false) { return 1 } } @@ -262,7 +279,7 @@ func smartBacktick(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, return 0 } -func smartNumberGeneric(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartNumberGeneric(out *bytes.Buffer, previousChar byte, text []byte) int { if wordBoundary(previousChar) && previousChar != '/' && len(text) >= 3 { // is it of the form digits/digits(word boundary)?, i.e., \d+/\d+\b // note: check for regular slash (/) or fraction slash (⁄, 0x2044, or 0xe2 81 84 in utf-8) @@ -304,7 +321,7 @@ func smartNumberGeneric(out *bytes.Buffer, smrt *smartypantsData, previousChar b return 0 } -func smartNumber(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartNumber(out *bytes.Buffer, previousChar byte, text []byte) int { if wordBoundary(previousChar) && previousChar != '/' && len(text) >= 3 { if text[0] == '1' && text[1] == '/' && text[2] == '2' { if len(text) < 4 || wordBoundary(text[3]) && text[3] != '/' { @@ -332,27 +349,27 @@ func smartNumber(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, te return 0 } -func smartDoubleQuoteVariant(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte, quote byte) int { +func (r *SPRenderer) smartDoubleQuoteVariant(out *bytes.Buffer, previousChar byte, text []byte, quote byte) int { nextChar := byte(0) if len(text) > 1 { nextChar = text[1] } - if !smartQuoteHelper(out, previousChar, nextChar, quote, &smrt.inDoubleQuote) { + if !smartQuoteHelper(out, previousChar, nextChar, quote, &r.inDoubleQuote, false) { out.WriteString(""") } return 0 } -func smartDoubleQuote(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { - return smartDoubleQuoteVariant(out, smrt, previousChar, text, 'd') +func (r *SPRenderer) smartDoubleQuote(out *bytes.Buffer, previousChar byte, text []byte) int { + return r.smartDoubleQuoteVariant(out, previousChar, text, 'd') } -func smartAngledDoubleQuote(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { - return smartDoubleQuoteVariant(out, smrt, previousChar, text, 'a') +func (r *SPRenderer) smartAngledDoubleQuote(out *bytes.Buffer, previousChar byte, text []byte) int { + return r.smartDoubleQuoteVariant(out, previousChar, text, 'a') } -func smartLeftAngle(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int { +func (r *SPRenderer) smartLeftAngle(out *bytes.Buffer, previousChar byte, text []byte) int { i := 0 for i < len(text) && text[i] != '>' { @@ -363,38 +380,78 @@ func smartLeftAngle(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, return i } -type smartCallback func(out *bytes.Buffer, smrt *smartypantsData, previousChar byte, text []byte) int +type smartCallback func(out *bytes.Buffer, previousChar byte, text []byte) int + +// NewSmartypantsRenderer constructs a Smartypants renderer object. +func NewSmartypantsRenderer(flags HTMLFlags) *SPRenderer { + var ( + r SPRenderer -type smartypantsRenderer [256]smartCallback + smartAmpAngled = r.smartAmp(true, false) + smartAmpAngledNBSP = r.smartAmp(true, true) + smartAmpRegular = r.smartAmp(false, false) + smartAmpRegularNBSP = r.smartAmp(false, true) -func smartypants(flags int) *smartypantsRenderer { - r := new(smartypantsRenderer) - if flags&HTML_SMARTYPANTS_ANGLED_QUOTES == 0 { - r['"'] = smartDoubleQuote - r['&'] = smartAmp + addNBSP = flags&SmartypantsQuotesNBSP != 0 + ) + + if flags&SmartypantsAngledQuotes == 0 { + r.callbacks['"'] = r.smartDoubleQuote + if !addNBSP { + r.callbacks['&'] = smartAmpRegular + } else { + r.callbacks['&'] = smartAmpRegularNBSP + } } else { - r['"'] = smartAngledDoubleQuote - r['&'] = smartAmpAngledQuote + r.callbacks['"'] = r.smartAngledDoubleQuote + if !addNBSP { + r.callbacks['&'] = smartAmpAngled + } else { + r.callbacks['&'] = smartAmpAngledNBSP + } } - r['\''] = smartSingleQuote - r['('] = smartParens - if flags&HTML_SMARTYPANTS_DASHES != 0 { - if flags&HTML_SMARTYPANTS_LATEX_DASHES == 0 { - r['-'] = smartDash + r.callbacks['\''] = r.smartSingleQuote + r.callbacks['('] = r.smartParens + if flags&SmartypantsDashes != 0 { + if flags&SmartypantsLatexDashes == 0 { + r.callbacks['-'] = r.smartDash } else { - r['-'] = smartDashLatex + r.callbacks['-'] = r.smartDashLatex } } - r['.'] = smartPeriod - if flags&HTML_SMARTYPANTS_FRACTIONS == 0 { - r['1'] = smartNumber - r['3'] = smartNumber + r.callbacks['.'] = r.smartPeriod + if flags&SmartypantsFractions == 0 { + r.callbacks['1'] = r.smartNumber + r.callbacks['3'] = r.smartNumber } else { for ch := '1'; ch <= '9'; ch++ { - r[ch] = smartNumberGeneric + r.callbacks[ch] = r.smartNumberGeneric + } + } + r.callbacks['<'] = r.smartLeftAngle + r.callbacks['`'] = r.smartBacktick + return &r +} + +// Process is the entry point of the Smartypants renderer. +func (r *SPRenderer) Process(w io.Writer, text []byte) { + mark := 0 + for i := 0; i < len(text); i++ { + if action := r.callbacks[text[i]]; action != nil { + if i > mark { + w.Write(text[mark:i]) + } + previousChar := byte(0) + if i > 0 { + previousChar = text[i-1] + } + var tmp bytes.Buffer + i += action(&tmp, previousChar, text[i:]) + w.Write(tmp.Bytes()) + mark = i + 1 } } - r['<'] = smartLeftAngle - r['`'] = smartBacktick - return r + if mark < len(text) { + w.Write(text[mark:]) + } } diff --git a/testdata/Inline HTML (Simple).html b/testdata/Inline HTML (Simple).html index 6bf78f8f..f84939c3 100644 --- a/testdata/Inline HTML (Simple).html +++ b/testdata/Inline HTML (Simple).html @@ -1,13 +1,13 @@

    Here's a simple block:

    - foo + foo

    This should be a code block, though:

    <div>
    -    foo
    +	foo
     </div>
     
    @@ -19,11 +19,11 @@

    Now, nested:

    -
    -
    - foo -
    -
    +
    +
    + foo +
    +

    This should just be an HTML comment:

    diff --git a/testdata/Inline HTML comments.html b/testdata/Inline HTML comments.html index 3f167a16..f201242d 100644 --- a/testdata/Inline HTML comments.html +++ b/testdata/Inline HTML comments.html @@ -3,7 +3,7 @@

    Paragraph two.

    diff --git a/testdata/Links, inline style.html b/testdata/Links, inline style.html index 5802f2de..a6112607 100644 --- a/testdata/Links, inline style.html +++ b/testdata/Links, inline style.html @@ -8,4 +8,6 @@

    URL and title.

    +

    URL with backslash\.

    +

    [Empty]().

    diff --git a/testdata/Links, inline style.text b/testdata/Links, inline style.text index 09017a90..6de7ab4c 100644 --- a/testdata/Links, inline style.text +++ b/testdata/Links, inline style.text @@ -8,5 +8,6 @@ Just a [URL](/url/). [URL and title](/url/ "title has spaces afterward" ). +[URL with backslash\\](/url/). [Empty](). diff --git a/testdata/Markdown Documentation - Basics.html b/testdata/Markdown Documentation - Basics.html index ea3a61c3..7aafd550 100644 --- a/testdata/Markdown Documentation - Basics.html +++ b/testdata/Markdown Documentation - Basics.html @@ -244,6 +244,18 @@

    Links

    <a href="http://www.nytimes.com/">The New York Times</a>.</p> +

    It is also common to find other protocols such as ftp used for links:

    + +

    Input:

    + +
    For example one may test download speeds [here](ftp://speedtest.tele2.net/)
    +
    + +

    Output:

    + +
    <p>For example one may test download speeds <a href="ftp://speedtest.tele2.net/">here</a>.</p>
    +
    +

    Images

    Image syntax is very much like link syntax.

    diff --git a/testdata/Markdown Documentation - Basics.text b/testdata/Markdown Documentation - Basics.text index 486055ca..9ac8a56c 100644 --- a/testdata/Markdown Documentation - Basics.text +++ b/testdata/Markdown Documentation - Basics.text @@ -239,6 +239,15 @@ Output:

    I start my morning with a cup of coffee and The New York Times.

    +It is also common to find other protocols such as ftp used for links: + +Input: + + For example one may test download speeds [here](ftp://speedtest.tele2.net/) + +Output: + +

    For example one may test download speeds here.

    ### Images ### diff --git a/testdata/Markdown Documentation - Syntax.html b/testdata/Markdown Documentation - Syntax.html index 61dde593..6cd05fb9 100644 --- a/testdata/Markdown Documentation - Syntax.html +++ b/testdata/Markdown Documentation - Syntax.html @@ -939,8 +939,8 @@

    Backslash Escapes

    [] square brackets () parentheses # hash mark -+ plus sign -- minus sign (hyphen) ++ plus sign +- minus sign (hyphen) . dot ! exclamation mark diff --git a/testdata/Tabs.html b/testdata/Tabs.html index 64006d96..509b41c6 100644 --- a/testdata/Tabs.html +++ b/testdata/Tabs.html @@ -13,13 +13,13 @@

    And:

    -
        this code block is indented by two tabs
    +
    	this code block is indented by two tabs
     

    And:

    -
    +   this is an example list item
    -    indented with tabs
    +
    +	this is an example list item
    +	indented with tabs
     
     +   this is an example list item
         indented with spaces