Reference Style Links are Converted (and lost) #34

tajmone · 2016-12-19T11:08:56Z

I wanted to carry on from issue #26 (now closed).

markdownfmt converts reference-styled links into normal inline links. Eg:

[link text][link id]  
[link id]: http://www.somelink.com      "my ref-link title

becomes:

[link text](http://www.somelink.com "my ref-link title")

I think this is quite an issue. A tidy-up tool shouldn't convert these links and remove the reference labels — tidy-markdown doesn't.

I would like to motivate this issue through real-use scenarios examples.

It is common practice when working with markdown documentation to store all links at the end of a document, in reference-link sections. This means that when a link changes, one only has to go at the bottom of all documents and change the links section and all links within the document will be «updated» (so to speak).

To give you an example: using RawGit to provide html live previews — the proper way to use RawGit for production links is to use the cdn.rawgit URL, which caches permanently the preview. So, usually links point to specific commits of the file, to prevent permanent caching of the master branch version. As files change in time, so do their RawGit preview URLs:

https://cdn.rawgit.com/username/repo/baf2857/somepath/index.html

after a new commit (hash: xyz1938) might become:

https://cdn.rawgit.com/username/repo/xyz1938/somepath/index.html

And so on with all links that use RawGit — the commit part of the URLs will have to change to accomodate the new commit hash.

And instead of going mad to search-and-replace all URLs inside the actual links, the quick way is to change it just in the links-section at the end of each document in the repo — something often automated with scripts in large documentation projects.

Basically, markdown makes life a lot easier with reference-styled links — I personally store some snippets files with all the links used in a project, carry out all changes to links in the snippet files, and then just copy-and-paste the whole links-section over the old one in all docs (yep: ALL DOCS, because unused link-ids are harmless, they just don't show up in the final preview of the document!).

In bigger projects, I use a pandoc toolchain, and invoke pandoc to always add/include the reference-links section file when processing sources — so the actual link-section is stored in a single file, used by all documents for all external links. This is a cool way to handle massive quantities of links. And it also allows to have different link-sections for different versions of a project (eg for a repo which is both on GitHub and Bitbucket) without having duplicate documents.

But Why?

What I haven't yet figure out is why blackfriday converts these links. Usually this happens in parsers which built an AST from the source (like pandoc), and in the post-processing phase they replace all references with their actual values. But I haven't yet worked my way into the source code enough to understand how it all works under the hood.

If this was the case, it might not be so easy to change this behavior. Any ideas about it?

But definitely, a clean-up tool shouldn't remove elements from the doc.

The text was updated successfully, but these errors were encountered:

dmitshur · 2016-12-23T18:00:41Z

Hi @tajmone, thank you for writing up this detailed issue. I will reply to various parts of your comment.

I think this is quite an issue. A tidy-up tool shouldn't convert these links and remove the reference labels — tidy-markdown doesn't.

But definitely, a clean-up tool shouldn't remove elements from the doc.

First thing to keep in mind about markdownfmt is that it's not simple a clean-up tool. Its description reads:

Like gofmt, but for Markdown.

I should elaborate, but what I mean by that is... Like gofmt, it's meant to be opinionated, and it tries to unify multiple equivalent ways of doing something into one canonical format. For example, it always uses - item for bullet points (instead of * item), --- for headers (instead of ***** or many other valid combinations), and **bold for making text bold (instead of __bold__).

When I first created markdownfmt, I really wanted it to be something everyone can run on most Markdown files, just like people run gofmt on all .go files. While not everyone likes the exact formatting decisions gofmt makes, people do like the overall consistency of having one unified tool.

Later on, I realized that Markdown is too vague of a "specification" and has too many variations and inconsistencies for this to be possible, so I know it's not an attainable goal. As I said in shurcooL-legacy/atom-markdown-format#26 (comment), If I really wanted to achieve that today, I'd begin by creating a standalone spec first.

However, this is just background information. Let me get into this specific issue.

I would like to motivate this issue through real-use scenarios examples.

Thanks for doing that, it's very helpful.

It is common practice when working with markdown documentation to store all links at the end of a document, in reference-link sections. This means that when a link changes, one only has to go at the bottom of all documents and change the links section and all links within the document will be «updated» (so to speak).

To give you an example: using RawGit to provide html live previews — the proper way to use RawGit for production links is to use the cdn.rawgit URL, which caches permanently the preview. So, usually links point to specific commits of the file, to prevent permanent caching of the master branch version. As files change in time, so do their RawGit preview URLs:
https://cdn.rawgit.com/username/repo/baf2857/somepath/index.html
after a new commit (hash: xyz1938) might become:
https://cdn.rawgit.com/username/repo/xyz1938/somepath/index.html
And so on with all links that use RawGit — the commit part of the URLs will have to change to accomodate the new commit hash.

And instead of going mad to search-and-replace all URLs inside the actual links, the quick way is to change it just in the links-section at the end of each document in the repo — something often automated with scripts in large documentation projects.

Basically, markdown makes life a lot easier with reference-styled links — I personally store some snippets files with all the links used in a project, carry out all changes to links in the snippet files, and then just copy-and-paste the whole links-section over the old one in all docs (yep: ALL DOCS, because unused link-ids are harmless, they just don't show up in the final preview of the document!).

In bigger projects, I use a pandoc toolchain, and invoke pandoc to always add/include the reference-links section file when processing sources — so the actual link-section is stored in a single file, used by all documents for all external links. This is a cool way to handle massive quantities of links. And it also allows to have different link-sections for different versions of a project (eg for a repo which is both on GitHub and Bitbucket) without having duplicate documents.

Thank you for providing that rationale. It's convincing, and I understand why it would be preferable to maintain the two type of links separately instead of merging them into the same inline kind. I agree. 👍

Now let's discuss the potential implementation side of this.

What I haven't yet figure out is why blackfriday converts these links. Usually this happens in parsers which built an AST from the source (like pandoc), and in the post-processing phase they replace all references with their actual values. But I haven't yet worked my way into the source code enough to understand how it all works under the hood.

If this was the case, it might not be so easy to change this behavior. Any ideas about it?

The current blackfriday implementation is very old, and has certain limitations that are unavoidable. One of them is that it does not preserve the original markdown document layout, and it's a very lossy parsing. It only emits enough information to produce HTML for the rendered Markdown.

So, it's actually extremely hard to implement this with that blackfriday implementation, even if I really wanted to. You can also see #6 (comment) for a detailed explanation.

However, there is a version 2 of blackfriday which resolves this limitation. If markdownfmt were to be rewritten to use blackfriday v2, then this becomes a possibility.

tajmone · 2016-12-24T00:33:36Z

@shurcooL , thanks for the great reply.

I personally don't see any issue wheather markdownfmt uses *, - or + for bullet lists -- as long as it enforces some kind of accepted markdown standard it serves to avoid diffing nightmares and false commits. Aesthetics aside, markdown is intended as an intermediate format, so what counts is what you get when you convert it to some other format.

I wasn't aware of this blackfriday limitations, nor that a version 2 is available. Thanks for letting me know, and for having accepted this proposal as an enhancement.

Wish I could help, but right now my knowledge in Go is too weak ... and I already juggling with a few langs I'm learning.

B.R.

Tristano

hartzell · 2018-02-16T23:39:41Z

@shurcooL, first thanks for sharing this tool. I'd also like to see this behavior change. Is there an active effort to move to v2 of blackfriday?

hartzell · 2018-02-18T00:54:24Z

@shurcooL , @jsternberg

I'm not a fan of markdownfmt inlining reference links. I use them when I need use a link in multiple places in the document and I want to avoid repeating myself and creating an opportunity to update some links but not others.

@JStenberg has a fork of markdownfmt that uses blackdown.v2, which the earlier comments in this thread suggested has sufficient hooks to enable rendering reference links.

I've been reading through the blackfriday source and running some dumb tests w/ spew.Dump() and I'm afraid that blackfriday is still discarding what we'd need to preserve reference links.

Here is the bit where they're parsed. Reference links ([e.g.][this] seem to be collapsed into the same LinkData struct as the other link types.

Heavy handed use of spew.Dump() in the the blackfriday HTML renderer supports this conclusion (e.g. around line 547 of html.go):

	case Link:
		// mark it but don't link it if it is not a safe link: no smartypants
		spew.Dump(node.LinkData)
		dest := node.LinkData.Destination
		if needSkipLink(r.Flags, dest) {

I've tentatively concluded that to get the behavior I want I'd need to convince blackfriday to use two different types of Nodes and the modify the renders. Does that match your understanding?

dmitshur · 2018-03-02T19:31:59Z

Is there an active effort to move to v2 of blackfriday?

See issue #39 and PR #40 for the latest updates on that.

@JStenberg has a fork of markdownfmt that uses blackdown.v2, which the earlier comments in this thread suggested has sufficient hooks to enable rendering reference links.

I've been reading through the blackfriday source and running some dumb tests w/ spew.Dump() and I'm afraid that blackfriday is still discarding what we'd need to preserve reference links. [...] Does that match your understanding?

It was my understanding that blackfriday v2 would preserve the information about whether the link was inline or reference-style, as preserving more information about the parsed markdown document was a part of the design goal of v2. I hadn't actually confirmed whether that's the case for this particular case.

/cc @rtfb Do you know if v2 lets one be able to tell whether a link was inline or reference-style?

jsternberg · 2018-03-02T19:34:44Z

I don't know about the inline vs reference-style, but v2 of blackfriday isn't able to tell you whether a link is an auto link or not. So you can't tell the difference between these: [http://example.com](http://example.com) and http://example.com. I just opted to make it always use an auto link because I figured markdownfmt was going to make decisions on how to format a document anyway.

hartzell · 2018-03-03T21:57:02Z

I'm 99.9% sure that blackfriday.v2 is discarding the information about the type of link used and collapsing it into the same node for either case.

Here is the code that's doing the work.

The fastest hack to peek at what was happening was to add a call to spew.Dump() in the the blackfriday HTML renderer like so (e.g. around line 547 of html.go):

	case Link:
		// mark it but don't link it if it is not a safe link: no smartypants
		spew.Dump(node.LinkData)
		dest := node.LinkData.Destination
		if needSkipLink(r.Flags, dest) {

and then run a little test program.

package main

import (
	"fmt"

	"gopkg.in/russross/blackfriday.v2"
)

func main() {
	input := []byte(`
- This is [reference][moose]
- This is [inline](http://www.georgehartzell.com)

[moose]: http://www.georgehartzell.com
`)
	output := blackfriday.Run(input)
	fmt.Println(string(output))
}

It generates the following output (dump is called twice, explanation here):

(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) (cap=47) {
 },
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) (cap=47) {
 },
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) <nil>,
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
(blackfriday.LinkData) {
 Destination: ([]uint8) (len=29 cap=64) {
  00000000  68 74 74 70 3a 2f 2f 77  77 77 2e 67 65 6f 72 67  |http://www.georg|
  00000010  65 68 61 72 74 7a 65 6c  6c 2e 63 6f 6d           |ehartzell.com|
 },
 Title: ([]uint8) <nil>,
 NoteID: (int) 0,
 Footnote: (*blackfriday.Node)(<nil>)
}
<ul>
<li>This is <a href="http://www.georgehartzell.com">reference</a></li>
<li>This is <a href="http://www.georgehartzell.com">inline</a></li>
</ul>

rtfb · 2018-03-04T09:58:03Z

Thanks for digging into this, @hartzell! You're right, currently Blackfriday does not record this information. Here's a tracking issue for that: russross/blackfriday#369 (not exactly the same issue, but close enough to clump together).

dmitshur added the enhancement label Dec 23, 2016

dmitshur mentioned this issue Dec 23, 2016

SmartyPants-Unicode handling of quotes, dashes, and ellipses #33

Open

dmitshur mentioned this issue Jun 25, 2018

markdownfmt makes annoying and completely unnecessary changes. #42

Open

DavyJ0nes mentioned this issue Nov 15, 2023

feat: switch markdown formatter einride/sage#486

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reference Style Links are Converted (and lost) #34

Reference Style Links are Converted (and lost) #34

tajmone commented Dec 19, 2016

dmitshur commented Dec 23, 2016

tajmone commented Dec 24, 2016

hartzell commented Feb 16, 2018

hartzell commented Feb 18, 2018

dmitshur commented Mar 2, 2018 •

edited

Loading

jsternberg commented Mar 2, 2018

hartzell commented Mar 3, 2018

rtfb commented Mar 4, 2018

Reference Style Links are Converted (and lost) #34

Reference Style Links are Converted (and lost) #34

Comments

tajmone commented Dec 19, 2016

But Why?

dmitshur commented Dec 23, 2016

tajmone commented Dec 24, 2016

hartzell commented Feb 16, 2018

hartzell commented Feb 18, 2018

dmitshur commented Mar 2, 2018 • edited Loading

jsternberg commented Mar 2, 2018

hartzell commented Mar 3, 2018

rtfb commented Mar 4, 2018

dmitshur commented Mar 2, 2018 •

edited

Loading