MD-LD

Markdown-Linked Data (MD-LD) — a deterministic, streaming-friendly RDF authoring format that extends Markdown with explicit {...} annotations.

Demo | Repository

🚀 Quick Start

```bash
pnpm add mdld-parse
```

```javascript
import { parse } from 'mdld-parse';

const result = parse(`
[ex] <http://example.org/>

# Document {=ex:doc .ex:Article label}

[Alice] {?ex:author =ex:alice .prov:Person ex:firstName label}
[Smith] {ex:lastName}`);

console.log(result.quads);
// An array of RDF/JS quads, ready for n3.js, rdflib.js, etc.
// Serialized as Turtle, the graph is:
//
// @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
// @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
// @prefix prov: <http://www.w3.org/ns/prov#>.
// @prefix ex: <http://example.org/>.
//
// ex:doc a ex:Article;
//     rdfs:label "Document";
//     ex:author ex:alice.
// ex:alice a prov:Person;
//     rdfs:label "Alice";
//     ex:firstName "Alice";
//     ex:lastName "Smith".
```

📚 Documentation Hub

📖 Documentation - Complete documentation with guides and references
🎯 Examples - Real-world MD-LD examples and use cases
📋 Specification - Formal specification and test suite

✨ Core Features

🔗 Prefix folding - Build hierarchical namespaces with lightweight IRI authoring
📍 Subject declarations - {=IRI} and {=#fragment} for context setting
🎯 Object IRIs - {+IRI} and {+#fragment} for temporary object declarations
🔄 Three predicate forms - p (S→L), ?p (S→O), !p (O→S)
🏷️ Type declarations - .Class for rdf:type triples
📅 Datatypes & language - ^^xsd:date and @en support
🧩 Fragments - Document structuring with {=#fragment}
⚡ Polarity system - Diff authoring with + and - prefixes
📍 Origin tracking - Complete provenance with lean quad-to-source mapping
🎯 Elevated statements - Automatic rdf:Statement pattern detection that extracts the described triples into a separate array

🌟 What is MD-LD?

MD-LD allows you to author RDF graphs directly in Markdown using explicit {...} annotations:

```markdown
[my] <tag:alice@example.com,2026:>
# 2024-07-18 {=my:journal-2024-07-18 .my:Event my:date ^^xsd:date}
## A good day {label}
Mood: [Happy] {my:mood}
Energy level: [8] {my:energyLevel ^^xsd:integer}
Met [Sam] {+my:sam .my:Person ?my:attendee} on my regular walk at [Central Park] {+my:central-park ?my:location .my:Place label @en} and talked about [Sunny] {my:weather} weather.
```

This generates valid RDF triples, each traceable back to its source through origin tracking.

⚡ Performance & Architecture

MD-LD v0.10.0 uses a character-based tokenization system:

20-28% faster parsing than regex-based approaches
Memory-efficient: ~640 bytes retained per quad
Streaming-friendly: single pass, linear O(n) time
Character-based detection replaces complex regex patterns
Unified tokenizer architecture in src/tokenizers.js

Tokenizer Architecture

The parser uses specialized character-based tokenizers:

```javascript
// Block-level tokenizers
detectFence()             // ```code blocks
detectPrefix()            // [prefix] <iri>
detectHeading()           // # Headings
detectList()              // - List items
detectBlockquote()        // > Blockquotes
detectStandaloneSubject() // {=subject}

// Inline carrier scanner
scanInlineCarriers()      // [text], **bold**, `code`, <URL>
```

This design provides:

Better maintainability - Easier to debug and extend
Improved error handling - More precise edge-case detection
Cleaner code structure - No complex regex patterns
Full backward compatibility - All 127 tests passing
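The character-based detection listed under Tokenizer Architecture can be illustrated with a minimal sketch. The functions `detectHeadingSketch` and `detectPrefixSketch` below are hypothetical stand-ins written for this README, not the code in src/tokenizers.js; they only show how a line can be classified by walking characters instead of running a regex.

```javascript
// Hypothetical sketch, not the library's detectHeading(): returns
// { level, text } for an ATX heading such as "## Title", else null.
function detectHeadingSketch(line) {
  let i = 0;
  while (i < line.length && line[i] === '#') i++;
  // A heading needs 1-6 '#' characters followed by a space or end of line
  if (i === 0 || i > 6) return null;
  if (i < line.length && line[i] !== ' ') return null;
  return { level: i, text: line.slice(i).trim() };
}

// Hypothetical sketch, not the library's detectPrefix(): returns
// { prefix, iri } for a line such as "[ex] <http://example.org/>", else null.
function detectPrefixSketch(line) {
  if (line[0] !== '[') return null;
  const close = line.indexOf(']');
  if (close <= 1) return null; // empty or missing prefix name
  let i = close + 1;
  while (line[i] === ' ') i++; // skip spaces between "]" and "<"
  if (line[i] !== '<') return null; // e.g. "[Alice] {ex:author}" is a carrier
  const end = line.indexOf('>', i);
  if (end === -1 || line.slice(end + 1).trim() !== '') return null;
  return { prefix: line.slice(1, close), iri: line.slice(i + 1, end) };
}
```

Each check looks at a single character position, so a line is classified in one left-to-right pass.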

📦 Installation

Node.js

```bash
pnpm add mdld-parse
```

```javascript
import { parse } from 'mdld-parse';

// Prefixed names are used so the terms resolve under the ex: namespace
const markdown = `# Document {=ex:doc .ex:Article}
[Alice] {ex:author}`;

const result = parse({
  text: markdown,
  context: { ex: 'http://example.org/' }
});

console.log(result.quads);
// RDF/JS quads ready for n3.js, rdflib.js, etc.
```

Browser (ES Modules)

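```html
<script type="module">
  import { parse } from 'https://cdn.jsdelivr.net/npm/mdld-parse/+esm';

  // ex: is not in the default context, so declare it in the document
  const result = parse('[ex] <http://example.org/>\n# Hello {=ex:hello label}');
  console.log(result.quads);
</script>
```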

🧠 Semantic Model

MD-LD encodes a directed labeled multigraph where three nodes may be in scope:

S — current subject (IRI)
O — object resource (IRI from link/image)
L — literal value (string + optional datatype/language)

Predicate Routing (§8.1)

Each predicate form determines the graph edge:

| Form | Edge | Example | Meaning |
|------|------|---------|---------|
| `p` | S → L | `[Alice] {name}` | literal property |
| `?p` | S → O | `[NASA] {=ex:nasa ?org}` | object property |
| `!p` | O → S | `[Parent] {=ex:p !hasPart}` | reverse object property |
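The routing rule in this table can be expressed as a small function. This is a hypothetical sketch (`routePredicate` is not part of mdld-parse) that uses plain strings for S, O, and L and ignores IRI expansion; it only shows which node ends up on each side of the edge.

```javascript
// Hypothetical sketch of predicate routing; terms are plain strings.
// scope.S = current subject, scope.O = object resource, scope.L = literal value
function routePredicate(token, scope) {
  const { S, O, L } = scope;
  if (token.startsWith('?')) {
    return { subject: S, predicate: token.slice(1), object: O }; // S → O
  }
  if (token.startsWith('!')) {
    return { subject: O, predicate: token.slice(1), object: S }; // O → S (reverse)
  }
  return { subject: S, predicate: token, object: L }; // S → L (literal)
}
```

For example, with S = `ex:doc` and O = `ex:part`, the token `!ex:hasPart` yields `ex:part ex:hasPart ex:doc`: the edge points from the object resource back to the current subject.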

📍 Elevated Statements

MD-LD automatically detects rdf:Statement patterns during parsing and extracts elevated SPO quads for convenient consumption by applications.

Pattern Detection

When the parser encounters a complete rdf:Statement pattern with rdf:subject, rdf:predicate, and rdf:object, it automatically adds the corresponding SPO quad to the statements array:

```markdown
[ex] <http://example.org/>

## Elevated statement {=ex:stmt1 .rdf:Statement}
**Alice** {+ex:alice ?rdf:subject} *knows* {+ex:knows ?rdf:predicate} **Bob** {+ex:bob ?rdf:object}

Direct statement: **Alice** {=ex:alice} knows **Bob** {?ex:knows +ex:bob}
```
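The detection described above can be sketched as a pass that groups quads by statement node and emits one triple per complete pattern. `elevateStatements` is a hypothetical helper, not the library's implementation; it assumes plain `{ subject, predicate, object }` objects holding full IRIs, whereas the parser works on RDF/JS terms.

```javascript
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// Hypothetical sketch: find rdf:Statement nodes and emit the triple they describe.
function elevateStatements(quads) {
  const parts = new Map(); // statement IRI -> { isStatement, s, p, o }
  for (const q of quads) {
    const entry = parts.get(q.subject) ?? {};
    if (q.predicate === RDF + 'type' && q.object === RDF + 'Statement') entry.isStatement = true;
    else if (q.predicate === RDF + 'subject') entry.s = q.object;
    else if (q.predicate === RDF + 'predicate') entry.p = q.object;
    else if (q.predicate === RDF + 'object') entry.o = q.object;
    parts.set(q.subject, entry);
  }
  const elevated = [];
  for (const { isStatement, s, p, o } of parts.values()) {
    // Only complete patterns (type, subject, predicate, object) are elevated
    if (isStatement && s && p && o) elevated.push({ subject: s, predicate: p, object: o });
  }
  return elevated;
}
```

Applied to the `ex:stmt1` example, this yields a single triple, `ex:alice ex:knows ex:bob`, the same triple the "Direct statement" line asserts.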

🎨 Syntax Quick Reference

Subject Declaration

Set the current subject (emits no quads):

```markdown
## Apollo 11 {=ex:apollo11}
```

Fragment Syntax

Create fragment IRIs relative to the current subject:

```markdown
# Document {=ex:document}
{=#summary}
[Content] {label}
```

Fragments replace any existing fragment and require a current subject.
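The two rules in the sentence above (replace any existing fragment; require a current subject) fit in a few lines. `resolveFragment` is a hypothetical illustration, not an exported function, and it assumes the current subject is already a full IRI.

```javascript
// Hypothetical sketch of "{=#fragment}" resolution against the current subject.
function resolveFragment(currentSubject, fragment) {
  if (!currentSubject) {
    throw new Error('{=#' + fragment + '} requires a current subject');
  }
  // Drop any existing fragment, then append the new one
  const hash = currentSubject.indexOf('#');
  const base = hash === -1 ? currentSubject : currentSubject.slice(0, hash);
  return base + '#' + fragment;
}
```

Under these rules, `{=#summary}` after `{=ex:document}` gives `http://example.org/document#summary`, and a later `{=#details}` gives `...document#details` rather than `...document#summary#details`.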

Type Declaration

Emit rdf:type triples (one per class):

```markdown
## Apollo 11 {=ex:apollo11 .ex:SpaceMission .ex:Event}
```

Literal Properties

Inline value carriers emit literal properties:

```markdown
# Mission {=ex:apollo11}
[Neil Armstrong] {ex:commander}
[1969] {ex:year ^^xsd:gYear}
[Historic mission] {ex:description @en}
```

Object Properties

Object properties relate the current subject to a resource (use the ? prefix):

```markdown
# Mission {=ex:apollo11}
[NASA] {=ex:nasa ?ex:organizer}
```

Resource Declaration

Declare resources inline with {=iri}:

```markdown
# Mission {=ex:apollo11}
[Neil Armstrong] {=ex:armstrong ?ex:commander .prov:Person}
```
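The syntax forms above are distinguished by their leading character. The hypothetical `classifyAnnotation` below sorts the tokens of one `{...}` block into those roles; it is an illustration only, not the exported `parseSemanticBlock`, and it does not model polarity prefixes or IRI expansion.

```javascript
// Hypothetical sketch: classify the tokens of one {...} annotation.
function classifyAnnotation(block) {
  const inner = block.trim().replace(/^\{/, '').replace(/\}$/, '');
  const result = { subject: null, object: null, types: [], predicates: [], datatype: null, language: null };
  for (const token of inner.split(/\s+/).filter(Boolean)) {
    if (token.startsWith('=')) result.subject = token.slice(1);        // {=IRI}: set subject
    else if (token.startsWith('+')) result.object = token.slice(1);    // {+IRI}: object resource
    else if (token.startsWith('.')) result.types.push(token.slice(1)); // .Class: rdf:type
    else if (token.startsWith('^^')) result.datatype = token.slice(2); // ^^xsd:date
    else if (token.startsWith('@')) result.language = token.slice(1);  // @en
    else if (token.startsWith('?')) result.predicates.push({ iri: token.slice(1), form: 'S→O' });
    else if (token.startsWith('!')) result.predicates.push({ iri: token.slice(1), form: 'O→S' });
    else result.predicates.push({ iri: token, form: 'S→L' });          // plain literal property
  }
  return result;
}
```

For example, `{+my:central-park ?my:location .my:Place label @en}` classifies as object `my:central-park`, type `my:Place`, an S→O predicate `my:location`, an S→L predicate `label`, and language `en`.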

🔧 API Reference

`parse({ text, context, dataFactory, graph })`

Parse MD-LD markdown and return RDF quads with lean origin tracking.

Parameters (named object):

text (string, required) — MD-LD formatted text
context (object, optional) — Prefix mappings (default: { '@vocab': 'http://www.w3.org/2000/01/rdf-schema#', rdf, rdfs, xsd, sh, prov })
dataFactory (object, optional) — Custom RDF/JS DataFactory
graph (string, optional) — Named graph IRI

Returns: { quads, remove, statements, origin, context, primarySubject, md }

quads — Array of RDF/JS Quads (final resolved graph state)
remove — Array of RDF/JS Quads (external retractions targeting prior state)
statements — Array of elevated RDF/JS Quads extracted from rdf:Statement patterns
origin — Lean origin tracking object with quadIndex for UI navigation
context — Final context used (includes prefixes)
primarySubject — String IRI or null (first non-fragment subject declaration)
md — Clean Markdown with all MD-LD annotations stripped (round-trip safe)

Legacy: the parse(text, options) signature is still supported for backward compatibility.

`merge(docs, options)`

Merge multiple MD-LD documents with diff polarity resolution.

Parameters:

docs (array) — Array of markdown strings or ParseResult objects
options (object, optional):
- context (object) — Prefix mappings (merged with DEFAULT_CONTEXT)

Returns: { quads, remove, origin, context, primarySubjects }

quads — Array of RDF/JS Quads (final resolved graph state)
remove — Array of RDF/JS Quads (external retractions targeting prior state)
origin — Merge origin tracking with document index and polarity
context — Final merged context
primarySubjects — Array of string IRIs (primary subjects from each document, in merge order)
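The exact resolution rules are not spelled out here, so the following is only a sketch under stated assumptions: documents are processed in merge order, each contributing asserted and retracted quads (the hypothetical shape `{ asserted, retracted }` with quads reduced to string keys). A retraction cancels a quad asserted earlier in the merge; a retraction with nothing to cancel is kept as an external retraction, matching the description of `remove`.

```javascript
// Hypothetical sketch of polarity resolution; not the library's merge().
function mergePolarity(docs) {
  const state = new Set();    // resolved graph state
  const external = new Set(); // retractions targeting state outside this merge
  for (const { asserted = [], retracted = [] } of docs) {
    for (const key of asserted) {
      state.add(key);
      external.delete(key); // assumption: a later assertion supersedes an earlier retraction
    }
    for (const key of retracted) {
      if (state.has(key)) state.delete(key); // cancels an assertion made earlier in the merge
      else external.add(key);                // nothing to cancel: keep as external retraction
    }
  }
  return { quads: [...state], remove: [...external] };
}
```

With three documents that assert `a` and `b`, retract `b` and `z`, then assert `c`, this sketch returns `quads` = `a, c` and `remove` = `z`.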

`generate({ quads, context, primarySubject })`

Generate deterministic MD-LD from RDF quads with visual styling.

Parameters (named object):

quads (array, required) — Array of RDF/JS Quads to convert
context (object, optional) — Prefix mappings (default: {})
primarySubject (string, optional) — String IRI to place first in output (ensures round-trip safety). If not provided, falls back to the first subject from quads.

Returns: { text, context }

Features:

Visual carrier styles based on datatype (code spans for numbers, bold booleans, etc.)
Label-in-heading: Uses rdfs:label in subject headings when available
Multiple labels: First label in heading, additional labels rendered as literals
Round-trip safe: All data preserved through parse → generate → parse
Composable: generate(parse(text)) extracts semantics; parse(generate({quads})) normalizes quads

`generateNode({ quads, focusIRI, context })`

Generate node-centric MD-LD showing all quads where a specific IRI appears in any position.

Parameters (named object):

quads (array, required) — Array of RDF/JS Quads to search
focusIRI (string, required) — The IRI to center the view on
context (object, optional) — Prefix mappings (default: {})

Returns: { text, context }

Behavior (Safety-First):

If focusIRI is null/undefined: Returns empty text
If focusIRI not in graph: Returns empty text (never falls back to all data)
If quads is empty: Returns empty text

Safety rationale: This prevents a misspelled IRI from rendering an entire database, which matters in production, where the output may be sent to an LLM and billed per token. An explicitly empty result signals "not found" to the caller.

Use case: Exploring a single node and all its relationships, i.e. every quad where it appears as subject, object, predicate, type, or datatype. The result is an exhaustive view of everything related to the focus IRI, well suited to node-centric knowledge graph explorers.
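The safety-first selection can be sketched as a filter that never widens its result. `selectNodeQuads` is hypothetical (generateNode also renders the selected quads as MD-LD, which is omitted here), and it assumes plain quad objects whose optional `datatype` field holds the literal's datatype IRI.

```javascript
// Hypothetical sketch of the quad selection behind a node-centric view.
function selectNodeQuads(quads, focusIRI) {
  // Missing focus or empty input: return nothing rather than everything
  if (!focusIRI || !Array.isArray(quads) || quads.length === 0) return [];
  // Keep every quad that mentions the focus IRI in any position;
  // rdf:type classes are covered by the object check.
  // An unknown IRI matches nothing, so the result is [] and never the whole graph.
  return quads.filter((q) =>
    q.subject === focusIRI ||
    q.predicate === focusIRI ||
    q.object === focusIRI ||
    q.datatype === focusIRI
  );
}
```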

Composition Patterns

Because the API uses named parameters, parse(), generate(), and generateNode() compose through object spreading:

```javascript
import { parse, generate, generateNode } from 'mdld-parse';

const context = { ex: 'http://example.org/' };
const text = '# Alice {=ex:alice .ex:Person}\n[Bob] {+ex:bob ?ex:knows}';
// Stand-in for quads obtained elsewhere (e.g. from an n3.js store)
const externalQuads = parse({ text, context }).quads;

// Pattern 1: parse → generate (semantic extraction)
const canonical = generate({ ...parse({ text, context }) });
// text → quads → canonical MD-LD (deterministic, visual styling applied)

// Pattern 2: generate → parse (normalize external RDF)
const normalized = parse({ ...generate({ quads: externalQuads, context }) });
// external quads → MD-LD → validated quads (DataFactory-safe, no blank nodes)

// Pattern 3: parse → generateNode (node-centric exploration)
const nodeView = generateNode({ ...parse({ text, context }), focusIRI: 'http://example.org/alice' });
// full graph → isolated node view (safe: returns empty text if the IRI is not found)
```

Why this works:

parse() returns { quads, context, primarySubject, md, ... }
generate() accepts { quads, context, primarySubject }
generateNode() accepts { quads, context, focusIRI } (focusIRI is supplied alongside the spread)
Because these shapes align, one call's result can be passed directly to the next with { ...spread }.

The `md` Field — Clean Markdown Extraction

Every parse() result includes a md field containing the original Markdown with all MD-LD annotations stripped:

```javascript
import { parse } from 'mdld-parse';

const result = parse({
  text: `# Document {=ex:doc .ex:Article}\n[Content] {ex:content}`,
  context: { ex: 'http://example.org/' }
});

console.log(result.md);
// # Document\nContent

// Round-trip safety: re-parsing clean MD produces zero quads
const reparsed = parse({ text: result.md });
console.log(reparsed.quads.length); // 0
```

Behavior:

Valid MD-LD annotations ({=...}, {+...}, {...}) are completely removed
Content from value carriers ([text], **bold**, `code`) is preserved
Invalid syntax (annotations not at end-of-line) is preserved as visible markers
Headings, lists, blockquotes, code blocks maintain their structure
Prefix declarations at start of line are stripped
Standalone subject declarations ({=ex:subject}) are stripped

Use cases:

Content extraction — Get readable Markdown without semantic markup
Syntax validation — Remaining {...} patterns indicate invalid MD-LD syntax
Round-trip testing — parse(md).md should parse to zero quads
Preview generation — Show clean document before publishing

`locate(quad, origin)`

Locate the origin entry for a quad using the lean origin system.

Parameters:

quad (object) — The quad to locate (subject, predicate, object)
origin (object) — Origin object containing quadIndex

Returns: { blockId, range, carrierType, subject, predicate, context, value, polarity } or null
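Conceptually, a quad index turns each lookup into a single map access. The sketch below is hypothetical: it assumes `quadIndex` is a `Map` keyed by a string built from the quad's terms, which may not match the library's actual key format or entry shape.

```javascript
// Accept plain strings or RDF/JS terms (objects with a .value field)
const termValue = (t) => (typeof t === 'string' ? t : t.value);

// Hypothetical key format. A real index would also distinguish IRIs from
// literals (termType, datatype, language), which this sketch does not.
function quadKey(quad) {
  return [quad.subject, quad.predicate, quad.object].map(termValue).join('\u0000');
}

// Hypothetical: origin.quadIndex is assumed to be a Map from quadKey(...) to an entry
function locateSketch(quad, origin) {
  if (!origin || !(origin.quadIndex instanceof Map)) return null;
  return origin.quadIndex.get(quadKey(quad)) ?? null;
}
```

Like `locate`, the sketch returns `null` when the quad has no recorded origin.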

`render(quads, options)`

Render RDF quads as HTML+RDFa for web display.

Parameters:

quads (array) — Array of RDF/JS Quads to render
options (object, optional):
- context (object) — Prefix mappings for CURIE shortening
- baseIRI (string) — Base IRI for resolving relative references

Returns: { html, context }
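To show what HTML+RDFa output looks like at the smallest scale, here is a hypothetical helper that renders one literal triple as an RDFa 1.1 `<span>`. It is not the library's `render`: it leaves CURIEs as given (assuming the page declares the prefixes, e.g. with a `prefix` attribute) and ignores `baseIRI`.

```javascript
// Escape text for HTML content and double-quoted attribute values
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical: one literal-valued triple as an RDFa span (about + property)
function renderLiteralRdfa({ subject, predicate, value, language }) {
  const lang = language ? ` lang="${escapeHtml(language)}"` : '';
  return `<span about="${escapeHtml(subject)}" property="${escapeHtml(predicate)}"${lang}>${escapeHtml(value)}</span>`;
}
```

Escaping every interpolated value keeps literal text such as `<b>` from being interpreted as markup.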

Utility Functions

```javascript
import {
  DEFAULT_CONTEXT,    // Default prefix mappings
  DataFactory,        // RDF/JS DataFactory instance
  hash,               // String hashing function
  expandIRI,          // IRI expansion with context
  shortenIRI,         // IRI shortening with context
  parseSemanticBlock  // Parse semantic block syntax
} from 'mdld-parse';
```
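The idea behind context-based expansion and shortening can be sketched independently of the exports above. `expandCurie` and `shortenIri` are hypothetical helpers, not `expandIRI`/`shortenIRI`, whose signatures may differ; they assume a context object shaped like the default one, with an optional `@vocab` used for unprefixed terms.

```javascript
// Hypothetical sketch: expand "prefix:local" (or a bare term via @vocab).
function expandCurie(term, context) {
  const colon = term.indexOf(':');
  if (colon > 0) {
    const prefix = term.slice(0, colon);
    if (Object.prototype.hasOwnProperty.call(context, prefix)) {
      return context[prefix] + term.slice(colon + 1);
    }
    return term; // unknown prefix: treat as an absolute IRI (e.g. "http:..." or "tag:...")
  }
  // No prefix: fall back to @vocab (rdfs: in the default context)
  return context['@vocab'] ? context['@vocab'] + term : term;
}

// Hypothetical sketch: shorten a full IRI using the longest matching namespace.
function shortenIri(iri, context) {
  let best = null;
  for (const [prefix, ns] of Object.entries(context)) {
    if (prefix === '@vocab' || !iri.startsWith(ns)) continue;
    if (!best || ns.length > best.ns.length) best = { prefix, ns };
  }
  return best ? best.prefix + ':' + iri.slice(best.ns.length) : iri;
}
```

The `@vocab` fallback is also why an unprefixed term such as `label` resolves to `rdfs:label` under the default context.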

🏗️ Architecture

Design Principles

Zero dependencies — Pure JavaScript, ~15KB minified
Streaming-first — Single-pass parsing, O(n) complexity
Standards-compliant — RDF/JS data model
Origin tracking — Full round-trip support with source maps
Explicit semantics — No guessing, inference, or heuristics

RDF/JS Compatibility

Quads are compatible with:

n3.js — Turtle/N-Triples serialization
rdflib.js — RDF store and reasoning
sparqljs — SPARQL queries
rdf-ext — Extended RDF utilities

🧪 Testing

The parser includes comprehensive tests covering all spec requirements:

```bash
pnpm test
```

Tests validate:

Subject declaration and context
All predicate forms (p, ?p, !p)
Datatypes and language tags
Explicit list item annotations
Code blocks and blockquotes
Round-trip serialization
