Markdown-Linked Data (MD-LD) — a deterministic, streaming-friendly RDF authoring format that extends Markdown with explicit {...} annotations.
```bash
pnpm install mdld-parse
```

```javascript
import { parse } from 'mdld-parse';

const result = parse(`
[ex] <http://example.org/>
# Document {=ex:doc .ex:Article label}
[Alice] {?ex:author =ex:alice .prov:Person ex:firstName label}
[Smith] {ex:lastName}`);

console.log(result.quads);
// RDF/JS quads ready for n3.js, rdflib, etc.
// Equivalent Turtle:
// @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
// @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
// @prefix prov: <http://www.w3.org/ns/prov#>.
// @prefix ex: <http://example.org/>.
// ex:doc a ex:Article;
//   rdfs:label "Document";
//   ex:author ex:alice.
// ex:alice a prov:Person;
//   rdfs:label "Alice";
//   ex:firstName "Alice";
//   ex:lastName "Smith".
```

- 📖 Documentation - Complete documentation with guides and references
- 🎯 Examples - Real-world MD-LD examples and use cases
- 📋 Specification - Formal specification and test suite
- 🔗 Prefix folding - Build hierarchical namespaces with lightweight IRI authoring
- 📍 Subject declarations - `{=IRI}` and `{=#fragment}` for context setting
- 🎯 Object IRIs - `{+IRI}` and `{+#fragment}` for temporary object declarations
- 🔄 Three predicate forms - `p` (S→L), `?p` (S→O), `!p` (O→S)
- 🏷️ Type declarations - `.Class` for rdf:type triples
- 📅 Datatypes & language - `^^xsd:date` and `@en` support
- 🧩 Fragments - Document structuring with `{=#fragment}`
- ⚡ Polarity system - Sophisticated diff authoring with `+` and `-` prefixes
- 📍 Origin tracking - Complete provenance with lean quad-to-source mapping
- 🎯 Elevated statements - Automatic rdf:Statement pattern detection for "golden" graph extraction
MD-LD allows you to author RDF graphs directly in Markdown using explicit `{...}` annotations.
MD-LD v0.10.0 features a character-based tokenization system for optimal performance:
- 20-28% faster parsing than regex-based approaches
- Memory-efficient with ~640 bytes per quad retained
- Streaming-friendly with O(n) linear time complexity
- Character-based detection replaces complex regex patterns
- Unified tokenizer architecture in `src/tokenizers.js`
The parser uses specialized character-based tokenizers:
```javascript
// Block-level tokenizers
detectFence()             // fenced code blocks (```)
detectPrefix()            // [prefix] <iri>
detectHeading()           // # Headings
detectList()              // - List items
detectBlockquote()        // > Blockquotes
detectStandaloneSubject() // {=subject}

// Inline carrier scanner
scanInlineCarriers()      // [text], **bold**, `code`, <URL>
```

This design provides:
- Better maintainability - Easier to debug and extend
- Improved error handling - More precise edge case detection
- Cleaner code structure - No complex regex patterns
- Full backward compatibility - All 127 tests passing
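To make "character-based detection" concrete, here is a minimal sketch of recognizing a `[prefix] <iri>` line with a single left-to-right scan instead of a regular expression. `detectPrefixSketch` and its `{ prefix, iri }` return value are hypothetical illustrations, not the contents of `src/tokenizers.js`, and the rule that at least one space separates the two parts is an assumption.

```javascript
// Hypothetical sketch (not the library's detectPrefix()): recognize a
// "[prefix] <iri>" declaration by scanning characters once, left to right.
// Returns { prefix, iri } or null.
function detectPrefixSketch(line) {
  let i = 0;
  if (line[i] !== '[') return null;
  i++;
  const nameStart = i;
  // Prefix name: everything up to the closing bracket, no spaces allowed.
  while (i < line.length && line[i] !== ']') {
    if (line[i] === ' ' || line[i] === '[') return null;
    i++;
  }
  if (i >= line.length || i === nameStart) return null;
  const prefix = line.slice(nameStart, i);
  i++; // skip ']'
  // Assumption: at least one space separates the prefix from the IRI.
  if (line[i] !== ' ') return null;
  while (line[i] === ' ') i++;
  if (line[i] !== '<') return null;
  const iriStart = ++i;
  while (i < line.length && line[i] !== '>') {
    if (line[i] === ' ') return null; // IRIs cannot contain spaces
    i++;
  }
  if (i >= line.length || i === iriStart) return null;
  const iri = line.slice(iriStart, i);
  // Only trailing whitespace may follow the closing '>'.
  for (let j = i + 1; j < line.length; j++) {
    if (line[j] !== ' ' && line[j] !== '\t') return null;
  }
  return { prefix, iri };
}
```

Each character is inspected at most once, which is what keeps this style of detection linear in the line length.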
Here is a short journal entry authored in MD-LD:

```markdown
[my] <tag:alice@example.com,2026:>
# 2024-07-18 {=my:journal-2024-07-18 .my:Event my:date ^^xsd:date}
## A good day {label}
Mood: [Happy] {my:mood}
Energy level: [8] {my:energyLevel ^^xsd:integer}
Met [Sam] {+my:sam .my:Person ?my:attendee} on my regular walk at [Central Park] {+my:central-park ?my:location .my:Place label @en} and talked about [Sunny] {my:weather} weather.
```

This generates valid RDF triples with complete provenance tracking.
```bash
pnpm install mdld-parse
```

```javascript
import { parse } from 'mdld-parse';

const markdown = `# Document {=ex:doc .Article}
[Alice] {author}`;

const result = parse({
  text: markdown,
  context: { ex: 'http://example.org/' }
});

console.log(result.quads);
// RDF/JS quads ready for n3.js, rdflib, etc.
```

In the browser, import the ESM build from a CDN:

```html
<script type="module">
  import { parse } from 'https://cdn.jsdelivr.net/npm/mdld-parse/+esm';

  // The ex: prefix is not in the default context, so declare it here.
  const result = parse({
    text: '# Hello {=ex:hello label}',
    context: { ex: 'http://example.org/' }
  });
  console.log(result.quads);
</script>
```

MD-LD encodes a directed labeled multigraph where three nodes may be in scope:
- S — current subject (IRI)
- O — object resource (IRI from link/image)
- L — literal value (string + optional datatype/language)
Each predicate form determines the graph edge:
| Form | Edge | Example | Meaning |
|---|---|---|---|
| `p` | S → L | `[Alice] {name}` | literal property |
| `?p` | S → O | `[NASA] {=ex:nasa ?org}` | object property |
| `!p` | O → S | `[Parent] {=ex:p !hasPart}` | reverse object property |
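The predicate forms in the table can be illustrated with a small classifier that reads the whitespace-separated tokens of an annotation block by their leading characters, then emits edges in the direction the table gives. This is a hypothetical sketch: `classifyTokens` and `toEdges` are not library exports, the real `parseSemanticBlock` may return a different shape, and the polarity system is not modeled.

```javascript
// Hypothetical sketch: classify the tokens of an annotation block such as
// "{=ex:armstrong ?ex:commander .prov:Person}" by their leading characters.
// Assumes whitespace-separated tokens; polarity is not modeled.
function classifyTokens(block) {
  const inner = block.trim().replace(/^\{/, '').replace(/\}$/, '');
  const result = { subject: null, object: null, types: [], literal: [],
                   forward: [], reverse: [], datatype: null, language: null };
  for (const tok of inner.split(/\s+/).filter(Boolean)) {
    if (tok.startsWith('^^')) result.datatype = tok.slice(2);        // ^^xsd:date
    else if (tok.startsWith('=')) result.subject = tok.slice(1);     // {=IRI}
    else if (tok.startsWith('+')) result.object = tok.slice(1);      // {+IRI}
    else if (tok.startsWith('?')) result.forward.push(tok.slice(1)); // ?p
    else if (tok.startsWith('!')) result.reverse.push(tok.slice(1)); // !p
    else if (tok.startsWith('.')) result.types.push(tok.slice(1));   // .Class
    else if (tok.startsWith('@')) result.language = tok.slice(1);    // @en
    else result.literal.push(tok);                                   // p
  }
  return result;
}

// Turn predicate tokens into [subject, predicate, object] edges:
// p: S -> L, ?p: S -> O, !p: O -> S.
function toEdges(cls, S, O, L) {
  return [
    ...cls.literal.map(p => [S, p, L]),
    ...cls.forward.map(p => [S, p, O]),
    ...cls.reverse.map(p => [O, p, S]),
  ];
}
```

For `[Parent] {=ex:p !hasPart}`, the `!` token yields an edge from the object `ex:p` back to the current subject, matching the O → S row.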
MD-LD automatically detects rdf:Statement patterns during parsing and extracts elevated SPO quads for convenient consumption by applications.
When the parser encounters a complete rdf:Statement pattern with rdf:subject, rdf:predicate, and rdf:object, it automatically adds the corresponding SPO quad to the statements array:
```markdown
[ex] <http://example.org/>
## Elevated statement {=ex:stmt1 .rdf:Statement}
**Alice** {+ex:alice ?rdf:subject} *knows* {+ex:knows ?rdf:predicate} **Bob** {+ex:bob ?rdf:object}
```
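The detection rule can be sketched over plain `{ s, p, o }` objects holding full IRIs: a node typed `rdf:Statement` that also has `rdf:subject`, `rdf:predicate`, and `rdf:object` values produces one elevated triple. `elevateStatements` is a hypothetical helper, not the parser's code; it works on plain objects rather than RDF/JS quads, treats the `rdf:type` triple as required (an assumption), and takes the first value when a property repeats.

```javascript
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// Hypothetical sketch: find reified statements and "elevate" them to
// direct triples. Input and output are plain { s, p, o } objects.
function elevateStatements(triples) {
  // Group triples by subject so each candidate node is checked once.
  const bySubject = new Map();
  for (const t of triples) {
    if (!bySubject.has(t.s)) bySubject.set(t.s, []);
    bySubject.get(t.s).push(t);
  }
  const elevated = [];
  for (const [node, ts] of bySubject) {
    const value = p => ts.find(t => t.p === p)?.o;
    // Assumption: the rdf:type rdf:Statement triple is required.
    const isStatement = ts.some(t => t.p === RDF + 'type' && t.o === RDF + 'Statement');
    const s = value(RDF + 'subject');
    const p = value(RDF + 'predicate');
    const o = value(RDF + 'object');
    // Only a complete pattern qualifies; `from` records the reified node.
    if (isStatement && s && p && o) elevated.push({ s, p, o, from: node });
  }
  return elevated;
}
```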
The elevated quad is the same triple that this direct statement produces:

```markdown
**Alice** {=ex:alice} knows **Bob** {?ex:knows +ex:bob}
```

Set the current subject (emits no quads):

```markdown
## Apollo 11 {=ex:apollo11}
```

Create fragment IRIs relative to the current subject:
```markdown
# Document {=ex:document}
{=#summary}
[Content] {label}
```

Fragments replace any existing fragment and require a current subject.
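The fragment rule amounts to stripping any existing `#...` from the current subject and appending the new fragment. `resolveFragment` is a hypothetical helper; the sketch throws when no subject is in scope, whereas how the real parser reports that case is not specified here.

```javascript
// Hypothetical sketch of the fragment rule: {=#name} resolves against the
// current subject, replacing any fragment that subject already has.
function resolveFragment(currentSubject, fragment) {
  // A fragment needs a current subject to attach to.
  if (!currentSubject) throw new Error('Fragment declaration requires a current subject');
  const hash = currentSubject.indexOf('#');
  const base = hash === -1 ? currentSubject : currentSubject.slice(0, hash);
  return base + '#' + fragment.replace(/^#/, '');
}
```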
Emit `rdf:type` triples (one per class):

```markdown
## Apollo 11 {=ex:apollo11 .ex:SpaceMission .ex:Event}
```

Inline value carriers emit literal properties:

```markdown
# Mission {=ex:apollo11}
[Neil Armstrong] {ex:commander}
[1969] {ex:year ^^xsd:gYear}
[Historic mission] {ex:description @en}
```

Links create relationships (use the `?` prefix):

```markdown
# Mission {=ex:apollo11}
[NASA] {=ex:nasa ?ex:organizer}
```

Declare resources inline with `{=iri}`:

```markdown
# Mission {=ex:apollo11}
[Neil Armstrong] {=ex:armstrong ?ex:commander .prov:Person}
```

`parse({ text, context, dataFactory, graph })` parses MD-LD markdown and returns RDF quads with lean origin tracking.
Parameters (named object):

- `text` (string, required) - MD-LD formatted text
- `context` (object, optional) - Prefix mappings (default: `{ '@vocab': 'http://www.w3.org/2000/01/rdf-schema#', rdf, rdfs, xsd, sh, prov }`)
- `dataFactory` (object, optional) - Custom RDF/JS DataFactory
- `graph` (string, optional) - Named graph IRI

Returns: `{ quads, remove, statements, origin, context, primarySubject, md }`

- `quads` - Array of RDF/JS Quads (final resolved graph state)
- `remove` - Array of RDF/JS Quads (external retractions targeting prior state)
- `statements` - Array of elevated RDF/JS Quads extracted from rdf:Statement patterns
- `origin` - Lean origin tracking object with `quadIndex` for UI navigation
- `context` - Final context used (includes prefixes)
- `primarySubject` - String IRI or `null` (first non-fragment subject declaration)
- `md` - Clean Markdown with all MD-LD annotations stripped (round-trip safe)

Legacy: `parse(text, options)` still works for backward compatibility.
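The `context` parameter controls how annotation terms expand to IRIs: with the default `@vocab`, a bare `label` becomes `rdfs:label`, as in the quick-start output. The library exports `expandIRI` and `shortenIRI` for this, but their signatures are not documented in this section, so the following is a hypothetical sketch of the same idea. `expandTerm` and `shortenTerm` are illustrative names, and the rules (absolute IRIs pass through, `prefix:local` uses the mapping, bare terms use `@vocab`) are assumptions.

```javascript
// Hypothetical sketch of term expansion against a context object.
function expandTerm(term, context) {
  const colon = term.indexOf(':');
  if (colon > 0) {
    const prefix = term.slice(0, colon);
    if (Object.prototype.hasOwnProperty.call(context, prefix)) {
      return context[prefix] + term.slice(colon + 1);
    }
    return term; // unknown prefix: treat as an absolute IRI (e.g. "http:...")
  }
  // Bare term such as "label": fall back to the vocabulary IRI.
  if (context['@vocab']) return context['@vocab'] + term;
  throw new Error(`Cannot expand "${term}": no prefix and no @vocab`);
}

// Inverse direction: use the longest matching namespace, else keep the IRI.
function shortenTerm(iri, context) {
  let best = null;
  for (const [prefix, ns] of Object.entries(context)) {
    if (prefix === '@vocab') continue;
    if (iri.startsWith(ns) && (!best || ns.length > best.ns.length)) best = { prefix, ns };
  }
  return best ? `${best.prefix}:${iri.slice(best.ns.length)}` : iri;
}
```

Picking the longest matching namespace avoids shortening `http://example.org/vocab/x` against a shorter `http://example.org/` mapping when a more specific prefix is also declared.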
Merge multiple MDLD documents with diff polarity resolution.
Parameters:
- `docs` (array) - Array of markdown strings or ParseResult objects
- `options` (object, optional):
  - `context` (object) - Prefix mappings (merged with `DEFAULT_CONTEXT`)

Returns: `{ quads, remove, origin, context, primarySubjects }`

- `quads` - Array of RDF/JS Quads (final resolved graph state)
- `remove` - Array of RDF/JS Quads (external retractions targeting prior state)
- `origin` - Merge origin tracking with document index and polarity
- `context` - Final merged context
- `primarySubjects` - Array of string IRIs (primary subjects from each document, in merge order)
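The polarity semantics are not spelled out beyond the `quads` and `remove` descriptions, so the following is only a sketch under stated assumptions: documents are applied in order, a retraction cancels a quad asserted earlier in the same merge, a retraction with nothing to cancel is kept in `remove` as an external retraction, and a later assertion overrides an earlier retraction. `mergePolarity` and its `{ add, del }` input shape (quads as string keys) are hypothetical.

```javascript
// Hypothetical sketch of diff-polarity resolution across documents.
// Each doc is { add: [...keys], del: [...keys] }, where a key is any string
// identifying a quad (e.g. "s p o"). Documents are applied in order.
function mergePolarity(docs) {
  const quads = new Set();  // final asserted state
  const remove = new Set(); // retractions aimed at state outside the merge
  for (const { add = [], del = [] } of docs) {
    for (const k of add) {
      quads.add(k);
      remove.delete(k); // a later assertion overrides an earlier retraction
    }
    for (const k of del) {
      if (quads.has(k)) quads.delete(k); // cancels an in-merge assertion
      else remove.add(k);                // nothing to cancel: external retraction
    }
  }
  return { quads: [...quads], remove: [...remove] };
}
```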
`generate({ quads, context, primarySubject })` generates deterministic MDLD from RDF quads with visual styling.

Parameters (named object):

- `quads` (array, required) - Array of RDF/JS Quads to convert
- `context` (object, optional) - Prefix mappings (default: `{}`)
- `primarySubject` (string, optional) - String IRI to place first in output (ensures round-trip safety). If not provided, falls back to the first subject from quads.

Returns: `{ text, context }`

Features:

- Visual carrier styles based on datatype (code spans for numbers, bold booleans, etc.)
- Label-in-heading: Uses `rdfs:label` in subject headings when available
- Multiple labels: First label in heading, additional labels rendered as literals
- Round-trip safe: All data preserved through parse → generate → parse
- Composable: `generate(parse(text))` extracts semantics; `parse(generate({ quads }))` normalizes quads
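The datatype-driven carrier styling can be sketched as a lookup from a literal's datatype IRI to a Markdown carrier. The specific mapping below (numeric XSD types as code spans, `xsd:boolean` in bold, everything else in brackets) is an assumption extrapolated from the feature description, and `renderCarrier` is a hypothetical name.

```javascript
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Hypothetical mapping from literal datatype to a Markdown carrier.
// The exact mapping used by generate() may differ.
const NUMERIC = new Set(
  ['integer', 'decimal', 'double', 'float', 'int', 'long'].map(local => XSD + local)
);

function renderCarrier(value, datatype) {
  if (NUMERIC.has(datatype)) return '`' + value + '`';     // code span for numbers
  if (datatype === XSD + 'boolean') return '**' + value + '**'; // bold for booleans
  return '[' + value + ']';                                // default text carrier
}
```

With this mapping, an integer literal might be emitted as `` `8` {my:energyLevel ^^xsd:integer} ``, with the annotation block appended after the carrier.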
`generateNode({ quads, focusIRI, context })` generates node-centric MDLD showing all quads where a specific IRI appears in any position.

Parameters (named object):

- `quads` (array, required) - Array of RDF/JS Quads to search
- `focusIRI` (string, required) - The IRI to center the view on
- `context` (object, optional) - Prefix mappings (default: `{}`)

Returns: `{ text, context }`

Behavior (safety-first):

- If `focusIRI` is null/undefined: returns empty text
- If `focusIRI` is not in the graph: returns empty text (never falls back to all data)
- If `quads` is empty: returns empty text
Safety rationale: Prevents accidental rendering of entire databases on misspelled IRIs—critical for production use with LLM cost per token. Explicit emptiness signals "not found" to the caller.
Use case: Perfect for exploring a specific node and all its relationships—where it appears as subject, object, predicate, type, or datatype. Creates an exhaustive view of everything related to the focus IRI. Ideal for node-centric knowledge graph explorers.
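The selection step behind this view can be sketched as a filter over RDF/JS-shaped quads. `selectNodeQuads` is a hypothetical helper that only finds the matching quads (it does not render MDLD); it checks the subject, the predicate, IRI objects (which covers `rdf:type` targets), and literal datatypes, and it applies the same empty-result rules listed under Behavior.

```javascript
// Hypothetical sketch of node-centric selection over RDF/JS-shaped quads.
// Returns [] for a missing focusIRI, empty input, or an IRI not in the graph.
function selectNodeQuads(quads, focusIRI) {
  if (!focusIRI || !Array.isArray(quads) || quads.length === 0) return [];
  return quads.filter(q =>
    q.subject.value === focusIRI ||
    q.predicate.value === focusIRI ||
    // IRI objects, including the class in an rdf:type triple
    (q.object.termType === 'NamedNode' && q.object.value === focusIRI) ||
    // the datatype IRI of a literal
    (q.object.termType === 'Literal' && q.object.datatype?.value === focusIRI));
}
```

Literal values are deliberately not compared against `focusIRI`, so a string that happens to spell the IRI does not pull its quad into the view.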
With the unified named parameter API, parse() and generate() compose seamlessly through object spreading:
```javascript
import { parse, generate, generateNode } from 'mdld-parse';

// Assumes `text`, `context`, and `externalQuads` are already defined.

// Pattern 1: parse → generate (semantic extraction)
const canonical = generate({ ...parse({ text, context }) });
// text → quads → canonical MDLD (deterministic, visual styling applied)

// Pattern 2: generate → parse (normalize external RDF)
const normalized = parse({ ...generate({ quads: externalQuads, context }) });
// external quads → MDLD → validated quads (DataFactory-safe, no blank nodes)

// Pattern 3: parse → generateNode (node-centric exploration)
const nodeView = generateNode({ ...parse({ text }), focusIRI: 'http://example.org/alice' });
// full graph → isolated node view (safe: returns empty if IRI not found)
```

Why this works:

- `parse()` returns `{ quads, context, primarySubject, md, ... }`
- `generate()` accepts `{ quads, context, primarySubject }`
- `generateNode()` accepts `{ quads, context, focusIRI }` (with `focusIRI` supplied as an override)
- Matching object shapes enable `{ ...spread }` composition
Every parse() result includes a md field containing the original Markdown with all MD-LD annotations stripped:
```javascript
import { parse } from 'mdld-parse';

const result = parse({
  text: `# Document {=ex:doc .Article}\n[Content] {ex:content}`,
  context: { ex: 'http://example.org/' }
});

console.log(result.md);
// # Document
// Content

// Round-trip safety: re-parsing clean MD produces zero quads
const reparsed = parse({ text: result.md });
console.log(reparsed.quads.length); // 0
```

Behavior:

- Valid MD-LD annotations (`{=...}`, `{+...}`, `{...}`) are completely removed
- Content from value carriers (`[text]`, `**bold**`, `` `code` ``) is preserved
- Invalid syntax (annotations not at end-of-line) is preserved as visible markers
- Headings, lists, blockquotes, and code blocks maintain their structure
- Prefix declarations at start of line are stripped
- Standalone subject declarations (`{=ex:subject}`) are stripped
Use cases:
- Content extraction - Get readable Markdown without semantic markup
- Syntax validation - Remaining `{...}` patterns indicate invalid MD-LD syntax
- Round-trip testing - `parse(md).md` should parse to zero quads
- Preview generation - Show clean document before publishing
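A rough approximation of the stripping rules can be written with regular expressions. This sketch (`stripAnnotations`, hypothetical) assumes annotations follow a `[text]`, `**bold**`, or `` `code` `` carrier or end a line; it ignores fenced code blocks and does not reproduce the parser's handling of invalid syntax, which is character-based rather than regex-based.

```javascript
// Hypothetical regex-based sketch of annotation stripping.
function stripAnnotations(text) {
  return text
    .split('\n')
    // Drop "[prefix] <iri>" declarations and standalone "{=subject}" lines.
    .filter(line => !/^\[[^\]\s]+\]\s+<[^>\s]*>\s*$/.test(line))
    .filter(line => !/^\s*\{=[^}]*\}\s*$/.test(line))
    .map(line => line
      // [text] {...} -> text
      .replace(/\[([^\]]*)\]\s*\{[^}]*\}/g, '$1')
      // **bold** {...} or `code` {...} -> keep the carrier, drop the block
      .replace(/(\*\*[^*]+\*\*|`[^`]+`)\s*\{[^}]*\}/g, '$1')
      // a trailing annotation, e.g. on a heading
      .replace(/\s+\{[^}]*\}\s*$/, ''))
    .join('\n');
}
```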
Locate the origin entry for a quad using the lean origin system.
Parameters:
- `quad` (object) - The quad to locate (subject, predicate, object)
- `origin` (object) - Origin object containing `quadIndex`

Returns: `{ blockId, range, carrierType, subject, predicate, context, value, polarity }` or `null`
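The internal structure of `quadIndex` is not described here. Assuming it is a `Map` keyed by a string built from a quad's subject, predicate, and object, a lookup could be sketched as follows; `quadKey` and `locateSketch` are hypothetical names, and the library's real key format may differ.

```javascript
// Hypothetical key: subject, predicate, object term type, and object value,
// joined by a NUL separator that does not occur in IRIs. Including the term
// type keeps an IRI object distinct from a literal with the same text.
function quadKey(q) {
  return [q.subject.value, q.predicate.value, q.object.termType, q.object.value].join('\u0000');
}

// Returns the origin entry for the quad, or null when it is not indexed.
function locateSketch(quad, origin) {
  if (!quad || !origin || !(origin.quadIndex instanceof Map)) return null;
  return origin.quadIndex.get(quadKey(quad)) ?? null;
}
```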
Render RDF quads as HTML+RDFa for web display.
Parameters:
- `quads` (array) - Array of RDF/JS Quads to render
- `options` (object, optional):
  - `context` (object) - Prefix mappings for CURIE shortening
  - `baseIRI` (string) - Base IRI for resolving relative references

Returns: `{ html, context }`
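For a sense of what HTML+RDFa output involves, here is a minimal sketch that renders one subject and its properties with the standard RDFa 1.1 attributes `about`, `property`, `datatype`, and `lang`, using `href` for IRI values. `renderRdfa` and its input shape are hypothetical; CURIE shortening and the library's actual markup are not reproduced.

```javascript
// Escape text for use in HTML content and double-quoted attributes.
function escapeHtml(s) {
  return String(s).replace(/&/g, '&amp;').replace(/</g, '&lt;')
    .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Hypothetical sketch: one subject, a list of properties, full IRIs throughout.
// Each prop is { predicate, value, isIRI?, datatype?, lang? }.
function renderRdfa(subject, props) {
  const rows = props.map(({ predicate, value, isIRI, datatype, lang }) => {
    if (isIRI) {
      // In RDFa 1.1, @property with @href takes the linked IRI as the object.
      return `  <a property="${escapeHtml(predicate)}" href="${escapeHtml(value)}">${escapeHtml(value)}</a>`;
    }
    const dt = datatype ? ` datatype="${escapeHtml(datatype)}"` : '';
    const lg = lang ? ` lang="${escapeHtml(lang)}"` : '';
    return `  <span property="${escapeHtml(predicate)}"${dt}${lg}>${escapeHtml(value)}</span>`;
  });
  return `<div about="${escapeHtml(subject)}">\n${rows.join('\n')}\n</div>`;
}
```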
```javascript
import {
  DEFAULT_CONTEXT,   // Default prefix mappings
  DataFactory,       // RDF/JS DataFactory instance
  hash,              // String hashing function
  expandIRI,         // IRI expansion with context
  shortenIRI,        // IRI shortening with context
  parseSemanticBlock // Parse semantic block syntax
} from 'mdld-parse';
```

- Zero dependencies - Pure JavaScript, ~15KB minified
- Streaming-first - Single-pass parsing, O(n) complexity
- Standards-compliant - RDF/JS data model
- Origin tracking - Full round-trip support with source maps
- Explicit semantics - No guessing, inference, or heuristics

Quads are compatible with:

- `n3.js` - Turtle/N-Triples serialization
- `rdflib.js` - RDF store and reasoning
- `sparqljs` - SPARQL queries
- `rdf-ext` - Extended RDF utilities
The parser includes comprehensive tests covering all spec requirements:
```bash
pnpm test
```

Tests validate:
- Subject declaration and context
- All predicate forms (p, ?p, !p)
- Datatypes and language tags
- Explicit list item annotations
- Code blocks and blockquotes
- Round-trip serialization