diff --git a/AGENTS.md b/AGENTS.md
deleted file mode 100644
index 67bdef9..0000000
--- a/AGENTS.md
+++ /dev/null
@@ -1,56 +0,0 @@
-# Agent Guidelines for Kuddle.Net
-
-## Build Commands
-
-- **Build**: `dotnet run .build/targets.cs build`
-- **Test**: `dotnet run .build/targets.cs test`
-- **Clean**: `dotnet run .build/targets.cs clean`
-- **Pack**: `dotnet run .build/targets.cs pack`
-- **Single test**: `dotnet test --filter "FullyQualifiedName~TestName"` (TUnit supports standard .NET test filtering)
-
-## Code Style Guidelines
-
-### Project Setup
-
-- **Target Framework**: .NET 10.0
-- **Language Version**: C# preview
-- **Nullable**: Enabled
-- **Implicit Usings**: Enabled
-- **Treat Warnings as Errors**: True
-
-### Naming Conventions
-
-- **Types/Namespaces**: PascalCase
-- **Interfaces**: IPascalCase
-- **Methods/Properties/Events**: PascalCase
-- **Local variables/Parameters**: camelCase
-- **Private fields**: _camelCase
-- **Private static fields**: s_camelCase
-- **Constants**: PascalCase
-
-### Formatting
-
-- **Indentation**: 4 spaces
-- **Braces**: Required for all control structures
-- **New lines**: Before open braces for all constructs
-- **Using directives**: Outside namespace
-- **File-scoped namespaces**: Preferred
-
-### Code Analysis & Formatting
-
-- **Analyzer**: Roslynator.Analyzers, Roslynator.CodeAnalysis.Analyzers, Roslynator.Formatting.Analyzers
-- **Style enforcement**: Via .editorconfig rules
-- **Expression-bodied members**: Preferred for accessors, indexers, properties
-- **Pattern matching**: Preferred over is/as checks
-- **Null checking**: Use null propagation and coalescing operators
-
-### Error Handling
-
-- **Exceptions**: Throw descriptive exceptions with context
-- **Null checks**: Leverage nullable reference types
-- **Validation**: Use argument validation where appropriate
-
-### Dependencies
-
-- **Parsing**: Parlot library
-- **Testing**: TUnit with Microsoft.Testing.Platform
diff --git a/docs/low-level-api.md b/docs/low-level-api.md
deleted file mode 100644
index 69a24a5..0000000
--- a/docs/low-level-api.md
+++ /dev/null
@@ -1,145 +0,0 @@
-# Lower-Level API: Reader, Writer, and AST
-
-For scenarios where the high-level `KdlSerializer` is too restrictive, Kuddle.Net provides direct access to the KDL AST (Abstract Syntax Tree) via `KdlReader` and `KdlWriter`.
-
-## Reading KDL (KdlReader)
-
-`KdlReader.Read` parses a KDL string and returns a `KdlDocument`.
-
-```csharp
-using Kuddle.AST;
-using Kuddle.Serialization;
-
-string kdl = "node 1 2 key=\"val\"";
-KdlDocument doc = KdlReader.Read(kdl);
-
-foreach (KdlNode node in doc.Nodes)
-{
-    Console.WriteLine($"Node name: {node.Name.Value}");
-}
-```
-
-### Options
-
-`KdlReaderOptions` allows you to customize the reading process:
-
-```csharp
-var options = new KdlReaderOptions
-{
-    ValidateReservedTypes = true // Validates (uuid), (date-time), etc. format
-};
-
-KdlDocument doc = KdlReader.Read(kdl, options);
-```
-
----
-
-## Writing KDL (KdlWriter)
-
-`KdlWriter.Write` takes a `KdlDocument` (or any `KdlObject`) and returns its KDL string representation.
-
-```csharp
-var doc = new KdlDocument();
-// ... build doc ...
-
-string kdl = KdlWriter.Write(doc);
-// Or use doc.ToString() which uses default options
-```
-
-### Options
-
-`KdlWriterOptions` controls the output formatting:
-
-```csharp
-var options = new KdlWriterOptions
-{
-    IndentChar = "\t",
-    NewLine = "\r\n",
-    SpaceAfterProp = " ",
-    EscapeUnicode = true
-};
-
-string kdl = KdlWriter.Write(doc, options);
-```
-
----
-
-## The KDL AST
-
-The AST is composed of records representing KDL constructs.
-
-### `KdlDocument`
-
-The root of a KDL file.
-
-- `Nodes`: `List<KdlNode>`
-
-### `KdlNode`
-
-A single KDL node.
-
-- `Name`: `KdlString`
-- `Entries`: `List<KdlEntry>` (Arguments or Properties)
-- `Children`: `KdlBlock?` (Nested nodes)
-- `TypeAnnotation`: `string?`
-
-### `KdlEntry`
-
-Base class for entries within a node.
-
-- `KdlArgument`: Positional value (`KdlValue`)
-- `KdlProperty`: Key-value pair (`KdlString Key`, `KdlValue Value`)
-
-### `KdlValue`
-
-Base class for all constants.
-
-- `KdlString`: Represents strings. Support for varieties via `StringKind`:
-  - `StringKind.Bare`: `bare-string`
-  - `StringKind.Quoted`: `"quoted string"`
-  - `StringKind.Raw`: `r#"raw string"#`
-  - `StringKind.MultiLine`: `"""multi-line string"""`
-- `KdlNumber`: Represents numeric values. Stores the `RawValue` string to preserve precision and formatting (e.g., `0xFF` vs `255`).
-- `KdlBool`: `#true` or `#false`.
-- `KdlNull`: `#null`.
-
----
-
-## Serialization Options
-
-When using `KdlSerializer`, you can pass `KdlSerializerOptions` to control the behavior:
-
-```csharp
-var options = new KdlSerializerOptions
-{
-    IgnoreNullValues = true, // Don't write properties with null values
-    CaseInsensitiveNames = true, // Match KDL names to C# properties case-insensitively
-    WriteTypeAnnotations = true // Include (uuid), (date-time) etc. in output
-};
-
-string kdl = KdlSerializer.Serialize(myObj, options);
-```
-
----
-
-## Extension Methods
-
-Kuddle.Net provides helpful extension methods in `Kuddle.Extensions` for working with the AST:
-
-```csharp
-using Kuddle.Extensions;
-
-KdlNode node = ...;
-
-// Get property value
-KdlValue? val = node.Prop("my-key");
-
-// Get argument by index
-KdlValue? arg = node.Arg(0);
-
-// Try to get typed values
-if (node.TryGetProp("port", out int port))
-{
-    // ...
-}
-```
diff --git a/docs/serialization-attributes.md b/docs/serialization-attributes.md
deleted file mode 100644
index 7d3e254..0000000
--- a/docs/serialization-attributes.md
+++ /dev/null
@@ -1,558 +0,0 @@
-# Attribute Usage
-
-This document describes how to use Kuddle's serialization attributes to map between KDL documents and C# types.
-
-## Quick Reference
-
-| Attribute              | Target   | Purpose                         |
-| ---------------------- | -------- | ------------------------------- |
-| `[KdlType]`            | Class    | Override the expected node name |
-| `[KdlArgument(index)]` | Property | Map to a positional argument    |
-| `[KdlProperty(key?)]`  | Property | Map to a named property         |
-| `[KdlNode(name?)]`     | Property | Map to child nodes              |
-| `[KdlIgnore]`          | Property | Exclude from serialization      |
-
----
-
-## Node Name Matching
-
-### Default Convention
-
-By default, the class name (lowercased) must match the KDL node name:
-
-```csharp
-// Matches: package "my-lib"
-public class Package { ... }
-```
-
-### Custom Node Name with `[KdlType]`
-
-Override the expected node name:
-
-```csharp
-// Matches: db "primary" (not "database")
-[KdlType("db")]
-public class Database { ... }
-```
-
----
-
-## `[KdlArgument(index)]` — Positional Arguments
-
-Maps a property to a **positional argument** by zero-based index.
-
-### KDL
-
-```kdl
-point 10 20 30 label="origin"
-```
-
-### C# Model
-
-```csharp
-public class Point
-{
-    [KdlArgument(0)]
-    public int X { get; set; }
-
-    [KdlArgument(1)]
-    public int Y { get; set; }
-
-    [KdlArgument(2)]
-    public int Z { get; set; }
-
-    [KdlProperty("label")]
-    public string? Label { get; set; }
-}
-```
-
-### Rules
-
-1. **Index is required** — Each `[KdlArgument]` must specify its position
-2. **Indices should be contiguous** — Gaps (0, 2 without 1) may cause errors
-3. **Order matters** — During serialization, arguments are written in index order
-4. **Scalar types only** — Arguments cannot be complex objects
-
-### Supported Types
-
-- `string`
-- `int`, `long`, `double`, `decimal`
-- `bool`
-- `Guid` (with `(uuid)` type annotation)
-- `DateTimeOffset` (with `(date-time)` type annotation)
-- Nullable variants of all above
-
----
-
-## `[KdlProperty(key?)]` — Named Properties
-
-Maps a property to a **KDL property** (key=value pair).
-
-### KDL
-
-```kdl
-dependency lodash version="4.17.21" optional=#false
-```
-
-### C# Model
-
-```csharp
-public class Dependency
-{
-    [KdlArgument(0)]
-    public string Package { get; set; } = "";
-
-    [KdlProperty("version")]
-    public string Version { get; set; } = "*";
-
-    [KdlProperty("optional")]
-    public bool Optional { get; set; }
-}
-```
-
-### Key Name Resolution
-
-1. **Explicit key**: `[KdlProperty("my-key")]` → matches `my-key=...`
-2. **Implicit key**: `[KdlProperty]` → uses property name lowercased
-
-```csharp
-[KdlProperty] // Matches "timeout=..."
-public int Timeout { get; set; }
-
-[KdlProperty("max-retries")] // Matches "max-retries=..."
-public int MaxRetries { get; set; }
-```
-
-### Rules
-
-1. **Last value wins** — Per KDL spec, if `key=1 key=2`, the value is `2`
-2. **Missing properties use defaults** — No error if property absent in KDL
-3. **Scalar types only** — Properties cannot be complex objects
-
----
-
-## `[KdlNode(name?)]` — Child Nodes
-
-Maps a property to **child nodes** within the parent's `{ }` block.
-
-### Basic Usage — Wrapped Collection
-
-By default, a collection is wrapped in a container node with the specified name:
-
-#### KDL
-
-```kdl
-project {
-    dependencies {
-        dependency "lodash" version="4.17.21"
-        dependency "react" version="18.2.0"
-    }
-}
-```
-
-#### C# Model
-
-```csharp
-public class Project
-{
-    [KdlNode("dependencies")]
-    public List<Dependency> Dependencies { get; set; } = [];
-}
-```
-
-### Flattened Collection
-
-If you want the child nodes to appear directly under the parent without a wrapper, use `Flatten = true`:
-
-#### KDL
-
-```kdl
-project {
-    dependency "lodash" version="4.17.21"
-    dependency "react" version="18.2.0"
-}
-```
-
-#### C# Model
-
-```csharp
-public class Project
-{
-    [KdlNode("dependency", Flatten = true)]
-    public List<Dependency> Dependencies { get; set; } = [];
-}
-```
-
-### Customizing Element Names
-
-For wrapped collections, you can specify the name of the item nodes using `ElementName`:
-
-#### KDL
-
-```kdl
-project {
-    dependencies {
-        pkg "lodash"
-        pkg "react"
-    }
-}
-```
-
-#### C# Model
-
-```csharp
-public class Project
-{
-    [KdlNode("dependencies", ElementName = "pkg")]
-    public List<string> Packages { get; set; } = [];
-}
-```
-
-### Scalar Child Node
-
-When the property type is a **non-collection complex type**, it maps to a single child node:
-
-#### KDL
-
-```kdl
-application myapp {
-    database {
-        host "localhost"
-        port 5432
-    }
-}
-```
-
-#### C# Model
-
-```csharp
-public class Application
-{
-    [KdlArgument(0)]
-    public string Name { get; set; } = "";
-
-    [KdlNode("database")]
-    public DatabaseConfig? Database { get; set; }
-}
-
-public class DatabaseConfig
-{
-    [KdlNode("host")] // Maps child node's Arg(0)
-    public string Host { get; set; } = "";
-
-    [KdlNode("port")] // Maps child node's Arg(0)
-    public int Port { get; set; }
-}
-```
-
-### Scalar Child Node
-
-When the property type is a **scalar type**, it extracts `Arg(0)` from the child node:
-
-#### KDL
-
-```kdl
-config {
-    timeout 5000
-    enabled #true
-}
-```
-
-#### C# Model
-
-```csharp
-public class Config
-{
-    [KdlNode("timeout")]
-    public int Timeout { get; set; }
-
-    [KdlNode("enabled")]
-    public bool Enabled { get; set; }
-}
-```
-
-### Node Name Resolution
-
-1. **Explicit name**: `[KdlNode("my-items")]` → matches child nodes named `my-items`
-2. **Implicit name**: `[KdlNode]` → uses property name lowercased
-
-### Rules
-
-1. **Collection types** → Collects all matching child nodes into the list/array
-2. **Complex types** → Expects exactly one matching child node (throws if multiple)
-3. **Scalar types** → Extracts `Arg(0)` from the single matching child node
-4. **Supported collection types**: `List<T>`, `T[]`, `IEnumerable<T>`, `IList<T>`
-
----
-
-## `[KdlIgnore]` — Exclude Properties
-
-Excludes a property from both serialization and deserialization:
-
-```csharp
-public class User
-{
-    [KdlArgument(0)]
-    public string Username { get; set; } = "";
-
-    [KdlIgnore]
-    public string ComputedDisplayName => Username.ToUpperInvariant();
-
-    [KdlIgnore]
-    public DateTime LoadedAt { get; set; } = DateTime.UtcNow;
-}
-```
-
----
-
-## Dictionaries
-
-Dictionaries can be mapped in two ways depending on whether you want them as **Properties** or **Child Nodes**.
-
-### Dictionary as Properties (`[KdlProperty]`)
-
-Maps dictionary entries to `key=value` properties on the node.
-*Note: Value types must be scalars.*
-
-#### C# Model
-
-```csharp
-public class Header
-{
-    [KdlProperty("meta")]
-    public Dictionary<string, string> Metadata { get; set; } = [];
-}
-```
-
-#### KDL
-
-```kdl
-header meta:author="Alice" meta:version="1.2.3"
-```
-
-### Dictionary as Child Nodes (`[KdlNode]`)
-
-Maps dictionary entries to individual child nodes where the **key is the node name**.
-
-#### C# Model
-
-```csharp
-public class Environment
-{
-    [KdlNode("vars", Flatten = true)]
-    public Dictionary<string, string> Variables { get; set; } = [];
-}
-```
-
-#### KDL
-
-```kdl
-environment {
-    PATH "/usr/bin"
-    HOME "/home/alice"
-}
-```
-
----
-
-## Enums
-
-Enums are automatically serialized and deserialized as bare strings.
-
-```csharp
-public enum LogLevel { Debug, Info, Warning, Error }
-
-public class Logger
-{
-    [KdlProperty]
-    public LogLevel Level { get; set; }
-}
-```
-
-**KDL:** `logger level=info` (Case-insensitive by default)
-
----
-
-## Custom Type Annotations
-
-You can force a specific KDL type annotation on any argument, property, or node.
-
-```csharp
-public class Data
-{
-    [KdlProperty("checksum", TypeAnnotation = "hex")]
-    public string Hash { get; set; }
-}
-```
-
-**KDL:** `data checksum=(hex)"a1b2c3d4"`
-
----
-
-## Type Conversion
-
-### Automatic Type Mapping
-
-| C# Type             | KDL Representation                  |
-| ------------------- | ----------------------------------- |
-| `string`            | `"value"` or `bare-string`          |
-| `int`, `long`       | `123`, `0xFF`, `0o77`, `0b1010`     |
-| `double`, `decimal` | `3.14`, `1.5e-10`                   |
-| `bool`              | `#true`, `#false`                   |
-| `Guid`              | `(uuid)"550e8400-..."`              |
-| `DateTimeOffset`    | `(date-time)"2024-01-15T10:30:00Z"` |
-| Nullable `T?`       | Value or `#null`                    |
-
-### Type Annotations
-
-Type annotations in KDL provide parsing hints. The serializer recognizes:
-
-- `(uuid)` → `Guid`
-- `(date-time)` → `DateTimeOffset`
-
-```kdl
-user alice {
-    id (uuid)"550e8400-e29b-41d4-a716-446655440000"
-    createdAt (date-time)"2024-01-15T10:30:00Z"
-}
-```
-
----
-
-## Document-Level vs Node-Level Deserialization
-
-### Single Node → Object
-
-When your type has `[KdlArgument]` or `[KdlProperty]` attributes, the deserializer expects **exactly one root node**:
-
-```csharp
-var kdl = "package my-lib version=\"1.0\"";
-var pkg = KdlSerializer.Deserialize<Package>(kdl);
-```
-
-### Multiple Nodes → Collection
-
-Use `DeserializeMany` for documents with multiple top-level nodes:
-
-```kdl
-server web-1 host="10.0.0.1"
-server web-2 host="10.0.0.2"
-server api-1 host="10.0.1.1"
-```
-
-```csharp
-var servers = KdlSerializer.DeserializeMany<Server>(kdl);
-// Returns IEnumerable<Server> with 3 items
-```
-
-### Document as Container
-
-For a document where each node type maps to a different property:
-
-```kdl
-name "my-project"
-version "1.0.0"
-author "Alice"
-```
-
-```csharp
-public class Manifest
-{
-    [KdlNode("name")]
-    public string Name { get; set; } = "";
-
-    [KdlNode("version")]
-    public string Version { get; set; } = "";
-
-    [KdlNode("author")]
-    public string Author { get; set; } = "";
-}
-```
-
----
-
-## Complete Example
-
-### KDL Document
-
-```kdl
-project web-app version="2.0.0" {
-    dependency lodash version="4.17.21" optional=#false
-    dependency react version="18.2.0" optional=#false
-
-    devDependency jest version="29.0.0"
-    devDependency typescript version="5.0.0"
-
-    author "Alice" {
-        email "alice@example.com"
-        url "https://github.com/alice"
-    }
-
-    repository type="git" url="https://github.com/alice/web-app"
-}
-```
-
-### C# Models
-
-```csharp
-public class Project
-{
-    [KdlArgument(0)]
-    public string Name { get; set; } = "";
-
-    [KdlProperty("version")]
-    public string Version { get; set; } = "1.0.0";
-
-    [KdlNode("dependency")]
-    public List<Dependency> Dependencies { get; set; } = [];
-
-    [KdlNode("devDependency")]
-    public List<Dependency> DevDependencies { get; set; } = [];
-
-    [KdlNode("author")]
-    public Author? Author { get; set; }
-
-    [KdlNode("repository")]
-    public Repository? Repository { get; set; }
-}
-
-public class Dependency
-{
-    [KdlArgument(0)]
-    public string Package { get; set; } = "";
-
-    [KdlProperty("version")]
-    public string Version { get; set; } = "*";
-
-    [KdlProperty("optional")]
-    public bool Optional { get; set; }
-}
-
-public class Author
-{
-    [KdlArgument(0)]
-    public string Name { get; set; } = "";
-
-    [KdlNode("email")]
-    public string? Email { get; set; }
-
-    [KdlNode("url")]
-    public string? Url { get; set; }
-}
-
-public class Repository
-{
-    [KdlProperty("type")]
-    public string Type { get; set; } = "";
-
-    [KdlProperty("url")]
-    public string Url { get; set; } = "";
-}
-```
-
-## Limitations & Notes
-
-1. **No polymorphism** — Cannot deserialize to derived types based on discriminator
-2. **Case sensitivity** — Node/property name matching is case-insensitive by default
-3. **Argument gaps** — Missing argument indices will throw; ensure contiguous indices
-4. **No Round-trip fidelity** — Comments, formatting, and slashdash elements are not preserved
diff --git a/kdl-spec.md b/kdl-spec.md
deleted file mode 100644
index 4c35ad1..0000000
--- a/kdl-spec.md
+++ /dev/null
@@ -1,1112 +0,0 @@
----
-title: "The KDL Document Language"
-abbrev: "KDL"
-docname: draft-marchan-kdl2-latest
-submissionType: independent
-category: exp
-
-ipr: none
-area: General
-venue:
-  github: kdl-org/kdl
-  home:
-workgroup: KDL Community
-keyword:
-
-- Document-Language
-- Configuration
-
-stand_alone: yes
-smart_quotes: no
-pi: [toc, sortrefs, symrefs]
-
-author:
-
-- name: Katerina Zoé Marchán Salvá
-  ins: K. Marchán
-  organization: Microsoft
-- name: The KDL Contributors
-  ins: KDL Contributors
-
-normative:
-
-informative:
-
---- abstract
-
-KDL is a node-oriented document language. Its niche and purpose overlaps with
-XML, and as do many of its semantics. You can use KDL both as a configuration
-language, and a data exchange or storage format, if you so choose.
-
-This is the formal specification for KDL, including the intended data model and
-the grammar.
-
-This document describes an unreleased minor change to KDL. For the latest
-official version of the language, see .
-
-
-
---- note_License
-
-This work is licensed under Creative Commons Attribution-ShareAlike 4.0
-International. To view a copy of this license, visit
-
-
---- middle
-
-# Compatibility
-
-KDL 2.0 is designed such that for any given KDL document written as [KDL
-1.0](./SPEC_v1.md) or KDL 2.0, the parse will either fail completely, or, if the
-parse succeeds, the data represented by a v1 or v2 parser will be identical.
-This means that it's safe to use a fallback parsing strategy in order to support
-both v1 and v2 simultaneously. For example, `node "foo"` is a valid node in both
-versions, and should be represented identically by parsers.
-
-A version marker `/- kdl-version 2` (or `1`) _MAY_ be added to the beginning of
-a KDL document, optionally preceded by the BOM, and parsers _MAY_ use that as a
-hint as to which version to parse the document as.
-
-# Introduction
-
-KDL is a node-oriented document language. Its niche and purpose overlaps with
-XML, and as do many of its semantics. You can use KDL both as a configuration
-language, and a data exchange or storage format, if you so choose.
-
-The bulk of this document is dedicated to a long-form description of all
-Components ({{components}}) of a KDL document.
-There is also a much more terse
-Grammar ({{full-grammar}}) at the end of the document that covers most of the
-rules, with some semantic exceptions involving the data model.
-
-KDL is designed to be easy to read _and_ easy to implement.
-
-In this document, references to "left" or "right" refer to directions in the
-_data stream_ towards the beginning or end, respectively; in other words,
-the directions if the data stream were only ASCII text. They do not refer
-to the writing direction of text, which can flow in either direction,
-depending on the characters used.
-
-# Components
-
-## Document
-
-The toplevel concept of KDL is a Document. A Document is composed of zero or
-more Nodes ({{node}}), separated by newlines, semicolons, and whitespace, and eventually
-terminated by an EOF.
-
-All KDL documents MUST be encoded in UTF-8 and conform to the specifications in
-this document.
-
-### Example
-
-The following is a document composed of two toplevel nodes:
-
-~~~kdl
-foo {
-    bar
-}
-baz
-~~~
-
-## Node
-
-Being a node-oriented language means that the real core component of any KDL
-document is the "node". Every node must have a name, which must be a
-String ({{string}}).
-
-The name may be preceded by a Type Annotation ({{type-annotation}}) to further
-clarify its type, particularly in relation to its parent node. (For example,
-clarifying that a particular `date` child node is for the _publication_ date,
-rather than the last-modified date, with `(published)date`.)
-
-Following the name are zero or more Arguments ({{argument}}) or
-Properties ({{property}}), separated by either whitespace ({{whitespace}}) or a
-slash-escaped line continuation ({{line-continuation}}). Arguments and Properties
-may be interspersed in any order, much like is common with positional arguments
-vs options in command line tools. Collectively, Arguments and Properties may be
-referred to as "Entries".
-
-Children ({{children-block}}) can be placed after the name and the optional
-Entries, possibly separated by either whitespace or a
-slash-escaped line continuation.
-
-Arguments are ordered relative to each other and that order must be preserved in
-order to maintain the semantics. Properties between Arguments do not affect
-Argument ordering.
-
-By contrast, Properties _SHOULD NOT_ be assumed to be presented in a given
-order. Children ({{children-block}}) should be used if an order-sensitive
-key/value data structure must be represented in KDL. Cf. JSON objects
-preserving key order.
-
-Nodes _MAY_ be prefixed with Slashdash ({{slashdash-comments}}) to "comment out"
-the entire node, including its properties, arguments, and children, and make
-it act as plain whitespace, even if it spreads across multiple lines.
-
-Finally, a node is terminated by either a Newline ({{newline}}), a semicolon
-(`;`), the end of its parent's child block (`}`) or the end of the file/stream
-(an `EOF`).
-
-### Example
-
-~~~kdl
-// `foo` will have an Argument value list like `[1, 3]`.
-foo 1 key=val 3 {
-    bar
-    (role)baz 1 2
-}
-~~~
-
-## Line Continuation
-
-Line continuations allow Nodes ({{node}}) to be spread across multiple lines.
-
-A line continuation is a `\` character followed by zero or more whitespace
-items (including multiline comments) and an optional single-line comment. It
-must be terminated by a Newline ({{newline}}) (including the Newline that is
-part of single-line comments).
-
-Following a line continuation, processing of a Node can continue as usual.
-
-### Example
-
-~~~kdl
-my-node 1 2 \  // comments are ok after \
-        3 4    // This is the actual end of the Node.
-~~~
-
-## Property
-
-A Property is a key/value pair attached to a Node ({{node}}). A Property is
-composed of a String ({{string}}), followed immediately by an equals sign (`=`, `U+003D`),
-and then a Value ({{value}}).
-
-Properties should be interpreted left-to-right, with rightmost properties with
-identical names overriding earlier properties. That is:
-
-~~~kdl
-node a=1 a=2
-~~~
-
-In this example, the node's `a` value must be `2`, not `1`.
-
-No other guarantees about order should be expected by implementers.
-Deserialized representations may iterate over properties in any order and
-still be spec-compliant.
-
-Properties _MAY_ be prefixed with `/-` to "comment out" the entire token and
-make it act as plain whitespace, even if it spreads across multiple lines.
-
-## Argument
-
-An Argument is a bare Value ({{value}}) attached to a Node ({{node}}), with no
-associated key. It shares the same space as Properties ({{property}}), and may be interleaved with them.
-
-A Node may have any number of Arguments, which should be evaluated left to
-right. KDL implementations _MUST_ preserve the order of Arguments relative to
-each other (not counting Properties).
-
-Arguments _MAY_ be prefixed with `/-` to "comment out" the entire token and
-make it act as plain whitespace, even if it spreads across multiple lines.
-
-### Example
-
-~~~kdl
-my-node 1 2 3 a b c
-~~~
-
-## Children Block
-
-A children block is a block of Nodes ({{node}}), surrounded by `{` and `}`. They
-are an optional part of nodes, and create a hierarchy of KDL nodes.
-
-Regular node termination rules apply, which means multiple nodes can be
-included in a single-line children block, as long as they're all terminated by
-`;`.
-
-### Example
-
-~~~kdl
-parent {
-    child1
-    child2
-}
-
-parent { child1; child2 }
-~~~
-
-## Value
-
-A value is either: a String ({{string}}), a Number ({{number}}), a
-Boolean ({{boolean}}), or Null ({{null}}).
-
-Values _MUST_ be either Arguments ({{argument}}) or values of
-Properties ({{property}}). Only String ({{string}}) values may be used as
-Node ({{node}}) names or Property ({{property}}) keys.
-
-Values (both as arguments and in properties) _MAY_ be prefixed by a single
-Type Annotation ({{type-annotation}}).
-
-## Type Annotation
-
-A type annotation is a prefix to any Node Name ({{node}}) or Value ({{value}}) that
-includes a _suggestion_ of what type the value is _intended_ to be treated as,
-or as a _context-specific elaboration_ of the more generic type the node name
-indicates.
-
-Type annotations are written as a set of `(` and `)` with a single
-String ({{string}}) in it. It may contain Whitespace after the `(` and before
-the `)`, and may be separated from its target by Whitespace.
-
-KDL does not specify any restrictions on what implementations might do with
-these annotations. They are free to ignore them, or use them to make decisions
-about how to interpret a value.
-
-Additionally, the following type annotations MAY be recognized by KDL parsers
-and, if used, SHOULD interpret these types as follows:
-
-### Reserved Type Annotations for Numbers Without Decimals
-
-Signed integers of various sizes (the number is the bit size):
-
-- `i8`
-- `i16`
-- `i32`
-- `i64`
-- `i128`
-
-Unsigned integers of various sizes (the number is the bit size):
-
-- `u8`
-- `u16`
-- `u32`
-- `u64`
-- `u128`
-
-Platform-dependent integer types, both signed and unsigned:
-
-- `isize`
-- `usize`
-
-### Reserved Type Annotations for Numbers With Decimals
-
-IEEE 754 floating point numbers, both single (32) and double (64) precision:
-
-- `f32`
-- `f64`
-
-IEEE 754-2008 decimal floating point numbers
-
-- `decimal64`
-- `decimal128`
-
-### Reserved Type Annotations for Strings
-
-- `date-time`: ISO8601 date/time format.
-- `time`: "Time" section of ISO8601.
-- `date`: "Date" section of ISO8601.
-- `duration`: ISO8601 duration format.
-- `decimal`: IEEE 754-2008 decimal string format.
-- `currency`: ISO 4217 currency code.
-- `country-2`: ISO 3166-1 alpha-2 country code.
-- `country-3`: ISO 3166-1 alpha-3 country code.
-- `country-subdivision`: ISO 3166-2 country subdivision code.
-- `email`: RFC5322 email address.
-- `idn-email`: RFC6531 internationalized email address.
-- `hostname`: RFC1123 internet hostname (only ASCII segments)
-- `idn-hostname`: RFC5890 internationalized internet hostname
-  (only `xn--`-prefixed ASCII "punycode" segments, or non-ASCII segments)
-- `ipv4`: RFC2673 dotted-quad IPv4 address.
-- `ipv6`: RFC2373 IPv6 address.
-- `url`: RFC3986 URI.
-- `url-reference`: RFC3986 URI Reference.
-- `irl`: RFC3987 Internationalized Resource Identifier.
-- `irl-reference`: RFC3987 Internationalized Resource Identifier Reference.
-- `url-template`: RFC6570 URI Template.
-- `uuid`: RFC4122 UUID.
-- `regex`: Regular expression. Specific patterns may be implementation-dependent.
-- `base64`: A Base64-encoded string, denoting arbitrary binary data.
-- `base85`: An [Ascii85](https://en.wikipedia.org/wiki/Ascii85)-encoded string, denoting arbitrary binary data.
-
-### Examples
-
-~~~kdl
-node (u8)123
-node prop=(regex).*
-(published)date "1970-01-01"
-(contributor)person name="Foo McBar"
-~~~
-
-## String
-
-Strings in KDL represent textual UTF-8 Values ({{value}}). A String is either an
-Identifier String ({{identifier-string}}) (like `foo`), a
-Quoted String ({{quoted-string}}) (like `"foo"`)
-or a Multi-Line String ({{multi-line-string}}).
-Both Quoted and Multiline strings come in normal
-and Raw String ({{raw-string}}) variants (like `#"foo"#`):
-
-- Identifier Strings let you write short, "single-word" strings with a
-  minimum of syntax
-- Quoted Strings let you write strings "like normal", with whitespace and escapes.
-- Multi-Line Strings let you write strings across multiple lines
-  and with indentation that's not part of the string value.
-- Raw Strings don't allow any escapes,
-  allowing you to not worry about the string's content containing anything that
-  might look like an escape.
-
-Strings _MUST_ be represented as UTF-8 values.
-
-Strings _MUST NOT_ include the code points for
-disallowed literal code points ({{disallowed-literal-code-points}}) directly.
-Quoted and Multi-Line Strings may include these code points as _values_
-by representing them with their corresponding `\u{...}` escape.
-
-## Identifier String
-
-An Identifier String (sometimes referred to as just an "identifier") is
-composed of any [Unicode Scalar
-Value](https://unicode.org/glossary/#unicode_scalar_value) other than
-non-initial characters ({{non-initial-characters}}), followed by any number of
-Unicode Scalar Values other than non-identifier
-characters ({{non-identifier-characters}}).
-
-A handful of patterns are disallowed, to avoid confusion with other values:
-
-- idents that appear to start with a Number ({{number}}) (like `1.0v2` or
-  `-1em`) or the "almost a number" pattern of a decimal point without a
-  leading digit (like `.1`).
-- idents that are the language keywords (`inf`, `-inf`, `nan`, `true`,
-  `false`, and `null`) without their leading `#`.
-
-Identifiers that match these patterns _MUST_ be treated as a syntax error; such
-values can only be written as quoted or raw strings. The precise details of the
-identifier syntax is specified in the Full Grammar in {{full-grammar}}.
-
-### Non-initial characters
-
-The following characters cannot be the first character in an
-Identifier String ({{identifier-string}}):
-
-- Any decimal digit (0-9)
-- Any non-identifier characters ({{non-identifier-characters}})
-
-Additionally, the following initial characters impose limitations on subsequent
-characters:
-
-- the `+` and `-` characters can only be used as an initial character if
-  the second character is _not_ a digit. If the second character is `.`, then
-  the third character must _not_ be a digit.
-- the `.` character can only be used as an initial character if
-  the second character is _not_ a digit.
-
-This allows identifiers to look like `--this` or `.md`, and removes the
-ambiguity of having an identifier look like a number.
-
-### Non-identifier characters
-
-The following characters cannot be used anywhere in a Identifier String ({{identifier-string}}):
-
-- Any of `(){}[]/\"#;=`
-- Any Whitespace ({{whitespace}}) or Newline ({{newline}}).
-- Any disallowed literal code points ({{disallowed-literal-code-points}}) in KDL
-  documents.
-
-## Quoted String
-
-A Quoted String is delimited by `"` on either side of any number of literal
-string characters except unescaped `"` and `\`.
-
-Literal Newline ({{newline}}) characters can only be included
-if they are Escaped Whitespace ({{escaped-whitespace}}),
-which discards them from the string value.
-Actually including a newline in the value requires using a newline escape sequence,
-like `\n`,
-or using a Multi-Line String ({{multi-line-string}})
-which is actually designed for strings stretching across multiple lines.
-
-Like Identifier Strings, Quoted Strings _MUST NOT_ include any of the
-disallowed literal code-points ({{disallowed-literal-code-points}}) as code
-points in their body.
-
-Quoted Strings have a Raw String ({{raw-string}}) variant,
-which disallows escapes.
-
-### Escapes
-
-In addition to literal code points, a number of "escapes" are supported in Quoted Strings.
-"Escapes" are the character `\` followed by another character, and are -interpreted as described in the following table: - -| Name | Escape | Code Pt | -|-------------------------------|--------|----------| -| Line Feed | `\n` | `U+000A` | -| Carriage Return | `\r` | `U+000D` | -| Character Tabulation (Tab) | `\t` | `U+0009` | -| Reverse Solidus (Backslash) | `\\` | `U+005C` | -| Quotation Mark (Double Quote) | `\"` | `U+0022` | -| Backspace | `\b` | `U+0008` | -| Form Feed | `\f` | `U+000C` | -| Space | `\s` | `U+0020` | -| Unicode Escape | `\u{(1-6 hex chars)}` | Code point described by hex characters, as long as it represents a [Unicode Scalar Value](https://unicode.org/glossary/#unicode_scalar_value) | -| Whitespace Escape | See below | N/A | - -#### Escaped Whitespace - -In addition to escaping individual characters, `\` can also escape whitespace. -When a `\` is followed by one or more literal whitespace characters, the `\` -and all of that whitespace are discarded. For example, - -~~~kdl -"Hello World" -~~~ - -and - -~~~kdl -"Hello \ World" -~~~ - -are semantically identical. See whitespace ({{whitespace}}) -and newlines ({{newline}}) for how whitespace is defined. - -Note that only literal whitespace is escaped; whitespace escapes (`\n` and -such) are retained. For example, these strings are all semantically identical: - -~~~kdl -"Hello\ \nWorld" - - "Hello\n\ - World" - -"Hello\nWorld" - -""" - Hello - World - """ -~~~ - -#### Invalid escapes - -Except as described in the escapes table, above, `\` _MUST NOT_ precede any -other characters in a string. - -## Multi-line String - -Multi-Line Strings support multiple lines with literal, non-escaped -Newlines. They must use a special multi-line syntax, and they automatically -"dedent" the string, allowing its value to be indented to a visually matching -level as desired. - -A Multi-Line String is opened and closed by _three_ double-quote characters, -like `"""`. 
-Its first line _MUST_ immediately start with a Newline ({{newline}}) -after its opening `"""`. -Its final line _MUST_ contain only whitespace -before the closing `"""`. -All in-between lines that contain non-newline, non-whitespace characters -_MUST_ start with _at least_ the exact same whitespace as the final line -(precisely matching codepoints, not merely counting characters or "size"); -they may contain additional whitespace following this prefix. The lines in -between may contain unescaped `"` (but no unescaped `"""` as this would close -the string). - -The value of the Multi-Line String omits the first and last Newline, the -Whitespace of the last line, and the matching Whitespace prefix on all -intermediate lines. The first and last Newline can be the same character (that -is, empty multi-line strings are legal). - -In other words, the final line specifies the whitespace prefix that will be -removed from all other lines. - -Whitespace-only lines (that is, lines containing only literal whitespace -characters, not including whitespace escapes like `\t`) always represent -empty lines in the string value, regardless of what whitespace they -contain (if any). They do not have to start with the same whitespace prefix -that other lines do; all characters on the line are ignored. - -Multi-line Strings that do not immediately start with a Newline and whose final -`"""` is not preceded by optional whitespace and a Newline are illegal. This -also means that `"""` may not be used for a single-line String (e.g. -`"""foo"""`). - -### Newline Normalization - -Literal Newline sequences in Multi-line Strings must be normalized to a single -`U+000A` (`LF`) during deserialization. This means, for example, that `CR LF` -becomes a single `LF` during parsing. - -This normalization does not apply to non-literal Newlines entered using escape -sequences. 
That is: - -~~~kdl -multi-line """ - \r\n[CRLF] - foo[CRLF] - """ -~~~ - -becomes: - -~~~kdl -single-line "\r\n\nfoo" -~~~ - -For clarity: this normalization applies to each individual Newline sequence. -That is, the literal sequence `CRLF CRLF` becomes `LF LF`, not `LF`. - -### Examples - -#### Indented multi-line string - -~~~kdl -multi-line """ - foo - This is the base indentation - bar - """ -~~~ - -This example's string value will be: - -~~~ - foo -This is the base indentation - bar -~~~ - -which is equivalent to - -~~~kdl -" foo\nThis is the base indentation\n bar" -~~~ - -when written as a single-line string. - -#### Shorter last-line indent - -If the last line wasn't indented as far, -it won't dedent the rest of the lines as much: - -~~~kdl -multi-line """ - foo - This is no longer on the left edge - bar - """ -~~~ - -This example's string value will be: - -~~~ - foo - This is no longer on the left edge - bar -~~~ - -Equivalent to - -~~~kdl -" foo\n This is no longer on the left edge\n bar" -~~~ - -#### Empty lines - -Empty lines can contain any whitespace, or none at all, and will be reflected as empty in the value: - -~~~kdl -multi-line """ - Indented a bit - - A second indented paragraph. - """ -~~~ - -This example's string value will be: - -~~~ -Indented a bit. - -A second indented paragraph. -~~~ - -Equivalent to - -~~~kdl -"Indented a bit.\n\nA second indented paragraph." -~~~ - -#### Syntax errors - -The following yield **syntax errors**: - -~~~kdl -multi-line """can't be single line""" -~~~ - -~~~kdl -multi-line """ - closing quote with non-whitespace prefix""" -~~~ - -~~~kdl -multi-line """stuff - """ -~~~ - -~~~kdl -// Every line must share the exact same prefix as the closing line. -multi-line """[\n] -[tab]a[\n] -[space][space]b[\n] -[space][tab][\n] -[tab]""" -~~~ - -### Interaction with Whitespace Escapes - -Multi-line strings support the same mechanism for escaping whitespace as Quoted -Strings. 
- -When processing a Multi-line String, implementations MUST dedent the string -_after_ resolving all whitespace escapes, but _before_ resolving other backslash -escapes. This means a whitespace escape that attempts to escape the final line's -newline and/or whitespace prefix can be invalid: if removing escaped whitespace -places the closing `"""` on a line with non-whitespace characters, this escape -is invalid. - -For example, the following is illegal: - -~~~kdl - """ - foo - bar\ - """ - - // equivalent to - """ - foo - bar""" -~~~ - -while the following is allowed: - -~~~kdl - """ - foo \ -bar - baz - \ """ - - // equivalent to - """ - foo bar - baz - """ -~~~ - -## Raw String - -Both Quoted ({{quoted-string}}) and Multi-Line Strings ({{multi-line-string}}) have -Raw String variants, which are identical in syntax except they do not support -`\`-escapes. This includes line-continuation escapes (`\` + `ws` collapsing to -nothing). They otherwise share the same properties as far as literal -Newline ({{newline}}) characters go, multi-line rules, and the requirement of -UTF-8 representation. - -The Raw String variants are indicated by preceding the string's opening quotes -with one or more `#` characters. The string is then closed by its normal closing -quotes, followed by a _matching_ number of `#` characters. This means that the -string may contain any combination of `"` and `#` characters other than its -closing delimiter (e.g., if a raw string starts with `##"`, it can contain `"` -or `"#`, but not `"##` or `"###`). - -Like other Strings, Raw Strings _MUST NOT_ include any of the disallowed -literal code-points ({{disallowed-literal-code-points}}) as code points in their -body. Unlike with Quoted Strings, these cannot simply be escaped, and are thus -unrepresentable when using Raw Strings. - -### Example - -~~~kdl -just-escapes #"\n will be literal"# -~~~ - -The string contains the literal characters `\n will be literal`.
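A writer has to pick a `#` count that the content cannot close early. A minimal sketch, following only the closing-delimiter rule described in this section: find the longest run of `#` that directly follows a `"` in the content and use one more. The names `RawStringSketch`, `RequiredHashes`, and `Wrap` are hypothetical. The sketch handles single-line content only and does not check for newlines, disallowed code points, or content that begins with `""`.

~~~csharp
using System;

// Hypothetical helper; derived from the closing-delimiter rule, not from any library.
public static class RawStringSketch
{
    // Number of '#' characters needed so that no `"` + hashes sequence
    // inside the content can be mistaken for the closing delimiter.
    public static int RequiredHashes(string content)
    {
        int longest = 0;
        for (int i = 0; i < content.Length; i++)
        {
            if (content[i] != '"') continue;

            // Count the '#' characters immediately after this quote.
            int run = 0;
            while (i + 1 + run < content.Length && content[i + 1 + run] == '#')
            {
                run++;
            }
            longest = Math.Max(longest, run);
        }
        return longest + 1;
    }

    // Wraps single-line content in matching raw-string delimiters.
    public static string Wrap(string content)
    {
        string hashes = new string('#', RequiredHashes(content));
        return hashes + "\"" + content + "\"" + hashes;
    }
}
~~~

For content without any `"` this yields a single `#`, as in the `just-escapes` example above; content containing `"#` needs two.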
- -~~~kdl -quotes-and-escapes ##"hello\n\r\asd"#world"## -~~~ - -The string contains the literal characters `hello\n\r\asd"#world` - -~~~kdl -raw-multi-line #""" - Here's a """ - multiline string - """ - without escapes. - """# -~~~ - -The string contains the value - -~~~ -Here's a """ - multiline string - """ -without escapes. -~~~ - -or equivalently, - -~~~kdl -"Here's a \"\"\"\n multiline string\n \"\"\"\nwithout escapes." -~~~ - -as a Quoted String. - -## Number - -Numbers in KDL represent numerical Values ({{value}}). There is no logical distinction in KDL -between real numbers, integers, and floating point numbers. It's up to -individual implementations to determine how to represent KDL numbers. - -There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary. - -- All non-Keyword ({{keyword-numbers}}) numbers may optionally start with one of `-` or `+`, which determine whether they'll be positive or negative. -- Binary numbers start with `0b` and only allow `0` and `1` as digits, which may be separated by `_`. They represent numbers in radix 2. -- Octal numbers start with `0o` and only allow digits between `0` and `7`, which may be separated by `_`. They represent numbers in radix 8. -- Hexadecimal numbers start with `0x` and allow digits between `0` and `9`, as well as letters `A` through `F`, in either lower or upper case, which may be separated by `_`. They represent numbers in radix 16. -- Decimal numbers are a bit more special: - - They have no radix prefix. - - They use digits `0` through `9`, which may be separated by `_`. - - They may optionally include a decimal separator `.`, followed by more digits, which may again be separated by `_`. - - They may optionally be followed by `E` or `e`, an optional `-` or `+`, and more digits, to represent an exponent value. 
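The number syntaxes listed above can be recognized with one anchored pattern per form. The sketch below mirrors the `decimal`, `hex`, `octal`, `binary`, and `keyword-number` productions of the Full Grammar. `NumberKind` and `NumberSyntax` are hypothetical names, and the code only classifies text; it does not compute a numeric value.

~~~csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical classification result for this sketch.
public enum NumberKind { NotANumber, Keyword, Decimal, Hexadecimal, Octal, Binary }

public static class NumberSyntax
{
    // Each pattern requires a digit first; underscores may only follow digits.
    private static readonly Regex Hex = new Regex(@"\A[+-]?0x[0-9a-fA-F][0-9a-fA-F_]*\z");
    private static readonly Regex Octal = new Regex(@"\A[+-]?0o[0-7][0-7_]*\z");
    private static readonly Regex Binary = new Regex(@"\A[+-]?0b[01][01_]*\z");
    private static readonly Regex Decimal = new Regex(
        @"\A[+-]?[0-9][0-9_]*(?:\.[0-9][0-9_]*)?(?:[eE][+-]?[0-9][0-9_]*)?\z");

    public static NumberKind Classify(string text)
    {
        // Keyword numbers carry no sign prefix of their own.
        if (text == "#inf" || text == "#-inf" || text == "#nan") return NumberKind.Keyword;
        if (Hex.IsMatch(text)) return NumberKind.Hexadecimal;
        if (Octal.IsMatch(text)) return NumberKind.Octal;
        if (Binary.IsMatch(text)) return NumberKind.Binary;
        if (Decimal.IsMatch(text)) return NumberKind.Decimal;
        return NumberKind.NotANumber;
    }
}
~~~

With these patterns, `1___2` and `12____` classify as decimal, while `_12`, `.1`, and `0x_1a` are not numbers.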
- -In all cases where the above says that digits "may be separated by `_`", -this means that between any two digits, or after the digits, any number of -consecutive `_` characters can appear. Underscores are not allowed _before_ the digits. -That is, `1___2` and `12____` are valid (and both equivalent to just `12`), but -`_12` is _not_ a valid number (it will instead parse as an identifier string), -nor is `0x_1a` (it will simply be invalid). - -Note that, similar to JSON and some other languages, -numbers without an integer digit (such as `.1`) are illegal. -They must be written with at least one integer digit, like `0.1`. -(These patterns are also disallowed from Identifier Strings ({{identifier-string}}), to avoid confusion.) - -### Keyword Numbers - -There are three special "keyword" numbers included in KDL to accommodate the -widespread use of [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floats: - -- `#inf` - floating point positive infinity. -- `#-inf` - floating point negative infinity. -- `#nan` - floating point NaN/Not a Number. - -To go along with this and prevent foot guns, the bare Identifier -Strings ({{identifier-string}}) `inf`, `-inf`, and `nan` are considered illegal -identifiers and should yield a syntax error. - -The existence of these keywords does not imply that any numbers be represented -as IEEE 754 floats. These are simply for clarity and convenience for any -implementation that chooses to represent their numbers in this way. - -## Boolean - -A boolean Value ({{value}}) is either the symbol `#true` or `#false`. These -_SHOULD_ be represented by implementation as boolean logical values, or some -approximation thereof. - -### Example - -~~~kdl -my-node #true value=#false -~~~ - -## Null - -The symbol `#null` represents a null Value ({{value}}). It's up to the -implementation to decide how to represent this, but it generally signals the -"absence" of a value. 
- -### Example - -~~~kdl -my-node #null key=#null -~~~ - -## Whitespace - -The following characters should be treated as non-Newline ({{newline}}) [white -space](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt): - -| Name | Code Pt | -| ------------------------- | -------- | -| Character Tabulation | `U+0009` | -| Space | `U+0020` | -| No-Break Space | `U+00A0` | -| Ogham Space Mark | `U+1680` | -| En Quad | `U+2000` | -| Em Quad | `U+2001` | -| En Space | `U+2002` | -| Em Space | `U+2003` | -| Three-Per-Em Space | `U+2004` | -| Four-Per-Em Space | `U+2005` | -| Six-Per-Em Space | `U+2006` | -| Figure Space | `U+2007` | -| Punctuation Space | `U+2008` | -| Thin Space | `U+2009` | -| Hair Space | `U+200A` | -| Narrow No-Break Space | `U+202F` | -| Medium Mathematical Space | `U+205F` | -| Ideographic Space | `U+3000` | - -### Single-line comments - -Any text after `//`, until the next literal Newline ({{newline}}) is "commented -out", and is considered to be Whitespace ({{whitespace}}). - -### Multi-line comments - -In addition to single-line comments using `//`, comments can also be started -with `/*` and ended with `*/`. These comments can span multiple lines. They -are allowed in all positions where Whitespace ({{whitespace}}) is allowed and -can be nested. - -### Slashdash comments - -Finally, a special kind of comment called a "slashdash", denoted by `/-`, can -be used to comment out entire _components_ of a KDL document logically, and -have those elements not be included as part of the parsed document data. - -Slashdash comments can be used before the following, including before their type -annotations, if present: - -- A Node ({{node}}): the entire Node is treated as Whitespace, including all - props, args, and children. -- An Argument ({{argument}}): the Argument value is treated as Whitespace. -- A Property ({{property}}) key: the entire property, including both key and value, - is treated as Whitespace. 
A slashdash of just the property value is not allowed. -- A Children Block ({{children-block}}): the entire block, including all - children within, is treated as Whitespace. Only other children blocks, whether - slashdashed or not, may follow a slashdashed children block. - -A slashdash may be followed by any amount of whitespace, including newlines and -comments (other than other slashdashes), before the element that it comments out. - -## Newline - -The following character sequences [should be treated as new -lines](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G41643): - -| Acronym | Name | Code Pt | -| ------- | ----------------------------- | ------------------- | -| CRLF | Carriage Return and Line Feed | `U+000D` + `U+000A` | -| CR | Carriage Return | `U+000D` | -| LF | Line Feed | `U+000A` | -| NEL | Next Line | `U+0085` | -| VT | Vertical Tab | `U+000B` | -| FF | Form Feed | `U+000C` | -| LS | Line Separator | `U+2028` | -| PS | Paragraph Separator | `U+2029` | - -Note that for the purpose of new lines, the specific sequence `CRLF` is -considered _a single newline_. - -## Disallowed Literal Code Points - -The following code points may not appear literally anywhere in the document. -They may be represented in Strings (but not Raw Strings) using Unicode Escapes ({{escapes}}) (`\u{...}`, -except for code points that are not Unicode Scalar Values, which can't be represented even as escapes). - -- The codepoints `U+0000-0008` or the codepoints `U+000E-001F` (various - control characters). -- `U+007F` (the Delete control character). -- Any codepoint that is not a [Unicode Scalar - Value](https://unicode.org/glossary/#unicode_scalar_value) (`U+D800-DFFF`). -- `U+200E-200F`, `U+202A-202E`, and `U+2066-2069`, the [unicode - "direction control" - characters](https://www.w3.org/International/questions/qa-bidi-unicode-controls) -- `U+FEFF`, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM), - except as the first code point in a document.
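The list above translates directly into a range check. A minimal sketch, assuming the document is already held as a .NET `string` (UTF-16); `DisallowedCodePoints`, `IsDisallowed`, and `IndexOfDisallowed` are hypothetical names chosen for illustration.

~~~csharp
using System;

// Hypothetical helper mirroring the Disallowed Literal Code Points list.
public static class DisallowedCodePoints
{
    public static bool IsDisallowed(int codePoint, bool isFirst)
    {
        if (codePoint >= 0x0000 && codePoint <= 0x0008) return true; // control characters
        if (codePoint >= 0x000E && codePoint <= 0x001F) return true; // control characters
        if (codePoint == 0x007F) return true;                        // Delete
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return true; // not a scalar value
        if (codePoint >= 0x200E && codePoint <= 0x200F) return true; // direction control
        if (codePoint >= 0x202A && codePoint <= 0x202E) return true; // direction control
        if (codePoint >= 0x2066 && codePoint <= 0x2069) return true; // direction control
        if (codePoint == 0xFEFF) return !isFirst;                    // BOM only at the start
        return false;
    }

    // Returns the UTF-16 index of the first disallowed code point, or -1 if none.
    public static int IndexOfDisallowed(string document)
    {
        int i = 0;
        while (i < document.Length)
        {
            // A valid surrogate pair is one scalar value; a lone surrogate
            // falls into the U+D800-DFFF range and is reported as disallowed.
            bool pair = char.IsSurrogatePair(document, i);
            int codePoint = pair ? char.ConvertToUtf32(document, i) : document[i];

            if (IsDisallowed(codePoint, i == 0)) return i;
            i += pair ? 2 : 1;
        }
        return -1;
    }
}
~~~

A parser could run such a scan before tokenizing, or fold the check into its character-level rules. Either way, a leading `U+FEFF` passes and any later one is reported.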
- -# Full Grammar - -This is the full official grammar for KDL and should be considered -authoritative if something seems to disagree with the text above. The grammar -language syntax is defined in {{grammar-language}}. - -~~~abnf -document := bom? version? nodes - -// Nodes -nodes := (line-space* node)* line-space* - -base-node := slashdash? type? node-space* string - (node-space* (node-space | slashdash) node-prop-or-arg)* - // slashdashed node-children must always be after props and args. - (node-space* slashdash node-children)* - (node-space* node-children)? - (node-space* slashdash node-children)* - node-space* -node := base-node node-terminator -final-node := base-node node-terminator? - -// Entries -node-prop-or-arg := prop | value -node-children := '{' nodes final-node? '}' -node-terminator := single-line-comment | newline | ';' | eof - -prop := string node-space* '=' node-space* value -value := type? node-space* (string | number | keyword) -type := '(' node-space* string node-space* ')' - -// Strings -string := identifier-string | quoted-string | raw-string ¶ - -identifier-string := - (unambiguous-ident | signed-ident | dotted-ident) - - disallowed-keyword-identifiers -unambiguous-ident := - (identifier-char - digit - sign - '.') identifier-char* -signed-ident := - sign ((identifier-char - digit - '.') identifier-char*)? -dotted-ident := - sign? '.' ((identifier-char - digit) identifier-char*)? -identifier-char := - unicode - unicode-space - newline - [\\/(){};\[\]"#=] - - disallowed-literal-code-points -disallowed-keyword-identifiers := - 'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan' - -quoted-string := - '"' single-line-string-body '"' | - '"""' newline - (multi-line-string-body newline)? - (unicode-space | ws-escape)* '"""' -single-line-string-body := (string-character - newline)* -multi-line-string-body := (('"' | '""')? 
string-character)* -string-character := - '\\' (["\\bfnrts] | - 'u{' hex-unicode '}') | - ws-escape | - [^\\"] - disallowed-literal-code-points -ws-escape := '\\' (unicode-space | newline)+ -hex-digit := [0-9a-fA-F] -hex-unicode := hex-digit{1, 6} - surrogate - above-max-scalar -surrogate := [0]{0, 2} [dD] [8-9a-fA-F] hex-digit{2} -// U+D800-DFFF: D 8 00 -// D F FF -above-max-scalar = [2-9a-fA-F] hex-digit{5} | - [1] [1-9a-fA-F] hex-digit{4} - - -raw-string := '#' raw-string-quotes '#' | '#' raw-string '#' -raw-string-quotes := - '"' single-line-raw-string-body '"' | - '"""' newline - (multi-line-raw-string-body newline)? - unicode-space* '"""' -single-line-raw-string-body := - '' | - (single-line-raw-string-char - '"') - single-line-raw-string-char*? | - '"' (single-line-raw-string-char - '"') - single-line-raw-string-char*? -single-line-raw-string-char := - unicode - newline - disallowed-literal-code-points -multi-line-raw-string-body := - (unicode - disallowed-literal-code-points)*? - -// Numbers -number := keyword-number | hex | octal | binary | decimal - -decimal := sign? integer ('.' integer)? exponent? -exponent := ('e' | 'E') sign? integer -integer := digit (digit | '_')* -digit := [0-9] -sign := '+' | '-' - -hex := sign? '0x' hex-digit (hex-digit | '_')* -octal := sign? '0o' [0-7] [0-7_]* -binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')* - -// Keywords and booleans. 
-keyword := boolean | '#null' -keyword-number := '#inf' | '#-inf' | '#nan' -boolean := '#true' | '#false' - -// Specific code points -bom := '\u{FEFF}' -disallowed-literal-code-points := - See Table (Disallowed Literal Code Points) -unicode := Any Unicode Scalar Value -unicode-space := See Table - (All White_Space unicode characters which are not `newline`) - -// Comments -single-line-comment := '//' ^newline* (newline | eof) -multi-line-comment := '/*' commented-block -commented-block := - '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block -slashdash := '/-' line-space* - -// Whitespace -ws := unicode-space | multi-line-comment -escline := '\\' ws* (single-line-comment | newline | eof) -newline := See Table (All Newline White_Space) -// Whitespace where newlines are allowed. -line-space := node-space | newline | single-line-comment -// Whitespace within nodes, -// where newline-ish things must be esclined. -node-space := ws* escline ws* | ws+ - -// Version marker -version := - '/-' unicode-space* 'kdl-version' unicode-space+ ('1' | '2') - unicode-space* newline -~~~ - -## Grammar language - -The grammar language syntax is a combination of ABNF with some regex spice thrown in. -Specifically: - -- Single quotes (`'`) are used to denote literal text. `\` within a literal - string is used for escaping other single-quotes, for initiating unicode - characters using hex values (`\u{FEFF}`), and for escaping `\` itself - (`\\`). -- `*` is used for "zero or more", `+` is used for "one or more", and `?` is - used for "zero or one". Per standard regex semantics, `*` and `+` are _greedy_; - they match as many instances as possible without failing the match. -- `*?` (used only in raw strings) indicates a _non-greedy_ match; - it matches as _few_ instances as possible without failing the match. -- `¶` is a _cut point_. It always matches and consumes no characters, - but once matched, the parser is not allowed to backtrack past that point in the source. 
- If a parser would rewind past the cut point, it must instead fail the overall parse, - as if it had run out of options. - (This is only used with the `raw-string` production, - to ensure the first instance of the appropriate closing quote sequence - is guaranteed to be the end of the raw string, - rather than allowing it to potentially consume more of the document unexpectedly.) -- `()` can be used to group matches that must be matched together. -- `a | b` means `a or b`, whichever matches first. If multiple items are before - a `|`, they are a single group. `a b c | d` is equivalent to `(a b c) | d`. -- `[]` are used for regex-style character matches, where any character between - the brackets will be a single match. `\` is used to escape `\`, `[`, and - `]`. They also support character ranges (`0-9`), and negation (`^`) -- `-` is used for "except for" or "minus" whatever follows it. For example, - `a - 'x'` means "any `a`, except something that matches the literal `'x'`". -- The prefix `^` means "something that does not match" whatever follows it. - For example, `^foo` means "must not match `foo`". -- A single definition may be split over multiple lines. Newlines are treated as - spaces. -- `//` followed by text on its own line is used as comment syntax. diff --git a/readme.md b/readme.md index 07ebe5f..d32b921 100644 --- a/readme.md +++ b/readme.md @@ -1,85 +1,398 @@ # Kuddle.Net +- [Kuddle.Net](#kuddlenet) + - [Quick Start](#quick-start) + - [Mapping C# Members](#mapping-c-members) + - [Advanced Composition](#advanced-composition) + - [Type Annotations and Validation](#type-annotations-and-validation) + - [Output Control and Formatting](#output-control-and-formatting) + - [Integrations](#integrations) + Kuddle.Net is a .NET implementation of a [KDL](https://kdl.dev) parser/serializer targeting [v2](https://kdl.dev/spec/) of the spec. KDL is a concise, human-readable language built for configuration and data exchange. 
Head to <https://kdl.dev> for more specifics on the KDL document language itself. -## Installation +## Quick Start + +Implement KDL v2 serialization in a .NET project. + +### Install Kuddle.Net + +Run the installation command in your project directory: -```text +```bash dotnet add package Kuddle.Net ``` -## Quick Start: Serialization & Deserialization +### Define a Model -For most use cases, `KdlSerializer` provides the easiest way to work with KDL data by mapping it directly to C# classes. +Create a class with a parameterless constructor. Kuddle.Net uses kebab-case for KDL node names by default. -### Deserializing KDL to Objects +```csharp +using Kuddle.Serialization; + +public class Plugin +{ + public string Name { get; set; } = string.Empty; + public string Version { get; set; } = "1.0.0"; +} +``` + +### Serialize and Deserialize + +Use the **KdlSerializer** static class for string-based operations. ```csharp using Kuddle.Serialization; -var kdl = """ - server "production" { - host "10.0.0.1" - port 8080 - } - """; +// Initialize data +var plugin = new Plugin { Name = "Kuddle", Version = "2.0.0" }; + +// 1. Convert Object to KDL String +string kdl = KdlSerializer.Serialize(plugin); +// Result: plugin name="Kuddle" version="2.0.0" -// Deserialize a single root node -var config = KdlSerializer.Deserialize<ServerConfig>(kdl); +// 2. Convert KDL String back to Object +var result = KdlSerializer.Deserialize<Plugin>(kdl); ``` -### Serializing Objects to KDL +### Understand KDL Structure + +KDL uses a node-based hierarchy. Use the following table to map KDL concepts to .NET types. + +| KDL Concept | .NET Equivalent | Example | +| :--- | :--- | :--- | +| **Node** | Class / POCO | `server { ... 
}` | +| **Argument** | Positional Value | `node "value"` | +| **Property** | Key-Value Pair | `node key="value"` | +| **Children** | Nested Objects/Collections | `node { child_node }` | +| **Annotation** | Type metadata | `(uuid)"..."` | + +--- + +## Mapping C# Members + +Control how C# properties map to KDL structures using attributes. + +### Configure Naming Conventions + +Kuddle.Net mandates **kebab-case** for implicit names. A property `SystemSettings` maps to node or key `system-settings`. + +Override naming by passing a string argument to mapping attributes: ```csharp -var myConfig = new ServerConfig { Host = "localhost", Port = 3000 }; -string kdl = KdlSerializer.Serialize(myConfig); +[KdlProperty("serial_NO")] +public string SerialNumber { get; set; } ``` -### Document-Level Deserialization +### Map Entry Types -If your KDL file contains multiple top-level nodes of the same type, use `DeserializeMany`: +KDL nodes store data in three slots. Use attributes to assign properties to specific slots: -```csharp -var kdl = """ - user "alice" role="admin" - user "bob" role="user" - """; +| Attribute | KDL Target | Mapping Logic | +| :--- | :--- | :--- | +| **[KdlProperty]** | Property | Key-value pairs: `key="value"`. | +| **[KdlArgument]** | Argument | Positional values: `node "value"`. | +| **[KdlNode]** | Child Node | Nested nodes or blocks: `node { child }`. | + +**Default Inference:** -var users = KdlSerializer.DeserializeMany(kdl); +- **Scalars** (int, string, bool, DateTime): Maps to **Properties**. +- **Complex Types / Collections**: Maps to **Child Nodes**. + +### Implement Positional Arguments + +Specify the 0-based index for positional values. + +```csharp +public record User( + [property: KdlArgument(0)] int Id, + [property: KdlArgument(1)] string Role +); +// Output: user 1 "admin" ``` +**The "Rest" Argument Constraint:** +Map a collection to an argument to capture all remaining values. 
+ +- **Requirement:** The collection argument must have the highest index in the class. +- **Uniqueness:** Only one collection argument is permitted per node. + +### Manage Null and Boolean Values + +Kuddle.Net follows KDL v2 strict type requirements for booleans and nulls. + +**Null Fidelity:** +Toggle `IgnoreNullValues` in `KdlSerializerOptions`: + +- **True (Default):** Omit null properties from output. +- **False:** Emit the `#null` literal. + +**Boolean Explicitness:** +KDL requires `#true` or `#false`. Bare identifiers like `true` are parsed as **KdlString**, not **KdlBool**. Kuddle.Net handles this conversion automatically for `System.Boolean` types. + --- -## Mapping with Attributes +## Advanced Composition + +Manage complex document hierarchies through collection strategies, member hoisting, and unmapped data capture. + +### Configure Collection Mapping + +Kuddle.Net provides two strategies for mapping `IEnumerable` properties. + +**Wrapped Collections (Default):** +The property name defines a parent node, and items appear as children. + +```csharp +[KdlNode("items")] +public List<string> Tags { get; set; } = ["net10", "kdl"]; +/* Output: +items { + - "net10" + - "kdl" +} +*/ +``` + +**Flattened Collections:** +Set `Flatten = true` to omit the container node and emit items as siblings. + +```csharp +[KdlNode("tag", Flatten = true)] +public List<string> Tags { get; set; } = ["net10"]; +/* Output: +tag "net10" +*/ +``` + +### Implement Member Hoisting + +Flatten complex objects to merge their properties into the parent node's scope. + +```csharp +public class Root { + [KdlNode(Flatten = true)] + public Metadata Info { get; set; } +} + +public class Metadata { + [KdlProperty] public string Author { get; set; } +} +// Result: root author="name" +``` + +**Constraint:** Flattening is restricted to collections and complex objects. Applying `Flatten = true` to a scalar type (e.g., `int`, `string`) throws `KdlConfigurationException`. 
+ +### Capture Unmapped Data + +Use `[KdlExtensionData]` to preserve KDL elements that do not match existing class members. + +**Requirements:** + +- Property type must be `IDictionary` or `IDictionary`. +- Unmapped properties are stored as native CLR types (`string`, `double`, `bool`). +- Unmapped nodes are stored as raw `KdlNode` AST objects. + +```csharp +public class Config { + [KdlExtensionData] + public Dictionary CatchAll { get; set; } +} +``` + +*Note: Elements prefixed with the slashdash `/-` are ignored by the parser and are not captured.* -To control how C# properties map to KDL arguments, properties, and child nodes, use the provided attributes. +### Select Root Mapping Strategy -| Attribute | Target | Purpose | -| ---------------------- | -------- | --------------------------------------------- | -| `[KdlArgument(index)]` | Property | Maps to a positional argument | -| `[KdlProperty(key)]` | Property | Maps to a `key="value"` property | -| `[KdlNode(name)]` | Property | Maps to a child node (or collection of nodes) | -| `[KdlType(name)]` | Class | Overrides the default node name | +Set the `RootMapping` property in `KdlSerializerOptions` to define top-level structure. -**[Detailed Attribute Documentation](docs/serialization-attributes.md)** +| Strategy | Description | Best Use Case | +| :--- | :--- | :--- | +| **AsNode** (Default) | Maps the object to one root node. | Data exchange / Storage. | +| **AsDocument** | Maps properties to top-level nodes. | Config files (e.g., `appsettings.kdl`). | --- -## Advanced Usage +## Type Annotations and Validation -### Lower-Level AST Access +Enforce type safety and data integrity using KDL v2 type annotations and reserved type validators. -If you need full control over the KDL structure, you can use `KdlReader` to get a `KdlDocument` AST. +### Utilize Standard Type Annotations + +Kuddle.Net automatically emits and resolves reserved KDL annotations for standard .NET types. 
+ +| .NET Type | KDL Annotation | Output Example | +| :--- | :--- | :--- | +| **Guid** | `(uuid)` | `(uuid)"550e...4000"` | +| **DateTimeOffset** | `(date-time)` | `(date-time)"2023-10-05T14:48:00Z"` | +| **DateOnly** | `(date)` | `(date)"2023-10-05"` | +| **TimeOnly** | `(time)` | `(time)"14:48:00"` | +| **TimeSpan** | `(duration)` | `(duration)"PT1H30M"` | + +### Enforce Numeric Precision + +Specify bit-widths for numeric entries using the `TypeAnnotation` property on mapping attributes. This ensures cross-platform compatibility for integer and floating-point types. + +```csharp +public class Metrics +{ + [KdlProperty(TypeAnnotation = "u8")] + public byte Priority { get; set; } + + [KdlProperty(TypeAnnotation = "f64")] + public double Velocity { get; set; } +} +// Output: priority=(u8)10 velocity=(f64)120.5 +``` + +### Configure Reserved Type Validation + +Enable `KdlReservedTypeValidator` to ensure values for specific identifiers (e.g., `ipv4`, `regex`, `base64`) conform to their format specifications. + +**Enable/Disable Validation:** +Modify `KdlReaderOptions` before parsing. Validation is enabled by default. + +```csharp +var options = new KdlReaderOptions { ValidateReservedTypes = true }; +var doc = KdlReader.Read(kdlText, options); +``` + +**Handle Validation Failures:** +Catch `KuddleValidationException` to inspect specific failures. This exception contains an `Errors` collection referencing the failing node and a descriptive message. ```csharp -KdlDocument doc = KdlReader.Read(kdlString); +try { + KdlReader.Read(kdlText); +} catch (KuddleValidationException ex) { + foreach (var err in ex.Errors) Console.WriteLine(err.Message); +} ``` -**[Lower-Level API Documentation](docs/low-level-api.md)** +### Map Enums + +Enums serialize as **bare strings** (unquoted identifiers). + +- **Serialization:** Emits the exact member name string. +- **Deserialization:** Performs case-insensitive matching against member names. 
+
+```csharp
+public enum Status { Active, Inactive }
+
+public class Account {
+    public Status State { get; set; }
+}
+// KDL: account state=Active
+```
 ---
-## License
+## Output Control and Formatting
+
+Manage KDL output structure and string representation via **KdlWriterOptions** and **KdlStringStyle**.
+
+### String Style Selection
+
+Kuddle.Net selects string formats based on character content and configured flags.
+
+| Style | Result | Selection Criteria |
+| :--- | :--- | :--- |
+| **Bare** | `name` | No spaces or reserved characters `()[]{}/\"#;=`. Cannot start with a digit. |
+| **Quoted** | `"name"` | Contains spaces or reserved characters. Standard escape sequences applied. |
+| **Multi-line** | `"""..."""` | Contains newlines. Requires `AllowMultiline` flag. |
+
+### Raw String Formatting
+
+Raw strings disable escape sequence processing. Use raw strings for Regex patterns, Windows paths, or content with heavy quotation marks.
+
+**Delimiter Calculation:**
+The writer identifies the longest consecutive sequence of `#` characters in the source string. It then wraps the string in `n+1` hashes to prevent premature termination.
+
+**Style Flags:**
+
+- **RawPaths:** Employs raw strings if the value contains `/` or `\`.
+- **PreferRaw:** Employs raw strings if the content requires escaping (e.g., internal quotes).
+
+### Indentation and Style Settings
+
+Configure document appearance through the **KdlWriterOptions** record.
+
+| Option | Values | Description |
+| :--- | :--- | :--- |
+| **IndentType** | `Spaces`, `Tabs` | Sets the character used for nesting. |
+| **IndentSize** | `Two`, `Four` | Sets the count of spaces per level (ignored for Tabs). |
+| **EscapeUnicode** | `true`, `false` | If `true`, non-ASCII characters emit as `\u{XXXX}`. |
+| **NewLine** | `\n` | Internal constant. All output uses LF line endings. |
+
+### Code Example: Formatting Configuration
+
+Apply custom formatting by passing options to the serializer.
+
+```csharp
+var options = new KdlSerializerOptions
+{
+    StringStyle = KdlStringStyle.RawPaths | KdlStringStyle.AllowBare,
+    Writer = new KdlWriterOptions
+    {
+        IndentType = KdlWriterIndentType.Spaces,
+        IndentSize = KdlWriterIndentSize.Two,
+        EscapeUnicode = true
+    }
+};
+
+string kdl = KdlSerializer.Serialize(myObject, options);
+```
+
+## Integrations
+
+### Microsoft.Extensions.Configuration
+
+The `Kuddle.Net.Extensions.Configuration` package enables KDL as a configuration source.
+
+**Installation:**
+
+```bash
+dotnet add package Kuddle.Net.Extensions.Configuration
+```
+
+**Implementation:**
+Add the KDL provider to the `ConfigurationBuilder`.
+
+```csharp
+using Kuddle.Extensions.Configuration;
+
+var config = new ConfigurationBuilder()
+    .AddKdlFile("appsettings.kdl", optional: false, reloadOnChange: true)
+    .Build();
+
+string connection = config["database:connection-string"];
+```
+
+### Configuration Key Mapping
+
+KDL document structures map to flattened .NET configuration keys using the following logic:
+
+| KDL Structure | Configuration Key | Example |
+| :--- | :--- | :--- |
+| **Nested Nodes** | Colon Separator | `server { port 80 }` → `server:port` |
+| **Anonymous Nodes (`-`)** | Numeric Index | `- "val"` → `:0`, `:1` |
+| **Node Arguments** | Numeric Index | `endpoints "a" "b"` → `endpoints:0`, `endpoints:1` |
+| **Properties** | Key Name | `node key="val"` → `node:key` |
+
+### Exception Reference
+
+Kuddle.Net uses specific exceptions for syntax and mapping failures.
+
+| Exception | Root Cause | Critical Properties |
+| :--- | :--- | :--- |
+| **KuddleParseException** | Syntax error in KDL source. | `Line`, `Column`, `Offset` |
+| **KuddleSerializationException** | CLR/KDL type mismatch. | `Message` |
+| **KuddleValidationException** | Reserved type format failure. | `Errors` (Collection) |
+| **KdlConfigurationException** | Invalid attribute configuration. | `Message` |
+
+### Diagnostic Coordinates
+
+`KuddleParseException` provides exact locations for syntax correction:
-Kuddle.Net is licensed under the MIT License.
+- **Line:** 1-based line number.
+- **Column:** 1-based column number.
+- **Offset:** 0-based character position from document start.
diff --git a/todo.md b/todo.md
deleted file mode 100644
index 491a5eb..0000000
--- a/todo.md
+++ /dev/null
@@ -1,104 +0,0 @@
-# TODO
-
-## Phase 1: Foundation & Metadata (The "Brain")
-
-*Before parsing a single byte, your system must understand the shape of your C# types.*
-
-* **1.1. Define the Attribute Suite**
-    * [ ] Create `KdlPropertyDictionaryAttribute`.
-    * [ ] Create `KdlNodeDictionaryAttribute` (with `string NodeName` property).
-    * [ ] Create `KdlKeyedNodesAttribute` (with `string NodeName` and `string KeyProperty`).
-    * [ ] Add `Enforce` bool to existing attributes (for future Strict Mode).
-
-* **1.2. Upgrade `KdlEntryMapping`**
-    * [ ] Update the record to detect the 3 new Dictionary attributes.
-    * [ ] Add logic to resolve the effective `Name` (handling the fallback to property names).
-    * [ ] Add a field to store `KeyPropertyName` (specifically for `KdlKeyedNodes`).
-
-* **1.3. Implement Type Inspection Utilities**
-    * [ ] Implement `TypeHelpers.GetDictionaryInfo(Type t)`: Returns `(bool IsDict, Type KeyType, Type ValueType)`. Must handle `class MyDict : Dictionary`.
-    * [ ] Implement `TypeHelpers.GetCollectionInfo(Type t)`: Returns `(bool IsCol, Type ElemType)`. Must handle Arrays and Lists.
-
-* **1.4. Build the `TypeMetadata` Validator**
-    * [ ] Implement **Attribute Exclusion Check**: Throw if a property has both `[KdlProperty]` and `[KdlNode]`.
-    * [ ] Implement **Contiguity Check**: Throw if `[KdlArgument]` indices have gaps (e.g., 0, 2).
-    * [ ] Implement **Bucketing**: Pre-sort mappings into `Arguments`, `Properties`, `Children`, and `Dictionaries` lists so the parser doesn't scan attributes at runtime.
-
----
-
-## Phase 2: Core Logic Refactoring (The "Muscle")
-
-*Update the main loop to use your new Metadata instead of raw reflection.*
-
-* **2.1. Refactor `Deserialize`**
-    * [ ] Change the entry point to look up `TypeMetadata.For()`.
-    * [ ] Replace current attribute lookups with loops over the pre-calculated Metadata buckets.
-
-* **2.2. Implement Strict Argument Mapping**
-    * [ ] Iterate `meta.ArgumentAttributes`.
-    * [ ] Map KDL Argument `i` to Property `i`.
-    * [ ] **Validation**: If KDL has fewer arguments than required (non-nullable) properties, decide if you throw or use default.
-
-* **2.3. Implement Child Node Mapping (`[KdlNode]`)**
-    * [ ] **Collection Mode**: If `meta.IsCollection` or property is `List`, find *all* matching children, deserialize, and Add.
-    * [ ] **Single Object Mode**: If property is a class, find *exactly one* matching child. Throw if multiple exist (ambiguous match).
-    * [ ] **Scalar Flattening**: If property is `int`/`string`, find child, read `Arg[0]`, assign.
-
----
-
-## Phase 3: The Dictionary Engine (The "Complex Part")
-
-*Implement the three strategies for IDictionary.*
-
-* **3.1. Implement `[KdlPropertyDictionary]`**
-    * [ ] Iterate over **Properties** of the *current* KDL node.
-    * [ ] Filter out properties already mapped to explicit C# properties.
-    * [ ] Cast/Convert remaining values to `TValue` (usually string) and add to the dictionary.
-
-* **3.2. Implement `[KdlNodeDictionary]`**
-    * [ ] Iterate over **Child Nodes** of the current KDL node.
-    * [ ] **Key Extraction**: Use the Child Node's Name.
-    * [ ] **Value Extraction (Scalar)**: If `TValue` is primitive, read `Arg[0]`.
-    * [ ] **Value Extraction (Object)**: If `TValue` is complex, recursively call `DeserializeNode`.
-
-* **3.3. Implement `[KdlKeyedNodes]`**
-    * [ ] Find all child nodes matching the attribute's `NodeName`.
-    * [ ] Loop:
-        1. Deserialize child node into `TObject`.
-        2. Use Reflection to read `KeyProperty` from `TObject`.
-        3. Add `(Key, TObject)` to the dictionary.
-
----
-
-## Phase 4: Collection & Instantiation
-
-*Handle the "plumbing" of creating objects and lists.*
-
-* **4.1. Factory Logic**
-    * [ ] Ensure every target type has a parameterless constructor.
-    * [ ] For collections: Handle `List`, `T[]` (needs buffering), and `Dictionary`.
-    * [ ] Handle **Read-Only Properties**: If a collection property is `get` only but not null, `Clear()` it and reuse the instance rather than trying to set it.
-
-* **4.2. Nullability Safety**
-    * [ ] Check for `KdlNull` tokens.
-    * [ ] Throw `KdlInvalidCastException` if trying to assign `#null` to `int` or `bool`.
-
----
-
-## Phase 5: Verification
-
-*Prove it works.*
-
-* **5.1. Test The "Theme/Layout" Scenario**
-    * [ ] Create the complex nested Dictionary structure from our previous discussion.
-    * [ ] Verify deeply nested recursion works.
-* **5.2. Test Failure Modes**
-    * [ ] Test "Duplicate Attributes" -> Expect Startup Crash.
-    * [ ] Test "Missing Argument Index" -> Expect Startup Crash.
-    * [ ] Test "Duplicate Key in Dictionary" -> Expect Deserialization Crash.
-
-## What is deferred (Post-v1)
-
-* Type Annotations logic (`(uuid)`, `(date-time)`).
-* Serialization (Writing C# -> KDL).
-* Polymorphism (Selecting different derived classes based on KDL annotations).