diff --git a/annotations.md b/annotations.md
index dabbafdc1..f3ba371be 100644
--- a/annotations.md
+++ b/annotations.md
@@ -31,12 +31,12 @@ This specification defines the following annotation keys, intended for but not l
 * **org.opencontainers.image.ref.name** Name of the reference for a target (string).
   * SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
   * Character set of the value SHOULD conform to alphanum of `A-Za-z0-9` and separator set of `-._:@/+`
-  * An EBNF'esque grammar + regular expression like:
+  * The reference must match the following [grammar](considerations.md#ebnf):
     ```
-    ref := component ["/" component]*
-    component := alphanum [separator alphanum]*
-    alphanum := /[A-Za-z0-9]+/
-    separator := /[-._:@+]/ | "--"
+    ref ::= component ("/" component)*
+    component ::= alphanum (separator alphanum)*
+    alphanum ::= [A-Za-z0-9]+
+    separator ::= [-._:@+] | "--"
     ```
 * **org.opencontainers.image.title** Human-readable title of the image (string)
 * **org.opencontainers.image.description** Human-readable description of the software packaged in the image (string)
diff --git a/considerations.md b/considerations.md
index 55e84942c..92df969c2 100644
--- a/considerations.md
+++ b/considerations.md
@@ -24,3 +24,107 @@ Implementations:
 [github.com/docker/go]: https://github.com/docker/go/
 [Go]: https://golang.org/
 [JSON]: http://json.org/
+
+# EBNF
+
+For field formats described in this specification, we use a limited subset of [Extended Backus-Naur Form][ebnf], similar to that used by the [XML specification][xmlebnf].
+Grammars present in the OCI specification are regular and can be converted to a single regular expression.
+However, regular expressions are avoided to limit ambiguity between regular expression syntaxes.
+By defining a subset of EBNF used here, the possibility of variation, misunderstanding or ambiguities from linking to a larger specification can be avoided.
+
+Grammars are made up of rules in the following form:
+
+```
+symbol ::= expression
+```
+
+We can say we have the production identified by symbol if the input is matched by the expression.
+Whitespace is completely ignored in rule definitions.
+
+## Expressions
+
+The simplest expression is the literal, surrounded by quotes:
+
+```
+literal ::= "matchthis"
+```
+
+The above expression defines a symbol, "literal", that matches the exact input of "matchthis".
+Character classes are delineated by brackets (`[]`), describing either a set, a range or multiple ranges of characters:
+
+```
+set ::= [abc]
+range ::= [A-Z]
+```
+
+The above symbol "set" would match one character of either "a", "b" or "c".
+The symbol "range" would match any character, "A" to "Z", inclusive.
+Currently, only matching for 7-bit ASCII literals and character classes is defined, as that is all that is required by this specification.
+
+Expressions can be made up of one or more expressions, such that one must be followed by the other.
+This is known as an implicit concatenation operator.
+For example, both `A` and `B` must be matched to satisfy the following rule:
+
+```
+symbol ::= A B
+```
+
+Each expression must be matched once and only once, `A` followed by `B`.
+To support the description of repetition and optional match criteria, the postfix operators `*` and `+` are defined.
+`*` indicates that the preceding expression can be matched zero or more times.
+`+` indicates that the preceding expression must be matched one or more times.
+These appear in the following form:
+
+```
+zeroormore ::= expression*
+oneormore ::= expression+
+```
+
+Parentheses are used to group expressions into a larger expression:
+
+```
+group ::= (A B)
+```
+
+As with the simpler expressions above, operators can also be applied to groups.
+To allow for alternates, we also define the infix operator `|`.
+
+```
+oneof ::= A | B
+```
+
+The above indicates that the expression should match one of the expressions, `A` or `B`.
+
+## Precedence
+
+The operator precedence is in the following order:
+
+- Terminals (literals and character classes)
+- Grouping `()`
+- Unary operators `+*`
+- Concatenation
+- Alternates `|`
+
+The precedence can be better described using grouping to show equivalents.
+Concatenation has higher precedence than alternates, such that `A B | C D` is equivalent to `(A B) | (C D)`.
+Unary operators have higher precedence than alternates and concatenation, such that `A+ | B+` is equivalent to `(A+) | (B+)`.
+
+## Examples
+
+The following combines the previous definitions to match a simple, relative path name, describing the individual components:
+
+```
+path ::= component ("/" component)*
+component ::= [a-z]+
+```
+
+The production "component" is one or more lowercase letters.
+A "path" is then at least one component, possibly followed by zero or more slash-component pairs.
+The above can be converted into the following regular expression:
+
+```
+[a-z]+(?:/[a-z]+)*
+```
+
+[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
+[xmlebnf]: https://www.w3.org/TR/REC-xml/#sec-notation
diff --git a/descriptor.md b/descriptor.md
index 2757a1fcb..8a9ff0626 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -66,14 +66,14 @@ If the _digest_ can be communicated in a secure manner, one can verify content f
 The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
 The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.
 
-A digest string MUST match the following grammar:
+A digest string MUST match the following [grammar](considerations.md#ebnf):
 
 ```
-digest := algorithm ":" encoded
-algorithm := algorithm-component [algorithm-separator algorithm-component]*
-algorithm-component := /[a-z0-9]+/
-algorithm-separator := /[+._-]/
-encoded := /[a-zA-Z0-9=_-]+/
+digest ::= algorithm ":" encoded
+algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
+algorithm-component ::= [a-z0-9]+
+algorithm-separator ::= [+._-]
+encoded ::= [a-zA-Z0-9=_-]+
 ```
 
 Note that _algorithm_ MAY impose algorithm-specific restriction on the grammar of the _encoded_ portion.
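
Review note, not part of the patch: since the new EBNF section states that every grammar in the specification is regular, the converted digest grammar can be sanity-checked by collapsing it into one regular expression. The following is a minimal sketch under that assumption. The names `ALGORITHM`, `ENCODED`, `DIGEST_RE` and `is_valid_digest` are hypothetical and do not come from the specification or from any existing library.

```python
import re

# Hand translation of the digest grammar into regular expressions.
# This is an illustrative assumption, not text from the specification.
#   algorithm-component ::= [a-z0-9]+
#   algorithm-separator ::= [+._-]
#   algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
ALGORITHM = r"[a-z0-9]+(?:[+._-][a-z0-9]+)*"
#   encoded ::= [a-zA-Z0-9=_-]+
ENCODED = r"[a-zA-Z0-9=_-]+"
#   digest ::= algorithm ":" encoded
DIGEST_RE = re.compile(ALGORITHM + ":" + ENCODED)


def is_valid_digest(value: str) -> bool:
    """Return True if the whole string matches the digest grammar.

    Only the generic grammar is checked; any algorithm-specific
    restriction on the encoded portion is out of scope here.
    """
    return DIGEST_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("sha256:" + "a" * 64, "sha256+b64:abc=", "SHA256:abc", "sha256:"):
        print(repr(candidate), is_valid_digest(candidate))
```

`fullmatch` is used so that the entire input has to be consumed, mirroring the requirement that a digest string match the `digest` production as a whole rather than merely contain a match.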