Modify non_json_data.md, document contentSchema keyword #1028

Merged
155 changes: 107 additions & 48 deletions pages/understanding-json-schema/reference/non_json_data.md
@@ -3,88 +3,147 @@ title: "Media: string-encoding non-JSON data"
section: docs
---

<Keywords label="single: non-JSON data single: media" />
<Keywords label="single: non-JSON data single: media"/>

<Star label="New in draft 7" />

JSON schema has a set of [keywords](../../learn/glossary#keyword) to describe and optionally validate
non-JSON data stored inside JSON strings. Since it would be difficult to
write validators for many media types, JSON schema validators are not
required to validate the contents of JSON strings based on these
keywords. However, these keywords are still useful for an application
that consumes validated JSON.
JSON Schema has a set of [keywords](../../learn/glossary#keyword) to describe and optionally validate non-JSON data stored inside JSON strings. Because writing validators for every media type would be impractical, JSON Schema validators are not required to validate the contents of JSON strings based on these keywords. However, applications that consume validated JSON can use these keywords to decide how to encode and decode such data when storing or transmitting it.

<Keywords label="single: contentMediaType single: media; contentMediaType" />

## contentMediaType
## contentMediaType and contentEncoding

The `contentMediaType` keyword specifies the MIME type of the contents
of a string, as described in [RFC 2046](https://tools.ietf.org/html/rfc2046).
There is a list of [MIME types officially registered by the IANA](http://www.iana.org/assignments/media-types/media-types.xhtml),
but the set of types supported will be application and operating system dependent.
Mozilla Developer Network also maintains a [shorter list of MIME types that are important for the web](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Complete_list_of_MIME_types)
The `contentMediaType` keyword specifies the media type of the content of a string, as described in [RFC 2046](https://tools.ietf.org/html/rfc2046). The Internet Assigned Numbers Authority (IANA) has officially registered [a comprehensive list of media types](http://www.iana.org/assignments/media-types/media-types.xhtml), but the set of supported types depends on the application and operating system. The Mozilla Developer Network maintains a [shorter list of media types that are important for the web](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Complete_list_of_MIME_types).

<Keywords label="single: contentEncoding single: media; contentEncoding" />
### Example

The following schema specifies a string containing an HTML file using the document's default encoding.

## contentEncoding
```json
// props { "isSchema": true }
{
"type": "string",
"contentMediaType": "text/html"
}
```
```json
// props { "indent": true, "valid": true }
"<!DOCTYPE html><html xmlns=\"http://www.w3.org/1999/xhtml\"><head></head></html>"
```

The `contentEncoding` keyword specifies the encoding used to store the
contents, as specified in [RFC 2054, part
6.1](https://tools.ietf.org/html/rfc2045) and [RFC
4648](https://datatracker.ietf.org/doc/html/rfc4648).
<Keywords label="single: contentEncoding single: media; contentEncoding" />

The acceptable values are `quoted-printable`,
`base16`, `base32`, and `base64`. If not specified, the encoding is the
same as the containing JSON document.

Without getting into the low-level details of each of these encodings,
there are really only two options useful for modern usage:
The `contentEncoding` keyword specifies the encoding used to store the contents, as specified in [RFC 2045, part 6.1](https://tools.ietf.org/html/rfc2045) and [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648).

- If the content is encoded in the same encoding as the enclosing JSON
document (which for practical purposes, is almost always UTF-8),
leave `contentEncoding` unspecified, and include the content in a
string as-is. This includes text-based content types, such as
`text/html` or `application/xml`.
- If the content is binary data, set `contentEncoding` to `base64` and
encode the contents using
[Base64](https://tools.ietf.org/html/rfc4648). This would include
many image types, such as `image/png` or audio types, such as
`audio/mpeg`.
The acceptable values are the following:
- `quoted-printable`
- `base16`
- `base32`
- `base64`

If not specified, the encoding is the same as the containing JSON document.

<Keywords label="single: contentSchema single: media; contentSchema" />
There are two main scenarios:

## contentSchema
<Star label="New in draft 2019-09" />
1. **Same encoding as JSON document**: Leave `contentEncoding` unspecified and include the content in a string as-is. This is suitable for text-based content types (e.g., `text/html`, `application/xml`) and assumes UTF-8 encoding in most cases.
2. **Binary data**: Set `contentEncoding` to `base64` and encode the content using Base64. This is appropriate for binary content types such as images (`image/png`) or audio files (`audio/mpeg`).
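
The two scenarios above can be sketched in Python as follows. This is only an illustration, not part of JSON Schema or any validator: the `embed_content` helper, the `TEXT_MEDIA_TYPES` set, and the rule "treat `text/*` and a few listed types as text" are assumptions made for the example.

```python
import base64

# Assumption for this sketch: besides text/*, these media types count as text.
TEXT_MEDIA_TYPES = {"application/xml", "application/json"}


def embed_content(content: bytes, media_type: str) -> dict:
    """Return the JSON string value plus a schema that describes it."""
    schema = {"type": "string", "contentMediaType": media_type}
    if media_type.startswith("text/") or media_type in TEXT_MEDIA_TYPES:
        # Scenario 1: text shares the JSON document's encoding (assumed UTF-8),
        # so it is stored as-is and contentEncoding is left unspecified.
        value = content.decode("utf-8")
    else:
        # Scenario 2: binary data is Base64-encoded into ASCII text.
        value = base64.b64encode(content).decode("ascii")
        schema["contentEncoding"] = "base64"
    return {"value": value, "schema": schema}


html = embed_content(b"<!DOCTYPE html><html></html>", "text/html")
png = embed_content(b"\x89PNG\r\n\x1a\n", "image/png")  # PNG signature bytes only

print(html["schema"])
print(png["value"], png["schema"])
```

Deciding by media type rather than by inspecting the bytes keeps the choice predictable; a real application would pick its own rule for which types are stored as text.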

Documentation Coming soon

## Examples
### Example

The following schema indicates the string contains an HTML document,
encoded using the same encoding as the surrounding document:
The following schema indicates that a string contains a PNG file and is encoded using Base64:

```json
// props { "isSchema": true }
{
"type": "string",
"contentMediaType": "text/html"
"contentEncoding": "base64",
"contentMediaType": "image/png"
}
```
```json
// props { "indent": true, "valid": true }
"<!DOCTYPE html><html xmlns=\"http://www.w3.org/1999/xhtml\"><head></head></html>"
"iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAABmJLR0QA/wD/AP+gvaeTAAAA..."
```
The following schema indicates that a string contains a PNG image, encoded using Base64:

To better understand how `contentEncoding` and `contentMediaType` are applied in practice, let's consider the process of transmitting non-JSON data:

<!--
![Role of contentEncoding and contentMediaType keywords in the transmission of non-JSON data](/img/media-keywords.png)
-->

```mermaid
block-beta
columns 9
A space B space C space D space E
F space:5 G space:2

A{{"Sender"}} --> F{"contentEncoding
contentMediaType"}
F{"contentEncoding
contentMediaType"} --> B{{"Encoded data"}}
B{{"Encoded data"}} --> C(["Transmission"])
C(["Transmission"]) --> D{{"Consumer application"}}
D{{"Consumer application"}} --> G{"contentEncoding
contentMediaType"}
G{"contentEncoding
contentMediaType"} --> E{{"Decoded data"}}
```

1. The sender encodes the content, using `contentEncoding` to specify the encoding method (e.g., base64) and `contentMediaType` to indicate the media type of the original content.
2. The encoded data is then transmitted.
3. Upon receiving the data, the consumer application uses the `contentEncoding` and `contentMediaType` information to select the appropriate decoding method.
4. Finally, the consumer application decodes the data, restoring it to its original form.

This process ensures that the non-JSON content is properly encoded for transmission and accurately decoded by the recipient, maintaining the integrity of the data throughout the process.
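
As a rough illustration of the consumer side of this process, the sketch below picks a decoder from the `contentEncoding` value and returns the media type alongside the decoded bytes. The `decode_content` function and the `DECODERS` table are hypothetical names for this example; only Python standard-library decoders are used, and validators themselves are not required to perform this step.

```python
import base64
import quopri
from typing import Optional, Tuple

# One standard-library decoder per contentEncoding value listed in this page.
DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": base64.b64decode,
    "quoted-printable": quopri.decodestring,
}


def decode_content(value: str, schema: dict) -> Tuple[Optional[str], bytes]:
    """Decode a JSON string according to the schema's media keywords."""
    encoding = schema.get("contentEncoding")
    if encoding is None:
        # No contentEncoding: the content uses the document's encoding
        # (assumed to be UTF-8 in this sketch).
        raw = value.encode("utf-8")
    elif encoding in DECODERS:
        raw = DECODERS[encoding](value.encode("ascii"))
    else:
        raise ValueError(f"unsupported contentEncoding: {encoding}")
    return schema.get("contentMediaType"), raw


# Sender: encode the original bytes and describe them with the two keywords.
original = b"\x89PNG\r\n\x1a\n"  # PNG signature bytes only
schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "image/png",
}
transmitted = base64.b64encode(original).decode("ascii")

# Consumer: use the keywords to select the decoder and restore the bytes.
media_type, decoded = decode_content(transmitted, schema)
print(media_type, decoded == original)
```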

<Keywords label="single: contentSchema single: media; contentSchema" />

Contributor

@AnithaKraman AnithaKraman Oct 18, 2024

Based on the content of the contributing.md file and the Google style guide, I had just assumed that we are replacing the "Keywords" and "Star" tags in the new documentation. Now, I'm not so sure about my assumption :). Please let me know. Thank you.

Contributor Author

@valeriahhdez valeriahhdez Oct 31, 2024

The goal of the review and editing work is to make sure that the documentation uses the tags according to the rules of the style guide: https://github.com/json-schema-org/website/blob/main/pages/md-style-guide.md
That could mean replacing the "Star" tag when it doesn't follow the guidelines with another tag best suited for the context.

Regarding the "Keywords" tag, we don't have it documented in the style guide, yet. But that is a topic for another conversation that I posted on the general channel.

## contentSchema
<Star label="New in draft 2019-09" />

The value of `contentSchema` must be a valid JSON schema that you can use to define the structure and constraints of the content. It is used in conjunction with `contentMediaType` when the instance is a string. If `contentMediaType` is absent, the value of `contentSchema` is ignored.

## Full example

The following schema indicates that a string contains a JSON object encoded using Base64:

```json
// props { "isSchema": true }
{
"type": "string",
"contentEncoding": "base64",
"contentMediaType": "image/png"
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"data": {
"type": "string",
"contentMediaType": "application/json",
"contentEncoding": "base64",
"contentSchema": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
}
},
"required": ["name", "age"]
}
}
}
}
```
```json
// props { "indent": true, "valid": true }
"iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAABmJLR0QA/wD/AP+gvaeTAAAA..."
"eyJuYW1lIjoiSm9obiBEb2UiLCJhZ2UiOjI1fQ=="
```

```json
// props { "indent": true, "valid": true }
{
"name": "John Doe",
"age": 25
}
```
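
Because validators are not required to apply these keywords, an application that wants to enforce `contentSchema` typically decodes the string itself and then checks the result. The sketch below shows that flow under stated assumptions: `matches` is a deliberately tiny stand-in that understands only `type`, `properties`, and `required`, not a real JSON Schema validator, and `decode_and_check` is a hypothetical helper name.

```python
import base64
import json

# The contentSchema from the example above.
CONTENT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Minimal type checks; bool is excluded because it is a subclass of int.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def matches(instance, schema: dict) -> bool:
    """Check only 'type', 'required', and 'properties' (illustrative subset)."""
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](instance):
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not matches(instance[key], subschema):
                return False
    return True


def decode_and_check(encoded: str, content_schema: dict):
    """Base64-decode the string, parse it as JSON, and check the result."""
    document = json.loads(base64.b64decode(encoded))
    return document, matches(document, content_schema)


person = {"name": "John Doe", "age": 25}
encoded = base64.b64encode(
    json.dumps(person, separators=(",", ":")).encode("utf-8")
).decode("ascii")

document, ok = decode_and_check(encoded, CONTENT_SCHEMA)
print(encoded)
print(document, ok)
```

In practice the `matches` step would be replaced by a call into whichever JSON Schema validator the application already uses.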
Binary file added public/img/media-keywords.png