Skip to content
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
336 changes: 336 additions & 0 deletions spices/SPICE-0012-url-standard-library-module.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,336 @@
= URL standard library module

* Proposal: link:./SPICE-0012-url-standard-library-module.adoc[SPICE-0012]
* Author: https://github.com/bioball[Dan Chao]
* Status: TBD
* Implemented in: TBD
* Category: Standard Library

== Introduction

This proposal introduces a new standard library module for managing and describing URLs.

== Motivation

A URL (URI) is a common type used within service configuration.

Examples:

* Website addresses
* Database connection strings
* Binary objects (data URIs)

In the base module is a typealias for `Uri`, but only defines it as a string and does not provide any extra validation.

Currently, there exists an https://pkl-lang.org/package-docs/pkg.pkl-lang.org/pkl-pantry/pkl.experimental.uri/current/URI/index.html[experimental URI library].
Much of this design is drawn from the learnings of that library.

== Proposed Solution

A new standard library module will be added, called `pkl.Url`.

A new external property on `String` will be added, called `isValidUrl`.

The `Uri` typealias will be changed to `typealias Uri = String(isValidUrl)`.

== Detailed design

Pkl's URL implementation will follow rules described in https://url.spec.whatwg.org[WHATWG URL standard].

Following the standard, it will be called "URL", and not "URI" nor "IRI".
The https://url.spec.whatwg.org/#goals[rationale] for this naming:

> Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the https://trends.google.com/trends/explore?q=url,uri[search result popularity contest].

=== module-level properties

The following make up the properties of the Url class:

.pkl.Url
[source,pkl]
----
module pkl.Url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a URI like env:USER be represented by this module?

In go, this is parsed as

url.URL{
  Scheme: "env",
  Opaque: "USER",
}

Copy link
Member Author

@bioball bioball Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be parsed as new URL { scheme = "env"; path = "USER" }.

See https://url.spec.whatwg.org/#example-url-components

Actually, I wonder how we can better support opaque URLs. The env URL env://foo/bar represents an env var with value //foo/bar, and doesn't mean "the host is foo and the path is /bar".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to proffer Go's URL implementation as a carrot and Swift's as the stick:

Go

Go's url.Parse function does (roughly) this:

  1. Split off the fragment, if present, and decode/store it
  2. Split off the scheme, if present
  3. Split off the query, if present, and store it
  4. If a scheme is present and the character immediately following the : is not /, store the remainder in Opaque and return
  5. Otherwise, proceed to parse the authority and path

url.URL.String produces a string from a URL struct. When Opaque is non-empty, the result has form scheme:opaque?query#fragment, otherwise it's scheme://userinfo@host/path?query#fragment.

In practice, I've found this API to be extremely usable and flexible enough for working with the varied URIs found in the Pkl ecosystem. This may be a good design to reference that departs somewhat from the literal spec in the name of usability.

Swift

Having worked with pkl-swift, I've found Foundation URL type's lack of similar support makes working with opaque URIs like env:USER significantly less ergonomic. Parsing just the "USER" out requires error-prone mangling of the full URL string, eg. https://github.com/apple/pkl-swift/blob/c280a0e6324097c429bfc19a2cfe5761ab12bbe9/docs/modules/ROOT/pages/external-readers.adoc?plain=1#L33) to get the equivalent of Go's url.URL.Opaque.

Pkl

Given that opaque URIs are already commonplace in the Pkl ecosystem (and used for several of the runtime's built-in resources), I think having first-class support for them in this proposal is necessary. A similar model to Go's could be adopted, but with added type constraints so ensure that opaque is never non-null at the same time as hostname/port/path`.

A case like env://foo/bar (or env:/foo/bar) is still tricky (and go parses them "wrong"). I wonder if it would be reasonable to do something like this:

opaque: String = "<rendered authority/path>"

Url.toString() would then use this field and ignore the authority/path field. This would allow direct construction of the Url to directly set opaque while still maintaining the same behavior as non-opaque URLs. Similarly, during parsing, if an authority/path are found (same criteria as Go) then those fields are populated, otherwise opaque is set directly.

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually need to do wrangling too with Go's url API for opaque URLs. For example, to turn a URL like env://foo%20/bar into //foo /bar, you'd need:

func getSchemeSpecificPart(u *url.URL) (string, error) {
  return url.PathUnescape(strings.Split(u.String(), ":")[1])
}

Whether a URL is opaque or not is really up to the scheme, and a simple rule like "if the char after the colon is /, it's opaque" doesn't feel good enough.


/// The scheme component.
scheme: AsciiString

/// The username component.
///
/// If the URL does not require a username, set to the empty string.
username: AsciiString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason to make the empty string the "null" value instead of proper null like port, path, etc.?
I'd rather go with AsciiString?.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. According to the spec, this field (and several other ones) do not admit null:

https://url.spec.whatwg.org/#url-representation

There are some fields that can be null, but this is not one of them.

Practically, I don't know if this makes any difference.


/// The password component.
///
/// If the URL does not require a password, set to the empty string.
password: AsciiString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above.


/// A domain name, IPV4 address, IPV6 address or an otherwise opaque host.
hostname: String?

/// The port component.
port: UInt16?

/// The path component.
///
/// It typically refers to a directory or a file, but has no predefined meaning.
path: String?

/// The query string component.
query: String?

/// The fragment component.
fragment: AsciiString?

/// A string whose characters are in the printable ASCII range (code points `0x20` through `0x7e`).
local typealias AsciiString = String(matches(Regex("[ -~]*")))
----

=== Parser API

A parser API will be introduced for parsing string inputs into URLs. This parser is a class within module `pkl.Url`.

The parser will follow the steps as described in https://url.spec.whatwg.org/#concept-basic-url-parser[WHATWG].

The base URL, as per the specification, is used to help resolve relative-URL strings.

.pkl.Url
[source,pkl]
----
module pkl.Url

import "pkl:Url"

// etc

/// A URL parser.
///
/// Follows the specification in <https://url.spec.whatwg.org/#concept-basic-url-parser>.
class Parser {
/// The base URL, if any.
base: Url?

/// Parses [source] into a URL.
///
/// Throws if [source] is an invalid URL.
external function parse(source: String): Url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with having one version that throws on failure, for when users know their URL is correct. But we should have parseOrNull(source: String): Url? for when you want to recover from errors.

Copy link
Member Author

@bioball bioball Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our other parsers don't have a parseOrNull, so, this follows the same precedent (see yaml.Parser and json.Parser).

If you need to recover from errors, you can use test.catch

Copy link
Member Author

@bioball bioball Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting a little bit more on this, I'm convinced that we should just have a parseOrNull here, updated.

The rationale is: we also have similar "OrNull" methods in other places. It makes it much easier for users; e.g. Uri.parseOrNull(input) ?? Uri.parse("https://example.com")

We should have the YAML and JSON parsers conform to this as a future task.

}
----

=== `SearchParams` API

A search params API will be introduced for working with `application/x-www-form-urlencoded` encoded query strings.

Additionally, a `hidden fixed` property is added representing the parsed search params of the current URL's query string.

.pkl.Url
[source,pkl]
----
module pkl.Url

// etc

/// The parsed query as search params.
hidden fixed searchParams: SearchParams? =
if (query != null) SearchParams(query)
else null

/// Creates a [SearchParams] from the given form encoded string.
const function SearchParams(input: String): SearchParams = // etc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible input is not a valid query string? If so, what happens then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question; I don't think so.

Here is the parsing algorithm: https://url.spec.whatwg.org/#urlencoded-parsing

None of these steps involve failure.


/// A representation of data encoded in `application/x-www-form-urlencoded` format.
class SearchParams {
values: Mapping<String, Listing<String>>

function toString()
}
----

=== Percent encoding API

Several new methods will be introduced for working with percent encoding.

The `encode` method follows the `encodeURI` method as described in https://262.ecma-international.org/5.1/#sec-15.1.3.3[ECMA-262 15.1.3.3].

The `encodeComponent` method follows the `encodeURIComponent` method as described in https://262.ecma-international.org/5.1/#sec-15.1.3.4[ECMA-262 15.1.3.4]

.pkl.Url
[source,pkl]
----
module pkl.Url

/// The [percent-encoding](https://en.wikipedia.org/wiki/Percent-encoding) of the UTF-8 bytes of
/// [source].
///
/// Example:
/// ```
/// percentEncode(" ") == "%20"
/// percentEncode("/") == "%2F"
/// ```
const external function percentEncode(source: String): String

/// The [percent-decoding](https://en.wikipedia.org/wiki/Percent-encoding) of [source] as utf-8 bytes into its underlying string.
///
/// Example:
/// ```
/// percentDecode("%20") == " "
/// percentDecode("%2F") == "/"
/// ```
const external function percentDecode(source: String): String

/// Encodes [value] using percent-encoding to make it safe for the literal use as a URI.
///
/// All characters except for alphanumeric chracters, and the chracters `!#$&'()*+,-./:;=?@_~`
/// are percent-encoded.
///
/// Follows the rules for the `encodeURI` function as described by
/// [ECMA-262](https://262.ecma-international.org/5.1/#sec-15.1.3.3).
///
/// Facts:
/// ```
/// encode("https://example.com/some path/") == "https://example.com/some%20path"
/// ```
const external function encode(value: String): String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const external function encode(value: String): String
const external function encodeUri(value: String): String

encode is a bit too generic IMO, and it was hard to understand the difference between it and percentEncode at first glance.
Could also be called encodeForUri, encodeUriString, etc.

Copy link
Member Author

@bioball bioball Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module's name is already Url, so, usage looks like Url.encode. I think Url.encodeUri would be a little too wordy.

By the way, one of the suggestions of this SPICE is to prefer "URL" over "URI".


/// Encodes [str] using percent-encoding to make it safe to literal use as a URI component.
///
/// All characters except for alphanumeric characters, and the characters `-_.!~*'()` are
/// percent-encoded.
///
/// Follows the rules for the `encodeURIComponent` function as described by
/// [ECMA-262](https://262.ecma-international.org/5.1/#sec-15.1.3.4).
///
/// Facts:
/// ```
/// encodeComponent("https://example.com/some path") == "https%3A%2F%2example.com%2Fsome%20path"
/// ```
const external function encodeComponent(value: String): String
----

=== Method `toString()`

The `toString()` will be overloaded to return the serialized URL.

.pkl.Url
[source,pkl]
----
module pkl.Url

// etc

function toString() = // implementation
----

==== Sample usage:

[source,pkl]
----
myUrl: Url = new {
scheme = "https"
host = "example.com"
path = "/foo.txt"
}

result = myUrl.toString() // <1>
----
<1> `result = "\https://example.com/foo.txt"`

=== Method `resolveUrl()`

A method, `resolveUrl()`, accepts another URL and resolves it as a reference to this URL.

It follows the rules described in https://www.rfc-editor.org/rfc/rfc3986#section-5.2[RFC-3986 Section 5.2].

.pkl.Url
[source,pkl]
----
module pkl.Url

import "pkl:Url"

// etc

/// Resolves [other] as a URI reference to this URI.
///
/// Follows the rules described in
/// [RFC-3986 Section 5.2](https://www.rfc-editor.org/rfc/rfc3986#section-5.2).
function resolveUrl(other: Url) = // implementation
----

=== Sample usage

URLs can be constructed either by using the parser, or directly by setting fields on the struct.

[source,pkl]
----
import "pkl:Url"

myUrl: Url = new { // <1>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse will throw (or return null) on an invalid URL. What happens when you create an invalid URL with new? Will it throw?
We could have a UrlBuilder that allows you to create a URL with new and call toUrl(orNull) to validate it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ensure that you cannot construct invalid Urls through property types and constraints.

scheme = "https"
host = "example.com"
path = "/foo.txt"
}

local parser: Url.Parser = new {}

myUrl2: Url = parser.parse("https://example.com/foo.txt") // <2>

myUrl3: Url = new { // <3>
local sp: Url.SearchParams = new {
values {
["key"] { "730d67" }
}
}
scheme = "https"
host = "example.com"
path = "/foo.txt"
query = sp.toString()
}

myUrl4: Url = // <4>
let (parsed = parser.parse("https://example.com/foo.txt?foo=bar"))
(parsed) {
query = (super.searchParams) {
values {
["qux"] { "corge" }
}
}.toString()
}
----
<1> Constructing URL directly
<2> Constructing a URL using `Url.Parser.parse()`
<3> Constructing a URL query using the `SearchParams` API
<4> Constructing a URL from an existing URL, and adding to its query string via the `SearchParams` API

== Compatibility

This is purely a new API, and is backwards compatible with existing Pkl.

== Future directions

=== IP Address Library

A URL's host can possibly contain IPV4 and IPV6 addresses.
To enhance using these types of URLs, Pkl can possibly introduce an IP Address library in the future.

With an IP address library, it is possible to provide better constraints on the `host` property (either ASCII string or IP address).

=== Modifying other standard library properties

There are some other places throughout the standard library that make use of URIs.

These include:

* `pkl.reflect.Module.uri`
* `pkl.reflect.Module.imports`
* `pkl.Project.projectFileUri`
* `pkl.EvaluatorSettings.Proxy.address`

Currently, these are typed using typealias `Uri`.
A possible future direction is to change these types to `pkl.Url`.

== Alternatives considered

Instead of introducing a new module, we can add these as types to `pkl.base`.
However, any name added to the base module is a breaking change (a variable resolved off implicit `this` will break).

Additionally, adding new classes adds more overhead to the evaluation of any module.