Commit b415ddc

fix(enumerate): prioritize alphanumeric chars
Previously, `enumerate` would output characters in "Unicode order", meaning that characters early in the Unicode range (such as "\u0000" and "\u0001") were output first. Now, the more common alphanumeric characters (a-z, then A-Z, then 0-9) are always produced first, followed by the rest of the set. That way, more representative strings tend to be enumerated first, which is useful because we usually only look at the first few items in the enumeration.
1 parent 1fd0e37 commit b415ddc


1 file changed: +21 -1 lines changed


src/char-set.ts

Lines changed: 21 additions & 1 deletion
```diff
@@ -317,8 +317,28 @@ export function toString(set: CharSet): string {
 }
 
 export function enumerate(set: CharSet): Stream.Stream<string> {
+  // If we enumerate the set in "unicode order" then we only get
+  // chars like "\u0000", "\u0001" for a while. We prefer to enumerate
+  // more common characters first, since users will usually only
+  // look at the first few items in the enumeration.
+  const lowerChars = charRange('a', 'z')
+  const upperChars = charRange('A', 'Z')
+  const numChars = charRange('0', '9')
+
+  // The input set minus the "common characters ranges":
+  const restChars = [lowerChars, upperChars, numChars].reduce(
+    (acc, item) => difference(acc, item), set
+  )
+
+  const rangesWithBiasedOrder = [
+    ...getRanges(intersection(lowerChars, set)),
+    ...getRanges(intersection(upperChars, set)),
+    ...getRanges(intersection(numChars, set)),
+    ...getRanges(restChars),
+  ]
+
   return Stream.concat(Stream.fromArray(
-    [...getRanges(set)].map(
+    rangesWithBiasedOrder.map(
       range => Stream.map(
         codePoint => String.fromCodePoint(codePoint),
         Stream.range(range.start, range.end)
```
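The reordering in this diff can be illustrated without the library's internals. The following is a minimal, self-contained TypeScript sketch, not the actual `char-set.ts` implementation. It assumes a char set is a sorted array of disjoint code-point ranges, inclusive on both ends. The `CodeRange`/`RangeSet` types and the `enumerateBiased` and `take` helpers are hypothetical. `charRange`, `intersection` and `difference` are re-implemented here only to mirror the names used in the diff, and a plain generator stands in for the library's `Stream` module.

```typescript
// Hypothetical model, not the library's CharSet: a sorted array of
// disjoint code-point ranges, inclusive on both ends.
type CodeRange = { start: number; end: number }
type RangeSet = CodeRange[]

// Single-range set spanning two characters, e.g. charRange('a', 'z').
function charRange(from: string, to: string): RangeSet {
  return [{ start: from.codePointAt(0)!, end: to.codePointAt(0)! }]
}

// Code points present in both sets.
function intersection(a: RangeSet, b: RangeSet): RangeSet {
  const out: RangeSet = []
  for (const x of a) {
    for (const y of b) {
      const start = Math.max(x.start, y.start)
      const end = Math.min(x.end, y.end)
      if (start <= end) out.push({ start, end })
    }
  }
  return out.sort((p, q) => p.start - q.start)
}

// Code points of `a` with every code point of `b` removed.
function difference(a: RangeSet, b: RangeSet): RangeSet {
  let current = a
  for (const y of b) {
    const next: RangeSet = []
    for (const x of current) {
      // Keep the part of x to the left of y, if any.
      if (x.start < y.start) next.push({ start: x.start, end: Math.min(x.end, y.start - 1) })
      // Keep the part of x to the right of y, if any.
      if (x.end > y.end) next.push({ start: Math.max(x.start, y.end + 1), end: x.end })
    }
    current = next
  }
  return current
}

// Lazily yields a-z, then A-Z, then 0-9 (restricted to `set`),
// then everything else in `set` in code-point order.
function* enumerateBiased(set: RangeSet): Generator<string> {
  const lowerChars = charRange('a', 'z')
  const upperChars = charRange('A', 'Z')
  const numChars = charRange('0', '9')
  const restChars = [lowerChars, upperChars, numChars].reduce(
    (acc, item) => difference(acc, item), set
  )
  const ranges = [
    ...intersection(lowerChars, set),
    ...intersection(upperChars, set),
    ...intersection(numChars, set),
    ...restChars,
  ]
  for (const range of ranges) {
    for (let cp = range.start; cp <= range.end; cp++) {
      yield String.fromCodePoint(cp)
    }
  }
}

// Collects at most n items from a lazy iterable.
function take<T>(n: number, it: Iterable<T>): T[] {
  const out: T[] = []
  if (n <= 0) return out
  for (const x of it) {
    out.push(x)
    if (out.length >= n) break
  }
  return out
}

// Usage: enumerate the whole ASCII range.
const ascii: RangeSet = [{ start: 0x00, end: 0x7f }]
console.log(take(3, enumerateBiased(ascii)).join('')) // "abc"
```

Because the enumeration is lazy, taking only the first few items never reaches the control characters. For the full ASCII range, the 63rd item is the first one outside the alphanumeric ranges.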
