-
Notifications
You must be signed in to change notification settings - Fork 50
[html] Various performance optimizations #2019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
PR Health
Breaking changes
|
Package | Change | Current Version | New Version | Needed Version | Looking good? |
---|---|---|---|---|---|
html | Breaking | 0.15.5+1 | 0.15.6 | 0.16.0 Got "0.15.6" expected >= "0.16.0" (breaking changes) |
This check can be disabled by tagging the PR with skip-breaking-check
.
Changelog Entry ✔️
Package | Changed Files |
---|
Changes to files need to be accounted for in their respective changelogs.
Coverage ⚠️
File | Coverage |
---|---|
pkgs/html/lib/parser.dart | 💔 99 % ⬇️ 0 % |
pkgs/html/lib/src/constants.dart | 💚 93 % ⬆️ 0 % |
pkgs/html/lib/src/html_input_stream.dart | 💚 80 % ⬆️ 11 % |
pkgs/html/lib/src/tokenizer.dart | 💚 100 % |
pkgs/html/lib/src/treebuilder.dart | 💚 95 % |
pkgs/html/lib/src/trie.dart | 💔 Not covered |
pkgs/html/tool/generate_trie.dart | 💔 Not covered |
This check for test coverage is informational (issues shown here will not fail the PR).
This check can be disabled by tagging the PR with skip-coverage-check
.
API leaks ⚠️
The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.
Package | Leaked API symbols |
---|---|
html | HtmlTokenizer Token HtmlInputStream TagToken DoctypeToken StringToken TreeBuilder ActiveFormattingElements StartTagToken TagAttribute CommentToken CharactersToken SpaceCharactersToken EndTagToken |
This check can be disabled by tagging the PR with skip-leaking-check
.
License Headers ⚠️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files |
---|
pkgs/html/lib/parser.dart |
pkgs/html/lib/src/constants.dart |
pkgs/html/lib/src/html_input_stream.dart |
pkgs/html/lib/src/tokenizer.dart |
pkgs/html/lib/src/treebuilder.dart |
pkgs/html/test/parser_feature_test.dart |
pkgs/html/test/trie_test.dart |
pkgs/html/tool/generate_trie.dart |
All source files should start with a license header.
Unrelated files missing license headers
Files |
---|
pkgs/bazel_worker/benchmark/benchmark.dart |
pkgs/bazel_worker/example/client.dart |
pkgs/bazel_worker/example/worker.dart |
pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart |
pkgs/boolean_selector/example/example.dart |
pkgs/clock/lib/clock.dart |
pkgs/clock/lib/src/clock.dart |
pkgs/clock/lib/src/default.dart |
pkgs/clock/lib/src/stopwatch.dart |
pkgs/clock/lib/src/utils.dart |
pkgs/clock/test/clock_test.dart |
pkgs/clock/test/default_test.dart |
pkgs/clock/test/stopwatch_test.dart |
pkgs/clock/test/utils.dart |
pkgs/coverage/lib/src/coverage_options.dart |
pkgs/coverage/test/collect_coverage_config_test.dart |
pkgs/coverage/test/config_file_locator_test.dart |
pkgs/html/example/main.dart |
pkgs/html/lib/dom.dart |
pkgs/html/lib/dom_parsing.dart |
pkgs/html/lib/html_escape.dart |
pkgs/html/lib/src/encoding_parser.dart |
pkgs/html/lib/src/list_proxy.dart |
pkgs/html/lib/src/query_selector.dart |
pkgs/html/lib/src/token.dart |
pkgs/html/lib/src/utils.dart |
pkgs/html/test/dom_test.dart |
pkgs/html/test/parser_test.dart |
pkgs/html/test/query_selector_test.dart |
pkgs/html/test/selectors/level1_baseline_test.dart |
pkgs/html/test/selectors/level1_lib.dart |
pkgs/html/test/selectors/selectors.dart |
pkgs/html/test/support.dart |
pkgs/html/test/tokenizer_test.dart |
pkgs/pubspec_parse/test/git_uri_test.dart |
pkgs/stack_trace/example/example.dart |
pkgs/watcher/test/custom_watcher_factory_test.dart |
pkgs/yaml_edit/example/example.dart |
This check can be disabled by tagging the PR with skip-license-check
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! A 2x Improvement sounds great. I had some minor comments and you'll need to up the version to 0.15.6
and add a changelog and it's good to go!
pkgs/html/example/generate_trie.dart
Outdated
@@ -0,0 +1,18 @@ | |||
import 'package:dart_style/dart_style.dart'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should live in html/tool/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkgs/html/example/generate_trie.dart
Outdated
'const entitiesTrieRoot = $root;'.replaceAll('{}', '<int, dynamic>{}'); | ||
final formatted = DartFormatter().format(source); | ||
// trimRight() because print adds its own newline | ||
print(formatted.trimRight()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of printing to stdout, this could override lib/src/trie.dart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkgs/html/example/generate_trie.dart
Outdated
} | ||
} | ||
final source = | ||
'const entitiesTrieRoot = $root;'.replaceAll('{}', '<int, dynamic>{}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a // AUTO GENERATED by 'tool/generate_trie.dart'. DO NOT EDIT!
to the beginning of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkgs/html/example/generate_trie.dart
Outdated
for (final entity in entities.keys) { | ||
var node = root; | ||
for (final charCode in entity.codeUnits) { | ||
node = (node[charCode] ??= <int, dynamic>{}) as Map<int, dynamic>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good enough approach. Another way would be to create a 2d array of STATE * ALPHABET which would be around 500KBs of memory but faster to traverse. Best is using double array tries but that's more complicated to implement. A potential fun project for later perhaps since a 2x improvement is great already!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never heard of those, interesting
pkgs/html/lib/src/constants.dart
Outdated
@@ -424,7 +468,60 @@ const List<String> tableInsertModeElements = [ | |||
]; | |||
|
|||
// TODO(jmesserly): remove these in favor of the test functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the TODO comment mentions, a possible improvement here is to have a test function that only checks 2 integer bounds instead of using Set.contains
. Could be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it doesn't quite contain all letters right? Since there are some symbols between upper and lowercase letters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well then 4 integer bound checks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -77,9 +77,6 @@ On line 4, column 3 of ParseError: Unexpected DOCTYPE. Ignored. | |||
expect(parser.errors.length, 1); | |||
final error = parser.errors[0]; | |||
expect(error.errorCode, 'unexpected-doctype'); | |||
expect(error.span!.start.line, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite understand why we deleted these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I no longer generate fileInfo unless generateSpans=true, we don't get any error.span any more for this test which has generateSpans=false
|
pkgs/html/tool/generate_trie.dart
Outdated
'const entitiesTrieRoot = $root;'.replaceAll('{}', '<int, dynamic>{}'); | ||
final formatted = DartFormatter().format(source); | ||
final htmlDir = File(Platform.script.path).parent.parent; | ||
File('${htmlDir.path}/lib/src/trie.dart').writeAsStringSync(formatted); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work on windows? You can use join
from package:path
to make it platform independent:
File(p.join(htmlDir.path, 'lib', 'src', 'trie.dart').writeAsStringSync(formatted);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it seems that this file is not formatted using dart format
causing the CI to complain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation. |
- Format generate_trie.dart - Use path join() in generate_trie.dart - Add charsUntilAsciiLetter() - Move all codes to Charcode {}
This is a noticeable speedup by scanning and avoiding creating a new string first.
I found another 40% speedup via some new commits |
Actually surprisingly slow
String toAsciiLowerCase() => | ||
String.fromCharCodes(codeUnits.map(_asciiToLower)); | ||
String toAsciiLowerCase() { | ||
if (codeUnits.any((c) => c >= Charcode.kUpperA && c <= Charcode.kUpperZ)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These conditions can live as functions: codeUnits.any(isUpperCaseCode)
static const int kGreaterThan = 0x3E; | ||
|
||
/// A | ||
static const int kUpperA = 0x41; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We usually don't use kFooBar
when identifying constants in Dart, instead we just use fooBar
.
https://dart.dev/effective-dart/style#dont-use-prefix-letters
: String.fromCharCode(_chars[_offset]); | ||
final charCode = _chars[_offset]; | ||
if (charCode < 256) { | ||
return asciiCharacters[charCode]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does changing this to String.fromCharCode(charCode)
affect the performance much?
@@ -107,8 +107,9 @@ class HtmlInputStream { | |||
} | |||
} | |||
|
|||
final isSurrogatePair = | |||
i + 1 < charsLength && _isLeadSurrogate(c) && _isTrailSurrogate(rawChars[i + 1]); | |||
final isSurrogatePair = _isLeadSurrogate(c) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also order it in the original way but when you think about the fact that most characters aren't surrogate, this makes sense as it saves a check per character! Good find!
Should be twice as fast according to my use cases. Happy to discuss what you need for testing, (micro)benchmarking, preserving public API for old slow functions, or whatever else to merge it. I know I still need to do stuff like bump version, changelog, etc.
This was originally at dart-archive/html#251
That version has proper separation of each commit which may be more helpful, it's been squashed for the repo migration.