~30-40% perf win using 'charCodeAt' in CParser.write() #57

Merged: 8 commits into dscape:master on Jan 4, 2019
Conversation

@DLehenbauer (Author):

Uses character codes (i.e., numbers) instead of string comparisons inside the CParser.write() state machine for a significant perf win (measured on node v8.11.4).

Also includes:

  • Benchmark for comparison (npm i && npm run bench)
  • Small set of easy-to-debug tests targeting the parser (optional)
  • .vscode configuration for debugging tests & benchmark (optional)

@DLehenbauer (Author) left a comment:

Hi @evan-king and @dscape - Any interest in working w/me to get this PR accepted? The perf improvement seems substantial.

.vscode/launch.json (outdated, resolved)
@@ -0,0 +1,4 @@
{
@DLehenbauer (Author):

(Ditto re: this VS Code config file to set tab indentation to 2 spaces.)

@@ -0,0 +1,117 @@
const { Suite } = require("benchmark");
@DLehenbauer (Author):

This is the benchmark I used to measure the impact of the change. It parses the four .json files under "../samples" and, for me, shows a ~30+% speedup under node 8.11.4. The speedup was significant enough that I didn't bother disabling turbo boost, sleep states, etc. on the CPU.
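For reference, a minimal sketch of how such an old-vs-new comparison can be wired up with the 'benchmark' package; the file paths, the 'clarinet-last-published' alias, and the run() helper are assumptions based on this PR's description, not the actual benchmark script:

const { Suite } = require("benchmark");
const fs = require("fs");
const path = require("path");

const newClarinet = require("../clarinet");               // development version in the working tree
const oldClarinet = require("clarinet-last-published");   // last published version, installed via 'niv'

const sample = fs.readFileSync(path.join(__dirname, "../samples/npm.json"), "utf8");

// Parse one sample document end-to-end with a given clarinet build.
function run(clarinet) {
    const parser = clarinet.parser();
    parser.write(sample);
    parser.close();
}

new Suite()
    .add("old-npm", () => run(oldClarinet))
    .add("new-npm", () => run(newClarinet))
    .on("cycle", event => console.log(String(event.target)))
    .on("complete", function () {
        console.log("Fastest is " + this.filter("fastest").map("name"));
    })
    .run();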


parser.onkey = name => {
this.key++;
assert(name !== "𝓥𝓸𝓵𝓭𝓮𝓶𝓸𝓻𝓽");
@DLehenbauer (Author):

"He Who Must Not Be Named" ;-)

package.json (outdated):
-    "test": "mocha -r should -t 10000 -s 2000 test/clarinet.js test/npm.js test/utf8-chunks.js test/position.js"
+    "test": "mocha -r should -t 10000 -s 2000 test/parser.spec.js test/clarinet.js test/npm.js test/utf8-chunks.js test/position.js",
+    "bench": "node benchmark/index.js",
+    "postinstall": "niv clarinet@latest --destination clarinet-last-published"
@DLehenbauer (Author):

'niv' (npm-install-version) is a tool for installing multiple versions of a package side-by-side under node_modules; the new benchmark uses it to compare the development version against the last published version. As an alternative, we could check in a snapshot of the previous 'clarinet.js' under './benchmark/'.

"should": "1.0.x",
"underscore": "1.2.3"
},
"scripts": {
"test": "mocha -r should -t 10000 -s 2000 test/clarinet.js test/npm.js test/utf8-chunks.js test/position.js"
"test": "mocha -r should -t 10000 -s 2000 test/parser.spec.js test/clarinet.js test/npm.js test/utf8-chunks.js test/position.js",
@DLehenbauer (Author):

Just FYI - I don't believe the 'should' package is still used?

@@ -1,4 +1,6 @@
;(function (clarinet) {
"use strict";

// non node-js needs to set clarinet debug on root
var env
, fastlist
@DLehenbauer (Author):

Just FYI - I don't believe 'fastlist' is used?

}
- c = chunk.charAt(i++);
+ c = chunk.charCodeAt(i++);
parser.position++;
starti = i-1;
if (!c) break;
@DLehenbauer (Author):

I'm not a Unicode expert, but I believe surrogates are still processed correctly because:

  1. The high/low surrogates won't match any of the char codes we handle explicitly, so...
  2. They'll fall through to the pre-existing code that appends them via substring (just below the fold). (A short illustration follows.)
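A short illustration of that reasoning (the character and hex values are just for demonstration):

// "𝓥" (U+1D4E5) occupies two UTF-16 code units.
const s = "𝓥";
s.length;                        // 2
s.charCodeAt(0).toString(16);    // "d835" (high surrogate)
s.charCodeAt(1).toString(16);    // "dce5" (low surrogate)

// Neither code unit matches any of the ASCII character codes the state
// machine compares against, so both fall through to the path that copies
// text with substring(), which keeps the pair intact:
s.substring(0, 2) === "𝓥";       // true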

@@ -548,93 +638,83 @@ else env = window;
continue;

case S.TRUE:
if (c==='') continue; // strange buffers
@DLehenbauer (Author):

Not sure these "strange buffers" can be reproduced? If they do occur, charCodeAt(..) returns NaN for an out-of-range index, which is still falsy. I could put these checks back defensively, but I'd be curious to see if this case still occurs.
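For what it's worth, reading past the end of a string is falsy with either API, so the existing if (!c) break guard still fires (a quick check, not from the PR):

"ab".charAt(5);      // ""  - empty string, falsy
"ab".charCodeAt(5);  // NaN - also falsy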

@@ -361,25 +454,25 @@ else env = window;

if (clarinet.DEBUG) console.log(i,c,clarinet.STATE[parser.state]);
parser.position ++;
if (c === "\n") {
if (c === Char.lineFeed) {
parser.line ++;
parser.column = 0;
@DLehenbauer (Author):

FYI - I'm not sure 'column' is updated every time we advance a character position. It might be more reliable to store the 'columnStart' position and then calculate the column on demand with:

get column() { return this.position - this.columnStart; }
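A rough sketch of what that suggestion might look like; 'PositionTracker' and its fields are illustrative, not part of clarinet:

class PositionTracker {
    constructor() {
        this.position = 0;     // absolute character index across all chunks
        this.columnStart = 0;  // value of 'position' at the start of the current line
        this.line = 1;
    }
    advance(code) {
        this.position++;
        if (code === 0x0A /* '\n' */) {
            this.line++;
            this.columnStart = this.position;
        }
    }
    // Column is derived on demand instead of being updated on every character.
    get column() { return this.position - this.columnStart; }
}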

@evan-king (Collaborator) left a comment:

Thank you for following up on the PR. Though I've been quite busy of late, it hadn't slipped by unnoticed and I've been pondering it for a while. Especially given my lack of time to address any issues that arise in a timely manner, my top priority is stability.

With that in mind (and in addition to the other trivial feedback given), there are 2 things I'd like to know:

  • how much of the performance gain came from simply removing all the empty string checks
  • whether anything useful can be unearthed about why they were once needed

The latter is a tall order, as the git history is pretty sparse and seems to lack any higher level summaries/explanations. But I'd be a lot more comfortable removing them if I could identify what situation did require them, and thus rationalize treating them as dead code.

If they don't significantly impact performance, leaving them in would be the easier way to just get this pushed through.

@@ -0,0 +1,178 @@
"use strict";
@evan-king (Collaborator):

I rather favor test suites set up in this way (filthy ES6 classes/OOP notwithstanding ;)), and especially being able to run more focused tests for debugging, so I'm happy to keep it.

@DLehenbauer (Author):

I was lazy about analyzing the empty string checks. Looking again, it's pretty easy to convince oneself that the removed empty string checks were previously unreachable.

The normal control flow is to advance i until charAt(i) reads off the end of the string, which results in c being assigned an empty string at the end of every write() call (i.e., no "strange buffers" required).

When c is an empty string, the if (!c) break; near the top of the main while loop terminates the loop before the switch statement can dispatch to the S.TRUE*/S.FALSE*/S.NULL* cases.

Since c is not reassigned before we dispatch to the true/false/null cases, and nothing falls through to these cases, we can conclude that c cannot be an empty string when we enter the true/false/null cases (and consequently, the removed empty string checks were indeed unreachable).
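Paraphrasing the shape of the write() loop (simplified, not the actual clarinet source):

while (true) {
    c = chunk.charCodeAt(i++);   // past the end: charAt() gives "", charCodeAt() gives NaN
    if (!c) break;               // ...either is falsy, so the loop exits here, before the switch
    switch (parser.state) {
        // ...
        case S.TRUE:   // by the time these cases run, c holds a real character code,
        case S.FALSE:  // so the removed `if (c === '')` guards could never be reached
        case S.NULL:
            // ...
    }
}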

@evan-king (Collaborator) commented Jan 4, 2019:

I'm pretty comfortable with that analysis. However, it does still leave the question of how much performance gain came from removing extraneous conditional checks and not from the use of string literal comparison vs character value constants.

The former makes more sense to me as a cause of sub-optimal performance (especially post-Spectre). With that in mind, I want to be certain the larger change is justified by a meaningful impact. Otherwise, significant changes are being introduced for still-unquantified benefit that may not even remain relevant as future engine-level optimizations come down the pipeline, and future maintainers will be as afraid to disrupt or remove performance-related code as I am to introduce it.

To be clear, what I'm asking for is rough benchmark numbers for the removed conditionals alone, and for the optimized code with conditionals preserved, alongside the current vanilla and fully optimized code. If the charCodeAt optimizations count for at least 50% of the 30% gain, then I'll merge as-is. If not, I'll consider the matter further first.

@DLehenbauer (Author):

I understand... The perf gain is 100% from switching to 'charCodeAt()'. The reason for the performance improvement is that 'charCodeAt()' returns a number, which can be compared in a single machine op and involves zero heap allocation or resulting GC tax.

Calling 'charAt()' on the other hand semantically allocates a 1-character string on each invocation. This allocation can potentially be optimized away, but since the string escapes the module (via parser.c and parser.p) it would be a difficult optimization to detect.
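A tiny illustration of the difference (the Char.t constant is just for the example; the PR's actual constant names may differ):

const chunk = "true";

// charAt(): semantically produces a new one-character string, then a string comparison.
if (chunk.charAt(0) === "t") { /* ... */ }

// charCodeAt(): produces a number, so the comparison is a single integer
// compare with no allocation and nothing for the GC to clean up.
const Char = { t: 0x74 };  // character code for "t"
if (chunk.charCodeAt(0) === Char.t) { /* ... */ }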

My only motivation for removing the empty string checks was that they appeared to be unreachable (or my understanding of the code was incorrect). Removing them felt more honest than mechanically converting them, as removal would attract scrutiny in review. :-)

Here are the benchmark results from removing the conditionals only:

> node index.js

old-creationix x 5,022 ops/sec ±0.25% (93 runs sampled)
new-creationix x 5,036 ops/sec ±0.24% (95 runs sampled)
Fastest is new-creationix

old-npm x 6.97 ops/sec ±0.32% (22 runs sampled)
new-npm x 6.99 ops/sec ±0.23% (22 runs sampled)
Fastest is new-npm,old-npm

old-twitter x 4.09 ops/sec ±1.00% (15 runs sampled)
new-twitter x 4.07 ops/sec ±0.45% (15 runs sampled)
Fastest is old-twitter

old-wikipedia x 118,013 ops/sec ±0.46% (92 runs sampled)
new-wikipedia x 115,889 ops/sec ±1.34% (93 runs sampled)
Fastest is old-wikipedia

And here are the results w/the conversion to character codes:

> node index.js

old-creationix x 5,011 ops/sec ±0.60% (93 runs sampled)
new-creationix x 8,548 ops/sec ±0.29% (92 runs sampled)
Fastest is new-creationix

old-npm x 6.96 ops/sec ±0.61% (22 runs sampled)
new-npm x 9.87 ops/sec ±0.24% (29 runs sampled)
Fastest is new-npm

old-twitter x 4.06 ops/sec ±0.22% (15 runs sampled)
new-twitter x 6.29 ops/sec ±0.36% (20 runs sampled)
Fastest is new-twitter

old-wikipedia x 116,979 ops/sec ±0.32% (94 runs sampled)
new-wikipedia x 175,097 ops/sec ±0.34% (92 runs sampled)
Fastest is new-wikipedia

PS - In tweaking the benchmark to get these measurements, I ran into an issue with the 'npm-install-version' package I had added. (I pushed a change to avoid the dependency.)

@evan-king (Collaborator):

Thanks for the thorough PR support. Approved and merging.

@evan-king evan-king merged commit 7f65dd3 into dscape:master Jan 4, 2019
@DLehenbauer DLehenbauer deleted the perf branch April 5, 2019 16:24