Improve parser performance by 50% #79
base: master
Conversation
I tried running the tests but I keep getting this error:
What kind of things are you doing where parser performance is a big issue? I'm not sure adding this complexity for a 50% increase is worth it. Adding a simple branch for plain tag matching to

You're probably using yarn or a very old npm to install your dependencies if you're getting duplicated modules like that.
We are currently using ProseMirror in a note-taking app which stores notes in HTML, and later the user can view/edit them. For that to work, the parser is obviously involved. It isn't an issue for smaller notes, but for large notes (> 300K words) optimizing the parser can bring down the waiting time between the user clicking on a note and it appearing on the screen ready to edit. Even a small improvement helps.
I mean, it is a total of 30 lines and 2 methods. Even that can be further reduced by refactoring the code. 50% increase is not trivial for the amount of changes this PR makes.
The changes I have made follow a simple rule: do the work once per parse instead of once per match. The code can be restructured to look less complex; it really isn't doing a whole lot. Of course, the decision rests with you. I'd be happy to make any further changes.
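To make the "once per parse, not once per match" idea concrete, here is a minimal, hedged sketch (not the PR's actual code): the hypothetical `compileSelector` stands in for the expensive up-front work (in the PR, a single `querySelectorAll` over the root), after which each per-node check is cheap.

```javascript
// Hypothetical sketch of the once-per-parse pattern; node objects and
// selector syntax are simplified stand-ins for real DOM nodes/selectors.

function compileSelector(selector) {
  // Pretend this is expensive: parse the "tag.class" selector string once.
  const [tag, cls] = selector.split(".");
  return (node) => node.tag === tag && (!cls || node.classes.includes(cls));
}

function parse(nodes, selectors) {
  // Do the expensive work once per parse...
  const matchers = selectors.map(compileSelector);
  // ...then run only the cheap compiled matchers once per node.
  return nodes.filter((node) => matchers.some((m) => m(node)));
}

const nodes = [
  { tag: "p", classes: ["something"] },
  { tag: "li", classes: [] },
  { tag: "p", classes: [] },
];
console.log(parse(nodes, ["p.something", "li"]).length); // 2
```

The same number of per-node checks happen either way; the saving comes from moving the selector-dependent work out of the inner loop.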
No, actually, I am running npm v10.2.4. I just ran
Benchmarking these changes by parsing and re-parsing the example document on the dev demo page, I don't see a noticeable speed improvement. Are you using any particularly complex selectors in your schema?
The speed difference is most noticeable on Chromium-based browsers, unfortunately. Here's a really simple snippet that benchmarks the changes made in this PR:

```javascript
const doc = document.createElement("div");
for (let i = 0; i < 100000; ++i) {
  const element = document.createElement("p");
  element.classList.add("something");
  doc.append(element);
}

console.time("creating set");
const set = new Set(doc.querySelectorAll("p.something").values());
const matchers = {
  ["p.something"]: (node) => set.has(node),
};
console.timeEnd("creating set");

const results = [];
for (let i = 0; i < 10; ++i) {
  // Baseline: an empty loop over the elements.
  const loopStart = performance.now();
  for (const node of doc.children) {
  }
  const loopEnd = performance.now() - loopStart;

  // Set-based matcher (this PR's approach).
  const matcherStart = performance.now();
  for (const node of doc.children) {
    matchers["p.something"](node);
  }
  const matcherEnd = performance.now() - matcherStart;

  // Element.matches() (the current approach).
  const matchesStart = performance.now();
  for (const node of doc.children) {
    node.matches("p.something");
  }
  const matchesEnd = performance.now() - matchesStart;

  results.push({ loop: loopEnd, matcher: matcherEnd - loopEnd, matches: matchesEnd - loopEnd });
}
console.table(results);
```

I ran these in Google Chrome and Firefox, and got the following results:

This tries to exclude the time it takes to loop over the elements (not super accurately, though). The crux is that the speed difference is most significant on Chrome; on Firefox it actually gets slower. However, if you include the cost of creating the set of nodes, this PR isn't really looking all that great (at least that's what independent benchmarking is showing). This is very different from what I am seeing after benchmarking ProseMirror with these changes, which makes me wonder whether the performance difference is due to something else?
I benchmarked on both Firefox and Chrome. Neither showed a significant difference. I'm not interested in micro-benchmarks that just run one leaf function; there'd have to be a noticeable improvement in
How many nodes are in the document you benchmarked on? I'll run the benchmarks again. It's possible that I am doing something wrong/different, because I am seeing a 40% improvement in
I was parsing a 3000-node document in a loop for a few seconds, counting the number of parses per second.
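A parses-per-second benchmark of the kind described above can be sketched like this. This is a hedged, self-contained illustration: `opsPerSecond`, `parseDocument`, and the 3000-node `fakeDoc` are stand-ins invented here, not ProseMirror's API.

```javascript
// Run a function in a tight loop for a fixed duration and report ops/sec.
function opsPerSecond(fn, durationMs = 200) {
  const start = Date.now();
  let count = 0;
  while (Date.now() - start < durationMs) {
    fn();
    count++;
  }
  return count / ((Date.now() - start) / 1000);
}

// Stand-in workload: "parse" a 3000-node document.
const fakeDoc = Array.from({ length: 3000 }, (_, i) => ({ tag: i % 2 ? "p" : "li" }));
function parseDocument() {
  return fakeDoc.filter((n) => n.tag === "p").length;
}

console.log(`${opsPerSecond(parseDocument).toFixed(0)} parses/second`);
```

Measuring whole parses over a few seconds like this avoids the pitfalls of micro-benchmarking a single leaf function, since it includes all the surrounding work a real parse does.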
This PR significantly improves `parse` & `parseSlice` performance by optimizing how ProseMirror does tag matching:

- Calls `querySelectorAll` on the main DOM node once instead of calling `matches` on each node individually
- Adds a simple branch for plain tags (`li`, `p` etc.). This avoids calling any browser API and works much faster.

These changes shouldn't break anything.
Benchmarks
Before:
After:
That's a solid ~40% improvement.