-
Notifications
You must be signed in to change notification settings - Fork 876
Make syntax highlight generation scale better. #5642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The TextArea widget code queries the entires syntax tree for each edit, using the tree-sitter Query.captures method. This has potential scaling issues, but such issues are exacerbated by the fact the Query.captures method scales very poorly with the number of line it is asked to generate captures for. It appears to be quadratic or something similar I think - I strongly suspect a bug in tree-sitter or its python bindings. On my laptop, this makes editing a 25,000 line Python file painfully unresponsive. A 25,000 lines Python file is large, but not entirely unreasonable. I actually became aware of this behaviour developing a LaTeX editor, which showed symptoms after about 1,000 lines. This change replaces the plain TextArea._highlights dictionary with a dictonary-like class that lazily performs item access to build hilghlight information for small blocks of (50) lines at a time. As well as keeping the TextArea very much more responsive, it will reduce the average memory requirements for larger documents. During regression testing, I discovered that the per-line highlights are not necessarily in the correct (best) order for syntax highlighting. For example, highlighting within string expressions can lost. So I added suitable sorting. This required that the snapshots for some tests to be updated.
In case it is of interest. For the LaTeX editor I was developing when I found this issue, I also implemented a spell checker and search function. It was very easy to extend the new HighlightMap class so that I could overlay misspelled word and matching text on top of the syntax highlighting. |
Love it! Exposing the highlight mapping is also super useful. |
I forgot to upload this earlier. It is the large Python file and a simple application that I have been using for ad-hoc responsiveness testing - before and after. To use, just run the mini_editor and type. I have not tried to add any tests due to fact that reliable automated performance testing is hard and not necessarily very helpful in this case. I can look into this if you feel that it is worth trying to automatically catch performance regression. |
Just played with the demo - very nice! Editing a 25k line Python file with syntax highlighting and almost no perceptible delay cc @willmcgugan :) |
I am glad you like it, but if you were wondering why I my branch has a '1' on the end... Try typing in a docstring at the top of the file - i.e. on a blank line open with triple quote (""") and then typing the contents leaving the end triple quote missing. I have a textarea-speedup-2 branch for this mostly ready to go. I need to finish some tweaks to the tests before I can prepare a PR. |
Awesome - I just made a PR on your fork with some suggestions, here: paul-ollis#1 Feel free to leave comments. A few type hinting changes but also I suggested a change to how the "uncovered_lines" part works - although I could be misunderstanding your implementation, I feel like we don't need to be constructing all those For what it's worth, we can likely do even better here if we can get this to work: for changed_range in tree.changed_ranges(new_tree):
print("Changed range:")
print(f" Start point {changed_range.start_point}")
print(f" Start byte {changed_range.start_byte}")
print(f" End point {changed_range.end_point}")
print(f" End byte {changed_range.end_byte}") In early versions of the Python tree-sitter bindings, this method wasn't working correctly. For each edit, I wanted to use this method to return the document ranges affected by it, so that we could only re-highlight those parts. I'm not sure if that has since been fixed though - we only upgraded to the new bindings relatively recently. |
Thanks for the PR @darrenburns. FYI idea behind the dict of ranges was to make getitem as efficient as possible at the expense of some memory - not too much because the only one range object is created for each 50 lines. However, based on you PR, I see no merit in keeping the scheme - your code is shorter and cleaner. I have incorporated your changes with some modifications, which, I think, make the code cleaner still. I have also renamed _build_highlight_map to _reset_highlights, but you might be able to come up with a better name. |
@darrenburns. Re tree.changed_ranges, my other branch (PR imminent) is a good place to try some experiments. I will have a play and let you know what I find (on the next PR). |
It seems that Tree.changed_ranges does not always do what is required to be useful here. I am looking at alternative approaches, which may take me a while. |
The TextArea widget code queries the entires syntax tree for each edit, using the tree-sitter Query.captures method. This has potential scaling issues, but such issues are exacerbated by the fact the Query.captures method scales very poorly with the number of line it is asked to generate captures for. It appears to be quadratic or something similar I think - I strongly suspect a bug in tree-sitter or its python bindings.
On my laptop, this makes editing a 25,000 line Python file painfully unresponsive. A 25,000 lines Python file is large, but not entirely unreasonable. I actually became aware of this behaviour developing a LaTeX editor, which showed symptoms after about 1,000 lines.
This change replaces the plain TextArea._highlights dictionary with a dictonary-like class that lazily performs item access to build hilghlight information for small blocks of (50) lines at a time. As well as keeping the TextArea very much more responsive, it will reduce the average memory requirements for larger documents.
During regression testing, I discovered that the per-line highlights are not necessarily in the correct (best) order for syntax highlighting. For example, highlighting within string expressions can lost. So I added suitable sorting. This required that the snapshots for some tests to be updated.
Please review the following checklist.