Make syntax highlight generation scale better. #5642

paul-ollis · 2025-03-12T11:20:55Z

The TextArea widget code queries the entires syntax tree for each edit, using the tree-sitter Query.captures method. This has potential scaling issues, but such issues are exacerbated by the fact the Query.captures method scales very poorly with the number of line it is asked to generate captures for. It appears to be quadratic or something similar I think - I strongly suspect a bug in tree-sitter or its python bindings.

On my laptop, this makes editing a 25,000 line Python file painfully unresponsive. A 25,000 lines Python file is large, but not entirely unreasonable. I actually became aware of this behaviour developing a LaTeX editor, which showed symptoms after about 1,000 lines.

This change replaces the plain TextArea._highlights dictionary with a dictonary-like class that lazily performs item access to build hilghlight information for small blocks of (50) lines at a time. As well as keeping the TextArea very much more responsive, it will reduce the average memory requirements for larger documents.

During regression testing, I discovered that the per-line highlights are not necessarily in the correct (best) order for syntax highlighting. For example, highlighting within string expressions can lost. So I added suitable sorting. This required that the snapshots for some tests to be updated.

Please review the following checklist.

Docstrings on all new or modified functions / classes
[n/a] Updated documentation
Updated CHANGELOG.md (where appropriate)

The TextArea widget code queries the entires syntax tree for each edit, using the tree-sitter Query.captures method. This has potential scaling issues, but such issues are exacerbated by the fact the Query.captures method scales very poorly with the number of line it is asked to generate captures for. It appears to be quadratic or something similar I think - I strongly suspect a bug in tree-sitter or its python bindings. On my laptop, this makes editing a 25,000 line Python file painfully unresponsive. A 25,000 lines Python file is large, but not entirely unreasonable. I actually became aware of this behaviour developing a LaTeX editor, which showed symptoms after about 1,000 lines. This change replaces the plain TextArea._highlights dictionary with a dictonary-like class that lazily performs item access to build hilghlight information for small blocks of (50) lines at a time. As well as keeping the TextArea very much more responsive, it will reduce the average memory requirements for larger documents. During regression testing, I discovered that the per-line highlights are not necessarily in the correct (best) order for syntax highlighting. For example, highlighting within string expressions can lost. So I added suitable sorting. This required that the snapshots for some tests to be updated.

paul-ollis · 2025-03-12T12:31:17Z

In case it is of interest.

For the LaTeX editor I was developing when I found this issue, I also implemented a spell checker and search function. It was very easy to extend the new HighlightMap class so that I could overlay misspelled word and matching text on top of the syntax highlighting.

darrenburns · 2025-03-12T12:48:14Z

Love it! Exposing the highlight mapping is also super useful.

paul-ollis · 2025-03-12T15:23:57Z

I forgot to upload this earlier. It is the large Python file and a simple application that I have been using for ad-hoc responsiveness testing - before and after. To use, just run the mini_editor and type.

demo.zip

I have not tried to add any tests due to fact that reliable automated performance testing is hard and not necessarily very helpful in this case. I can look into this if you feel that it is worth trying to automatically catch performance regression.

darrenburns · 2025-03-12T16:03:25Z

Just played with the demo - very nice! Editing a 25k line Python file with syntax highlighting and almost no perceptible delay cc @willmcgugan :)

paul-ollis · 2025-03-12T17:01:04Z

I am glad you like it, but if you were wondering why I my branch has a '1' on the end...

Try typing in a docstring at the top of the file - i.e. on a blank line open with triple quote (""") and then typing the contents leaving the end triple quote missing. I have a textarea-speedup-2 branch for this mostly ready to go. I need to finish some tweaks to the tests before I can prepare a PR.

darrenburns · 2025-03-12T17:45:31Z

Awesome - I just made a PR on your fork with some suggestions, here: paul-ollis#1

Feel free to leave comments. A few type hinting changes but also I suggested a change to how the "uncovered_lines" part works - although I could be misunderstanding your implementation, I feel like we don't need to be constructing all those range objects on each edit and can instead calculate the "range" a line index belongs to on the fly.

For what it's worth, we can likely do even better here if we can get this to work:

for changed_range in tree.changed_ranges(new_tree):
    print("Changed range:")
    print(f"  Start point {changed_range.start_point}")
    print(f"  Start byte {changed_range.start_byte}")
    print(f"  End point {changed_range.end_point}")
    print(f"  End byte {changed_range.end_byte}")

In early versions of the Python tree-sitter bindings, this method wasn't working correctly.

For each edit, I wanted to use this method to return the document ranges affected by it, so that we could only re-highlight those parts. I'm not sure if that has since been fixed though - we only upgraded to the new bindings relatively recently.

paul-ollis · 2025-03-12T21:45:31Z

Thanks for the PR @darrenburns. FYI idea behind the dict of ranges was to make getitem as efficient as possible at the expense of some memory - not too much because the only one range object is created for each 50 lines. However, based on you PR, I see no merit in keeping the scheme - your code is shorter and cleaner. I have incorporated your changes with some modifications, which, I think, make the code cleaner still.

I have also renamed _build_highlight_map to _reset_highlights, but you might be able to come up with a better name.

paul-ollis · 2025-03-13T11:29:26Z

@darrenburns. Re tree.changed_ranges, my other branch (PR imminent) is a good place to try some experiments. I will have a play and let you know what I find (on the next PR).

paul-ollis · 2025-03-21T09:27:55Z

It seems that Tree.changed_ranges does not always do what is required to be useful here.

I am looking at alternative approaches, which may take me a while.

paul-ollis · 2025-04-11T12:05:33Z

Converted this to draft because it is basically subsumed into #5645. When the status of #5645 is settled this can be closed or readied for review.

Update with suggestions by Darren Burns.

7992465

This was referenced Mar 13, 2025

Background tree-sitter incremental parsing to improve TextArea responsiveness. #5645

Draft

Some efficiency improvements in tree-sitter highlighting paul-ollis/textual#1

Open

paul-ollis marked this pull request as draft April 11, 2025 12:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make syntax highlight generation scale better. #5642

Make syntax highlight generation scale better. #5642

paul-ollis commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

darrenburns commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

darrenburns commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

darrenburns commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

paul-ollis commented Mar 13, 2025

paul-ollis commented Mar 21, 2025

paul-ollis commented Apr 11, 2025 •

edited

Loading

Make syntax highlight generation scale better. #5642

Are you sure you want to change the base?

Make syntax highlight generation scale better. #5642

Conversation

paul-ollis commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

darrenburns commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

darrenburns commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

darrenburns commented Mar 12, 2025

paul-ollis commented Mar 12, 2025

paul-ollis commented Mar 13, 2025

paul-ollis commented Mar 21, 2025

paul-ollis commented Apr 11, 2025 • edited Loading

paul-ollis commented Apr 11, 2025 •

edited

Loading