-
Notifications
You must be signed in to change notification settings - Fork 116
Description
Editing a large file can take up to multiple seconds per character. Right now CodeEditTextView uses the default NSTextStorage as a text store, which stores text contiguously in memory. This means that for each insert all characters after the insert location must be copied one spot over in memory to make room for the inserted character.
For the most part this isn't a problem for smaller files. But opening a file like sqlite.c (~151,000 LOC) the performance impact becomes very noticeable. I've attached a screen recording of typing a few characters into this file, it's easy to see how long it takes.
Screen.Recording.2023-01-25.at.10.31.59.PM.mov
The way to fix this issue is to implement a custom NSTextStorage object using a data structure more suited to a text editor. By default, NSTextStorage uses an NSMutableAttributedString to store both text and attributes. These implementations are very general and don't perform well with edits.
Therefore the problem is twofold: text storage and attribute storage.
There are a few algorithms that are generally used in text editors for storing text:
In my opinion, a piece table would be the best option. A gap buffer may be a solid solution, but doesn't offer the same append-only type performance a piece table can offer. With a piece table every edit is added to the text buffer as an immutable, appended, chunk of text which completely removes the need to copy text when inserting characters.
The only downside of a piece table comes around when a lot of edits happen as found by the VSCode team.
The other problem is attribute storage. Attributes consist of a range (UInt-UInt) and a dictionary of attributes NSDictionary. A solution for attribute storage may be to implement a tree that stores ranges as it's nodes and allows for O(log n) lookup time for a given index. However, a tree structure will have a large space requirement, and ranges may be very small (or even a single index) when we're talking about syntax highlighting, leading to many nodes in a tree.
Another option would be to somehow store a small amount of metadata about each range alongside the string itself. We already have a ThemeAttributesProviding protocol that can be used to convert a capture type into attributes. We could store the capture type (sqrt(21 available captures) = ~5 needed bits, rounded up) using 1 extra byte for every highlighted range and when asked for attribute data convert the capture into attributes.
Any thoughts or discussion on implementation and performance is welcome!
Metadata
Metadata
Assignees
Labels
Type
Projects
Status