✨ Editing Performance

Editing a large file can take up to multiple seconds per character. Right now CodeEditTextView uses the default `NSTextStorage` as a text store, which stores text contiguously in memory. This means that for each insert all characters after the insert location must be copied one spot over in memory to make room for the inserted character.

For the most part this isn't a problem for smaller files. But opening a file like `sqlite.c` (~151,000 LOC) the performance impact becomes very noticeable. I've attached a screen recording of typing a few characters into this file, it's easy to see how long it takes.

https://user-images.githubusercontent.com/35942988/214759044-5ae45760-fe25-46ee-b81c-206a01c5aff8.mov

The way to fix this issue is to implement a custom `NSTextStorage` object using a data structure more suited to a text editor. By default, `NSTextStorage` uses an `NSMutableAttributedString` to store both text and attributes. These implementations are very general and don't perform well with edits.

Therefore the problem is twofold: text storage and attribute storage.

There are a few algorithms that are generally used in text editors for storing text:
- [Gap Buffers](https://en.wikipedia.org/wiki/Gap_buffer)
- [Piece Tables](https://en.wikipedia.org/wiki/Piece_table)

In my opinion, a piece table would be the best option. A gap buffer may be a solid solution, but doesn't offer the same append-only type performance a piece table can offer. With a piece table every edit is added to the text buffer as an immutable, appended, chunk of text which completely removes the need to copy text when inserting characters.

The only downside of a piece table comes around when a *lot* of edits happen as found by the [VSCode team](https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation#_conclusion-and-gotchas).

The other problem is attribute storage. Attributes consist of a range (`UInt`-`UInt`) and a dictionary of attributes `NSDictionary`. A solution for attribute storage may be to implement a tree that stores ranges as it's nodes and allows for `O(log n)` lookup time for a given index. However, a tree structure will have a large space requirement, and ranges may be very small (or even a single index) when we're talking about syntax highlighting, leading to many nodes in a tree.

Another option would be to somehow store a small amount of metadata about each range alongside the string itself. We already have a `ThemeAttributesProviding` protocol that can be used to convert a capture type into attributes. We could store the capture type (`sqrt(21 available captures) = ~5` needed bits, rounded up) using 1 extra byte for every highlighted range and when asked for attribute data convert the capture into attributes.

Any thoughts or discussion on implementation and performance is welcome!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

✨ Editing Performance #132

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

✨ Editing Performance #132

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions