Update splittext plan with CSS, a11y, SEO, BiDi, nesting, and line detection amendments#132
Update splittext plan with CSS, a11y, SEO, BiDi, nesting, and line detection amendments#132
Conversation
tombigel
commented
Feb 23, 2026
- Add base CSS strategy (injectStyles, space handling, direction detection)
- Simplify ARIA to container-level only
- Add SEO strategy with preserveText visually-hidden duplicate
- Add BiDi/shaping injection options (bidiResolver, shaper)
- Add nested option (flatten/preserve/depth) for DOM structure control
- Make line detection opt-in and note binary-search algorithm improvement
- Add Safari whitespace normalization test results and Playwright test case
…tection amendments - Add base CSS strategy (injectStyles, space handling, direction detection) - Simplify ARIA to container-level only - Add SEO strategy with preserveText visually-hidden duplicate - Add BiDi/shaping injection options (bidiResolver, shaper) - Add nested option (flatten/preserve/depth) for DOM structure control - Make line detection opt-in and note binary-search algorithm improvement - Add Safari whitespace normalization test results and Playwright test case Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Pull request overview
Updates the SplitText package implementation plan to incorporate new strategies for base CSS injection, accessibility/SEO handling, BiDi/shaping extensibility, nested DOM handling, and opt-in line detection (including Safari/WebKit whitespace quirks and a Playwright test case).
Changes:
- Makes line detection explicitly opt-in and documents a more efficient binary-search-based approach.
- Adds plan-level API/options for CSS injection, preserveText for SEO/a11y, nested DOM handling, and BiDi/shaping injection points.
- Expands testing/documentation sections (Safari whitespace normalization notes + Playwright test case, wrapper/CSS guidance).
Comments suppressed due to low confidence (1)
.cursor/plans/text_splitter_package_1eeee927.plan.md:850
SplitTextResultis typed as returningHTMLSpanElement[], but the implementation sketch (cache + getters +_performSplit) usesHTMLElement[]. This should be consistent (ideallyHTMLSpanElement[]everywhere, since wrappers are spans), otherwise the public API typing and internals will diverge.
private _cache: {
chars?: HTMLElement[];
words?: HTMLElement[];
lines?: HTMLElement[];
sentences?: HTMLElement[];
} = {};
constructor(element: HTMLElement, options?: SplitTextOptions) {
this._element = element;
this._originalHTML = element.innerHTML;
// Eager split if type is provided; track whether 'lines' was requested (lines are opt-in and expensive)
this._linesRequested = false;
if (options?.type) {
const types = Array.isArray(options.type) ? options.type : [options.type];
this._linesRequested = types.includes('lines');
for (const type of types) {
this._performSplit(type);
}
}
}
// Lazy getter - split on first access, return cached thereafter
get chars(): HTMLElement[] {
if (!this._cache.chars) {
this._cache.chars = this._performSplit('chars');
}
return this._cache.chars;
}
get words(): HTMLElement[] {
if (!this._cache.words) {
this._cache.words = this._performSplit('words');
}
return this._cache.words;
}
get lines(): HTMLElement[] {
if (!this._linesRequested) return []; // or throw with message: "Lines not requested; pass type: 'lines' or type: [..., 'lines']"
if (!this._cache.lines) {
this._cache.lines = this._performSplit('lines');
}
return this._cache.lines;
}
get sentences(): HTMLElement[] {
if (!this._cache.sentences) {
this._cache.sentences = this._performSplit('sentences');
}
return this._cache.sentences;
}
private _performSplit(type: 'chars' | 'words' | 'lines' | 'sentences'): HTMLElement[] {
// Actual splitting logic - creates wrapper elements in DOM
// Returns array of created HTMLElements
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Cursor <cursoragent@cursor.com>
…iv, HTMLSpanElement types, doc numbering Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 20 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…requirements - Fix revert() prose, test locator, _handleResize options, remove splitBy/preserveWhitespace - Add Intl.Segmenter browser requirement + polyfill note - BiDi: external plugin API; remove shaper; shaped-languages note - Line detection: binary search pseudocode, naive example, heightTracker fix - Nested: clarify flatten deeper than N with example - Minor: aria inner div, line detection string[] note, numbered lists, Prettier Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Remove lines opt-in special-casing; .lines is now lazy like all other getters - Add display: contents to aria-hidden wrapper div to prevent layout breakage - Remove redundant .split-space class (base CSS handles space preservation) - Assert withoutNorm in Safari whitespace test - Add Segmenter Polyfill API section with contract and compatible polyfills - Trim repetitive integration examples and redundant code subsections - Tag all code snippets with target file paths Made-with: Cursor
ydaniv
left a comment
There was a problem hiding this comment.
Reviewed until Line Detection Algorithm
| // Base CSS (inline-block, white-space, etc.) | ||
| injectStyles?: boolean; // default: true - auto-inject minimal base stylesheet (deduplicated via data-splittext) |
There was a problem hiding this comment.
Why do we need this exposed? What happens if it's false?
There was a problem hiding this comment.
than we don't add the base css. not a must, but a nice option if the styles are bundled somewhere else
There was a problem hiding this comment.
Let's clarify this feature a bit:
- We don't want to specify this per split. This is a global setting.
- We can generate the global CSS and inject it via
adoptedStyleSheets. Because anyway we operate in the client. - If we're adding these styles globally it's enough for now, and we don't need the implementation to inject inline styles.
| injectStyles?: boolean; // default: true - auto-inject minimal base stylesheet (deduplicated via data-splittext) | ||
|
|
||
| // DOM structure | ||
| nested?: 'flatten' | 'preserve' | number; // default: 'flatten' |
There was a problem hiding this comment.
Why do we need this exposed? What are the use-cases for flattening or providing a number?
There was a problem hiding this comment.
I i remember correctly your original plan is set to preserve html structure. It is not always what we want, sometimes we just want the textContent of a styled text section, hence "flatten". the "number" option is a suggestion from claude that i adopted to have the ability to set nesting levels. so if i know my structure and all i want to preserve are top level bold and italic i can set this to 1.
again - not a must but a nice advanced feature
There was a problem hiding this comment.
But when do we need this "flatten"? Do we have a real use-case?
Do we have a real use-case for "number"?
There was a problem hiding this comment.
Use case for flatten? of course.
First of all this is de facto what everybody else is doing, we shouls at least have the option.
But generally for text editors generated html structures splitting can get messy, especially word and line - you can get to a point where you need two passes or look ahead... i don't remember, did it too many years ago (Think of a structure like <p> som<strong>e text th<span style="color:red">en som</span>e other tex</strong>t</p>) and It's before we get to <ul><li> markers and counters. We might decide we also don't want to deal with it.
For the number option, is a cool feature i can see value in, not a must.
|
|
||
| ```css | ||
| /* Example: Typewriter effect */ | ||
| /* Typewriter effect using staggered animation-delay via --index custom property */ |
There was a problem hiding this comment.
And how/where is that --index property applied?
There was a problem hiding this comment.
mm, nice catch, I missed this example.
It's a good feature though, what do you say? we tell the splitter to add hardcoded --letter-index --word-index etc. for each element on split?
There was a problem hiding this comment.
Yes, if we could apply --index per part and use that in the effect it would be nice. But these things are hard to manage, so need to see what we end up with and see if it makes sense.
Maybe something like --c-idx, --w-idx, etc.
There was a problem hiding this comment.
done, used the longform (--letter-index) and also added another prop partIndex so this can be disabled if needed for some reason
| ### Base CSS Strategy | ||
|
|
||
| For transforms to work correctly on spans, `display: inline-block` is often required. Users can set this via `wrapperStyle`: | ||
| When `injectStyles` is true (default), the package injects a minimal base stylesheet once per document via a `<style data-splittext>` tag (deduplicated). This ensures transforms and spacing work without requiring users to add CSS manually. |
There was a problem hiding this comment.
Is this a static file or generated?
Also, better to not assume this implementation. Could be different depending on use-case.
There was a problem hiding this comment.
the base should be static i guess. I agree that the implementation details are too detailed. will change
| ### Unicode/Emoji Handling & Text Segmentation | ||
|
|
||
| Use `Intl.Segmenter` for proper character segmentation (with fallback for older browsers): | ||
| Use `Intl.Segmenter` (native or via the `segmenter` option — see **Segmenter Polyfill API**) for all text segmentation — characters, words, and sentences. This provides locale-aware splitting that correctly handles emoji, multi-codepoint grapheme clusters, CJK text without spaces, and language-specific word/sentence boundaries. Because `Intl.Segmenter` handles these concerns natively, custom `splitBy` or whitespace-handling options are unnecessary. |
There was a problem hiding this comment.
This is a lot of craft, the output is very verbose and not necessary.
Please revert, unless you can point anything that's really needed.
There was a problem hiding this comment.
You didn't change anything.
Anyway the list below is also not ideal, we can probably revise that as well.
| const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' }); | ||
| const chars = [...segmenter.segment(text)].map((s) => s.segment); | ||
| // Characters (grapheme clusters — handles emoji, combining marks, etc.) | ||
| const charSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' }); | ||
| const chars = [...charSegmenter.segment(text)].map((s) => s.segment); | ||
|
|
||
| // Words (locale-aware — works for CJK, languages without spaces, etc.) | ||
| const wordSegmenter = new Intl.Segmenter('en', { granularity: 'word' }); | ||
| const words = [...wordSegmenter.segment(text)].filter((s) => s.isWordLike).map((s) => s.segment); | ||
|
|
||
| // Sentences | ||
| const sentenceSegmenter = new Intl.Segmenter('en', { granularity: 'sentence' }); | ||
| const sentences = [...sentenceSegmenter.segment(text)].map((s) => s.segment); |
There was a problem hiding this comment.
Not necessary. Basically the LLM knows how to use the API.
There was a problem hiding this comment.
You didn't change anything here
There was a problem hiding this comment.
really done this time
…, fix gaps - Remove untested binary search line detection; restore Ben Nadel's proven getClientRects() approach as primary algorithm - Remove redundant cross-reference line, stale binary-search perf note - Simplify Base CSS section (static predefined stylesheet, less prescriptive) - Add .sr-only to base CSS required styles - Fix broken markdown formatting on SplitTextResultImpl heading - Remove orphaned masking todo (no plan section existed) - Remove redundant range.selectNodeContents line in height-tracking code - Fix markdown lint warnings (double blank lines) Made-with: Cursor
| return lines; | ||
| } | ||
| ``` | ||
| Instead of tracking rect count, track `getBoundingClientRect().height` on a range anchored at the text node start. When height increases, a new line has been reached. Same O(n) iteration but uses a single bounding rect per step instead of a rect array, which may be cheaper for long text. |
There was a problem hiding this comment.
I think this doesn't make sense, but also the whole algo above seems very inefficient. Increasing the range node by node and calling getBoundingClientRect() each time.
There was a problem hiding this comment.
mm, it added it without me asking, i missed it
| 1. **No pre-wrapping required** — detect lines from original text nodes | ||
| 2. **Accurate to browser rendering** — uses actual layout, not approximated positions | ||
| 3. **Measure first, wrap second** — original text stays intact during detection |
| **Primary Approach: Range API with `getClientRects()`** | ||
|
|
||
| Use the DOM Range API to detect line breaks from text nodes _before_ creating wrapper elements. This avoids unnecessary DOM manipulation and provides accurate line detection based on the browser's actual rendering: | ||
|
|
||
| ```typescript | ||
| function detectLines(textNode: Text): string[] { | ||
| const range = document.createRange(); | ||
| const text = textNode.textContent || ''; | ||
| const lines: string[][] = []; | ||
| let lineChars: string[] = []; | ||
|
|
||
| // Normalize whitespace (Safari compatibility) | ||
| textNode.textContent = text.trim().replace(/\s+/g, ' '); | ||
| Line detection is lazy like all other split types — it runs on first access of `.lines` (or eagerly if `type` includes `'lines'`). It uses the DOM `Range` API to detect line breaks from text nodes *before* DOM manipulation, avoiding unnecessary wrapper creation during measurement. | ||
|
|
||
| for (let i = 0; i < text.length; i++) { | ||
| range.setStart(textNode, 0); | ||
| range.setEnd(textNode, i + 1); | ||
| **Primary approach: Range API with `getClientRects()`** → `src/lineDetection.ts` | ||
|
|
||
| // getClientRects() returns one rect per rendered line | ||
| const lineIndex = range.getClientRects().length - 1; | ||
| Incrementally expand a `Range` one character at a time through the text node. At each position, `getClientRects()` returns one rect per visual line the range spans — so `getClientRects().length - 1` gives the line index of the last character. Group characters by their line index to extract rendered lines. (Technique from Ben Nadel, blog #4310; tested across Chrome, Firefox, Edge, Safari.) | ||
|
|
||
| if (!lines[lineIndex]) { | ||
| lines.push((lineChars = [])); | ||
| } | ||
| lineChars.push(text.charAt(i)); | ||
| } | ||
|
|
||
| return lines.map((chars) => chars.join('').trim()); | ||
| } | ||
| ``` | ||
| Whitespace must be normalized before detection (`textNode.textContent = text.trim().replace(/\s+/g, ' ')`) — Safari returns rects based on markup structure rather than rendered layout when raw whitespace is present (see Browser Compatibility section). |
There was a problem hiding this comment.
Please revert these, I think the original text with the algo itself is more explicit and clear
| // Example 1: Lazy evaluation - no splitting happens yet | ||
| const result = splitText('.headline'); | ||
|
|
||
| // Splitting happens on first access, result is cached | ||
| const chars = result.chars; // Splits into chars NOW, caches result | ||
| const chars2 = result.chars; // Returns cached result (no re-split) | ||
|
|
||
| // Lines are split separately when accessed | ||
| const lines = result.lines; // Splits into lines NOW, caches result | ||
|
|
||
| // Example 2: Eager split with type option | ||
| const eagerResult = splitText('.headline', { type: 'words' }); | ||
| // Words are split immediately on invocation | ||
|
|
||
| // Other types still use lazy evaluation | ||
| const lines2 = eagerResult.lines; // Splits into lines on access | ||
|
|
||
| // Example 3: Multiple types eager | ||
| const multiResult = splitText('.headline', { type: ['chars', 'words'] }); | ||
| // Both chars and words split immediately | ||
|
|
||
| // Example 4: With animation library |
There was a problem hiding this comment.
This is the parts that explains the lazy/eager splitting. You removed it without reading and the LLM missed it.
There was a problem hiding this comment.
restored the relevant examples (i think)
…code, trim verbosity - Remove injectStyles from per-split options; CSS now injected globally via adoptedStyleSheets - Add partIndexing option with --char-index/--word-index/--line-index/--sentence-index CSS custom properties - Restore original detectLines() function code (reverts compact description per reviewer request) - Remove height-tracking alternative from line detection section - Trim verbose Intl.Segmenter section and remove redundant code examples - Restore lazy/eager integration examples that explain core API behavior - Update doc references from data-index to CSS custom property indexing Made-with: Cursor