Skip to content

Update splittext plan with CSS, a11y, SEO, BiDi, nesting, and line detection amendments#132

Open
tombigel wants to merge 12 commits intomasterfrom
tombigel/splittext-plan-revisions
Open

Update splittext plan with CSS, a11y, SEO, BiDi, nesting, and line detection amendments#132
tombigel wants to merge 12 commits intomasterfrom
tombigel/splittext-plan-revisions

Conversation

@tombigel
Copy link
Contributor

  • Add base CSS strategy (injectStyles, space handling, direction detection)
  • Simplify ARIA to container-level only
  • Add SEO strategy with preserveText visually-hidden duplicate
  • Add BiDi/shaping injection options (bidiResolver, shaper)
  • Add nested option (flatten/preserve/depth) for DOM structure control
  • Make line detection opt-in and note binary-search algorithm improvement
  • Add Safari whitespace normalization test results and Playwright test case

…tection amendments

- Add base CSS strategy (injectStyles, space handling, direction detection)
- Simplify ARIA to container-level only
- Add SEO strategy with preserveText visually-hidden duplicate
- Add BiDi/shaping injection options (bidiResolver, shaper)
- Add nested option (flatten/preserve/depth) for DOM structure control
- Make line detection opt-in and note binary-search algorithm improvement
- Add Safari whitespace normalization test results and Playwright test case

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the SplitText package implementation plan to incorporate new strategies for base CSS injection, accessibility/SEO handling, BiDi/shaping extensibility, nested DOM handling, and opt-in line detection (including Safari/WebKit whitespace quirks and a Playwright test case).

Changes:

  • Makes line detection explicitly opt-in and documents a more efficient binary-search-based approach.
  • Adds plan-level API/options for CSS injection, preserveText for SEO/a11y, nested DOM handling, and BiDi/shaping injection points.
  • Expands testing/documentation sections (Safari whitespace normalization notes + Playwright test case, wrapper/CSS guidance).
Comments suppressed due to low confidence (1)

.cursor/plans/text_splitter_package_1eeee927.plan.md:850

  • SplitTextResult is typed as returning HTMLSpanElement[], but the implementation sketch (cache + getters + _performSplit) uses HTMLElement[]. This should be consistent (ideally HTMLSpanElement[] everywhere, since wrappers are spans), otherwise the public API typing and internals will diverge.
  private _cache: {
    chars?: HTMLElement[];
    words?: HTMLElement[];
    lines?: HTMLElement[];
    sentences?: HTMLElement[];
  } = {};

  constructor(element: HTMLElement, options?: SplitTextOptions) {
    this._element = element;
    this._originalHTML = element.innerHTML;

    // Eager split if type is provided; track whether 'lines' was requested (lines are opt-in and expensive)
    this._linesRequested = false;
    if (options?.type) {
      const types = Array.isArray(options.type) ? options.type : [options.type];
      this._linesRequested = types.includes('lines');
      for (const type of types) {
        this._performSplit(type);
      }
    }
  }

  // Lazy getter - split on first access, return cached thereafter
  get chars(): HTMLElement[] {
    if (!this._cache.chars) {
      this._cache.chars = this._performSplit('chars');
    }
    return this._cache.chars;
  }

  get words(): HTMLElement[] {
    if (!this._cache.words) {
      this._cache.words = this._performSplit('words');
    }
    return this._cache.words;
  }

  get lines(): HTMLElement[] {
    if (!this._linesRequested) return []; // or throw with message: "Lines not requested; pass type: 'lines' or type: [..., 'lines']"
    if (!this._cache.lines) {
      this._cache.lines = this._performSplit('lines');
    }
    return this._cache.lines;
  }

  get sentences(): HTMLElement[] {
    if (!this._cache.sentences) {
      this._cache.sentences = this._performSplit('sentences');
    }
    return this._cache.sentences;
  }

  private _performSplit(type: 'chars' | 'words' | 'lines' | 'sentences'): HTMLElement[] {
    // Actual splitting logic - creates wrapper elements in DOM
    // Returns array of created HTMLElements
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tombigel and others added 2 commits February 23, 2026 14:18
Co-authored-by: Cursor <cursoragent@cursor.com>
…iv, HTMLSpanElement types, doc numbering

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 20 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tombigel and others added 2 commits February 24, 2026 11:39
…requirements

- Fix revert() prose, test locator, _handleResize options, remove splitBy/preserveWhitespace
- Add Intl.Segmenter browser requirement + polyfill note
- BiDi: external plugin API; remove shaper; shaped-languages note
- Line detection: binary search pseudocode, naive example, heightTracker fix
- Nested: clarify flatten deeper than N with example
- Minor: aria inner div, line detection string[] note, numbered lists, Prettier

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@tombigel tombigel requested a review from ameerabuf February 27, 2026 18:28
- Remove lines opt-in special-casing; .lines is now lazy like all other getters
- Add display: contents to aria-hidden wrapper div to prevent layout breakage
- Remove redundant .split-space class (base CSS handles space preservation)
- Assert withoutNorm in Safari whitespace test
- Add Segmenter Polyfill API section with contract and compatible polyfills
- Trim repetitive integration examples and redundant code subsections
- Tag all code snippets with target file paths

Made-with: Cursor
@tombigel tombigel requested review from ameerabuf and ydaniv March 3, 2026 14:35
Copy link
Collaborator

@ydaniv ydaniv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed until Line Detection Algorithm

Comment on lines +156 to +157
// Base CSS (inline-block, white-space, etc.)
injectStyles?: boolean; // default: true - auto-inject minimal base stylesheet (deduplicated via data-splittext)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this exposed? What happens if it's false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

than we don't add the base css. not a must, but a nice option if the styles are bundled somewhere else

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify this feature a bit:

  1. We don't want to specify this per split. This is a global setting.
  2. We can generate the global CSS and inject it via adoptedStyleSheets. Because anyway we operate in the client.
  3. If we're adding these styles globally it's enough for now, and we don't need the implementation to inject inline styles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

injectStyles?: boolean; // default: true - auto-inject minimal base stylesheet (deduplicated via data-splittext)

// DOM structure
nested?: 'flatten' | 'preserve' | number; // default: 'flatten'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this exposed? What are the use-cases for flattening or providing a number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I i remember correctly your original plan is set to preserve html structure. It is not always what we want, sometimes we just want the textContent of a styled text section, hence "flatten". the "number" option is a suggestion from claude that i adopted to have the ability to set nesting levels. so if i know my structure and all i want to preserve are top level bold and italic i can set this to 1.
again - not a must but a nice advanced feature

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when do we need this "flatten"? Do we have a real use-case?
Do we have a real use-case for "number"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use case for flatten? of course.
First of all this is de facto what everybody else is doing, we shouls at least have the option.
But generally for text editors generated html structures splitting can get messy, especially word and line - you can get to a point where you need two passes or look ahead... i don't remember, did it too many years ago (Think of a structure like <p> som<strong>e text th<span style="color:red">en som</span>e other tex</strong>t</p>) and It's before we get to <ul><li> markers and counters. We might decide we also don't want to deal with it.

For the number option, is a cool feature i can see value in, not a must.


```css
/* Example: Typewriter effect */
/* Typewriter effect using staggered animation-delay via --index custom property */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how/where is that --index property applied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm, nice catch, I missed this example.
It's a good feature though, what do you say? we tell the splitter to add hardcoded --letter-index --word-index etc. for each element on split?

Copy link
Collaborator

@ydaniv ydaniv Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we could apply --index per part and use that in the effect it would be nice. But these things are hard to manage, so need to see what we end up with and see if it makes sense.
Maybe something like --c-idx, --w-idx, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, used the longform (--letter-index) and also added another prop partIndex so this can be disabled if needed for some reason

### Base CSS Strategy

For transforms to work correctly on spans, `display: inline-block` is often required. Users can set this via `wrapperStyle`:
When `injectStyles` is true (default), the package injects a minimal base stylesheet once per document via a `<style data-splittext>` tag (deduplicated). This ensures transforms and spacing work without requiring users to add CSS manually.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a static file or generated?
Also, better to not assume this implementation. Could be different depending on use-case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the base should be static i guess. I agree that the implementation details are too detailed. will change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

### Unicode/Emoji Handling & Text Segmentation

Use `Intl.Segmenter` for proper character segmentation (with fallback for older browsers):
Use `Intl.Segmenter` (native or via the `segmenter` option — see **Segmenter Polyfill API**) for all text segmentation — characters, words, and sentences. This provides locale-aware splitting that correctly handles emoji, multi-codepoint grapheme clusters, CJK text without spaces, and language-specific word/sentence boundaries. Because `Intl.Segmenter` handles these concerns natively, custom `splitBy` or whitespace-handling options are unnecessary.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of craft, the output is very verbose and not necessary.
Please revert, unless you can point anything that's really needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change anything.
Anyway the list below is also not ideal, we can probably revise that as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done for realz

Comment on lines +862 to +973
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const chars = [...segmenter.segment(text)].map((s) => s.segment);
// Characters (grapheme clusters — handles emoji, combining marks, etc.)
const charSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const chars = [...charSegmenter.segment(text)].map((s) => s.segment);

// Words (locale-aware — works for CJK, languages without spaces, etc.)
const wordSegmenter = new Intl.Segmenter('en', { granularity: 'word' });
const words = [...wordSegmenter.segment(text)].filter((s) => s.isWordLike).map((s) => s.segment);

// Sentences
const sentenceSegmenter = new Intl.Segmenter('en', { granularity: 'sentence' });
const sentences = [...sentenceSegmenter.segment(text)].map((s) => s.segment);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary. Basically the LLM knows how to use the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change anything here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really done this time

…, fix gaps

- Remove untested binary search line detection; restore Ben Nadel's proven
  getClientRects() approach as primary algorithm
- Remove redundant cross-reference line, stale binary-search perf note
- Simplify Base CSS section (static predefined stylesheet, less prescriptive)
- Add .sr-only to base CSS required styles
- Fix broken markdown formatting on SplitTextResultImpl heading
- Remove orphaned masking todo (no plan section existed)
- Remove redundant range.selectNodeContents line in height-tracking code
- Fix markdown lint warnings (double blank lines)

Made-with: Cursor
@tombigel tombigel requested a review from ydaniv March 11, 2026 14:29
return lines;
}
```
Instead of tracking rect count, track `getBoundingClientRect().height` on a range anchored at the text node start. When height increases, a new line has been reached. Same O(n) iteration but uses a single bounding rect per step instead of a rect array, which may be cheaper for long text.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesn't make sense, but also the whole algo above seems very inefficient. Increasing the range node by node and calling getBoundingClientRect() each time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm, it added it without me asking, i missed it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +781 to +783
1. **No pre-wrapping required** — detect lines from original text nodes
2. **Accurate to browser rendering** — uses actual layout, not approximated positions
3. **Measure first, wrap second** — original text stays intact during detection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can revert this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +755 to +773
**Primary Approach: Range API with `getClientRects()`**

Use the DOM Range API to detect line breaks from text nodes _before_ creating wrapper elements. This avoids unnecessary DOM manipulation and provides accurate line detection based on the browser's actual rendering:

```typescript
function detectLines(textNode: Text): string[] {
const range = document.createRange();
const text = textNode.textContent || '';
const lines: string[][] = [];
let lineChars: string[] = [];

// Normalize whitespace (Safari compatibility)
textNode.textContent = text.trim().replace(/\s+/g, ' ');
Line detection is lazy like all other split types — it runs on first access of `.lines` (or eagerly if `type` includes `'lines'`). It uses the DOM `Range` API to detect line breaks from text nodes *before* DOM manipulation, avoiding unnecessary wrapper creation during measurement.

for (let i = 0; i < text.length; i++) {
range.setStart(textNode, 0);
range.setEnd(textNode, i + 1);
**Primary approach: Range API with `getClientRects()`** → `src/lineDetection.ts`

// getClientRects() returns one rect per rendered line
const lineIndex = range.getClientRects().length - 1;
Incrementally expand a `Range` one character at a time through the text node. At each position, `getClientRects()` returns one rect per visual line the range spans — so `getClientRects().length - 1` gives the line index of the last character. Group characters by their line index to extract rendered lines. (Technique from Ben Nadel, blog #4310; tested across Chrome, Firefox, Edge, Safari.)

if (!lines[lineIndex]) {
lines.push((lineChars = []));
}
lineChars.push(text.charAt(i));
}

return lines.map((chars) => chars.join('').trim());
}
```
Whitespace must be normalized before detection (`textNode.textContent = text.trim().replace(/\s+/g, ' ')`) — Safari returns rects based on markup structure rather than rendered layout when raw whitespace is present (see Browser Compatibility section).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert these, I think the original text with the algo itself is more explicit and clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines -456 to -477
// Example 1: Lazy evaluation - no splitting happens yet
const result = splitText('.headline');

// Splitting happens on first access, result is cached
const chars = result.chars; // Splits into chars NOW, caches result
const chars2 = result.chars; // Returns cached result (no re-split)

// Lines are split separately when accessed
const lines = result.lines; // Splits into lines NOW, caches result

// Example 2: Eager split with type option
const eagerResult = splitText('.headline', { type: 'words' });
// Words are split immediately on invocation

// Other types still use lazy evaluation
const lines2 = eagerResult.lines; // Splits into lines on access

// Example 3: Multiple types eager
const multiResult = splitText('.headline', { type: ['chars', 'words'] });
// Both chars and words split immediately

// Example 4: With animation library
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the parts that explains the lazy/eager splitting. You removed it without reading and the LLM missed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restored the relevant examples (i think)

…code, trim verbosity

- Remove injectStyles from per-split options; CSS now injected globally via adoptedStyleSheets
- Add partIndexing option with --char-index/--word-index/--line-index/--sentence-index CSS custom properties
- Restore original detectLines() function code (reverts compact description per reviewer request)
- Remove height-tracking alternative from line detection section
- Trim verbose Intl.Segmenter section and remove redundant code examples
- Restore lazy/eager integration examples that explain core API behavior
- Update doc references from data-index to CSS custom property indexing

Made-with: Cursor
@tombigel tombigel requested a review from ydaniv March 12, 2026 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants