Skip to content

fix: add new approach to rename functionality (#34655)#35086

Draft
gortiz-dotcms wants to merge 15 commits intomainfrom
issue-34655-folder-rename-fails
Draft

fix: add new approach to rename functionality (#34655)#35086
gortiz-dotcms wants to merge 15 commits intomainfrom
issue-34655-folder-rename-fails

Conversation

@gortiz-dotcms
Copy link
Member

@gortiz-dotcms gortiz-dotcms commented Mar 23, 2026

This PR fixes: #34655

Problem

`FolderAPIImpl.renameFolder()` failed with a PostgreSQL constraint violation when renaming a folder that contained more than ~20K contentlets:

```
ERROR: Cannot delete as this path has children
Where: PL/pgSQL function check_child_assets() line 13 at RAISE
```

Root Cause

The old implementation followed a create-new / move-all / delete-old cycle:

  1. Create a new `Folder` + `Identifier` record with the new name.
  2. Fetch all children via Elasticsearch (`findContentletsByFolder` → `search("+conFolder:...")`).
  3. Move each child one-by-one in a Java loop (one DB round-trip per item).
  4. Delete the old folder record.

Step 2 is the critical flaw: any contentlet not indexed in Elasticsearch is silently skipped (e.g. `.HEIC` files, recently uploaded content, or anything with a stale index). After moving only the ES-indexed subset, the old folder still had DB children pointing to its path. When step 4 tried to delete the folder, the PostgreSQL trigger `check_child_assets()` found those leftover children and raised a constraint violation.

This also explains why a manual re-index temporarily "fixed" the issue (#32181): the re-index brought ES and the DB back in sync, so the next rename attempt found all children.

Secondary cost: moving 20K+ items one-by-one in Java meant 20K+ individual DB round-trips, making the operation slow even when it succeeded.

Fix

Replace the create-new / move-all / delete-old cycle with a pure-SQL in-place rename:

  1. Update `folder.name` for the existing folder record (via `save()`).
  2. Update `identifier.asset_name` for the folder's identifier (via `IdentifierAPI.save()`). The old URI is explicitly evicted from the identifier cache before mutation so the pre-rename path key is removed.
  3. Bulk-update all children's `parent_path` using a plain `UPDATE identifier SET parent_path = ? WHERE parent_path = ? AND host_inode = ?`, processed depth-first by sorting all (oldPath → newPath) pairs by ascending path length before issuing any UPDATE.
  4. Flush folder, identifier, and nav-tool caches (targeted eviction using a pre-rename snapshot of sub-folder paths — no nuclear cache clear).
  5. Queue async ES reindex after the transaction commits (in `FolderAPIImpl`, not inside the factory) so a transient ES failure never rolls back a successful DB rename.

No new folder is created, no old folder is deleted — so the `check_child_assets()` trigger never fires. The rename is now DB-only and ES-independent, so unindexed items are never missed.

Why depth-first ordering (not a single bulk UPDATE)?

The existing `identifier_parent_path_trigger` fires `BEFORE UPDATE` on every `identifier` row and verifies that the new `parent_path` resolves to an existing folder. Processing each folder level before its children guarantees that by the time the trigger runs for a child row, its new parent path already exists in the DB.

All (oldPath → newPath) pairs are derived from a pre-loaded snapshot of sub-folder data and sorted by ascending old-path length: a parent path is always shorter than any of its descendants, so length-order guarantees parent-before-child without recursion or an explicit stack.

New index for trigger performance

`EXPLAIN` on the trigger's validation query showed it was using `idx_identifier_perm` (`asset_type + host_inode`) and then applying `lower(parent_path||asset_name||'/')` as a row-level filter — a scan over all folders on the host per trigger invocation. With 20K rows being updated this is O(N × F) lookups.

A new functional index `idx_identifier_parent_path_trigger` on `(host_inode, asset_type, lower(parent_path||asset_name||'/'))` matches the trigger's exact expression. After adding it, `EXPLAIN` showed the expression moved from `Filter` to `Index Cond` and cost dropped from 90.34 → 8.57. The index is added to `postgres.sql` for new installs and via startup task `Task260324AddIdentifierPathTriggerIndex` for existing databases.

Performance

Old approach New approach
Child discovery ES query (misses unindexed items) None needed
Child updates 20K+ Java → DB round-trips 1 bulk `UPDATE` per folder level
Trigger validation 0 (before broken delete) Per-row, now index-backed (O(log F))
Folder deletion + cascade Required → constraint error Not needed

Changes

  • `FolderFactoryImpl` — in-place rename, targeted cache eviction, depth-first child path update (sorted by path length), LIKE escaping for folder names containing `%` / `_`.
  • `FolderAPIImpl` — ES reindex moved here (post-transaction, guarded by try/catch); removed duplicate nav-cache evictions now handled by the factory.
  • `Task260324AddIdentifierPathTriggerIndex` + `postgres.sql` — new functional index for trigger query performance.

Tests

  • Updated `renameFolder()` — adjusted assertions to reflect in-place rename semantics: identifiers are preserved (same UUID), `asset_name` / `parent_path` are updated rather than creating new records.
  • New `renameFolder_updatesChildrenAndSubChildrenPaths()` — creates a folder tree with a file asset, an HTML page, and a sub-folder (itself containing a file asset), renames the root, then asserts at the DB level (via `identifierAPI.loadFromDb()` and a direct `DotConnect` query — no cache, no ES) that every item's `parent_path` was updated correctly at all nesting levels and the old path no longer exists.

🤖 Generated with Claude Code

@claude
Copy link
Contributor

claude bot commented Mar 23, 2026

Claude finished @gortiz-dotcms's task in 6m 39s —— View job


🔍 dotCMS Backend Review

  • Load dotCMS conventions
  • Get Java diff
  • Run specialized sub-agent reviews (security, database, Java standards, REST API)
  • Consolidate findings and post review

Three 🟡 Medium issues identified (all previously flagged, all still open). Full review posted in the dedicated review comment above.

@github-actions github-actions bot added the Area : Backend PR changes Java/Maven backend code label Mar 23, 2026
gortiz-dotcms and others added 6 commits March 24, 2026 09:27
- Replace folderCache.clearCache() (nuclear) with targeted eviction:
  snapshot sub-folder old paths before the bulk UPDATE, then evict
  each individually using old path data so only affected entries are removed.
- Replace recursive updateChildPathsRecursively with iterative
  updateChildPaths (ArrayDeque) to prevent stack overflow on deeply
  nested folder trees.
- Add refreshContentUnderFolder(folder) after path update to queue
  async ES reindex and keep Elasticsearch path data in sync.
- Add comment clarifying transaction context is provided by
  @WrapInTransaction on FolderAPIImpl.renameFolder().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bug fix: evict old identifier URI cache entry before mutating
  asset_name by calling removeFromCacheByIdentifier(id) while the cache
  still holds the old URI; after save() the new URI key is stored
  normally. Avoids the need for the protected removeFromCacheByURI().
- Nav cache: loop over subFolderSnapshot to evict nav entries for every
  sub-folder inode in the renamed tree, not only the root folder.
- LIKE safety: escape '%' and '_' in folder-path LIKE parameters via
  escapeLikeParam() helper + ESCAPE '\\' clause so folder names
  containing SQL wildcard characters do not over-match.
- Javadoc/comments: fix misleading "level-by-level" description —
  traversal is depth-first (LIFO stack); parent-before-children
  invariant is still preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move refreshContentUnderFolder out of @WrapInTransaction scope:
  the call now lives in FolderAPIImpl.renameFolder after the factory
  returns, wrapped in try/catch so a transient ES failure never rolls
  back a successful DB rename.
- Remove duplicate nav-cache evictions from FolderAPIImpl (root folder
  inode and parent path); the factory already handles the full tree.
- Add functional index idx_identifier_parent_path_trigger on
  identifier(host_inode, asset_type, lower(parent_path||asset_name||'/'))
  so the identifier_parent_path_check trigger can use an index scan
  instead of a filter scan over all folders on the host. EXPLAIN
  confirmed cost drops from 90.34 to 8.57. Added to postgres.sql for
  new installs and Task260324AddIdentifierPathTriggerIndex for existing
  databases.
- Document the mutation ordering dependency in save(folder): save() must
  run before removeFromCacheByIdentifier() because it reads the
  identifier cache (still holds old asset_name) to build the correct
  FolderCache eviction key.
- Fix test old-path assertion: replace cache-aware findFolderByPath
  with a direct DotConnect query so a stale cache entry cannot mask a
  real bug.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Critical: derive oldPath from ident.getAssetName() (DB snapshot) instead
  of folder.getName(). When called via saveFolder the folder object already
  has getName()==newName, making oldPath==newPath so updateChildPaths matched
  zero rows and left all children with a stale parent_path.
- Remove ::text cast from idx_identifier_parent_path_trigger definition so the
  index expression matches the trigger query exactly. EXPLAIN confirmed both
  variants work identically (PostgreSQL normalises VARCHAR||VARCHAR to text
  internally), but removing the cast is cleaner and avoids any theoretical
  planner edge cases.
- Change Task260324 forceRun() to false: CREATE INDEX IF NOT EXISTS is
  idempotent; the startup framework records the task in db_version so it
  only runs once.
- Fix misleading comment in FolderAPIImpl.renameFolder: the ES reindex call
  is inside @WrapInTransaction (DB connection stays open), not post-commit.
  The try/catch prevents RuntimeException propagation and rollback; actual
  reindex work is async via dist_reindex_journal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Document inode preservation in renameFolder: save() uses upsertFolder
  keyed on the existing inode so the inode is never regenerated. This
  makes the old updateOtherFolderReferences calls (structure.folder,
  permission.inode_id, permission_reference.asset_id) pure no-ops that
  were safely removed.
- Add comment to evictSubFolderCache explaining why stub Folder and
  Identifier objects only need inode/hostId and parent_path/asset_name —
  those are the exact fields FolderCacheImpl.removeFolder reads.
- Fix Task260324 Javadoc: identifier_parent_path_trigger is the trigger
  name; identifier_parent_path_check() is the function it calls.
- Add integration test renameFolder_withLikeWildcardsInName_doesNotAffect
  SiblingFolders: renames a folder whose name contains _ and %, then
  asserts via loadFromDb that a sibling folder whose name matches the
  unescaped LIKE pattern retains its original parent_path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Eliminate N+1 SELECT queries in updateChildPaths: compute all
  (oldPath, newPath) pairs from the pre-loaded subFolderSnapshot instead
  of issuing a SELECT per folder level. Pairs are sorted by ascending
  path length (parent-before-child) to satisfy the trigger invariant.
  Removes ArrayDeque/Deque imports which are no longer needed.
- Document implicit mutation contract in FolderAPIImpl: the factory
  calls folder.setName(newName) so folder.getPath() returns the new path
  by the time refreshContentUnderFolder is called. Refactoring the
  factory to use a defensive copy must keep this call site in sync.
- Add Javadoc to clearIdentifierCacheForSubtree explaining that
  removeFromCacheByIdentifier uses the stale cached URI (old path) to
  evict the old path-keyed entry — which is the desired behavior.
- Add comment in loadSubFolderSnapshot clarifying that LIKE 'oldPath%'
  covers direct sub-folders: since oldPath ends with '/' and '%' matches
  zero chars, '/original/' LIKE '/original/%' is TRUE.
- Add IF NOT EXISTS to idx_identifier_parent_path_trigger in postgres.sql
  for consistency with the startup task and idempotent re-runs.
- Add test renameFolder_toExistingName_returnsFalse: renames a folder to
  the name of an existing sibling and asserts return value is false and
  neither folder's asset_name is modified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Contributor

claude bot commented Mar 25, 2026

🔍 dotCMS Backend Review

[🟡 Medium] Task260324AddIdentifierPathTriggerIndex.java:48 — PostgreSQL-specific DDL without a database-type guard

CREATE INDEX IF NOT EXISTS ... lower(parent_path||asset_name||'/') is a PostgreSQL functional-index expression. The task runs unconditionally on every database type. On H2 (integration tests) or any future non-Postgres backend this DDL will throw at startup, halting the task runner. Established runonce tasks (e.g. Task05080RecreateIdentifierIndex) guard equivalent DDL with DbConnectionFactory.isPostgres().

new DotConnect().executeStatement(
    "CREATE INDEX IF NOT EXISTS " + INDEX_NAME + " ON identifier"
    + " (host_inode, asset_type, lower(parent_path||asset_name||'/')))");

💡 Wrap in if (DbConnectionFactory.isPostgres()) { ... }. This was flagged in the prior review and is still open.


[🟡 Medium] FolderAPIImpl.java:192-204folder.getPath() logged without CR/LF sanitization

newName is correctly sanitized into safeNewName, but folder.getPath() — built from user-supplied ancestor folder names stored in the DB — is passed verbatim into every Logger.error() call in the catch blocks. A folder whose path contains \r or \n can inject synthetic log lines, masking audit trails or confusing log-aggregation pipelines.

Logger.error(FolderAPIImpl.class, "Error renaming folder '"
    + folder.getPath() + "' with id: " + folder.getIdentifier() + " to name: "
    + safeNewName + ". Error: " + e.getMessage());

💡 Apply the same sanitization: final String safePath = folder.getPath().replaceAll("[\\r\\n]", " "); and substitute safePath in all Logger calls inside the catch blocks. This was flagged in the prior review and is still open.


[🟡 Medium] Task260324AddIdentifierPathTriggerIndex.java:33INDEX_NAME constant has package-private visibility

static final String INDEX_NAME is accessible package-wide but serves no purpose outside this class. No other class in the package references it; the package-private access appears to be an accidental omission of private.

static final String INDEX_NAME = "idx_identifier_parent_path_trigger";

💡 Change to private static final String INDEX_NAME = ...;. This was flagged in the prior review and is still open.


Next steps

  • 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in <File.java>
  • Every new push triggers a fresh review automatically

View job run

gortiz-dotcms and others added 8 commits March 25, 2026 09:46
…ta resilience

Guard against snapshot entries whose path does not start with the
expected prefix (corrupt data scenario); log a warning and skip
rather than throwing StringIndexOutOfBoundsException or producing
an incorrect replacement path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
refreshContentUnderFolder declares throws DotReindexStateException only.
Catching the precise exception (instead of Exception) allows DotDataException
(e.g. a failed dist_reindex_journal INSERT) to propagate and roll back the
transaction, while still swallowing transient ES reindex-queue failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… description

- FolderAPITest: swap test order so each Javadoc block immediately precedes
  its method (renameFolder_toExistingName_returnsFalse first,
  renameFolder_updatesChildrenAndSubChildrenPaths second)
- FolderFactoryImpl: add inline comment in updateChildPaths explaining the
  intentional absence of an asset_type filter in the bulk UPDATE

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dPaths

loadResult() on a DML statement works only because DotConnect internally
inspects the SQL prefix to route to statement.execute(). Using the
purpose-built executeUpdate() is semantically correct and avoids a fragile
dependency on that internal branching logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FolderAPIImpl: add explicit DotSecurityException re-throw before the broad
  catch(Exception) handler so permission errors from the factory keep their
  type and are not mis-mapped to 500 instead of 403.

- IdentifierCache/IdentifierCacheImpl: widen removeFromCacheByURI from
  protected to public so callers outside the cache hierarchy can evict
  URI-keyed entries directly (needed for cold-cache completeness).

- FolderFactoryImpl.clearIdentifierCacheForSubtree: also SELECT parent_path
  and asset_name; call removeFromCacheByURI(hostId, oldUri) before
  removeFromCacheByIdentifier(id) to guarantee full eviction even when the
  UUID-keyed entry is absent. Add comment explaining ContentletCache and
  HTMLPageCache are inode/UUID-keyed (no path eviction needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FolderFactoryImpl.renameFolder: add @WrapInTransaction so the factory
  operation is always atomic regardless of call path (belt-and-suspenders
  alongside the API-layer annotation).

- FolderFactoryImpl.renameFolder: null-guard loadFromDb result with an
  explicit DotDataException so a detached/cache-only folder produces a
  clear error instead of an NPE.

- FolderFactoryImpl.renameFolder: capture old folder.getName() and
  ident.getAssetName() before mutation; restore them in the catch block so
  callers that retain a reference after a failed rename see the original
  name, not the new one.

- FolderFactoryImpl.evictSubFolderCache: set assetType="folder" on the
  stub Identifier so Identifier.getPath() appends a trailing '/'.
  FolderCacheImpl keys path-based entries with the slash; without this
  every sub-folder path-keyed cache entry was silently missed on eviction.

- FolderAPIImpl.renameFolder: sanitize user-supplied newName before logging
  (strip \r and \n) to prevent log-injection attacks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FolderFactoryImpl.updateChildPaths: sanitize DB-sourced path values before
  logging (strip \r/\n) to prevent log injection, consistent with the
  safeNewName fix already applied in FolderAPIImpl.

- FolderFactoryImpl.updateChildPaths/clearIdentifierCacheForSubtree: add
  explicit (String) cast to row.get("parent_path") in both sites so a DB
  null produces a NullPointerException (detectable) rather than the string
  "null" silently forming a malformed path.

- FolderFactoryImpl.renameFolder: detect unique-constraint violation in the
  catch block via DbConnectionFactory.isConstraintViolationException(); treat
  it as a concurrent collision — restore old state and return false instead
  of re-throwing, matching the pre-checked collision path.

- IdentifierCacheImpl: add @OverRide to removeFromCacheByIdentifier(String)
  and removeFromCacheByURI per project progressive-enhancement rule.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Remove @WrapInTransaction from FolderFactoryImpl.renameFolder() —
   the annotation was added by this PR but is unnecessary: both
   FolderAPIImpl.renameFolder() and FolderAPIImpl.save() already carry
   @WrapInTransaction, so the factory method always runs inside an
   existing transaction. The extra annotation caused the test
   renameFolder (line 202) to fail: findSubFolders() returned empty
   after rename.

2. Fix renameFolder_updatesChildrenAndSubChildrenPaths test (line 266):
   FolderDataGen.site() sets this.parent = null, so calling
   .parent(parentFolder).site(site) in that order created a root-level
   folder with parent_path='/' instead of a child folder. updateChildPaths
   only matched parent_path='/original-xxx/', so the sub-folder path was
   never updated, causing renamedSubIdent.getParentPath() to return '/'
   instead of '/renamed-xxx/'. Fix: remove the .site(site) call; the
   site is already captured via the parent folder's hostId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[DEFECT] Folder rename fails with database constraint violation when folder contains many items

1 participant