fix: add new approach to rename functionality (#34655)#35086
fix: add new approach to rename functionality (#34655)#35086gortiz-dotcms wants to merge 15 commits intomainfrom
Conversation
|
Claude finished @gortiz-dotcms's task in 6m 39s —— View job 🔍 dotCMS Backend Review
Three 🟡 Medium issues identified (all previously flagged, all still open). Full review posted in the dedicated review comment above. |
- Replace folderCache.clearCache() (nuclear) with targeted eviction: snapshot sub-folder old paths before the bulk UPDATE, then evict each individually using old path data so only affected entries are removed. - Replace recursive updateChildPathsRecursively with iterative updateChildPaths (ArrayDeque) to prevent stack overflow on deeply nested folder trees. - Add refreshContentUnderFolder(folder) after path update to queue async ES reindex and keep Elasticsearch path data in sync. - Add comment clarifying transaction context is provided by @WrapInTransaction on FolderAPIImpl.renameFolder(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bug fix: evict old identifier URI cache entry before mutating asset_name by calling removeFromCacheByIdentifier(id) while the cache still holds the old URI; after save() the new URI key is stored normally. Avoids the need for the protected removeFromCacheByURI(). - Nav cache: loop over subFolderSnapshot to evict nav entries for every sub-folder inode in the renamed tree, not only the root folder. - LIKE safety: escape '%' and '_' in folder-path LIKE parameters via escapeLikeParam() helper + ESCAPE '\\' clause so folder names containing SQL wildcard characters do not over-match. - Javadoc/comments: fix misleading "level-by-level" description — traversal is depth-first (LIFO stack); parent-before-children invariant is still preserved. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move refreshContentUnderFolder out of @WrapInTransaction scope: the call now lives in FolderAPIImpl.renameFolder after the factory returns, wrapped in try/catch so a transient ES failure never rolls back a successful DB rename. - Remove duplicate nav-cache evictions from FolderAPIImpl (root folder inode and parent path); the factory already handles the full tree. - Add functional index idx_identifier_parent_path_trigger on identifier(host_inode, asset_type, lower(parent_path||asset_name||'/')) so the identifier_parent_path_check trigger can use an index scan instead of a filter scan over all folders on the host. EXPLAIN confirmed cost drops from 90.34 to 8.57. Added to postgres.sql for new installs and Task260324AddIdentifierPathTriggerIndex for existing databases. - Document the mutation ordering dependency in save(folder): save() must run before removeFromCacheByIdentifier() because it reads the identifier cache (still holds old asset_name) to build the correct FolderCache eviction key. - Fix test old-path assertion: replace cache-aware findFolderByPath with a direct DotConnect query so a stale cache entry cannot mask a real bug. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Critical: derive oldPath from ident.getAssetName() (DB snapshot) instead of folder.getName(). When called via saveFolder the folder object already has getName()==newName, making oldPath==newPath so updateChildPaths matched zero rows and left all children with a stale parent_path. - Remove ::text cast from idx_identifier_parent_path_trigger definition so the index expression matches the trigger query exactly. EXPLAIN confirmed both variants work identically (PostgreSQL normalises VARCHAR||VARCHAR to text internally), but removing the cast is cleaner and avoids any theoretical planner edge cases. - Change Task260324 forceRun() to false: CREATE INDEX IF NOT EXISTS is idempotent; the startup framework records the task in db_version so it only runs once. - Fix misleading comment in FolderAPIImpl.renameFolder: the ES reindex call is inside @WrapInTransaction (DB connection stays open), not post-commit. The try/catch prevents RuntimeException propagation and rollback; actual reindex work is async via dist_reindex_journal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Document inode preservation in renameFolder: save() uses upsertFolder keyed on the existing inode so the inode is never regenerated. This makes the old updateOtherFolderReferences calls (structure.folder, permission.inode_id, permission_reference.asset_id) pure no-ops that were safely removed. - Add comment to evictSubFolderCache explaining why stub Folder and Identifier objects only need inode/hostId and parent_path/asset_name — those are the exact fields FolderCacheImpl.removeFolder reads. - Fix Task260324 Javadoc: identifier_parent_path_trigger is the trigger name; identifier_parent_path_check() is the function it calls. - Add integration test renameFolder_withLikeWildcardsInName_doesNotAffect SiblingFolders: renames a folder whose name contains _ and %, then asserts via loadFromDb that a sibling folder whose name matches the unescaped LIKE pattern retains its original parent_path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Eliminate N+1 SELECT queries in updateChildPaths: compute all (oldPath, newPath) pairs from the pre-loaded subFolderSnapshot instead of issuing a SELECT per folder level. Pairs are sorted by ascending path length (parent-before-child) to satisfy the trigger invariant. Removes ArrayDeque/Deque imports which are no longer needed. - Document implicit mutation contract in FolderAPIImpl: the factory calls folder.setName(newName) so folder.getPath() returns the new path by the time refreshContentUnderFolder is called. Refactoring the factory to use a defensive copy must keep this call site in sync. - Add Javadoc to clearIdentifierCacheForSubtree explaining that removeFromCacheByIdentifier uses the stale cached URI (old path) to evict the old path-keyed entry — which is the desired behavior. - Add comment in loadSubFolderSnapshot clarifying that LIKE 'oldPath%' covers direct sub-folders: since oldPath ends with '/' and '%' matches zero chars, '/original/' LIKE '/original/%' is TRUE. - Add IF NOT EXISTS to idx_identifier_parent_path_trigger in postgres.sql for consistency with the startup task and idempotent re-runs. - Add test renameFolder_toExistingName_returnsFalse: renames a folder to the name of an existing sibling and asserts return value is false and neither folder's asset_name is modified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🔍 dotCMS Backend Review[🟡 Medium]
new DotConnect().executeStatement(
"CREATE INDEX IF NOT EXISTS " + INDEX_NAME + " ON identifier"
+ " (host_inode, asset_type, lower(parent_path||asset_name||'/')))");💡 Wrap in [🟡 Medium]
Logger.error(FolderAPIImpl.class, "Error renaming folder '"
+ folder.getPath() + "' with id: " + folder.getIdentifier() + " to name: "
+ safeNewName + ". Error: " + e.getMessage());💡 Apply the same sanitization: [🟡 Medium]
static final String INDEX_NAME = "idx_identifier_parent_path_trigger";💡 Change to Next steps
|
…ta resilience Guard against snapshot entries whose path does not start with the expected prefix (corrupt data scenario); log a warning and skip rather than throwing StringIndexOutOfBoundsException or producing an incorrect replacement path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
refreshContentUnderFolder declares throws DotReindexStateException only. Catching the precise exception (instead of Exception) allows DotDataException (e.g. a failed dist_reindex_journal INSERT) to propagate and roll back the transaction, while still swallowing transient ES reindex-queue failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… description - FolderAPITest: swap test order so each Javadoc block immediately precedes its method (renameFolder_toExistingName_returnsFalse first, renameFolder_updatesChildrenAndSubChildrenPaths second) - FolderFactoryImpl: add inline comment in updateChildPaths explaining the intentional absence of an asset_type filter in the bulk UPDATE Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dPaths loadResult() on a DML statement works only because DotConnect internally inspects the SQL prefix to route to statement.execute(). Using the purpose-built executeUpdate() is semantically correct and avoids a fragile dependency on that internal branching logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FolderAPIImpl: add explicit DotSecurityException re-throw before the broad catch(Exception) handler so permission errors from the factory keep their type and are not mis-mapped to 500 instead of 403. - IdentifierCache/IdentifierCacheImpl: widen removeFromCacheByURI from protected to public so callers outside the cache hierarchy can evict URI-keyed entries directly (needed for cold-cache completeness). - FolderFactoryImpl.clearIdentifierCacheForSubtree: also SELECT parent_path and asset_name; call removeFromCacheByURI(hostId, oldUri) before removeFromCacheByIdentifier(id) to guarantee full eviction even when the UUID-keyed entry is absent. Add comment explaining ContentletCache and HTMLPageCache are inode/UUID-keyed (no path eviction needed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FolderFactoryImpl.renameFolder: add @WrapInTransaction so the factory operation is always atomic regardless of call path (belt-and-suspenders alongside the API-layer annotation). - FolderFactoryImpl.renameFolder: null-guard loadFromDb result with an explicit DotDataException so a detached/cache-only folder produces a clear error instead of an NPE. - FolderFactoryImpl.renameFolder: capture old folder.getName() and ident.getAssetName() before mutation; restore them in the catch block so callers that retain a reference after a failed rename see the original name, not the new one. - FolderFactoryImpl.evictSubFolderCache: set assetType="folder" on the stub Identifier so Identifier.getPath() appends a trailing '/'. FolderCacheImpl keys path-based entries with the slash; without this every sub-folder path-keyed cache entry was silently missed on eviction. - FolderAPIImpl.renameFolder: sanitize user-supplied newName before logging (strip \r and \n) to prevent log-injection attacks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FolderFactoryImpl.updateChildPaths: sanitize DB-sourced path values before
logging (strip \r/\n) to prevent log injection, consistent with the
safeNewName fix already applied in FolderAPIImpl.
- FolderFactoryImpl.updateChildPaths/clearIdentifierCacheForSubtree: add
explicit (String) cast to row.get("parent_path") in both sites so a DB
null produces a NullPointerException (detectable) rather than the string
"null" silently forming a malformed path.
- FolderFactoryImpl.renameFolder: detect unique-constraint violation in the
catch block via DbConnectionFactory.isConstraintViolationException(); treat
it as a concurrent collision — restore old state and return false instead
of re-throwing, matching the pre-checked collision path.
- IdentifierCacheImpl: add @OverRide to removeFromCacheByIdentifier(String)
and removeFromCacheByURI per project progressive-enhancement rule.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Remove @WrapInTransaction from FolderFactoryImpl.renameFolder() — the annotation was added by this PR but is unnecessary: both FolderAPIImpl.renameFolder() and FolderAPIImpl.save() already carry @WrapInTransaction, so the factory method always runs inside an existing transaction. The extra annotation caused the test renameFolder (line 202) to fail: findSubFolders() returned empty after rename. 2. Fix renameFolder_updatesChildrenAndSubChildrenPaths test (line 266): FolderDataGen.site() sets this.parent = null, so calling .parent(parentFolder).site(site) in that order created a root-level folder with parent_path='/' instead of a child folder. updateChildPaths only matched parent_path='/original-xxx/', so the sub-folder path was never updated, causing renamedSubIdent.getParentPath() to return '/' instead of '/renamed-xxx/'. Fix: remove the .site(site) call; the site is already captured via the parent folder's hostId. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This PR fixes: #34655
Problem
`FolderAPIImpl.renameFolder()` failed with a PostgreSQL constraint violation when renaming a folder that contained more than ~20K contentlets:
```
ERROR: Cannot delete as this path has children
Where: PL/pgSQL function check_child_assets() line 13 at RAISE
```
Root Cause
The old implementation followed a create-new / move-all / delete-old cycle:
Step 2 is the critical flaw: any contentlet not indexed in Elasticsearch is silently skipped (e.g. `.HEIC` files, recently uploaded content, or anything with a stale index). After moving only the ES-indexed subset, the old folder still had DB children pointing to its path. When step 4 tried to delete the folder, the PostgreSQL trigger `check_child_assets()` found those leftover children and raised a constraint violation.
This also explains why a manual re-index temporarily "fixed" the issue (#32181): the re-index brought ES and the DB back in sync, so the next rename attempt found all children.
Secondary cost: moving 20K+ items one-by-one in Java meant 20K+ individual DB round-trips, making the operation slow even when it succeeded.
Fix
Replace the create-new / move-all / delete-old cycle with a pure-SQL in-place rename:
No new folder is created, no old folder is deleted — so the `check_child_assets()` trigger never fires. The rename is now DB-only and ES-independent, so unindexed items are never missed.
Why depth-first ordering (not a single bulk UPDATE)?
The existing `identifier_parent_path_trigger` fires `BEFORE UPDATE` on every `identifier` row and verifies that the new `parent_path` resolves to an existing folder. Processing each folder level before its children guarantees that by the time the trigger runs for a child row, its new parent path already exists in the DB.
All (oldPath → newPath) pairs are derived from a pre-loaded snapshot of sub-folder data and sorted by ascending old-path length: a parent path is always shorter than any of its descendants, so length-order guarantees parent-before-child without recursion or an explicit stack.
New index for trigger performance
`EXPLAIN` on the trigger's validation query showed it was using `idx_identifier_perm` (`asset_type + host_inode`) and then applying `lower(parent_path||asset_name||'/')` as a row-level filter — a scan over all folders on the host per trigger invocation. With 20K rows being updated this is O(N × F) lookups.
A new functional index `idx_identifier_parent_path_trigger` on `(host_inode, asset_type, lower(parent_path||asset_name||'/'))` matches the trigger's exact expression. After adding it, `EXPLAIN` showed the expression moved from `Filter` to `Index Cond` and cost dropped from 90.34 → 8.57. The index is added to `postgres.sql` for new installs and via startup task `Task260324AddIdentifierPathTriggerIndex` for existing databases.
Performance
Changes
Tests
🤖 Generated with Claude Code