Skip to content

feat(s3-publishing): Add S3 static publishing support for vanity URLs#35808

Merged
dsilvam merged 18 commits into
mainfrom
s3-static-push-vanity-url-dsilvam
Jun 4, 2026
Merged

feat(s3-publishing): Add S3 static publishing support for vanity URLs#35808
dsilvam merged 18 commits into
mainfrom
s3-static-push-vanity-url-dsilvam

Conversation

@dsilvam
Copy link
Copy Markdown
Member

@dsilvam dsilvam commented May 22, 2026

Summary

This PR carries the same changes as #35643 (originally authored by @riccardoruocco), opened under a different account so CI checks can run.

Proposed Changes

  • Add a feature flag, STATIC_PUSH_S3_VANITY_ALIAS_ENABLED, to enable Vanity URL handling for AWS S3 static publishing. When the flag is false, dotCMS behaves exactly as it does today and no Vanity URL alias is generated.
  • When the flag is true, publishing a Vanity URL to a static S3 endpoint makes dotCMS resolve the target content, identify the live resource, render or copy it, and write the static clone to the S3 path represented by the Vanity URL.
  • This behavior is backed by the s3_vanity_alias table, which acts as an operational snapshot of the Vanity URL state on S3 and stores the aliases that have actually been materialized.
  • When a canonical content is republished, dotCMS checks the alias table and refreshes the existing Vanity URL clone if one is already tracked. If no Vanity URL alias exists yet, nothing changes.
  • When content is unpublished or removed, dotCMS uses the alias table to remove the corresponding static alias from S3 and keep the bucket aligned with the current state.
  • When a Vanity URL is unpublished or updated, dotCMS updates the tracked alias accordingly, so stale S3 keys are not left behind.

Additional Info

The goal of this change is to make Vanity URLs work on static S3 publishing in the same way they already work at runtime on live dotCMS.

The implementation is intentionally opt-in and does not affect existing installations unless STATIC_PUSH_S3_VANITY_ALIAS_ENABLED=true is set.

The s3_vanity_alias table is used as the source of truth for what was actually written to S3, so publish, republish, unpublish, and delete can all behave consistently without depending only on the current live content state.

Closes #35663

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 22, 2026

Claude finished @dsilvam's task in 7m 11s —— View job


🔍 dotCMS Backend Review

The 🔴 Critical findings from the previous review (system-user rendering non-public content, file-asset anonymous access gate) and the earlier 🟠 High items (S3-before-DB ordering in publishMaterializedAlias and publishAliases, publishedNow not tracked) are all correctly remediated at HEAD. One 🟠 High item remains open, and six 🟡 Medium items carry over or are newly identified.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:398-409 (unpublishAliasesByVanityUrl)

repository.deleteByVanityUrlId(...) carries @WrapInTransaction — it commits atomically before control returns. The S3 removal loop then runs outside that transaction. If removeMaterializedAlias throws mid-loop, the DB rows are permanently gone while some S3 objects survive: DB and S3 diverge with no compensation path. The catch block at line 407 only re-throws.

repository.deleteByVanityUrlId(context.endpointId, languageId, vanityUrlId); // commits
for (final S3VanityAlias alias : persistedAliases) {
    removeMaterializedAlias(context, alias);  // S3 — if this throws, DB already committed
}

💡 Collect removed aliases in a deletedNow list; on exception, re-publish them (same restoreAliases pattern used in unpublishAliases). Move the deleteByVanityUrlId call after all S3 removals have succeeded.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/AWSS3EndPointPublisher.java:429-430 (getCompleteFileKey)

rejectPathTraversal is called for filePath but not for bucketRootPrefix. The prefix comes from endpoint configuration (aws_bucket_folder_prefix); a misconfigured or compromised endpoint with ../../ in the prefix would construct S3 keys outside the intended namespace.

protected String getCompleteFileKey(final String bucketRootPrefix, final String filePath) {
    rejectPathTraversal(filePath);   // only filePath is checked
    // bucketRootPrefix concatenated without traversal validation
}

💡 Add rejectPathTraversal(bucketRootPrefix) at the top of the method, before the prefix is used.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:18TABLE_NAME constant unused

TABLE_NAME = "static_s3_vanity_mapping" is declared but all eight SQL string constants hardcode the table name as a literal. A future rename of the constant would silently leave all queries pointing at the old name, providing false rename-safety.

private static final String TABLE_NAME = "static_s3_vanity_mapping"; // never referenced
// ...
"FROM static_s3_vanity_mapping WHERE ..."  // literal everywhere

💡 Replace each hardcoded literal in the SQL constants with TABLE_NAME.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:310-314longValue null-input hazard

String.valueOf(null) returns the four-character string "null", not null — so Long.parseLong("null") throws NumberFormatException instead of a meaningful error. The language_id column is NOT NULL in the DDL, but a schema inconsistency or direct DB write could trigger this path.

return Long.parseLong(String.valueOf(value)); // null -> "null" -> NumberFormatException

💡 Add a null guard: if (value == null) throw new IllegalStateException("Unexpected null for language_id"); — or replace with ConversionUtils.toLong(value, -1L) as used elsewhere in dotCMS for SQL row coercion.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:3 — unused import (carried from previous reviews)

import com.dotcms.business.WrapInTransaction is still present but no method in this class uses @WrapInTransaction. Transaction management is correctly pushed into S3VanityAliasRepository.

import com.dotcms.business.WrapInTransaction;

💡 Remove the unused import.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasSupport.java:71-73targetType silently ignored

materializeVanityPath(vanityPath, DotAsset targetType) accepts a targetType parameter but immediately delegates to normalizeVanityPath(vanityPath) without using it. Callers passing a meaningful DotAsset value receive no indication the argument is discarded.

public Optional<String> materializeVanityPath(final String vanityPath, final DotAsset targetType) {
    return normalizeVanityPath(vanityPath);  // targetType silently ignored
}

💡 Either remove the parameter and update callers, or add a Javadoc note that the parameter is reserved and currently has no effect.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/startup/runonce/Task260408CreateS3VanityAliasTable.java:49-59 — unbounded varchar columns (carried from previous reviews)

Hash columns (canonical_path_hash, vanity_path_hash) hold SHA-256 hex (always 64 chars); id columns (endpoint_id, host_id, vanity_url_id) hold UUIDs (36 chars). Bare varchar communicates no contract and keeps the composite PK index footprint larger than necessary.

endpoint_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,

💡 Use varchar(36) for UUID columns and varchar(64) for SHA-256 hash columns.


Next steps

  • 🟠 Fix locally and push — these need your judgment
  • 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in <File.java>
  • Every new push triggers a fresh review automatically

@semgrep-code-dotcms-test
Copy link
Copy Markdown
Contributor

Semgrep found 16 CUSTOM_INJECTION-2 findings:

The method identified is susceptible to injection. The input should be validated and properly
escaped.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 22, 2026

🔍 dotCMS Backend Review

The previously-flagged 🔴 Critical and 🟠 High findings (path traversal in S3VanityAliasSupport, Lucene injection in AWSS3Publisher, defense-in-depth in getCompleteFileKey, system-user / cross-tenant access, and the @WrapInTransaction over S3 I/O) all appear properly remediated at HEAD (6e20a421). The Italian comment, missing Thread.currentThread().interrupt(), and Optional.get() without isPresent() are also resolved. Three 🟡 Medium items remain.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/startup/runonce/Task260408CreateS3VanityAliasTable.java:49-59

The PK columns (endpoint_id, host_id, canonical_path_hash, vanity_path_hash) and vanity_url_id are declared as bare varchar (functionally text on Postgres). The hash columns hold SHA-256 hex (always 64 chars) and the id columns are UUIDs (36 chars). Sizing them communicates the schema contract, keeps the PK index footprint smaller, and lets Postgres reject malformed values early.

endpoint_id varchar not null,
host_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,
vanity_url_id varchar,

💡 Tighten to varchar(36) for the id columns and varchar(64) for the hash columns. Path columns can stay unbounded since they hold arbitrary URL paths.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:281-286

longValue falls back to Long.parseLong(String.valueOf(value)) if the JDBC driver doesn't return a Number. Postgres always returns bigint as Long today, so the fallback is effectively dead — but String.valueOf(null) returns the literal "null", which would throw a confusing NumberFormatException rather than a meaningful error, and a value like "42.0" would also fail rather than parse to 42. Brittle, not strictly broken.

private long longValue(final Object value) {
    if (value instanceof Number) {
        return ((Number) value).longValue();
    }
    return Long.parseLong(String.valueOf(value));
}

💡 Either trust the JDBC contract and throw on unexpected types, or use ConversionUtils.toLong(value, -1L) (already used elsewhere in dotCMS for SQL row coercion).


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:3

Unused import: com.dotcms.business.WrapInTransaction is imported but no method in the file is annotated with @WrapInTransaction — the transactional boundary was correctly pushed down into S3VanityAliasRepository. Stale imports add noise and obscure that the transaction model has changed.

import com.dotcms.business.WrapInTransaction;
import com.dotcms.vanityurl.business.VanityUrlAPI;
import com.dotcms.vanityurl.model.CachedVanityUrl;

💡 Remove the unused WrapInTransaction import.


Next steps

  • 🟡 You can ask me to handle these mechanical fixes inline: @claude fix <issue description> in <File.java>
  • Every new push triggers a fresh review automatically

@syedATdot syedATdot force-pushed the s3-static-push-vanity-url-dsilvam branch from d3c5801 to 1f11202 Compare May 26, 2026 06:24
@mergify
Copy link
Copy Markdown

mergify Bot commented May 26, 2026

⚠️ The sha of the head commit of this PR conflicts with #35643. Mergify cannot evaluate rules on this PR. Once #35643 is merged or closed, Mergify will resume processing this PR. ⚠️

@mergify
Copy link
Copy Markdown

mergify Bot commented May 26, 2026

⚠️ The sha of the head commit of this PR conflicts with #35643. Mergify cannot evaluate rules on this PR. Once #35643 is merged or closed, Mergify will resume processing this PR. ⚠️

dsilvam and others added 2 commits May 26, 2026 13:48
- Path traversal: add containsTraversalOrUnsafePath() to S3VanityAliasSupport
  normalizers — URL-decodes once, then rejects .. / . segments, control chars,
  and residual % (double-encoded sequences)
- Lucene injection: validate asset identifier as UUID before concatenating into
  ES/Lucene query in AWSS3Publisher
- Defense-in-depth: add rejectPathTraversal() guard in getCompleteFileKey()
  before any path becomes an S3 object key
- Connection pool: remove @WrapInTransaction from all four public service methods
  in S3VanityAliasService — S3 I/O now runs outside DB transactions; each
  repository call manages its own short-lived transaction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… DDL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 3, 2026

🔍 dotCMS Backend Review

The three 🟡 Medium items flagged in the previous review (varchar column sizing, longValue fallback, unused WrapInTransaction import) remain unaddressed. Three new 🟠 High findings and additional 🟡 Medium items were identified on this push.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:96

Contentlet.toString() uses reflective field dumping via ToStringBuilder.reflectionToString. Logging the full contentlet exposes all content field values — including custom fields that may hold PII — to log files. In a static-publishing context this is particularly sensitive because the contentlet may carry user-generated data from any content type.

Logger.warn(this, "Skipping unsupported Vanity URL: " + vanityContentlet);

💡 Log only stable identifiers: Logger.warn(this, "Skipping unsupported Vanity URL: " + (vanityContentlet != null ? vanityContentlet.getIdentifier() : "null"));


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:~515 (unpublishAliasesByVanityUrl)

Each alias in the loop gets its own isolated @WrapInTransaction DB commit (via repository.deleteAlias). If the S3 removal succeeds for alias #1 and then the DB delete for alias #2 fails, alias #1's row is already committed-deleted while alias #1's S3 object is also gone — the table and S3 are permanently diverged with no rollback path. The single-alias deleteAlias method should not be called in a loop; a bulk deleteByVanityUrlId should be exposed and called once after all S3 removes complete.

for (final S3VanityAlias alias : persistedAliases) {
    removeMaterializedAlias(context, alias);  // S3 I/O — irreversible
    repository.deleteAlias(alias);            // one commit per alias — no cross-alias atomicity
}

💡 Complete all S3 removes first, then issue a single repository.deleteByVanityUrlId(...) call. Add a public @WrapInTransaction deleteByVanityUrlId delegating to the existing private deleteByVanityUrlIdInternal.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:~1403 (publishMaterializedAlias)

S3 writes (publishAlias, deleteObsoleteAliases) are issued before the DB write (repository.replaceMappingsByVanityUrlId). If the DB write fails after S3 succeeds, the DB retains old alias rows that no longer match the newly published S3 content. Future publishes will then attempt to delete or refresh stale S3 keys. The catch block only re-throws — it performs no S3 compensation.

publishAlias(context, alias, file);                     // S3 write — committed
deleteObsoleteAliases(context, persistedAliases, alias); // S3 delete — committed
repository.replaceMappingsByVanityUrlId(...);            // if this throws, DB is stale

💡 Invert the order: write to DB first (inside its @WrapInTransaction), then issue S3 I/O. If S3 fails, add a compensating repository.replaceMappingsByVanityUrlId(...) with the old mapping list in the catch block, analogous to the restoreAliases pattern used in publishAliases.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:~1375 (systemUser for page rendering)

htmlPageAssetAPI.getHTML is called with the system user and liveMode=true. The system user bypasses dotCMS permission checks. Any live page reachable by the system user — including content behind permission-protected containers or personalization blocks — may be rendered and published as a publicly accessible static S3 file. This should be explicitly verified: confirm that liveMode=true restricts the rendered output to the same content an anonymous public visitor would see.

return htmlPageAssetAPI.getHTML(target.htmlPage, true, contentletInode, systemUser,
        context.language.getId(), Constants.USER_AGENT_DOTCMS_PUSH_PUBLISH);

[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/AWSS3Publisher.java:~456 (Lucene query construction)

The Lucene query is built by string concatenation of assetId into the query string. The UUID guard directly above is the only injection barrier. This is a fragile pattern: if the guard were accidentally removed or bypassed, unescaped content could reach the Lucene index. Per dotCMS standards, Lucene queries involving external values should use escaping or builder utilities as defense-in-depth.

final List<Contentlet> contentlets = contentletAPI.search(
    "+identifier:" + assetId + " +live:true", 0, 0, null,
    APILocator.getUserAPI().getSystemUser(), false);

💡 Add a brief comment documenting the UUID guard as the sole injection barrier, or apply LuceneQueryUtils.escapeSpecialChars(assetId) if available.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:18

TABLE_NAME is declared but never referenced — all six SQL string constants hardcode "static_s3_vanity_mapping" as a literal. A future rename of the constant would silently leave all queries pointing at the old name.

private static final String TABLE_NAME = "static_s3_vanity_mapping";
// ...
"FROM static_s3_vanity_mapping "  // TABLE_NAME never referenced

💡 Replace each hardcoded "static_s3_vanity_mapping" in the SQL constants with TABLE_NAME.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/startup/runonce/Task260408CreateS3VanityAliasTable.java:49-59 (from previous review, still open)

Hash columns (canonical_path_hash, vanity_path_hash) hold SHA-256 hex (always 64 chars) and id columns are UUIDs (always 36 chars). Bare varchar communicates no contract, keeps the PK index footprint larger than needed, and allows malformed values into the PK.

endpoint_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,

💡 Use varchar(36) for id columns and varchar(64) for hash columns.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:281-286 (from previous review, still open)

longValue falls back to Long.parseLong(String.valueOf(value)). String.valueOf(null) returns "null", which throws NumberFormatException rather than a meaningful error. The fallback is effectively dead code since Postgres always returns bigint as Long, but it is brittle.

return Long.parseLong(String.valueOf(value));

💡 Either throw on unexpected types, or use ConversionUtils.toLong(value, -1L).


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:3 (from previous review, still open)

import com.dotcms.business.WrapInTransaction is still present but no method in the class uses @WrapInTransaction. Transaction management was correctly pushed down into the repository layer.

import com.dotcms.business.WrapInTransaction;

💡 Remove the unused import.


Next steps

  • 🟠 Fix locally and push — these need your judgment
  • 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in <File.java>
  • Every new push triggers a fresh review automatically

@dsilvam dsilvam disabled auto-merge June 3, 2026 19:36
- Log only vanityContentlet.getIdentifier() instead of toString() to
  prevent PII exposure from reflective field dumping
- Invert DB/S3 write order in publishMaterializedAlias: commit DB first,
  then issue S3 I/O; compensate DB on S3 failure to keep mappings
  consistent with actual S3 state
- Complete all S3 removes before the single bulk deleteByVanityUrlId
  call in unpublishAliasesByVanityUrl to avoid cross-alias partial
  commit divergence between table and bucket
- Add public deleteByVanityUrlId overloads (with and without languageId)
  backed by a new DELETE_BY_VANITY_URL_ID_ANY_LANGUAGE SQL constant
- Replace Optional.of(fileAsset.getFileAsset()) with Optional.ofNullable
  in S3VanityResolvedTarget.physicalFile() to prevent NPE when binary
  is absent on the local node (remote storage, not yet synced)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 4, 2026

🔍 dotCMS Backend Review

[🔴 Critical] S3VanityAliasService.java:~405 (renderTargetHtml) + S3VanityTargetResolver.java:~162 (resolveFileAsset)

htmlPageAssetAPI.getHTML(...) is called with systemUser. PermissionBitAPIImpl short-circuits to return true unconditionally for the system user, so any page — members-only, employee-only, permission-controlled — that a Vanity URL points to will be rendered and written as a permanently public S3 static file. This is more severe than a temporary permission bypass: the content is materialized in the bucket and remains publicly accessible even after the page is later unpublished or access-controlled. The same issue applies in resolveFileAsset where findContentletByIdentifierOrFallback with respectFrontendRoles=false and system user bypasses all front-end role gates.

return htmlPageAssetAPI.getHTML(target.htmlPage, true, contentletInode, systemUser,
        context.language.getId(), Constants.USER_AGENT_DOTCMS_PUSH_PUBLISH);

💡 Before rendering, assert the target has CMS Anonymous read permission: permissionAPI.doesUserHavePermission(page, PermissionAPI.PERMISSION_READ, null /* anonymous */, true). Skip and call unpublishAliases if the page is not publicly readable. Apply the same anonymous-access gate in resolveFileAsset.


[🟠 High] AWSS3Publisher.java:~456 (findLiveVanityContentlet)

The Lucene/Elasticsearch query is built by string concatenation with the UUID guard as the sole injection barrier. The assetId value originates from the publish queue — a system boundary — and UUIDUtil.isUUID is the only protection. If UUIDUtil.isUUID ever accepts non-canonical UUID forms (e.g., uppercase, trimmed-but-padded), arbitrary Lucene clauses could be injected. The existing findLiveVanityContentletForLanguage method uses findContentletByIdentifier directly, which is the safer pattern and should be used here too.

final List<Contentlet> contentlets = contentletAPI.search(
    "+identifier:" + assetId + " +live:true",
    0, 0, null, APILocator.getUserAPI().getSystemUser(), false);

💡 Use findContentletByIdentifier per-language (same pattern as findLiveVanityContentletForLanguage) or add +contentType:vanity_url to narrow scope and document the UUID-guard assumption with a comment.


[🟠 High] S3VanityAliasService.java:~514 (publishAliases)

S3 publishes and deletes are issued before the DB write (repository.replaceMappings). The compensation via restoreAliases only restores S3-deleted objects, but newly published alias objects are never rolled back on DB failure. This leaves S3 and the DB permanently diverged if the DB write throws — until a full republish happens. The publishedAliases consumer is a no-op (alias -> { }), so published objects cannot be compensated.

publishAliases(context, aliasesToRefresh, alias -> { });   // S3 writes — not tracked for rollback
deleteAliases(context, aliasesToDelete, deletedNow::add);  // S3 deletes
repository.replaceMappings(context.lookup, aliasesToRefresh); // DB — if this fails, published S3 objects leak

💡 Track newly published aliases in a publishedNow list and on DB failure issue compensating S3 deletes for them, symmetric with how deletedNow is restored.


[🟠 High] S3VanityAliasService.java:~542 (unpublishAliasesByVanityUrl)

S3 deletes happen inside the loop before the DB delete. If S3 succeeds for some aliases and the DB delete subsequently fails, the DB still records those aliases as present while the S3 objects are already gone — the table and S3 are diverged with no compensation path. The unpublishAliases method at line ~706 handles this correctly with a deletedNow collector and restoreAliases; unpublishAliasesByVanityUrl should follow the same pattern.

for (final S3VanityAlias alias : persistedAliases) {
    removeMaterializedAlias(context, alias);  // S3 I/O — no rollback if DB step later fails
}
repository.deleteByVanityUrlId(context.endpointId, languageId, vanityUrlId); // DB delete

💡 Collect successfully removed aliases in a deletedNow list; on exception, re-publish them (same as restoreAliases pattern in unpublishAliases).


[🟡 Medium] AWSS3EndPointPublisher.java:~436 (getCompleteFileKey)

rejectPathTraversal is called only for filePath. The bucketRootPrefix (from endpoint configuration property aws_bucket_folder_prefix) is concatenated directly into the S3 object key without any traversal check. A misconfigured or compromised endpoint with ../../ in the prefix would construct S3 keys outside the intended namespace, potentially overwriting objects in sibling "directories" within the bucket.

rejectPathTraversal(filePath);   // only filePath checked
// bucketRootPrefix concatenated without validation:
completeFileKey = bucketRootPrefix + File.separator + completeFileKey;

💡 Call rejectPathTraversal(bucketRootPrefix) at the start of getCompleteFileKey.


[🟡 Medium] S3VanityAliasService.java:~514 (publishAliases) — logic defect

filterExisting(currentAliases, indexByStorageLocation(persistedAliases)) returns only aliases whose S3 location already exists in the persisted set. Brand-new aliases present in currentAliases but absent from persistedAliases are silently excluded from both aliasesToRefresh and aliasesToDelete and are never published to S3 or written to the DB. This means a canonical-file republish will never create first-time alias entries, only refresh already-materialized ones.

final List<S3VanityAlias> aliasesToRefresh = filterExisting(currentAliases,
        indexByStorageLocation(persistedAliases));  // new aliases silently dropped

💡 Confirm whether this intentional "refresh only" contract is correct. If new aliases should also be published on canonical republish, aliasesToRefresh should be currentAliases and filterExisting used only to identify unchanged objects.


[🟡 Medium] S3VanityAliasRepository.java:~18TABLE_NAME constant unused

TABLE_NAME = "static_s3_vanity_mapping" is declared but all seven SQL string constants hardcode the literal directly. If the table is renamed, all eight literals must be updated manually while the constant remains stale, silently providing false rename-safety.

private static final String TABLE_NAME = "static_s3_vanity_mapping"; // never used
// ...
"FROM static_s3_vanity_mapping WHERE ..."  // TABLE_NAME not referenced

💡 Replace each hardcoded literal with TABLE_NAME in the SQL constants, e.g. "FROM " + TABLE_NAME + " WHERE ..."


[🟡 Medium] S3VanityAliasRepository.java:~310longValue throws NumberFormatException on null

String.valueOf(null) returns the four-character string "null", not null — so Long.parseLong("null") throws NumberFormatException, masking the root cause and propagating as an unchecked exception. The language_id column is NOT NULL in the DDL, but a schema inconsistency or direct DB write could trigger this.

return Long.parseLong(String.valueOf(value)); // null value → "null" → NumberFormatException

💡 Add a null guard: if (value == null) throw new IllegalStateException("Unexpected null for language_id"); — or use ConversionUtils.toLong(value, -1L) as used elsewhere in dotCMS for SQL row coercion.


[🟡 Medium] S3VanityAliasService.java:3 — unused WrapInTransaction import (carried from previous review)

import com.dotcms.business.WrapInTransaction is still present but no method in this class uses @WrapInTransaction. Transaction management is correctly pushed into S3VanityAliasRepository.

import com.dotcms.business.WrapInTransaction;

💡 Remove the unused import.


[🟡 Medium] S3VanityAliasSupport.java:~63targetType parameter silently ignored

materializeVanityPath(vanityPath, DotAsset targetType) accepts a targetType parameter but never uses it — it delegates unconditionally to normalizeVanityPath(vanityPath). Callers passing a DotAsset value receive no indication the argument is discarded, which is misleading and a latent API hazard.

public Optional<String> materializeVanityPath(final String vanityPath, final DotAsset targetType) {
    return normalizeVanityPath(vanityPath);  // targetType silently ignored
}

💡 Either remove the parameter and update callers, or document explicitly in Javadoc that the parameter is reserved and has no current effect.


[🟡 Medium] Task260408CreateS3VanityAliasTable.java:49-59 — unbounded varchar columns (carried from previous review)

Hash columns (canonical_path_hash, vanity_path_hash) hold SHA-256 hex (always 64 chars); id columns (endpoint_id, host_id, vanity_url_id) hold UUIDs (36 chars). Bare varchar with no length communicates no contract and keeps the PK index footprint larger than necessary.

endpoint_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,

💡 Use varchar(36) for UUID columns and varchar(64) for SHA-256 hash columns.


Next steps

  • 🔴 / 🟠 Fix locally and push — these need your judgment
  • 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in <File.java>
  • Every new push triggers a fresh review automatically

…ings

- Add anonymous-read permission gate in renderTargetHtml before calling
  htmlPageAssetAPI.getHTML with system user: pages not publicly readable
  by anonymous visitors are now skipped rather than materialized as
  permanently public S3 static files
- Add the same anonymous-read gate in S3VanityTargetResolver.resolveFileAsset
  for file asset targets resolved with system user
- Track publishedNow aliases in publishAliases and compensate (delete from
  S3) if the subsequent repository.replaceMappings DB write fails, making
  the publish side symmetric with the existing deletedNow/restoreAliases
  compensation on the delete side
- Flip DB/S3 order in unpublishAliasesByVanityUrl to DB-first: deleteByVanityUrlId
  commits before S3 removes so a DB failure leaves no orphaned S3 keys;
  any S3-remove failures after a successful DB delete leave only harmless
  orphaned objects
- Replace Lucene string-concatenation search in findLiveVanityContentlet
  with a per-language findContentletByIdentifier loop, eliminating the
  structural injection risk and matching the pattern already used in
  findLiveVanityContentletForLanguage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dsilvam dsilvam enabled auto-merge June 4, 2026 14:41
@dsilvam dsilvam added this pull request to the merge queue Jun 4, 2026
Merged via the queue into main with commit beb3cce Jun 4, 2026
59 checks passed
@dsilvam dsilvam deleted the s3-static-push-vanity-url-dsilvam branch June 4, 2026 16:20
fmontes pushed a commit that referenced this pull request Jun 4, 2026
…#35808)

## Summary

This PR carries the same changes as #35643 (originally authored by
@riccardoruocco), opened under a different account so CI checks can run.

### Proposed Changes
* Add a feature flag, `STATIC_PUSH_S3_VANITY_ALIAS_ENABLED`, to enable
Vanity URL handling for AWS S3 static publishing. When the flag is
`false`, dotCMS behaves exactly as it does today and no Vanity URL alias
is generated.
* When the flag is `true`, publishing a Vanity URL to a static S3
endpoint makes dotCMS resolve the target content, identify the live
resource, render or copy it, and write the static clone to the S3 path
represented by the Vanity URL.
* This behavior is backed by the `s3_vanity_alias` table, which acts as
an operational snapshot of the Vanity URL state on S3 and stores the
aliases that have actually been materialized.
* When a canonical content is republished, dotCMS checks the alias table
and refreshes the existing Vanity URL clone if one is already tracked.
If no Vanity URL alias exists yet, nothing changes.
* When content is unpublished or removed, dotCMS uses the alias table to
remove the corresponding static alias from S3 and keep the bucket
aligned with the current state.
* When a Vanity URL is unpublished or updated, dotCMS updates the
tracked alias accordingly, so stale S3 keys are not left behind.

### Additional Info
The goal of this change is to make Vanity URLs work on static S3
publishing in the same way they already work at runtime on live dotCMS.

The implementation is intentionally opt-in and does not affect existing
installations unless `STATIC_PUSH_S3_VANITY_ALIAS_ENABLED=true` is set.

The `s3_vanity_alias` table is used as the source of truth for what was
actually written to S3, so publish, republish, unpublish, and delete can
all behave consistently without depending only on the current live
content state.

Closes #35663

---------

Co-authored-by: RiccardoRuocco <ruocco.rf@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Restore static S3 Vanity URL support for dotCMS 2 compatibility

5 participants