feat(s3-publishing): Add S3 static publishing support for vanity URLs #35808
Claude finished @dsilvam's task in 7m 11s

🔍 dotCMS Backend Review

The 🔴 Critical findings from the previous review (system-user rendering non-public content, file-asset anonymous access gate) and the earlier 🟠 High items (S3-before-DB ordering in

[🟠 High]
repository.deleteByVanityUrlId(context.endpointId, languageId, vanityUrlId); // commits
for (final S3VanityAlias alias : persistedAliases) {
    removeMaterializedAlias(context, alias); // S3: if this throws, DB already committed
}
💡 Collect removed aliases in a

[🟡 Medium]
protected String getCompleteFileKey(final String bucketRootPrefix, final String filePath) {
    rejectPathTraversal(filePath); // only filePath is checked
    // bucketRootPrefix concatenated without traversal validation
}
💡 Add

[🟡 Medium]
private static final String TABLE_NAME = "static_s3_vanity_mapping"; // never referenced
// ...
"FROM static_s3_vanity_mapping WHERE ..." // literal everywhere
💡 Replace each hardcoded literal in the SQL constants with

[🟡 Medium]
return Long.parseLong(String.valueOf(value)); // null -> "null" -> NumberFormatException
💡 Add a null guard:

[🟡 Medium]
import com.dotcms.business.WrapInTransaction;
💡 Remove the unused import.

[🟡 Medium]
public Optional<String> materializeVanityPath(final String vanityPath, final DotAsset targetType) {
    return normalizeVanityPath(vanityPath); // targetType silently ignored
}
💡 Either remove the parameter and update callers, or add a Javadoc note that the parameter is reserved and currently has no effect.

[🟡 Medium]
endpoint_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,
💡 Use

Next steps
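The `TABLE_NAME` finding in the review above can be illustrated with a small sketch. The table name and the `endpoint_id` / `vanity_url_id` columns come from the snippets quoted in the review; the class name `VanityMappingSqlSketch` and the two query shapes are hypothetical and are not taken from the PR.

```java
final class VanityMappingSqlSketch {
    private VanityMappingSqlSketch() {}

    // Single place where the table name is spelled out.
    static final String TABLE_NAME = "static_s3_vanity_mapping";

    // Hypothetical queries: each one is assembled from TABLE_NAME instead of
    // repeating the literal, so a rename only touches the constant above.
    static final String SELECT_BY_ENDPOINT =
            "SELECT vanity_url_id FROM " + TABLE_NAME + " WHERE endpoint_id = ?";

    static final String DELETE_BY_VANITY_URL_ID =
            "DELETE FROM " + TABLE_NAME + " WHERE endpoint_id = ? AND vanity_url_id = ?";
}
```

Because both operands are compile-time constants, the concatenation is folded by the compiler, so this costs nothing at runtime.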
Semgrep found 16
The method identified is susceptible to injection. The input should be validated and properly
If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.
🔍 dotCMS Backend Review

The previously-flagged 🔴 Critical and 🟠 High findings (path traversal in

[🟡 Medium]
endpoint_id varchar not null,
host_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,
vanity_url_id varchar,
💡 Tighten to

[🟡 Medium]
private long longValue(final Object value) {
    if (value instanceof Number) {
        return ((Number) value).longValue();
    }
    return Long.parseLong(String.valueOf(value));
}
💡 Either trust the JDBC contract and throw on unexpected types, or use

[🟡 Medium]
import com.dotcms.business.WrapInTransaction;
import com.dotcms.vanityurl.business.VanityUrlAPI;
import com.dotcms.vanityurl.model.CachedVanityUrl;
💡 Remove the unused

Next steps
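One reading of the `longValue` finding in the review above is to fail loudly on anything that is not a `Number`, rather than letting `String.valueOf(null)` turn into the string `"null"` and surface as a `NumberFormatException`. The following is a minimal sketch of that option only. The class name `RowValuesSketch` and the choice of `IllegalStateException` are assumptions, not code from the PR.

```java
final class RowValuesSketch {
    private RowValuesSketch() {}

    /**
     * Converts a column value to long. Any Number is accepted; null and every
     * other type are rejected with a message that names the offending type.
     */
    static long longValue(final Object value) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        // Assumed policy: a non-numeric value means the column contract was violated.
        final String actual = value == null ? "null" : value.getClass().getName();
        throw new IllegalStateException("Expected a numeric column value but got: " + actual);
    }
}
```

With this shape, a null column produces an error that states what was wrong, instead of a parse failure on the text `"null"`.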
d3c5801 to 1f11202
- Path traversal: add containsTraversalOrUnsafePath() to S3VanityAliasSupport normalizers. It URL-decodes once, then rejects .. / . segments, control chars, and residual % (double-encoded sequences)
- Lucene injection: validate asset identifier as UUID before concatenating into ES/Lucene query in AWSS3Publisher
- Defense-in-depth: add rejectPathTraversal() guard in getCompleteFileKey() before any path becomes an S3 object key
- Connection pool: remove @WrapInTransaction from all four public service methods in S3VanityAliasService. S3 I/O now runs outside DB transactions; each repository call manages its own short-lived transaction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
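The path-traversal check described in this commit message (decode once, then reject `..` / `.` segments, control characters, and any residual `%`) could look roughly like the sketch below. This is an illustration written from the commit description, not the PR's actual `containsTraversalOrUnsafePath()`. The class name is hypothetical, and the use of `java.net.URLDecoder` is an assumption. Note that `URLDecoder` also turns `+` into a space, which a real implementation may not want.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class PathSafetySketch {
    private PathSafetySketch() {}

    /** Returns true when the path should be rejected. */
    static boolean containsTraversalOrUnsafePath(final String rawPath) {
        if (rawPath == null) {
            return true;
        }
        final String decoded;
        try {
            // Decode exactly once; malformed escapes such as "%zz" throw.
            decoded = URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return true;
        }
        // A '%' that survives one decode pass indicates double encoding.
        if (decoded.indexOf('%') >= 0) {
            return true;
        }
        for (int i = 0; i < decoded.length(); i++) {
            if (Character.isISOControl(decoded.charAt(i))) {
                return true;
            }
        }
        // Reject relative segments anywhere in the path.
        for (final String segment : decoded.split("/")) {
            if (segment.equals("..") || segment.equals(".")) {
                return true;
            }
        }
        return false;
    }
}
```

Decoding only once and then treating a leftover `%` as hostile avoids having to loop until the input stops changing: `%252e%252e` decodes to `%2e%2e`, which is rejected by the residual-percent rule rather than decoded a second time.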
… DDL
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🔍 dotCMS Backend Review

The three 🟡 Medium items flagged in the previous review (

[🟠 High]
Logger.warn(this, "Skipping unsupported Vanity URL: " + vanityContentlet);
💡 Log only stable identifiers:

[🟠 High]
for (final S3VanityAlias alias : persistedAliases) {
    removeMaterializedAlias(context, alias); // S3 I/O: irreversible
    repository.deleteAlias(alias); // one commit per alias, no cross-alias atomicity
}
💡 Complete all S3 removes first, then issue a single

[🟠 High]
publishAlias(context, alias, file); // S3 write: committed
deleteObsoleteAliases(context, persistedAliases, alias); // S3 delete: committed
repository.replaceMappingsByVanityUrlId(...); // if this throws, DB is stale
💡 Invert the order: write to DB first (inside its

[🟡 Medium]
return htmlPageAssetAPI.getHTML(target.htmlPage, true, contentletInode, systemUser,
        context.language.getId(), Constants.USER_AGENT_DOTCMS_PUSH_PUBLISH);

[🟡 Medium]
final List<Contentlet> contentlets = contentletAPI.search(
        "+identifier:" + assetId + " +live:true", 0, 0, null,
        APILocator.getUserAPI().getSystemUser(), false);
💡 Add a brief comment documenting the UUID guard as the sole injection barrier, or apply

[🟡 Medium]
private static final String TABLE_NAME = "static_s3_vanity_mapping";
// ...
"FROM static_s3_vanity_mapping " // TABLE_NAME never referenced
💡 Replace each hardcoded

[🟡 Medium]
endpoint_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,
💡 Use

[🟡 Medium]
return Long.parseLong(String.valueOf(value));
💡 Either throw on unexpected types, or use

[🟡 Medium]
import com.dotcms.business.WrapInTransaction;
💡 Remove the unused import.

Next steps
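The review above notes that the UUID guard is the only barrier in front of the concatenated `+identifier:` query. A minimal sketch of such a guard follows, assuming a strict 8-4-4-4-12 hexadecimal pattern. The class and method names are hypothetical, and the query string is copied from the quoted snippet. A regex is used here rather than `UUID.fromString`, which is generally more lenient about component lengths.

```java
import java.util.regex.Pattern;

final class LuceneQuerySketch {
    private LuceneQuerySketch() {}

    // Canonical textual UUID: 8-4-4-4-12 hexadecimal digits, nothing else.
    private static final Pattern UUID_PATTERN = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    /** Builds the live-by-identifier query, refusing anything that is not a UUID. */
    static String liveByIdentifier(final String assetId) {
        // matches() requires the whole string to match, so spaces, '+', ':'
        // and other query operators can never reach the concatenation below.
        if (assetId == null || !UUID_PATTERN.matcher(assetId).matches()) {
            throw new IllegalArgumentException("Asset identifier is not a UUID");
        }
        return "+identifier:" + assetId + " +live:true";
    }
}
```

Keeping the validation and the concatenation in the same method makes it harder for a later caller to build the query without passing through the guard.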
- Log only vanityContentlet.getIdentifier() instead of toString() to prevent PII exposure from reflective field dumping
- Invert DB/S3 write order in publishMaterializedAlias: commit DB first, then issue S3 I/O; compensate DB on S3 failure to keep mappings consistent with actual S3 state
- Complete all S3 removes before the single bulk deleteByVanityUrlId call in unpublishAliasesByVanityUrl to avoid cross-alias partial commit divergence between table and bucket
- Add public deleteByVanityUrlId overloads (with and without languageId) backed by a new DELETE_BY_VANITY_URL_ID_ANY_LANGUAGE SQL constant
- Replace Optional.of(fileAsset.getFileAsset()) with Optional.ofNullable in S3VanityResolvedTarget.physicalFile() to prevent NPE when binary is absent on the local node (remote storage, not yet synced)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
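The "commit DB first, then issue S3 I/O; compensate DB on S3 failure" ordering from this commit message can be sketched with two tiny interfaces. `MappingStore`, `ObjectStore`, and `DbFirstPublishSketch` are hypothetical stand-ins for the repository and the S3 client; they are not types from the PR.

```java
final class DbFirstPublishSketch {
    private DbFirstPublishSketch() {}

    /** Hypothetical stand-in for the mapping repository; each call commits on its own. */
    interface MappingStore {
        void save(String aliasKey);
        void delete(String aliasKey);
    }

    /** Hypothetical stand-in for the S3 client. */
    interface ObjectStore {
        void put(String aliasKey);
    }

    static void publish(final MappingStore db, final ObjectStore s3, final String aliasKey) {
        db.save(aliasKey); // step 1: short DB transaction, already committed here
        try {
            s3.put(aliasKey); // step 2: S3 I/O runs outside any DB transaction
        } catch (RuntimeException s3Failure) {
            // Compensate: remove the mapping so the table matches the bucket again.
            try {
                db.delete(aliasKey);
            } catch (RuntimeException compensationFailure) {
                s3Failure.addSuppressed(compensationFailure);
            }
            throw s3Failure;
        }
    }
}
```

The compensation is best effort: if the delete also fails, the original S3 error is still rethrown with the second failure attached, so the caller sees both.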
🔍 dotCMS Backend Review

[🔴 Critical]
return htmlPageAssetAPI.getHTML(target.htmlPage, true, contentletInode, systemUser,
        context.language.getId(), Constants.USER_AGENT_DOTCMS_PUSH_PUBLISH);
💡 Before rendering, assert the target has CMS Anonymous read permission:

[🟠 High]
final List<Contentlet> contentlets = contentletAPI.search(
        "+identifier:" + assetId + " +live:true",
        0, 0, null, APILocator.getUserAPI().getSystemUser(), false);
💡 Use

[🟠 High]
publishAliases(context, aliasesToRefresh, alias -> { }); // S3 writes: not tracked for rollback
deleteAliases(context, aliasesToDelete, deletedNow::add); // S3 deletes
repository.replaceMappings(context.lookup, aliasesToRefresh); // DB: if this fails, published S3 objects leak
💡 Track newly published aliases in a

[🟠 High]
for (final S3VanityAlias alias : persistedAliases) {
    removeMaterializedAlias(context, alias); // S3 I/O: no rollback if DB step later fails
}
repository.deleteByVanityUrlId(context.endpointId, languageId, vanityUrlId); // DB delete
💡 Collect successfully removed aliases in a

[🟡 Medium]
rejectPathTraversal(filePath); // only filePath checked
// bucketRootPrefix concatenated without validation:
completeFileKey = bucketRootPrefix + File.separator + completeFileKey;
💡 Call

[🟡 Medium]
final List<S3VanityAlias> aliasesToRefresh = filterExisting(currentAliases,
        indexByStorageLocation(persistedAliases)); // new aliases silently dropped
💡 Confirm whether this intentional "refresh only" contract is correct. If new aliases should also be published on canonical republish,

[🟡 Medium]
private static final String TABLE_NAME = "static_s3_vanity_mapping"; // never used
// ...
"FROM static_s3_vanity_mapping WHERE ..." // TABLE_NAME not referenced
💡 Replace each hardcoded literal with

[🟡 Medium]
return Long.parseLong(String.valueOf(value)); // null value → "null" → NumberFormatException
💡 Add a null guard:

[🟡 Medium]
import com.dotcms.business.WrapInTransaction;
💡 Remove the unused import.

[🟡 Medium]
public Optional<String> materializeVanityPath(final String vanityPath, final DotAsset targetType) {
    return normalizeVanityPath(vanityPath); // targetType silently ignored
}
💡 Either remove the parameter and update callers, or document explicitly in Javadoc that the parameter is reserved and has no current effect.

[🟡 Medium]
endpoint_id varchar not null,
canonical_path_hash varchar not null,
vanity_path_hash varchar not null,
💡 Use

Next steps
…ings

- Add anonymous-read permission gate in renderTargetHtml before calling htmlPageAssetAPI.getHTML with system user: pages not publicly readable by anonymous visitors are now skipped rather than materialized as permanently public S3 static files
- Add the same anonymous-read gate in S3VanityTargetResolver.resolveFileAsset for file asset targets resolved with system user
- Track publishedNow aliases in publishAliases and compensate (delete from S3) if the subsequent repository.replaceMappings DB write fails, making the publish side symmetric with the existing deletedNow/restoreAliases compensation on the delete side
- Flip DB/S3 order in unpublishAliasesByVanityUrl to DB-first: deleteByVanityUrlId commits before S3 removes so a DB failure leaves no orphaned S3 keys; any S3-remove failures after a successful DB delete leave only harmless orphaned objects
- Replace Lucene string-concatenation search in findLiveVanityContentlet with a per-language findContentletByIdentifier loop, eliminating the structural injection risk and matching the pattern already used in findLiveVanityContentletForLanguage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
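The `publishedNow` tracking described in this commit message covers the opposite ordering: S3 writes happen first and are undone if the later mapping write fails. A rough sketch under assumed names follows (`ObjectStore`, `MappingStore`, and `TrackedPublishSketch` are hypothetical). It deletes whatever was written in this call, which is a simplification: an object that existed before and was overwritten would be removed as well, so a real implementation might need to restore it instead.

```java
import java.util.ArrayList;
import java.util.List;

final class TrackedPublishSketch {
    private TrackedPublishSketch() {}

    /** Hypothetical stand-in for the S3 client. */
    interface ObjectStore {
        void put(String key);
        void delete(String key);
    }

    /** Hypothetical stand-in for the mapping repository. */
    interface MappingStore {
        void replaceMappings(List<String> keys);
    }

    static void publishAll(final ObjectStore s3, final MappingStore db, final List<String> keys) {
        final List<String> publishedNow = new ArrayList<>();
        try {
            for (final String key : keys) {
                s3.put(key);
                publishedNow.add(key); // recorded only after the write succeeded
            }
            db.replaceMappings(keys);
        } catch (RuntimeException failure) {
            // Undo only what this call actually wrote, so no untracked object leaks.
            for (final String key : publishedNow) {
                try {
                    s3.delete(key);
                } catch (RuntimeException cleanupFailure) {
                    failure.addSuppressed(cleanupFailure);
                }
            }
            throw failure;
        }
    }
}
```

Adding a key to `publishedNow` only after `put` returns means a write that failed halfway is never scheduled for deletion.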
…#35808)

## Summary

This PR carries the same changes as #35643 (originally authored by @riccardoruocco), opened under a different account so CI checks can run.

### Proposed Changes

* Add a feature flag, `STATIC_PUSH_S3_VANITY_ALIAS_ENABLED`, to enable Vanity URL handling for AWS S3 static publishing. When the flag is `false`, dotCMS behaves exactly as it does today and no Vanity URL alias is generated.
* When the flag is `true`, publishing a Vanity URL to a static S3 endpoint makes dotCMS resolve the target content, identify the live resource, render or copy it, and write the static clone to the S3 path represented by the Vanity URL.
* This behavior is backed by the `s3_vanity_alias` table, which acts as an operational snapshot of the Vanity URL state on S3 and stores the aliases that have actually been materialized.
* When a canonical content is republished, dotCMS checks the alias table and refreshes the existing Vanity URL clone if one is already tracked. If no Vanity URL alias exists yet, nothing changes.
* When content is unpublished or removed, dotCMS uses the alias table to remove the corresponding static alias from S3 and keep the bucket aligned with the current state.
* When a Vanity URL is unpublished or updated, dotCMS updates the tracked alias accordingly, so stale S3 keys are not left behind.

### Additional Info

The goal of this change is to make Vanity URLs work on static S3 publishing in the same way they already work at runtime on live dotCMS.

The implementation is intentionally opt-in and does not affect existing installations unless `STATIC_PUSH_S3_VANITY_ALIAS_ENABLED=true` is set.

The `s3_vanity_alias` table is used as the source of truth for what was actually written to S3, so publish, republish, unpublish, and delete can all behave consistently without depending only on the current live content state.

Closes #35663

---------

Co-authored-by: RiccardoRuocco <ruocco.rf@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
This PR carries the same changes as #35643 (originally authored by @riccardoruocco), opened under a different account so CI checks can run.
Proposed Changes
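* Add a feature flag, `STATIC_PUSH_S3_VANITY_ALIAS_ENABLED`, to enable Vanity URL handling for AWS S3 static publishing. When the flag is `false`, dotCMS behaves exactly as it does today and no Vanity URL alias is generated.
* When the flag is `true`, publishing a Vanity URL to a static S3 endpoint makes dotCMS resolve the target content, identify the live resource, render or copy it, and write the static clone to the S3 path represented by the Vanity URL.
* This behavior is backed by the `s3_vanity_alias` table, which acts as an operational snapshot of the Vanity URL state on S3 and stores the aliases that have actually been materialized.
* When a canonical content is republished, dotCMS checks the alias table and refreshes the existing Vanity URL clone if one is already tracked. If no Vanity URL alias exists yet, nothing changes.
* When content is unpublished or removed, dotCMS uses the alias table to remove the corresponding static alias from S3 and keep the bucket aligned with the current state.
* When a Vanity URL is unpublished or updated, dotCMS updates the tracked alias accordingly, so stale S3 keys are not left behind.
Additional Info
The goal of this change is to make Vanity URLs work on static S3 publishing in the same way they already work at runtime on live dotCMS.
The implementation is intentionally opt-in and does not affect existing installations unless `STATIC_PUSH_S3_VANITY_ALIAS_ENABLED=true` is set.
The `s3_vanity_alias` table is used as the source of truth for what was actually written to S3, so publish, republish, unpublish, and delete can all behave consistently without depending only on the current live content state.
Closes #35663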