[core] Sort data evolution manifests by RowID #8329

Merged
JingsongLi merged 3 commits into apache:master from JingsongLi:codex/manifest-sort-data-evolution-rowid
Jun 24, 2026

Conversation

@JingsongLi

Contributor

Summary

Support manifest sort rewrite for data evolution tables by using RowID-aware sort keys. Partitioned data evolution tables sort entries by partition first, then RowID range, while non-partitioned tables sort directly by RowID.

Changes

  • Add a manifest sort key abstraction so existing partition sorting and data evolution RowID sorting share the same compaction flow.
  • Route non-partitioned data evolution manifests with complete RowID stats through sort compaction.
  • Order data evolution manifest entries by partition, first RowID, RowID range end, and descending max sequence number for duplicate RowID ranges.
  • Allow manifest sort validation for data evolution tables without partition keys.
  • Add tests for partitioned and non-partitioned data evolution manifest sorting.
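
The ordering described in the list above can be sketched as a comparator chain. This is only an illustration under stated assumptions: `RowIdSortSketch`, `Entry`, and its fields are hypothetical stand-ins rather than Paimon's actual classes, the partition is simplified to a `String`, and the RowID range is assumed to be inclusive on both ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RowIdSortSketch {

    /** Hypothetical stand-in for a manifest entry; not Paimon's real type. */
    static final class Entry {
        final String partition;       // simplified; "" for a non-partitioned table
        final long firstRowId;        // assumed inclusive start of the RowID range
        final long lastRowId;         // assumed inclusive end of the RowID range
        final long maxSequenceNumber; // tie-breaker for duplicate RowID ranges

        Entry(String partition, long firstRowId, long lastRowId, long maxSequenceNumber) {
            this.partition = partition;
            this.firstRowId = firstRowId;
            this.lastRowId = lastRowId;
            this.maxSequenceNumber = maxSequenceNumber;
        }

        @Override
        public String toString() {
            return partition + "[" + firstRowId + ".." + lastRowId + "]@" + maxSequenceNumber;
        }
    }

    /** Partition, then first RowID, then range end, then max sequence number descending. */
    static final Comparator<Entry> ORDER =
            Comparator.comparing((Entry e) -> e.partition)
                    .thenComparingLong(e -> e.firstRowId)
                    .thenComparingLong(e -> e.lastRowId)
                    .thenComparing(
                            Comparator.comparingLong((Entry e) -> e.maxSequenceNumber).reversed());

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Entry> sort(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("p2", 0, 9, 1),
                new Entry("p1", 0, 9, 1),
                new Entry("p1", 0, 9, 5));
        System.out.println(sort(entries));
    }
}
```

For a non-partitioned table every entry would carry the same partition value in this sketch, so the first comparison is always a tie and the order is decided by RowID alone.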

Testing

  • mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=ManifestFileMetaTest#testDataEvolutionManifestSortByPartitionAndRowId,NoPartitionManifestFileMetaTest#testDataEvolutionManifestSortByRowId,SchemaValidationTest#testManifestSortValidation test
  • mvn -pl paimon-core -am -DskipTests compile
  • git diff --check

Notes

No migration required. If any manifest lacks RowID stats, data evolution sorting falls back to the existing non-RowID path.
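
The fallback rule could look roughly like the check below. It is a sketch, not the merged code: `SortPathSketch`, `ManifestMeta`, `SortPath`, and the nullable `minRowId`/`maxRowId` fields are hypothetical names chosen for illustration, and "lacks RowID stats" is assumed to mean that either bound is absent.

```java
import java.util.Arrays;
import java.util.List;

final class SortPathSketch {

    /** Which sort path to take; names are illustrative only. */
    enum SortPath { ROW_ID, EXISTING }

    /** Hypothetical manifest metadata with optional RowID bounds. */
    static final class ManifestMeta {
        final Long minRowId; // null when the manifest carries no RowID stats (assumption)
        final Long maxRowId; // null when the manifest carries no RowID stats (assumption)

        ManifestMeta(Long minRowId, Long maxRowId) {
            this.minRowId = minRowId;
            this.maxRowId = maxRowId;
        }

        boolean hasRowIdStats() {
            return minRowId != null && maxRowId != null;
        }
    }

    /** A single manifest without RowID stats is enough to fall back. */
    static SortPath choose(List<ManifestMeta> manifests) {
        for (ManifestMeta manifest : manifests) {
            if (!manifest.hasRowIdStats()) {
                return SortPath.EXISTING;
            }
        }
        return SortPath.ROW_ID;
    }

    public static void main(String[] args) {
        ManifestMeta complete = new ManifestMeta(0L, 99L);
        ManifestMeta incomplete = new ManifestMeta(null, null);
        System.out.println(choose(Arrays.asList(complete, incomplete)));
    }
}
```

Checking all manifests up front keeps the decision all-or-nothing, which matches the note that older manifests need no migration.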

@JingsongLi JingsongLi marked this pull request as draft June 23, 2026 07:37
@JingsongLi JingsongLi marked this pull request as ready for review June 23, 2026 11:14
Comment thread paimon-core/src/main/java/org/apache/paimon/schema/SchemaValidation.java Outdated

@leaves12138 leaves12138 left a comment

Contributor

Reviewed the RowID-aware manifest sort path. I checked the fallback behavior when RowID stats are incomplete, schema validation for data-evolution/non-partition tables, and the partition-first + RowID range ordering for data evolution manifests. No blocking issues found.

Validation:

  • git diff --check
  • mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=ManifestFileMetaTest#testDataEvolutionManifestSortByPartitionAndRowId,NoPartitionManifestFileMetaTest#testDataEvolutionManifestSortByRowId,SchemaValidationTest#testManifestSortValidation test

Note: in this local temp workspace, the first test run hit the known codegen-loader ServiceLoader resource issue; after unpacking the codegen test resource, the targeted tests passed.

@JingsongLi JingsongLi merged commit 50f2863 into apache:master Jun 24, 2026
12 checks passed

2 participants