[python] Support tag time_retained (TTL) on FileSystemCatalog #8319

Open
TheR1sing3un wants to merge 6 commits into apache:master from TheR1sing3un:feat/python-tag-time-retained

@TheR1sing3un


Purpose

FileSystemCatalog.create_tag rejected time_retained with NotImplementedError, and the Python Tag only inherited Snapshot fields, so get_tag and the $tags system table could only return None for create-time / TTL.

This implements real tag time_retained support on the FileSystem path, persisting tagCreateTime / tagTimeRetained in the same on-disk JSON shape as Java (org.apache.paimon.tag.Tag) so tag files round-trip across the Java and Python SDKs.

Changes

  • Tag now carries tag_create_time (LocalDateTime as a [y, mo, d, h, mi, s, ns] array) and tag_time_retained (Duration as decimal seconds), via a new per-field JSON codec.
  • create_tag / replace_tag thread time_retained through TagManager / FileStoreTable / FileSystemCatalog. With no retention, the plain Snapshot JSON is written (backward compatible), mirroring Java TagManager.createOrReplaceTag.
  • get_tag and the $tags system table surface real create_time / time_retained (matching Java Timestamp.fromLocalDateTime / Duration.toString()).
  • Tag expiration (TTL-based deletion) is intentionally out of scope.
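
For readers unfamiliar with the on-disk shapes named above, here is a minimal sketch of the two temporal encodings. The function names are hypothetical and are not the PR's `time_utils` API. The sketch assumes the writer always emits all seven array elements and that a reader should tolerate a shorter array with omitted trailing seconds or nanos. Python's `datetime` and `timedelta` only carry microseconds, so nanoseconds are always a multiple of 1000 here.

```python
from datetime import datetime, timedelta
from decimal import Decimal


def encode_local_datetime(dt: datetime) -> list:
    """Naive datetime -> [y, mo, d, h, mi, s, ns] (hypothetical helper)."""
    return [dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second,
            dt.microsecond * 1000]


def decode_local_datetime(parts: list) -> datetime:
    """[y, mo, d, h, mi(, s(, ns))] -> naive datetime."""
    if not 5 <= len(parts) <= 7:
        raise ValueError(f"unexpected LocalDateTime array: {parts!r}")
    # Assumption: missing trailing seconds / nanos mean zero.
    y, mo, d, h, mi, s, ns = list(parts) + [0] * (7 - len(parts))
    return datetime(y, mo, d, h, mi, s, ns // 1000)


def encode_duration(td: timedelta) -> Decimal:
    """timedelta -> decimal seconds, using exact integer/decimal arithmetic."""
    whole_seconds = td.days * 86400 + td.seconds
    return Decimal(whole_seconds) + Decimal(td.microseconds) / Decimal(1_000_000)


def decode_duration(seconds) -> timedelta:
    """Decimal seconds -> timedelta; rejects sub-microsecond fractions."""
    micros = Decimal(str(seconds)) * 1_000_000
    if micros != micros.to_integral_value():
        raise ValueError(f"{seconds} s is finer than timedelta can represent")
    return timedelta(microseconds=int(micros))
```

Going through `Decimal` rather than `float` keeps values such as 0.0005 s exact on both the encode and decode side.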

Tests

Unit tests for the temporal codecs and Tag serde (Java golden-value shape, round-trip, reading Java-written tags, legacy plain-snapshot backward compatibility) plus FileSystemCatalog / $tags end-to-end coverage for create/replace with time_retained.

Does this PR introduce a user-facing change?

No.


Generative AI disclosure: drafted with AI assistance and reviewed by the author.

Add an optional encoder/decoder per dataclass field in the JSON
serializer (json_field_with_codec), applied only when present so existing
dataclasses are unaffected. Add time_utils codecs that mirror Jackson's
on-disk shapes for java.time types: LocalDateTime as a
[y, mo, d, h, mi, s, ns] array, Duration as decimal seconds, plus
duration_to_iso8601 and local_datetime_to_millis helpers.
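
As an illustration of the ISO-8601 rendering mentioned in this commit, the sketch below formats a non-negative `timedelta` the way Java's `Duration.toString()` does: days are folded into hours and trailing fractional zeros are dropped. The name `format_duration_like_java` is hypothetical, negative durations are deliberately out of scope, and this is not the PR's `duration_to_iso8601` implementation.

```python
from datetime import timedelta


def format_duration_like_java(td: timedelta) -> str:
    """Render a non-negative timedelta as PTnHnMnS (hypothetical helper)."""
    if td < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    total_seconds = td.days * 86400 + td.seconds
    micros = td.microseconds
    if total_seconds == 0 and micros == 0:
        return "PT0S"
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes:
        out += f"{minutes}M"
    if secs or micros:
        # Zero-pad to six digits, then drop trailing zeros: 500 us -> "0005".
        frac = f"{micros:06d}".rstrip("0")
        out += f"{secs}.{frac}S" if frac else f"{secs}S"
    return out
```

For example, a 500-microsecond retention renders as `PT0.0005S` and a one-day retention as `PT24H`.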
Turn Tag into a dataclass extending Snapshot with optional
tag_create_time and tag_time_retained, serialized in the same on-disk
JSON shape as Java org.apache.paimon.tag.Tag so tag files round-trip
across the Java and Python SDKs. Add the from_snapshot_and_tag_ttl
factory mirroring Java's Tag.fromSnapshotAndTagTtl.
Thread time_retained through TagManager / FileStoreTable /
FileSystemCatalog so create_tag and replace_tag persist a create-time and
TTL. Mirror Java TagManager.createOrReplaceTag: with no retention the
plain Snapshot JSON is written (no tag-specific fields) to stay readable
by older readers; with a retention the richer Tag JSON is written.

Drop the NotImplementedError that previously rejected time_retained and
surface the values via FileSystemCatalog.get_tag and the $tags system
table instead of None.
Add unit tests for the temporal codecs and Tag serde (Java golden-value
on-disk shape, round-trip, reading Java-written tags, and legacy
plain-snapshot backward compatibility). Update the FileSystemCatalog and
$tags end-to-end tests to cover create/replace with time_retained,
Java-compatible on-disk JSON, and the no-TTL plain-snapshot path.
Comment thread paimon-python/pypaimon/tag/tag_manager.py Outdated
…rosecond

Address review: parse_duration returns rounded milliseconds, so wrapping it
with timedelta(milliseconds=...) silently turned sub-millisecond retentions
(e.g. "1ns", "500micro") into a zero-TTL tag.

Add parse_duration_nanos (full-precision integer nanoseconds; parse_duration
is left untouched since option parsing relies on its millisecond contract) and
use it on the tag path: retentions are kept at microsecond precision
("500micro" -> 0.0005s / PT0.0005S), while sub-microsecond values that Python's
timedelta cannot represent now raise instead of silently writing a zero-TTL tag.
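
The precision issue can be reproduced in a few lines. This sketch uses hypothetical helper names and does not call the real `parse_duration` or `parse_duration_nanos`. It contrasts a millisecond-rounded conversion, which collapses small retentions to zero, with a nanosecond-based conversion that keeps microseconds and rejects anything finer.

```python
from datetime import timedelta


def ttl_via_rounded_millis(nanos: int) -> timedelta:
    """Lossy path: round to whole milliseconds first (illustrates the bug)."""
    return timedelta(milliseconds=round(nanos / 1_000_000))


def ttl_via_nanos(nanos: int) -> timedelta:
    """Keep microsecond precision; refuse values timedelta cannot hold."""
    if nanos < 0:
        raise ValueError("retention must not be negative")
    micros, remainder = divmod(nanos, 1000)
    if remainder:
        raise ValueError(
            f"{nanos} ns has sub-microsecond precision, "
            "which timedelta cannot represent"
        )
    return timedelta(microseconds=micros)
```

With these definitions, `ttl_via_rounded_millis(1)` is a zero `timedelta`, whereas `ttl_via_nanos(500_000)` is 500 microseconds and `ttl_via_nanos(1)` raises `ValueError`.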
TheR1sing3un requested a review from JingsongLi June 23, 2026 07:17
tag_time_retained=None,
tag_create_time=(
None if tag.tag_create_time is None
else local_datetime_to_millis(tag.tag_create_time)

GetTagResponse is supposed to mirror the Java REST response, but this conversion treats the tag LocalDateTime as UTC. The Java REST path converts tagCreateTime with atZone(ZoneId.systemDefault()).toInstant().toEpochMilli(), and TagManager creates the value with local LocalDateTime.now(). On non-UTC hosts, a filesystem-catalog tag created at local noon will be reported as noon UTC here, offset by the local timezone (for example +8h in Asia/Shanghai). Can we match the Java REST conversion here, and add a non-UTC timezone test to lock it down?

Address review: get_tag built GetTagResponse.tagCreateTime via the zone-less
(UTC) helper, but the Java REST path
(RESTFileSystemCatalog#getTag) converts it with
tagCreateTime.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli(). On
non-UTC hosts the reported millis were offset by the local zone.

Add local_datetime_to_system_zone_millis (mirrors atZone(systemDefault)) and use
it in get_tag. The $tags system table keeps local_datetime_to_millis (zone-less),
matching Java Timestamp.fromLocalDateTime; the two paths intentionally differ on
non-UTC hosts, exactly as the Java sides do. Add a TZ-controlled test locking down
both conversions.
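
The difference between the two conversions can be sketched as follows. The function names are hypothetical rather than the PR's helpers, and the explicit `tz` parameter is an addition for this sketch so that the behavior can be checked without changing the process time zone. Passing `tz=None` falls back to the host's local zone, which is the analogue of `atZone(ZoneId.systemDefault())`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MILLI = timedelta(milliseconds=1)


def to_millis_zoneless(dt: datetime) -> int:
    """Treat the naive wall-clock value as if it were UTC."""
    return (dt.replace(tzinfo=timezone.utc) - EPOCH) // ONE_MILLI


def to_millis_in_zone(dt: datetime, tz: Optional[timezone] = None) -> int:
    """Interpret the naive wall-clock value in `tz` (system zone if None)."""
    # astimezone() on a naive datetime assumes the system local zone.
    aware = dt.astimezone() if tz is None else dt.replace(tzinfo=tz)
    return (aware - EPOCH) // ONE_MILLI


noon = datetime(2026, 6, 23, 12, 0, 0)
plus_eight = timezone(timedelta(hours=8))  # fixed offset standing in for Asia/Shanghai
gap_millis = to_millis_zoneless(noon) - to_millis_in_zone(noon, plus_eight)
```

Here `gap_millis` is eight hours in milliseconds: local noon at UTC+8 is 04:00 UTC, so the zone-aware value is earlier than the zone-less one by exactly the offset.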
commit_identifier=snapshot.commit_identifier,
commit_kind=snapshot.commit_kind,
time_millis=snapshot.time_millis,
base_manifest_list_size=snapshot.base_manifest_list_size,

This field is preserved when building the TTL Tag, but trim_to_snapshot() below still drops it again (along with delta_manifest_list_size, changelog_manifest_list_size, and properties). Since FileSystemCatalog.get_tag() returns tag.trim_to_snapshot(), the REST/catalog response loses those Snapshot fields for TTL tags, unlike Java Tag.trimToSnapshot(). Please copy all optional Snapshot fields in trim_to_snapshot() too.
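
One way to keep `trim_to_snapshot()` from drifting out of sync with `Snapshot` is to copy every dataclass field by introspection instead of listing them by hand. The classes below are cut-down stand-ins with only a few of the fields named in this thread, not pypaimon's real `Snapshot` and `Tag`, so treat this as a sketch of the idea only.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Snapshot:  # hypothetical stand-in, far fewer fields than the real class
    id: int
    time_millis: int
    base_manifest_list_size: Optional[int] = None
    delta_manifest_list_size: Optional[int] = None
    changelog_manifest_list_size: Optional[int] = None
    properties: Optional[dict] = None


@dataclass
class Tag(Snapshot):  # hypothetical stand-in
    tag_create_time: Optional[list] = None
    tag_time_retained: Optional[float] = None

    def trim_to_snapshot(self) -> Snapshot:
        # Copy every Snapshot field, so optional fields added to Snapshot
        # later are carried over without touching this method.
        return Snapshot(**{f.name: getattr(self, f.name) for f in fields(Snapshot)})
```

Because the field list comes from `fields(Snapshot)`, the tag-only attributes are dropped automatically and nothing on the snapshot side is lost.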
