[python] Support tag time_retained (TTL) on FileSystemCatalog #8319
TheR1sing3un wants to merge 6 commits into
Add an optional encoder/decoder per dataclass field in the JSON serializer (json_field_with_codec), applied only when present so existing dataclasses are unaffected. Add time_utils codecs that mirror Jackson's on-disk shapes for java.time types: LocalDateTime as a [y, mo, d, h, mi, s, ns] array, Duration as decimal seconds, plus duration_to_iso8601 and local_datetime_to_millis helpers.
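For readers who want to see the on-disk shapes concretely, here is a minimal sketch of the temporal codecs described above. It is not the PR's code: the function names other than `duration_to_iso8601` are hypothetical, a naive `datetime` stands in for `LocalDateTime`, a `timedelta` stands in for `Duration`, and the decoder's tolerance for shorter arrays is an assumption about what Java writers may emit.

```python
from datetime import datetime, timedelta
from typing import List


def encode_local_datetime(value: datetime) -> List[int]:
    """Encode a naive datetime as [y, mo, d, h, mi, s, ns]."""
    # Python only has microsecond precision, so ns is always a multiple of 1000.
    return [value.year, value.month, value.day,
            value.hour, value.minute, value.second,
            value.microsecond * 1000]


def decode_local_datetime(parts: List[int]) -> datetime:
    """Decode the array form; missing trailing elements are treated as 0
    (an assumption, in case a writer omits zero seconds/nanos)."""
    padded = list(parts) + [0] * (7 - len(parts))
    y, mo, d, h, mi, s, ns = padded[:7]
    return datetime(y, mo, d, h, mi, s, ns // 1000)


def encode_duration(value: timedelta) -> float:
    """Encode a duration as decimal seconds."""
    return value.total_seconds()


def duration_to_iso8601(value: timedelta) -> str:
    """Format a non-negative duration in the style of Java Duration.toString()."""
    total_micros = value // timedelta(microseconds=1)
    if total_micros < 0:
        raise ValueError("negative durations are not handled in this sketch")
    if total_micros == 0:
        return "PT0S"
    total_seconds, micros = divmod(total_micros, 1_000_000)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes:
        out += f"{minutes}M"
    if seconds or micros:
        frac = f"{micros:06d}".rstrip("0")
        out += f"{seconds}.{frac}S" if frac else f"{seconds}S"
    return out
```

Note that days are folded into hours (a one-day retention prints as `PT24H`), which is how `Duration.toString()` behaves on the Java side.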
Turn Tag into a dataclass extending Snapshot with optional tag_create_time and tag_time_retained, serialized in the same on-disk JSON shape as Java org.apache.paimon.tag.Tag so tag files round-trip across the Java and Python SDKs. Add the from_snapshot_and_tag_ttl factory mirroring Java's Tag.fromSnapshotAndTagTtl.
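To illustrate the "applied only when present" idea behind the per-field codec, here is a rough sketch using dataclass field metadata. The real `json_field_with_codec` signature is not shown in this conversation, so `codec_field`, `to_json_dict`, and the `Example` class below are hypothetical stand-ins, not the serializer's actual API.

```python
from dataclasses import dataclass, field, fields
from datetime import timedelta
from typing import Any, Callable, Dict, Optional


def codec_field(json_name: str,
                encoder: Callable[[Any], Any],
                decoder: Callable[[Any], Any],
                default: Any = None):
    """Hypothetical helper: attach a JSON name and codec to a dataclass field."""
    return field(default=default,
                 metadata={"json_name": json_name,
                           "encoder": encoder,
                           "decoder": decoder})


@dataclass
class Example:
    id: int
    tag_time_retained: Optional[timedelta] = codec_field(
        "tagTimeRetained",
        encoder=lambda d: d.total_seconds(),
        decoder=lambda s: timedelta(seconds=s),
    )


def to_json_dict(obj: Any) -> Dict[str, Any]:
    """Serialize a dataclass; fields without a codec pass through unchanged."""
    out: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is None:
            continue  # absent optional fields are not written
        encoder = f.metadata.get("encoder")
        name = f.metadata.get("json_name", f.name)
        out[name] = encoder(value) if encoder else value
    return out
```

Because the codec lives in per-field metadata, dataclasses that never declare one go through exactly the same code path as before.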
Thread time_retained through TagManager / FileStoreTable / FileSystemCatalog so create_tag and replace_tag persist a create-time and TTL. Mirror Java TagManager.createOrReplaceTag: with no retention the plain Snapshot JSON is written (no tag-specific fields) to stay readable by older readers; with a retention the richer Tag JSON is written. Drop the NotImplementedError that previously rejected time_retained and surface the values via FileSystemCatalog.get_tag and the $tags system table instead of None.
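The branching described above can be sketched roughly as follows. This is an illustration only: `tag_payload` is a hypothetical name, the snapshot is modelled as a plain dict with placeholder keys, and only the `tagCreateTime` / `tagTimeRetained` field names come from the PR text.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def tag_payload(snapshot_json: Dict[str, Any],
                time_retained: Optional[timedelta],
                now: datetime) -> Dict[str, Any]:
    """Build the JSON object written to the tag file."""
    payload = dict(snapshot_json)
    if time_retained is None:
        # No retention: plain Snapshot JSON, readable by older readers.
        return payload
    # Retention given: richer Tag JSON with create time and TTL.
    payload["tagCreateTime"] = [now.year, now.month, now.day,
                                now.hour, now.minute, now.second,
                                now.microsecond * 1000]
    payload["tagTimeRetained"] = time_retained.total_seconds()
    return payload


# Placeholder snapshot content, not the real Snapshot schema.
snapshot = {"id": 1, "commitKind": "APPEND"}
plain = tag_payload(snapshot, None, datetime(2024, 1, 1, 12, 0, 0))
with_ttl = tag_payload(snapshot, timedelta(days=1), datetime(2024, 1, 1, 12, 0, 0))
```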
Add unit tests for the temporal codecs and Tag serde (Java golden-value on-disk shape, round-trip, reading Java-written tags, and legacy plain-snapshot backward compatibility). Update the FileSystemCatalog and $tags end-to-end tests to cover create/replace with time_retained, Java-compatible on-disk JSON, and the no-TTL plain-snapshot path.
…rosecond
Address review: parse_duration returns rounded milliseconds, so wrapping it
with timedelta(milliseconds=...) silently turned sub-millisecond retentions
(e.g. "1ns", "500micro") into a zero-TTL tag.
Add parse_duration_nanos (full-precision integer nanoseconds; parse_duration
is left untouched since option parsing relies on its millisecond contract) and
use it on the tag path: retentions are kept at microsecond precision
("500micro" -> 0.0005s / PT0.0005S), while sub-microsecond values that Python's
timedelta cannot represent now raise instead of silently writing a zero-TTL tag.
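A minimal sketch of the precision rule described in this commit, assuming the parser has already produced integer nanoseconds. `nanos_to_timedelta` is a hypothetical name and the error message is illustrative; the PR's actual helper may differ.

```python
from datetime import timedelta


def nanos_to_timedelta(nanos: int) -> timedelta:
    """Convert integer nanoseconds to a timedelta, refusing to lose precision."""
    micros, remainder = divmod(nanos, 1000)
    if remainder != 0:
        # timedelta stops at microseconds; fail loudly instead of
        # silently rounding down to a shorter (possibly zero) TTL.
        raise ValueError(
            f"retention of {nanos}ns is not representable at microsecond precision")
    return timedelta(microseconds=micros)


# "500micro" is 500_000 ns and survives as 0.0005 s.
half_milli = nanos_to_timedelta(500_000)
```

Raising here trades convenience for safety: a caller who asks for "1ns" gets an error rather than a tag with a zero TTL.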
    tag_time_retained=None,
    tag_create_time=(
        None if tag.tag_create_time is None
        else local_datetime_to_millis(tag.tag_create_time)
GetTagResponse is supposed to mirror the Java REST response, but this conversion treats the tag LocalDateTime as UTC. The Java REST path converts tagCreateTime with atZone(ZoneId.systemDefault()).toInstant().toEpochMilli(), and TagManager creates the value with local LocalDateTime.now(). On non-UTC hosts, a filesystem-catalog tag created at local noon will be reported as noon UTC here, offset by the local timezone (for example +8h in Asia/Shanghai). Can we match the Java REST conversion here, and add a non-UTC timezone test to lock it down?
Address review: get_tag built GetTagResponse.tagCreateTime via the zone-less (UTC) helper, but the Java REST path (RESTFileSystemCatalog#getTag) converts it with tagCreateTime.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli(). On non-UTC hosts the reported millis were offset by the local zone. Add local_datetime_to_system_zone_millis (mirrors atZone(systemDefault)) and use it in get_tag. The $tags system table keeps local_datetime_to_millis (zone-less), matching Java Timestamp.fromLocalDateTime; the two paths intentionally differ on non-UTC hosts, exactly as the Java sides do. Add a TZ-controlled test locking down both conversions.
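The difference between the two conversions can be sketched as below. A naive `datetime` stands in for `LocalDateTime`, and the function names `zoneless_millis` and `system_zone_millis` are hypothetical; the bodies are one plausible way to get the behaviour described, not necessarily what the PR does.

```python
from datetime import datetime, timedelta, timezone

_EPOCH_NAIVE = datetime(1970, 1, 1)
_EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
_ONE_MILLI = timedelta(milliseconds=1)


def zoneless_millis(value: datetime) -> int:
    """Interpret the wall-clock fields as if they were UTC
    (the zone-less conversion used for the $tags path)."""
    return (value - _EPOCH_NAIVE) // _ONE_MILLI


def system_zone_millis(value: datetime) -> int:
    """Interpret the wall-clock fields in the host's local zone, in the
    spirit of atZone(ZoneId.systemDefault()).toInstant().toEpochMilli()."""
    # astimezone() on a naive datetime assumes it is in system local time.
    return (value.astimezone() - _EPOCH_UTC) // _ONE_MILLI
```

On a UTC host both functions agree; on a host eight hours east of UTC, local noon maps to a `system_zone_millis` value that is 8 hours smaller than the `zoneless_millis` value.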
    commit_identifier=snapshot.commit_identifier,
    commit_kind=snapshot.commit_kind,
    time_millis=snapshot.time_millis,
    base_manifest_list_size=snapshot.base_manifest_list_size,
This field is preserved when building the TTL Tag, but trim_to_snapshot() below still drops it again (along with delta_manifest_list_size, changelog_manifest_list_size, and properties). Since FileSystemCatalog.get_tag() returns tag.trim_to_snapshot(), the REST/catalog response loses those Snapshot fields for TTL tags, unlike Java Tag.trimToSnapshot(). Please copy all optional Snapshot fields in trim_to_snapshot() too.
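One way to avoid dropping fields by hand is to derive the copy from the parent dataclass itself. The sketch below uses cut-down, hypothetical `Snapshot` and `Tag` classes (only a few of the real fields, with guessed types) to show the idea; it is a suggestion, not the PR's implementation.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    # Reduced, illustrative field set; the real Snapshot has more.
    id: int
    base_manifest_list_size: Optional[int] = None
    delta_manifest_list_size: Optional[int] = None
    changelog_manifest_list_size: Optional[int] = None
    properties: Optional[Dict[str, str]] = None


@dataclass
class Tag(Snapshot):
    tag_create_time: Optional[List[int]] = None
    tag_time_retained: Optional[float] = None

    def trim_to_snapshot(self) -> Snapshot:
        """Copy every Snapshot field, so newly added ones are never missed."""
        return Snapshot(**{f.name: getattr(self, f.name)
                           for f in fields(Snapshot)})


tag = Tag(id=7, base_manifest_list_size=128, properties={"k": "v"},
          tag_time_retained=3600.0)
trimmed = tag.trim_to_snapshot()
```

Iterating over `fields(Snapshot)` keeps the trim in sync with the parent class, at the cost of being slightly less explicit than listing each field.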
Purpose
FileSystemCatalog.create_tag rejected time_retained with NotImplementedError, and the Python Tag only inherited Snapshot fields, so get_tag and the $tags system table could only return None for create-time / TTL.
This implements real tag time_retained support on the FileSystem path, persisting tagCreateTime / tagTimeRetained in the same on-disk JSON shape as Java (org.apache.paimon.tag.Tag) so tag files round-trip across the Java and Python SDKs.
Changes
Tag now carries tag_create_time (LocalDateTime as a [y, mo, d, h, mi, s, ns] array) and tag_time_retained (Duration as decimal seconds), via a new per-field JSON codec.
create_tag / replace_tag thread time_retained through TagManager / FileStoreTable / FileSystemCatalog. With no retention, the plain Snapshot JSON is written (backward compatible), mirroring Java TagManager.createOrReplaceTag.
get_tag and the $tags system table surface real create_time / time_retained (matching Java Timestamp.fromLocalDateTime / Duration.toString()).
Tests
Unit tests for the temporal codecs and Tag serde (Java golden-value shape, round-trip, reading Java-written tags, legacy plain-snapshot backward compatibility) plus FileSystemCatalog / $tags end-to-end coverage for create/replace with time_retained.
Does this PR introduce a user-facing change?
No.
Generative AI disclosure: drafted with AI assistance and reviewed by the author.