feat: Add S3 archive FileIO support #8143
I found one blocker in the S3 archive implementation: archive/unarchive currently change the storage class by issuing a single CopyObject request for the same key. S3 single-request copy only supports objects up to 5 GB, while Paimon data files can be larger than that, so this will fail for valid large data files. Please branch on the object size from HeadObjectResponse and use multipart copy (UploadPartCopy) for large objects, with a test covering that path.
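A minimal sketch of the requested size branch, assuming the object size comes from `HeadObjectResponse.contentLength()` and treating the 5 GB CopyObject limit as 5 GiB. The class name `ArchiveCopyPlanner` and the 512 MiB part size are hypothetical choices for illustration, not the PR's code. It only plans the inclusive `bytes=first-last` ranges that each UploadPartCopy would pass as its copy-source range; issuing the requests is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: chooses single CopyObject vs multipart UploadPartCopy and plans part ranges. */
final class ArchiveCopyPlanner {

    /** Assumed threshold: single CopyObject is limited to 5 GB (taken here as 5 GiB). */
    static final long MAX_SINGLE_COPY_BYTES = 5L * 1024 * 1024 * 1024;

    /** S3 multipart limits: parts (except the last) >= 5 MiB, each <= 5 GiB, at most 10,000 parts. */
    static final long MIN_PART_BYTES = 5L * 1024 * 1024;
    static final int MAX_PARTS = 10_000;

    static boolean needsMultipartCopy(long contentLength) {
        return contentLength > MAX_SINGLE_COPY_BYTES;
    }

    /** Returns inclusive ranges in the "bytes=first-last" form used for the copy-source range. */
    static List<String> partRanges(long contentLength, long partSize) {
        if (partSize < MIN_PART_BYTES || partSize > MAX_SINGLE_COPY_BYTES) {
            throw new IllegalArgumentException("part size outside S3 limits: " + partSize);
        }
        long parts = (contentLength + partSize - 1) / partSize;
        if (parts > MAX_PARTS) {
            throw new IllegalArgumentException("too many parts: " + parts);
        }
        List<String> ranges = new ArrayList<>();
        for (long first = 0; first < contentLength; first += partSize) {
            // The last part may be shorter than partSize; S3 allows that.
            long last = Math.min(first + partSize, contentLength) - 1;
            ranges.add("bytes=" + first + "-" + last);
        }
        return ranges;
    }

    public static void main(String[] args) {
        long size = 12L * 1024 * 1024 * 1024;   // e.g. HeadObjectResponse.contentLength() of a 12 GiB file
        long partSize = 512L * 1024 * 1024;     // hypothetical 512 MiB part size
        System.out.println(needsMultipartCopy(size));          // true
        System.out.println(partRanges(size, partSize).size()); // 24
    }
}
```

Objects at or below the threshold would keep the existing single CopyObject path; only larger ones would go through CreateMultipartUpload, one UploadPartCopy per range, then CompleteMultipartUpload.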
```java
static UploadPartCopyRequest uploadPartCopyRequest(
        String bucket, String key, String uploadId, CopyPartRange range, String eTag) {
    UploadPartCopyRequest.Builder builder =
            UploadPartCopyRequest.builder()
    // ... (diff excerpt truncated in the review view)
```
Large-object archive/unarchive bypasses S3A copy request preparation for each UploadPartCopy request. The single-copy path goes through RequestFactory.newCopyObjectRequestBuilder, which applies configured encryption settings, but this hand-built multipart-copy request never sets copySourceSSECustomerAlgorithm/key/MD5. For buckets configured with SSE-C, Paimon can write/read the object and small archives work, but any object above 5 GB will fail because S3 requires the source SSE-C headers on every UploadPartCopy. Please build these part-copy requests through the same S3A helper/factory path or propagate the configured copy-source encryption headers, and add coverage for that case.
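If reusing the S3A request factory is not practical, the alternative the comment mentions is propagating the copy-source SSE-C values by hand. A minimal sketch, assuming the configured SSE-C key is available as a base64-encoded 256-bit AES key; the class `CopySourceSseC` is hypothetical. It derives the three copy-source header values, where the MD5 value is the base64-encoded MD5 digest of the raw (decoded) key bytes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: copy-source SSE-C values every UploadPartCopy needs for an SSE-C source. */
final class CopySourceSseC {

    private CopySourceSseC() {}

    /** @param base64Key configured SSE-C key, base64-encoded (assumed to be a 256-bit AES key) */
    static Map<String, String> headers(String base64Key) {
        byte[] rawKey = Base64.getDecoder().decode(base64Key);
        if (rawKey.length != 32) {
            throw new IllegalArgumentException("SSE-C requires a 256-bit key");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-amz-copy-source-server-side-encryption-customer-algorithm", "AES256");
        headers.put("x-amz-copy-source-server-side-encryption-customer-key", base64Key);
        headers.put("x-amz-copy-source-server-side-encryption-customer-key-MD5", md5Base64(rawKey));
        return headers;
    }

    /** Base64 of the MD5 digest of the raw key bytes, as S3 expects for the key-MD5 header. */
    static String md5Base64(byte[] rawKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(rawKey);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With the AWS SDK v2, these values correspond to `copySourceSSECustomerAlgorithm`, `copySourceSSECustomerKey`, and `copySourceSSECustomerKeyMD5` on `UploadPartCopyRequest.Builder`. If the destination also stays SSE-C encrypted, the matching destination values (`sseCustomerAlgorithm`, `sseCustomerKey`, `sseCustomerKeyMD5`) would likely be needed as well. Routing through the S3A factory, as the comment suggests, keeps this in one place.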
Ref #5510 (comment)
Purpose
Implements S3-backed archive, restore, and unarchive operations for Paimon FileIO by mapping StorageType to S3 storage classes and issuing same-key S3 copy/restore requests.
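A minimal sketch of the StorageType-to-S3 mapping, for illustration only. The `StorageType` enum here is a hypothetical stand-in whose constants may not match Paimon's, and the chosen classes (`GLACIER` for Glacier Flexible Retrieval, `DEEP_ARCHIVE`) are assumptions about the PR's mapping. Archive would then be a same-key copy with the new storage class. Objects in these classes must be restored with RestoreObject before they can be read, and a permanent unarchive is typically a same-key copy back to `STANDARD` after the restore completes.

```java
/** Hypothetical mapping from a Paimon-like StorageType to S3 storage class names. */
final class S3StorageClassMapping {

    /** Stand-in for Paimon's StorageType; the real enum's constants may differ. */
    enum StorageType { STANDARD, ARCHIVE, COLD_ARCHIVE }

    static String toS3StorageClass(StorageType type) {
        switch (type) {
            case STANDARD:
                return "STANDARD";
            case ARCHIVE:
                return "GLACIER"; // S3 Glacier Flexible Retrieval
            case COLD_ARCHIVE:
                return "DEEP_ARCHIVE"; // S3 Glacier Deep Archive
            default:
                throw new UnsupportedOperationException("No S3 storage class for " + type);
        }
    }

    /** Objects in these classes are not readable until a RestoreObject request completes. */
    static boolean requiresRestore(String s3StorageClass) {
        return "GLACIER".equals(s3StorageClass) || "DEEP_ARCHIVE".equals(s3StorageClass);
    }
}
```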
Follow-up PRs will add archive support for OSS and other supported object stores; all other file systems will throw an unsupported-operation exception.
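One way the "unsupported" fallback could look is a default method that fails fast. This is a minimal sketch; `ArchivableFileIO` and `LocalFileIOStub` are hypothetical names, the `String` parameters stand in for Paimon's real path and StorageType types, and Paimon's actual FileIO API may be shaped differently.

```java
/** Hypothetical interface shape; Paimon's real FileIO API may name and type these differently. */
interface ArchivableFileIO {

    /** Default for file systems without archive support: fail fast instead of silently no-op. */
    default void archive(String path, String storageType) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support archive for " + path);
    }
}

/** Stand-in for a file system that does not override archive (e.g. a local file system). */
final class LocalFileIOStub implements ArchivableFileIO {}
```

An S3 (or, later, OSS) implementation would override `archive`, while every other implementation inherits the exception without extra code.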
Tests
```shell
mvn -pl paimon-filesystems/paimon-s3-impl -am -Pfast-build -DfailIfNoTests=false -Dtest=S3ArchiveOperationsTest test
```