Extend Cache Runtime: implements tiered level store for worker and client.#5972
Extend Cache Runtime: implements tiered level store for worker and client.#5972xliuqq wants to merge 7 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request refactors the tiered storage configuration in the CacheRuntime API, replacing the generic MediumSource with direct fields for ProcessMemory, EmptyDir, and HostPath, and implements the corresponding transformation logic in the cache engine. The review feedback identifies a critical path inconsistency for ProcessMemory between the engine transformation and ConfigMap generation, outdated examples in the newly added user guide documentation, copy-paste comment errors in the API types and utility functions, and suggests expanding unit tests to verify volume mounts for ProcessMemory.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| @@ -0,0 +1,280 @@ | |||
| # CacheRuntime TieredStore 配置示例 | |||
There was a problem hiding this comment.
This documentation provides examples using the old API structure for tieredStore, which has been removed in this PR. All examples should be updated to reflect the new API structure. For instance, medium, top-level path, and quota fields are no longer used. Instead, fields like processMemory, emptyDir, and hostPath should be used directly within a level.
| It("should add memory resources to container", func() { | ||
| tieredStore := &datav1alpha1.RuntimeTieredStore{ | ||
| Levels: []datav1alpha1.RuntimeTieredStoreLevel{ | ||
| { | ||
| ProcessMemory: &datav1alpha1.ProcessMemoryMediumSource{ | ||
| Quota: resource.MustParse("4Gi"), | ||
| }, | ||
| High: "0.9", | ||
| Low: "0.7", | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| // Set initial memory resources | ||
| podSpec.Containers[0].Resources.Requests[corev1.ResourceMemory] = resource.MustParse("2Gi") | ||
| podSpec.Containers[0].Resources.Limits[corev1.ResourceMemory] = resource.MustParse("4Gi") | ||
|
|
||
| err := engine.TransformRuntimeTieredStore(tieredStore, podSpec) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
| // Verify memory request is increased | ||
| memRequest := podSpec.Containers[0].Resources.Requests[corev1.ResourceMemory] | ||
| expectedRequest := resource.MustParse("6Gi") // 2Gi + 4Gi | ||
| Expect(memRequest.Cmp(expectedRequest)).To(Equal(0)) | ||
|
|
||
| // Verify memory limit is increased | ||
| memLimit := podSpec.Containers[0].Resources.Limits[corev1.ResourceMemory] | ||
| expectedLimit := resource.MustParse("8Gi") // 4Gi + 4Gi | ||
| Expect(memLimit.Cmp(expectedLimit)).To(Equal(0)) | ||
| }) |
There was a problem hiding this comment.
This test for ProcessMemory medium only verifies the memory resource changes. It should also verify that the correct volume and volume mount are created. Specifically, it should check that an emptyDir volume with StorageMediumMemory is created and mounted at /dev/shm to match the path configured in the runtime's configmap. This would have caught the path inconsistency bug between transform_tiered_store.go and cm.go.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds CacheRuntime tiered-store support by introducing a new tiered store transformer for worker/client pods, updating the CacheRuntime API/CRD schema, and wiring tiered store configuration into runtime ConfigMaps and e2e manifests.
Changes:
- Add
TransformRuntimeTieredStoreto translate tiered-store config into Pod volumes/mounts and (for memory media) container memory requests/limits. - Evolve CacheRuntime tiered-store API/CRD schema and propagate tier configs into generated runtime config (ConfigMap) + Helm CRDs.
- Update curvine e2e YAMLs and add Ginkgo/Gomega tests for tiered store transform behavior.
Reviewed changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/gha-e2e/curvine/minio.yaml | Moves/introduces the MinIO secret alongside MinIO service for e2e. |
| test/gha-e2e/curvine/dataset.yaml | Removes embedded secret now defined in minio.yaml. |
| test/gha-e2e/curvine/cacheruntimeclass.yaml | Generates worker data_dir entries from tiered store config levels/mountPaths/quotas. |
| test/gha-e2e/curvine/cacheruntime.yaml | Updates tiered store example config structure (emptyDir quota). |
| pkg/ddc/cache/engine/util.go | Adds tiered-store mount path helpers. |
| pkg/ddc/cache/engine/transform_worker.go | Wires tiered-store transform into worker pod transformation flow. |
| pkg/ddc/cache/engine/transform_client.go | Wires tiered-store transform into client pod transformation flow. |
| pkg/ddc/cache/engine/transform_tiered_store.go | Implements tiered-store -> podSpec transformation logic. |
| pkg/ddc/cache/engine/transform_tiered_store_test.go | Adds unit tests covering tiered-store transformation behaviors and errors. |
| pkg/ddc/cache/engine/cm.go | Extracts tiered-store levels into runtime component config for ConfigMap consumption. |
| pkg/common/cacheruntime.go | Adds TieredStoreLevels config struct used in runtime config JSON. |
| api/v1alpha1/cacheruntime_types.go | Reshapes tiered-store API types (processMemory/hostPath/emptyDir) and validations. |
| config/crd/bases/data.fluid.io_cacheruntimes.yaml | Updates CRD schema to match new tiered-store level structure. |
| charts/fluid/fluid/crds/data.fluid.io_cacheruntimes.yaml | Mirrors CRD schema update for Helm chart packaging. |
| docs/zh/userguide/cache_runtime_tieredstore.md | Adds tiered-store user guide examples and behavior notes (currently mismatched with new API). |
| pkg/ddc/cache/engine/transform_client_test.go | License header indentation normalization. |
Files not reviewed (2)
- api/v1alpha1/openapi_generated.go: Language not supported
- api/v1alpha1/zz_generated.deepcopy.go: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // add an memory emptyDir for /dev/shm in the container | ||
| volumeName := fmt.Sprintf("tiered-store-level-%d-memory", levelIndex) | ||
| mountPath := GetEmptyDirTieredStoreMountPath(levelIndex) | ||
| volume := corev1.Volume{ |
cheyang
left a comment
There was a problem hiding this comment.
Thanks for implementing this, @xliuqq. The restructured tiered store API is a nice improvement over the previous indirection through MediumSource / VolumeMediumSource. Having ProcessMemory, EmptyDir, and HostPath directly on RuntimeTieredStoreLevel makes the config much more ergonomic.
I found one correctness bug and a few smaller issues that should be addressed before merging:
Bug: Mount path mismatch between pod spec and ConfigMap for ProcessMemory
In pkg/ddc/cache/engine/transform_tiered_store.go, handleProcessMemory mounts the memory-backed emptyDir at:
mountPath := GetEmptyDirTieredStoreMountPath(levelIndex)
// returns: /etc/fluid/mount/tiered-store/level-0-index-0-emptydirBut in pkg/ddc/cache/engine/cm.go, extractTieredStoreLevels reports to the runtime config:
levelConfig.MountPaths = []string{GetMemoryTieredStoreMountPath(levelIndex)}
// returns: /dev/shmThe runtime (e.g., Curvine) will be told its processMemory tier data dir is /dev/shm, but the actual dedicated volume is mounted at /etc/fluid/mount/tiered-store/level-0-index-0-emptydir. The runtime will end up writing to the default container /dev/shm (which has no dedicated quota), while the dedicated volume goes unused.
These two should be consistent. Either handleProcessMemory should mount at /dev/shm, or extractTieredStoreLevels should use GetEmptyDirTieredStoreMountPath(levelIndex).
Comment copy-paste error on Low field
In api/v1alpha1/cacheruntime_types.go, the doc comment for Low includes stray text from the HostPathMediumSource.Type field:
// Low is the ratio of low watermark of the tier (e.g., "0.7").
// Eviction will continue until cache usage falls below this ratio.
// Type specifies the type of hostPath volume. <-- wrong
// Defaults to empty string (no validation). <-- wrong
// More info: https://kubernetes.io/docs/concepts/storage/volumes#hostpath <-- wrongThe last three lines should be removed. The same error propagates to openapi_generated.go.
Documentation uses the OLD API format
docs/zh/userguide/cache_runtime_tieredstore.md shows examples like:
levels:
- medium:
processMemory: {}
path:
- /dev/shm
quota:
- 8GiBut the new API uses:
levels:
- processMemory:
quota: 8GiThe doc should be updated to match the new schema, or it will confuse users immediately on merge.
Minor / non-blocking observations:
-
Ephemeral volume support removed: The old API supported
Ephemeral *corev1.EphemeralVolumeSource. If this is intentional (simplification), it might be worth a brief note in the PR description or a follow-up issue for future support. -
No webhook/CEL validation for mutual exclusivity: The "only one medium per level" constraint is validated at reconcile time in
TransformRuntimeTieredStore, which is fine operationally, but a validating webhook or CEL rule would give users earlier feedback. Not blocking, but worth considering. -
HostPathMediumSource.Paths/Quotaslength mismatch: Same as above — validated at reconcile time but could benefit from a webhook rule.
Overall: The architecture is sound, the test coverage is solid (both unit and e2e), and CI is all green. Please fix the mount path mismatch bug and the doc/comment issues, and this should be good to go.
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #5972 +/- ##
==========================================
+ Coverage 63.56% 63.67% +0.11%
==========================================
Files 479 480 +1
Lines 33275 33496 +221
==========================================
+ Hits 21150 21328 +178
- Misses 10445 10486 +41
- Partials 1680 1682 +2 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
cheyang
left a comment
There was a problem hiding this comment.
Re-review (sha 40cb1a0)
Previous feedback (mount path mismatch, doc API format, Low field comment) has been addressed -- nice cleanup. The API redesign flattening MediumSource + VolumeMediumSource into direct ProcessMemory/EmptyDir/HostPath fields on RuntimeTieredStoreLevel is a much better shape.
One correctness issue remains before this can merge:
[Bug] ProcessMemoryMediumSource deepcopy is stale
ProcessMemoryMediumSource now carries a Quota resource.Quantity field, but zz_generated.deepcopy.go was not regenerated. Compare:
// EmptyDirMediumSource -- correct
func (in *EmptyDirMediumSource) DeepCopyInto(out *EmptyDirMediumSource) {
*out = *in
out.Quota = in.Quota.DeepCopy() // <-- handles *inf.Dec pointer
}
// ProcessMemoryMediumSource -- stale, still the old empty-struct version
func (in *ProcessMemoryMediumSource) DeepCopyInto(out *ProcessMemoryMediumSource) {
*out = *in // <-- shallow copies resource.Quantity (shares *inf.Dec)
}The same inconsistency propagates to RuntimeTieredStoreLevel.DeepCopyInto, which uses **out = **in for ProcessMemory but (*in).DeepCopyInto(*out) for EmptyDir.
Fix: re-run controller-gen object paths="./api/..." (or the project's equivalent make target) so the generator emits the proper Quota.DeepCopy() call.
[Nit] Doc volume-name format doesn't match code
In docs/en/userguide/cache_runtime_tieredstore.md, the "Implementation Details" section states:
Volume name format:
tieredstore-level{N}(EmptyDir)
Volume name format:tieredstore-level{N}-path{M}(HostPath)
Actual names in code (transform_tiered_store.go):
- ProcessMemory:
tiered-store-level-{N}-memory - EmptyDir:
tiered-store-level-{N}-index-0 - HostPath:
tiered-store-level-{N}-index-{M}
Please align the doc or just remove the implementation-detail names since users shouldn't depend on them.
[Nit] Missing trailing newline
docs/en/userguide/cache_runtime_tieredstore.md ends without a final newline.
Everything else looks solid: the tiered-store transform logic, ConfigMap extraction, mount-path consistency between pod spec and config, validation of mutually-exclusive media per level, and test coverage are all in good order. Once the deepcopy is regenerated this should be ready.
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Re-review of PR #5972 (sha 19f8b3f)The API refactoring from However, the one blocking issue from the previous review remains unresolved: Blocking
Non-blocking
Once the deepcopy is regenerated, this is ready to merge. |
|
|
||
| // Process each tier level | ||
| for i, level := range tieredStore.Levels { | ||
| // order: memory, host path, empty. only one can be specified per level |
There was a problem hiding this comment.
Nit: the mediaCount validation ("only one storage medium can be specified per level") runs after the handlers have already modified the pod spec. If a user specifies both processMemory and emptyDir on the same level, volumes and resource adjustments from the first handler will have been applied before the error is returned. Consider validating the count before calling any handlers, or separate validation from mutation.
There was a problem hiding this comment.
Pod spec is a memory state, and it is newly created every time it is tuned. When an error is returned, the pod spec will not be actually modified.
This validation should be defined in validation webhook for CacheRuntime CRD.
Signed-off-by: xliuqq <xlzq1992@gmail.com>
|
failed link check was caused by other pr, not this pr. |
| SizeLimit: &emptyDirMediumSource.Quota, | ||
| }, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Nit: &emptyDirMediumSource.Quota shares a reference with the CacheRuntime API object's field. The handleProcessMemory handler correctly copies into a local variable first (totalQuota := memoryMediumSource.Quota). Consider making this consistent for defensive coding:
quota := emptyDirMediumSource.Quota.DeepCopy()
...
SizeLimit: "a,Not a functional issue today (the Quantity is never mutated between here and serialization), but inconsistent with the ProcessMemory handler's approach.
| func getSecretFilePath(secretName, secretKey string) string { | ||
| return fmt.Sprintf("%s/%s", getSecretMountPath(secretName), secretKey) | ||
| } | ||
|
|
There was a problem hiding this comment.
The levelIndex parameter is accepted but unused -- all processMemory tiers mount at /dev/shm. If a user accidentally configures two processMemory levels in the same component, both volumes try to mount at the same path, which Kubernetes will reject. Consider either validating at most one processMemory level per component, or incorporating the level index (e.g., /dev/shm/level-{N}).
Signed-off-by: xliuqq <xlzq1992@gmail.com>
|



Ⅰ. Describe what this PR does
As discussed in #5898, implements tiered level store for worker and client.
Ⅱ. Does this pull request fix one issue?
fixes #5898, and part of #5412
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
e2e test and unit-test
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews