Skip to content

Extend Cache Runtime: implements tiered level store for worker and client.#5972

Open
xliuqq wants to merge 7 commits into
fluid-cloudnative:masterfrom
xliuqq:tiered_store
Open

Extend Cache Runtime: implements tiered level store for worker and client.#5972
xliuqq wants to merge 7 commits into
fluid-cloudnative:masterfrom
xliuqq:tiered_store

Conversation

@xliuqq

@xliuqq xliuqq commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Ⅰ. Describe what this PR does

As discussed in #5898, implements tiered level store for worker and client.

Ⅱ. Does this pull request fix one issue?

fixes #5898, and part of #5412

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

e2e test and unit-test

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

@xliuqq xliuqq requested review from cheyang and Copilot June 10, 2026 15:37
@fluid-e2e-bot

fluid-e2e-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zwwhdls for approval by writing /assign @zwwhdls in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the tiered storage configuration in the CacheRuntime API, replacing the generic MediumSource with direct fields for ProcessMemory, EmptyDir, and HostPath, and implements the corresponding transformation logic in the cache engine. The review feedback identifies a critical path inconsistency for ProcessMemory between the engine transformation and ConfigMap generation, outdated examples in the newly added user guide documentation, copy-paste comment errors in the API types and utility functions, and suggests expanding unit tests to verify volume mounts for ProcessMemory.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/ddc/cache/engine/transform_tiered_store.go Outdated
@@ -0,0 +1,280 @@
# CacheRuntime TieredStore 配置示例

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This documentation provides examples using the old API structure for tieredStore, which has been removed in this PR. All examples should be updated to reflect the new API structure. For instance, medium, top-level path, and quota fields are no longer used. Instead, fields like processMemory, emptyDir, and hostPath should be used directly within a level.

Comment thread api/v1alpha1/cacheruntime_types.go Outdated
Comment thread pkg/common/cacheruntime.go Outdated
Comment on lines +87 to +116
It("should add memory resources to container", func() {
tieredStore := &datav1alpha1.RuntimeTieredStore{
Levels: []datav1alpha1.RuntimeTieredStoreLevel{
{
ProcessMemory: &datav1alpha1.ProcessMemoryMediumSource{
Quota: resource.MustParse("4Gi"),
},
High: "0.9",
Low: "0.7",
},
},
}

// Set initial memory resources
podSpec.Containers[0].Resources.Requests[corev1.ResourceMemory] = resource.MustParse("2Gi")
podSpec.Containers[0].Resources.Limits[corev1.ResourceMemory] = resource.MustParse("4Gi")

err := engine.TransformRuntimeTieredStore(tieredStore, podSpec)
Expect(err).NotTo(HaveOccurred())

// Verify memory request is increased
memRequest := podSpec.Containers[0].Resources.Requests[corev1.ResourceMemory]
expectedRequest := resource.MustParse("6Gi") // 2Gi + 4Gi
Expect(memRequest.Cmp(expectedRequest)).To(Equal(0))

// Verify memory limit is increased
memLimit := podSpec.Containers[0].Resources.Limits[corev1.ResourceMemory]
expectedLimit := resource.MustParse("8Gi") // 4Gi + 4Gi
Expect(memLimit.Cmp(expectedLimit)).To(Equal(0))
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test for ProcessMemory medium only verifies the memory resource changes. It should also verify that the correct volume and volume mount are created. Specifically, it should check that an emptyDir volume with StorageMediumMemory is created and mounted at /dev/shm to match the path configured in the runtime's configmap. This would have caught the path inconsistency bug between transform_tiered_store.go and cm.go.

Comment thread pkg/ddc/cache/engine/util.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds CacheRuntime tiered-store support by introducing a new tiered store transformer for worker/client pods, updating the CacheRuntime API/CRD schema, and wiring tiered store configuration into runtime ConfigMaps and e2e manifests.

Changes:

  • Add TransformRuntimeTieredStore to translate tiered-store config into Pod volumes/mounts and (for memory media) container memory requests/limits.
  • Evolve CacheRuntime tiered-store API/CRD schema and propagate tier configs into generated runtime config (ConfigMap) + Helm CRDs.
  • Update curvine e2e YAMLs and add Ginkgo/Gomega tests for tiered store transform behavior.

Reviewed changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
test/gha-e2e/curvine/minio.yaml Moves/introduces the MinIO secret alongside MinIO service for e2e.
test/gha-e2e/curvine/dataset.yaml Removes embedded secret now defined in minio.yaml.
test/gha-e2e/curvine/cacheruntimeclass.yaml Generates worker data_dir entries from tiered store config levels/mountPaths/quotas.
test/gha-e2e/curvine/cacheruntime.yaml Updates tiered store example config structure (emptyDir quota).
pkg/ddc/cache/engine/util.go Adds tiered-store mount path helpers.
pkg/ddc/cache/engine/transform_worker.go Wires tiered-store transform into worker pod transformation flow.
pkg/ddc/cache/engine/transform_client.go Wires tiered-store transform into client pod transformation flow.
pkg/ddc/cache/engine/transform_tiered_store.go Implements tiered-store -> podSpec transformation logic.
pkg/ddc/cache/engine/transform_tiered_store_test.go Adds unit tests covering tiered-store transformation behaviors and errors.
pkg/ddc/cache/engine/cm.go Extracts tiered-store levels into runtime component config for ConfigMap consumption.
pkg/common/cacheruntime.go Adds TieredStoreLevels config struct used in runtime config JSON.
api/v1alpha1/cacheruntime_types.go Reshapes tiered-store API types (processMemory/hostPath/emptyDir) and validations.
config/crd/bases/data.fluid.io_cacheruntimes.yaml Updates CRD schema to match new tiered-store level structure.
charts/fluid/fluid/crds/data.fluid.io_cacheruntimes.yaml Mirrors CRD schema update for Helm chart packaging.
docs/zh/userguide/cache_runtime_tieredstore.md Adds tiered-store user guide examples and behavior notes (currently mismatched with new API).
pkg/ddc/cache/engine/transform_client_test.go License header indentation normalization.
Files not reviewed (2)
  • api/v1alpha1/openapi_generated.go: Language not supported
  • api/v1alpha1/zz_generated.deepcopy.go: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +102 to +105
// add an memory emptyDir for /dev/shm in the container
volumeName := fmt.Sprintf("tiered-store-level-%d-memory", levelIndex)
mountPath := GetEmptyDirTieredStoreMountPath(levelIndex)
volume := corev1.Volume{
Comment thread api/v1alpha1/cacheruntime_types.go
Comment thread docs/zh/userguide/cache_runtime_tieredstore.md
Comment thread api/v1alpha1/cacheruntime_types.go
Comment thread test/gha-e2e/curvine/minio.yaml
Comment thread test/gha-e2e/curvine/cacheruntimeclass.yaml
Comment thread pkg/ddc/cache/engine/transform_tiered_store_test.go

@cheyang cheyang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this, @xliuqq. The restructured tiered store API is a nice improvement over the previous indirection through MediumSource / VolumeMediumSource. Having ProcessMemory, EmptyDir, and HostPath directly on RuntimeTieredStoreLevel makes the config much more ergonomic.

I found one correctness bug and a few smaller issues that should be addressed before merging:


Bug: Mount path mismatch between pod spec and ConfigMap for ProcessMemory

In pkg/ddc/cache/engine/transform_tiered_store.go, handleProcessMemory mounts the memory-backed emptyDir at:

mountPath := GetEmptyDirTieredStoreMountPath(levelIndex)
// returns: /etc/fluid/mount/tiered-store/level-0-index-0-emptydir

But in pkg/ddc/cache/engine/cm.go, extractTieredStoreLevels reports to the runtime config:

levelConfig.MountPaths = []string{GetMemoryTieredStoreMountPath(levelIndex)}
// returns: /dev/shm

The runtime (e.g., Curvine) will be told its processMemory tier data dir is /dev/shm, but the actual dedicated volume is mounted at /etc/fluid/mount/tiered-store/level-0-index-0-emptydir. The runtime will end up writing to the default container /dev/shm (which has no dedicated quota), while the dedicated volume goes unused.

These two should be consistent. Either handleProcessMemory should mount at /dev/shm, or extractTieredStoreLevels should use GetEmptyDirTieredStoreMountPath(levelIndex).


Comment copy-paste error on Low field

In api/v1alpha1/cacheruntime_types.go, the doc comment for Low includes stray text from the HostPathMediumSource.Type field:

// Low is the ratio of low watermark of the tier (e.g., "0.7").
// Eviction will continue until cache usage falls below this ratio.
// Type specifies the type of hostPath volume.           <-- wrong
// Defaults to empty string (no validation).             <-- wrong
// More info: https://kubernetes.io/docs/concepts/storage/volumes#hostpath  <-- wrong

The last three lines should be removed. The same error propagates to openapi_generated.go.


Documentation uses the OLD API format

docs/zh/userguide/cache_runtime_tieredstore.md shows examples like:

levels:
  - medium:
      processMemory: {}
    path:
      - /dev/shm
    quota:
      - 8Gi

But the new API uses:

levels:
  - processMemory:
      quota: 8Gi

The doc should be updated to match the new schema, or it will confuse users immediately on merge.


Minor / non-blocking observations:

  1. Ephemeral volume support removed: The old API supported Ephemeral *corev1.EphemeralVolumeSource. If this is intentional (simplification), it might be worth a brief note in the PR description or a follow-up issue for future support.

  2. No webhook/CEL validation for mutual exclusivity: The "only one medium per level" constraint is validated at reconcile time in TransformRuntimeTieredStore, which is fine operationally, but a validating webhook or CEL rule would give users earlier feedback. Not blocking, but worth considering.

  3. HostPathMediumSource.Paths/Quotas length mismatch: Same as above — validated at reconcile time but could benefit from a webhook rule.


Overall: The architecture is sound, the test coverage is solid (both unit and e2e), and CI is all green. Please fix the mount path mismatch bug and the doc/comment issues, and this should be good to go.

xliuqq added 2 commits June 11, 2026 20:09
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 80.71749% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.67%. Comparing base (35e26a8) to head (ba09a5d).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
pkg/ddc/cache/engine/cm.go 13.95% 37 Missing ⚠️
pkg/ddc/cache/engine/transform_client.go 40.00% 2 Missing and 1 partial ⚠️
pkg/ddc/cache/engine/transform_worker.go 40.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5972      +/-   ##
==========================================
+ Coverage   63.56%   63.67%   +0.11%     
==========================================
  Files         479      480       +1     
  Lines       33275    33496     +221     
==========================================
+ Hits        21150    21328     +178     
- Misses      10445    10486      +41     
- Partials     1680     1682       +2     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

xliuqq added 2 commits June 11, 2026 21:07
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>

@cheyang cheyang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (sha 40cb1a0)

Previous feedback (mount path mismatch, doc API format, Low field comment) has been addressed -- nice cleanup. The API redesign flattening MediumSource + VolumeMediumSource into direct ProcessMemory/EmptyDir/HostPath fields on RuntimeTieredStoreLevel is a much better shape.

One correctness issue remains before this can merge:


[Bug] ProcessMemoryMediumSource deepcopy is stale

ProcessMemoryMediumSource now carries a Quota resource.Quantity field, but zz_generated.deepcopy.go was not regenerated. Compare:

// EmptyDirMediumSource -- correct
func (in *EmptyDirMediumSource) DeepCopyInto(out *EmptyDirMediumSource) {
    *out = *in
    out.Quota = in.Quota.DeepCopy()   // <-- handles *inf.Dec pointer
}

// ProcessMemoryMediumSource -- stale, still the old empty-struct version
func (in *ProcessMemoryMediumSource) DeepCopyInto(out *ProcessMemoryMediumSource) {
    *out = *in   // <-- shallow copies resource.Quantity (shares *inf.Dec)
}

The same inconsistency propagates to RuntimeTieredStoreLevel.DeepCopyInto, which uses **out = **in for ProcessMemory but (*in).DeepCopyInto(*out) for EmptyDir.

Fix: re-run controller-gen object paths="./api/..." (or the project's equivalent make target) so the generator emits the proper Quota.DeepCopy() call.


[Nit] Doc volume-name format doesn't match code

In docs/en/userguide/cache_runtime_tieredstore.md, the "Implementation Details" section states:

Volume name format: tieredstore-level{N} (EmptyDir)
Volume name format: tieredstore-level{N}-path{M} (HostPath)

Actual names in code (transform_tiered_store.go):

  • ProcessMemory: tiered-store-level-{N}-memory
  • EmptyDir: tiered-store-level-{N}-index-0
  • HostPath: tiered-store-level-{N}-index-{M}

Please align the doc or just remove the implementation-detail names since users shouldn't depend on them.

[Nit] Missing trailing newline

docs/en/userguide/cache_runtime_tieredstore.md ends without a final newline.


Everything else looks solid: the tiered-store transform logic, ConfigMap extraction, mount-path consistency between pod spec and config, validation of mutually-exclusive media per level, and test coverage are all in good order. Once the deepcopy is regenerated this should be ready.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@cheyang

cheyang commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Re-review of PR #5972 (sha 19f8b3f)

The API refactoring from MediumSource/VolumeMediumSource to direct ProcessMemory/EmptyDir/HostPath fields on RuntimeTieredStoreLevel is a clear improvement. The tiered-store transform logic, ConfigMap extraction, mount-path consistency, and test coverage are well-structured.

However, the one blocking issue from the previous review remains unresolved:

Blocking

  • ProcessMemoryMediumSource deepcopy is still stale. The struct now carries a Quota resource.Quantity field, but zz_generated.deepcopy.go was not regenerated. DeepCopyInto does a bare *out = *in, sharing the internal *inf.Dec pointer between copies. This same issue was flagged in the prior review. Please re-run controller-gen object paths="./api/..." to regenerate.

Non-blocking

  • strings.ReplaceAll "i" "B" in the curvine e2e template is a fragile unit conversion (replaces all occurrences of the letter i). A suffix-aware mapping would be safer.
  • Both English and Chinese tieredstore doc files are missing a final trailing newline.
  • extractTieredStoreLevels in cm.go lacks dedicated unit tests.
  • Mutually-exclusive media validation in TransformRuntimeTieredStore runs after handlers have already modified the pod spec; consider validating before mutating.

Once the deepcopy is regenerated, this is ready to merge.

Comment thread api/v1alpha1/zz_generated.deepcopy.go Outdated
Comment thread test/gha-e2e/curvine/cacheruntimeclass.yaml
Comment thread docs/en/userguide/cache_runtime_tieredstore.md Outdated

// Process each tier level
for i, level := range tieredStore.Levels {
// order: memory, host path, empty. only one can be specified per level

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the mediaCount validation ("only one storage medium can be specified per level") runs after the handlers have already modified the pod spec. If a user specifies both processMemory and emptyDir on the same level, volumes and resource adjustments from the first handler will have been applied before the error is returned. Consider validating the count before calling any handlers, or separate validation from mutation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pod spec is a memory state, and it is newly created every time it is tuned. When an error is returned, the pod spec will not be actually modified.

This validation should be defined in validation webhook for CacheRuntime CRD.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@xliuqq xliuqq requested a review from cheyang June 14, 2026 09:34
@xliuqq

xliuqq commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator Author

failed link check was caused by other pr, not this pr.

SizeLimit: &emptyDirMediumSource.Quota,
},
},
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: &emptyDirMediumSource.Quota shares a reference with the CacheRuntime API object's field. The handleProcessMemory handler correctly copies into a local variable first (totalQuota := memoryMediumSource.Quota). Consider making this consistent for defensive coding:

quota := emptyDirMediumSource.Quota.DeepCopy()
...
SizeLimit: &quota,

Not a functional issue today (the Quantity is never mutated between here and serialization), but inconsistent with the ProcessMemory handler's approach.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

func getSecretFilePath(secretName, secretKey string) string {
return fmt.Sprintf("%s/%s", getSecretMountPath(secretName), secretKey)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The levelIndex parameter is accepted but unused -- all processMemory tiers mount at /dev/shm. If a user accidentally configures two processMemory levels in the same component, both volumes try to mount at the same path, which Kubernetes will reject. Consider either validating at most one processMemory level per component, or incorporating the level index (e.g., /dev/shm/level-{N}).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURES] optimize RuntimeTieredStore struct for CacheRuntime

3 participants