
Add sharding.read.* coalescing runtime config options#3987

Open: aldenks wants to merge 5 commits into zarr-developers:main from aldenks:sharding-coalesce-config-options

aldenks (Contributor) commented May 20, 2026

Follows up on #3004 by adding runtime configuration options for the thresholds that control how byte-range requests are coalesced when reading through the sharding codec.
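
For readers unfamiliar with request coalescing, here is a minimal sketch of the general technique, not the code from #3004. It assumes two thresholds, named `max_gap_bytes` and `max_coalesced_bytes` purely for illustration: neighbouring byte ranges are merged into one request when the gap between them is small enough and the merged request stays under a size cap. The function name and the exact merge rule are assumptions.

```python
def coalesce_ranges(ranges, max_gap_bytes, max_coalesced_bytes):
    """Merge (start, length) byte ranges into fewer, larger (start, end) requests.

    Hypothetical helper: two neighbouring ranges are merged when the gap
    between them is at most max_gap_bytes and the merged request spans
    at most max_coalesced_bytes.
    """
    groups = []
    for start, length in sorted(ranges):
        end = start + length
        if groups:
            group_start, group_end = groups[-1]
            gap = start - group_end
            merged_end = max(end, group_end)
            if gap <= max_gap_bytes and merged_end - group_start <= max_coalesced_bytes:
                # Close enough and small enough: extend the current request.
                groups[-1] = (group_start, merged_end)
                continue
        # Otherwise start a new request.
        groups.append((start, end))
    return groups


# Three chunk reads inside one shard, given as (start, length).
requests = [(100, 10), (0, 10), (12, 10)]
merged = coalesce_ranges(requests, max_gap_bytes=5, max_coalesced_bytes=1000)
unmerged = coalesce_ranges(requests, max_gap_bytes=5, max_coalesced_bytes=15)
```

A larger gap threshold trades wasted bytes for fewer round trips, which is why the best values depend on the store's latency and is presumably the motivation for making them configurable at runtime.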

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label May 20, 2026
@github-actions github-actions Bot removed the needs release notes Automatically applied to PRs which haven't added release notes label May 20, 2026
codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.35%. Comparing base (27abff2) to head (3012535).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3987      +/-   ##
==========================================
- Coverage   93.39%   93.35%   -0.05%     
==========================================
  Files          88       88              
  Lines       11839    11840       +1     
==========================================
- Hits        11057    11053       -4     
- Misses        782      787       +5     
Files with missing lines        Coverage Δ
src/zarr/codecs/sharding.py     91.96% <100.00%> (+0.02%) ⬆️
src/zarr/core/config.py         100.00% <ø> (ø)

... and 2 files with indirect coverage changes


d-v-b (Contributor) commented May 20, 2026

disclaimer: I'm not a big fan of our global config object, so I'd like to explore some alternative ways for the sharding reads to access this configuration.

A few options:

  • new attributes on the sharding codec
    I'm not a big fan of this, because declaring the sharding codec explicitly from create_array is tedious, and also because we want to move away from the codecs knowing too much about IO operations.
  • new fields on ArrayConfig
    probably the best option. we still use the global config for setting the defaults, but the sharding codec gets these parameters from the array config object, which is tied to the array, not a mutable global.

aldenks (Contributor, Author) commented May 20, 2026

@d-v-b I also like new fields on ArrayConfig. Thinking that through:

  • This allows you to set coalesce options differently per array
  • If you only use global configs, whatever global setting you have at the time of array open is what is used for the life of that array.
  • The global config fields become something like array.sharding_coalesce_max_gap_bytes and array.sharding_coalesce_max_bytes, matching the ArrayConfig convention of pulling from a singly nested field under array.
  • For the time being, if you're interacting with zarr via xarray then you can still only set these via the global config, but that's a pre-existing inability to specify the ArrayConfig when opening via xarray.

That sound alright?
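
The "snapshot at open time" behaviour described in the list above can be sketched as follows. This is an illustration only: `GLOBAL_CONFIG`, `ArrayConfigSketch`, and `from_global` are hypothetical stand-ins rather than zarr's classes, the key names are the ones floated in this comment, and the default values are placeholders.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the mutable global config (placeholder values).
GLOBAL_CONFIG = {
    "array.sharding_coalesce_max_gap_bytes": 1_000_000,
    "array.sharding_coalesce_max_bytes": 100_000_000,
}


@dataclass(frozen=True)
class ArrayConfigSketch:
    """Per-array settings, fixed once the array has been opened."""

    sharding_coalesce_max_gap_bytes: int
    sharding_coalesce_max_bytes: int

    @classmethod
    def from_global(cls, **overrides):
        # Read defaults from the global config now; explicit overrides win.
        defaults = {
            "sharding_coalesce_max_gap_bytes": GLOBAL_CONFIG[
                "array.sharding_coalesce_max_gap_bytes"
            ],
            "sharding_coalesce_max_bytes": GLOBAL_CONFIG[
                "array.sharding_coalesce_max_bytes"
            ],
        }
        return cls(**{**defaults, **overrides})


opened_early = ArrayConfigSketch.from_global()

# Changing the global afterwards does not affect the already-opened array.
GLOBAL_CONFIG["array.sharding_coalesce_max_gap_bytes"] = 0
opened_late = ArrayConfigSketch.from_global()

# A per-array override that leaves the global untouched.
per_array = ArrayConfigSketch.from_global(sharding_coalesce_max_bytes=8_000_000)
```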

d-v-b (Contributor) commented May 21, 2026

> @d-v-b I also like new fields on ArrayConfig. Thinking that through:
>
> * This allows you to set coalesce options differently per array
> * If you only use global configs, whatever global setting you have _at the time of array open_ is what is used for the life of that array.
> * The global config fields become something like `array.sharding_coalesce_max_gap_bytes` and `array.sharding_coalesce_max_bytes`, matching the ArrayConfig convention of pulling from a singly nested field under `array.`
> * For the time being, if you're interacting with zarr via xarray then you can still only set these via the global config, but that's a pre-existing inability to specify the ArrayConfig when opening via xarray.
>
> That sound alright?

yeah, that sounds right. the array config object is designed to make it easy to get a cheap copy of an array with a new config, using the with_config method. Unfortunately, xarray makes it very hard to use this, because xarray doesn't give access to the base zarr array. So until xarray adds a zarr array-config-aware API, the global config is the only knob xarray users have, without re-creating the dataarray entirely.
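
The "cheap copy with a new config" idea can be sketched as below. This is a rough model of the pattern, not zarr's actual classes or the real `with_config` signature: `ConfigSketch`, `ArraySketch`, the keyword-argument form of `with_config`, and the default values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ConfigSketch:
    # Placeholder defaults; the field names follow those proposed above.
    sharding_coalesce_max_gap_bytes: int = 1_000_000
    sharding_coalesce_max_bytes: int = 100_000_000


@dataclass(frozen=True)
class ArraySketch:
    """Hypothetical array handle: a reference to a store plus a config."""

    store: dict
    config: ConfigSketch = field(default_factory=ConfigSketch)

    def with_config(self, **changes):
        # Build a new handle with an updated config. The store is shared,
        # not copied, which is what keeps the copy cheap.
        return replace(self, config=replace(self.config, **changes))


store = {"c/0": b"shard bytes"}
arr = ArraySketch(store)
tuned = arr.with_config(sharding_coalesce_max_gap_bytes=0)
```

Because the original handle is left unchanged, two handles over the same data can use different coalescing settings side by side, which a single mutable global cannot offer.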
