install: reflink-capable filesystems with ostree pull through composefs first #2205
cgwalters wants to merge 5 commits
Code Review
This pull request introduces a "composefs-first" import pipeline that allows synthesizing ostree commits directly from a composefs OCI repository using reflinks (FICLONE) to share disk blocks. This approach avoids tar round-trips and improves space efficiency on reflink-capable filesystems like XFS and btrfs. Key changes include a new pull_auto dispatcher, updated installation and upgrade logic to utilize the new pipeline, new fsck consistency checks for composefs-ostree alignment, and comprehensive integration tests. Review feedback identified a missing argument in a GIO stream constructor, a type mismatch in GVariant array creation, and potential fragility in internal CLI argument parsing for the cfsctl proxy.
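The reflink step mentioned in the summary can be illustrated with a minimal sketch. This is not the code from this PR: `reflink_or_copy` and `CloneOutcome` are hypothetical names, the `ioctl` declaration is written by hand to avoid external crates (glibc-style signature assumed), and the numeric `FICLONE` value is the one used on x86_64 and aarch64 Linux. It attempts a block-sharing clone and falls back to a byte copy when the filesystem refuses.

```rust
use std::fs::{self, File};
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;
use std::path::Path;

// FICLONE is _IOW(0x94, 9, int). Assumption: this encoding matches the
// target architecture (true for x86_64 and aarch64 Linux).
const FICLONE: c_ulong = 0x4004_9409;

unsafe extern "C" {
    // Hand-written declaration so the sketch needs no external crates.
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// How the destination file ended up being populated.
#[derive(Debug, PartialEq, Eq)]
pub enum CloneOutcome {
    Reflinked,
    Copied,
}

/// Try to share disk blocks between `src` and `dst`; copy bytes if that fails.
pub fn reflink_or_copy(src: &Path, dst: &Path) -> io::Result<CloneOutcome> {
    let src_file = File::open(src)?;
    let dst_file = File::create(dst)?;
    // SAFETY: both descriptors stay open for the duration of the call.
    let rc = unsafe { ioctl(dst_file.as_raw_fd(), FICLONE, src_file.as_raw_fd()) };
    if rc == 0 {
        return Ok(CloneOutcome::Reflinked);
    }
    // The clone was refused (for example, no reflink support on this
    // filesystem), so fall back to an ordinary copy.
    drop(dst_file);
    fs::copy(src, dst)?;
    Ok(CloneOutcome::Copied)
}
```

On XFS or btrfs the call would normally return `Reflinked`; on a filesystem without reflink support such as ext4 it is expected to return `Copied`.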
Ostree sets the immutable ext4 attribute on each deployment checkout directory, which causes lsetfilecon() to return EPERM during the final SELinux relabeling pass even though those files are already correctly labeled by the earlier composefs import pass.

Rather than skipping the entire ostree/deploy subtree (which would leave stateroot metadata and var directories unlabeled), enumerate the actual checkout directories under ostree/deploy/<stateroot>/deploy/ and skip only those immutable roots by dev/ino.

Also generalise ensure_dir_labeled_recurse to accept a slice of (dev, ino) pairs to skip rather than a single Option, so multiple checkout directories can be excluded in one pass.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
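The skip-by-identity idea in this commit can be sketched as follows. This is an illustration only, not the real `ensure_dir_labeled_recurse`: `walk_skipping` and `dev_ino` are hypothetical helpers, and the `visit` callback stands in for the actual relabeling work.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Identity of a path as a (dev, ino) pair, without following symlinks.
pub fn dev_ino(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::symlink_metadata(path)?;
    Ok((meta.dev(), meta.ino()))
}

/// Recursively visit `root`, pruning any directory whose (dev, ino) pair
/// appears in `skip`. `visit` stands in for the per-path work.
pub fn walk_skipping(
    root: &Path,
    skip: &[(u64, u64)],
    visit: &mut dyn FnMut(&Path),
) -> io::Result<()> {
    let meta = fs::symlink_metadata(root)?;
    if meta.is_dir() && skip.contains(&(meta.dev(), meta.ino())) {
        // Prune the whole subtree, including the directory itself.
        return Ok(());
    }
    visit(root);
    if meta.is_dir() {
        for entry in fs::read_dir(root)? {
            walk_skipping(&entry?.path(), skip, visit)?;
        }
    }
    Ok(())
}
```

Matching on (dev, ino) rather than on path strings means a checkout root is skipped no matter which path the walk used to reach it, and a slice lets several roots be excluded in a single pass.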
BOOTC_filesystem was silently ignored when BOOTC_variant=ostree because install_args() only emitted --filesystem inside the composefs_backend block. bcvk's --filesystem flag is not composefs-specific (the cache hash includes filesystem type and bcvk creates a fresh base disk per filesystem), so the guard was wrong.

Move --filesystem before the composefs_backend block so that e.g. `BOOTC_filesystem=ext4 just test-tmt-nobuild readonly` actually installs on ext4, exercising the reflink-probe fallback path.

Assisted-by: OpenCode (claude-sonnet-4-6@default)
Signed-off-by: Colin Walters <walters@verbum.org>
The repo disallows `str::len` (clippy::disallowed-methods) to avoid confusing byte length with character count. Rework the assertion to use `strip_prefix`, which both checks the prefix and gives us the remainder to assert is non-empty.

Assisted-by: opencode (Claude Opus 4.8)
Signed-off-by: Colin Walters <walters@verbum.org>
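A minimal sketch of the `strip_prefix` pattern described here. The helper name `non_empty_remainder` and the sample strings in the usage note are hypothetical, not taken from the actual assertion.

```rust
/// Return the remainder of `s` after `prefix`, but only if that
/// remainder is non-empty. No length arithmetic is needed.
pub fn non_empty_remainder<'a>(s: &'a str, prefix: &str) -> Option<&'a str> {
    s.strip_prefix(prefix).filter(|rest| !rest.is_empty())
}
```

For example, `non_empty_remainder("sha256:abc", "sha256:")` yields `Some("abc")`, while a string that is only the prefix, or that lacks it, yields `None`.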
bootc already routes INFO tracing events to the journal (when run as
root), and a number of lifecycle events carry a `message_id` field with
a hardcoded 128-bit hex constant, intended to populate systemd's
well-known MESSAGE_ID so operators can query e.g.
`journalctl MESSAGE_ID=<id>` and hook the message catalog.
Two defects prevented this from working:
- tracing-journald prefixes all custom fields with `F_`, so the field
  landed as `F_MESSAGE_ID` rather than the native MESSAGE_ID, and
  `bootc.image.reference` as `F_BOOTC_IMAGE_REFERENCE`. Dropping the
  prefix (`with_field_prefix(None)`) lets the layer's name sanitizer map
  `message_id` -> MESSAGE_ID and `bootc.image.reference` ->
  BOOTC_IMAGE_REFERENCE, while MESSAGE is left untouched.
- many of the id constants were 33 hex chars (132-bit), which systemd
  rejects as a malformed MESSAGE_ID. Regenerate them as valid, unique
  128-bit values (this also removes two accidental duplicates).
Prep for the unified-storage upgrade fix, which relies on these events
being usefully queryable in the journal.
Assisted-by: opencode (Claude Opus 4.8)
Signed-off-by: Colin Walters <walters@verbum.org>
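The two defects described in this commit can be illustrated with a small sketch. Both functions are hypothetical: `is_valid_message_id` assumes the plain 32-hex-digit form of a 128-bit ID, and `journal_field_name` is a simplified approximation of field-name sanitizing, not tracing-journald's actual implementation.

```rust
/// True if `id` is exactly 32 ASCII hex digits, i.e. a 128-bit value.
/// A 33-digit constant (132 bits) fails this check.
pub fn is_valid_message_id(id: &str) -> bool {
    let bytes = id.as_bytes();
    bytes.iter().count() == 32 && bytes.iter().all(|b| b.is_ascii_hexdigit())
}

/// Simplified approximation of journal field-name sanitizing:
/// uppercase ASCII alphanumerics and map everything else to '_'.
pub fn journal_field_name(name: &str) -> String {
    name.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_uppercase()
            } else {
                '_'
            }
        })
        .collect()
}
```

With no field prefix, `message_id` maps to `MESSAGE_ID` and `bootc.image.reference` maps to `BOOTC_IMAGE_REFERENCE` under this approximation, matching the names the commit message expects.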
The goal of unified storage is to make the same on-disk layer data simultaneously visible to containers-storage (for `podman run`), composefs (for the boot overlay), and ostree (for deployment tracking), using reflinks so layers are stored once regardless of how many stores reference them.

This wires the full pipeline into the ostree backend. When a system has unified storage enabled, either at install time via the [install.storage] config key or post-install via `bootc image set-unified`, upgrades and switches route through pull_via_composefs:

  Stage 1: pull image into bootc-owned containers-storage
  Stage 2: zero-copy reflink import into the composefs OCI repo
  Stage 3: synthesize an ostree commit from the composefs tree

Whether unified storage is active is tracked by composefs/bootc.json (BootcRepoMeta). This replaces the previous heuristic that checked per-image presence in containers-storage, which broke when switching to a new image reference.

The install config gains a storage.unified key with three values: disabled (default), enabled (fail if reflinks unavailable), and enabled-with-copy (copy fallback). This lets an image opt into unified storage without requiring a CLI flag to be threaded through every installer.

bootc image list cross-references composefs tags against containers-storage by config digest to report images as unified (in all three stores) or partial (cstorage only, composefs import pending). bootc internals fsck images checks consistency and --repair restores cstorage from composefs when needed.

The cstorage GC is extended to protect images that have composefs tags, since the composefs splitstreams reference the cstorage layer data; pruning one without the other would corrupt the repo.

Assisted-by: OpenCode (claude-sonnet-4-6@default)
Signed-off-by: Colin Walters <walters@verbum.org>
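The three storage.unified values and their behaviour when reflinks are missing can be sketched as a small decision function. The type and function names (`UnifiedStorage`, `ImportPlan`, `plan_import`) are hypothetical and only model the semantics stated in the commit message; the real config parsing in bootc may look quite different.

```rust
use std::str::FromStr;

/// Hypothetical model of the storage.unified config values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum UnifiedStorage {
    #[default]
    Disabled,
    Enabled,
    EnabledWithCopy,
}

impl FromStr for UnifiedStorage {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "disabled" => Ok(Self::Disabled),
            "enabled" => Ok(Self::Enabled),
            "enabled-with-copy" => Ok(Self::EnabledWithCopy),
            other => Err(format!("unknown storage.unified value: {other}")),
        }
    }
}

/// Hypothetical outcome of combining the mode with a reflink probe.
#[derive(Debug, PartialEq, Eq)]
pub enum ImportPlan {
    /// Unified storage is off; use the non-unified pull path.
    NotUnified,
    /// Share blocks between stores via reflinks.
    Reflink,
    /// Unified storage with a plain data copy instead of reflinks.
    Copy,
}

pub fn plan_import(mode: UnifiedStorage, reflinks_available: bool) -> Result<ImportPlan, String> {
    match (mode, reflinks_available) {
        (UnifiedStorage::Disabled, _) => Ok(ImportPlan::NotUnified),
        (_, true) => Ok(ImportPlan::Reflink),
        (UnifiedStorage::EnabledWithCopy, false) => Ok(ImportPlan::Copy),
        (UnifiedStorage::Enabled, false) => {
            Err("storage.unified=enabled requires a reflink-capable filesystem".to_string())
        }
    }
}
```

Keeping `enabled` strict and making the copy fallback an explicit opt-in means an image author who expects block sharing gets an error instead of a silent doubling of disk usage.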
A spike of further work on #20. Since we control both the composefs-rs and ostree sides, when reflinks are available we can always pull into composefs first, making it the source of truth for image storage, and then just reflink from there into ostree.