ACL COSI: handle shared BTRFS UUIDs and ESP space management #673

Draft
bfjelds wants to merge 22 commits into main from user/bfjelds/mjolnir/acl-cosi-combined

@bfjelds
Member

@bfjelds bfjelds commented Jun 4, 2026

Summary

Enables trident to successfully perform A/B updates on ACL when the COSI image shares BTRFS filesystem UUIDs with the currently active OS. Also adds pre-staging ESP cleanup to prevent "no space left on device" failures during repeated updates.

Related PRs

Combined Validation

https://dev.azure.com/mariner-org/ACL/_build/results?buildId=1132645

Changes

ACL BTRFS UUID handling

  • Skip duplicate FS UUID check for ACL — ACL uses PARTUUID-based boot (not GRUB search --fs-uuid), so shared UUIDs are safe
  • Validate ACL duplicate UUIDs via verity root hash — ensures the duplicate is genuinely the same image, not a different OS masquerading with the same UUID
  • Runtime BTRFS UUID bind-mount fallback — when kernel BTRFS rejects a mount due to duplicate UUIDs, bind-mount from the already-mounted filesystem instead
  • Verify verity root hash before bind-mount — safety check to confirm the active and target partitions have matching verity hashes

UKI enhancements

  • Activate correct verity addon for target A/B slot — selects the right verity.addon.efi based on which slot is being updated
  • Include UKI addons in findUkiEntries COSI metadata — ensures addon files are discovered during COSI parsing
  • Search UKI addon cmdlines for usrhash= parameter — extracts verity root hash from addon kernel cmdline

ESP space management

  • Pre-staging UKI cleanup — removes old target-slot UKIs before staging new ones, preventing ESP overflow on 128 MB partitions
  • Multi-OS scoping — cleanup matches exact slot+os-index (e.g. azla0) not just slot letter, safe for multiboot
  • Original UKI cleanup guarded by install_index == 0 — only the OS that placed the original UKI can remove it

Internal testing support

  • forceAbUpdate internal param — bypasses SHA384 identity check, allowing the same COSI to be applied repeatedly for A↔B cycle testing

Testing

  • Build verified clean on Linux (cargo check -p trident, cargo fmt)
  • acl-pipelines branch user/bfjelds/single-acl-build updated with 5-cycle A↔B test and ESP diagnostics
  • Pipeline validation in progress

bfjelds and others added 17 commits June 1, 2026 14:17
ACL images ship with PARTUUID-based verity addons — templates for both
A and B slots stored in acl/uki-addons/ on the ESP, with slot A active
by default. During an A/B update, trident must swap the active addon
to match the target slot so the new UKI boots with the correct verity
partition identity.

Add activate_verity_addon_for_target_volume() which:
- Checks for ACL verity addon templates on the image ESP
- Copies the correct slot template into the staged addon directory
- Is a silent no-op for non-ACL images (no template dir)
- Errors if template dir exists but the selected slot is missing

Called from copy_file_artifacts() after stage_uki_on_esp(), gated on
ctx.image_distro().is_acl() to ensure only ACL images are affected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
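
The addon swap described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `activate_verity_addon`, the `Slot` enum, and the template file names `verity-a.addon.efi` / `verity-b.addon.efi` are assumptions made for the example. Only the three behaviors (no-op without a template directory, error on a missing slot template, copy otherwise) come from the commit message.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Target A/B slot for the update (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Slot {
    A,
    B,
}

/// Hypothetical template file names; the real naming is not stated in the commit.
fn template_name(slot: Slot) -> &'static str {
    match slot {
        Slot::A => "verity-a.addon.efi",
        Slot::B => "verity-b.addon.efi",
    }
}

/// Copies the target slot's verity addon template into `staged_addon_dir`.
/// Returns Ok(None) when `template_dir` does not exist (treated as a non-ACL image),
/// and an error when the directory exists but the slot's template is missing.
pub fn activate_verity_addon(
    template_dir: &Path,
    staged_addon_dir: &Path,
    slot: Slot,
) -> io::Result<Option<PathBuf>> {
    if !template_dir.is_dir() {
        return Ok(None);
    }
    let template = template_dir.join(template_name(slot));
    if !template.is_file() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("no verity addon template for slot {:?}: {}", slot, template.display()),
        ));
    }
    fs::create_dir_all(staged_addon_dir)?;
    let dest = staged_addon_dir.join("verity.addon.efi");
    fs::copy(&template, &dest)?;
    Ok(Some(dest))
}
```

Returning `Option<PathBuf>` keeps the "not an ACL image" case distinguishable from a successful copy without turning it into an error.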
ACL uses identical FS UUIDs across A/B slots by design — partitions
are distinguished by PARTUUID instead. The within-image uniqueness
check is unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Scan each UKI's .extra.d/ directory for *.addon.efi files and extract
their .cmdline PE sections. Addons are stored as a new field on the
boot entry so the COSI metadata captures the full effective cmdline
(main UKI + addons).

Both Go (mkcosi) and Rust (metadata deserialization) updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
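
The directory scan in the commit above might look something like this in Rust. It is a sketch only: `find_uki_addons` is a hypothetical name, and it covers just the discovery of `*.addon.efi` files in the UKI's `.extra.d/` directory. Extracting the `.cmdline` PE section is left out because it needs a PE parser.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the sorted `*.addon.efi` files found in `<uki path>.extra.d/`.
/// An absent addon directory yields an empty list rather than an error.
pub fn find_uki_addons(uki_path: &Path) -> io::Result<Vec<PathBuf>> {
    // Addons live next to the UKI in a directory named "<uki file name>.extra.d".
    let mut dir_name = uki_path.as_os_str().to_os_string();
    dir_name.push(".extra.d");
    let addon_dir = PathBuf::from(dir_name);
    if !addon_dir.is_dir() {
        return Ok(Vec::new());
    }

    let mut addons = Vec::new();
    for entry in fs::read_dir(&addon_dir)? {
        let path = entry?.path();
        let is_addon = path
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| name.ends_with(".addon.efi"));
        if is_addon && path.is_file() {
            addons.push(path);
        }
    }
    // Sort so the resulting metadata does not depend on directory iteration order.
    addons.sort();
    Ok(addons)
}
```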
With PARTUUID-based verity addons, usrhash= moved from the main UKI
cmdline to the verity addon cmdline. Update extractUsrhashFromUKIEntries
to also search addon cmdlines when looking for the root hash.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
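
As an illustration of the lookup order described above (main UKI cmdline first, then addon cmdlines), here is a small Rust sketch. The change itself is in Go (`extractUsrhashFromUKIEntries`); the function names below are hypothetical and the parsing is a plain whitespace split, which is an assumption about the cmdline format.

```rust
/// Returns the value of the first non-empty `usrhash=` token in a kernel command line.
pub fn usrhash_from_cmdline(cmdline: &str) -> Option<&str> {
    cmdline.split_whitespace().find_map(|token| {
        token
            .strip_prefix("usrhash=")
            .filter(|value| !value.is_empty())
    })
}

/// Searches the main UKI cmdline first, then each addon cmdline in order.
pub fn usrhash_from_entry<'a>(
    main_cmdline: &'a str,
    addon_cmdlines: &[&'a str],
) -> Option<&'a str> {
    std::iter::once(main_cmdline)
        .chain(addon_cmdlines.iter().copied())
        .find_map(|cmdline| usrhash_from_cmdline(cmdline))
}
```

Checking the main cmdline first keeps the result unchanged for images that still carry `usrhash=` there.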
When staging an A/B update on ACL (Azure Container Linux) UKI images,
the COSI image may share BTRFS filesystem UUIDs with the active OS.
BTRFS maintains a kernel-global UUID registry and refuses to mount a
filesystem whose UUID is already registered by another mounted device,
causing the staging verity device mount to fail.

This change detects the UUID collision by checking the well-known ACL
USR-A/USR-B partition UUIDs (by PARTUUID) before the mount loop. When
a collision is detected, it bind-mounts the active /usr into the
newroot instead of attempting to mount the staging verity device. This
is safe because:

- USR is verity-protected and read-only
- Matching UUIDs means identical filesystem content
- The chroot only reads from /usr during provisioning
- After reboot, initramfs sets up the correct verity device normally

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the bind-mount workaround activates for ACL BTRFS UUID collisions,
compare the staging USR verity root hash (from COSI metadata) against
the active USR root hash (from /proc/cmdline usrhash= parameter) to
cryptographically prove the filesystems are byte-identical.

If the staging hash is available but the active hash cannot be read or
does not match, the bind-mount is refused and the normal mount path
proceeds (which will fail with the BTRFS UUID error, as expected for
genuinely different content).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
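
The decision between the bind-mount fallback and the normal mount path can be sketched as below. This is an illustrative model, not the PR's code: `UsrMountPlan` and `plan_usr_mount` are hypothetical names, the active hash is parsed from a cmdline string passed in by the caller, and a missing staging hash is treated as a refusal (the stricter behavior that DR-001 in a later commit describes).

```rust
/// How the staging /usr should be made available in the newroot (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum UsrMountPlan {
    /// Bind-mount the already-mounted active /usr.
    BindMountActiveUsr,
    /// Mount the staging verity device (expected to fail on a real UUID collision).
    MountStagingVerityDevice,
}

/// Chooses the mount plan. The bind-mount is only allowed when a UUID collision
/// was detected AND both root hashes are present, non-empty, and equal.
pub fn plan_usr_mount(
    uuid_collision: bool,
    staging_hash: Option<&str>,
    active_cmdline: &str,
) -> UsrMountPlan {
    if !uuid_collision {
        return UsrMountPlan::MountStagingVerityDevice;
    }
    // Active root hash as it would appear in /proc/cmdline.
    let active_hash = active_cmdline
        .split_whitespace()
        .find_map(|token| token.strip_prefix("usrhash="));

    match (staging_hash, active_hash) {
        (Some(staging), Some(active))
            if !staging.trim().is_empty()
                && staging.trim().eq_ignore_ascii_case(active.trim()) =>
        {
            UsrMountPlan::BindMountActiveUsr
        }
        // Missing, unreadable, or mismatched hashes: refuse the bind-mount.
        _ => UsrMountPlan::MountStagingVerityDevice,
    }
}
```

Passing the cmdline in as a string, rather than reading `/proc/cmdline` inside the function, keeps the decision testable without filesystem access.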
When internalParams.forceAbUpdate is true, trident will proceed with
an A/B update even when the old and new OS image SHA384 hashes match.
This is useful for testing A/B update flows repeatedly with the same
COSI file.

Usage in trident-config.yaml:
  internalParams:
    forceAbUpdate: true

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
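
The effect of the flag on the update decision can be summarized in a few lines. This is a simplified sketch under the assumption that the identity check is a plain comparison of two SHA384 hex strings; `should_run_ab_update` is a hypothetical name.

```rust
/// Decides whether an A/B update should proceed.
/// Without the flag, an update to an image with the same SHA384 is skipped.
pub fn should_run_ab_update(current_sha384: &str, new_sha384: &str, force_ab_update: bool) -> bool {
    let same_image = current_sha384.eq_ignore_ascii_case(new_sha384);
    force_ab_update || !same_image
}
```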
Replace the blanket ACL skip in validate_filesystem_uniqueness() with
proper validation. When a duplicate FS UUID is found during A/B update
on ACL, the update is only allowed if:

1. The duplicate is on the /usr mount point
2. The staging COSI has a verity root hash
3. The active system has a usrhash= in /proc/cmdline
4. The normalized hashes match (merkle tree proof of identical content)

If COSI partition metadata is available, also validates that the staging
USR partition has a known ACL PARTUUID.

Extracts ACL constants and read_active_usr_roothash() into a shared
engine::acl module used by both osimage.rs and newroot.rs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
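
The four conditions listed above can be expressed as a pure function. The sketch below is an approximation for illustration: the error enum and the exact signature are assumptions, "normalized" is taken to mean trimmed and compared case-insensitively, and the optional PARTUUID check is omitted.

```rust
use std::path::Path;

/// Reasons a duplicate FS UUID is rejected (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum DuplicateUuidError {
    WrongMountPoint,
    NoStagingHash,
    NoActiveHash,
    HashMismatch,
}

/// Allows a duplicate FS UUID only for /usr and only when the staging and
/// active verity root hashes are both present and equal after normalization.
pub fn validate_acl_duplicate_uuid(
    mount_point: &Path,
    staging_hash: Option<&str>,
    active_usr_roothash: Option<&str>,
) -> Result<(), DuplicateUuidError> {
    // 1. The duplicate must be on the /usr mount point.
    if mount_point != Path::new("/usr") {
        return Err(DuplicateUuidError::WrongMountPoint);
    }
    // 2. The staging COSI must carry a non-empty verity root hash.
    let staging = staging_hash
        .map(str::trim)
        .filter(|hash| !hash.is_empty())
        .ok_or(DuplicateUuidError::NoStagingHash)?;
    // 3. The active system must expose a non-empty usrhash= value.
    let active = active_usr_roothash
        .map(str::trim)
        .filter(|hash| !hash.is_empty())
        .ok_or(DuplicateUuidError::NoActiveHash)?;
    // 4. The normalized hashes must match.
    if !staging.eq_ignore_ascii_case(active) {
        return Err(DuplicateUuidError::HashMismatch);
    }
    Ok(())
}
```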
DiscoverablePartitionType does not have is_acl_usr() — that method
lives on the HC PartitionType enum. Since we already check for known
ACL USR PARTUUIDs, the part_type check was redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The ESP (128 MB) can overflow when multiple UKIs accumulate across A/B
updates. Before staging a new UKI, remove old UKIs for the target slot:

1. Trident-managed UKIs matching the target slot (all install indices)
2. Non-trident-managed (original install) UKIs, but only when trident
   already manages the other slot (proving it owns boot management)

The other slot's UKI is always preserved as the active/rollback path.

Also extract UKI_SLOT_A/UKI_SLOT_B constants to replace string literals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
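
The two removal rules above can be modeled as a filter over the UKIs found on the ESP. This is a simplified sketch: the `Uki` struct and `ukis_to_remove` are hypothetical, and how a UKI is recognized as trident-managed is abstracted into the `managed_slot` field. It reflects the rules as stated in this commit only; later commits narrow the match to the exact slot+os-index.

```rust
/// A/B slot (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Slot {
    A,
    B,
}

impl Slot {
    fn other(self) -> Slot {
        match self {
            Slot::A => Slot::B,
            Slot::B => Slot::A,
        }
    }
}

/// A UKI found on the ESP.
#[derive(Debug)]
pub struct Uki {
    pub file_name: String,
    /// Some(slot) for trident-managed UKIs, None for the original install UKI.
    pub managed_slot: Option<Slot>,
}

/// Selects the UKIs to delete before staging a new UKI for `target`.
pub fn ukis_to_remove(ukis: &[Uki], target: Slot) -> Vec<&str> {
    // Trident "owns" boot management once it manages the other slot.
    let other_slot_managed = ukis
        .iter()
        .any(|uki| uki.managed_slot == Some(target.other()));

    ukis.iter()
        .filter(|uki| match uki.managed_slot {
            // Rule 1: trident-managed UKIs for the target slot are removed;
            // the other slot's UKI is always kept as the rollback path.
            Some(slot) => slot == target,
            // Rule 2: the original UKI is removed only if trident manages the other slot.
            None => other_slot_managed,
        })
        .map(|uki| uki.file_name.as_str())
        .collect()
}
```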
In multi-OS configurations, the ESP has UKI pairs per OS instance
(azla0/azlb0, azla1/azlb1, etc.). Cleanup must only remove UKIs for
the specific slot+os-index being updated, not all UKIs for the same
slot letter across different OS instances.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In multiboot configurations, the original UKI has OS 0's partition
references baked in. OS 1+ instances never depend on it, but cleanup is
still restricted to OS 0, which owns it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
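
The two scoping rules from the multi-OS commits above (match the exact slot+os-index, and let only OS 0 remove the original UKI) can be sketched as follows. The file naming `vmlinuz-<n>-azla0.efi` is an assumption made for the example, as are the function names. Only the `azla0`-style tags and the `install_index == 0` guard come from the PR text.

```rust
/// Builds the slot+os-index tag, e.g. ('a', 0) -> "azla0".
pub fn slot_tag(slot: char, os_index: u32) -> String {
    format!("azl{}{}", slot, os_index)
}

/// Returns true only when the last '-'-separated component of the file stem
/// equals `tag` exactly. A substring test would wrongly accept "azla01" for "azla0".
pub fn uki_matches_tag(file_name: &str, tag: &str) -> bool {
    file_name
        .strip_suffix(".efi")
        .and_then(|stem| stem.rsplit('-').next())
        .map_or(false, |suffix| suffix == tag)
}

/// Only the OS instance with install index 0 may remove the original UKI.
pub fn may_remove_original_uki(install_index: u32) -> bool {
    install_index == 0
}
```

Exact equality on the suffix is what keeps cleanup for one OS instance from touching another instance's UKIs.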
bfjelds and others added 5 commits June 4, 2026 11:04
Move /proc/cmdline read out of validate_acl_duplicate_uuid into its
caller (validate_filesystem_uniqueness). The function now accepts
active_usr_roothash as Option<String>, making it fully testable in
unit tests without filesystem access.

Add 7 unit tests covering all validation paths:
- matching hash (success)
- case-insensitive matching (success)
- wrong mount point (reject)
- no staging verity hash (reject)
- mismatched hashes (reject)
- no active hash / None (reject)
- empty active hash (reject)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DR-001 (High): Replace if-let with let-else for missing staging hash in
detect_acl_btrfs_uuid_collision - None now logs a warning and refuses
the bind-mount instead of silently proceeding unverified.

DR-002 (High): Replace suffix.contains() with exact suffix equality in
cleanup_ukis_before_staging - prevents azla0 from matching azla01.efi
in multiboot with 10+ OS instances.

DR-003 (Medium): Extract verity_hashes_match() into engine::acl module,
replacing duplicated normalize+compare logic in newroot.rs and osimage.rs.
Rejects empty hashes so "" == "" cannot incorrectly pass.

DR-004 (Medium): Document pre-staging cleanup ordering rationale in
esp.rs - explains the crash-safety trade-off (active slot UKI preserved
as A/B fallback).

DR-005 (Medium): Make remove_uki_and_addons idempotent by treating
NotFound as success - prevents orphaned addon dirs if UKI was already
removed by a prior partial cleanup.

DR-006 (Medium): Document that cleanup_ukis_before_staging is
intentionally universal (not ACL-gated) - ESP space constraints apply
to all UKI-based A/B updates.

DR-007 (Medium): Replace byte-index hash slicing with char-safe
hash_preview() using chars().take(16) - prevents panics on non-ASCII
input (defense in depth for hex hashes).

Adds unit tests for verity_hashes_match(), hash_preview(),
cleanup_ukis_before_staging (exact suffix matching, multi-index cleanup),
and remove_uki_and_addons (idempotency, addon directory cleanup).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
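
Three of the review fixes above (DR-003, DR-005, DR-007) are small enough to sketch together. The function names mirror the ones in the commit message, but the bodies are illustrative guesses rather than the PR's implementation. In particular, normalization is assumed to be trim plus ASCII case folding, and the addon directory is assumed to be `<uki path>.extra.d`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// DR-003: compares two verity root hashes after normalization; empty hashes never match.
pub fn verity_hashes_match(a: &str, b: &str) -> bool {
    let (a, b) = (a.trim(), b.trim());
    !a.is_empty() && a.eq_ignore_ascii_case(b)
}

/// DR-007: returns at most the first 16 characters, counted by `char`,
/// so slicing can never split a multi-byte character and panic.
pub fn hash_preview(hash: &str) -> String {
    hash.chars().take(16).collect()
}

/// Maps a NotFound error to success and passes every other result through.
fn ignore_not_found(result: io::Result<()>) -> io::Result<()> {
    match result {
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

/// DR-005: removes a UKI and its addon directory; safe to call repeatedly.
pub fn remove_uki_and_addons(uki: &Path) -> io::Result<()> {
    ignore_not_found(fs::remove_file(uki))?;
    let mut addon_dir = uki.as_os_str().to_os_string();
    addon_dir.push(".extra.d");
    // Still attempted when the UKI was already gone, so addon dirs are not orphaned.
    ignore_not_found(fs::remove_dir_all(Path::new(&addon_dir)))
}
```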
@bfjelds
Member Author

bfjelds commented Jun 5, 2026

/azp run [GITHUB]-trident-pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
