Skip to content

[1.14] Implement virtio discard and write zeroes#13

Merged
kalyazin merged 19 commits into
e2b-dev:firecracker-v1.14-direct-memfrom
kalyazin:feat/virtio-blk-discard-wz-1.14
May 12, 2026
Merged

[1.14] Implement virtio discard and write zeroes#13
kalyazin merged 19 commits into
e2b-dev:firecracker-v1.14-direct-memfrom
kalyazin:feat/virtio-blk-discard-wz-1.14

Conversation

@kalyazin
Copy link
Copy Markdown

@kalyazin kalyazin commented May 8, 2026

Changes

  • implement VIRTIO_BLK_F_DISCARD
  • implement VIRTIO_BLK_F_WRITE_ZEROES

The feature bits are advertised by Firecracker for rw block devices via calling:

  • discard -> fallocate(ALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
  • write zeroes (unmap) -> fallocate(ALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
  • write zeroes (!unmap) -> fallocate(FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE)

If the underlying host filesystem doesn't support those (EOPNOTSUPP returned), Firecracker caches the knowledge and does not attempt to run them again.

No new Firecracker API is introduced (no opt-in is required to make use of the feature).

Note: snapshot rebuild is required to make use of the feature as the feature bits are negotiated at the time the block device is initialised (usually on boot).

Reason

Reduce host memory consumed by the drive.

@cla-bot cla-bot Bot added the cla-signed label May 8, 2026
Copy link
Copy Markdown
Collaborator

@bchalios bchalios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just minor comments

Comment thread src/vmm/src/devices/virtio/block/virtio/io/sync_io.rs Outdated
Comment thread src/vmm/src/devices/virtio/block/virtio/device.rs
Comment thread src/vmm/src/devices/virtio/block/virtio/io/sync_io.rs Outdated
@kalyazin kalyazin force-pushed the feat/virtio-blk-discard-wz-1.14 branch from d06e144 to 9ba1023 Compare May 11, 2026 15:12
@cursor
Copy link
Copy Markdown

cursor Bot commented May 11, 2026

PR Summary

High Risk
High risk because it changes core virtio-blk feature negotiation, config-space layout, and I/O execution paths (sync + io_uring) and adds a new allowed syscall (fallocate), which can affect guest-visible behavior and snapshot/restore compatibility.

Overview
Adds virtio-blk DISCARD and WRITE_ZEROES support for writable drives by advertising VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES, populating the related config-space fields, and executing requests via fallocate (hole punching and zero-range).

Introduces per-disk caching of EOPNOTSUPP to disable these operations after the first unsupported attempt, wires this through both sync and io_uring engines (including a new IORING_OP_FALLOCATE operation), and updates snapshot restore/hot-swap to refresh the derived config fields and reset the caches.

Expands metrics (discard_count, write_zeroes_count), updates seccomp allowlists to permit fallocate, adds documentation pages, and adds unit/integration tests covering feature advertisement, request validation, and snapshot/restore behavior.

Reviewed by Cursor Bugbot for commit 8a2503f. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread src/vmm/src/devices/virtio/block/virtio/device.rs
Copy link
Copy Markdown
Collaborator

@bchalios bchalios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the cursor comment addressed 🚢

@kalyazin kalyazin force-pushed the feat/virtio-blk-discard-wz-1.14 branch from 9ba1023 to 1d5b429 Compare May 11, 2026 16:12
Comment thread src/vmm/src/devices/virtio/block/virtio/device.rs Outdated
The virtio-blk spec defines a 56-byte config space with fields for
geometry, topology, writeback, and discard at fixed offsets. The
previous struct only held `capacity` (8 bytes), so the driver
never saw the fields beyond offset 8.

The full layout is required to advertise VIRTIO_BLK_F_DISCARD: the
spec mandates that `max_discard_sectors` (offset 36) and
`max_discard_seg` (offset 40) be set before the feature can be used
by the guest. Without the full struct, there is nowhere to write
those values. All new fields default to zero, so there is no
behavioral change in this commit.

A compile-time size assertion
(`assert!(size_of::<ConfigSpace>() == 56)`) guards against future
accidental padding changes. Tests are updated to use
`..Default::default()` for forward-compatible ConfigSpace init.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
@kalyazin kalyazin force-pushed the feat/virtio-blk-discard-wz-1.14 branch from 1d5b429 to 3878cac Compare May 11, 2026 16:47
Comment thread src/vmm/src/devices/virtio/block/virtio/device.rs
@kalyazin kalyazin force-pushed the feat/virtio-blk-discard-wz-1.14 branch from 3878cac to 6207bc8 Compare May 11, 2026 17:08
Comment thread src/vmm/src/devices/virtio/block/virtio/device.rs
@kalyazin kalyazin force-pushed the feat/virtio-blk-discard-wz-1.14 branch from 6207bc8 to eeda405 Compare May 11, 2026 19:59
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit eeda405. Configure here.

Comment thread src/vmm/src/devices/virtio/block/virtio/request.rs
kalyazin added 15 commits May 11, 2026 21:18
The virtio-blk spec (section 5.2.6.14) defines VIRTIO_BLK_T_DISCARD
(type 11) as a request the guest driver issues to free sectors it no
longer needs. To handle it, the VMM needs three things introduced here:

- `RequestType::Discard` so the dispatcher can route the request type.
- `DiscardWriteZeroes` (16 bytes, ByteValued) matching the layout of a
  single discard segment that the guest places in the data descriptor.
- `DISCARD_SEGMENT_SIZE` constant (16) used in parse() to validate that
  the data buffer length is a non-zero multiple of the segment size.
- `IoErr::InvalidFlags` / `IoErr::InvalidOffset` for per-segment
  validation errors added in the processing step.

Placeholder arms in `finish()` and `process()` keep the build green;
they are replaced with the real implementation in the next commit.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Add SyncIoError::Discard, SyncFileEngine::discard() and
AsyncFileEngine::discard() (synchronous fallback — discard is not
latency-critical and IORING_OP_FALLOCATE is not in the io_uring
allowlist). Wire up FileEngine::discard() dispatching to both engines.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Fix 2-space indented getrandom entries (should be 4 spaces like all
other entries) and remove trailing whitespace from madvise comment.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
VIRTIO_BLK_F_DISCARD issues fallocate(FALLOC_FL_PUNCH_HOLE) on the
backing file. Add fallocate to the vmm thread seccomp filter on both
x86_64 and aarch64 so the syscall is not blocked at runtime.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Add parse() validation: discard data descriptor must be read-only and
data_len must be a non-zero multiple of DISCARD_SEGMENT_SIZE (16 bytes).

Add process_discard_segments(): validates flags==0, num_sectors>0, and
sector bounds per segment, then calls FileEngine::discard() using
fallocate(FALLOC_FL_PUNCH_HOLE). Returns VIRTIO_BLK_S_IOERR on any
validation or syscall failure; VIRTIO_BLK_S_OK on success.

Add discard_count metric incremented on each successful discard.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Populate max_discard_sectors and max_discard_seg in restore() to match
what new() produces for the expanded ConfigSpace. No version bump needed
— main already carried 9.0.0 → 10.0.0 in this release cycle.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Set VIRTIO_BLK_F_DISCARD in avail_features for all writable block
devices. Populate max_discard_sectors (capped to disk size) and
max_discard_seg=1 in the config space. Read-only devices advertise
neither the feature nor non-zero discard config fields.

This is a behavioral change for existing writable block device
configurations: guests will now see and may negotiate the discard
feature. See docs/api_requests/block-discard.md for host filesystem
requirements and recommendations.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Explain how discard works, host requirements, limitations, and snapshot
compatibility impact.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Test feature flag advertisement, config space discard fields,
and end-to-end discard request processing (success and error paths).
Update test_virtio_features and test_virtio_read_config to account
for the new VIRTIO_BLK_F_DISCARD bit and config space fields.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Two tests:
- test_discard: boot VM with writable ext4 disk, write+delete data,
  run fstrim, verify rc=0 and discard_count metric increments.
- test_discard_not_advertised_for_read_only: verify read-only devices
  report discard_max_bytes=0 in sysfs (feature not advertised).
Both parametrized over Sync and Async IO engines.
Also add discard_count to the fcmetrics block_metrics list.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Replace the 8-byte _unused1 placeholder at offset 48 with the virtio-blk
write-zeroes fields and the required trailing padding:
  - max_write_zeroes_sectors  (u32,    offset 48)
  - max_write_zeroes_seg      (u32,    offset 52)
  - write_zeroes_may_unmap    (u8,     offset 56)
  - _unused1                  (u8; 3,  offset 57) — spec's unused1
  - _pad                      (u8; 4,  offset 60) — Rust alignment padding;
                                                     the virtio-blk spec
                                                     ends at byte 60.

Bump the compile-time size assertion to 64 and add per-field
offset_of! assertions guarding every field's position against future
accidental layout drift. The offsets mirror the Linux kernel header
include/uapi/linux/virtio_blk.h::struct virtio_blk_config exactly.

Fields default to zero; no behavioural change. They will be populated
when VIRTIO_BLK_F_WRITE_ZEROES is wired up.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Add the RequestType::WriteZeroes variant and From<u32> mapping for
VIRTIO_BLK_T_WRITE_ZEROES. Add IoErr::WriteZeroesUnsupported and
Status::WriteZeroesUnsupported (parallel to the discard equivalents)
that return VIRTIO_BLK_S_UNSUPP silently when the host filesystem
doesn't support the operation. Add a write_zeroes_unsupported
EOPNOTSUPP cache on DiskProperties (not persisted).

Add placeholder arms in PendingRequest::finish() and Request::process()
so the code compiles. Both currently return WriteZeroesUnsupported;
they will be replaced when the full feature is wired up.

Update the test-only From<RequestType> for u32 and request_type_flags()
helpers to handle the new variant.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Add write_zeroes(offset, len, unmap, req) on FileEngine plus the
corresponding SyncFileEngine::write_zeroes() and
AsyncFileEngine::push_write_zeroes() implementations.

The unmap flag selects the fallocate mode at call time:
  - unmap=true:  FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE (mode 3)
  - unmap=false: FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE (mode 17)

The async path uses io_uring's IORING_OP_FALLOCATE (already in the
allowlist from the discard work). The sync path calls libc::fallocate.

Add SyncIoError::WriteZeroes and extend BlockIoError::is_eopnotsupp()
to also match SyncIoError::WriteZeroes so the EOPNOTSUPP fallback
machinery (added later) catches both modes.

No seccomp / io_uring restriction changes are required: fallocate is
already in the seccomp filter and IORING_OP_FALLOCATE is already
allowed via the discard work.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Wire up Request::parse() validation and Request::process() handling
for write-zeroes requests, mirroring the discard path:

  - parse(): reject write-only data descriptors and require data_len
    to be a non-zero multiple of DISCARD_SEGMENT_SIZE (the
    DiscardWriteZeroes struct is shared between both ops).
  - parse_write_zeroes_segment(): validate flags (only bit 0,
    VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP, is permitted), reject
    num_sectors=0, check sector range, and return the unmap flag
    so process() can pick the fallocate mode.
  - process(): short-circuit to VIRTIO_BLK_S_UNSUPP if a previous
    EOPNOTSUPP set disk.write_zeroes_unsupported; otherwise dispatch
    to FileEngine::write_zeroes().
  - the (Ok(_), RequestType::WriteZeroes) finish() arm now increments
    the write_zeroes_count metric and returns VIRTIO_BLK_S_OK.
  - the EOPNOTSUPP detection in the error path also handles
    RequestType::WriteZeroes, setting the cache and returning
    Status::WriteZeroesUnsupported (silent UNSUPP).

Add the write_zeroes_count metric and aggregate() line.

The feature flag is not yet advertised — that lands in the next
commit, after the request handling is fully in place.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
…vices

Set VIRTIO_BLK_F_WRITE_ZEROES in avail_features for any non-read-only
block device, alongside VIRTIO_BLK_F_DISCARD. Populate the
max_write_zeroes_sectors, max_write_zeroes_seg, and
write_zeroes_may_unmap config fields in both VirtioBlock::new() and
the persist::restore() path so the values match what the guest sees
on a fresh boot vs after a snapshot restore.

write_zeroes_may_unmap=1 lets the guest set the UNMAP flag on
individual segments, which we then translate to fallocate's
PUNCH_HOLE mode (UNMAP=0 uses ZERO_RANGE).

Update the test_virtio_features and test_virtio_read_config
expectations to account for the new feature bit and config fields.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
kalyazin added 3 commits May 11, 2026 21:19
Add three tests parametrised over Sync and Async file engines:

  - test_write_zeroes_feature_and_config: VIRTIO_BLK_F_WRITE_ZEROES
    advertised on writable devices; max_write_zeroes_sectors,
    max_write_zeroes_seg, and write_zeroes_may_unmap populated as
    expected; read-only devices advertise none of these.
  - test_write_zeroes: end-to-end happy path (UNMAP=1 → PUNCH_HOLE,
    metric incremented, VIRTIO_BLK_S_OK) plus four error cases
    (reserved flag bit set, sector range out of bounds, num_sectors=0,
    invalid data length).
  - test_write_zeroes_unsupported_cached: pre-set the cache flag and
    verify subsequent requests return VIRTIO_BLK_S_UNSUPP without
    incrementing the success or invalid-request counters.

The UNMAP=0 (ZERO_RANGE) happy path is not asserted in the unit
tests because ZERO_RANGE support is filesystem-dependent (the host's
tmpfs in some kernels returns EOPNOTSUPP); end-to-end UNMAP=0
correctness is left to the integration tests running on ext4. The
unit tests cover wire-flag handling for UNMAP=0 via the
invalid-flags and zero-sectors cases.

Also extend test_request_type_from to assert the new
VIRTIO_BLK_T_WRITE_ZEROES → RequestType::WriteZeroes mapping.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Two tests, parametrised over Sync and Async IO engines:

  - test_write_zeroes: boot VM with a writable block device, write
    random data to a 1 MiB region of /dev/vdb, run `blkdiscard -z`
    (BLKZEROOUT ioctl), then verify:
      * sysfs /queue/write_zeroes_max_bytes is non-zero (feature
        negotiated)
      * the region reads back as zeros after the operation
      * no requests failed
  - test_write_zeroes_not_advertised_for_read_only: read-only device
    reports write_zeroes_max_bytes=0 in sysfs.

The test deliberately does not assert that the write_zeroes_count
metric incremented. The Linux kernel's blkdev_issue_zeroout() may
legitimately fall back to issuing plain zero-page writes
(REQ_OP_WRITE) instead of REQ_OP_WRITE_ZEROES even when the feature
is advertised, depending on internal heuristics. Both paths leave
the device reading as zeros, so a metric assertion would be flaky in
CI without indicating any actual bug. Direct WRITE_ZEROES
request-handling correctness is covered by the unit tests.

Use cmp -n /dev/vdb /dev/zero for the zero check (od collapses
repeated lines with *, which makes naive byte-comparison fragile).

Add write_zeroes_count to host_tools/fcmetrics.py so the metrics
schema validation passes.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Add docs/api_requests/block-write-zeroes.md describing:
  - automatic advertisement on writable devices
  - UNMAP=0 → FALLOC_FL_ZERO_RANGE (zeros in place)
  - UNMAP=1 → FALLOC_FL_PUNCH_HOLE (zeros + deallocate)
  - host filesystem requirements
  - EOPNOTSUPP fallback (silent VIRTIO_BLK_S_UNSUPP, shared cache)
  - known limitations

Remove the "write_zeroes is not supported" line from block-discard.md
now that the feature is implemented.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
@kalyazin kalyazin force-pushed the feat/virtio-blk-discard-wz-1.14 branch from eeda405 to 8a2503f Compare May 11, 2026 20:24
@kalyazin kalyazin merged commit ab2399e into e2b-dev:firecracker-v1.14-direct-mem May 12, 2026
4 checks passed
@kalyazin kalyazin deleted the feat/virtio-blk-discard-wz-1.14 branch May 12, 2026 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants