
Commit e97f3d9

Merge pull request #24721 from dvdksn/worktree-agent-a6949950
Fix incorrect page fault explanation in runmetrics docs
2 parents 21f548f + 6343856 commit e97f3d9
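The corrected text in this commit describes a page fault as a normal event: the process touches a virtual page that has no physical frame yet, and the kernel maps one. When the kernel only has to allocate an empty page, the fault is minor. The sketch below makes this visible for a single process. It assumes a Linux host and uses Python's standard `mmap` and `resource` modules. It maps fresh anonymous memory, writes one byte per page, and reads the process's minor-fault counter (`ru_minflt`) before and after the loop. This is a per-process counter from `getrusage`, not the cgroup's `pgfault` counter. The helper name `count_minor_faults_while_touching` is hypothetical.

```python
import mmap
import resource


def count_minor_faults_while_touching(num_pages: int) -> int:
    """Map fresh anonymous memory, write one byte per page, and return the
    number of minor page faults getrusage() reported during the writes.

    Assumes a Linux host; exact counts vary (interpreter activity, huge pages).
    """
    page = mmap.PAGESIZE
    # fileno -1 requests an anonymous mapping: address space is reserved,
    # but no physical frames are assigned until each page is first touched.
    buf = mmap.mmap(-1, num_pages * page, flags=mmap.MAP_PRIVATE)
    try:
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        for i in range(num_pages):
            buf[i * page] = 1  # first write to each page triggers a fault
        after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    finally:
        buf.close()
    return after - before


if __name__ == "__main__":
    faults = count_minor_faults_while_touching(1024)
    print(f"minor faults while touching 1024 fresh pages: {faults}")
```

The reported number is typically close to one fault per touched page. No disk read is involved, so these faults do not show up in `ru_majflt`.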


1 file changed, +55 -57 lines changed


content/manuals/engine/containers/runmetrics.md

Lines changed: 55 additions & 57 deletions
@@ -200,74 +200,72 @@ indicates the number of page faults since the creation of the cgroup.
 
 `cache`
 : The amount of memory used by the processes of this control group that can be
-associated precisely with a block on a block device. When you read from and
-write to files on disk, this amount increases. This is the case if you use
-"conventional" I/O (`open`, `read`, `write` syscalls) as well as mapped files
-(with `mmap`). It also accounts for the memory used by `tmpfs` mounts, though
-the reasons are unclear.
+associated precisely with a block on a block device. When you read from and
+write to files on disk, this amount increases. This is the case if you use
+"conventional" I/O (`open`, `read`, `write` syscalls) as well as mapped files
+(with `mmap`). It also accounts for the memory used by `tmpfs` mounts, though
+the reasons are unclear.
 
 `rss`
 : The amount of memory that doesn't correspond to anything on disk: stacks,
-heaps, and anonymous memory maps.
+heaps, and anonymous memory maps.
 
 `mapped_file`
 : Indicates the amount of memory mapped by the processes in the control group.
-It doesn't give you information about how much memory is used; it rather
-tells you how it's used.
+It doesn't give you information about how much memory is used; it rather
+tells you how it's used.
 
 `pgfault`, `pgmajfault`
 : Indicate the number of times that a process of the cgroup triggered a "page
-fault" and a "major fault", respectively. A page fault happens when a process
-accesses a part of its virtual memory space which is nonexistent or protected.
-The former can happen if the process is buggy and tries to access an invalid
-address (it is sent a `SIGSEGV` signal, typically killing it with the famous
-`Segmentation fault` message). The latter can happen when the process reads
-from a memory zone which has been swapped out, or which corresponds to a mapped
-file: in that case, the kernel loads the page from disk, and let the CPU
-complete the memory access. It can also happen when the process writes to a
-copy-on-write memory zone: likewise, the kernel preempts the process, duplicate
-the memory page, and resume the write operation on the process's own copy of
-the page. "Major" faults happen when the kernel actually needs to read the data
-from disk. When it just duplicates an existing page, or allocate an empty page,
-it's a regular (or "minor") fault.
+fault" and a "major fault", respectively. A page fault happens when a process
+accesses a virtual memory page that is not currently mapped to a physical
+memory frame. This is a normal part of memory management. For example, a page
+fault occurs when the process reads from a memory zone that has been swapped
+out, or that corresponds to a memory-mapped file: in that case, the kernel
+loads the page from disk and lets the CPU complete the memory access. It also
+happens when the process writes to a copy-on-write memory zone: the kernel
+duplicates the memory page and resumes the write operation on the process's
+own copy of the page. "Major" faults happen when the kernel needs to read
+data from disk. When it duplicates an existing page, or allocates an empty
+page, it's a regular (or "minor") fault.
 
 `swap`
 : The amount of swap currently used by the processes in this cgroup.
 
 `active_anon`, `inactive_anon`
 : The amount of anonymous memory that has been identified has respectively
-_active_ and _inactive_ by the kernel. "Anonymous" memory is the memory that is
-_not_ linked to disk pages. In other words, that's the equivalent of the rss
-counter described above. In fact, the very definition of the rss counter is
-`active_anon` + `inactive_anon` - `tmpfs` (where tmpfs is the amount of
-memory used up by `tmpfs` filesystems mounted by this control group). Now,
-what's the difference between "active" and "inactive"? Pages are initially
-"active"; and at regular intervals, the kernel sweeps over the memory, and tags
-some pages as "inactive". Whenever they're accessed again, they're
-immediately re-tagged "active". When the kernel is almost out of memory, and
-time comes to swap out to disk, the kernel swaps "inactive" pages.
+_active_ and _inactive_ by the kernel. "Anonymous" memory is the memory that is
+_not_ linked to disk pages. In other words, that's the equivalent of the rss
+counter described above. In fact, the very definition of the rss counter is
+`active_anon` + `inactive_anon` - `tmpfs` (where tmpfs is the amount of
+memory used up by `tmpfs` filesystems mounted by this control group). Now,
+what's the difference between "active" and "inactive"? Pages are initially
+"active"; and at regular intervals, the kernel sweeps over the memory, and tags
+some pages as "inactive". Whenever they're accessed again, they're
+immediately re-tagged "active". When the kernel is almost out of memory, and
+time comes to swap out to disk, the kernel swaps "inactive" pages.
 
 `active_file`, `inactive_file`
 : Cache memory, with _active_ and _inactive_ similar to the _anon_ memory
-above. The exact formula is `cache` = `active_file` + `inactive_file` +
-`tmpfs`. The exact rules used by the kernel to move memory pages between
-active and inactive sets are different from the ones used for anonymous memory,
-but the general principle is the same. When the kernel needs to reclaim memory,
-it's cheaper to reclaim a clean (=non modified) page from this pool, since it
-can be reclaimed immediately (while anonymous pages and dirty/modified pages
-need to be written to disk first).
+above. The exact formula is `cache` = `active_file` + `inactive_file` +
+`tmpfs`. The exact rules used by the kernel to move memory pages between
+active and inactive sets are different from the ones used for anonymous memory,
+but the general principle is the same. When the kernel needs to reclaim memory,
+it's cheaper to reclaim a clean (=non modified) page from this pool, since it
+can be reclaimed immediately (while anonymous pages and dirty/modified pages
+need to be written to disk first).
 
 `unevictable`
 : The amount of memory that cannot be reclaimed; generally, it accounts for
-memory that has been "locked" with `mlock`. It's often used by crypto
-frameworks to make sure that secret keys and other sensitive material never
-gets swapped out to disk.
+memory that has been "locked" with `mlock`. It's often used by crypto
+frameworks to make sure that secret keys and other sensitive material never
+gets swapped out to disk.
 
 `memory_limit`, `memsw_limit`
 : These aren't really metrics, but a reminder of the limits applied to this
-cgroup. The first one indicates the maximum amount of physical memory that can
-be used by the processes of this control group; the second one indicates the
-maximum amount of RAM+swap.
+cgroup. The first one indicates the maximum amount of physical memory that can
+be used by the processes of this control group; the second one indicates the
+maximum amount of RAM+swap.
 
 Accounting for memory in the page cache is very complex. If two
 processes in different control groups both read the same file
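Under the `pgfault` and `pgmajfault` definitions, a cgroup's minor-fault count is the difference between the two counters. The sketch below rests on two assumptions. First, the control group's `memory.stat` file uses a plain `key value` layout, one counter per line. Second, `pgfault` counts every fault while `pgmajfault` counts only the faults that needed a disk read. The code parses the file's text and splits the total. The helper names and the sample values are hypothetical and illustrative, not taken from a real container.

```python
from typing import Dict

# Illustrative memory.stat excerpt (made-up values, not from a real cgroup).
SAMPLE_MEMORY_STAT = """\
cache 8192
rss 4096
mapped_file 0
pgfault 1000
pgmajfault 12
"""


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse "key value" lines (as in memory.stat) into a dict of ints."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        stats[key] = int(value)
    return stats


def fault_breakdown(stats: Dict[str, int]) -> Dict[str, int]:
    """Split the cgroup's page faults into major and minor faults.

    Assumes pgfault includes every fault and pgmajfault only the major ones.
    """
    total = stats["pgfault"]
    major = stats["pgmajfault"]
    return {"total": total, "major": major, "minor": total - major}


if __name__ == "__main__":
    print(fault_breakdown(parse_memory_stat(SAMPLE_MEMORY_STAT)))
```

A high ratio of major to total faults points at disk reads (swap-ins or file-backed pages). Minor faults alone are routine.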
@@ -309,28 +307,28 @@ relevant ones:
 
 `blkio.sectors`
 : Contains the number of 512-bytes sectors read and written by the processes
-member of the cgroup, device by device. Reads and writes are merged in a single
-counter.
+member of the cgroup, device by device. Reads and writes are merged in a single
+counter.
 
 `blkio.io_service_bytes`
 : Indicates the number of bytes read and written by the cgroup. It has 4
-counters per device, because for each device, it differentiates between
-synchronous vs. asynchronous I/O, and reads vs. writes.
+counters per device, because for each device, it differentiates between
+synchronous vs. asynchronous I/O, and reads vs. writes.
 
 `blkio.io_serviced`
 : The number of I/O operations performed, regardless of their size. It also has
-4 counters per device.
+4 counters per device.
 
 `blkio.io_queued`
 : Indicates the number of I/O operations currently queued for this cgroup. In
-other words, if the cgroup isn't doing any I/O, this is zero. The opposite is
-not true. In other words, if there is no I/O queued, it doesn't mean that the
-cgroup is idle (I/O-wise). It could be doing purely synchronous reads on an
-otherwise quiescent device, which can therefore handle them immediately,
-without queuing. Also, while it's helpful to figure out which cgroup is
-putting stress on the I/O subsystem, keep in mind that it's a relative
-quantity. Even if a process group doesn't perform more I/O, its queue size can
-increase just because the device load increases because of other devices.
+other words, if the cgroup isn't doing any I/O, this is zero. The opposite is
+not true. In other words, if there is no I/O queued, it doesn't mean that the
+cgroup is idle (I/O-wise). It could be doing purely synchronous reads on an
+otherwise quiescent device, which can therefore handle them immediately,
+without queuing. Also, while it's helpful to figure out which cgroup is
+putting stress on the I/O subsystem, keep in mind that it's a relative
+quantity. Even if a process group doesn't perform more I/O, its queue size can
+increase just because the device load increases because of other devices.
 
 ### Network metrics
 
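The blkio counters described above are reported per device. The sketch below assumes the cgroup v1 layout of `blkio.io_service_bytes`: one `MAJOR:MINOR Operation value` line per counter, followed by a single grand-total `Total value` line. It groups the counters by device so that reads, writes, synchronous I/O, and asynchronous I/O can be compared. The function name and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Illustrative blkio.io_service_bytes content (made-up values).
SAMPLE_IO_SERVICE_BYTES = """\
8:0 Read 4096
8:0 Write 8192
8:0 Sync 10240
8:0 Async 2048
8:0 Total 12288
Total 12288
"""


def parse_blkio_counters(text: str) -> Dict[str, Dict[str, int]]:
    """Group per-device counters as {"MAJOR:MINOR": {"Read": n, ...}}."""
    per_device: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips blank lines and the trailing "Total N" line
        device, op, value = parts
        per_device[device][op] = int(value)
    return dict(per_device)


if __name__ == "__main__":
    for device, counters in parse_blkio_counters(SAMPLE_IO_SERVICE_BYTES).items():
        print(device, counters)
```

If `blkio.io_serviced` uses the same layout, as assumed here, the same parser should apply to it. In that case the values count operations rather than bytes.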