@@ -200,74 +200,72 @@ indicates the number of page faults since the creation of the cgroup.
200200
201201` cache `
202202: The amount of memory used by the processes of this control group that can be
203- associated precisely with a block on a block device. When you read from and
204- write to files on disk, this amount increases. This is the case if you use
205- "conventional" I/O (` open ` , ` read ` , ` write ` syscalls) as well as mapped files
206- (with ` mmap ` ). It also accounts for the memory used by ` tmpfs ` mounts, though
207- the reasons are unclear.
203+ associated precisely with a block on a block device. When you read from and
204+ write to files on disk, this amount increases. This is the case if you use
205+ "conventional" I/O (` open ` , ` read ` , ` write ` syscalls) as well as mapped files
206+ (with ` mmap ` ). It also accounts for the memory used by ` tmpfs ` mounts, though
207+ the reasons are unclear.
208208
209209` rss `
210210: The amount of memory that doesn't correspond to anything on disk: stacks,
211- heaps, and anonymous memory maps.
211+ heaps, and anonymous memory maps.
212212
213213` mapped_file `
214214: Indicates the amount of memory mapped by the processes in the control group.
215- It doesn't give you information about how much memory is used; it rather
216- tells you how it's used.
215+ It doesn't give you information about how much memory is used; it rather
216+ tells you how it's used.
217217
218218` pgfault ` , ` pgmajfault `
219219: Indicate the number of times that a process of the cgroup triggered a "page
220- fault" and a "major fault", respectively. A page fault happens when a process
221- accesses a part of its virtual memory space which is nonexistent or protected.
222- The former can happen if the process is buggy and tries to access an invalid
223- address (it is sent a ` SIGSEGV ` signal, typically killing it with the famous
224- ` Segmentation fault ` message). The latter can happen when the process reads
225- from a memory zone which has been swapped out, or which corresponds to a mapped
226- file: in that case, the kernel loads the page from disk, and let the CPU
227- complete the memory access. It can also happen when the process writes to a
228- copy-on-write memory zone: likewise, the kernel preempts the process, duplicate
229- the memory page, and resume the write operation on the process's own copy of
230- the page. "Major" faults happen when the kernel actually needs to read the data
231- from disk. When it just duplicates an existing page, or allocate an empty page,
232- it's a regular (or "minor") fault.
220+ fault" and a "major fault", respectively. A page fault happens when a process
221+ accesses a virtual memory page that is not currently mapped to a physical
222+ memory frame. This is a normal part of memory management. For example, a page
223+ fault occurs when the process reads from a memory zone that has been swapped
224+ out, or that corresponds to a memory-mapped file: in that case, the kernel
225+ loads the page from disk and lets the CPU complete the memory access. It also
226+ happens when the process writes to a copy-on-write memory zone: the kernel
227+ duplicates the memory page and resumes the write operation on the process's
228+ own copy of the page. "Major" faults happen when the kernel needs to read
229+ data from disk. When it duplicates an existing page, or allocates an empty
230+ page, it's a regular (or "minor") fault.
233231
234232` swap `
235233: The amount of swap currently used by the processes in this cgroup.
236234
237235` active_anon ` , ` inactive_anon `
238236: The amount of anonymous memory that has been identified as respectively
239- _ active_ and _ inactive_ by the kernel. "Anonymous" memory is the memory that is
240- _ not_ linked to disk pages. In other words, that's the equivalent of the rss
241- counter described above. In fact, the very definition of the rss counter is
242- ` active_anon ` + ` inactive_anon ` - ` tmpfs ` (where tmpfs is the amount of
243- memory used up by ` tmpfs ` filesystems mounted by this control group). Now,
244- what's the difference between "active" and "inactive"? Pages are initially
245- "active"; and at regular intervals, the kernel sweeps over the memory, and tags
246- some pages as "inactive". Whenever they're accessed again, they're
247- immediately re-tagged "active". When the kernel is almost out of memory, and
248- time comes to swap out to disk, the kernel swaps "inactive" pages.
237+ _ active_ and _ inactive_ by the kernel. "Anonymous" memory is the memory that is
238+ _ not_ linked to disk pages. In other words, that's the equivalent of the rss
239+ counter described above. In fact, the very definition of the rss counter is
240+ ` active_anon ` + ` inactive_anon ` - ` tmpfs ` (where tmpfs is the amount of
241+ memory used up by ` tmpfs ` filesystems mounted by this control group). Now,
242+ what's the difference between "active" and "inactive"? Pages are initially
243+ "active"; and at regular intervals, the kernel sweeps over the memory, and tags
244+ some pages as "inactive". Whenever they're accessed again, they're
245+ immediately re-tagged "active". When the kernel is almost out of memory, and
246+ time comes to swap out to disk, the kernel swaps "inactive" pages.
249247
250248` active_file ` , ` inactive_file `
251249: Cache memory, with _ active_ and _ inactive_ similar to the _ anon_ memory
252- above. The exact formula is ` cache ` = ` active_file ` + ` inactive_file ` +
253- ` tmpfs ` . The exact rules used by the kernel to move memory pages between
254- active and inactive sets are different from the ones used for anonymous memory,
255- but the general principle is the same. When the kernel needs to reclaim memory,
256- it's cheaper to reclaim a clean (=non modified) page from this pool, since it
257- can be reclaimed immediately (while anonymous pages and dirty/modified pages
258- need to be written to disk first).
250+ above. The exact formula is ` cache ` = ` active_file ` + ` inactive_file ` +
251+ ` tmpfs ` . The exact rules used by the kernel to move memory pages between
252+ active and inactive sets are different from the ones used for anonymous memory,
253+ but the general principle is the same. When the kernel needs to reclaim memory,
254+ it's cheaper to reclaim a clean (unmodified) page from this pool, since it
255+ can be reclaimed immediately (while anonymous pages and dirty/modified pages
256+ need to be written to disk first).
259257
260258` unevictable `
261259: The amount of memory that cannot be reclaimed; generally, it accounts for
262- memory that has been "locked" with ` mlock ` . It's often used by crypto
263- frameworks to make sure that secret keys and other sensitive material never
264- gets swapped out to disk.
260+ memory that has been "locked" with ` mlock ` . It's often used by crypto
261+ frameworks to make sure that secret keys and other sensitive material never
262+ get swapped out to disk.
265263
266264` memory_limit ` , ` memsw_limit `
267265: These aren't really metrics, but a reminder of the limits applied to this
268- cgroup. The first one indicates the maximum amount of physical memory that can
269- be used by the processes of this control group; the second one indicates the
270- maximum amount of RAM+swap.
266+ cgroup. The first one indicates the maximum amount of physical memory that can
267+ be used by the processes of this control group; the second one indicates the
268+ maximum amount of RAM+swap.
271269
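The two formulas quoted above (`rss` = `active_anon` + `inactive_anon` - `tmpfs`, and `cache` = `active_file` + `inactive_file` + `tmpfs`) can be checked against each other. The following is a minimal sketch, not a definitive tool: it assumes the counters come from a `memory.stat`-style file with one `name value` pair per line, the function names are hypothetical, and the sample values are made up so that both formulas hold exactly. On a live system the counters are not sampled atomically, so the identities may only hold approximately.

```python
def parse_memory_stat(text):
    """Parse "name value" lines (assumed memory.stat layout) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name integer" lines; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def implied_tmpfs(stats):
    """Solve rss = active_anon + inactive_anon - tmpfs for tmpfs."""
    return stats["active_anon"] + stats["inactive_anon"] - stats["rss"]


def cache_from_parts(stats):
    """Apply cache = active_file + inactive_file + tmpfs."""
    return stats["active_file"] + stats["inactive_file"] + implied_tmpfs(stats)


# Made-up sample values, chosen so that both formulas hold exactly.
MEMORY_STAT_SAMPLE = """\
cache 9000
rss 4000
mapped_file 1200
active_anon 3000
inactive_anon 1500
active_file 6000
inactive_file 2500
"""

if __name__ == "__main__":
    stats = parse_memory_stat(MEMORY_STAT_SAMPLE)
    print("implied tmpfs:", implied_tmpfs(stats))
    print("cache from parts:", cache_from_parts(stats))
    print("reported cache:", stats["cache"])
```

Deriving `tmpfs` from the `rss` formula and feeding it into the `cache` formula gives a quick consistency check between the anonymous and file-backed counters.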
272270Accounting for memory in the page cache is very complex. If two
273271processes in different control groups both read the same file
@@ -309,28 +307,28 @@ relevant ones:
309307
310308` blkio.sectors `
311309: Contains the number of 512-byte sectors read and written by the processes
312- member of the cgroup, device by device. Reads and writes are merged in a single
313- counter.
310+ that are members of the cgroup, device by device. Reads and writes are merged in a single
311+ counter.
314312
315313` blkio.io_service_bytes `
316314: Indicates the number of bytes read and written by the cgroup. It has 4
317- counters per device, because for each device, it differentiates between
318- synchronous vs. asynchronous I/O, and reads vs. writes.
315+ counters per device, because for each device, it differentiates between
316+ synchronous vs. asynchronous I/O, and reads vs. writes.
319317
320318` blkio.io_serviced `
321319: The number of I/O operations performed, regardless of their size. It also has
322- 4 counters per device.
320+ 4 counters per device.
323321
324322` blkio.io_queued `
325323: Indicates the number of I/O operations currently queued for this cgroup. In
326- other words, if the cgroup isn't doing any I/O, this is zero. The opposite is
327- not true. In other words, if there is no I/O queued, it doesn't mean that the
328- cgroup is idle (I/O-wise). It could be doing purely synchronous reads on an
329- otherwise quiescent device, which can therefore handle them immediately,
330- without queuing. Also, while it's helpful to figure out which cgroup is
331- putting stress on the I/O subsystem, keep in mind that it's a relative
332- quantity. Even if a process group doesn't perform more I/O, its queue size can
333- increase just because the device load increases because of other devices.
324+ other words, if the cgroup isn't doing any I/O, this is zero. The opposite is
325+ not true: if there is no I/O queued, it doesn't mean that the
326+ cgroup is idle (I/O-wise). It could be doing purely synchronous reads on an
327+ otherwise quiescent device, which can therefore handle them immediately,
328+ without queuing. Also, while it's helpful to figure out which cgroup is
329+ putting stress on the I/O subsystem, keep in mind that it's a relative
330+ quantity. Even if a process group doesn't perform more I/O, its queue size can
331+ increase just because the device load increases due to other devices.
334332
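To make the "4 counters per device" description concrete, here is a rough sketch that groups the counters of a `blkio.io_service_bytes`-style file by device. It assumes the layout commonly seen with cgroup v1, where each line is `major:minor Operation value` and a grand-total line closes the file. The function name and the sample numbers are hypothetical.

```python
from collections import defaultdict


def parse_blkio_counters(text):
    """Group "major:minor Operation value" lines by device."""
    per_device = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            # Skips blank lines and the trailing grand total ("Total <n>").
            continue
        device, operation, value = parts
        per_device[device][operation] = int(value)
    return dict(per_device)


# Made-up sample for one device (8:0), in bytes.
BLKIO_SAMPLE = """\
8:0 Read 4096000
8:0 Write 1024000
8:0 Sync 4608000
8:0 Async 512000
8:0 Total 5120000
Total 5120000
"""

if __name__ == "__main__":
    for device, counters in parse_blkio_counters(BLKIO_SAMPLE).items():
        print(device, counters)
```

In this sample, `Read` + `Write` and `Sync` + `Async` both add up to the per-device `Total`, because the two pairs are different breakdowns of the same traffic.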
335333### Network metrics
336334