From: Jann Horn <jannh@google.com>
To: mtk.manpages@gmail.com
Cc: linux-man@vger.kernel.org, linux-mm@kvack.org,
Mark Mossberg <mark.mossberg@gmail.com>
Subject: [PATCH] proc.5: Document inaccurate RSS due to SPLIT_RSS_COUNTING
Date: Mon, 12 Oct 2020 13:49:40 +0200
Message-ID: <20201012114940.1317510-1-jannh@google.com>

Since commit 34e55232e59f7b19050267a05ff1226e5cd122a5 (introduced back in
v2.6.34), Linux uses per-thread RSS counters to reduce cache contention on
the per-mm counters. With a 4K page size, that means that you can end up
with the counters off by up to 252KiB per thread.
Example:

$ cat rsstest.c
#include <stdlib.h>
#include <err.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/eventfd.h>
#include <sys/prctl.h>

void dump(int pid) {
  char cmd[1000];
  sprintf(cmd,
    "grep '^VmRSS' /proc/%d/status;"
    "grep '^Rss:' /proc/%d/smaps_rollup;"
    "echo",
    pid, pid
  );
  system(cmd);
}

int main(void) {
  eventfd_t dummy;
  int child_wait = eventfd(0, EFD_SEMAPHORE|EFD_CLOEXEC);
  int child_resume = eventfd(0, EFD_SEMAPHORE|EFD_CLOEXEC);
  if (child_wait == -1 || child_resume == -1) err(1, "eventfd");

  pid_t child = fork();
  if (child == -1) err(1, "fork");
  if (child == 0) {
    if (prctl(PR_SET_PDEATHSIG, SIGKILL)) err(1, "PDEATHSIG");
    if (getppid() == 1) exit(0);
    char *mapping = mmap(NULL, 80 * 0x1000, PROT_READ|PROT_WRITE,
                         MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    for (int i=0; i<40; i++) mapping[0x1000 * i] = 1;
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    for (int i=40; i<80; i++) mapping[0x1000 * i] = 1;
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    exit(0);
  }

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  exit(0);
}
$ gcc -o rsstest rsstest.c && ./rsstest
VmRSS: 68 kB
Rss: 616 kB
VmRSS: 68 kB
Rss: 776 kB
VmRSS: 812 kB
Rss: 936 kB
$

Let's document that those counters aren't entirely accurate.

Reported-by: Mark Mossberg <mark.mossberg@gmail.com>
Signed-off-by: Jann Horn <jannh@google.com>
---
man5/proc.5 | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
diff --git a/man5/proc.5 b/man5/proc.5
index ed309380b53b..13208811efb0 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -2265,6 +2265,9 @@ This is just the pages which
count toward text, data, or stack space.
This does not include pages
which have not been demand-loaded in, or which are swapped out.
+This value is inaccurate; see
+.I /proc/[pid]/statm
+below.
.TP
(25) \fIrsslim\fP \ %lu
Current soft limit in bytes on the rss of the process;
@@ -2409,9 +2412,9 @@ The columns are:
size (1) total program size
(same as VmSize in \fI/proc/[pid]/status\fP)
resident (2) resident set size
- (same as VmRSS in \fI/proc/[pid]/status\fP)
+ (inaccurate; same as VmRSS in \fI/proc/[pid]/status\fP)
shared (3) number of resident shared pages (i.e., backed by a file)
- (same as RssFile+RssShmem in \fI/proc/[pid]/status\fP)
+ (inaccurate; same as RssFile+RssShmem in \fI/proc/[pid]/status\fP)
text (4) text (code)
.\" (not including libs; broken, includes data segment)
lib (5) library (unused since Linux 2.6; always 0)
@@ -2420,6 +2423,16 @@ data (6) data + stack
dt (7) dirty pages (unused since Linux 2.6; always 0)
.EE
.in
+.IP
+.\" See SPLIT_RSS_COUNTING in the kernel.
+.\" Inaccuracy is bounded by TASK_RSS_EVENTS_THRESH.
+Some of these values are somewhat inaccurate (up to 63 pages per thread) because
+of a kernel-internal scalability optimization.
+If accurate values are required, use
+.I /proc/[pid]/smaps
+or
+.I /proc/[pid]/smaps_rollup
+instead, which are much slower but provide accurate, detailed information.
.TP
.I /proc/[pid]/status
Provides much of the information in
@@ -2596,6 +2609,9 @@ directly access physical memory.
.IP *
.IR VmHWM :
Peak resident set size ("high water mark").
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
.IP *
.IR VmRSS :
Resident set size.
@@ -2604,16 +2620,25 @@ Note that the value here is the sum of
.IR RssFile ,
and
.IR RssShmem .
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
.IP *
.IR RssAnon :
Size of resident anonymous memory.
.\" commit bf9683d6990589390b5178dafe8fd06808869293
(since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
.IP *
.IR RssFile :
Size of resident file mappings.
.\" commit bf9683d6990589390b5178dafe8fd06808869293
(since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
.IP *
.IR RssShmem :
Size of resident shared memory (includes System V shared memory,
@@ -2622,6 +2647,9 @@ mappings from
and shared anonymous mappings).
.\" commit bf9683d6990589390b5178dafe8fd06808869293
(since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
.IP *
.IR VmData ", " VmStk ", " VmExe :
Size of data, stack, and text segments.
@@ -2640,6 +2668,9 @@ Size of second-level page tables (added in Linux 4.0; removed in Linux 4.15).
.\" commit b084d4353ff99d824d3bc5a5c2c22c70b1fba722
Swapped-out virtual memory size by anonymous private pages;
shmem swap usage is not included (since Linux 2.6.34).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
.IP *
.IR HugetlbPages :
Size of hugetlb memory portions
base-commit: 92e4056a29156598d057045ad25f59d44fcd1bb5
--
2.28.0.1011.ga647a8990f-goog