[PATCH 06/10] mm: /proc/sys/vm/stat_refresh to force vmstat update

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Yang Shi <yang.shi@linaro.org>, Ning Qu <quning@gmail.com>,
	Christoph Lameter <cl@linux.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 06/10] mm: /proc/sys/vm/stat_refresh to force vmstat update
Date: Tue, 5 Apr 2016 13:49:36 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.1604051348010.5965@eggly.anvils> (raw)
In-Reply-To: <alpine.LSU.2.11.1604051329480.5965@eggly.anvils>

Provide /proc/sys/vm/stat_refresh to force an immediate update of
per-cpu into global vmstats: useful to avoid a sleep(2) or whatever
before checking counts when testing.  Originally added to work around
a bug which left counts stranded indefinitely on a cpu going idle
(an inaccuracy magnified when small below-batch numbers represent
"huge" amounts of memory), but I believe that bug is now fixed:
nonetheless, this is still a useful knob.

Its schedule_on_each_cpu() is probably too expensive just to fold
into reading /proc/meminfo itself: give this mode 0600 to prevent
abuse.  Allow a write or a read to do the same: nothing to read,
but "grep -h Shmem /proc/sys/vm/stat_refresh /proc/meminfo" is
convenient.  Oh, and since global_page_state() itself is careful
to disguise any underflow as 0, hack in an "Invalid argument" and
pr_warn() if a counter is negative after the refresh - this helped
to fix a misaccounting of NR_ISOLATED_FILE in my migration code.

But on recent kernels, I find that NR_ALLOC_BATCH and NR_PAGES_SCANNED
often go negative some of the time. I have not yet worked out why, but
have no evidence that it's actually harmful.  Punt for the moment by
just ignoring the anomaly on those.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 Documentation/sysctl/vm.txt |   14 ++++++++
 include/linux/vmstat.h      |    4 ++
 kernel/sysctl.c             |    7 ++++
 mm/vmstat.c                 |   58 ++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+)

--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -57,6 +57,7 @@ Currently, these files are in /proc/sys/
 - panic_on_oom
 - percpu_pagelist_fraction
 - stat_interval
+- stat_refresh
 - swappiness
 - user_reserve_kbytes
 - vfs_cache_pressure
@@ -754,6 +755,19 @@ is 1 second.
 
 ==============================================================
 
+stat_refresh
+
+Any read or write (by root only) flushes all the per-cpu vm statistics
+into their global totals, for more accurate reports when testing
+e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
+
+As a side-effect, it also checks for negative totals (elsewhere reported
+as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
+(At time of writing, a few stats are known sometimes to be found negative,
+with no ill effects: errors and warnings on these stats are suppressed.)
+
+==============================================================
+
 swappiness
 
 This control is used to define how aggressive the kernel will swap
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -193,6 +193,10 @@ void quiet_vmstat(void);
 void cpu_vm_stats_fold(int cpu);
 void refresh_zone_stat_thresholds(void);
 
+struct ctl_table;
+int vmstat_refresh(struct ctl_table *, int write,
+		   void __user *buffer, size_t *lenp, loff_t *ppos);
+
 void drain_zonestat(struct zone *zone, struct per_cpu_pageset *);
 
 int calculate_pressure_threshold(struct zone *zone);
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1509,6 +1509,13 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
+	{
+		.procname	= "stat_refresh",
+		.data		= NULL,
+		.maxlen		= 0,
+		.mode		= 0600,
+		.proc_handler	= vmstat_refresh,
+	},
 #endif
 #ifdef CONFIG_MMU
 	{
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1378,6 +1378,64 @@ static DEFINE_PER_CPU(struct delayed_wor
 int sysctl_stat_interval __read_mostly = HZ;
 static cpumask_var_t cpu_stat_off;
 
+static void refresh_vm_stats(struct work_struct *work)
+{
+	refresh_cpu_vm_stats(true);
+}
+
+int vmstat_refresh(struct ctl_table *table, int write,
+		   void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	long val;
+	int err;
+	int i;
+
+	/*
+	 * The regular update, every sysctl_stat_interval, may come later
+	 * than expected: leaving a significant amount in per_cpu buckets.
+	 * This is particularly misleading when checking a quantity of HUGE
+	 * pages, immediately after running a test.  /proc/sys/vm/stat_refresh,
+	 * which can equally be echo'ed to or cat'ted from (by root),
+	 * can be used to update the stats just before reading them.
+	 *
+	 * Oh, and since global_page_state() etc. are so careful to hide
+	 * transiently negative values, report an error here if any of
+	 * the stats is negative, so we know to go looking for imbalance.
+	 */
+	err = schedule_on_each_cpu(refresh_vm_stats);
+	if (err)
+		return err;
+	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
+		val = atomic_long_read(&vm_stat[i]);
+		if (val < 0) {
+			switch (i) {
+			case NR_ALLOC_BATCH:
+			case NR_PAGES_SCANNED:
+				/*
+				 * These are often seen to go negative in
+				 * recent kernels, but not to go permanently
+				 * negative.  Whilst it would be nicer not to
+				 * have exceptions, rooting them out would be
+				 * another task, of rather low priority.
+				 */
+				break;
+			default:
+				pr_warn("%s: %s %ld\n",
+					__func__, vmstat_text[i], val);
+				err = -EINVAL;
+				break;
+			}
+		}
+	}
+	if (err)
+		return err;
+	if (write)
+		*ppos += *lenp;
+	else
+		*lenp = 0;
+	return 0;
+}
+
 static void vmstat_update(struct work_struct *w)
 {
 	if (refresh_cpu_vm_stats(true)) {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2016-04-05 20:49 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-05 20:37 [PATCH 00/10] mm: easy preliminaries to THPagecache Hugh Dickins
2016-04-05 20:40 ` [PATCH 01/10] mm: update_lru_size warn and reset bad lru_size Hugh Dickins
2016-04-14 11:56   ` Vlastimil Babka
2016-04-05 20:42 ` [PATCH 02/10] mm: update_lru_size do the __mod_zone_page_state Hugh Dickins
2016-04-05 20:44 ` [PATCH 03/10] mm: use __SetPageSwapBacked and dont ClearPageSwapBacked Hugh Dickins
2016-04-06  9:52   ` Mel Gorman
2016-04-05 20:45 ` [PATCH 04/10] tmpfs: preliminary minor tidyups Hugh Dickins
2016-04-05 20:47 ` [PATCH 05/10] tmpfs: mem_cgroup charge fault to vm_mm not current mm Hugh Dickins
2016-04-05 20:49 ` Hugh Dickins [this message]
2016-04-05 20:51 ` [PATCH 07/10] huge mm: move_huge_pmd does not need new_vma Hugh Dickins
2016-04-11 10:28   ` Kirill A. Shutemov
2016-04-05 20:52 ` [PATCH 08/10] huge pagecache: extend mremap pmd rmap lockout to files Hugh Dickins
2016-04-11 10:32   ` Kirill A. Shutemov
2016-04-05 20:55 ` [PATCH 09/10] huge pagecache: mmap_sem is unlocked when truncation splits pmd Hugh Dickins
2016-04-11 10:35   ` Kirill A. Shutemov
2016-04-14 17:39   ` Matthew Wilcox
2016-04-16  4:55     ` Andrew Morton
2016-04-05 21:02 ` [PATCH 10/10] arch: fix has_transparent_hugepage() Hugh Dickins
2016-04-05 23:25   ` David Miller
2016-04-06  3:22   ` Vineet Gupta
2016-04-06  6:58   ` Ingo Molnar
2016-04-06 12:29     ` Chris Metcalf
2016-04-06 21:58       ` Ingo Molnar
2016-04-06 11:56   ` Gerald Schaefer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1604051348010.5965@eggly.anvils \
    --to=hughd@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=cl@linux.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=quning@gmail.com \
    --cc=yang.shi@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox