From: Hugh Dickins <hughd@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org, kernel-team@fb.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: vmstat: fix /proc/sys/vm/stat_refresh generating false warnings
Date: Tue, 23 Feb 2021 23:24:23 -0800 (PST) [thread overview]
Message-ID: <alpine.LSU.2.11.2102232210130.9202@eggly.anvils> (raw)
In-Reply-To: <20200806182555.d7a7fc9853b5a239ffe9f846@linux-foundation.org>
On Thu, 6 Aug 2020, Andrew Morton wrote:
> On Thu, 6 Aug 2020 16:38:04 -0700 Roman Gushchin <guro@fb.com> wrote:
August, yikes, I thought it was much more recent.
>
> > it seems that Hugh and me haven't reached a consensus here.
> > Can, you, please, not merge this patch into 5.9, so we would have
> > more time to find a solution, acceptable for all?
>
> No probs. I already had a big red asterisk on it ;)
I've a suspicion that Andrew might be tiring of his big red asterisk,
and wanting to unload
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix-2.patch
into 5.12.
I would prefer not, and reiterate my Nack: but no great harm will
befall the cosmos if he overrules that, and it does go through to
5.12 - I'll just want to revert it again later. And I do think a
more straightforward way of suppressing those warnings would be just
to delete the code that issues them, rather than brushing them under
a carpet of overtuning.
I've been running mmotm with the patch below (shown as sign of good
faith, and for you to try, but not ready to go yet) for a few months
now - overriding your max_drift, restoring nr_writeback and friends to
the same checking, fixing the obvious reason why nr_zone_write_pending
and nr_writeback are seen negative occasionally (interrupt interrupting
to decrement those stats before they have even been incremented).
Two big BUTs (if not asterisks): since adding that patch, I have
usually forgotten all about it, so forgotten to run the script that
echoes /proc/sys/vm/stat_refresh at odd intervals while under load:
so have less data than I'd intended by now. And secondly (and I've
just checked again this evening) I do still see nr_zone_write_pending
and nr_writeback occasionally caught negative while under load. So,
there's something more at play, perhaps the predicted Gushchin Effect
(but wouldn't they go together if so? I've only seen them separately),
or maybe something else, I don't know.
Those are the only stats I've seen caught negative, but I don't have
CMA configured at all. You mention nr_free_cma as the only(?) other
stat you've seen negative, that of course I won't see, but looking
at the source I now notice that NR_FREE_CMA_PAGES is incremented
and decremented according to page migratetype...
... internally we have another stat that's incremented and decremented
according to page migratetype, and that one has been seen negative too:
isn't page migratetype something that usually stays the same, but
sometimes the migratetype of the page's block can change, even while
some pages of it are allocated? Not a stable basis for maintaining
stats, though won't matter much if they are only for display.
vmstat_refresh could just exempt nr_zone_write_pending, nr_writeback
and nr_free_cma from warnings, if we cannot find a fix to them: but
I see no reason to suppress warnings on all the other vmstats.
The patch I've been testing with:
--- mmotm/mm/page-writeback.c 2021-02-14 14:32:24.000000000 -0800
+++ hughd/mm/page-writeback.c 2021-02-20 18:01:11.264162616 -0800
@@ -2769,6 +2769,13 @@ int __test_set_page_writeback(struct pag
int ret, access_ret;
lock_page_memcg(page);
+ /*
+ * Increment counts in advance, so that they will not go negative
+ * if test_clear_page_writeback() comes in to decrement them.
+ */
+ inc_lruvec_page_state(page, NR_WRITEBACK);
+ inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
+
if (mapping && mapping_use_writeback_tags(mapping)) {
XA_STATE(xas, &mapping->i_pages, page_index(page));
struct inode *inode = mapping->host;
@@ -2804,9 +2811,14 @@ int __test_set_page_writeback(struct pag
} else {
ret = TestSetPageWriteback(page);
}
- if (!ret) {
- inc_lruvec_page_state(page, NR_WRITEBACK);
- inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
+
+ if (WARN_ON_ONCE(ret)) {
+ /*
+ * Correct counts in retrospect, if PageWriteback was already
+ * set; but does any filesystem ever allow this to happen?
+ */
+ dec_lruvec_page_state(page, NR_WRITEBACK);
+ dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
}
unlock_page_memcg(page);
access_ret = arch_make_page_accessible(page);
--- mmotm/mm/vmstat.c 2021-02-20 17:59:44.838171232 -0800
+++ hughd/mm/vmstat.c 2021-02-20 18:01:11.272162661 -0800
@@ -1865,7 +1865,7 @@ int vmstat_refresh(struct ctl_table *tab
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
val = atomic_long_read(&vm_zone_stat[i]);
- if (val < -max_drift) {
+ if (val < 0) {
pr_warn("%s: %s %ld\n",
__func__, zone_stat_name(i), val);
err = -EINVAL;
@@ -1874,13 +1874,21 @@ int vmstat_refresh(struct ctl_table *tab
#ifdef CONFIG_NUMA
for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) {
val = atomic_long_read(&vm_numa_stat[i]);
- if (val < -max_drift) {
+ if (val < 0) {
pr_warn("%s: %s %ld\n",
__func__, numa_stat_name(i), val);
err = -EINVAL;
}
}
#endif
+ for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+ val = atomic_long_read(&vm_node_stat[i]);
+ if (val < 0) {
+ pr_warn("%s: %s %ld\n",
+ __func__, node_stat_name(i), val);
+ err = -EINVAL;
+ }
+ }
if (err)
return err;
if (write)
next prev parent reply other threads:[~2021-02-24 7:24 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-14 17:39 Roman Gushchin
2020-07-20 8:03 ` Michal Hocko
2020-07-20 20:20 ` Roman Gushchin
2020-07-30 3:45 ` Hugh Dickins
2020-07-30 16:23 ` Roman Gushchin
2020-07-31 4:06 ` Hugh Dickins
2020-08-01 1:18 ` Roman Gushchin
2020-08-01 2:17 ` Hugh Dickins
2020-08-04 0:40 ` Roman Gushchin
2020-08-06 3:01 ` Hugh Dickins
2020-08-06 3:51 ` Roman Gushchin
2020-08-06 16:41 ` Hugh Dickins
2020-08-06 23:38 ` Roman Gushchin
2020-08-07 0:16 ` Hugh Dickins
2020-08-07 1:25 ` Andrew Morton
2021-02-24 7:24 ` Hugh Dickins [this message]
2021-02-25 1:53 ` Roman Gushchin
2021-02-25 17:21 ` Hugh Dickins
2021-02-25 18:06 ` Roman Gushchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.11.2102232210130.9202@eggly.anvils \
--to=hughd@google.com \
--cc=akpm@linux-foundation.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox