From: Johannes Weiner <hannes@cmpxchg.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Yang Shi <shy828301@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Eric Bergen <ebergen@meta.com>
Subject: Re: [PATCH] mm: vmscan: split khugepaged stats from direct reclaim stats
Date: Thu, 27 Oct 2022 10:15:00 -0400
Message-ID: <Y1qSZHK/U0SpNqNa@cmpxchg.org>
In-Reply-To: <CAJD7tkb5y9oqgVauVPiS0KbiL2Wnsu7jhK7Q44oUBZzBXwKUYA@mail.gmail.com>

On Wed, Oct 26, 2022 at 07:41:21PM -0700, Yosry Ahmed wrote:
> On Wed, Oct 26, 2022 at 1:51 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, Oct 26, 2022 at 10:32 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Tue, Oct 25, 2022 at 02:53:01PM -0700, Yang Shi wrote:
> > > > On Tue, Oct 25, 2022 at 1:54 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > >
> > > > > On Tue, Oct 25, 2022 at 12:40:15PM -0700, Yang Shi wrote:
> > > > > > On Tue, Oct 25, 2022 at 10:05 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > > >
> > > > > > > Direct reclaim stats are useful for identifying a potential source for
> > > > > > > application latency, as well as spotting issues with kswapd. However,
> > > > > > > khugepaged currently distorts the picture: as a kernel thread it
> > > > > > > doesn't impose allocation latencies on userspace, and it explicitly
> > > > > > > opts out of kswapd reclaim. Its activity showing up in the direct
> > > > > > > reclaim stats is misleading. Counting it as kswapd reclaim could also
> > > > > > > cause confusion when trying to understand actual kswapd behavior.
> > > > > > >
> > > > > > > Break out khugepaged from the direct reclaim counters into new
> > > > > > > pgsteal_khugepaged, pgdemote_khugepaged, pgscan_khugepaged counters.
> > > > > > >
> > > > > > > Test with a huge executable (CONFIG_READ_ONLY_THP_FOR_FS):
> > > > > > >
> > > > > > > pgsteal_kswapd 1342185
> > > > > > > pgsteal_direct 0
> > > > > > > pgsteal_khugepaged 3623
> > > > > > > pgscan_kswapd 1345025
> > > > > > > pgscan_direct 0
> > > > > > > pgscan_khugepaged 3623
> > > > > >
> > > > > > There are other kernel threads or work items that may allocate
> > > > > > memory and then trigger memory reclaim; they may have similar
> > > > > > problems, and someone may try to add a new stat for each of them.
> > > > > > So how about making the stats more general, for example calling
> > > > > > them "pg{steal|scan}_kthread"?
> > > > >
> > > > > I'm not convinced that's a good idea.
> > > > >
> > > > > Can you generally say that userspace isn't indirectly waiting for one
> > > > > of those allocating threads? With khugepaged, we know.
> > > >
> > > > AFAIK, ksm may do slab allocation with __GFP_DIRECT_RECLAIM.
> > >
> > > Right, but ksm also uses __GFP_KSWAPD_RECLAIM. So while userspace
> > > isn't directly waiting for ksm, when ksm enters direct reclaim it's
> > > because kswapd failed. This is of interest to kernel developers.
> > > Userspace will likely see direct reclaim in that scenario as well, so
> > > the ksm direct reclaim counts aren't liable to confuse users.
> > >
> > > Khugepaged on the other hand will *always* reclaim directly, even if
> > > there is no memory pressure or kswapd failure. The direct reclaim
> > > counts there are misleading to both developers and users.
> > >
> > > What it really should be is pgscan_nokswapd_nouserprocesswaiting, but
> > > that just seems kind of long ;-)
> > >
> > > I'm also not sure anybody but khugepaged is doing direct reclaim
> > > without kswapd reclaim. It seems unlikely we'll get more of those.
> >
> > IIUC you don't actually care how many direct reclaims are triggered
> > by khugepaged; you would like to separate the direct reclaim stats
> > that are triggered directly by userspace actions, which may stall
> > userspace, from those that aren't, which don't. If so, it doesn't
> > sound that important to distinguish whether the direct reclaim is
> > triggered by khugepaged or by some other kernel thread, even though
> > other kthreads are not liable to confuse users IMHO.

I feel like I've sufficiently explained my reason for wanting to
separate out the __GFP_KSWAPD_RECLAIM special case from other sites.
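
To make the proposed split concrete: it boils down to picking a stat
bucket based on who is doing the reclaiming. A minimal sketch in the
style of mm/vmscan.c, assuming a current_is_khugepaged() helper next
to the existing current_is_kswapd(), and assuming the PGSTEAL_* items
are laid out so the offset arithmetic works (both assumptions are
illustrative, not necessarily what the patch does):

	/*
	 * Sketch: translate the current reclaimer into an offset from
	 * the kswapd counters. Anything that is neither kswapd nor
	 * khugepaged is genuine direct reclaim that can stall userspace.
	 */
	static int reclaimer_offset(void)
	{
		if (current_is_kswapd())
			return 0;
		if (current_is_khugepaged())
			return PGSTEAL_KHUGEPAGED - PGSTEAL_KSWAPD;
		return PGSTEAL_DIRECT - PGSTEAL_KSWAPD;
	}

	/* at the accounting site: */
	count_vm_events(PGSTEAL_KSWAPD + reclaimer_offset(), nr_reclaimed);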

> My 2c: if we care about direct reclaim as in reclaim that may stall
> user space application allocations, then there are other reclaim
> contexts that may pollute the direct reclaim stats. For instance,
> proactive reclaim, or reclaim done by writing a limit lower than the
> current usage to memory.max or memory.high; neither is done in the
> context of the application allocating memory.
>
> At Google, we have some internal direct reclaim memcg statistics, and
> the way we handle this is by passing a flag from such contexts to
> try_to_free_mem_cgroup_pages() in the reclaim_options arg. This flag
> is echoed into a scan_control bit, which we then use to filter out
> direct reclaim operations that actually cause latencies in user space
> allocations.
> 
> Perhaps something similar might be more generic here? I am not sure
> what context khugepaged reclaims memory from, but I think it's not a
> memcg context, so maybe we want to generalize the reclaim_options arg
> to try_to_free_pages() or whatever interface khugepaged uses to free
> memory.
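
A rough sketch of that pattern, modeled on the reclaim_options
argument that try_to_free_mem_cgroup_pages() already takes (the
PROACTIVE flag name, the sc->proactive bit, and the stat filtering
below are assumptions for illustration, not a description of the
internal or upstream code):

	/* a non-allocating caller, e.g. proactive reclaim: */
	try_to_free_mem_cgroup_pages(memcg, nr_pages, GFP_KERNEL,
				     MEMCG_RECLAIM_MAY_SWAP |
				     MEMCG_RECLAIM_PROACTIVE);

	/* echoed into struct scan_control when reclaim is set up: */
	sc->proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE);

	/*
	 * consulted at the accounting site, so that proactive churn
	 * stays out of the userspace-stalling direct reclaim counters:
	 */
	if (!sc->proactive)
		count_memcg_events(memcg, PGSCAN_DIRECT, nr_scanned);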

So at the /proc/vmstat level, I'm not sure it matters much because it
doesn't count any cgroup_reclaim() activity.

But at the cgroup level, it sure would be nice to split out proactive
reclaim churn: both to keep it from polluting the direct reclaim
counts, and to *know* how much work proactive reclaim is doing.

Do you have separate counters for this?


Thread overview: 15+ messages
2022-10-25 17:05 Johannes Weiner
2022-10-25 17:16 ` Matthew Wilcox
2022-10-25 20:43   ` Johannes Weiner
2022-10-25 19:40 ` Yang Shi
2022-10-25 20:54   ` Johannes Weiner
2022-10-25 21:53     ` Yang Shi
2022-10-26 17:32       ` Johannes Weiner
2022-10-26 20:51         ` Yang Shi
2022-10-27  2:41           ` Yosry Ahmed
2022-10-27 14:15             ` Johannes Weiner [this message]
2022-10-27 20:43               ` Yosry Ahmed
2022-10-28 14:39                 ` Johannes Weiner
2022-10-28 17:41                   ` Yosry Ahmed
2022-10-31 16:00                     ` Johannes Weiner
2022-10-31 16:46                       ` Yosry Ahmed
