From: "T.J. Mercier" <tjmercier@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
android-mm@google.com, yuzhao@google.com,
yangyifei03@kuaishou.com, cgroups@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during proactive reclaim"
Date: Wed, 24 Jan 2024 09:14:40 -0800
Message-ID: <CABdmKX3kPXAv18+AezoqAeb3ALCbLxcth+ZHv2jZGXsZG5C6hw@mail.gmail.com>
In-Reply-To: <Za_m9Pymh0y-lzgX@tiehlicka>
On Tue, Jan 23, 2024 at 8:19 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 23-01-24 05:58:05, T.J. Mercier wrote:
> > On Tue, Jan 23, 2024 at 1:33 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Sun 21-01-24 21:44:12, T.J. Mercier wrote:
> > > > This reverts commit 0388536ac29104a478c79b3869541524caec28eb.
> > > >
> > > > Proactive reclaim on the root cgroup is 10x slower after this patch when
> > > > MGLRU is enabled, and completion times for proactive reclaim on much
> > > > smaller non-root cgroups take ~30% longer (with or without MGLRU).
> > >
> > > What is the reclaim target in these pro-active reclaim requests?
> >
> > Two targets:
> > 1) /sys/fs/cgroup/memory.reclaim
> > 2) /sys/fs/cgroup/uid_0/memory.reclaim (a bunch of Android system services)
>
> OK, I was not really clear. I was curious about nr_to_reclaim.
>
> > Note that lru_gen_shrink_node is used for 1, but shrink_node_memcgs is
> > used for 2.
> >
> > The 10x comes from the rate of reclaim (~70k pages/sec vs ~6.6k
> > pages/sec) for 1. After this revert the root reclaim took only about
> > 10 seconds. Before the revert it was still running after about 3
> > minutes, using a core at 100% the whole time, and I was too impatient
> > to wait longer to record times for comparison.
> >
> > The 30% comes from the average of a few runs for 2:
> > Before revert:
> > $ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time
> > echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
>
> Ohh, so you want to reclaim all of it (resp. as much as possible).
>
Right, the main use case here is that we decide an application should
be backgrounded and its cgroup frozen. Before freezing, we reclaim as
much as possible so that the frozen processes' RAM use is as low as
possible while they're dormant.
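
To make that concrete, the sequence is roughly the following (a
simplified userspace sketch, not our actual code; write_cg_file is just
a local helper here, and the paths assume cgroup v2 at /sys/fs/cgroup
with the same uid_0 cgroup as in the commands quoted above):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a small string to a cgroup control file. */
static int write_cg_file(const char *path, const char *val)
{
        int ret = 0;
        int fd = open(path, O_WRONLY);

        if (fd < 0) {
                perror(path);
                return -1;
        }
        if (write(fd, val, strlen(val)) < 0) {
                perror(path);
                ret = -1;
        }
        close(fd);
        return ret;
}

int main(void)
{
        /*
         * An empty write (what the echo "" above ends up doing) asks the
         * kernel to reclaim as much as possible from the cgroup; the
         * write blocks until reclaim finishes or gives up.
         */
        write_cg_file("/sys/fs/cgroup/uid_0/memory.reclaim", "\n");

        /* Only then freeze every task in the cgroup. */
        write_cg_file("/sys/fs/cgroup/uid_0/cgroup.freeze", "1");
        return 0;
}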
> [...]
>
> > > > After the patch the reclaim rate is
> > > > consistently ~6.6k pages/sec due to the reduced nr_pages value causing
> > > > scan aborts as soon as SWAP_CLUSTER_MAX pages are reclaimed. The
> > > > proactive reclaim doesn't complete after several minutes because
> > > > try_to_free_mem_cgroup_pages is still capable of reclaiming pages in
> > > > tiny SWAP_CLUSTER_MAX page chunks and nr_retries is never decremented.
> > >
> > > I do not understand this part. How does a smaller reclaim target manage
> > > to have reclaimed > 0 while a larger one doesn't?
> >
> > They both are able to make progress. The main difference is that a
> > single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
> > after it reclaims nr_to_reclaim, and before it touches all memcgs. So
> > a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
> > pages with MGLRU. Without MGLRU the memcg walk is not aborted
> > immediately after nr_to_reclaim is reached, so a single call to
> > try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> > even when sc->nr_to_reclaim is 32. (I.e., MGLRU over-reclaims less.)
> > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
>
> OK, I do see how try_to_free_mem_cgroup_pages might over-reclaim, but I
> do not really follow how increasing the batch actually fixes the issue
> that there is always progress being made and therefore memory_reclaim
> takes ages to terminate?
Oh, because the page reclaim rate with a small batch is just much
lower than with a very large batch. Each time a batch completes we
have to restart reclaim from scratch before we get back to the point
where we're actually freeing/swapping pages again. That setup cost is
amortized over many more pages with a large batch size, but it appears
to be pretty significant for small batch sizes.
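
For reference, the shape of the loop behind a memory.reclaim write is
roughly this (a simplified sketch, not the exact memcontrol.c code, and
the local names are approximate). The batch passed to
try_to_free_mem_cgroup_pages() is the only part that 0388536ac291
changed, shown here in its post-patch form:

while (nr_reclaimed < nr_to_reclaim) {
        unsigned long batch, reclaimed;

        if (signal_pending(current))
                return -EINTR;

        /*
         * Before 0388536ac291: ask for everything that is still missing.
         * After it: never ask for more than SWAP_CLUSTER_MAX (32) pages,
         * so every trip through here pays the full scan setup cost to
         * free at most ~32 pages, and MGLRU aborts the memcg walk as
         * soon as those 32 pages are found.
         */
        batch = min(nr_to_reclaim - nr_reclaimed,
                    (unsigned long)SWAP_CLUSTER_MAX);

        reclaimed = try_to_free_mem_cgroup_pages(memcg, batch, GFP_KERNEL,
                                                 reclaim_options);

        /* Only a completely failed attempt consumes a retry... */
        if (!reclaimed && !nr_retries--)
                return -EAGAIN;

        /* ...so while pages keep trickling out, we keep looping. */
        nr_reclaimed += reclaimed;
}

With an effectively unbounded nr_to_reclaim (the echo "" case), that is
one full setup pass per ~32 reclaimed pages instead of a handful of
passes total, which lines up with the rate difference above.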