From: Michal Hocko <mhocko@suse.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andi Kleen <ak@linux.intel.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
Jesse Barnes <jsbarnes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Jonathan Corbet <corbet@lwn.net>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mgorman@suse.de>,
Michael Larabel <Michael@michaellarabel.com>,
Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
Will Deacon <will@kernel.org>, Ying Huang <ying.huang@intel.com>,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
page-reclaim@google.com, x86@kernel.org,
Konstantin Kharlamov <Hi-Angel@yandex.ru>
Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging
Date: Mon, 24 Jan 2022 15:01:47 +0100
Message-ID: <Ye6xS6xUD1SORdHJ@dhcp22.suse.cz>
In-Reply-To: <Ye3IfmZGwNYSCgV6@google.com>
On Sun 23-01-22 14:28:30, Yu Zhao wrote:
> On Wed, Jan 19, 2022 at 10:42:47AM +0100, Michal Hocko wrote:
> > On Wed 19-01-22 00:04:10, Yu Zhao wrote:
> > > On Mon, Jan 10, 2022 at 11:54:42AM +0100, Michal Hocko wrote:
> > > > On Sun 09-01-22 21:47:57, Yu Zhao wrote:
> > > > > On Fri, Jan 07, 2022 at 03:44:50PM +0100, Michal Hocko wrote:
> > > > > > On Tue 04-01-22 13:22:25, Yu Zhao wrote:
> > > > > > [...]
> > > > > > > > +static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_mm_walk *walk)
> > > > > > > > +{
> > > > > > > > +	static const struct mm_walk_ops mm_walk_ops = {
> > > > > > > > +		.test_walk = should_skip_vma,
> > > > > > > > +		.p4d_entry = walk_pud_range,
> > > > > > > > +	};
> > > > > > > > +
> > > > > > > > +	int err;
> > > > > > > > +#ifdef CONFIG_MEMCG
> > > > > > > > +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > > > > > > > +#endif
> > > > > > > > +
> > > > > > > > +	walk->next_addr = FIRST_USER_ADDRESS;
> > > > > > > > +
> > > > > > > > +	do {
> > > > > > > > +		unsigned long start = walk->next_addr;
> > > > > > > > +		unsigned long end = mm->highest_vm_end;
> > > > > > > > +
> > > > > > > > +		err = -EBUSY;
> > > > > > > > +
> > > > > > > > +		rcu_read_lock();
> > > > > > > > +#ifdef CONFIG_MEMCG
> > > > > > > > +		if (memcg && atomic_read(&memcg->moving_account))
> > > > > > > > +			goto contended;
> > > > > > > > +#endif
> > > > > > > > +		if (!mmap_read_trylock(mm))
> > > > > > > > +			goto contended;
> > > > > >
> > > > > > Have you evaluated the behavior under mmap_sem contention? I mean what
> > > > > > would be an effect of some mms being excluded from the walk? This path
> > > > > > is called from direct reclaim and we do allocate with exclusive mmap_sem
> > > > > > IIRC and the trylock can fail in a presence of pending writer if I am
> > > > > > not mistaken so even the read lock holder (e.g. an allocation from the #PF)
> > > > > > can bypass the walk.
> > > > >
> > > > > You are right. Here it must be a trylock; otherwise it can deadlock.
> > > >
> > > > Yeah, this is clear.
> > > >
> > > > > I think there might be a misunderstanding: the aging doesn't
> > > > > exclusively rely on page table walks to gather the accessed bit. It
> > > > > prefers page table walks but it can also fallback to the rmap-based
> > > > > function, i.e., lru_gen_look_around(), which only gathers the accessed
> > > > > bit from at most 64 PTEs and therefore is less efficient. But it still
> > > > > retains about 80% of the performance gains.
> > > >
> > > > I have to say that I really have a hard time understanding the runtime
> > > > behavior depending on that interaction. How does the reclaim behave when
> > > > the virtual scan is enabled, partially enabled and almost completely
> > > > disabled due to different constraints? I do not see any such
> > > > evaluation described in the changelogs and I consider this to be rather
> > > > important information to judge the overall behavior.
> > >
> > > It doesn't have (partially) enabled/disabled states nor does its
> > > behavior change with different reclaim constraints. Having either
> > > would make its design too complex to implement or benchmark.
> >
> > Let me clarify. By "partially enabled" I really meant behavior depending
> > on runtime conditions. Say mmap_sem cannot be locked for half of the scanned
> > tasks and/or the allocation for the mm walker fails due to lack of memory.
> > How is this going to affect reclaim efficiency?
>
> Understood. This is not only possible -- it's the default for our ARM
> hardware that doesn't support the accessed bit, i.e., CPUs that don't
> automatically set the accessed bit.
>
> In try_to_inc_max_seq(), we have:
>     /*
>      * If the hardware doesn't automatically set the accessed bit, fallback
>      * to lru_gen_look_around(), which only clears the accessed bit in a
>      * handful of PTEs. Spreading the work out over a period of time usually
>      * is less efficient, but it avoids bursty page faults.
>      */
>     if the accessed bit is not supported
>         return
>
>     if alloc_mm_walk() fails
>         return
>
>     walk_mm()
>         if mmap_sem contended
>             return
>
>         scan page tables
>
> We have a microbenchmark that specifically measures this worst case
> scenario by entirely disabling page table scanning. Its results showed
> that this still retains more than 90% of the optimal performance. I'll
> share this microbenchmark in another email when answering Barry's
> questions regarding the accessed bit.
>
> Our profiling infra also indirectly confirms this: it collects data
> from real users running on hardware with and without the accessed
> bit. Users running on hardware without the accessed bit indeed suffer
> a small performance degradation, compared with users running on
> hardware with it. But they still benefit almost as much, compared with
> users running on the same hardware but without MGLRU.
This is definitely good information to have in the cover letter.
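
For the cover letter, the fallback order quoted above could be
summarized roughly as follows. This is only a userspace sketch of the
decision order; the enum, the struct and choose_aging_path() are names
made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for how a single aging attempt ends up. */
enum aging_path {
	AGING_PAGE_TABLE_WALK,		/* page tables are scanned */
	AGING_FALLBACK_NO_HW_AF,	/* CPU does not set the accessed bit */
	AGING_FALLBACK_NO_MEM,		/* walker allocation failed */
	AGING_FALLBACK_CONTENDED,	/* mmap_sem trylock failed */
};

/* Hypothetical snapshot of the three runtime conditions in the thread. */
struct aging_conditions {
	bool hw_sets_accessed_bit;
	bool walk_alloc_ok;
	bool mmap_trylock_ok;
};

/*
 * Mirrors the order of the quoted pseudocode: every early return means
 * the rmap-based lru_gen_look_around() is the only source of accessed
 * bit information for this attempt.
 */
static enum aging_path choose_aging_path(const struct aging_conditions *c)
{
	assert(c != NULL);

	if (!c->hw_sets_accessed_bit)
		return AGING_FALLBACK_NO_HW_AF;
	if (!c->walk_alloc_ok)
		return AGING_FALLBACK_NO_MEM;
	if (!c->mmap_trylock_ok)
		return AGING_FALLBACK_CONTENDED;
	return AGING_PAGE_TABLE_WALK;
}
```

The point of spelling it out is that only the last case ever scans
page tables; the other three all land on the same fallback.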
> > How does a user/admin
> > know that the memory reclaim is in a "degraded" mode because of the
> > contention?
>
> As we previously discussed here:
> https://lore.kernel.org/linux-mm/Ydu6fXg2FmrseQOn@google.com/
> there used to be a counter measuring the contention, and it was deemed
> unnecessary and removed in v4. But I don't have a problem if we want
> to revive it.
Well, a counter might be rather tricky, but a few trace points would make
some sense to me.
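
To make the idea a bit more concrete, here is a userspace stand-in for
what such a trace point could record. It is not the kernel
TRACE_EVENT() interface; the event name, the fields and both helper
functions are hypothetical and only show the kind of information
(which mm, why the walk was skipped) that would be useful.

```c
#include <stdio.h>

/* Hypothetical reasons for skipping the page table walk. */
enum skip_reason {
	SKIP_NO_HW_ACCESSED_BIT,
	SKIP_ALLOC_FAILED,
	SKIP_MMAP_CONTENDED,
};

static const char *skip_reason_name(enum skip_reason reason)
{
	switch (reason) {
	case SKIP_NO_HW_ACCESSED_BIT:
		return "no_hw_accessed_bit";
	case SKIP_ALLOC_FAILED:
		return "alloc_failed";
	case SKIP_MMAP_CONTENDED:
		return "mmap_contended";
	}
	return "unknown";
}

/*
 * Format one event line into buf. Returns what snprintf() returns,
 * i.e. the length the full line needs, not counting the trailing NUL.
 */
static int format_skip_event(char *buf, size_t len, int pid,
			     enum skip_reason reason)
{
	return snprintf(buf, len, "lru_gen_walk_skipped: pid=%d reason=%s",
			pid, skip_reason_name(reason));
}
```

With one event per skipped walk, an admin could tell from the trace
whether reclaim is mostly running on the fallback, and for which
reason, without any permanent counter.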
--
Michal Hocko
SUSE Labs