Re: [PATCH v6 6/9] mm: multigenerational lru: aging

From: Michal Hocko <mhocko@suse.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andi Kleen <ak@linux.intel.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
	Jesse Barnes <jsbarnes@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	Michael Larabel <Michael@michaellarabel.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Will Deacon <will@kernel.org>, Ying Huang <ying.huang@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, x86@kernel.org,
	Konstantin Kharlamov <Hi-Angel@yandex.ru>
Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging
Date: Mon, 24 Jan 2022 15:01:47 +0100
Message-ID: <Ye6xS6xUD1SORdHJ@dhcp22.suse.cz>
In-Reply-To: <Ye3IfmZGwNYSCgV6@google.com>

On Sun 23-01-22 14:28:30, Yu Zhao wrote:
> On Wed, Jan 19, 2022 at 10:42:47AM +0100, Michal Hocko wrote:
> > On Wed 19-01-22 00:04:10, Yu Zhao wrote:
> > > On Mon, Jan 10, 2022 at 11:54:42AM +0100, Michal Hocko wrote:
> > > > On Sun 09-01-22 21:47:57, Yu Zhao wrote:
> > > > > On Fri, Jan 07, 2022 at 03:44:50PM +0100, Michal Hocko wrote:
> > > > > > On Tue 04-01-22 13:22:25, Yu Zhao wrote:
> > > > > > [...]
> > > > > > > +static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_mm_walk *walk)
> > > > > > > +{
> > > > > > > +	static const struct mm_walk_ops mm_walk_ops = {
> > > > > > > +		.test_walk = should_skip_vma,
> > > > > > > +		.p4d_entry = walk_pud_range,
> > > > > > > +	};
> > > > > > > +
> > > > > > > +	int err;
> > > > > > > +#ifdef CONFIG_MEMCG
> > > > > > > +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > > > > > > +#endif
> > > > > > > +
> > > > > > > +	walk->next_addr = FIRST_USER_ADDRESS;
> > > > > > > +
> > > > > > > +	do {
> > > > > > > +		unsigned long start = walk->next_addr;
> > > > > > > +		unsigned long end = mm->highest_vm_end;
> > > > > > > +
> > > > > > > +		err = -EBUSY;
> > > > > > > +
> > > > > > > +		rcu_read_lock();
> > > > > > > +#ifdef CONFIG_MEMCG
> > > > > > > +		if (memcg && atomic_read(&memcg->moving_account))
> > > > > > > +			goto contended;
> > > > > > > +#endif
> > > > > > > +		if (!mmap_read_trylock(mm))
> > > > > > > +			goto contended;
> > > > > > 
> > > > > > > Have you evaluated the behavior under mmap_sem contention? I mean, what
> > > > > > > would be the effect of some mms being excluded from the walk? This path
> > > > > > > is called from direct reclaim, and we do allocate with exclusive mmap_sem
> > > > > > > IIRC, and the trylock can fail in the presence of a pending writer if I am
> > > > > > > not mistaken, so even the read lock holder (e.g. an allocation from the #PF)
> > > > > > > can bypass the walk.
> > > > > 
> > > > > You are right. Here it must be a trylock; otherwise it can deadlock.
> > > > 
> > > > Yeah, this is clear.
> > > > 
> > > > > I think there might be a misunderstanding: the aging doesn't
> > > > > exclusively rely on page table walks to gather the accessed bit. It
> > > > > prefers page table walks but it can also fallback to the rmap-based
> > > > > function, i.e., lru_gen_look_around(), which only gathers the accessed
> > > > > bit from at most 64 PTEs and therefore is less efficient. But it still
> > > > > retains about 80% of the performance gains.
> > > > 
> > > > > I have to say that I really have a hard time understanding the runtime
> > > > > behavior depending on that interaction. How does the reclaim behave when
> > > > > the virtual scan is enabled, partially enabled and almost completely
> > > > > disabled due to different constraints? I do not see any such
> > > > > evaluation described in the changelogs and I consider this to be rather
> > > > > important information to judge the overall behavior.
> > > 
> > > It doesn't have (partially) enabled/disabled states nor does its
> > > behavior change with different reclaim constraints. Having either
> > > would make its design too complex to implement or benchmark.
> > 
> > > Let me clarify. By "partially enabled" I really meant behavior depending
> > > on runtime conditions. Say mmap_sem cannot be locked for half of scanned
> > > tasks and/or allocation for the mm walker fails due to lack of memory.
> > > How is this going to affect reclaim efficiency?
> 
> Understood. This is not only possible -- it's the default for our ARM
> hardware that doesn't support the accessed bit, i.e., CPUs that don't
> automatically set the accessed bit.
> 
> In try_to_inc_max_seq(), we have:
>     /*
>      * If the hardware doesn't automatically set the accessed bit, fallback
>      * to lru_gen_look_around(), which only clears the accessed bit in a
>      * handful of PTEs. Spreading the work out over a period of time usually
>      * is less efficient, but it avoids bursty page faults.
>      */
>     if the accessed bit is not supported
>         return
> 
>     if alloc_mm_walk() fails
>         return
> 
>     walk_mm()
>         if mmap_sem contended
>             return
> 
>         scan page tables
> 
> We have a microbenchmark that specifically measures this worst case
> scenario by entirely disabling page table scanning. Its results showed
> that this still retains more than 90% of the optimal performance. I'll
> share this microbenchmark in another email when answering Barry's
> questions regarding the accessed bit.
> 
> Our profiling infra also indirectly confirms this: it collects data
> from real users running on hardware with and without the accessed
> bit. Users running on hardware without the accessed bit indeed suffer
> a small performance degradation, compared with users running on
> hardware with it. But they still benefit almost as much, compared with
> users running on the same hardware but without MGLRU.

This is definitely good information to have in the cover letter.
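
For reference, the fallback chain in the quoted pseudocode can be modeled
in plain userspace C. This is only a sketch under assumptions: the enum,
the struct and choose_aging_path() are hypothetical names made up for
illustration, not code from the patch. The three booleans stand in for
the accessed bit check, alloc_mm_walk() and the mmap_sem trylock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the outcomes listed in the quoted pseudocode. */
enum aging_path {
	AGING_PAGE_TABLE_WALK,		/* page tables were scanned */
	AGING_NO_HW_ACCESSED_BIT,	/* hardware does not set the accessed bit */
	AGING_WALK_ALLOC_FAILED,	/* stands in for alloc_mm_walk() failing */
	AGING_MMAP_LOCK_CONTENDED,	/* stands in for the trylock failing */
};

/* Runtime conditions, reduced to booleans for illustration only. */
struct aging_conditions {
	bool hw_sets_accessed_bit;
	bool walk_alloc_ok;
	bool mmap_trylock_ok;
};

/* Check the conditions in the same order as the quoted pseudocode. */
static enum aging_path choose_aging_path(const struct aging_conditions *c)
{
	if (!c->hw_sets_accessed_bit)
		return AGING_NO_HW_ACCESSED_BIT;
	if (!c->walk_alloc_ok)
		return AGING_WALK_ALLOC_FAILED;
	if (!c->mmap_trylock_ok)
		return AGING_MMAP_LOCK_CONTENDED;
	return AGING_PAGE_TABLE_WALK;
}

/* Any outcome other than a full walk relies on the rmap-based fallback. */
static bool relies_on_look_around(enum aging_path path)
{
	return path != AGING_PAGE_TABLE_WALK;
}
```

With hw_sets_accessed_bit and walk_alloc_ok set but mmap_trylock_ok
clear, the model reports AGING_MMAP_LOCK_CONTENDED, i.e. that mm is left
to lru_gen_look_around() for this round.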

> > How does a user/admin
> > know that the memory reclaim is in a "degraded" mode because of the
> > contention?
> 
> As we previously discussed here:
> https://lore.kernel.org/linux-mm/Ydu6fXg2FmrseQOn@google.com/
> there used to be a counter measuring the contention, and it was deemed
> unnecessary and removed in v4. But I don't have a problem if we want
> to revive it.

Well, a counter might be rather tricky, but a few tracepoints would make
sense to me.
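
To make that a bit more concrete, here is a rough userspace sketch of the
kind of information such events could carry: one record per walk attempt,
tagged with the reason when the walk was skipped. All names here are
hypothetical and the tallying is only a stand-in; a real implementation
would use the kernel tracing infrastructure, which is not shown.

```c
#include <assert.h>

/* Hypothetical reasons for skipping a page table walk. */
enum skip_reason {
	SKIP_NO_HW_ACCESSED_BIT,
	SKIP_WALK_ALLOC_FAILED,
	SKIP_MMAP_LOCK_CONTENDED,
	SKIP_REASON_COUNT,
};

/* Userspace stand-in for what a trace consumer could aggregate. */
struct walk_stats {
	unsigned long attempts;
	unsigned long skipped[SKIP_REASON_COUNT];
};

/* A walk that actually scanned page tables. */
static void record_walk_done(struct walk_stats *stats)
{
	stats->attempts++;
}

/* A walk that was skipped, together with the reason. */
static void record_walk_skipped(struct walk_stats *stats, enum skip_reason reason)
{
	assert(reason < SKIP_REASON_COUNT);
	stats->attempts++;
	stats->skipped[reason]++;
}

/* Percentage of attempts that did not scan page tables (0 if none yet). */
static unsigned int skipped_percent(const struct walk_stats *stats)
{
	unsigned long total = 0;

	for (int i = 0; i < SKIP_REASON_COUNT; i++)
		total += stats->skipped[i];
	if (!stats->attempts)
		return 0;
	return (unsigned int)(total * 100 / stats->attempts);
}
```

An admin looking at such data could then tell whether reclaim spends most
of its time in the "degraded" mode and which condition is responsible.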

-- 
Michal Hocko
SUSE Labs
