From: Sean Christopherson <seanjc@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
    Paolo Bonzini <pbonzini@redhat.com>,
    Jonathan Corbet <corbet@lwn.net>,
    Michael Larabel <michael@michaellarabel.com>,
    kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
    linux-mm@google.com
Subject: Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
Date: Thu, 23 Feb 2023 11:11:35 -0800
Message-ID: <Y/e6Z+KIl6sYJoRg@google.com>
In-Reply-To: <CAOUHufbAKpv95k6rVedstjD_7JzP0RrbOD652gyZh2vbAjGPOg@mail.gmail.com>
On Thu, Feb 23, 2023, Yu Zhao wrote:
> On Thu, Feb 23, 2023 at 10:43 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Feb 16, 2023, Yu Zhao wrote:
> > > kswapd (MGLRU before)
> > >   100.00% balance_pgdat
> > >     100.00% shrink_node
> > >       100.00% shrink_one
> > >         99.97% try_to_shrink_lruvec
> > >           99.06% evict_folios
> > >             97.41% shrink_folio_list
> > >               31.33% folio_referenced
> > >                 31.06% rmap_walk_file
> > >                   30.89% folio_referenced_one
> > >                     20.83% __mmu_notifier_clear_flush_young
> > >                       20.54% kvm_mmu_notifier_clear_flush_young
> > > =>                      19.34% _raw_write_lock
> > >
> > > kswapd (MGLRU after)
> > >   100.00% balance_pgdat
> > >     100.00% shrink_node
> > >       100.00% shrink_one
> > >         99.97% try_to_shrink_lruvec
> > >           99.51% evict_folios
> > >             71.70% shrink_folio_list
> > >               7.08% folio_referenced
> > >                 6.78% rmap_walk_file
> > >                   6.72% folio_referenced_one
> > >                     5.60% lru_gen_look_around
> > > =>                    1.53% __mmu_notifier_test_clear_young
> >
> > Do you happen to know how much of the improvement is due to batching, and how
> > much is due to using a lockless walk?
>
> No. I have three benchmarks running at the moment:
> 1. Windows SQL server guest on x86 host,
> 2. Apache Spark guest on arm64 host, and
> 3. Memcached guest on ppc64 host.
>
> If you are really interested in that, I can reprioritize -- I need to
> stop 1) and use that machine to get the number for you.

After looking at the "MGLRU before" stack again, it's definitely worth getting
those numbers. The "before" isn't just taking mmu_lock, it's taking mmu_lock for
write _and_ flushing remote TLBs on _every_ PTE. I suspect the batching is a
tiny percentage of the overall win (might be larger with RETPOLINE and friends),
and that the bulk of the improvement comes from avoiding the insanity of
kvm_mmu_notifier_clear_flush_young().

Speaking of which, what would it take to drop mmu_notifier_clear_flush_young()
entirely? I.e. why can MGLRU tolerate stale information but !MGLRU cannot? If
we simply deleted mmu_notifier_clear_flush_young() and used mmu_notifier_clear_young()
instead, would anyone notice, let alone care?

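To make the batching vs. locking/flushing distinction concrete, here is a toy
userspace model (not kernel code) that counts notifier calls, write-lock
acquisitions and remote TLB flushes for one aging pass. The struct, the function
name and its parameters are all made up for illustration, and the assumption
that every non-lockless call takes the lock exactly once and every flushing call
flushes exactly once is a simplification of the behavior described above, not a
measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost counters; these are not kernel data structures. */
struct aging_cost {
	size_t notifier_calls;	/* calls into the modeled secondary MMU */
	size_t write_locks;	/* modeled mmu_lock acquisitions for write */
	size_t tlb_flushes;	/* modeled remote TLB flushes */
};

/*
 * Model one aging pass over nr_ptes PTEs.
 *
 * batch    - PTEs handled per notifier call (1 models the per-PTE path)
 * lockless - if true, a call takes no write lock
 * flush    - if true, every call also flushes remote TLBs
 */
static struct aging_cost model_cost(size_t nr_ptes, size_t batch,
				    bool lockless, bool flush)
{
	struct aging_cost c = { 0, 0, 0 };

	assert(batch > 0);

	/* One call per batch, rounding up for a partial final batch. */
	c.notifier_calls = (nr_ptes + batch - 1) / batch;
	c.write_locks = lockless ? 0 : c.notifier_calls;
	c.tlb_flushes = flush ? c.notifier_calls : 0;
	return c;
}
```

For 512 PTEs, model_cost(512, 1, false, true) yields 512 calls, 512 write locks
and 512 flushes, model_cost(512, 64, false, false) yields 8 calls and 8 write
locks, and model_cost(512, 64, true, false) yields 8 calls and no locks. The
middle configuration (batching+lock) is the one whose numbers would separate the
two effects.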
> > > @@ -5699,6 +5797,9 @@ static ssize_t show_enabled(struct kobject *kobj, struct kobj_attribute *attr, c
> > > 	if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
> > > 		caps |= BIT(LRU_GEN_NONLEAF_YOUNG);
> > >
> > > +	if (kvm_arch_has_test_clear_young() && get_cap(LRU_GEN_SPTE_WALK))
> > > +		caps |= BIT(LRU_GEN_SPTE_WALK);
> >
> > As alluded to in patch 1, unless batching the walks even if KVM does _not_ support
> > a lockless walk is somehow _worse_ than using the existing mmu_notifier_clear_flush_young(),
> > I think batching the calls should be conditional only on LRU_GEN_SPTE_WALK. Or
> > if we want to avoid batching when there are no mmu_notifier listeners, probe
> > mmu_notifiers. But don't call into KVM directly.
>
> I'm not sure I fully understand. Let's present the problem on the MM
> side: assuming KVM supports lockless walks, batching can still be
> worse (very unlikely), because GFNs can exhibit no memory locality at
> all. So this option allows userspace to disable batching.

I'm asking the opposite. Is there a scenario where batching+lock is worse than
!batching+lock? If not, then don't make batching depend on lockless walks.

> I fully understand why you don't want MM to call into KVM directly. No
> acceptable ways to set up a clear interface between MM and KVM other
> than the MMU notifier?

There are several options I can think of, but before we go spend time designing
the best API, I'd rather figure out if we care in the first place.
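
For reference, the two gating policies contrasted above (batching tied to a
lockless walk vs. batching tied only to the capability bit) can be written down
as a small userspace sketch. CAP_SPTE_WALK, should_batch_posted() and
should_batch_suggested() are hypothetical names standing in for
LRU_GEN_SPTE_WALK and the checks around it. None of this is the actual mm or KVM
code, and the has_listeners argument assumes some way of probing for registered
mmu_notifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the LRU_GEN_SPTE_WALK capability bit. */
enum { CAP_SPTE_WALK = 1u << 0 };

/*
 * Gating in the spirit of the posted hunk: batch only when the arch
 * reports a lockless test/clear-young walk AND the capability is set.
 */
static bool should_batch_posted(unsigned int caps, bool arch_lockless)
{
	return arch_lockless && (caps & CAP_SPTE_WALK) != 0;
}

/*
 * Gating as suggested in this reply: the capability alone decides,
 * optionally skipping the batch when nobody listens on the mmu_notifier.
 * Whether the secondary MMU walks locklessly is not consulted.
 */
static bool should_batch_suggested(unsigned int caps, bool has_listeners)
{
	return has_listeners && (caps & CAP_SPTE_WALK) != 0;
}
```

The only behavioral difference is the case where the capability is set but the
walk is not lockless: the first policy falls back to per-PTE calls, the second
still batches.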