linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Matlack <dmatlack@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>,
	James Houghton <jthoughton@google.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Albert Ou <aou@eecs.berkeley.edu>,
	Ankit Agrawal <ankita@nvidia.com>,
	 Anup Patel <anup@brainfault.org>,
	Atish Patra <atishp@atishpatra.org>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Bibo Mao <maobibo@loongson.cn>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	David Rientjes <rientjes@google.com>,
	 Huacai Chen <chenhuacai@kernel.org>,
	James Morse <james.morse@arm.com>,
	 Jonathan Corbet <corbet@lwn.net>, Marc Zyngier <maz@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	 Nicholas Piggin <npiggin@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	 Paul Walmsley <paul.walmsley@sifive.com>,
	Raghavendra Rao Ananta <rananta@google.com>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	Sean Christopherson <seanjc@google.com>,
	 Shaoqin Huang <shahuang@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	 Suzuki K Poulose <suzuki.poulose@arm.com>,
	Tianrui Zhao <zhaotianrui@loongson.cn>,
	 Will Deacon <will@kernel.org>, Zenghui Yu <yuzenghui@huawei.com>,
	kvm-riscv@lists.infradead.org,  kvm@vger.kernel.org,
	kvmarm@lists.linux.dev,  linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org,  linux-mips@vger.kernel.org,
	linux-mm@kvack.org,  linux-riscv@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org,  loongarch@lists.linux.dev
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
Date: Fri, 31 May 2024 14:06:49 -0700	[thread overview]
Message-ID: <CALzav=e4PmGV6ayuMCwbKWq8EnOomYKBj-0Lj+gV-kPO=h156A@mail.gmail.com> (raw)
In-Reply-To: <CAOUHufa50Dy8CJ5+D10Khs4NU-3Pv0B8qi-GYkcppctTVUkPcA@mail.gmail.com>

On Fri, May 31, 2024 at 1:31 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Fri, May 31, 2024 at 1:24 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Wed, May 29, 2024 at 03:03:21PM -0600, Yu Zhao wrote:
> > > On Wed, May 29, 2024 at 12:05 PM James Houghton <jthoughton@google.com> wrote:
> > > >
> > > > Secondary MMUs are currently consulted for access/age information at
> > > > eviction time, but before then, we don't get accurate age information.
> > > > That is, pages that are mostly accessed through a secondary MMU (like
> > > > guest memory, used by KVM) will always just proceed down to the oldest
> > > > generation, and then at eviction time, if KVM reports the page to be
> > > > young, the page will be activated/promoted back to the youngest
> > > > generation.
> > >
> > > Correct, and as I explained offline, this is the only reasonable
> > > behavior if we can't locklessly walk secondary MMUs.
> > >
> > > Just for the record, the (crude) analogy I used was:
> > > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > > but you are only allowed to pick up 10 of them (and put them in your
> > > pocket). A smart move would be to survey the room *first and then*
> > > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > > you would just want to pick up whichever that's in front of you rather
> > > than walk the entire room.
> > >
> > > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > > done lockless. Otherwise, it should just fall back to the existing
> > > approach, which existed in previous versions but is removed in this
> > > version.
> >
> > Grabbing the MMU lock for write to scan sucks, no argument there. But
> > can you please be specific about the impact of read lock v. RCU in the
> > case of arm64? I had asked about this before and you never replied.
> >
> > My concern remains that adding support for software table walkers
> > outside of the MMU lock entirely requires more work than just deferring
> > the deallocation to an RCU callback. Walkers that previously assumed
> > 'exclusive' access while holding the MMU lock for write must now cope
> > with volatile PTEs.
> >
> > Yes, this problem already exists when hardware sets the AF, but the
> > lock-free walker implementation needs to be generic so it can be applied
> > for other PTE bits.
>
> Direct reclaim is multi-threaded and each reclaimer can take the mmu
> lock for read (testing the A-bit) or write (unmapping before paging
> out) on arm64. The fundamental problem of using the readers-writer
> lock in this case is priority inversion: the readers have lower
> priority than the writers, so ideally, we don't want the readers to
> block the writers at all.
>
> Using my previous (crude) analogy: puting the bill right in front of
> you (the writers) profits immediately whereas searching for the
> largest bill (the readers) can be futile.
>
> As I said earlier, I prefer we drop the arm64 support for now, but I
> will not object to taking the mmu lock for read when clearing the
> A-bit, as long as we fully understand the problem here and document it
> clearly.

FWIW, Google Cloud has been doing proactive reclaim and kstaled-based
aging (a Google-internal page aging daemon, for those outside of
Google) for many years on x86 VMs with the A-bit harvesting
under the write-lock. So I'm skeptical that making ARM64 lockless is
necessary to allow Secondary MMUs to participate in MGLRU aging with
acceptable performance for Cloud usecases. I don't even think it's
necessary on x86 but it's a simple enough change that we might as well
just do it.

I suspect under pathological conditions (host under intense memory
pressure and high rate of reclaim occurring) making A-bit harvesting
lockless will perform better. But under such conditions VM performance
is likely going to suffer regardless. In a Cloud environment we deal
with that through other mechanisms to reduce the rate of reclaim and
make the host healthy.

For these reasons, I think there's value in giving users the option to
enable Secondary MMUs participation MGLRU aging even when A-bit
test/clearing is not done locklessly. I believe this was James' intent
with the Kconfig. Perhaps a default-off writable module parameter
would be better to avoid distros accidentally turning it on?

If and when there is a usecase for optimizing VM performance under
pathological reclaim conditions on ARM, we can make it lockless then.


  reply	other threads:[~2024-05-31 21:07 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-29 18:05 [PATCH v4 0/7] mm: multi-gen LRU: Walk secondary MMU page tables while aging James Houghton
2024-05-29 18:05 ` [PATCH v4 1/7] mm/Kconfig: Add LRU_GEN_WALKS_SECONDARY_MMU James Houghton
2024-05-29 18:05 ` [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging James Houghton
2024-05-29 21:03   ` Yu Zhao
2024-05-29 21:59     ` Sean Christopherson
2024-05-29 22:21       ` Yu Zhao
2024-05-29 22:58         ` Sean Christopherson
2024-05-30  1:08           ` James Houghton
2024-05-31  6:05             ` Yu Zhao
2024-05-31  7:02               ` Oliver Upton
2024-05-31 16:45                 ` Yu Zhao
2024-05-31 18:41                   ` Oliver Upton
2024-06-03 22:45               ` James Houghton
2024-06-03 23:03                 ` Sean Christopherson
2024-06-03 23:16                   ` James Houghton
2024-06-04  0:23                     ` Sean Christopherson
2024-05-31  7:24     ` Oliver Upton
2024-05-31 20:31       ` Yu Zhao
2024-05-31 21:06         ` David Matlack [this message]
2024-05-31 21:09           ` David Matlack
2024-05-31 21:18         ` Oliver Upton
2024-05-29 18:05 ` [PATCH v4 3/7] KVM: Add lockless memslot walk to KVM James Houghton
2024-05-29 21:51   ` Sean Christopherson
2024-05-30  3:26     ` James Houghton
2024-05-29 18:05 ` [PATCH v4 4/7] KVM: Move MMU lock acquisition for test/clear_young to architecture James Houghton
2024-05-29 21:55   ` Sean Christopherson
2024-05-30  3:27     ` James Houghton
2024-05-29 18:05 ` [PATCH v4 5/7] KVM: x86: Relax locking for kvm_test_age_gfn and kvm_age_gfn James Houghton
2024-05-29 18:05 ` [PATCH v4 6/7] KVM: arm64: " James Houghton
2024-05-31 19:11   ` Oliver Upton
2024-05-31 19:18     ` Oliver Upton
2024-06-04 22:20       ` James Houghton
2024-06-04 23:00         ` Oliver Upton
2024-06-04 23:36           ` Sean Christopherson
2024-05-29 18:05 ` [PATCH v4 7/7] KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test James Houghton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALzav=e4PmGV6ayuMCwbKWq8EnOomYKBj-0Lj+gV-kPO=h156A@mail.gmail.com' \
    --to=dmatlack@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=ankita@nvidia.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=atishp@atishpatra.org \
    --cc=axelrasmussen@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=chenhuacai@kernel.org \
    --cc=corbet@lwn.net \
    --cc=james.morse@arm.com \
    --cc=jthoughton@google.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=loongarch@lists.linux.dev \
    --cc=maobibo@loongson.cn \
    --cc=maz@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=rananta@google.com \
    --cc=rientjes@google.com \
    --cc=ryan.roberts@arm.com \
    --cc=seanjc@google.com \
    --cc=shahuang@redhat.com \
    --cc=shuah@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    --cc=yuzhao@google.com \
    --cc=zhaotianrui@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox