From: Yu Zhao <yuzhao@google.com>
To: Phil Elwell <phil@raspberrypi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  linux-rpi-kernel@lists.infradead.org,
	 Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Will Deacon <will@kernel.org>
Subject: Re: Questions about TLB flushing and lru_gen_look_around
Date: Thu, 12 Sep 2024 21:59:00 -0600
Message-ID: <CAOUHufb6-8Ti-Ey-rf9xmbk6gTwOjaxivTd76GVA343EJHVg7w@mail.gmail.com>
In-Reply-To: <CAMEGJJ1tDp+ujAdSM+3_TtSmKp7AWD=PFA51Rg1SvfP4nAc2Zg@mail.gmail.com>

Hi Phil,

On Thu, Sep 12, 2024 at 7:03 AM Phil Elwell <phil@raspberrypi.com> wrote:
>
> Hi,
>
> I've spent many hours recently trying to diagnose a problem that
> manifests as a CPU spin, under load and memory pressure, that can last
> for many seconds. The problem can be seen on our downstream kernels
> from 6.5 onwards, when built for ARCH=arm, running on a Pi 3B (BCM2837
> - quad A53). I've not tested a pure Linux 6.5, but this is not a bug
> report.
>
> Pi 3B has limited RAM (1GB), and it was discovered that restricting
> this further to 512MB made the spins more frequent, as did adding
> other processes. Running an ARM64 kernel in the same configuration
> leads to normal OOM behaviour.
>
> I traced the spin to a loop in __copy_to_user_memcpy where
> pin_page_for_write fails repeatedly, sometimes hundreds of
> thousands of times. The pin is failing because the user page in
> question is marked as being old (L_PTE_YOUNG is unset). When this
> happens, the code tries to freshen the page using __put_user, but in
> this case it is not triggering the required page fault. Digging
> deeper, it can be seen that the ARM's shadow hardware PTE for the page
> is 0 as expected, but clearly the MMU is not seeing this, otherwise it
> would be faulting; a TLB flush for that PTE fixes it.
>
> The TLB non-coherency for that PTE can be attributed to a call to
> ptep_test_and_clear_young from lru_gen_look_around, which clears the
> L_PTE_YOUNG bit in the Linux PTE

Yes, it does that.

> and zeroes the hardware PTE

I don't see how it can happen, or why it's needed. Could you explain?

> but doesn't call flush_tlb_cache.

Correct, and this is because that arch-specific API currently doesn't
require TLB flushes, from the MM's POV. None of the current callers
flushes, and I doubt any of them, except MGLRU, were used on arm
(32-bit) at all.
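
The failure mode discussed above can be illustrated with a toy userspace
model. This is only a sketch under stated assumptions: every name here
(toy_mmu, toy_test_and_clear_young, and so on) is hypothetical and none
of it is kernel code. It also takes Phil's description at face value,
namely that clearing the young bit zeroes the hardware PTE while a
translation is still cached in the TLB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_PTE_YOUNG 0x1u

struct toy_mmu {
    uint32_t linux_pte; /* software PTE, carries the young bit */
    uint32_t hw_pte;    /* hardware PTE, read only on a TLB miss */
    bool     tlb_valid; /* is a translation cached in the TLB? */
    uint32_t tlb_entry; /* cached copy of hw_pte */
};

/* Stand-in for the fault handler: mark the page young and valid. */
static void toy_handle_fault(struct toy_mmu *m)
{
    m->linux_pte |= TOY_PTE_YOUNG;
    m->hw_pte = 0x1003u; /* arbitrary non-zero "valid" encoding */
}

/* Stand-in for a user access. Returns true if the access faulted. */
static bool toy_user_access(struct toy_mmu *m)
{
    bool faulted = false;

    if (m->tlb_valid && m->tlb_entry != 0)
        return false; /* TLB hit: no table walk, so no fault */

    if (m->hw_pte == 0) {
        toy_handle_fault(m); /* walk found an invalid entry */
        faulted = true;
    }
    m->tlb_valid = true;
    m->tlb_entry = m->hw_pte;
    return faulted;
}

/* Assumed behaviour: clear young, zero the hardware PTE, no TLB flush. */
static bool toy_test_and_clear_young(struct toy_mmu *m)
{
    bool young = (m->linux_pte & TOY_PTE_YOUNG) != 0;

    m->linux_pte &= ~TOY_PTE_YOUNG;
    m->hw_pte = 0;
    return young;
}

static void toy_flush_tlb(struct toy_mmu *m)
{
    m->tlb_valid = false;
}

/*
 * Model of the retry loop: the pin succeeds only if the page is young;
 * otherwise touch the page and retry. Returns the number of attempts,
 * or -1 if the page never became young within max attempts.
 */
static int toy_pin_attempts(struct toy_mmu *m, int max)
{
    for (int tries = 1; tries <= max; tries++) {
        if (m->linux_pte & TOY_PTE_YOUNG)
            return tries;
        toy_user_access(m);
    }
    return -1;
}
```

In this model, a page that is touched once, aged with
toy_test_and_clear_young() and then pinned never becomes young again,
because every access hits the stale TLB entry. Calling toy_flush_tlb()
first lets the next access fault, and the pin succeeds on the second
attempt.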

> Two possible "fixes" are:
>
> a. Replace ptep_test_and_clear_young with ptep_clear_flush_young,
> which includes the TLB flush.
> b. After the loop over the page range from "start" to "end", include a
> call to flush_tlb_range from "start" to "end" if the "young" count is
> non-zero.
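
The difference between the two options can be sketched in a toy
userspace model that only counts flush operations. All names below are
hypothetical and this is not the kernel implementation; it assumes that
option (a) flushes one page each time a young PTE is found, and that
option (b) issues a single range flush after the loop when at least one
young PTE was seen.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_YOUNG 0x1u

struct toy_stats {
    int page_flushes;  /* single-page TLB flushes issued */
    int range_flushes; /* whole-range TLB flushes issued */
};

/* Option (a): clear the young bit and flush that page immediately. */
static int toy_scan_flush_each(unsigned *ptes, size_t n,
                               struct toy_stats *st)
{
    int young = 0;

    for (size_t i = 0; i < n; i++) {
        if (ptes[i] & TOY_YOUNG) {
            ptes[i] &= ~TOY_YOUNG;
            st->page_flushes++; /* one flush per young PTE */
            young++;
        }
    }
    return young;
}

/* Option (b): clear young bits in the loop, flush the range once. */
static int toy_scan_flush_range(unsigned *ptes, size_t n,
                                struct toy_stats *st)
{
    int young = 0;

    for (size_t i = 0; i < n; i++) {
        if (ptes[i] & TOY_YOUNG) {
            ptes[i] &= ~TOY_YOUNG;
            young++;
        }
    }
    if (young)
        st->range_flushes++; /* one flush covering start..end */
    return young;
}
```

With three young PTEs in a range of four, the first variant issues three
page flushes and the second issues one range flush; neither flushes when
nothing was young. Which is cheaper on real hardware depends on how page
and range flushes are implemented there, which the model does not
capture.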
>
> My questions are:
>
> 1. Which bit of code is meant to take care of TLB coherency where
> lru_gen_look_around has made changes?

None, since the API doesn't explicitly require it (or at least the MM
assumes it doesn't), as I mentioned above.

> 2. Between the two patches a) and b), which is preferable? b) would
> seem better if IPIs are needed to broadcast the TLB flushes, but it
> seems that BCM2837 has new enough CPU cores not to require such
> broadcasts.

Could this be fixed within arm? If not, we would have to update the
requirement of that arch-specific API. This would affect other archs
that don't require TLB flushes, assuming they exist. And we would need
to fix all callers of ptep_test_and_clear_young() in MM.

> 3. walk_pte_range has a similar loop, but it seems it doesn't need to
> be patched to fix my spin, possibly because it isn't called.

Correct.

> If a
> patch to lru_gen_look_around is needed, might one be needed here as
> well?

No, because that code is disabled unless the hardware can set the
A-bit, e.g., arm64 v8.2.

Thanks.



Thread overview: 4+ messages
2024-09-12 13:03 Phil Elwell
2024-09-13  3:59 ` Yu Zhao [this message]
2024-09-13  8:50   ` Phil Elwell
2024-09-26 18:34     ` Yu Zhao
