From: Yu Zhao <yuzhao@google.com>
To: Phil Elwell <phil@raspberrypi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-rpi-kernel@lists.infradead.org,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Will Deacon <will@kernel.org>
Subject: Re: Questions about TLB flushing and lru_gen_look_around
Date: Thu, 12 Sep 2024 21:59:00 -0600
Message-ID: <CAOUHufb6-8Ti-Ey-rf9xmbk6gTwOjaxivTd76GVA343EJHVg7w@mail.gmail.com>
In-Reply-To: <CAMEGJJ1tDp+ujAdSM+3_TtSmKp7AWD=PFA51Rg1SvfP4nAc2Zg@mail.gmail.com>

Hi Phil,

On Thu, Sep 12, 2024 at 7:03 AM Phil Elwell <phil@raspberrypi.com> wrote:
>
> Hi,
>
> I've spent many hours recently trying to diagnose a problem that
> manifests as a CPU spin, under load and memory pressure, that can last
> for many seconds. The problem can be seen on our downstream kernels
> from 6.5 onwards, when built for ARCH=arm, running on a Pi 3B (BCM2837
> - quad A53). I've not tested a pure Linux 6.5, but this is not a bug
> report.
>
> Pi 3B has limited RAM (1GB), and it was discovered that restricting
> this further to 512MB made the spins more frequent, as did adding
> other processes. Running an ARM64 kernel in the same configuration
> leads to normal OOM behaviour.
>
> I traced the spin to a loop in __copy_to_user_memcpy where
> pin_page_for_write fails repeatedly, sometimes for hundreds of
> thousands of times. The pin is failing because the user page in
> question is marked as being old (L_PTE_YOUNG is unset). When this
> happens, the code tries to freshen the page using __put_user, but in
> this case it is not triggering the required page fault. Digging
> deeper, it can be seen that the PTE in the ARM's shadow hardware PTE
> is 0 as expected, but clearly the MMU is not seeing this otherwise it
> would be faulting; a TLB flush for that PTE fixes it.
>
> The TLB non-coherency for that PTE can be attributed to a call to
> ptep_test_and_clear_young from lru_gen_look_around, which clears the
> L_PTE_YOUNG bit in the Linux PTE
Yes, it does that.
> and zeroes the hardware PTE
I don't see how it can happen, or why it's needed. Could you explain?
> but doesn't call flush_tlb_cache.
Correct. From the MM's point of view, that arch-specific API currently
doesn't require a TLB flush, and none of its current callers issues one.
I doubt any of those callers, other than MGLRU, was ever used on 32-bit
arm.
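
To make the scenario concrete, here is a toy userspace model of it. It
is a sketch, not kernel code. It assumes, as Phil's report describes,
that clearing the young bit also invalidates the hardware PTE, and that
a stale TLB entry keeps the old translation usable. All toy_* names are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one PTE and a one-entry TLB. Not kernel code. */
struct toy_pte {
	bool young;	/* software "young" bit (L_PTE_YOUNG in the report) */
	bool hw_valid;	/* hardware PTE usable by the MMU */
};

struct toy_tlb {
	bool cached;	/* true if the TLB holds a translation for the page */
};

/* Models ptep_test_and_clear_young() as described in the report:
 * the young bit is cleared, the hardware PTE is invalidated, and
 * the TLB is left untouched. */
static bool toy_test_and_clear_young(struct toy_pte *pte)
{
	bool was_young = pte->young;

	pte->young = false;
	pte->hw_valid = false;
	return was_young;
}

/* Models a TLB flush for the page. */
static void toy_flush(struct toy_tlb *tlb)
{
	tlb->cached = false;
}

/* Models a user access (the __put_user in the report).
 * Returns true if the access faulted; the fault path is what
 * marks the page young again. */
static bool toy_access(struct toy_pte *pte, struct toy_tlb *tlb)
{
	if (tlb->cached)
		return false;	/* TLB hit: the MMU never rereads the PTE */

	if (!pte->hw_valid) {
		/* Page-table walk finds an invalid entry: fault, then
		 * the handler sets young and revalidates the PTE. */
		pte->young = true;
		pte->hw_valid = true;
		tlb->cached = true;
		return true;
	}

	tlb->cached = true;	/* ordinary TLB fill */
	return false;
}
```

In this model, an access after toy_test_and_clear_young() with the TLB
entry still cached never faults, so the young bit stays clear however
many times the access is retried. Only after toy_flush() does the next
access fault and set the bit again.
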
> Two possible "fixes" are:
>
> a. Replace ptep_test_and_clear_young with ptep_clear_flush_young,
> which includes the TLB flush.
> b. After the loop over the page range from "start" to "end", include a
> call to flush_tlb_range from "start" to "end" if the "young" count is
> non-zero.
>
> My questions are:
>
> 1. Which bit of code is meant to take care of TLB coherency where
> lru_gen_look_around has made changes?
None, since the API doesn't explicitly require it (or at least the MM
assumes it doesn't), as I mentioned above.
> 2. Between the two patches a) and b), which is preferable? b) would
> seem better if IPIs are needed to broadcast the TLB flushes, but it
> seems that BCM2837 has new enough CPU cores not to require such
> broadcasts.
Could this be fixed within arm? If not, we would have to update the
requirement of that arch-specific API. This would affect other archs
that don't require TLB flushes, assuming they exist. And we would need
to fix all callers of ptep_test_and_clear_young() in MM.
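
For reference, the difference between your a) and b) can be sketched in
userspace as below. This is only an illustration of the flush counts,
not a proposed patch. The sketch_* names are hypothetical, and a counter
stands in for the real per-page and per-range TLB flushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustration only: a PTE reduced to its young bit. */
struct sketch_pte {
	bool young;
};

/* Counts how many flush operations a scan issued. */
struct sketch_stats {
	unsigned int flushes;
};

/* Option a): clear young and flush once per PTE that was young,
 * in the style of ptep_clear_flush_young(). Returns the young count. */
static unsigned int sketch_scan_flush_each(struct sketch_pte *ptes, size_t n,
					   struct sketch_stats *st)
{
	unsigned int young = 0;

	for (size_t i = 0; i < n; i++) {
		if (ptes[i].young) {
			ptes[i].young = false;
			st->flushes++;	/* stands in for a per-page flush */
			young++;
		}
	}
	return young;
}

/* Option b): clear young without flushing inside the loop, then issue
 * a single range flush afterwards if anything was young. */
static unsigned int sketch_scan_flush_range(struct sketch_pte *ptes, size_t n,
					    struct sketch_stats *st)
{
	unsigned int young = 0;

	for (size_t i = 0; i < n; i++) {
		if (ptes[i].young) {
			ptes[i].young = false;
			young++;
		}
	}
	if (young)
		st->flushes++;	/* stands in for one flush of start..end */
	return young;
}
```

With three young PTEs in the range, a) issues three flushes and b)
issues one. With none young, neither flushes at all.
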
> 3. walk_pte_range has a similar loop, but it seems it doesn't need to
> be patched to fix my spin, possibly because it isn't called.
Correct.
> If a
> patch to lru_gen_look_around is needed, might one be needed here as
> well?
No, because that code is disabled unless the hardware can set the A-bit
itself, e.g., arm64 v8.2.
Thanks.