linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ardb@kernel.org>
To: Will Deacon <will@kernel.org>
Cc: Asahi Lina <lina@asahilina.net>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 asahi@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	 Catalin Marinas <catalin.marinas@arm.com>,
	ryan.roberts@arm.com, mark.rutland@arm.com
Subject: Re: LPA2 on non-LPA2 hardware broken with 16K pages
Date: Fri, 19 Jul 2024 11:02:29 -0700	[thread overview]
Message-ID: <CAMj1kXFi0sRVMRNhMVEnYBrLT4DycPoDMUa9VkP8wqqdf59eeA@mail.gmail.com> (raw)
In-Reply-To: <20240718131428.GA21243@willie-the-truck>

On Thu, 18 Jul 2024 at 06:14, Will Deacon <will@kernel.org> wrote:
>
> Hi Lina, [+Ard, Mark and Ryan],
>
> On Thu, Jul 18, 2024 at 06:39:10PM +0900, Asahi Lina wrote:
> > I ran into this with the Asahi Linux downstream kernel, based on v6.9.9,
> > but I believe the problem is also still upstream. The issue seems to be
> > an interaction between folding one page table level at compile time and
> > another one at runtime.
>
> Thanks for reporting this!
>
> > With this config, we have:
> >
> > CONFIG_PGTABLE_LEVELS=4
> > PAGE_SHIFT=14
> > PMD_SHIFT=25
> > PUD_SHIFT=36
> > PGDIR_SHIFT=47
> > pgtable_l5_enabled() == false (compile time)
> > pgtable_l4_enabled() == false (runtime, due to no LPA2)
>
> I think this is 'defconfig' w/ 16k pages, although I wasn't able to
> trigger the issue quickly under QEMU with that. Your analysis looks
> correct, however.
>
> > With p4d folded at compile-time, and pud folded at runtime when LPA2 is
> > not supported.
> >
> > With this setup, pgd_offset() is broken since the pgd is actually
> > supposed to become a pud but the shift is wrong, as it is set at compile
> > time:
> >
> > #define pgd_index(a)  (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
> >
> > static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
> > {
> >         return (pgd + pgd_index(address));
> > };
> >
> > Then we follow the gup logic (abbreviated):
> >
> > gup_pgd_range:
> >     pgdp = pgd_offset(current->mm, addr);
> >     pgd_t pgd = READ_ONCE(*pgdp);
> >
> > At this point, pgd is just the 0th entry of the top level page table
> > (since those extra address bits will always be 0 for valid 47-bit user
> > addresses).
> >
> > p4d then gets folded via pgtable-nop4d.h:
> >
> > gup_p4d_range:
> >     p4dp = p4d_offset_lockless(pgdp, pgd, addr);
> >          = p4d_offset(&(pgd), address)
> >          = &pgd
> >     p4d_t p4d = READ_ONCE(*p4dp);
> >
> > Now we have p4dp = stack address of pgd, and p4d = pgd.
> >
> > gup_pud_range:
> >     pudp = pud_offset_lockless(p4dp, p4d, addr);
> >          -> if (!pgtable_l4_enabled())
> >            = p4d_to_folded_pud(p4dp, addr);
> >            = (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
> >     pud_t pud = READ_ONCE(*pudp);
> >
> > Which is bad pointer math because it only works if p4dp points to a real
> > page table entry inside a page table, not a single u64 stack address.
>
> Cheers for the explanation; I agree that 6.10 looks like it's affected
> in the same way, even though I couldn't reproduce the crash. I think the
> root of the problem is that p4d_offset_lockless() returns a stack
> address when the p4d level is folded. I wondered about changing the
> dummy pXd_offset_lockless() definitions in linux/pgtable.h to pass the
> real pointer through instead of the address of the local, but then I
> suppose _most_ pXd_offset() implementations are going to dereference
> that and it would break the whole point of having _lockless routines
> to start with.
>
> What if we provided our own implementation of p4d_offset_lockless()
> for the folding case, which could just propagate the page-table pointer?
> Diff below.
>
> > This causes random oopses in internal_get_user_pages_fast and related
> > codepaths.
>
> Do you have a reliable way to trigger those? I tried doing some GUPpy
> things like strace (access_process_vm()) but it all seemed fine.
>

Thanks for the cc, and thanks to Lina for the excellent diagnosis -
this is really helpful.

> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index f8efbc128446..3afe624a39e1 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1065,6 +1065,13 @@ static inline bool pgtable_l5_enabled(void) { return false; }
>
>  #define p4d_offset_kimg(dir,addr)      ((p4d_t *)dir)
>
> +static inline
> +p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)

This is in the wrong place, I think - we already define this for the
5-level case (around line 1760).

We'll need to introduce another version for the 4-level case, so
perhaps, to reduce the risk of confusion, we might define it as

static inline
p4d_t *p4d_offset_lockless_folded(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
{
...
}
#ifdef __PAGETABLE_P4D_FOLDED
#define p4d_offset_lockless p4d_offset_lockless_folded
#endif


> +{

We might add

if (pgtable_l4_enabled())
    pgdp = &pgd;

here to preserve the existing 'lockless' behavior when PUDs are not folded.


> +       return p4d_offset(pgdp, addr);
> +}
> +#define p4d_offset_lockless p4d_offset_lockless
> +
>  #endif  /* CONFIG_PGTABLE_LEVELS > 4 */
>

I suggest we also add something like the below so we can catch these
issues more easily

--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -874,9 +874,26 @@ static inline phys_addr_t p4d_page_paddr(p4d_t p4d)

 static inline pud_t *p4d_to_folded_pud(p4d_t *p4dp, unsigned long addr)
 {
+       /*
+        * The transformation below does not work correctly for descriptors
+        * copied to the stack.
+        */
+       VM_WARN_ON((u64)p4dp >= VMALLOC_START && !__is_kernel((u64)p4dp));
+
        return (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
 }


  parent reply	other threads:[~2024-07-19 18:02 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-18  9:39 Asahi Lina
2024-07-18 13:14 ` Will Deacon
2024-07-18 13:21   ` Dev Jain
2024-07-18 14:34   ` Asahi Lina
2024-07-19 18:02   ` Ard Biesheuvel [this message]
2024-07-23 14:52     ` Will Deacon
2024-07-23 15:02       ` Ard Biesheuvel
2024-07-23 16:05         ` Will Deacon
2024-07-23 16:28           ` Ard Biesheuvel
2024-07-24 11:33             ` Will Deacon
2024-07-24 12:10               ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMj1kXFi0sRVMRNhMVEnYBrLT4DycPoDMUa9VkP8wqqdf59eeA@mail.gmail.com \
    --to=ardb@kernel.org \
    --cc=asahi@lists.linux.dev \
    --cc=catalin.marinas@arm.com \
    --cc=lina@asahilina.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=ryan.roberts@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox