From: Ard Biesheuvel
Date: Fri, 19 Jul 2024 11:02:29 -0700
Subject: Re: LPA2 on non-LPA2 hardware broken with 16K pages
To: Will Deacon
Cc: Asahi Lina, linux-mm@kvack.org, linux-kernel@vger.kernel.org, asahi@lists.linux.dev, linux-arm-kernel@lists.infradead.org, Catalin Marinas, ryan.roberts@arm.com, mark.rutland@arm.com
References: <50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net> <20240718131428.GA21243@willie-the-truck>
In-Reply-To: <20240718131428.GA21243@willie-the-truck>
Content-Type: text/plain; charset="UTF-8"

On Thu, 18 Jul 2024 at 06:14, Will Deacon wrote:
>
> Hi Lina, [+Ard, Mark and Ryan],
>
> On Thu, Jul 18, 2024 at 06:39:10PM +0900, Asahi Lina wrote:
> > I ran into this with the Asahi Linux downstream kernel, based on v6.9.9,
> > but I believe the problem is also still upstream. The issue seems to be
> > an interaction between folding one page table level at compile time and
> > another one at runtime.
>
> Thanks for reporting this!
>
> > With this config, we have:
> >
> > CONFIG_PGTABLE_LEVELS=4
> > PAGE_SHIFT=14
> > PMD_SHIFT=25
> > PUD_SHIFT=36
> > PGDIR_SHIFT=47
> > pgtable_l5_enabled() == false (compile time)
> > pgtable_l4_enabled() == false (runtime, due to no LPA2)
>
> I think this is 'defconfig' w/ 16k pages, although I wasn't able to
> trigger the issue quickly under QEMU with that. Your analysis looks
> correct, however.
>
> > With p4d folded at compile-time, and pud folded at runtime when LPA2 is
> > not supported.
> >
> > With this setup, pgd_offset() is broken since the pgd is actually
> > supposed to become a pud but the shift is wrong, as it is set at compile
> > time:
> >
> > #define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
> >
> > static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
> > {
> >         return (pgd + pgd_index(address));
> > };
> >
> > Then we follow the gup logic (abbreviated):
> >
> > gup_pgd_range:
> >     pgdp = pgd_offset(current->mm, addr);
> >     pgd_t pgd = READ_ONCE(*pgdp);
> >
> > At this point, pgd is just the 0th entry of the top level page table
> > (since those extra address bits will always be 0 for valid 47-bit user
> > addresses).
> >
> > p4d then gets folded via pgtable-nop4d.h:
> >
> > gup_p4d_range:
> >     p4dp = p4d_offset_lockless(pgdp, pgd, addr);
> >         = p4d_offset(&(pgd), address)
> >         = &pgd
> >     p4d_t p4d = READ_ONCE(*p4dp);
> >
> > Now we have p4dp = stack address of pgd, and p4d = pgd.
> >
> > gup_pud_range:
> >     pudp = pud_offset_lockless(p4dp, p4d, addr);
> >         -> if (!pgtable_l4_enabled())
> >         = p4d_to_folded_pud(p4dp, addr);
> >         = (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
> >     pud_t pud = READ_ONCE(*pudp);
> >
> > Which is bad pointer math because it only works if p4dp points to a real
> > page table entry inside a page table, not a single u64 stack address.
>
> Cheers for the explanation; I agree that 6.10 looks like it's affected
> in the same way, even though I couldn't reproduce the crash.
> I think the root of the problem is that p4d_offset_lockless() returns a
> stack address when the p4d level is folded. I wondered about changing the
> dummy pXd_offset_lockless() definitions in linux/pgtable.h to pass the
> real pointer through instead of the address of the local, but then I
> suppose _most_ pXd_offset() implementations are going to dereference
> that and it would break the whole point of having _lockless routines
> to start with.
>
> What if we provided our own implementation of p4d_offset_lockless()
> for the folding case, which could just propagate the page-table pointer?
> Diff below.
>
> > This causes random oopses in internal_get_user_pages_fast and related
> > codepaths.
>
> Do you have a reliable way to trigger those? I tried doing some GUPpy
> things like strace (access_process_vm()) but it all seemed fine.
>

Thanks for the cc, and thanks to Lina for the excellent diagnosis - this
is really helpful.

> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index f8efbc128446..3afe624a39e1 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1065,6 +1065,13 @@ static inline bool pgtable_l5_enabled(void) { return false; }
>
>  #define p4d_offset_kimg(dir,addr)	((p4d_t *)dir)
>
> +static inline
> +p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)

This is in the wrong place, I think - we already define this for the
5-level case (around line 1760). We'll need to introduce another
version for the 4-level case, so perhaps, to reduce the risk of
confusion, we might define it as

static inline
p4d_t *p4d_offset_lockless_folded(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
{
...
}

#ifdef __PAGETABLE_P4D_FOLDED
#define p4d_offset_lockless p4d_offset_lockless_folded
#endif

> +{

We might add

	if (pgtable_l4_enabled())
		pgdp = &pgd;

here to preserve the existing 'lockless' behavior when PUDs are not
folded.

> +	return p4d_offset(pgdp, addr);
> +}
> +#define p4d_offset_lockless p4d_offset_lockless
> +
>  #endif /* CONFIG_PGTABLE_LEVELS > 4 */
>

I suggest we also add something like the below so we can catch these
issues more easily:

--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -874,9 +874,26 @@ static inline phys_addr_t p4d_page_paddr(p4d_t p4d)

 static inline pud_t *p4d_to_folded_pud(p4d_t *p4dp, unsigned long addr)
 {
+	/*
+	 * The transformation below does not work correctly for descriptors
+	 * copied to the stack.
+	 */
+	VM_WARN_ON((u64)p4dp >= VMALLOC_START && !__is_kernel((u64)p4dp));
+
 	return (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
 }
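
For illustration only (this is not part of either proposed patch), the
failure mode can be modeled in user space. The sketch below assumes the
16K-page constants quoted earlier in the thread and uses hypothetical
names (desc_t, folded_pud_addr(), p4d_offset_lockless_model(),
folded_walk_demo(), l4_enabled) that do not exist in the kernel. It
shows that rounding an entry pointer down to its page and indexing by
pud_index() only lands on the intended entry when the pointer refers to
the real table, not to a local copy of the descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Constants from the 16K-page configuration quoted in the thread. */
#define PAGE_SHIFT	14
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PUD_SHIFT	36
#define PGDIR_SHIFT	47
#define PTRS_PER_PUD	(PAGE_SIZE / sizeof(uint64_t))	/* 2048 entries */

typedef uint64_t desc_t;	/* hypothetical stand-in for pgd_t/p4d_t/pud_t */

static size_t pud_index(uint64_t addr)
{
	return (addr >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
}

/*
 * Models p4d_to_folded_pud(): round the entry pointer down to the start
 * of its page, then index by pud_index(). Integer arithmetic is used so
 * the stack-copy case never forms an out-of-bounds pointer.
 */
static uintptr_t folded_pud_addr(const desc_t *p4dp, uint64_t addr)
{
	uintptr_t base = (uintptr_t)p4dp & ~(uintptr_t)(PAGE_SIZE - 1);

	return base + pud_index(addr) * sizeof(desc_t);
}

/*
 * Models the folded helper discussed in the thread: propagate the real
 * table pointer when the PUD level is folded at runtime, otherwise keep
 * using the local copy. l4_enabled stands in for pgtable_l4_enabled().
 */
static const desc_t *p4d_offset_lockless_model(const desc_t *pgdp,
					       const desc_t *pgd_copy,
					       int l4_enabled)
{
	return l4_enabled ? pgd_copy : pgdp;
}

/* Returns 0 if the model matches the analysis in the thread. */
int folded_walk_demo(void)
{
	desc_t *table = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
	uint64_t addr = (5ULL << PUD_SHIFT) | 0x1234;	/* 47-bit user address */
	int ret = 1;

	if (!table)
		return -1;
	assert(((uintptr_t)table & (PAGE_SIZE - 1)) == 0);

	for (size_t i = 0; i < PTRS_PER_PUD; i++)
		table[i] = i;

	/* With PGDIR_SHIFT=47 the top-level index is always 0 here. */
	const desc_t *pgdp = &table[addr >> PGDIR_SHIFT];
	desc_t pgd = *pgdp;	/* local copy, like READ_ONCE(*pgdp) */

	uintptr_t want = (uintptr_t)&table[5];
	uintptr_t good = folded_pud_addr(
		p4d_offset_lockless_model(pgdp, &pgd, 0), addr);
	uintptr_t bad = folded_pud_addr(&pgd, addr);	/* stack copy */

	if ((addr >> PGDIR_SHIFT) == 0 && good == want && bad != want)
		ret = 0;

	free(table);
	return ret;
}
```

Calling folded_walk_demo() from a small main() should return 0: the
address computed from the real table pointer hits entry 5, while the one
computed from the local copy points somewhere into the stack page.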