From: Dev Jain
To: Will Deacon, Asahi Lina
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, asahi@lists.linux.dev, linux-arm-kernel@lists.infradead.org, Catalin Marinas, ryan.roberts@arm.com, mark.rutland@arm.com, ardb@kernel.org
Subject: Re: LPA2 on non-LPA2 hardware broken with 16K pages
Date: Thu, 18 Jul 2024 18:51:46 +0530
Message-ID: <5a16730d-3153-45d2-870d-6ecdc2097b5b@arm.com>
In-Reply-To: <20240718131428.GA21243@willie-the-truck>
References: <50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net> <20240718131428.GA21243@willie-the-truck>

On 7/18/24 18:44, Will Deacon wrote:
> Hi Lina, [+Ard, Mark and Ryan],
>
> On Thu, Jul 18, 2024 at 06:39:10PM +0900, Asahi Lina wrote:
>> I ran into this with the Asahi Linux downstream kernel, based on v6.9.9,
>> but I believe the problem is also still upstream. The issue seems to be
>> an interaction between folding one page table level at compile time and
>> another one at runtime.
> Thanks for reporting this!
>
>> With this config, we have:
>>
>> CONFIG_PGTABLE_LEVELS=4
>> PAGE_SHIFT=14
>> PMD_SHIFT=25
>> PUD_SHIFT=36
>> PGDIR_SHIFT=47
>> pgtable_l5_enabled() == false (compile time)
>> pgtable_l4_enabled() == false (runtime, due to no LPA2)
> I think this is 'defconfig' w/ 16k pages, although I wasn't able to
> trigger the issue quickly under QEMU with that. Your analysis looks
> correct, however.

Hi Will,

I was also trying to debug this; indeed, this is the 16K defconfig, and
pgtable_l4_enabled() returns false on non-LPA2 hardware. Is this the
intended behaviour? Don't we require a 4-level page table to resolve
virtual addresses with 16K pages?

>
>> With p4d folded at compile-time, and pud folded at runtime when LPA2 is
>> not supported.
>>
>> With this setup, pgd_offset() is broken since the pgd is actually
>> supposed to become a pud but the shift is wrong, as it is set at compile
>> time:
>>
>> #define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
>>
>> static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
>> {
>>         return (pgd + pgd_index(address));
>> };
>>
>> Then we follow the gup logic (abbreviated):
>>
>> gup_pgd_range:
>>     pgdp = pgd_offset(current->mm, addr);
>>     pgd_t pgd = READ_ONCE(*pgdp);
>>
>> At this point, pgd is just the 0th entry of the top level page table
>> (since those extra address bits will always be 0 for valid 47-bit user
>> addresses).
>>
>> p4d then gets folded via pgtable-nop4d.h:
>>
>> gup_p4d_range:
>>     p4dp = p4d_offset_lockless(pgdp, pgd, addr);
>>          = p4d_offset(&(pgd), address)
>>          = &pgd
>>     p4d_t p4d = READ_ONCE(*p4dp);
>>
>> Now we have p4dp = stack address of pgd, and p4d = pgd.
>>
>> gup_pud_range:
>>     pudp = pud_offset_lockless(p4dp, p4d, addr);
>>     -> if (!pgtable_l4_enabled())
>>          = p4d_to_folded_pud(p4dp, addr);
>>          = (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
>>     pud_t pud = READ_ONCE(*pudp);
>>
>> Which is bad pointer math because it only works if p4dp points to a real
>> page table entry inside a page table, not a single u64 stack address.
> Cheers for the explanation; I agree that 6.10 looks like it's affected
> in the same way, even though I couldn't reproduce the crash. I think the
> root of the problem is that p4d_offset_lockless() returns a stack
> address when the p4d level is folded. I wondered about changing the
> dummy pXd_offset_lockless() definitions in linux/pgtable.h to pass the
> real pointer through instead of the address of the local, but then I
> suppose _most_ pXd_offset() implementations are going to dereference
> that and it would break the whole point of having _lockless routines
> to start with.
>
> What if we provided our own implementation of p4d_offset_lockless()
> for the folding case, which could just propagate the page-table pointer?
> Diff below.
>
>> This causes random oopses in internal_get_user_pages_fast and related
>> codepaths.
> Do you have a reliable way to trigger those? I tried doing some GUPpy
> things like strace (access_process_vm()) but it all seemed fine.
>
> Thanks,
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index f8efbc128446..3afe624a39e1 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1065,6 +1065,13 @@ static inline bool pgtable_l5_enabled(void) { return false; }
>
>  #define p4d_offset_kimg(dir,addr) ((p4d_t *)dir)
>
> +static inline
> +p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
> +{
> +	return p4d_offset(pgdp, addr);
> +}
> +#define p4d_offset_lockless p4d_offset_lockless
> +
>  #endif /* CONFIG_PGTABLE_LEVELS > 4 */
>
>  #define pgd_ERROR(e) \
>