Date: Thu, 18 Jul 2024 14:14:28 +0100
From: Will Deacon
To: Asahi Lina
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, asahi@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Catalin Marinas,
	ryan.roberts@arm.com, mark.rutland@arm.com, ardb@kernel.org
Subject: Re: LPA2 on non-LPA2 hardware broken with 16K pages
Message-ID: <20240718131428.GA21243@willie-the-truck>
References: <50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net>
In-Reply-To: <50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net>

Hi Lina, [+Ard, Mark and Ryan],

On Thu, Jul 18, 2024 at 06:39:10PM +0900, Asahi Lina wrote:
> I ran into this with the Asahi Linux downstream kernel, based on v6.9.9,
> but I believe the problem is also still upstream.
> The issue seems to be an interaction between folding one page table level
> at compile time and another one at runtime.

Thanks for reporting this!

> With this config, we have:
>
> CONFIG_PGTABLE_LEVELS=4
> PAGE_SHIFT=14
> PMD_SHIFT=25
> PUD_SHIFT=36
> PGDIR_SHIFT=47
> pgtable_l5_enabled() == false (compile time)
> pgtable_l4_enabled() == false (runtime, due to no LPA2)

I think this is 'defconfig' w/ 16k pages, although I wasn't able to trigger
the issue quickly under QEMU with that. Your analysis looks correct, however.

> With p4d folded at compile-time, and pud folded at runtime when LPA2 is
> not supported.
>
> With this setup, pgd_offset() is broken since the pgd is actually
> supposed to become a pud but the shift is wrong, as it is set at compile
> time:
>
> #define pgd_index(a)  (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
>
> static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
> {
>         return (pgd + pgd_index(address));
> };
>
> Then we follow the gup logic (abbreviated):
>
> gup_pgd_range:
>     pgdp = pgd_offset(current->mm, addr);
>     pgd_t pgd = READ_ONCE(*pgdp);
>
> At this point, pgd is just the 0th entry of the top level page table
> (since those extra address bits will always be 0 for valid 47-bit user
> addresses).
>
> p4d then gets folded via pgtable-nop4d.h:
>
> gup_p4d_range:
>     p4dp = p4d_offset_lockless(pgdp, pgd, addr);
>         = p4d_offset(&(pgd), address)
>         = &pgd
>     p4d_t p4d = READ_ONCE(*p4dp);
>
> Now we have p4dp = stack address of pgd, and p4d = pgd.
>
> gup_pud_range:
>     pudp = pud_offset_lockless(p4dp, p4d, addr);
>         -> if (!pgtable_l4_enabled())
>             = p4d_to_folded_pud(p4dp, addr);
>             = (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
>     pud_t pud = READ_ONCE(*pudp);
>
> Which is bad pointer math because it only works if p4dp points to a real
> page table entry inside a page table, not a single u64 stack address.
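
For anyone following along, the pointer math quoted above can be modelled
in user space. This is only a rough sketch, not kernel code: the shifts come
from the report, but PTRS_PER_PGD = 2 assumes a 48-bit VA (not stated in the
thread), entries are assumed to be 8 bytes, and folded_pud(),
real_entry_hits_table() and stack_copy_hits_table() are hypothetical names
made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts taken from the report (16K pages). */
#define PAGE_SHIFT   14
#define PUD_SHIFT    36
#define PGDIR_SHIFT  47
#define PAGE_SIZE    ((uint64_t)1 << PAGE_SHIFT)

/* 16K page / 8-byte entries = 2048 entries, i.e. 11 address bits per level. */
#define PTRS_PER_PUD 2048
/* Assumption (not from the thread): 48-bit VA, so 1 bit above PGDIR_SHIFT. */
#define PTRS_PER_PGD 2

static_assert(PTRS_PER_PUD * sizeof(uint64_t) == PAGE_SIZE,
	      "one table fills exactly one page");

/* Stand-in for a real, page-aligned top-level table. */
_Alignas(1 << PAGE_SHIFT) uint64_t table[PTRS_PER_PUD];

uint64_t pgd_index(uint64_t addr)
{
	return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

uint64_t pud_index(uint64_t addr)
{
	return (addr >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
}

/*
 * Hypothetical model of p4d_to_folded_pud(): round the entry pointer down
 * to the start of its page, then index by the pud bits of the address.
 */
uint64_t *folded_pud(uint64_t *p4dp, uint64_t addr)
{
	uintptr_t base = (uintptr_t)p4dp & ~((uintptr_t)PAGE_SIZE - 1);

	return (uint64_t *)(base + pud_index(addr) * sizeof(uint64_t));
}

/* Pointer into the real table: the lookup resolves to the expected slot. */
int real_entry_hits_table(uint64_t addr)
{
	uint64_t *pgdp = &table[pgd_index(addr)];

	return folded_pud(pgdp, addr) == &table[pud_index(addr)];
}

/* Pointer to a local copy, as in the folded p4d case described above. */
int stack_copy_hits_table(uint64_t addr)
{
	uint64_t pgd = table[pgd_index(addr)];

	return folded_pud(&pgd, addr) == &table[pud_index(addr)];
}
```

The sketch only compares the computed pointers. Dereferencing the one derived
from the stack copy would be the stray read that the report describes, so it
is deliberately never loaded from.
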
Cheers for the explanation; I agree that 6.10 looks like it's affected in
the same way, even though I couldn't reproduce the crash. I think the root
of the problem is that p4d_offset_lockless() returns a stack address when
the p4d level is folded.

I wondered about changing the dummy pXd_offset_lockless() definitions in
linux/pgtable.h to pass the real pointer through instead of the address of
the local, but then I suppose _most_ pXd_offset() implementations are going
to dereference that and it would break the whole point of having _lockless
routines to start with.

What if we provided our own implementation of p4d_offset_lockless() for the
folding case, which could just propagate the page-table pointer? Diff below.

> This causes random oopses in internal_get_user_pages_fast and related
> codepaths.

Do you have a reliable way to trigger those? I tried doing some GUPpy things
like strace (access_process_vm()) but it all seemed fine.

Thanks,

Will

--->8

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f8efbc128446..3afe624a39e1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1065,6 +1065,13 @@ static inline bool pgtable_l5_enabled(void) { return false; }
 
 #define p4d_offset_kimg(dir,addr)	((p4d_t *)dir)
 
+static inline
+p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
+{
+	return p4d_offset(pgdp, addr);
+}
+#define p4d_offset_lockless p4d_offset_lockless
+
 #endif  /* CONFIG_PGTABLE_LEVELS > 4 */
 
 #define pgd_ERROR(e)	\
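
As a rough illustration of how the override in the diff is meant to take
effect, here is a user-space sketch of the "define the name to itself"
pattern, which lets an arch header pre-empt a generic #ifndef fallback. The
types and the folded p4d_offset() are simplified stand-ins rather than the
kernel definitions, and lockless_returns_table_pointer() is a hypothetical
helper that exists only for the demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types (assumption, not kernel code). */
typedef struct { uint64_t pgd; } pgd_t;
typedef struct { pgd_t pgd; } p4d_t;

static_assert(sizeof(p4d_t) == sizeof(pgd_t),
	      "a folded p4d entry aliases the pgd entry");

/* Folded p4d level: the p4d entry is simply the pgd entry. */
static inline p4d_t *p4d_offset(pgd_t *pgdp, unsigned long addr)
{
	(void)addr;
	return (p4d_t *)pgdp;
}

/* "Arch header": propagate the real table pointer and ignore the copy. */
static inline
p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
{
	(void)pgd;
	return p4d_offset(pgdp, addr);
}
#define p4d_offset_lockless p4d_offset_lockless

/* "Generic header": fallback that would hand back the address of the local. */
#ifndef p4d_offset_lockless
#define p4d_offset_lockless(pgdp, pgd, address) p4d_offset(&(pgd), address)
#endif

/* Returns 1 if the lockless lookup yields the table pointer, not the copy. */
int lockless_returns_table_pointer(void)
{
	static pgd_t table[4];
	pgd_t *pgdp = &table[1];
	pgd_t pgd = *pgdp;	/* READ_ONCE() in the kernel */

	return (void *)p4d_offset_lockless(pgdp, pgd, 0) == (void *)pgdp;
}
```

Because the name is already defined by the time the fallback is reached, the
#ifndef branch is skipped and the call resolves to the inline function, so
the pointer that comes back still points into the table.
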