From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0075DC3DA63 for ; Wed, 24 Jul 2024 11:33:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3EF446B008C; Wed, 24 Jul 2024 07:33:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 39F7D6B0093; Wed, 24 Jul 2024 07:33:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2669E6B0096; Wed, 24 Jul 2024 07:33:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 083BD6B008C for ; Wed, 24 Jul 2024 07:33:51 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 68DCF1408C1 for ; Wed, 24 Jul 2024 11:33:50 +0000 (UTC) X-FDA: 82374436620.27.7811F01 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf07.hostedemail.com (Postfix) with ESMTP id AAAFE40008 for ; Wed, 24 Jul 2024 11:33:47 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=SZ4qpA9J; spf=pass (imf07.hostedemail.com: domain of will@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=will@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721820779; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JCFCzoMJyCvzQnGZC9fAD8RFGMF21b/I+lDwWcNepIE=; b=Cb9N5zN/MqZoDlwrlhziOfUD9xrPQIqp7RxJ0gYCm7b16JJvxWT+sq9ws/JeHJqBy2ys54 7KX4gwHwTGM2TmwO/m4BQsd0dcp4dZrw97g9x3BQvSJMmzK/6u/79ZIX31Kgeqp7WQHgkT LRn35seEngNibmGP/wOuxdpVV5ZQV+U= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721820779; a=rsa-sha256; cv=none; b=JDKyiA1gHiTDbYF+I2FNXTAtOG6/EZxF+uLeFA5Ejp8UamdPZkg7QDRIwuAFxss2b7gkfN UIfrLvIuYoMOtU4GHj1FubC6ySQv6ygNKAOWWcBqUk0gkgPUDVkeIv0ro9U3c09XAW2v1P JheaTNcbkLyKTntUZEnc7FVlxZMQ8tQ= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=SZ4qpA9J; spf=pass (imf07.hostedemail.com: domain of will@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=will@kernel.org; dmarc=pass (policy=none) header.from=kernel.org Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 7FCAD6119D; Wed, 24 Jul 2024 11:33:46 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7EE80C32782; Wed, 24 Jul 2024 11:33:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1721820826; bh=gBHbPU82zZ4Z5Y800YED50fqun4wIRYLdr5Afy6feFQ=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=SZ4qpA9JLoAPUfFf4NYmLc6cuTleS/PmaQSRVFPL//tKrDKb99/zmkxlWj2fUnRkd u/0hhIo71tZ1YWj9KDw4jLlO9+7D67M4G+6B6UcVWi0U/fGbanBzlWs2uyy2eAIwsl eCeuj6/HUcRzU0kZQQLPb4k8dh4HLRV4uBxkdBsDWziYBPdoUZA+TeR7WwM0eGdSnP 1t9EEwcHJUO8hGoNof2RqR/PBu9o5DVro/UjSxMMb41JiNLxguFKClOxPq77OWG/ax YgSOJsJgmnqt0P9oEIJGF85d+gLXtM9IXYedBp0wIcHZLwMJRtLKKNpuwohem22peP W1Ho4J/4rIpww== Date: Wed, 24 Jul 2024 12:33:41 +0100 From: Will Deacon To: Ard Biesheuvel Cc: Asahi Lina , linux-mm@kvack.org, linux-kernel@vger.kernel.org, asahi@lists.linux.dev, linux-arm-kernel@lists.infradead.org, Catalin Marinas , ryan.roberts@arm.com, mark.rutland@arm.com Subject: Re: LPA2 on non-LPA2 hardware broken with 16K pages Message-ID: <20240724113340.GA27474@willie-the-truck> References: <50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net> <20240718131428.GA21243@willie-the-truck> <20240723145214.GA26403@willie-the-truck> <20240723160543.GA26546@willie-the-truck> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Stat-Signature: eaix3baiwemenbcksrm3omj6kz1gasg9 X-Rspamd-Queue-Id: AAAFE40008 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1721820827-488765 X-HE-Meta: U2FsdGVkX1/j9KhzjvGVYBwtAdq0lDeuXmlsylJAtUaZvsKZqmPhAz2KO024zRLLXaTjxp6dHkb15qX26fagVHL2zXlCQDVjVwmXCep7pwga9+o2CAFldDg2SFzXHkSVVcKnW7QobC6bP+/m3d+m4Y5JitzQJvwrkxitmX8NYMWMuINEkhsM1k3ERyEHntSIo1CB26FbfJAVychJ17tCbuOsU9jclkbFrh96c9IB/sAI9MG/B5eT+GiKAnCxqGlL+UW/euetA9VmTHu0TnFG6MwDBhRu1lEJuJCkaLSO9HHnCu05l5bgQl8dX5EcPm3fp2UN65yUJz3eyS7uyq+7UbOGZ3p/bk8oNjmmEf4zytT9rveGow6a/zZjTqfwI0qi+UmAoCDLvZLSIHkEqSxeXaa+EJFORP04Am7p9RlAjNNDy2BB9Q/GD5znwuTIlmUo6VizWkZkPQ36ToVQgZFWt5kLupsI3oex5S1ACiA6RN/pMMYb9iz3Aejo0GTbOmQBbh3T3fF6rVRcz+6J+AunNpB122v/nsU5VtIgaMWWdsEN1nWIxNPZ0QRXZe3w5pXNVIoxYmWAQWHBaECrXgzO7LxcE13T2bxYy9rm+4uvPrruQjsRbH8h+lELBMvjqfD/t1VszBhmPO8eaX+6MSyJ/cNYlXqOG8cUAyqwc7okAstVRro4x1zYxSs20fpcETANh2EMQ264zSPuNuimbavYThV7FmF5zVg0g68cZN0eoF4VQp2Vl12wPDXbWpdv1DfqgJNw3KwDyM0UUQ7bLNKrXDWbO8iMdVCdox5mBcCgx7GrlFmaIORBPcP+XyeG8zu4O0Juc5vxKC3hSanZnCwMBtNiJMb7ZpE9P8RcoThXx0c+dQpaIs30JKA52+qle402MxwoqgjQiOHf93XUqBn/9PBKeAPe94ZK8SozUJ71zvtyGvvF7RRnudWewbBB3KEo8JEh/5rP9F9Ant9fT/Q DWcogqf7 YZT6k8lW1HvnQFsxTZhV20wc5FUzQbg2HMAptZVxVmM8J8AECcQx0wFBLMtKNEZOdAMBvzoTvaTSnN8JkucVTxl3ROOp40A9jBYwY7GX0UD2c6rsbH9pc4WIpYy1AudbvwNAhXb6oTYd3eed6ojrppnCYIdWEOJ/qEo3kUnKbPFoLg7T6fFSnx+aW6A9oMuaXTuIA3YvAnkMsNGYfl4250Hmvm/JScobv+WrqC1ddEEgU3nNdTFVAnDQ0eg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jul 23, 2024 at 06:28:16PM +0200, Ard Biesheuvel wrote: > On Tue, 23 Jul 2024 at 18:05, Will Deacon wrote: > > > > On Tue, Jul 23, 2024 at 05:02:15PM +0200, Ard Biesheuvel wrote: > > > On Tue, 23 Jul 2024 at 16:52, Will Deacon wrote: > > > > On Fri, Jul 19, 2024 at 11:02:29AM -0700, Ard Biesheuvel wrote: > ... > > > > > > > > > > We might add > > > > > > > > > > if (pgtable_l4_enabled()) > > > > > pgdp = &pgd; > > > > > > > > > > here to preserve the existing 'lockless' behavior when PUDs are not > > > > > folded. > > > > > > > > The code still needs to be 'lockless' for the 5-level case, so I don't > > > > think this is necessary. > > > > > > The 5-level case is never handled here. > > > > Urgh, yes, sorry. I've done a fantasticly bad job of explaining myself. > > > > > There is the 3-level case, where the runtime PUD folding needs the > > > actual address in order to recalculate the descriptor address using > > > the correct shift. In this case, we don't dereference the pointer > > > anyway so the 'lockless' thing doesn't matter (afaict) > > > > > > In the 4-level case, we want to preserve the original behavior, where > > > pgd is not reloaded from pgdp. Setting pgdp to &pgd achieves that. > > > > Right. What I'm trying to get at is the case where we have folding. For > > example, with my patch applied, if we have 3 levels then the lockless > > GUP walk looks like: > > > > > > pgd_t pgd = READ_ONCE(*pgdp); > > > > p4dp = p4d_offset_lockless(pgdp, pgd, addr); > > => Returns pgdp > > p4d_t p4d = READ_ONCE(*p4dp); > > > > pudp = pud_offset_lockless(p4dp, p4d, addr); > > => Returns &p4d, which is again the pgdp > > pud_t pud = READ_ONCE(*pudp); > > > > > > So here we're reloading the same pointer multiple times and my argument > > is that if we need to add logic to avoid this for the > > pgtable_l4_enabled() case, then we have bigger problems. > > > > The 3-level case is not relevant here. My suggestion only affects the > 4-level case: That's exactly what I'm uneasy about :/ > if (pgtable_l4_enabled()) > pgdp = &pgd; > > which prevents us from evaluating *pgdp twice, which seems to me to be > the reason these routines exist in the first place. Given that the > 3-level runtime-folded case is the one we are trying to fix here, I'd > argue that keeping the 4-level case the same as before is important. I think consistency between 4-level and 3-level is far more important. Adding code to avoid reloading the entry for one specific case (the pgtable_l4_enabled() case), whilst requiring other cases (e.g. the 3-level compile-time folded case) to reload from the pointer is inconsistent. Either they both need it or neither of them need it, no? > > > > Yes, we'll load the same entry multiple times, > > > > but it should be fine because they're in the context of a different > > > > (albeit folded) level. > > > > > > > > > > I don't understand what you are saying here. Why is that fine? > > > > I think it's fine because (a) the CPU guarantees same address > > read-after-read ordering and (b) We only evaluate the most recently read > > value. It would be a problem if we mixed data from different reads but, > > because the use is confined to that 'level', we don't end up doing that. > > > > Dunno, am I making any sense? > > > > So what is the point of p?d_offset_lockless()? Is it a performance > optimization that we don't care about on arm64? Or does this reasoning > still only apply to the folded case? Well, naturally it's all undocumented. S390 added the macros, but it looks like that was because they hit a similar problem to the one reported by Lina (see d3f7b1bb2040 ("mm/gup: fix gup_fast with dynamic page table folding")) and, really, that change is just moving the logic out of the fast GUP code and into some new macros. If you think about trying to do fast GUP using the regular pXd_offset() macros with 5 levels, the problem is a little more obvious. For example, it would look something like: // gup_fast_pgd_range() pgd_t pgd = READ_ONCE(*pgdp); if (pgd_none(pgd)) return; ... // gup_fast_p4d_range() p4dp = p4d_offset(pgdp, addr); => READ_ONCE(*pgdp); A concurrent writer could e.g. clear *pgdp between the two reads and we'd end up with junk in p4dp because of a ToCToU-type bug. But I don't think that applies to the case we're discussing, because the reload of the entry happens in the following READ_ONCE() rather than in the _offset macro: p4d_t p4d = READ_ONCE(*p4dp); and, as this is in the context of a new level, everything is then rechecked so that a concurrent modification won't cause us to crash. Will