Date: Fri, 1 Mar 2024 18:47:46 +0000
From: Catalin Marinas
To: Ryan Roberts
Cc: Andrew Morton, Mark Rutland, John Hubbard,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/2] arm64/mm: Improve comment in contpte_ptep_get_lockless()
References: <20240226120321.1055731-1-ryan.roberts@arm.com>
	<20240226120321.1055731-3-ryan.roberts@arm.com>
In-Reply-To: <20240226120321.1055731-3-ryan.roberts@arm.com>

On Mon, Feb 26, 2024 at 12:03:21PM +0000, Ryan Roberts wrote:
> Make clear the atomicity/consistency requirements of the API and how we
> achieve them.
>
> Link: https://lore.kernel.org/linux-mm/Zc-Tqqfksho3BHmU@arm.com/
> Signed-off-by: Ryan Roberts
> ---
>  arch/arm64/mm/contpte.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index be0a226c4ff9..1b64b4c3f8bf 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -183,16 +183,20 @@ EXPORT_SYMBOL_GPL(contpte_ptep_get);
>  pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
>  {
>  	/*
> -	 * Gather access/dirty bits, which may be populated in any of the ptes
> -	 * of the contig range. We may not be holding the PTL, so any contiguous
> -	 * range may be unfolded/modified/refolded under our feet. Therefore we
> -	 * ensure we read a _consistent_ contpte range by checking that all ptes
> -	 * in the range are valid and have CONT_PTE set, that all pfns are
> -	 * contiguous and that all pgprots are the same (ignoring access/dirty).
> -	 * If we find a pte that is not consistent, then we must be racing with
> -	 * an update so start again. If the target pte does not have CONT_PTE
> -	 * set then that is considered consistent on its own because it is not
> -	 * part of a contpte range.
> +	 * The ptep_get_lockless() API requires us to read and return *orig_ptep
> +	 * so that it is self-consistent, without the PTL held, so we may be
> +	 * racing with other threads modifying the pte. Usually a READ_ONCE()
> +	 * would suffice, but for the contpte case, we also need to gather the
> +	 * access and dirty bits from across all ptes in the contiguous block,
> +	 * and we can't read all of those neighbouring ptes atomically, so any
> +	 * contiguous range may be unfolded/modified/refolded under our feet.
> +	 * Therefore we ensure we read a _consistent_ contpte range by checking
> +	 * that all ptes in the range are valid and have CONT_PTE set, that all
> +	 * pfns are contiguous and that all pgprots are the same (ignoring
> +	 * access/dirty). If we find a pte that is not consistent, then we must
> +	 * be racing with an update so start again. If the target pte does not
> +	 * have CONT_PTE set then that is considered consistent on its own
> +	 * because it is not part of a contpte range.
>  	 */

I haven't had the time to properly think about this function but,
depending on what its semantics are, we might not guarantee that, at the
time of reading a pte, we have the correct dirty state from the other
ptes in the range.

Theoretical: let's say we read the first pte in the contig range and
it's clean but further down there's a dirty one. Another (v)CPU breaks
the contig range, sets the dirty bit everywhere, there's some
pte_mkclean for all of them and they are collapsed into a contig range
again. The function above on the first (v)CPU returns a clean pte when
it should have actually been dirty at the time of read.

Throughout the callers of this function, I couldn't find one where it
matters. So I concluded that they don't need the dirty state. Normally
the dirty state is passed to the page flags, so not lost after the pte
has been cleaned.

-- 
Catalin
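
For anyone following the thread, the consistency check described in the
quoted comment can be sketched as a small user-space model. This is not
the kernel code: the bit layout, the model_* names and the assumption
that block[0] is the first pte of the aligned contiguous range are all
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for this model only; not the arm64 descriptor format. */
#define CONT_PTES	16
#define PTE_VALID	(UINT64_C(1) << 0)
#define PTE_CONT	(UINT64_C(1) << 1)
#define PTE_YOUNG	(UINT64_C(1) << 2)
#define PTE_DIRTY	(UINT64_C(1) << 3)
#define PTE_WRITE	(UINT64_C(1) << 4)
#define PFN_SHIFT	12

typedef uint64_t model_pte_t;

static bool valid_cont(model_pte_t pte)
{
	return (pte & (PTE_VALID | PTE_CONT)) == (PTE_VALID | PTE_CONT);
}

static uint64_t pfn_of(model_pte_t pte)
{
	return pte >> PFN_SHIFT;
}

/* Attribute bits with the pfn, access and dirty bits masked out. */
static uint64_t prot_of(model_pte_t pte)
{
	return pte & ((UINT64_C(1) << PFN_SHIFT) - 1) & ~(PTE_YOUNG | PTE_DIRTY);
}

/*
 * One pass over the block. Returns false if the range looked inconsistent
 * (the caller should retry); otherwise stores block[idx] with access/dirty
 * gathered from the whole range in *out.
 */
static bool model_try_get(const volatile model_pte_t *block, int idx,
			  model_pte_t *out)
{
	assert(idx >= 0 && idx < CONT_PTES);

	model_pte_t orig = block[idx];

	/* Not part of a contpte range: consistent on its own. */
	if (!valid_cont(orig)) {
		*out = orig;
		return true;
	}

	uint64_t pfn = pfn_of(orig) - (uint64_t)idx;
	uint64_t prot = prot_of(orig);

	for (int i = 0; i < CONT_PTES; i++, pfn++) {
		model_pte_t pte = block[i];

		if (!valid_cont(pte) || pfn_of(pte) != pfn ||
		    prot_of(pte) != prot)
			return false;

		orig |= pte & (PTE_YOUNG | PTE_DIRTY);
	}

	*out = orig;
	return true;
}

/* Retry until a pass sees a consistent range. */
static model_pte_t model_get_lockless(const volatile model_pte_t *block, int idx)
{
	model_pte_t pte;

	while (!model_try_get(block, idx, &pte))
		;
	return pte;
}
```

Note that the check in this model compares values only. A range that is
unfolded, dirtied, cleaned and refolded between two reads looks the same
as one that was never touched, which is the window described above.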