Date: Tue, 26 Mar 2024 16:31:07 +0000
Subject: Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64
From: Ryan Roberts
To: David Hildenbrand, Mark Rutland, Catalin Marinas, Will Deacon,
    Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
    Andrew Morton, Muchun Song
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
References: <20240215121756.2734131-1-ryan.roberts@arm.com>
    <0ae22147-e1a1-4bcb-8a4c-f900f3f8c39e@redhat.com>
In-Reply-To: <0ae22147-e1a1-4bcb-8a4c-f900f3f8c39e@redhat.com>

On 26/03/2024 16:17, David Hildenbrand wrote:
> On 15.02.24 13:17, Ryan Roberts wrote:
>> This is an RFC for a series that aims to reduce the cost and complexity of
>> ptep_get_lockless() for arm64 when supporting transparent contpte mappings [1].
>> The approach came from discussion with Mark and David [2].
>>
>> It introduces a new helper, ptep_get_lockless_norecency(), which allows the
>> access and dirty bits in the returned pte to be incorrect.
>> This relaxation permits arm64's implementation to just read the single
>> target pte, and avoids having to iterate over the full contpte block to
>> gather the access and dirty bits, for the contpte case.
>>
>> It turns out that none of the call sites using ptep_get_lockless() require
>> accurate access and dirty bit information, so we can also convert those sites.
>> Although a couple of places need care (see patches 2 and 3).
>>
>> Arguably patch 3 is a bit fragile, given the wide accessibility of
>> vmf->orig_pte. So it might make sense to drop this patch and stick to using
>> ptep_get_lockless() in the page fault path. I'm keen to hear opinions.
>
> Yes. Especially as we have these pte_same() checks that might just fail now
> because of wrong accessed/dirty bits?

Which pte_same() checks are you referring to? I've changed them all to
pte_same_norecency(), which ignores the access/dirty bits when doing the
comparison.

>
> Likely, we just want to read "the real deal" on both sides of the pte_same()
> handling.

Sorry, I'm not sure I understand. You mean read the full pte including
access/dirty? That's the same as dropping the patch, right? Of course if we do
that, we still have to keep ptep_get_lockless() around for this case. In an
ideal world we would convert everything over to ptep_get_lockless_norecency()
and delete ptep_get_lockless() to remove the ugliness from arm64.

>
>>
>> I've chosen the name "recency" because it's shortish and somewhat descriptive,
>> and is already used in a couple of places to mean similar things (see mglru and
>> damon). I'm open to other names if anyone has better ideas.
>
> Not a native speaker; works for me.
>
>>
>> If consensus is that this approach is generally acceptable, I intend to create a
>> series in future to do a similar thing with ptep_get() -> ptep_get_norecency().
>
> Yes, sounds good to me.
>
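
For illustration only, here is a minimal userspace sketch of the "ignore the
access/dirty bits when comparing" idea discussed above. It is not code from
the series: toy_pte_t, the TOY_PTE_* bit positions and the toy_* helpers are
all hypothetical names invented for this example, and the real arm64 pte
layout and the real pte_same_norecency() may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pte: a bare 64-bit word. The kernel uses an arch-specific pte_t. */
typedef struct { uint64_t val; } toy_pte_t;

/* Hypothetical bit positions, chosen only for this illustration. */
#define TOY_PTE_VALID        (UINT64_C(1) << 0)
#define TOY_PTE_YOUNG        (UINT64_C(1) << 1)  /* "accessed" */
#define TOY_PTE_DIRTY        (UINT64_C(1) << 2)
#define TOY_PTE_RECENCY_MASK (TOY_PTE_YOUNG | TOY_PTE_DIRTY)

/* Clear the recency bits so they cannot influence a comparison. */
static inline toy_pte_t toy_pte_clear_recency(toy_pte_t pte)
{
	pte.val &= ~TOY_PTE_RECENCY_MASK;
	return pte;
}

/* Strict comparison: every bit must match. */
static inline bool toy_pte_same(toy_pte_t a, toy_pte_t b)
{
	return a.val == b.val;
}

/* Relaxed comparison: equal if they differ only in accessed/dirty. */
static inline bool toy_pte_same_norecency(toy_pte_t a, toy_pte_t b)
{
	return toy_pte_same(toy_pte_clear_recency(a),
			    toy_pte_clear_recency(b));
}

/* Small self-check showing where the two comparisons disagree. */
static void toy_pte_selftest(void)
{
	toy_pte_t snapshot = { TOY_PTE_VALID | (UINT64_C(0x1234) << 12) };
	toy_pte_t current = snapshot;

	/* Simulate the accessed/dirty bits being set after the snapshot. */
	current.val |= TOY_PTE_RECENCY_MASK;

	assert(!toy_pte_same(snapshot, current));
	assert(toy_pte_same_norecency(snapshot, current));
}
```

In this toy model, a snapshot taken with stale access/dirty bits still
compares equal to the live entry under the relaxed check, while any other
difference (for example a different output address) is still detected.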