From: Anshuman Khandual
To: Ryan Roberts, Catalin Marinas, Will Deacon, Muchun Song, Pasha Tatashin, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Mark Rutland, Ard Biesheuvel, Dev Jain, Alexandre Ghiti, Steve Capper, Kevin Brodsky
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 08/16] arm64/mm: Hoist barriers out of ___set_ptes() loop
Date: Fri, 7 Feb 2025 11:05:36 +0530
Message-ID: <858ecac5-9ba7-48da-8f34-ffda28d17609@arm.com>
In-Reply-To: <20250205151003.88959-9-ryan.roberts@arm.com>
References: <20250205151003.88959-1-ryan.roberts@arm.com> <20250205151003.88959-9-ryan.roberts@arm.com>

On 2/5/25 20:39, Ryan Roberts wrote:
> ___set_ptes() previously called __set_pte() for each PTE in the range,
> which would conditionally issue a DSB and ISB to make the new PTE value
> immediately visible to the table walker if the new PTE was valid and for
> kernel space.
>
> We can do better than this; let's hoist those barriers out of the loop
> so that they are only issued once at the end of the loop. We then reduce
> the cost by the number of PTEs in the range.
>
> Signed-off-by: Ryan Roberts
> ---
>  arch/arm64/include/asm/pgtable.h | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 3b55d9a15f05..1d428e9c0e5a 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -317,10 +317,8 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
>  	WRITE_ONCE(*ptep, pte);
>  }
>  
> -static inline void __set_pte(pte_t *ptep, pte_t pte)
> +static inline void __set_pte_complete(pte_t pte)
>  {
> -	__set_pte_nosync(ptep, pte);
> -
>  	/*
>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
>  	 * or update_mmu_cache() have the necessary barriers.
> @@ -331,6 +329,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte)
>  	}
>  }
>  
> +static inline void __set_pte(pte_t *ptep, pte_t pte)
> +{
> +	__set_pte_nosync(ptep, pte);
> +	__set_pte_complete(pte);
> +}
> +
>  static inline pte_t __ptep_get(pte_t *ptep)
>  {
>  	return READ_ONCE(*ptep);
> @@ -647,12 +651,14 @@ static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep, pte_t pte,
>  
>  	for (;;) {
>  		__check_safe_pte_update(mm, ptep, pte);
> -		__set_pte(ptep, pte);
> +		__set_pte_nosync(ptep, pte);
>  		if (--nr == 0)
>  			break;
>  		ptep++;
>  		pte = pte_advance_pfn(pte, stride);
>  	}
> +
> +	__set_pte_complete(pte);

The loop now writes a run of page table entries without a DSB/ISB
after each one. Could something else get scheduled on the CPU before
__set_pte_complete() is called, leaving the whole block of page table
entries without the desired mapping effect? IOW, how is
__set_pte_complete() guaranteed to execute once the loop above
completes?

Otherwise this change LGTM.

>  }
>  
>  static inline void __set_ptes(struct mm_struct *mm,
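
For reference, the barrier saving described in the commit message can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions, not kernel code: every model_* name is hypothetical,
a C11 seq_cst fence stands in for the DSB + ISB pair (it does not have
the same architectural effect), and the pte_valid_not_user() condition
from the patch is left out, so the barrier is counted unconditionally.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
typedef _Atomic(uint64_t) model_pte_t;

/* Counts how many times the modelled barrier was issued. */
static unsigned long model_barrier_count;

/* Models __set_pte_nosync(): write the entry with no ordering implied. */
static void model_set_pte_nosync(model_pte_t *ptep, uint64_t pte)
{
	atomic_store_explicit(ptep, pte, memory_order_relaxed);
}

/*
 * Models __set_pte_complete(). Assumption: a seq_cst fence is used as a
 * placeholder for DSB + ISB, and the valid/kernel check is omitted.
 */
static void model_set_pte_complete(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	model_barrier_count++;
}

/* Shape of the loop before the patch: one barrier per entry. */
static void model_set_ptes_per_entry(model_pte_t *ptep, uint64_t pte,
				     size_t nr, uint64_t stride)
{
	for (size_t i = 0; i < nr; i++) {
		model_set_pte_nosync(&ptep[i], pte);
		model_set_pte_complete();
		pte += stride;
	}
}

/* Shape of the loop after the patch: one barrier after all entries. */
static void model_set_ptes_hoisted(model_pte_t *ptep, uint64_t pte,
				   size_t nr, uint64_t stride)
{
	for (size_t i = 0; i < nr; i++) {
		model_set_pte_nosync(&ptep[i], pte);
		pte += stride;
	}
	model_set_pte_complete();
}

/* Runs both variants over 8 entries and checks the barrier counts. */
static void model_self_check(void)
{
	model_pte_t table[8];

	for (size_t i = 0; i < 8; i++)
		atomic_init(&table[i], 0);

	model_barrier_count = 0;
	model_set_ptes_per_entry(table, 0x1000, 8, 0x1000);
	assert(model_barrier_count == 8);

	model_barrier_count = 0;
	model_set_ptes_hoisted(table, 0x1000, 8, 0x1000);
	assert(model_barrier_count == 1);
}
```

Calling model_self_check() from a main() exercises both variants. The
model only shows the barrier count dropping from nr to 1; it says
nothing about what happens if the task is preempted between the loop
and the final barrier, which is the question raised above.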