Date: Thu, 19 Feb 2026 12:15:09 +0000
From: Pedro Falcato
To: "David Hildenbrand (Arm)"
Cc: Dev Jain, Luke Yang, surenb@google.com, jhladky@redhat.com,
 akpm@linux-foundation.org, Liam.Howlett@oracle.com, willy@infradead.org,
 vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [REGRESSION] mm/mprotect: 2x+ slowdown for >=400KiB regions
 since PTE batching (cac1db8c3aad)
References: <764792ea-6029-41d8-b079-5297ca62505a@kernel.org>
 <71fbee21-f1b4-4202-a790-5076850d8d00@arm.com>
 <8315cbde-389c-40c5-ac72-92074625489a@arm.com>
 <5dso4ctke4baz7hky62zyfdzyg27tcikdbg5ecnrqmnluvmxzo@sciiqgatpqqv>
 <340be2bc-cf9b-4e22-b557-dfde6efa9de8@kernel.org>
 <624496ee-4709-497f-9ac1-c63bcf4724d6@kernel.org>
In-Reply-To: <624496ee-4709-497f-9ac1-c63bcf4724d6@kernel.org>

On Wed, Feb 18, 2026 at 01:24:28PM +0100, David Hildenbrand (Arm) wrote:
> On 2/18/26 12:58, Pedro Falcato wrote:
> > On Wed, Feb 18, 2026 at 11:46:29AM +0100, David Hildenbrand (Arm) wrote:
> > > On 2/18/26 11:38, Dev Jain wrote:
> > > >
> > > >
> > > > There are two things at play here:
> > > >
> > > > 1. All arches are expected to benefit from pte batching on large folios, because
> > > > of doing similar operations together in one shot.
> > > > For code paths except mprotect
> > > > and mremap, that benefit is far more clear due to:
> > > >
> > > > a) batching across atomic operations etc. For example, see copy_present_ptes -> folio_ref_add.
> > > > Instead of bumping the reference by 1 nr times, we bump it by nr in one shot.
> > > >
> > > > b) vm_normal_folio was already being invoked. So, all in all the only new overhead
> > > > we introduce is of folio_pte_batch(_flags). In fact, since we already have the
> > > > folio, I recall that we even just special case the large folio case, out from
> > > > the small folio case. Thus 4K folio processing will have no overhead.
> > > >
> > > > 2. Due to the requirements of contpte, ptep_get() on arm64 needs to fetch a/d bits
> > > > across a cont block. Thus, for each ptep_get, it does 16 pte accesses. To avoid this,
> > > > it becomes critical to batch on arm64.
> > > >
> > >
> > >
> > >
> > > Nice.
> > > >
> > > >
> > > > I dunno, need other opinions.
> > >
> > > Let's repeat my question: what, besides the micro-benchmark in some cases
> > > with all small-folios, are we trying to optimize here. No hand waving
> > > (Androids does this or that) please.
> >
> > I don't understand what you're looking for. an mprotect-based workload? those
> > obviously don't really exist, apart from something like a JIT engine cranking
> > out a lot of mprotect() calls in an aggressive fashion. Or perhaps some of that
> > usage of mprotect that our DB friends like to use sometimes (discussed in
> > $OTHER_CONTEXTS), though those are generally hugepages.
> >
>
> Anything besides a homemade micro-benchmark that highlights why we should
> care about this exact fast and repeated sequence of events.
>
> I'm surprise that such a "large regression" does not show up in any other
> non-home-made benchmark that people/bots are running. That's really what I
> am questioning.

I don't know, perhaps there isn't a will-it-scale test for this. That's
alright.
Even the standard will-it-scale and stress-ng tests people use to detect
regressions usually have glaring problems and are insanely microbenchey.

>
> Having that said, I'm all for optimizing it if there is a real problem
> there.
>
> > I don't see how this can justify large performance regressions in a system
> > call, for something every-architecture-not-named-arm64 does not have.
> Take a look at the reported performance improvements on AMD with large
> folios.

Sure, but pte-mapped 2M folios are almost a worst case (why not a PMD at that
point...)

>
> The issue really is that small folios don't perform well, on any
> architecture. But to detect large vs. small folios we need the ... folio.
>
> So once we optimize for small folios (== don't try to detect large folios)
> we'll degrade large folios.

I suspect it's not that huge of a deal. Worst case you can always provide a
software PTE_CONT bit that would e.g. be set when mapping a large folio.
Or perhaps "if this pte has a PFN, and the next pte has PFN + 1, then we're
probably in a large folio, thus do the proper batching stuff". I think that
could satisfy everyone.

There are heuristics we can use, and perhaps pte_batch_hint() does not need
to be that simple and useless in the !arm64 case then. I'll try to look into
a cromulent solution for everyone.

(shower thought: do we always get wins when batching large folios, or do
these need to be of a significant order to get wins?)

But personally I would err on the side of small folios, like we did for
mremap() a few months back.

>
>
> For fork() and unmap() we were able to avoid most of the performance
> regressions for small folios by special-casing the implementation on two
> variants: nr_pages == 1 (incl. small folios) vs. nr_pages != 1 (large
> folios).
>
> We cannot avoid the vm_normal_folio(). Maybe the function-call overhead
> could be avoided by providing an inlined variant -- if that is the real
> problem.
>
> But likely it's also just access to the folio when we really don't need it
> in some cases.

/me shrieks at the thought of the extra cacheline accesses in the glorious
memdesc future :)

-- 
Pedro