From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93203C369A1 for ; Tue, 8 Apr 2025 18:49:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1D7EE280019; Tue, 8 Apr 2025 14:49:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 15E3A280017; Tue, 8 Apr 2025 14:49:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F4026280019; Tue, 8 Apr 2025 14:49:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id D1D4D280017 for ; Tue, 8 Apr 2025 14:49:52 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 43651AEEFE for ; Tue, 8 Apr 2025 18:49:53 +0000 (UTC) X-FDA: 83311765866.24.C18C117 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf24.hostedemail.com (Postfix) with ESMTP id 75D81180004 for ; Tue, 8 Apr 2025 18:49:51 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=CoNqvkBP; spf=pass (imf24.hostedemail.com: domain of sj@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=sj@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744138191; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=aNa2UaR3C+HPreuPgyuWjXKWj764X1+e0LsArmUPLEM=; b=tiIcrn84RPcYgVu2j/2K0KPQV/tTc6Jx9IxpeS/mV4Tfqv2Y2pI5+rtpNbS456fTP49s/w 7WBuGbAh6/otrSabhIcdyOFaerK4w2zHBcDYAS9qQILAlHrUtkrQrlFBcKzLF8fjO3achn Pb2W/izTAsnecWh33amEEn0XG1f0UTA= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=CoNqvkBP; spf=pass (imf24.hostedemail.com: domain of sj@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=sj@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744138191; a=rsa-sha256; cv=none; b=OLGVkJSkU9G5y6GAhvRkHjq8PU9ej7yq9RsBf6G4Xu6xMzdyp8n2J22w+pHe4i+rz0LOTy wXoVGMdXvFFaALMTNYhCjyExtWS5+EQbOsJPXSfd2wKR0HdddEHoRfsyeVe95q6hPl1qHP AGaXTotOfw7IcGHNxtTvsG+8pscbmBQ= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 3BB575C0F07; Tue, 8 Apr 2025 18:47:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C866AC4CEE5; Tue, 8 Apr 2025 18:49:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1744138190; bh=JUCB2bV+iYGyCNwUBsapZGnABiFfj1VKYKVKsbGH/zg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CoNqvkBP3vFgDfJdX5HyBql73FohlJRuDNvRuQMGzKxFAB+caL0nqg429qqNKUjdi sUIOtwxJQ4Cv4RXAtKYPfoslTCwqJ5gR0e3zEOlPOJdethtVBJwjg9tgm80VHZ4Al/ HZbgg+oWX2Na/wcsAvrLXMigNawQBde8SdI/fptF56V9IGK367akIkh+ReWJ2XOT+U muCiP0HGwc6xz0I3feNwoVXK/CExOe7Wg8KMeAp4MkNAoGHokYq4hyHnL8Zbwzd9YB 86T5QoYppsM5bvL+pZwsNTKolNQy61fVP96xchN9BFu6aUuneHAuoouaW+SbI5kO5Y 8bCGM26EeJGUw== From: SeongJae Park To: Lorenzo Stoakes Cc: SeongJae Park , Andrew Morton , "Liam R.Howlett" , David Hildenbrand , Rik van Riel , Shakeel Butt , Vlastimil Babka , kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH v2 2/4] mm/madvise: batch tlb flushes for MADV_FREE Date: Tue, 8 Apr 2025 11:49:47 -0700 Message-Id: <20250408184947.62625-1-sj@kernel.org> X-Mailer: git-send-email 2.39.5 In-Reply-To: <12a8989c-c4f3-45a5-a66e-06ef7c2ef876@lucifer.local> References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 75D81180004 X-Rspamd-Server: rspam05 X-Rspam-User: X-Stat-Signature: 8kza84cbox7tbuu4ibmz6gef78mskink X-HE-Tag: 1744138191-56743 X-HE-Meta: U2FsdGVkX1/htRa3cANZMkVmrUmhQjdu6FqkeSMjXfU708E8lr55erNseOmFKdu1C2WOGuTAvpU9FCnsMkRaTxk+YzqK1MZKT4PSy8glBia74VGMmt1v010vxiqqYS5arUGuOLwcu3K9jHLmQNE60KVE1u6NG1m8szREU6nW2+jwFetW0TQX/f5USeOgykjP0sVVjwRF0w2kQ2ZD+lSVD5c8VKbbFwbDQU5h/vvkyWgpQVNLOwr6hy2e0rB+SLVipIVAH/9G/DcO5b1vtwQeNu0ah07AzjerIKxL9BiFOQCPxUXC6tS+eYh3fqFdgbU8dMmd8MbjaoQslATEofLexVBue3uDLWzZBHoXIFFtiEh9dw1km0pfA/HU178J2NDBvuI3RjsVZiVXz0nYTIInKCsgA2C9ejpydriaLlJkaEshMbm6+TZkbhLmIP714k1+kguc7ILwu8B0m2Z/CwY8k07Mamjcq7+cqdg0LbR+NYxk8YIsWvRls8Qwcvgts/0xZ+/xyZwqUJE4DxWlblwq8X1Z64ngMrWGf54TVh6yxxp12ftLVzzwtmwq3GgMciXEuSLWZHfWB0HHxmXcYoh9ecoDVhJl8j6/Py1LmkENGmEvCDRbuNGMnoGb1c5K2aUuhbSCFhLjyN4sl05oiTeKxepy/vZZyaXawukpenbcUitfTh3d+lRrwxwUM6oXG42Q7p3CIWdYj/Ym4ymjdQVubiVNCj7ykKjYffDhaLDVGtJa/BJvXR6CXk8QueEqV4OT7bqDVjzvVPP5H99RXtJURMmbwl22fcehjfgklL+fQcUS7zrjOI0O649NijTmGRsyGl6qS+LSBb0IVNifb8+nzqFIFeJJ9ZfDv5/wq89VyjQT4yYtbjiNH9/6mtIiuP2fpdARfJFsTHp5TOVkHn1qa/YPXR1zFksKJ2tndJEUq+XTDu/Iljv5VfqPPX9mBfMItYso/5sKRsO92mN1wUl dzVLycgb mJ7+6T15jdSCFmAE9UYq8ZSOnIANdcCqBXCYj2+6IHhKq22Cyy8UoJ6JMBGmV6Zi73po8ldLBl/9SxzltxZSHgODue1PZJ+uiMsmRiLobu2AweWCs6I6ARJKxVndT7ktXlRPwVvVz00wGsNihPFgKlfNv+GD6Nsf/DMyuJX4DTN2pEXzvV8Cm5Sbokk30sZDmYfR3VaOjKHK7dS061xUwZxTZ5s7ae6C/AE2DI4+8wZSTcnJg5hkA7iTJqWGppoSkGueRzhJux3UYeh8XRYiN9+jA+WjHPw+nM4wSIY/8H4N7EjBvE/i2EWAnOw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, 8 Apr 2025 13:58:18 +0100 Lorenzo Stoakes wrote: > On Fri, Apr 04, 2025 at 02:06:58PM -0700, SeongJae Park wrote: > > MADV_FREE handling for [process_]madvise() flushes tlb for each vma of > > each address range. Update the logic to do tlb flushes in a batched > > way. Initialize an mmu_gather object from do_madvise() and > > vector_madvise(), which are the entry level functions for > > [process_]madvise(), respectively. And pass those objects to the > > function for per-vma work, via madvise_behavior struct. Make the > > per-vma logic not flushes tlb on their own but just saves the tlb > > entries to the received mmu_gather object. Finally, the entry level > > functions flush the tlb entries that gathered for the entire user > > request, at once. > > > > Signed-off-by: SeongJae Park > > Other than some nitty stuff, and a desire for some careful testing of the > horrid edge case that err... I introduced :P this looks fine, so: > > Reviewed-by: Lorenzo Stoakes Thank you for your kind review! I will make the next revision following your suggestions as I answered below. > > > --- > > mm/madvise.c | 59 +++++++++++++++++++++++++++++++++++++++++----------- > > 1 file changed, 47 insertions(+), 12 deletions(-) > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 8bcfdd995d18..564095e381b2 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -799,12 +799,13 @@ static const struct mm_walk_ops madvise_free_walk_ops = { > > .walk_lock = PGWALK_RDLOCK, > > }; > > > > -static int madvise_free_single_vma(struct vm_area_struct *vma, > > - unsigned long start_addr, unsigned long end_addr) > > +static int madvise_free_single_vma( > > + struct madvise_behavior *behavior, struct vm_area_struct *vma, > > This is pedantic, but elsewhere you differentiate between int behavior and > struct madvise_behavior by referringt to the later as madv_behavior. > > The naming kind of sucks in general though. > > But for consistency, let's maybe rename this to madv_behavior, and we can > maybe do a commit later to do a rename across the board? I completely agree. I will rename so in the next spin. > > > + unsigned long start_addr, unsigned long end_addr) > > { > > struct mm_struct *mm = vma->vm_mm; > > struct mmu_notifier_range range; > > - struct mmu_gather tlb; > > + struct mmu_gather *tlb = behavior->tlb; > > > > /* MADV_FREE works for only anon vma at the moment */ > > if (!vma_is_anonymous(vma)) [...] > > @@ -953,7 +951,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma, > > if (action == MADV_DONTNEED || action == MADV_DONTNEED_LOCKED) > > return madvise_dontneed_single_vma(vma, start, end); > > else if (action == MADV_FREE) > > - return madvise_free_single_vma(vma, start, end); > > + return madvise_free_single_vma(behavior, vma, start, end); > > else > > return -EINVAL; > > On error paths, do we correctly finish the batched (botched? :P) TLB > operation? Yes, the change calls tlb_finish_mmu() and tlb_gather_mmu() as needed in the error paths. Of course I might forgot calling those in some edge cases. Please let me know if you find such mistakes. > > > } [...] > > @@ -1841,14 +1873,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter, > > } > > > > /* Drop and reacquire lock to unwind race. */ > > + madvise_finish_tlb(&madv_behavior); > > madvise_unlock(mm, behavior); > > madvise_lock(mm, behavior); > > + madvise_init_tlb(&madv_behavior, mm); > > continue; > > Have you found a way in which to test this? Perhaps force this case and > find a means of asserting the TLB flushing behaves as expected? I think > we're ok from the logic, but it's such a tricky one it'd be good to find a > means of doing so, albeit in a manual way. No, unfortunately I haven't found a good way to test this case. > > > } > > if (ret < 0) > > break; > > iov_iter_advance(iter, iter_iov_len(iter)); > > } > > + madvise_finish_tlb(&madv_behavior); > > madvise_unlock(mm, behavior); > > > > ret = (total_len - iov_iter_count(iter)) ? : ret; > > -- > > 2.39.5 Thanks, SJ