From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id ADB66C61CE7 for ; Wed, 11 Jun 2025 10:03:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 33F0E6B008A; Wed, 11 Jun 2025 06:03:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2F0986B008C; Wed, 11 Jun 2025 06:03:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1DF056B0092; Wed, 11 Jun 2025 06:03:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id F320A6B008A for ; Wed, 11 Jun 2025 06:03:20 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 62520592B2 for ; Wed, 11 Jun 2025 10:03:20 +0000 (UTC) X-FDA: 83542682160.11.5EFCCD6 Received: from mail-ua1-f54.google.com (mail-ua1-f54.google.com [209.85.222.54]) by imf30.hostedemail.com (Postfix) with ESMTP id 7ED518000E for ; Wed, 11 Jun 2025 10:03:18 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=NzFGhpeV; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.54 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749636198; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=RH2TjfGR3UbJRiCQLXrBI6sqM+KWNI78BUn8bMomgqo=; b=GJkOtzN8Gt8ZJEmjHlLhpCYFDeZyERBqBBCu5xEIbYBCsNGTVPXFL/TGr7oPke4Vf5hVjr mSUQ1z25XiBwuud4A4eQdIDPtHt/mRT4TSSIYFyR7zILXFAglc7Bw0jIemQbyO5XHuVjap qG4dTkGtIe7UfMJb9XgB980kdqAVx/Q= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=NzFGhpeV; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.54 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749636198; a=rsa-sha256; cv=none; b=G3Ok0N0uTY8XYZHAxw1kxpR7SIUOMjyy8+f4Qqm8ExDdgrZu/AOfhlKmPVoyulpyhVbQZ5 M+nF1ONOv1UA5vFVHOPhap7PL7I1eBsLxpsYCswZ38vCnHKNZ02/TzpUwejMAo9UUlHn52 Vkmtd7ONw7F8sp8ZR6kSZbS339k2IFM= Received: by mail-ua1-f54.google.com with SMTP id a1e0cc1a2514c-87edd8f4e9fso703060241.0 for ; Wed, 11 Jun 2025 03:03:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749636197; x=1750240997; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=RH2TjfGR3UbJRiCQLXrBI6sqM+KWNI78BUn8bMomgqo=; b=NzFGhpeV8ftGA7VyfVWIu/rLB1vyxgfNDQLqqEbTkrC2KIR4U7HNDXfYoCqd0KIUur uOsjBTLBLVEuDcqRQX2pD5qlXcNoLyWci2NXmAyppkjZMQEclyObvuejfaMH6xXJVE6N 6lA/i5Uj9hg0CA1EChLaMI+SmNyrRl+JEYdBpuxBE+HtQF27+VxO+ixES6UpZVWLdGjs FNMYfLimgc6JZpO29LaAqAjotb59XnStrSk4HjXHdFmPYfkU0Y5e4/J++r9wiQ/gjTVb hRyWaVd4WyHK4kDDe/wm8sTV9cHqxOJjGtU3jcYEaC9LnmTN3Wu+XzXM6YApduIHPJhZ yzCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749636197; x=1750240997; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RH2TjfGR3UbJRiCQLXrBI6sqM+KWNI78BUn8bMomgqo=; b=ayGbevgnSq7wWBQlCRmj7eZmhoDnj1cjtUNyTwhIANPwOoDYQKy9qkSpJi66v2DdLH ZEYtYoLDXWEDah66ZVha2oHpq00lIsQXQznrN/Y9bFy6XuUikXztgT/Lay8XjT3rU8T8 9XY2DXh/a31+Xp7e8MCIq7DWC6tHNqHVjaILaaa3LhYsZcnmijf3hAhjpERfMeeedkWk bh1jX+p+y2EPC9FXd2XB3p1mCXmjv6R1l3f/2turEcLNlilNnxLFKiUaNf8w6QaQwn3y 8s0JagHqDoNHjq5EMnWjpVCYQ9nALq9lhEmZQoltQvQbUdi9+jPiHQKPFCLA80RF87+3 RXHA== X-Forwarded-Encrypted: i=1; AJvYcCUdMxt9BD4m8tKyqWgxk+mG70g5hFicXab9z8snlKRa/q/jzMsp9HYdfK7rZTk+dHdc26ql9KhWtQ==@kvack.org X-Gm-Message-State: AOJu0YwNLYReo/NrAKtzvRazSoKzhZgWFX07XxNbVoZRnVyAVpfxwqAJ o6vdnQpAGuUZWxDd3v0NF7snvVNcuamPhiPu8dWCv7QEebeRd2bAJzyL6ZK8Pkql0GMTVVLAtyN /0CxzzOkwBtT9OQioACvqcleOq3YJXM0= X-Gm-Gg: ASbGncuWeeG4ZiYdQIESzIlvk82u/9UCfsp4f5QMkK04d23fxDR38Sm0E5MEZ6JOMve 4D/LOq4oDDnbYlb6y+elcpuVwFnzrauJVWPchuRPQPFnbmSRQaoocvpmBXHbyQ50OLgtw7luZRg SdQWmpAQJsP1K2Qah13ameOHLBYl1UbtZWZF8QUb/arC0T X-Google-Smtp-Source: AGHT+IHXh8E6knSuG0t51X8INr9lE/unVkRjWkyhv0vgom0Wqv/EOlwQDxrDoUYVQSm8lIH2VhKC4uEnsvjauOp9Rt0= X-Received: by 2002:a05:6102:6e81:b0:4e7:bf30:da62 with SMTP id ada2fe7eead31-4e7bf30db91mr295132137.17.1749636197452; Wed, 11 Jun 2025 03:03:17 -0700 (PDT) MIME-Version: 1.0 References: <20250610055920.21323-1-21cnbao@gmail.com> <8e3b3a63-9060-4bf9-ad85-3ef5f5d541db@lucifer.local> In-Reply-To: <8e3b3a63-9060-4bf9-ad85-3ef5f5d541db@lucifer.local> From: Barry Song <21cnbao@gmail.com> Date: Wed, 11 Jun 2025 22:03:05 +1200 X-Gm-Features: AX0GCFsk0hK52nquwg-KWQo29DcvdDnbM1r_lddEkl2kSzmZx4m-4k_CG0vwSck Message-ID: Subject: Re: [PATCH RFC] mm: madvise: use per_vma lock for MADV_FREE To: Lorenzo Stoakes Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song , "Liam R. Howlett" , David Hildenbrand , Vlastimil Babka , Jann Horn , Suren Baghdasaryan , Lokesh Gidra , Mike Rapoport , Michal Hocko , Tangquan Zheng , Qi Zheng Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 7ED518000E X-Stat-Signature: hz3ep3qug5kf36soqwga7dfuruz9ks98 X-Rspam-User: X-HE-Tag: 1749636198-842092 X-HE-Meta: U2FsdGVkX1/ikd2GDXQOIs/kNL5mWsfRluTOyq+fY9bmi/l49x3gNINUrNex1K6uaf1sSY+L9F5I29gyogMut//o/Jbxj5eZHSQQP+9L6ZYy/zczVMOZDtsDAD9mt08TkkZsNuXjddvgzTCmql4F7Mo3snOLHquoqLirVXU1SnMEIlKLVeTsrv9u97pad/XzavUL0sZPXsmAGIdR5yglbatmRRJkMly9oEtpNHqb2RUrU41n4Ti8oDekzZnhC7NIcseu+Bk8mSCNp0kL+BpxZ1Jhui4EYyw2SPqXIJ8r+/hr2Sk/X8k4eoVQJS0iAEb9XQx++bjyHMN74uWMyUmao3HeXv4PsIHaWmoRlFn4HAmATDcZdDEodfAcDleoQ7UCy8rnWklMX388xyqNrgc9qpcwkHB0REvWRqW5bkVMdtBIRL1aGnvgV+FqhlNQDcvqg2dX0k07RDnJhESyO4danC1/0N4MAb5CIzOzD8iEodc1FwIJlzsuSDmITWjkQ1vNExZFLpyUumPT8euSZpx/ABKn1wVJeyPl9DIkg/o95exKw20L10ptVfgasJNsZQlnYEloYcvRYw1AssysJwsKTjjEO5wBXTYBBML1IRlOR/52xE2R7NT3N9Fh9ogeXdUPDOER261lAAD+hsN1tj0ovPey6CB8ZG7sW7JO1GKnvaaRJdeKa9F/qId/g0pvi7G5188k24Qyls2do+btarFz1ezRiIqscI0xxiN9AfggrCq/pbW0CVDRrlw0W5vrToc19uX2dO3Q6zg4QUyJWG/gFakpWvR8RRmq4ZlAFS4I6IJN8mVzA5ArOIUtAfxa2mSUnbpMwDQmrelt3RPl5X3ZP2tRCIrDZPHkb7/bomPLyLzkVO0ghsXEf4q16kVDLVH87g7xdkQkpu0pyV88lRzt0mY5uCzxMfPB2zg1/Mjn+GF5iQ54ms58dTgjZRTGZG0qX9u+vtomvMaR9wkk6Wz j8I7YaEs 30l/dkE8QarpXeFjeQFKETxzyqJSvTDNw+3w6rDmKKX0HcQpRG8vciu1eBpPnRGnjVtufw/G9jPA2a5mAT2z8tW/jN9b7wCzWr97WZSGX1+Z9ZylNGo1jQwkFfxmx8rkjM/NeO8Hlt88+yfO4A4gsv59/9Ms+9CLgYpQph0cXdtyGNtb+LKudg2AJzug5QNB8fpw48Mg2e0f/aLnnoX5Rh2sGK0KR4aQzDnrikh5IYbG2EijYNx93Z+rrtktLfPdQ52gDZ7O+xBvFes/bmUFReWrYKlvJou/JNLyEZ36aNciIxK70sSFxA5IqQi4YW744b045NRHkgwxmQYwZi6dHAVtgARqHBv6Zqq6INJvazbDbND3ovYEW+wb9OrgIJpMk0QjUUsAN3sGF4rak2KorrKa34L6AqwIL59oRExcTHFQvbsAZYwbWXf6YgKj4Izg5VIFnPUuQ9fcFtRlzUZvQqhbWnNmS8UxbcaOPV8OiYE8Qcyg+JKhw+af0R7LZ76fPS1N51w+IbXBahBCIIZ4a3mBf4EejyFu+ZWnEeOKRo6z0+dpntFTJPhiKWg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jun 11, 2025 at 6:52=E2=80=AFAM Lorenzo Stoakes wrote: > > On Tue, Jun 10, 2025 at 05:59:20PM +1200, Barry Song wrote: > > From: Barry Song > > > > MADV_FREE is another option, besides MADV_DONTNEED, for dynamic memory > > freeing in user-space native or Java heap memory management. For exampl= e, > > jemalloc can be configured to use MADV_FREE, and recent versions of the > > Android Java heap have also increasingly adopted MADV_FREE. Supporting > > per-VMA locking for MADV_FREE thus appears increasingly necessary. > > > > We have replaced walk_page_range() with walk_page_range_vma(). Along wi= th > > the proposed madvise_lock_mode by Lorenzo, the necessary infrastructure= is > > now in place to begin exploring per-VMA locking support for MADV_FREE a= nd > > potentially other madvise using walk_page_range_vma(). > > Thanks :) > > > > > This patch adds support for the PGWALK_VMA_RDLOCK walk_lock mode in > > walk_page_range_vma(), and leverages madvise_lock_mode from > > madv_behavior to select the appropriate walk_lock=E2=80=94either mmap_l= ock or > > per-VMA lock=E2=80=94based on the context. > > > > To ensure thread safety, madvise_free_walk_ops is now defined as a stac= k > > variable instead of a global constant. > > A nit but I'd add 'because we now dynamically update the walk_ops->walk_l= ock > field we must make sure this is thread safe' or something like this to cl= arify > the need for this Sure. > > Did we not have to worry about this before I guess because the mmap lock = would > exclude other threads? Probably not. It was a constant, and no one needed to modify it before, no matter how many threads called MADV_FREE. > > An aside, but I wonder if we have this implicit assumption elsewhere that= VMA > locks defeat... hm :) > > > > > Cc: Lorenzo Stoakes > > Cc: "Liam R. Howlett" > > Cc: David Hildenbrand > > Cc: Vlastimil Babka > > Cc: Jann Horn > > Cc: Suren Baghdasaryan > > Cc: Lokesh Gidra > > Cc: Mike Rapoport > > Cc: Michal Hocko > > Cc: Tangquan Zheng > > Cc: Qi Zheng > > Signed-off-by: Barry Song > > Looks good to me, kinda neat how the previous work for the MADV_DONTNEED = under > VMA lock stuff made this pretty straightforward :) > > So: > > Reviewed-by: Lorenzo Stoakes Thanks! > > > --- > > include/linux/pagewalk.h | 2 ++ > > mm/madvise.c | 20 ++++++++++++++------ > > mm/pagewalk.c | 6 ++++++ > > 3 files changed, 22 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h > > index 9700a29f8afb..a4afa64ef0ab 100644 > > --- a/include/linux/pagewalk.h > > +++ b/include/linux/pagewalk.h > > @@ -14,6 +14,8 @@ enum page_walk_lock { > > PGWALK_WRLOCK =3D 1, > > /* vma is expected to be already write-locked during the walk */ > > PGWALK_WRLOCK_VERIFY =3D 2, > > + /* vma is expected to be already read-locked during the walk */ > > + PGWALK_VMA_RDLOCK_VERIFY =3D 3, > > }; > > > > /** > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 381eedde8f6d..23d58eb31c8f 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -775,10 +775,14 @@ static int madvise_free_pte_range(pmd_t *pmd, uns= igned long addr, > > return 0; > > } > > > > -static const struct mm_walk_ops madvise_free_walk_ops =3D { > > - .pmd_entry =3D madvise_free_pte_range, > > - .walk_lock =3D PGWALK_RDLOCK, > > -}; > > +static inline enum page_walk_lock get_walk_lock(enum madvise_lock_mode= mode) > > +{ > > + /* Other modes don't require fixing up the walk_lock. */ > > + VM_WARN_ON_ONCE(mode !=3D MADVISE_VMA_READ_LOCK && > > + mode !=3D MADVISE_MMAP_READ_LOCK); > > I find this a bit hard to parse... > > > + return mode =3D=3D MADVISE_VMA_READ_LOCK ? > > + PGWALK_VMA_RDLOCK_VERIFY : PGWALK_RDLOCK; > > ...might be better as something like: > > switch (mode) { > case MADVISE_VMA_READ_LOCK: > return PGWALK_VMA_RDLOCK_VERIFY; > case MADVISE_MMAP_READ_LOCK: > return PGWALK_RDLOCK; > default: > /* Invalid. */ > WARN_ON_ONCE(1); > return PGWALK_RDLOCK; > } I actually tried both before sending and, for some reason, preferred the one I sent. But I'm totally happy to go with whichever approach you prefer:-) > > > +} > > > > static int madvise_free_single_vma(struct madvise_behavior *madv_behav= ior, > > struct vm_area_struct *vma, > > @@ -787,6 +791,9 @@ static int madvise_free_single_vma(struct madvise_b= ehavior *madv_behavior, > > struct mm_struct *mm =3D vma->vm_mm; > > struct mmu_notifier_range range; > > struct mmu_gather *tlb =3D madv_behavior->tlb; > > + struct mm_walk_ops walk_ops =3D { > > + .pmd_entry =3D madvise_free_pte_range, > > + }; > > > > /* MADV_FREE works for only anon vma at the moment */ > > if (!vma_is_anonymous(vma)) > > @@ -806,8 +813,9 @@ static int madvise_free_single_vma(struct madvise_b= ehavior *madv_behavior, > > > > mmu_notifier_invalidate_range_start(&range); > > tlb_start_vma(tlb, vma); > > + walk_ops.walk_lock =3D get_walk_lock(madv_behavior->lock_mode); > > walk_page_range_vma(vma, range.start, range.end, > > - &madvise_free_walk_ops, tlb); > > + &walk_ops, tlb); > > tlb_end_vma(tlb, vma); > > mmu_notifier_invalidate_range_end(&range); > > return 0; > > @@ -1653,7 +1661,6 @@ static enum madvise_lock_mode get_lock_mode(struc= t madvise_behavior *madv_behavi > > case MADV_WILLNEED: > > case MADV_COLD: > > case MADV_PAGEOUT: > > - case MADV_FREE: > > case MADV_POPULATE_READ: > > case MADV_POPULATE_WRITE: > > case MADV_COLLAPSE: > > @@ -1662,6 +1669,7 @@ static enum madvise_lock_mode get_lock_mode(struc= t madvise_behavior *madv_behavi > > return MADVISE_MMAP_READ_LOCK; > > case MADV_DONTNEED: > > case MADV_DONTNEED_LOCKED: > > + case MADV_FREE: > > return MADVISE_VMA_READ_LOCK; > > default: > > return MADVISE_MMAP_WRITE_LOCK; > > diff --git a/mm/pagewalk.c b/mm/pagewalk.c > > index e478777c86e1..c984aacc5552 100644 > > --- a/mm/pagewalk.c > > +++ b/mm/pagewalk.c > > @@ -420,6 +420,9 @@ static int __walk_page_range(unsigned long start, u= nsigned long end, > > static inline void process_mm_walk_lock(struct mm_struct *mm, > > enum page_walk_lock walk_lock) > > { > > + if (walk_lock =3D=3D PGWALK_VMA_RDLOCK_VERIFY) > > + return; > > + > > if (walk_lock =3D=3D PGWALK_RDLOCK) > > mmap_assert_locked(mm); > > else > > @@ -437,6 +440,9 @@ static inline void process_vma_walk_lock(struct vm_= area_struct *vma, > > case PGWALK_WRLOCK_VERIFY: > > vma_assert_write_locked(vma); > > break; > > + case PGWALK_VMA_RDLOCK_VERIFY: > > + vma_assert_locked(vma); > > + break; > > case PGWALK_RDLOCK: > > /* PGWALK_RDLOCK is handled by process_mm_walk_lock */ > > break; > > -- > > 2.39.3 (Apple Git-146) > > Thanks Barry