MIME-Version: 1.0
References: <20250610055920.21323-1-21cnbao@gmail.com> <65c1df74-4e58-4a09-9451-b18dae5adb3f@redhat.com>
In-Reply-To: <65c1df74-4e58-4a09-9451-b18dae5adb3f@redhat.com>
From: Barry Song <21cnbao@gmail.com>
Date: Wed, 11 Jun 2025 22:04:34 +1200
Subject: Re: [PATCH RFC] mm: madvise: use per_vma lock for MADV_FREE
To: David Hildenbrand
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Barry Song, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Jann Horn,
 Suren Baghdasaryan, Lokesh Gidra, Mike Rapoport, Michal Hocko,
 Tangquan Zheng, Qi Zheng
Content-Type: text/plain; charset="UTF-8"

On Tue, Jun 10, 2025 at 7:04 PM David Hildenbrand wrote:
>
> On 10.06.25 07:59, Barry Song wrote:
> > From: Barry Song
> >
> > MADV_FREE is another option, besides MADV_DONTNEED, for dynamic memory
> > freeing in user-space native or Java heap memory management. For example,
> > jemalloc can be configured to use MADV_FREE, and recent versions of the
> > Android Java heap have also increasingly adopted MADV_FREE. Supporting
> > per-VMA locking for MADV_FREE thus appears increasingly necessary.
> >
> > We have replaced walk_page_range() with walk_page_range_vma(). Along with
> > the proposed madvise_lock_mode by Lorenzo, the necessary infrastructure is
> > now in place to begin exploring per-VMA locking support for MADV_FREE and
> > potentially other madvise using walk_page_range_vma().
> >
> > This patch adds support for the PGWALK_VMA_RDLOCK walk_lock mode in
> > walk_page_range_vma(), and leverages madvise_lock_mode from
> > madv_behavior to select the appropriate walk_lock, either mmap_lock or
> > per-VMA lock, based on the context.
> >
> > To ensure thread safety, madvise_free_walk_ops is now defined as a stack
> > variable instead of a global constant.
> >
> > Cc: Lorenzo Stoakes
> > Cc: "Liam R. Howlett"
> > Cc: David Hildenbrand
> > Cc: Vlastimil Babka
> > Cc: Jann Horn
> > Cc: Suren Baghdasaryan
> > Cc: Lokesh Gidra
> > Cc: Mike Rapoport
> > Cc: Michal Hocko
> > Cc: Tangquan Zheng
> > Cc: Qi Zheng
> > Signed-off-by: Barry Song
> > ---
> >  include/linux/pagewalk.h |  2 ++
> >  mm/madvise.c             | 20 ++++++++++++++------
> >  mm/pagewalk.c            |  6 ++++++
> >  3 files changed, 22 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> > index 9700a29f8afb..a4afa64ef0ab 100644
> > --- a/include/linux/pagewalk.h
> > +++ b/include/linux/pagewalk.h
> > @@ -14,6 +14,8 @@ enum page_walk_lock {
> >  	PGWALK_WRLOCK = 1,
> >  	/* vma is expected to be already write-locked during the walk */
> >  	PGWALK_WRLOCK_VERIFY = 2,
> > +	/* vma is expected to be already read-locked during the walk */
> > +	PGWALK_VMA_RDLOCK_VERIFY = 3,
> >  };
> >
> >  /**
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 381eedde8f6d..23d58eb31c8f 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -775,10 +775,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >  	return 0;
> >  }
> >
> > -static const struct mm_walk_ops madvise_free_walk_ops = {
> > -	.pmd_entry		= madvise_free_pte_range,
> > -	.walk_lock		= PGWALK_RDLOCK,
> > -};
> > +static inline enum page_walk_lock get_walk_lock(enum madvise_lock_mode mode)
> > +{
> > +	/* Other modes don't require fixing up the walk_lock. */
> > +	VM_WARN_ON_ONCE(mode != MADVISE_VMA_READ_LOCK &&
> > +			mode != MADVISE_MMAP_READ_LOCK);
> > +	return mode == MADVISE_VMA_READ_LOCK ?
> > +			PGWALK_VMA_RDLOCK_VERIFY : PGWALK_RDLOCK;
> > +}
> >
> >  static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
> >  				   struct vm_area_struct *vma,
> > @@ -787,6 +791,9 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	struct mmu_notifier_range range;
> >  	struct mmu_gather *tlb = madv_behavior->tlb;
> > +	struct mm_walk_ops walk_ops = {
> > +		.pmd_entry		= madvise_free_pte_range,
> > +	};
> >
> >  	/* MADV_FREE works for only anon vma at the moment */
> >  	if (!vma_is_anonymous(vma))
> > @@ -806,8 +813,9 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
> >
> >  	mmu_notifier_invalidate_range_start(&range);
> >  	tlb_start_vma(tlb, vma);
> > +	walk_ops.walk_lock = get_walk_lock(madv_behavior->lock_mode);
> >  	walk_page_range_vma(vma, range.start, range.end,
> > -			&madvise_free_walk_ops, tlb);
> > +			&walk_ops, tlb);
> >  	tlb_end_vma(tlb, vma);
> >  	mmu_notifier_invalidate_range_end(&range);
> >  	return 0;
> > @@ -1653,7 +1661,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
> >  	case MADV_WILLNEED:
> >  	case MADV_COLD:
> >  	case MADV_PAGEOUT:
> > -	case MADV_FREE:
> >  	case MADV_POPULATE_READ:
> >  	case MADV_POPULATE_WRITE:
> >  	case MADV_COLLAPSE:
> > @@ -1662,6 +1669,7 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
> >  		return MADVISE_MMAP_READ_LOCK;
> >  	case MADV_DONTNEED:
> >  	case MADV_DONTNEED_LOCKED:
> > +	case MADV_FREE:
> >  		return MADVISE_VMA_READ_LOCK;
> >  	default:
> >  		return MADVISE_MMAP_WRITE_LOCK;
> > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > index e478777c86e1..c984aacc5552 100644
> > --- a/mm/pagewalk.c
> > +++ b/mm/pagewalk.c
> > @@ -420,6 +420,9 @@ static int __walk_page_range(unsigned long start, unsigned long end,
> >  static inline void process_mm_walk_lock(struct mm_struct *mm,
> >  					enum page_walk_lock walk_lock)
> >  {
> > +	if (walk_lock == PGWALK_VMA_RDLOCK_VERIFY)
> > +		return;
> > +
> >  	if (walk_lock == PGWALK_RDLOCK)
> >  		mmap_assert_locked(mm);
>
> Nit: I'd have converted the "else" into "else if (walk_lock !=
> PGWALK_VMA_RDLOCK_VERIFY)"

Seems good to me.

>
> >  	else
> > @@ -437,6 +440,9 @@ static inline void process_vma_walk_lock(struct vm_area_struct *vma,
> >  	case PGWALK_WRLOCK_VERIFY:
> >  		vma_assert_write_locked(vma);
> >  		break;
> > +	case PGWALK_VMA_RDLOCK_VERIFY:
> > +		vma_assert_locked(vma);
> > +		break;
> >  	case PGWALK_RDLOCK:
> >  		/* PGWALK_RDLOCK is handled by process_mm_walk_lock */
> >  		break;
>
> Nothing jumped at me and I think this should be ok
>
> Acked-by: David Hildenbrand

Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
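
For anyone following the thread, here is a rough userspace sketch of how
the lock selection and the "else if" form suggested in the nit above might
fit together. It is only a model under stated assumptions, not kernel code
and not the follow-up patch: enum mm_check and mm_check_for() are
hypothetical names made up for illustration, PGWALK_RDLOCK = 0 is assumed
because that line lies outside the quoted hunk, the madvise_lock_mode
values are arbitrary, and the body of the final "else" branch (a write-lock
assertion) is assumed because the quoted hunk cuts off before it.

```c
#include <assert.h>

/*
 * Userspace model only. Values 1..3 are taken from the quoted pagewalk.h
 * hunk; PGWALK_RDLOCK = 0 is an assumption (that line is not in the hunk).
 */
enum page_walk_lock {
	PGWALK_RDLOCK = 0,
	PGWALK_WRLOCK = 1,
	PGWALK_WRLOCK_VERIFY = 2,
	PGWALK_VMA_RDLOCK_VERIFY = 3,
};

/* Only the three modes named in the quoted madvise.c hunks; values are arbitrary. */
enum madvise_lock_mode {
	MADVISE_MMAP_READ_LOCK,
	MADVISE_MMAP_WRITE_LOCK,
	MADVISE_VMA_READ_LOCK,
};

/* Hypothetical: which mmap_lock check the page walker would perform. */
enum mm_check {
	MM_CHECK_NONE,		/* per-VMA lock held, nothing to check on the mm */
	MM_CHECK_LOCKED,	/* models mmap_assert_locked(mm) */
	MM_CHECK_WRITE_LOCKED,	/* assumed body of the final else branch */
};

/*
 * Same selection as get_walk_lock() in the patch. assert() stands in for
 * VM_WARN_ON_ONCE(); unlike the kernel macro it aborts instead of warning.
 */
static enum page_walk_lock get_walk_lock(enum madvise_lock_mode mode)
{
	assert(mode == MADVISE_VMA_READ_LOCK || mode == MADVISE_MMAP_READ_LOCK);
	return mode == MADVISE_VMA_READ_LOCK ?
			PGWALK_VMA_RDLOCK_VERIFY : PGWALK_RDLOCK;
}

/* Models process_mm_walk_lock() with the nit applied: no early return. */
static enum mm_check mm_check_for(enum page_walk_lock walk_lock)
{
	if (walk_lock == PGWALK_RDLOCK)
		return MM_CHECK_LOCKED;
	else if (walk_lock != PGWALK_VMA_RDLOCK_VERIFY)
		return MM_CHECK_WRITE_LOCKED;
	return MM_CHECK_NONE;
}
```

With this model, mm_check_for(get_walk_lock(MADVISE_VMA_READ_LOCK)) yields
MM_CHECK_NONE, which matches the intent of the early return in the posted
hunk: under the per-VMA read lock there is no mmap_lock state to assert.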