From mboxrd@z Thu Jan 1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Barry Song, Lorenzo Stoakes,
	"Liam R. Howlett", David Hildenbrand, Vlastimil Babka, Jann Horn,
	Suren Baghdasaryan, Lokesh Gidra, Mike Rapoport, Michal Hocko,
	Tangquan Zheng, Qi Zheng
Subject: [PATCH RFC] mm: madvise: use per_vma lock for MADV_FREE
Date: Tue, 10 Jun 2025 17:59:20 +1200
Message-Id: <20250610055920.21323-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Barry Song

MADV_FREE is another option, besides MADV_DONTNEED, for dynamic memory
freeing in user-space native or Java heap memory management. For
example, jemalloc can be configured to use MADV_FREE, and recent
versions of the Android Java heap have also increasingly adopted
MADV_FREE. Supporting per-VMA locking for MADV_FREE thus appears
increasingly necessary.

We have replaced walk_page_range() with walk_page_range_vma(). Along
with the madvise_lock_mode proposed by Lorenzo, the necessary
infrastructure is now in place to begin exploring per-VMA locking
support for MADV_FREE and potentially other madvise behaviors that use
walk_page_range_vma().

This patch adds support for the PGWALK_VMA_RDLOCK_VERIFY walk_lock mode
in walk_page_range_vma(), and uses the madvise_lock_mode from
madv_behavior to select the appropriate walk_lock (either mmap_lock or
the per-VMA lock) based on the context.

To ensure thread safety, the walk ops are now built in a stack variable
(walk_ops) instead of the global constant madvise_free_walk_ops, because
walk_lock is chosen per call.

Cc: Lorenzo Stoakes
Cc: "Liam R. Howlett"
Cc: David Hildenbrand
Cc: Vlastimil Babka
Cc: Jann Horn
Cc: Suren Baghdasaryan
Cc: Lokesh Gidra
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: Tangquan Zheng
Cc: Qi Zheng
Signed-off-by: Barry Song
---
 include/linux/pagewalk.h |  2 ++
 mm/madvise.c             | 20 ++++++++++++++------
 mm/pagewalk.c            |  6 ++++++
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 9700a29f8afb..a4afa64ef0ab 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -14,6 +14,8 @@ enum page_walk_lock {
 	PGWALK_WRLOCK = 1,
 	/* vma is expected to be already write-locked during the walk */
 	PGWALK_WRLOCK_VERIFY = 2,
+	/* vma is expected to be already read-locked during the walk */
+	PGWALK_VMA_RDLOCK_VERIFY = 3,
 };
 
 /**
diff --git a/mm/madvise.c b/mm/madvise.c
index 381eedde8f6d..23d58eb31c8f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -775,10 +775,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
-static const struct mm_walk_ops madvise_free_walk_ops = {
-	.pmd_entry = madvise_free_pte_range,
-	.walk_lock = PGWALK_RDLOCK,
-};
+static inline enum page_walk_lock get_walk_lock(enum madvise_lock_mode mode)
+{
+	/* Other modes don't require fixing up the walk_lock. */
+	VM_WARN_ON_ONCE(mode != MADVISE_VMA_READ_LOCK &&
+			mode != MADVISE_MMAP_READ_LOCK);
+	return mode == MADVISE_VMA_READ_LOCK ?
+		PGWALK_VMA_RDLOCK_VERIFY : PGWALK_RDLOCK;
+}
 
 static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
 		struct vm_area_struct *vma,
@@ -787,6 +791,9 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier_range range;
 	struct mmu_gather *tlb = madv_behavior->tlb;
+	struct mm_walk_ops walk_ops = {
+		.pmd_entry = madvise_free_pte_range,
+	};
 
 	/* MADV_FREE works for only anon vma at the moment */
 	if (!vma_is_anonymous(vma))
@@ -806,8 +813,9 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
 
 	mmu_notifier_invalidate_range_start(&range);
 	tlb_start_vma(tlb, vma);
+	walk_ops.walk_lock = get_walk_lock(madv_behavior->lock_mode);
 	walk_page_range_vma(vma, range.start, range.end,
-			&madvise_free_walk_ops, tlb);
+			&walk_ops, tlb);
 	tlb_end_vma(tlb, vma);
 	mmu_notifier_invalidate_range_end(&range);
 	return 0;
@@ -1653,7 +1661,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 	case MADV_WILLNEED:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
-	case MADV_FREE:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
 	case MADV_COLLAPSE:
@@ -1662,6 +1669,7 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 		return MADVISE_MMAP_READ_LOCK;
 	case MADV_DONTNEED:
 	case MADV_DONTNEED_LOCKED:
+	case MADV_FREE:
 		return MADVISE_VMA_READ_LOCK;
 	default:
 		return MADVISE_MMAP_WRITE_LOCK;
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index e478777c86e1..c984aacc5552 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -420,6 +420,9 @@ static int __walk_page_range(unsigned long start, unsigned long end,
 static inline void process_mm_walk_lock(struct mm_struct *mm,
 					enum page_walk_lock walk_lock)
 {
+	if (walk_lock == PGWALK_VMA_RDLOCK_VERIFY)
+		return;
+
 	if (walk_lock == PGWALK_RDLOCK)
 		mmap_assert_locked(mm);
 	else
@@ -437,6 +440,9 @@ static inline void process_vma_walk_lock(struct vm_area_struct *vma,
 	case PGWALK_WRLOCK_VERIFY:
 		vma_assert_write_locked(vma);
 		break;
+	case PGWALK_VMA_RDLOCK_VERIFY:
+		vma_assert_locked(vma);
+		break;
 	case PGWALK_RDLOCK:
 		/* PGWALK_RDLOCK is handled by process_mm_walk_lock */
 		break;
-- 
2.39.3 (Apple Git-146)
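
Illustration only, not part of the patch: the lock selection described
in the commit message can be modelled in user space roughly as below.
This is a sketch under stated assumptions. The behavior enum and the
model_* function names are hypothetical stand-ins, the enums are local
copies rather than kernel headers, PGWALK_RDLOCK = 0 is assumed, and
assert() stands in for VM_WARN_ON_ONCE() (which warns rather than
aborts in the kernel).

```c
#include <assert.h>

/* Hypothetical stand-in for a few madvise behaviors (not MADV_* values). */
enum behavior { BEH_WILLNEED, BEH_DONTNEED, BEH_FREE, BEH_OTHER };

/* Local copy of the lock modes named in the patch; values are arbitrary. */
enum madvise_lock_mode {
	MADVISE_MMAP_READ_LOCK,
	MADVISE_MMAP_WRITE_LOCK,
	MADVISE_VMA_READ_LOCK,
};

/* Local copy of enum page_walk_lock; PGWALK_RDLOCK = 0 is an assumption. */
enum page_walk_lock {
	PGWALK_RDLOCK = 0,
	PGWALK_WRLOCK = 1,
	PGWALK_WRLOCK_VERIFY = 2,
	PGWALK_VMA_RDLOCK_VERIFY = 3,
};

/* Models get_lock_mode() after the patch: MADV_FREE joins MADV_DONTNEED. */
enum madvise_lock_mode model_get_lock_mode(enum behavior b)
{
	switch (b) {
	case BEH_WILLNEED:
		return MADVISE_MMAP_READ_LOCK;
	case BEH_DONTNEED:
	case BEH_FREE:
		return MADVISE_VMA_READ_LOCK;
	default:
		return MADVISE_MMAP_WRITE_LOCK;
	}
}

/* Models get_walk_lock(): only the two read-lock modes are expected here. */
enum page_walk_lock model_get_walk_lock(enum madvise_lock_mode mode)
{
	assert(mode == MADVISE_VMA_READ_LOCK ||
	       mode == MADVISE_MMAP_READ_LOCK);
	return mode == MADVISE_VMA_READ_LOCK ?
		PGWALK_VMA_RDLOCK_VERIFY : PGWALK_RDLOCK;
}

/* End-to-end mapping from a behavior to the walk_lock the walker checks. */
enum page_walk_lock model_walk_lock_for(enum behavior b)
{
	return model_get_walk_lock(model_get_lock_mode(b));
}
```

In this model, model_walk_lock_for(BEH_FREE) yields
PGWALK_VMA_RDLOCK_VERIFY, while a behavior that still takes the
mmap_lock in read mode yields PGWALK_RDLOCK. Computing the value per
call is what motivates keeping the ops structure on the stack.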