From: Barry Song <21cnbao@gmail.com>
Date: Wed, 18 Jun 2025 18:11:26 +0800
Subject: Re: [PATCH v4] mm: use per_vma lock for MADV_DONTNEED
To: Lorenzo Stoakes
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Barry Song, "Liam R. Howlett",
    David Hildenbrand, Vlastimil Babka, Jann Horn, Suren Baghdasaryan,
    Lokesh Gidra, Tangquan Zheng, Qi Zheng, Lance Yang
In-Reply-To: <309d22ca-6cd9-4601-8402-d441a07d9443@lucifer.local>
References: <20250607220150.2980-1-21cnbao@gmail.com>
 <309d22ca-6cd9-4601-8402-d441a07d9443@lucifer.local>

On Tue, Jun 17, 2025 at 9:39 PM Lorenzo Stoakes wrote:
>
> +cc Lance
>
> Hi Barry,
>
> This needs a quick fixpatch, as discovered by Lance in [0], which I did an
> analysis on [1].
>
> Basically, _theoretically_ though not currently in practice, we might end up
> accessing uninitialised state in the struct vm_area_struct **prev value passed
> around madvise.
>
> The solution for now is to simply initialise it in the VMA read lock case, as
> all users of this set *prev = vma prior to performing the operation.
>
> Cheers, Lorenzo
>
> [0]: https://lore.kernel.org/all/20250617020544.57305-1-lance.yang@linux.dev/
> [1]: https://lore.kernel.org/all/6181fd25-6527-4cd0-b67f-2098191d262d@lucifer.local/
>
> On Sun, Jun 08, 2025 at 10:01:50AM +1200, Barry Song wrote:
> > From: Barry Song
> >
> > Certain madvise operations, especially MADV_DONTNEED, occur far more
> > frequently than other madvise options, particularly in native and Java
> > heaps for dynamic memory management.
> >
> > Currently, the mmap_lock is always held during these operations, even when
> > unnecessary. This causes lock contention and can lead to severe priority
> > inversion, where low-priority threads, such as Android's HeapTaskDaemon,
> > hold the lock and block higher-priority threads.
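
For context, the workload being optimized is an allocator or GC returning
a freed run of pages inside one still-mapped VMA. A minimal userspace
sketch of that call pattern, illustrative only and not part of the patch:

#include <stddef.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64UL << 12;	/* 64 pages with 4K pages */
	char *heap = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (heap == MAP_FAILED)
		return 1;
	heap[0] = 1;	/* fault at least one page in */
	/*
	 * The advised range lies entirely within a single VMA, so with
	 * this patch the kernel can serve it under the per-VMA read
	 * lock rather than mmap_lock.
	 */
	if (madvise(heap, len / 2, MADV_DONTNEED))
		return 1;
	return munmap(heap, len);
}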
> >
> > This patch enables the use of per-VMA locks when the advised range lies
> > entirely within a single VMA, avoiding the need for full VMA traversal. In
> > practice, userspace heaps rarely issue MADV_DONTNEED across multiple VMAs.
> >
> > Tangquan's testing shows that over 99.5% of memory reclaimed by Android
> > benefits from this per-VMA lock optimization. After extended runtime,
> > 217,735 madvise calls from HeapTaskDaemon used the per-VMA path, while
> > only 1,231 fell back to mmap_lock.
> >
> > To simplify handling, the implementation falls back to the standard
> > mmap_lock if userfaultfd is enabled on the VMA, avoiding the complexity of
> > userfaultfd_remove().
> >
> > Many thanks to Lorenzo's work[1] on:
> > "Refactor the madvise() code to retain state about the locking mode
> > utilised for traversing VMAs.
> >
> > Then use this mechanism to permit VMA locking to be done later in the
> > madvise() logic and also to allow altering of the locking mode to permit
> > falling back to an mmap read lock if required."
> >
> > One important point, as pointed out by Jann[2], is that
> > untagged_addr_remote() requires holding mmap_lock. This is because
> > address tagging on x86 and RISC-V is quite complex.
> >
> > Until untagged_addr_remote() becomes atomic, which seems unlikely in
> > the near future, we cannot support per-VMA locks for remote processes.
> > So for now, only local processes are supported.
> >
> > Link: https://lore.kernel.org/all/0b96ce61-a52c-4036-b5b6-5c50783db51f@lucifer.local/ [1]
> > Link: https://lore.kernel.org/all/CAG48ez11zi-1jicHUZtLhyoNPGGVB+ROeAJCUw48bsjk4bbEkA@mail.gmail.com/ [2]
> > Reviewed-by: Lorenzo Stoakes
> > Cc: "Liam R. Howlett"
> > Cc: David Hildenbrand
> > Cc: Vlastimil Babka
> > Cc: Jann Horn
> > Cc: Suren Baghdasaryan
> > Cc: Lokesh Gidra
> > Cc: Tangquan Zheng
> > Cc: Qi Zheng
> > Signed-off-by: Barry Song
> > ---
> > -v4:
> >  * collect Lorenzo's RB;
> >  * use visit() for per-vma path
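
On the untagged_addr_remote() point above: with an arm64 TBI-style scheme
the untagging rule is fixed, so the local case needs no lock, whereas x86
LAM and RISC-V keep a per-mm mask that has to be read under mmap_lock. A
hypothetical standalone illustration, not the kernel's implementation:

/*
 * Hypothetical sketch: top-byte-ignore untagging is a pure bit
 * operation (sign-extend from bit 55), so no mm state is consulted.
 * A per-mm tag mask, by contrast, would have to be fetched from the
 * remote mm, which is why untagged_addr_remote() asserts mmap_lock.
 */
static inline unsigned long untag_tbi_sketch(unsigned long addr)
{
	/* Sign-extend bit 55 through the top byte. */
	return (unsigned long)((long)(addr << 8) >> 8);
}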
> >
> >  mm/madvise.c | 195 ++++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 147 insertions(+), 48 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 56d9ca2557b9..8382614b71d1 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -48,38 +48,19 @@ struct madvise_walk_private {
> >  	bool pageout;
> >  };
> >
> > +enum madvise_lock_mode {
> > +	MADVISE_NO_LOCK,
> > +	MADVISE_MMAP_READ_LOCK,
> > +	MADVISE_MMAP_WRITE_LOCK,
> > +	MADVISE_VMA_READ_LOCK,
> > +};
> > +
> >  struct madvise_behavior {
> >  	int behavior;
> >  	struct mmu_gather *tlb;
> > +	enum madvise_lock_mode lock_mode;
> >  };
> >
> > -/*
> > - * Any behaviour which results in changes to the vma->vm_flags needs to
> > - * take mmap_lock for writing. Others, which simply traverse vmas, need
> > - * to only take it for reading.
> > - */
> > -static int madvise_need_mmap_write(int behavior)
> > -{
> > -	switch (behavior) {
> > -	case MADV_REMOVE:
> > -	case MADV_WILLNEED:
> > -	case MADV_DONTNEED:
> > -	case MADV_DONTNEED_LOCKED:
> > -	case MADV_COLD:
> > -	case MADV_PAGEOUT:
> > -	case MADV_FREE:
> > -	case MADV_POPULATE_READ:
> > -	case MADV_POPULATE_WRITE:
> > -	case MADV_COLLAPSE:
> > -	case MADV_GUARD_INSTALL:
> > -	case MADV_GUARD_REMOVE:
> > -		return 0;
> > -	default:
> > -		/* be safe, default to 1. list exceptions explicitly */
> > -		return 1;
> > -	}
> > -}
> > -
> >  #ifdef CONFIG_ANON_VMA_NAME
> >  struct anon_vma_name *anon_vma_name_alloc(const char *name)
> >  {
> > @@ -1486,6 +1467,44 @@ static bool process_madvise_remote_valid(int behavior)
> >  	}
> >  }
> >
> > +/*
> > + * Try to acquire a VMA read lock if possible.
> > + *
> > + * We only support this lock over a single VMA, which the input range must
> > + * span either partially or fully.
> > + *
> > + * This function always returns with an appropriate lock held. If a VMA read
> > + * lock could be acquired, we return the locked VMA.
> > + *
> > + * If a VMA read lock could not be acquired, we return NULL and expect caller to
> > + * fallback to mmap lock behaviour.
> > + */
> > +static struct vm_area_struct *try_vma_read_lock(struct mm_struct *mm,
> > +		struct madvise_behavior *madv_behavior,
> > +		unsigned long start, unsigned long end)
> > +{
> > +	struct vm_area_struct *vma;
> > +
> > +	vma = lock_vma_under_rcu(mm, start);
> > +	if (!vma)
> > +		goto take_mmap_read_lock;
> > +	/*
> > +	 * Must span only a single VMA; uffd and remote processes are
> > +	 * unsupported.
> > +	 */
> > +	if (end > vma->vm_end || current->mm != mm ||
> > +	    userfaultfd_armed(vma)) {
> > +		vma_end_read(vma);
> > +		goto take_mmap_read_lock;
> > +	}
> > +	return vma;
> > +
> > +take_mmap_read_lock:
> > +	mmap_read_lock(mm);
> > +	madv_behavior->lock_mode = MADVISE_MMAP_READ_LOCK;
> > +	return NULL;
> > +}
> > +
> >  /*
> >   * Walk the vmas in range [start,end), and call the visit function on each one.
> >   * The visit function will get start and end parameters that cover the overlap
> > @@ -1496,7 +1515,8 @@ static bool process_madvise_remote_valid(int behavior)
> >   */
> >  static
> >  int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
> > -		unsigned long end, void *arg,
> > +		unsigned long end, struct madvise_behavior *madv_behavior,
> > +		void *arg,
> >  		int (*visit)(struct vm_area_struct *vma,
> >  			struct vm_area_struct **prev, unsigned long start,
> >  			unsigned long end, void *arg))
> > @@ -1505,6 +1525,20 @@ int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
> >  	struct vm_area_struct *prev;
> >  	unsigned long tmp;
> >  	int unmapped_error = 0;
> > +	int error;
> > +
> > +	/*
> > +	 * If VMA read lock is supported, apply madvise to a single VMA
> > +	 * tentatively, avoiding walking VMAs.
> > +	 */
> > +	if (madv_behavior && madv_behavior->lock_mode == MADVISE_VMA_READ_LOCK) {
> > +		vma = try_vma_read_lock(mm, madv_behavior, start, end);
> > +		if (vma) {
> > +			error = visit(vma, &prev, start, end, arg);
> > +			vma_end_read(vma);
> > +			return error;
> > +		}
> > +	}
> >
> >  	/*
> >  	 * If the interval [start,end) covers some unmapped address
> > @@ -1516,8 +1550,6 @@ int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
> >  	prev = vma;
> >
> >  	for (;;) {
> > -		int error;
> > -
> >  		/* Still start < end. */
> >  		if (!vma)
> >  			return -ENOMEM;
> > @@ -1598,34 +1630,86 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> >  	if (end == start)
> >  		return 0;
> >
> > -	return madvise_walk_vmas(mm, start, end, anon_name,
> > +	return madvise_walk_vmas(mm, start, end, NULL, anon_name,
> >  				 madvise_vma_anon_name);
> >  }
> >  #endif /* CONFIG_ANON_VMA_NAME */
> >
> > -static int madvise_lock(struct mm_struct *mm, int behavior)
> > +
> > +/*
> > + * Any behaviour which results in changes to the vma->vm_flags needs to
> > + * take mmap_lock for writing. Others, which simply traverse vmas, need
> > + * to only take it for reading.
> > + */
> > +static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavior)
> >  {
> > +	int behavior = madv_behavior->behavior;
> > +
> >  	if (is_memory_failure(behavior))
> > -		return 0;
> > +		return MADVISE_NO_LOCK;
> >
> > -	if (madvise_need_mmap_write(behavior)) {
> > +	switch (behavior) {
> > +	case MADV_REMOVE:
> > +	case MADV_WILLNEED:
> > +	case MADV_COLD:
> > +	case MADV_PAGEOUT:
> > +	case MADV_FREE:
> > +	case MADV_POPULATE_READ:
> > +	case MADV_POPULATE_WRITE:
> > +	case MADV_COLLAPSE:
> > +	case MADV_GUARD_INSTALL:
> > +	case MADV_GUARD_REMOVE:
> > +		return MADVISE_MMAP_READ_LOCK;
> > +	case MADV_DONTNEED:
> > +	case MADV_DONTNEED_LOCKED:
> > +		return MADVISE_VMA_READ_LOCK;
> > +	default:
> > +		return MADVISE_MMAP_WRITE_LOCK;
> > +	}
> > +}
> > +
> > +static int madvise_lock(struct mm_struct *mm,
> > +		struct madvise_behavior *madv_behavior)
> > +{
> > +	enum madvise_lock_mode lock_mode = get_lock_mode(madv_behavior);
> > +
> > +	switch (lock_mode) {
> > +	case MADVISE_NO_LOCK:
> > +		break;
> > +	case MADVISE_MMAP_WRITE_LOCK:
> >  		if (mmap_write_lock_killable(mm))
> >  			return -EINTR;
> > -	} else {
> > +		break;
> > +	case MADVISE_MMAP_READ_LOCK:
> >  		mmap_read_lock(mm);
> > +		break;
> > +	case MADVISE_VMA_READ_LOCK:
> > +		/* We will acquire the lock per-VMA in madvise_walk_vmas(). */
> > +		break;
> >  	}
> > +
> > +	madv_behavior->lock_mode = lock_mode;
> >  	return 0;
> >  }
> >
> > -static void madvise_unlock(struct mm_struct *mm, int behavior)
> > +static void madvise_unlock(struct mm_struct *mm,
> > +		struct madvise_behavior *madv_behavior)
> >  {
> > -	if (is_memory_failure(behavior))
> > +	switch (madv_behavior->lock_mode) {
> > +	case MADVISE_NO_LOCK:
> >  		return;
> > -
> > -	if (madvise_need_mmap_write(behavior))
> > +	case MADVISE_MMAP_WRITE_LOCK:
> >  		mmap_write_unlock(mm);
> > -	else
> > +		break;
> > +	case MADVISE_MMAP_READ_LOCK:
> >  		mmap_read_unlock(mm);
> > +		break;
> > +	case MADVISE_VMA_READ_LOCK:
> > +		/* We will drop the lock per-VMA in madvise_walk_vmas(). */
> > +		break;
> > +	}
> > +
> > +	madv_behavior->lock_mode = MADVISE_NO_LOCK;
> >  }
> >
> >  static bool madvise_batch_tlb_flush(int behavior)
> > @@ -1710,6 +1794,21 @@ static bool is_madvise_populate(int behavior)
> >  	}
> >  }
> >
> > +/*
> > + * untagged_addr_remote() assumes mmap_lock is already held. On
> > + * architectures like x86 and RISC-V, tagging is tricky because each
> > + * mm may have a different tagging mask. However, we might only hold
> > + * the per-VMA lock (currently only local processes are supported),
> > + * so untagged_addr is used to avoid the mmap_lock assertion for
> > + * local processes.
> > + */
> > +static inline unsigned long get_untagged_addr(struct mm_struct *mm,
> > +		unsigned long start)
> > +{
> > +	return current->mm == mm ? untagged_addr(start) :
> > +				   untagged_addr_remote(mm, start);
> > +}
> > +
> >  static int madvise_do_behavior(struct mm_struct *mm,
> >  		unsigned long start, size_t len_in,
> >  		struct madvise_behavior *madv_behavior)
> > @@ -1721,7 +1820,7 @@ static int madvise_do_behavior(struct mm_struct *mm,
> >
> >  	if (is_memory_failure(behavior))
> >  		return madvise_inject_error(behavior, start, start + len_in);
> > -	start = untagged_addr_remote(mm, start);
> > +	start = get_untagged_addr(mm, start);
> >  	end = start + PAGE_ALIGN(len_in);
> >
> >  	blk_start_plug(&plug);
> > @@ -1729,7 +1828,7 @@ static int madvise_do_behavior(struct mm_struct *mm,
> >  		error = madvise_populate(mm, start, end, behavior);
> >  	else
> >  		error = madvise_walk_vmas(mm, start, end, madv_behavior,
> > -				madvise_vma_behavior);
> > +				madv_behavior, madvise_vma_behavior);
> >  	blk_finish_plug(&plug);
> >  	return error;
> >  }
> > @@ -1817,13 +1916,13 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
> >
> >  	if (madvise_should_skip(start, len_in, behavior, &error))
> >  		return error;
> > -	error = madvise_lock(mm, behavior);
> > +	error = madvise_lock(mm, &madv_behavior);
> >  	if (error)
> >  		return error;
> >  	madvise_init_tlb(&madv_behavior, mm);
> >  	error = madvise_do_behavior(mm, start, len_in, &madv_behavior);
> >  	madvise_finish_tlb(&madv_behavior);
> > -	madvise_unlock(mm, behavior);
> > +	madvise_unlock(mm, &madv_behavior);
> >
> >  	return error;
> >  }
> > @@ -1847,7 +1946,7 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
> >
> >  	total_len = iov_iter_count(iter);
> >
> > -	ret = madvise_lock(mm, behavior);
> > +	ret = madvise_lock(mm, &madv_behavior);
> >  	if (ret)
> >  		return ret;
> >  	madvise_init_tlb(&madv_behavior, mm);
> > @@ -1880,8 +1979,8 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
> >
> >  			/* Drop and reacquire lock to unwind race. */
> >  			madvise_finish_tlb(&madv_behavior);
> > -			madvise_unlock(mm, behavior);
> > -			ret = madvise_lock(mm, behavior);
> > +			madvise_unlock(mm, &madv_behavior);
> > +			ret = madvise_lock(mm, &madv_behavior);
> >  			if (ret)
> >  				goto out;
> >  			madvise_init_tlb(&madv_behavior, mm);
> > @@ -1892,7 +1991,7 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
> >  		iov_iter_advance(iter, iter_iov_len(iter));
> >  	}
> >  	madvise_finish_tlb(&madv_behavior);
> > -	madvise_unlock(mm, behavior);
> > +	madvise_unlock(mm, &madv_behavior);
> >
> >  out:
> >  	ret = (total_len - iov_iter_count(iter)) ? : ret;
> > --
> > 2.39.3 (Apple Git-146)
> >
>
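
Before the fixpatch below, it may help to make the visitor contract it
relies on concrete. A standalone sketch with hypothetical stand-in types,
not kernel code:

/*
 * Every visit() implementation that can run under the VMA read lock
 * sets *prev = vma before doing anything else, so initialising prev
 * at the call site only makes that existing invariant explicit.
 */
struct fake_vma { unsigned long vm_start, vm_end; };

static int visit_one(struct fake_vma *vma, struct fake_vma **prev,
		     unsigned long start, unsigned long end, void *arg)
{
	*prev = vma;	/* the invariant the fixpatch encodes */
	/* ... operate on [start, end) within this single vma ... */
	(void)start; (void)end; (void)arg;
	return 0;
}

int apply_single_vma(struct fake_vma *vma, unsigned long start,
		     unsigned long end, void *arg)
{
	struct fake_vma *prev = vma;	/* defensive init, as in the fix */

	return visit_one(vma, &prev, start, end, arg);
}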
> ----8<----
> From 1ffcaea75ebdaffe15805386f6d7733883d265a5 Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes
> Date: Tue, 17 Jun 2025 14:35:13 +0100
> Subject: [PATCH] mm/madvise: avoid any chance of uninitialised pointer deref
>
> If we were to extend madvise() to support more operations under VMA lock,
> we could potentially dereference prev to uninitialised state in
> madvise_update_vma().
>
> Avoid this by explicitly setting prev to vma before invoking the visit()
> function.
>
> This has no impact on behaviour, as all visitors compatible with a VMA lock
> do not require prev to be set to the previous VMA and at any rate we only
> examine a single VMA in VMA lock mode.
>
> Reported-by: Lance Yang
> Signed-off-by: Lorenzo Stoakes
> ---
>  mm/madvise.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index efe5d64e1175..0970623a0e98 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1333,6 +1333,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
>  		return madvise_guard_remove(vma, prev, start, end);
>  	}
>
> +	/* We cannot provide prev in this lock mode. */
> +	VM_WARN_ON_ONCE(arg->lock_mode == MADVISE_VMA_READ_LOCK);

Thanks, Lorenzo.

Do we even reach this point for MADVISE_MMAP_READ_LOCK cases?
madvise_update_vma() attempts to merge or split VMAs; wouldn't that be
a scenario that requires a write lock?

The prerequisite for using a VMA read lock is that the operation must
be safe under an mmap read lock as well.

>  	anon_name = anon_vma_name(vma);
>  	anon_vma_name_get(anon_name);
>  	error = madvise_update_vma(vma, prev, start, end, new_flags,
> @@ -1549,6 +1551,7 @@ int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
>  	if (madv_behavior && madv_behavior->lock_mode == MADVISE_VMA_READ_LOCK) {
>  		vma = try_vma_read_lock(mm, madv_behavior, start, end);
>  		if (vma) {
> +			prev = vma;
>  			error = visit(vma, &prev, start, end, arg);
>  			vma_end_read(vma);
>  			return error;
> --
> 2.49.0

Thanks
Barry