From: Barry Song <21cnbao@gmail.com>
Date: Wed, 18 Jun 2025 17:52:28 +0800
Subject: Re: [PATCH v4] mm: use per_vma lock for MADV_DONTNEED
To: Lance Yang
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, "Liam R. Howlett", David Hildenbrand, Vlastimil Babka, Jann Horn, Suren Baghdasaryan, Lokesh Gidra, Tangquan Zheng, Qi Zheng, Lance Yang, Lorenzo Stoakes, Zi Li
References: <20250607220150.2980-1-21cnbao@gmail.com> <309d22ca-6cd9-4601-8402-d441a07d9443@lucifer.local>

On Wed, Jun 18, 2025 at 10:25 AM Lance Yang wrote:
>
> Hi all,
>
> Crazy, the per-VMA lock for madvise is an absolute game-changer ;)
>
> On 2025/6/17 21:38, Lorenzo Stoakes wrote:
> [...]
> >
> > On Sun, Jun 08, 2025 at 10:01:50AM +1200, Barry Song wrote:
> >> From: Barry Song
> >>
> >> Certain madvise operations, especially MADV_DONTNEED, occur far more
> >> frequently than other madvise options, particularly in native and Java
> >> heaps for dynamic memory management.
> >>
> >> Currently, the mmap_lock is always held during these operations, even when
> >> unnecessary. This causes lock contention and can lead to severe priority
> >> inversion, where low-priority threads, such as Android's HeapTaskDaemon,
> >> hold the lock and block higher-priority threads.
> >>
> >> This patch enables the use of per-VMA locks when the advised range lies
> >> entirely within a single VMA, avoiding the need for full VMA traversal. In
> >> practice, userspace heaps rarely issue MADV_DONTNEED across multiple VMAs.
> >>
> >> Tangquan's testing shows that over 99.5% of memory reclaimed by Android
> >> benefits from this per-VMA lock optimization. After extended runtime,
> >> 217,735 madvise calls from HeapTaskDaemon used the per-VMA path, while
> >> only 1,231 fell back to mmap_lock.
> >>
> >> To simplify handling, the implementation falls back to the standard
> >> mmap_lock if userfaultfd is enabled on the VMA, avoiding the complexity of
> >> userfaultfd_remove().
> >>
> >> Many thanks to Lorenzo's work[1] on:
> >> "Refactor the madvise() code to retain state about the locking mode
> >> utilised for traversing VMAs.
> >>
> >> Then use this mechanism to permit VMA locking to be done later in the
> >> madvise() logic and also to allow altering of the locking mode to permit
> >> falling back to an mmap read lock if required."
> >>
> >> One important point, as pointed out by Jann[2], is that
> >> untagged_addr_remote() requires holding mmap_lock. This is because
> >> address tagging on x86 and RISC-V is quite complex.
> >>
> >> Until untagged_addr_remote() becomes atomic, which seems unlikely in
> >> the near future, we cannot support per-VMA locks for remote processes.
> >> So for now, only local processes are supported.
>
> Just to put some numbers on it, I ran a micro-benchmark with 100
> parallel threads, where each thread calls madvise() on its own 1GiB
> chunk of 64KiB mTHP-backed memory. The performance gain is huge:
>
> 1) MADV_DONTNEED saw its average time drop from 0.0508s to 0.0270s (~47%
> faster)
> 2) MADV_FREE saw its average time drop from 0.3078s to 0.1095s (~64%
> faster)

Thanks for the report, Lance. I assume your micro-benchmark includes some
explicit or implicit operations that may require mmap_write_lock(), since
mmap_read_lock() only waits for writers and does not block other
mmap_read_lock() calls.

By the way, I would expect that per-VMA locking for madvise_dontneed or
madvise_free would benefit nearly all Linux and Android systems, as long as
they use a dynamic C/Java memory allocator.

>
> Thanks,
> Lance

Thanks
Barry
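
For anyone following along who wants to see the userspace pattern being
discussed, below is a minimal sketch of an allocator-style MADV_DONTNEED call
whose range stays inside one anonymous mapping (and so, normally, inside one
VMA). It is an illustration only, not code from the patch or from Lance's
benchmark: the helper name dontneed_within_one_mapping() and the 0xAB fill
pattern are made up here, and the zero-fill check assumes Linux semantics for
private anonymous memory.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): create one private anonymous
 * mapping of `pages` pages, dirty every page, then drop everything except the
 * first and last page with MADV_DONTNEED, roughly as a heap allocator might
 * do when trimming free memory.
 *
 * Returns 0 if the dropped range reads back as zeroes and the two untouched
 * pages still hold their data, -1 on any failure or if pages < 3.
 */
int dontneed_within_one_mapping(size_t pages)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0 || pages < 3)
        return -1;

    size_t page = (size_t)psz;
    size_t len = pages * page;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Touch every page so there is something to discard. */
    memset(buf, 0xAB, len);

    /* The advised range lies entirely within the single mapping above. */
    unsigned char *start = buf + page;
    size_t drop = len - 2 * page;
    int rc = -1;

    if (madvise(start, drop, MADV_DONTNEED) != 0)
        goto out;

    /* On Linux, discarded private anonymous pages read back as zeroes. */
    for (size_t i = 0; i < drop; i++) {
        if (start[i] != 0)
            goto out;
    }

    /* Pages outside the advised range must be unaffected. */
    if (buf[0] != 0xAB || buf[len - 1] != 0xAB)
        goto out;

    rc = 0;
out:
    munmap(buf, len);
    return rc;
}
```

Calling dontneed_within_one_mapping(8) from a small main() should return 0 on
a Linux system. Whether a given kernel services such a call under the per-VMA
lock or under mmap_lock is not observable from this code; it only shows the
single-mapping call shape that the patch description says is the common case.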