From: Barry Song <21cnbao@gmail.com>
Date: Wed, 18 Jun 2025 18:32:36 +0800
Subject: Re: [PATCH v4] mm: use per_vma lock for MADV_DONTNEED
To: David Hildenbrand
Cc: Lance Yang, akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Barry Song, "Liam R. Howlett",
    Vlastimil Babka, Jann Horn, Suren Baghdasaryan, Lokesh Gidra,
    Tangquan Zheng, Qi Zheng, Lance Yang, Lorenzo Stoakes, Zi Li
References: <20250607220150.2980-1-21cnbao@gmail.com>
    <309d22ca-6cd9-4601-8402-d441a07d9443@lucifer.local>

On Wed, Jun 18, 2025 at 6:30 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Jun 18, 2025 at 6:18 PM David Hildenbrand wrote:
> >
> > On 18.06.25 11:52, Barry Song wrote:
> > > On Wed, Jun 18, 2025 at 10:25 AM Lance Yang wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Crazy, the per-VMA lock for madvise is an absolute game-changer ;)
> > >>
> > >> On 2025/6/17 21:38, Lorenzo Stoakes wrote:
> > >> [...]
> > >>>
> > >>> On Sun, Jun 08, 2025 at 10:01:50AM +1200, Barry Song wrote:
> > >>>> From: Barry Song
> > >>>>
> > >>>> Certain madvise operations, especially MADV_DONTNEED, occur far more
> > >>>> frequently than other madvise options, particularly in native and Java
> > >>>> heaps for dynamic memory management.
> > >>>>
> > >>>> Currently, the mmap_lock is always held during these operations, even when
> > >>>> unnecessary. This causes lock contention and can lead to severe priority
> > >>>> inversion, where low-priority threads -- such as Android's HeapTaskDaemon --
> > >>>> hold the lock and block higher-priority threads.
> > >>>>
> > >>>> This patch enables the use of per-VMA locks when the advised range lies
> > >>>> entirely within a single VMA, avoiding the need for full VMA traversal. In
> > >>>> practice, userspace heaps rarely issue MADV_DONTNEED across multiple VMAs.
> > >>>>
> > >>>> Tangquan's testing shows that over 99.5% of memory reclaimed by Android
> > >>>> benefits from this per-VMA lock optimization. After extended runtime,
> > >>>> 217,735 madvise calls from HeapTaskDaemon used the per-VMA path, while
> > >>>> only 1,231 fell back to mmap_lock.
> > >>>>
> > >>>> To simplify handling, the implementation falls back to the standard
> > >>>> mmap_lock if userfaultfd is enabled on the VMA, avoiding the complexity of
> > >>>> userfaultfd_remove().
> > >>>>
> > >>>> Many thanks to Lorenzo's work[1] on:
> > >>>> "Refactor the madvise() code to retain state about the locking mode
> > >>>> utilised for traversing VMAs.
> > >>>>
> > >>>> Then use this mechanism to permit VMA locking to be done later in the
> > >>>> madvise() logic and also to allow altering of the locking mode to permit
> > >>>> falling back to an mmap read lock if required."
> > >>>>
> > >>>> One important point, as pointed out by Jann[2], is that
> > >>>> untagged_addr_remote() requires holding mmap_lock. This is because
> > >>>> address tagging on x86 and RISC-V is quite complex.
> > >>>>
> > >>>> Until untagged_addr_remote() becomes atomic -- which seems unlikely in
> > >>>> the near future -- we cannot support per-VMA locks for remote processes.
> > >>>> So for now, only local processes are supported.
> > >>
> > >> Just to put some numbers on it, I ran a micro-benchmark with 100
> > >> parallel threads, where each thread calls madvise() on its own 1GiB
> > >> chunk of 64KiB mTHP-backed memory. The performance gain is huge:
> > >>
> > >> 1) MADV_DONTNEED saw its average time drop from 0.0508s to 0.0270s (~47%
> > >> faster)
> > >> 2) MADV_FREE saw its average time drop from 0.3078s to 0.1095s (~64%
> > >> faster)
> > >
> > > Thanks for the report, Lance. I assume your micro-benchmark includes some
> > > explicit or implicit operations that may require mmap_write_lock().
> > > As mmap_read_lock() only waits for writers and does not block other
> > > mmap_read_lock() calls.
> >
> > The number rather indicate that one test was run with (m)THPs enabled
> > and the other not? Just a thought. The locking overhead from my
> > experience is not that significant.
>
> Right. I don't expect pure madvise_dontneed/free -- without any additional
> behavior requiring mmap_write_lock -- to improve performance significantly.
> The main benefit would be avoiding contention on the write lock.
>
> Consider this scenario:
> timestamp1: Thread A acquires the read lock
> timestamp2: Thread B attempts to acquire the write lock
> timestamp3: Threads C, D, and E attempt to acquire the read lock
>
> In this case, thread B must wait for A, and threads C, D, and E will
> wait for both A and B. Any write lock request effectively blocks all
> subsequent read acquisitions.
>
> In the worst case, thread A might be a GC thread with a high nice value.
> If it's preempted by other threads, the delay can reach several
> milliseconds -- as we've observed in some cases.

Sorry for the typo. I meant a few hundred milliseconds.

> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
> Thanks
> Barry
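
P.S. To make the A/B/C/D/E scenario quoted above concrete, here is a toy
model in Python. It is only a sketch under stated assumptions: a strict FIFO
wait queue in which a queued writer blocks every reader that arrives after
it. It is not the kernel's rwsem code (no optimistic spinning, no handoff
details), the names Request and simulate are made up for illustration, and
the hold times are invented numbers in milliseconds.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    kind: str     # "read" or "write"
    arrival: int  # time (ms) at which the thread asks for the lock
    hold: int     # time (ms) it keeps the lock once granted


def simulate(requests):
    """Return {name: grant_time} for a FIFO reader/writer lock model.

    Assumption: waiters are served strictly in arrival order, so a reader
    queued behind a writer cannot overtake it.
    """
    pending = deque(sorted(requests, key=lambda r: r.arrival))
    waiting = deque()  # threads that asked for the lock and are blocked
    active = []        # (release_time, kind) for the current holders
    granted = {}
    while pending or waiting:
        # Jump to the next event: an arrival or a release.
        candidates = [release for release, _ in active]
        if pending:
            candidates.append(pending[0].arrival)
        now = min(candidates)
        # Drop holders whose hold time has expired.
        active = [(rel, kind) for rel, kind in active if rel > now]
        # Queue everything that has arrived by now.
        while pending and pending[0].arrival <= now:
            waiting.append(pending.popleft())
        # Grant from the head of the queue for as long as that is allowed.
        while waiting:
            head = waiting[0]
            writer_active = any(kind == "write" for _, kind in active)
            if head.kind == "write" and active:
                break  # a writer needs the lock completely free
            if head.kind == "read" and writer_active:
                break  # a reader cannot share with a writer
            waiting.popleft()
            granted[head.name] = now
            active.append((now + head.hold, head.kind))
    return granted


# Thread A is a preempted low-priority reader that holds the lock for 300 ms.
scenario = [
    Request("A", "read", arrival=0, hold=300),
    Request("B", "write", arrival=1, hold=1),
    Request("C", "read", arrival=2, hold=1),
    Request("D", "read", arrival=2, hold=1),
    Request("E", "read", arrival=2, hold=1),
]
print(simulate(scenario))

# Same readers without the writer B: C, D and E share the lock with A.
print(simulate([r for r in scenario if r.name != "B"]))
```

In this model B is granted the lock only at 300 ms, and C, D and E at 301 ms,
although they would have been granted at 2 ms had B not been queued. That is
the read-side stall the per-VMA path is meant to avoid.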