From: Vern Hao
Date: Tue, 11 Feb 2025 16:48:06 +0800
Subject: Re: [PATCH 0/4] mm/madvise: remove redundant mmap_lock operations from process_madvise()
To: SeongJae Park, Andrew Morton
Cc: "Liam R. Howlett", David Hildenbrand, Davidlohr Bueso, Lorenzo Stoakes, Shakeel Butt, Vlastimil Babka, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20250206061517.2958-1-sj@kernel.org>
References: <20250206061517.2958-1-sj@kernel.org>


On 2025/2/6 14:15, SeongJae Park wrote:
process_madvise() calls do_madvise() for each address range.  Then, each
do_madvise() invocation holds and releases the same mmap_lock.  Optimize
away the redundant lock operations by splitting out do_madvise()'s internal
logic, including the mmap_lock operations, and calling those pieces
directly from process_madvise() in a sequence that removes the redundant
locking.  As a result of this change, process_madvise() becomes more
efficient and less racy in terms of its results and latency.
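
For illustration, a minimal userspace model of the locking pattern being
changed; the helper names are invented for this sketch and a pthread rwlock
stands in for mmap_lock, so this is not the code from the series:

#include <pthread.h>
#include <stddef.h>

static pthread_rwlock_t fake_mmap_lock = PTHREAD_RWLOCK_INITIALIZER;

struct range { char *start; size_t len; };

/* stand-in for the split-out madvise behavior execution */
static void apply_behavior(struct range *r) { (void)r; }

/* Before: each range pays its own lock/unlock pair. */
static void advise_one(struct range *r)
{
        pthread_rwlock_rdlock(&fake_mmap_lock);
        apply_behavior(r);
        pthread_rwlock_unlock(&fake_mmap_lock);
}

/* After: the lock operations are hoisted out of the per-range call,
 * so a whole batch runs under a single lock/unlock pair. */
static void advise_batch(struct range *ranges, size_t n)
{
        pthread_rwlock_rdlock(&fake_mmap_lock);
        for (size_t i = 0; i < n; i++)
                apply_behavior(&ranges[i]);
        pthread_rwlock_unlock(&fake_mmap_lock);
}

int main(void)
{
        struct range r[2] = { { 0, 0 }, { 0, 0 } };
        advise_one(&r[0]);      /* old behavior: one lock span per range */
        advise_batch(r, 2);     /* new behavior: one lock span per batch */
        return 0;
}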

Note that the potential downside of this series is that other mmap_lock
holders may take more time due to the increased length of mmap_lock
critical section for process_madvise() calls.  But there is a maximum
limit in the kernel space (IOV_MAX), and the user space can control the
critical section length by setting the request size.  Hence, the
downside would be limited and controllable.
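
As a concrete example of that control knob, the caller simply caps how many
iovec entries it passes per process_madvise() call.  The sketch below is
illustrative only (batch size of 2 and a 16-page buffer are arbitrary, raw
syscall numbers are used in case libc wrappers are missing, and MADV_DONTNEED
via process_madvise() needs a reasonably recent kernel); it targets the
calling process through pidfd_open():

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t nr_pages = 16;
        char *buf = mmap(NULL, nr_pages * page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
        memset(buf, 1, nr_pages * page);        /* fault the pages in */

        int pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
        if (pidfd < 0) { perror("pidfd_open"); return 1; }

        /* Each call hands the kernel at most 'batch' ranges, so the
         * caller bounds the length of each mmap_lock critical section. */
        const size_t batch = 2;
        for (size_t i = 0; i < nr_pages; i += batch) {
                struct iovec iov[2];
                size_t n;
                for (n = 0; n < batch && i + n < nr_pages; n++) {
                        iov[n].iov_base = buf + (i + n) * page;
                        iov[n].iov_len = page;
                }
                if (syscall(SYS_process_madvise, pidfd, iov, n,
                            MADV_DONTNEED, 0) < 0) {
                        perror("process_madvise");
                        return 1;
                }
        }
        printf("advised %zu pages in batches of %zu\n", nr_pages, batch);
        return 0;
}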

Evaluation
==========

I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
using multiple madvise() calls, 4 KiB per call.  I also did the same
with process_madvise(), but with varying batch size (vlen) from 1 to
1024.  The source code for the measurement is available at GitHub[1].
Because the microbenchmark result is not that stable, I ran each
configuration five times and used the average.
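
For reference, a simplified sketch of the madvise() side of that
measurement; it is not the benchmark from [1], just the 256 MiB / 4 KiB loop
described above with a monotonic-clock timer around it:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

int main(void)
{
        const size_t total = 256UL << 20;       /* 256 MiB buffer */
        const size_t chunk = 4096;              /* 4 KiB per madvise() call */
        char *buf = mmap(NULL, total, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
        memset(buf, 1, total);                  /* populate all pages */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t off = 0; off < total; off += chunk)
                madvise(buf + off, chunk, MADV_DONTNEED);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
                       (t1.tv_nsec - t0.tv_nsec);
        printf("%zu madvise() calls took %lld ns\n", total / chunk, ns);
        return 0;
}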

The measurement results are as below.  'sz_batches' column shows the
batch size of process_madvise() calls.  '0' batch size is for madvise()
calls case. 
Hi, I just wonder why these patches can reduce the latency of madvise(MADV_DONTNEED) calls.
 'before' and 'after' columns are the measured time to apply
MADV_DONTNEED to the 256 MiB memory buffer in nanoseconds, on kernels
built without and with the last patch of this series, respectively.
So a lower value means better efficiency.  The 'after/before' column is the
ratio of 'after' to 'before'.

    sz_batches  before       after        after/before
    0           146294215.2  121280536.2  0.829017989769427
    1           165851018.8  136305598.2  0.821855658085351
    2           129469321.2  103740383.6  0.801273866569094
    4           110369232.4  87835896.2   0.795836795182785
    8           102906232.4  77420920.2   0.752344327397609
    16          97551017.4   74959714.4   0.768415506038587
    32          94809848.2   71200848.4   0.750985786305689
    64          96087575.6   72593180     0.755489765942227
    128         96154163.8   68517055.4   0.712575022154163
    256         92901257.6   69054216.6   0.743307662177439
    512         93646170.8   67053296.2   0.716028168874151
    1024        92663219.2   70168196.8   0.75723892830177

Despite the unstable nature of the test program, the trend is
roughly what we would expect.  The measurement shows this patch reduces
the process_madvise() latency proportionally to the batch size.  The
latency gain was about 20% with a batch size of 2, and it increased to
about 28% with a batch size of 512, since more mmap lock operations are
eliminated with larger batch sizes.

Note that the standard deviation of the measurements for each
sz_batches configuration ranged from 1.9% to 7.2%.  That is, this
result is still not very stable.  The averages of the standard deviations
across batch sizes were 4.62% and 4.70% for the 'before' and 'after'
kernel measurements, respectively.

Also note that this patch has somehow decreased the latencies of madvise()
and single-batch-size process_madvise().  It seems this code path is small
enough to be significantly affected by compiler optimizations, including
inlining of the split-out functions.  Please focus only on the improvement
that changes with the batch size.

Changelog
=========

Changes from RFC v2
(https://lore.kernel.org/20250117013058.1843-1-sj@kernel.org)
- Release and acquire mmap lock again when a race-caused failure happens
  (Lorenzo Stoakes)
- Collected Reviewed-by: tags from Shakeel, Lorenzo and Davidlohr.

Changes from RFC v1
(https://lore.kernel.org/20250111004618.1566-1-sj@kernel.org)
- Split out do_madvise() and use those from vector_madvise(), instead of
  adding a flag to do_madvise() (Liam R. Howlett)

[1] https://github.com/sjp38/eval_proc_madvise

SeongJae Park (4):
  mm/madvise: split out mmap locking operations for madvise()
  mm/madvise: split out madvise input validity check
  mm/madvise: split out madvise() behavior execution
  mm/madvise: remove redundant mmap_lock operations from
    process_madvise()

 mm/madvise.c | 154 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 107 insertions(+), 47 deletions(-)


base-commit: f104b8534d19f31443a4fe6cb701bdb15fd931eb