From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C93D1C282DE
	for <linux-mm@archiver.kernel.org>; Wed,  5 Mar 2025 20:22:37 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id EFC7F28000A; Wed,  5 Mar 2025 15:22:34 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id EAAD4280007; Wed,  5 Mar 2025 15:22:34 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id D9E2628000A; Wed,  5 Mar 2025 15:22:34 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13])
	by kanga.kvack.org (Postfix) with ESMTP id B6125280007
	for <linux-mm@kvack.org>; Wed,  5 Mar 2025 15:22:34 -0500 (EST)
Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay01.hostedemail.com (Postfix) with ESMTP id B49171C8FF8
	for <linux-mm@kvack.org>; Wed,  5 Mar 2025 20:22:36 +0000 (UTC)
X-FDA: 83188620312.08.5066F8A
Received: from out-176.mta1.migadu.com (out-176.mta1.migadu.com [95.215.58.176])
	by imf15.hostedemail.com (Postfix) with ESMTP id C363DA0019
	for <linux-mm@kvack.org>; Wed,  5 Mar 2025 20:22:34 +0000 (UTC)
Authentication-Results: imf15.hostedemail.com;
	dkim=pass header.d=linux.dev header.s=key1 header.b=SlOZ9KuK;
	spf=pass (imf15.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.176 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev;
	dmarc=pass (policy=none) header.from=linux.dev
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1741206155;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=60IYXncwzNAIP2DDPmSH9jAFXc71tM9uKRjLp1njXNA=;
	b=C0OlCF/v2KCVq9XJeSj4ZkmHXlLUnFcPBJ/odeZX7twg/WK71UCRotz4D9i1+PPQkxnygG
	2X8N/WC1XGfdLDXBzrbCRnT0B+NlauxVMXHug5kTULMUuXk30wigyrL5cHzKuA5zv3B8lE
	DqiVh5BA/KF9oViooBdSGTaVK3PBb3U=
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741206155; a=rsa-sha256;
	cv=none;
	b=XqTgWR5jSX5up0B7+QB6fEhsLW3ah27E8BxRZJsFKn+NjgAztTVszPa21PQ1KX0+qcTxtH
	0CulSPC2U/0t1dyGIcMiCITxlgHXr3U6XrAnapwZN2U1L3xxIdKlvPM7b5CarQ3c9D85RI
	h5mb9X3MGeSEypbtm/yGTu3e/oQZPoU=
ARC-Authentication-Results: i=1;
	imf15.hostedemail.com;
	dkim=pass header.d=linux.dev header.s=key1 header.b=SlOZ9KuK;
	spf=pass (imf15.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.176 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev;
	dmarc=pass (policy=none) header.from=linux.dev
Date: Wed, 5 Mar 2025 12:22:25 -0800
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1741206152;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=60IYXncwzNAIP2DDPmSH9jAFXc71tM9uKRjLp1njXNA=;
	b=SlOZ9KuKK6o/G8SJDyBxmZR8PyO6enQbmzy41bVGtgaIGTlSNUOquZK2nusuY2JZUDxOIq
	ie0od8LW6I7uYE7lvYISL+k/PcOE+T9aA6X1+GgmDy237fdzJclo0ikgq9/4E0qcP52jqv
	pXXpueDBk4R+enqEUKdVUfNzaLXlSUo=
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: Shakeel Butt <shakeel.butt@linux.dev>
To: SeongJae Park <sj@kernel.org>
Cc: "Liam R. Howlett" <howlett@gmail.com>, 
	Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, kernel-team@meta.com, 
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH 00/16] mm/madvise: batch tlb flushes for
 MADV_DONTNEED and MADV_FREE
Message-ID: <ro2wtggwxbmwk6lhvcixwrefo44x7ggeumevv7lyupvudwxjsg@onh2e46eqzcy>
References: <20250305181611.54484-1-sj@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20250305181611.54484-1-sj@kernel.org>
X-Migadu-Flow: FLOW_OUT
X-Stat-Signature: pn6rag6re6ouhcokndwjwboorhws5w89
X-Rspamd-Queue-Id: C363DA0019
X-Rspam-User: 
X-Rspamd-Server: rspam01
X-HE-Tag: 1741206154-114064
X-HE-Meta: U2FsdGVkX18wqDaeAC+HwocYfshWSRTDRBKBSINJ/0Lc13Oy7FsOcR3SdT5W5uhCXqimSqdTbddcJq/dfjTnREZLRgwfIQEEQEkmGRKiZKjp7aIEj9ac7kRvTK3LiulHF/5/zVtBtvM1QRdeCpdMDQjP3xedmsUUDb5K0Php/JBYyD1H/k4sTcPUBd9g1az90ViehF10cxmvvw4GJ0LZxnRDSr+hkDxVExF2doCady+A2F01SXVREyJ5IUSFB++OhV76344ldHV+hKa4XHVUj/SFcbIAi+pznUx8Fjfr2GWrMPoFwcha53DraPTsx58o5z7EXW0/1y5KCNizMGdAWnHnMZSBxiREcR4t4ELYrM/6r/5dHThih+hvtc+2qQ+huG2S2j+XCTuynnnfeMm22HlMpo3AwIm0+bRbueA12pUypO7JR3ppgOC52uvc8u/qRkVGhI197PnqTCoOkGFuZolpQZyhKf03CaW8eATv38c0RVZraZ5u9GguWFTeuEnW5WaUSzVQml1TKzSLcbKXpHcLZVz+ceZ6gu+GuMkiW170A1343VmrV59nrBxzi0J4gw8QZ9CeWZqF6RJFfRZftG372ULd/2o5DHUi0kp9oNW2z7MX6nkysXyxbfpu22XQr83ZRFq8wxAsv3lRCH88Zl69LunjDrS/Rr8v8wu14Xw+iqi6DEBJV+V/g7hOAEJELm0IKNgKnfAPAGyRYEGki8fyYE3X7F6N38UxkbKcM3cn/H5/17KeV5yiBOmpO5g3+gkwa549UlonxVECkjqkTdmn+RJYyxiuVm+v9/So7Xa4oAuhAuPYjM7XNwgGST+KantI1k+Wy/Tm3kjcYOvOX9xBBadPpltH5rxR8gXYHEIb+6gEHN4QZxHNp821ytdWvnOszwIK4VSPnPt0bueX+GY0IzNCCjgHk7fHbpEYCwOJ0zwIswLxRIa+aI3kcMoW+P3uVPUTK593CxTJPK2
 MVEFOgZC
 KVmsqvB0RWKxgtL77LwmS9qJOX3pW3RSaWtE99EPVbAx098WoZ6N2qHxCFhYx1ym8rW+mwxrGbJ6nUBpnn7U1/a7dNAKBdP4++Fj5/RaKMkdhtFl6VaDgmhmbnXQiXN+19VMOKwe8tUXK3HK31jdGpaqJA3/Ygr/dXkorg5JERnfRucE4REwLc+QuJ8YR+2taNRmyeI16lzc9qU/MhCGbkLHJ34jOjg6SDrJt
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000022, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Wed, Mar 05, 2025 at 10:15:55AM -0800, SeongJae Park wrote:
> For MADV_DONTNEED[_LOCKED] or MADV_FREE madvise requests, tlb flushes
> can happen for each vma of the given address ranges.  Because such tlb
> flushes are for address ranges of same process, doing those in a batch
> is more efficient while still being safe.  Modify madvise() and
> process_madvise() entry level code path to do such batched tlb flushes,
> while the internal unmap logics do only gathering of the tlb entries to
> flush.
> 
> In more detail, modify the entry functions to initialize an mmu_gather
> object and pass it to the internal logics.  Also modify the internal
> logics to do only gathering of the tlb entries to flush into the
> received mmu_gather object.  After all internal function calls are done,
> the entry functions finish the mmu_gather object to flush the gathered
> tlb entries in the one batch.
> 
> Patches Sequence
> ================
> 
> First four patches are minor cleanups of madvise.c for readability.
> 
> Following four patches (patches 5-8) define new data structure for
> managing information that is required for batched tlb flushing (mmu_gather
> and behavior), and update code paths for MADV_DONTNEED[_LOCKED] and
> MADV_FREE handling internal logics to receive it.
> 
> Three patches (patches 9-11) for making internal MADV_DONTNEED[_LOCKED]
> and MADV_FREE handling logic ready for batched tlb flushing follow. 

Either the sentence above is incomplete, or the trailing 'follow' reads
oddly.

> The
> patches keep the support of unbatched tlb flushes use case, for
> fine-grained and safe transitions.
> 
> Next three patches (patches 12-14) update madvise() and
> process_madvise() code to do the batched tlb flushes utilizing the
> previous patches introduced changes.
> 
> Final two patches (patches 15-16) clean up the internal logics'
> unbatched tlb flushes use case support code, which is no longer used.
> 
> Test Results
> ============
> 
> I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
> using multiple process_madvise() calls.  I apply the advice in 4 KiB
> sized regions granularity, but with varying batch size (vlen) from 1 to
> 1024.  The source code for the measurement is available at GitHub[1].
> 
> The measurement results are as below.  'sz_batches' column shows the
> batch size of process_madvise() calls.  'before' and 'after' columns are
> the measured time to apply MADV_DONTNEED to the 256 MiB memory buffer in
> nanoseconds, on kernels that built without and with the MADV_DONTNEED
> tlb flushes batching patch of this series, respectively.  For the
> baseline, mm-unstable tree of 2025-03-04[2] has been used.
> 'after/before' column is the ratio of 'after' to 'before'.  So
> 'after/before' value lower than 1.0 means this patch increased
> efficiency over the baseline.  And lower value means better efficiency.

I would recommend replacing the 'after/before' column with a percentage,
i.e. the percentage improvement or degradation relative to 'before'.
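
As a rough sketch of what I mean, something like the following would
convert the raw timings into a signed percentage. The helper name
percent_change() is only illustrative, and the three rows are copied
from your table below; a negative value means the patched kernel was
faster.

```python
def percent_change(before_ns, after_ns):
    """Change from 'before' to 'after' as a percentage of 'before'.

    Negative means 'after' took less time (an improvement).
    """
    return (after_ns - before_ns) / before_ns * 100.0


# (sz_batches, before, after) rows taken from the posted table.
rows = [
    (1, 102842895, 106507398),
    (4, 58823633, 51608504),
    (1024, 44440870, 22676017),
]

for sz_batches, before_ns, after_ns in rows:
    change = percent_change(before_ns, after_ns)
    print(f"{sz_batches:<10} {change:+.1f}%")
```

A signed percentage makes the direction of the change obvious at a
glance, which the bare ratio does not.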

> 
>     sz_batches    before       after        after/before
>     1             102842895    106507398    1.03563204828102
>     2             73364942     74529223     1.01586971880929
>     4             58823633     51608504     0.877343022998937
>     8             47532390     44820223     0.942940655834895
>     16            43591587     36727177     0.842529018271347
>     32            44207282     33946975     0.767904595446515
>     64            41832437     26738286     0.639175910310939
>     128           40278193     23262940     0.577556694263817
>     256           41568533     22355103     0.537789077136785
>     512           41626638     22822516     0.54826709762148
>     1024          44440870     22676017     0.510251419470411
> 
> For <=2 batch size, tlb flushes batching shows no big difference but
> slight overhead.  I think that's in an error range of this simple
> micro-benchmark, and therefore can be ignored.  

I would recommend running the experiment multiple times and reporting
the average and standard deviation, which would support your
error-range claim.
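
For instance, a minimal sketch of the summary I have in mind, assuming
you collect one timing per run for each kernel. The summarize() helper
and the sample values are made up for illustration only; they are
placeholders, not measurements.

```python
import statistics


def summarize(samples_ns):
    """Return (mean, sample standard deviation) of per-run timings."""
    return statistics.mean(samples_ns), statistics.stdev(samples_ns)


# Hypothetical per-run timings (arbitrary units), NOT real measurements.
before_runs = [100, 104, 96, 102, 98]
after_runs = [103, 99, 105, 101, 97]

before_mean, before_sd = summarize(before_runs)
after_mean, after_sd = summarize(after_runs)

print(f"before: {before_mean:.1f} +- {before_sd:.1f}")
print(f"after:  {after_mean:.1f} +- {after_sd:.1f}")

# If the gap between the means is smaller than the run-to-run spread,
# the difference is plausibly noise.
within_noise = abs(after_mean - before_mean) <= max(before_sd, after_sd)
print("within noise:", within_noise)
```

If the difference at batch sizes 1 and 2 falls inside the spread like
this, the "error range" statement is backed by data.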

> Starting from batch size
> 4, however, tlb flushes batching shows clear efficiency gain.  The
> efficiency gain tends to be proportional to the batch size, as expected.
> The efficiency gain ranges from about 13 percent with batch size 4, and
> up to 49 percent with batch size 1,024.
> 
> Please note that this is a very simple microbenchmark, so real
> efficiency gain on real workload could be very different.
> 

I think you are running a single-threaded benchmark on an idle machine.
I expect this series to be much more beneficial on a loaded machine and
for multi-threaded applications. There is no need to test that scenario,
but if you have already done so, it would be good to report the results.