From: Shakeel Butt
To: SeongJae Park
Cc: "Liam R. Howlett", Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Date: Wed, 5 Mar 2025 12:22:25 -0800
Subject: Re: [RFC PATCH 00/16] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE
In-Reply-To: <20250305181611.54484-1-sj@kernel.org>

On Wed, Mar 05, 2025 at 10:15:55AM -0800, SeongJae Park wrote:
> For MADV_DONTNEED[_LOCKED] or MADV_FREE madvise requests, tlb flushes
> can happen for each vma of the given address ranges. Because such tlb
> flushes are for address ranges of same process, doing those in a batch
> is more efficient while still being safe. Modify madvise() and
> process_madvise() entry level code path to do such batched tlb flushes,
> while the internal unmap logics do only gathering of the tlb entries to
> flush.
>
> In more detail, modify the entry functions to initialize an mmu_gather
> object and pass it to the internal logics. Also modify the internal
> logics to do only gathering of the tlb entries to flush into the
> received mmu_gather object. After all internal function calls are done,
> the entry functions finish the mmu_gather object to flush the gathered
> tlb entries in the one batch.
>
> Patches Sequence
> ================
>
> First four patches are minor cleanups of madvise.c for readability.
>
> Following four patches (patches 5-8) define new data structure for
> managing information that required for batched tlb flushing (mmu_gather
> and behavior), and update code paths for MADV_DONTNEED[_LOCKED] and
> MADV_FREE handling internal logics to receive it.
>
> Three patches (patches 9-11) for making internal MADV_DONTNEED[_LOCKED]
> and MADV_FREE handling logic ready for batched tlb flushing follow.

I think you forgot to complete the above sentence, or the 'follow' at the
end seems out of place.

> The
> patches keep the support of unbatched tlb flushes use case, for
> fine-grained and safe transitions.
>
> Next three patches (patches 12-14) update madvise() and
> process_madvise() code to do the batched tlb flushes utilizing the
> previous patches introduced changes.
>
> Final two patches (patches 15-16) clean up the internal logics'
> unbatched tlb flushes use case support code, which is no more be used.
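
To make the batching above concrete, here is a rough userspace model in
Python. It is not kernel code: ModelGather, flushes_unbatched() and
flushes_batched() are hypothetical names, and the model only counts
simulated flushes. It assumes each 4 KiB region lies in a single vma with
populated pages (so every gather/finish cycle issues exactly one flush),
and that the series ends up with one flush per process_madvise() call.

```python
# Hypothetical userspace model of the mmu_gather batching pattern from the
# cover letter. It only counts simulated flushes; it is not kernel code.


class ModelGather:
    """Stand-in for struct mmu_gather: tracks gathered ranges and flushes."""

    def __init__(self):
        self.pending = 0  # ranges gathered since the last flush
        self.flushes = 0  # simulated TLB flushes issued so far

    def gather(self):
        # Internal unmap logic only records the range to flush later.
        self.pending += 1

    def finish(self):
        # One flush covers everything gathered since the last flush.
        if self.pending:
            self.flushes += 1
            self.pending = 0


def flushes_unbatched(nr_ranges):
    """Unbatched case in this model: one gather/finish cycle per range."""
    total = 0
    for _ in range(nr_ranges):
        tlb = ModelGather()
        tlb.gather()
        tlb.finish()
        total += tlb.flushes
    return total


def flushes_batched(nr_ranges, vlen):
    """Batched case in this model: one cycle per call of vlen ranges."""
    total = 0
    for start in range(0, nr_ranges, vlen):
        tlb = ModelGather()  # initialized once at the entry point
        for _ in range(start, min(start + vlen, nr_ranges)):
            tlb.gather()
        tlb.finish()  # single flush for the whole call
        total += tlb.flushes
    return total


# 256 MiB advised in 4 KiB regions, matching the test results setup.
NR_RANGES = (256 << 20) // (4 << 10)  # 65536 regions

unbatched = flushes_unbatched(NR_RANGES)
for vlen in (1, 4, 1024):
    print(f"vlen={vlen:>4}: unbatched={unbatched} "
          f"batched={flushes_batched(NR_RANGES, vlen)}")
```

Under these assumptions the flush count drops by a factor of vlen (65536
flushes vs. 64 at vlen 1024), while the measured times in the test
results drop by only about half, which suggests the flushes are only part
of the per-call cost.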
>
> Test Results
> ============
>
> I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
> using multiple process_madvise() calls. I apply the advice in 4 KiB
> sized regions granularity, but with varying batch size (vlen) from 1 to
> 1024. The source code for the measurement is available at GitHub[1].
>
> The measurement results are as below. 'sz_batches' column shows the
> batch size of process_madvise() calls. 'before' and 'after' columns are
> the measured time to apply MADV_DONTNEED to the 256 MiB memory buffer in
> nanoseconds, on kernels that built without and with the MADV_DONTNEED
> tlb flushes batching patch of this series, respectively. For the
> baseline, mm-unstable tree of 2025-03-04[2] has been used.
> 'after/before' column is the ratio of 'after' to 'before'. So
> 'after/before' value lower than 1.0 means this patch increased
> efficiency over the baseline. And lower value means better efficiency.

I would recommend replacing the after/before column with a percentage,
i.e. the percentage improvement or degradation.

>
>     sz_batches  before     after      after/before
>     1           102842895  106507398  1.03563204828102
>     2           73364942   74529223   1.01586971880929
>     4           58823633   51608504   0.877343022998937
>     8           47532390   44820223   0.942940655834895
>     16          43591587   36727177   0.842529018271347
>     32          44207282   33946975   0.767904595446515
>     64          41832437   26738286   0.639175910310939
>     128         40278193   23262940   0.577556694263817
>     256         41568533   22355103   0.537789077136785
>     512         41626638   22822516   0.54826709762148
>     1024        44440870   22676017   0.510251419470411
>
> For <=2 batch size, tlb flushes batching shows no big difference but
> slight overhead. I think that's in an error range of this simple
> micro-benchmark, and therefore can be ignored.

I would recommend running the experiment multiple times and reporting the
average and standard deviation, which would support your error range
claim.

> Starting from batch size
> 4, however, tlb flushes batching shows clear efficiency gain. The
> efficiency gain tends to be proportional to the batch size, as expected.
> The efficiency gain ranges from about 13 percent with batch size 4, and
> up to 49 percent with batch size 1,024.
>
> Please note that this is a very simple microbenchmark, so real
> efficiency gain on real workload could be very different.
>

I think you are running a single-threaded benchmark on an otherwise idle
machine. I expect this series to be much more beneficial on a loaded
machine and for multi-threaded applications. There is no need to test
that scenario, but if you have already done so, it would be good to
report the results.
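
For the two reporting suggestions above (a percentage column instead of
the after/before ratio, and mean plus standard deviation over repeated
runs), a minimal Python sketch might look like the following. RESULTS
reuses the single-run numbers from the table; pct_change() and
summarize() are hypothetical helper names, and summarize() needs at least
two runs per batch size, which the current data does not have.

```python
import statistics

# Single-run times in nanoseconds from the table: sz_batches -> (before, after).
RESULTS = {
    1: (102842895, 106507398),
    2: (73364942, 74529223),
    4: (58823633, 51608504),
    8: (47532390, 44820223),
    16: (43591587, 36727177),
    32: (44207282, 33946975),
    64: (41832437, 26738286),
    128: (40278193, 23262940),
    256: (41568533, 22355103),
    512: (41626638, 22822516),
    1024: (44440870, 22676017),
}


def pct_change(before, after):
    """Percent change of 'after' relative to 'before'; negative means faster."""
    return (after - before) / before * 100.0


def summarize(runs):
    """Mean and sample standard deviation of repeated runs (needs >= 2 runs)."""
    return statistics.mean(runs), statistics.stdev(runs)


print(f"{'sz_batches':>10} {'before':>10} {'after':>10} {'change':>8}")
for sz, (before, after) in RESULTS.items():
    print(f"{sz:>10} {before:>10} {after:>10} {pct_change(before, after):>+7.1f}%")
```

With repeated runs, each 'before' and 'after' cell would become
summarize(runs) for that batch size, reported as mean and standard
deviation, which makes it easier to judge whether the +3.6% at batch size
1 is within noise.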