From: SeongJae Park <sj@kernel.org>
To: Shakeel Butt
Cc: SeongJae Park, "Liam R. Howlett", Andrew Morton, David Hildenbrand,
    Lorenzo Stoakes, Vlastimil Babka, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH 00/16] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE
Date: Wed, 5 Mar 2025 14:58:02 -0800
Message-Id: <20250305225803.60171-1-sj@kernel.org>
On Wed, 5 Mar 2025 12:22:25 -0800 Shakeel Butt wrote:

> On Wed, Mar 05, 2025 at 10:15:55AM -0800, SeongJae Park wrote:
> > For MADV_DONTNEED[_LOCKED] or MADV_FREE madvise requests, tlb flushes
> > can happen for each vma of the given address ranges.  Because such
> > tlb flushes are for address ranges of the same process, doing those
> > in a batch is more efficient while still being safe.  Modify
> > madvise() and process_madvise() entry level code paths to do such
> > batched tlb flushes, while the internal unmap logic does only
> > gathering of the tlb entries to flush.
> >
> > In more detail, modify the entry functions to initialize an
> > mmu_gather object and pass it to the internal logic.  Also modify the
> > internal logic to do only gathering of the tlb entries to flush into
> > the received mmu_gather object.  After all internal function calls
> > are done, the entry functions finish the mmu_gather object to flush
> > the gathered tlb entries in one batch.
> >
> > Patches Sequence
> > ================
> >
> > The first four patches are minor cleanups of madvise.c for
> > readability.
> >
> > The following four patches (patches 5-8) define a new data structure
> > for managing the information required for batched tlb flushing
> > (mmu_gather and behavior), and update the internal
> > MADV_DONTNEED[_LOCKED] and MADV_FREE handling code paths to receive
> > it.
> >
> > Three patches (patches 9-11) for making the internal
> > MADV_DONTNEED[_LOCKED] and MADV_FREE handling logic ready for batched
> > tlb flushing follow.
>
> I think you forgot to complete the above sentence, or the 'follow' at
> the end seems weird.

Thank you for catching this.  I just wanted to say these three patches
come after the previous ones.  I will wordsmith this part in the next
version.

> > The patches keep support for the unbatched tlb flushes use case, for
> > fine-grained and safe transitions.
> >
> > The next three patches (patches 12-14) update the madvise() and
> > process_madvise() code to do the batched tlb flushes utilizing the
> > changes introduced by the previous patches.
> >
> > The final two patches (patches 15-16) clean up the internal logic's
> > support code for the unbatched tlb flushes use case, which is no
> > longer used.
> >
> > Test Results
> > ============
> >
> > I measured the time to apply MADV_DONTNEED advice to 256 MiB of
> > memory using multiple process_madvise() calls.  I apply the advice at
> > 4 KiB sized region granularity, with varying batch size (vlen) from 1
> > to 1024.  The source code for the measurement is available at
> > GitHub[1].
> >
> > The measurement results are as below.  The 'sz_batches' column shows
> > the batch size of the process_madvise() calls.  The 'before' and
> > 'after' columns are the measured times, in nanoseconds, to apply
> > MADV_DONTNEED to the 256 MiB memory buffer on kernels built without
> > and with the MADV_DONTNEED tlb flushes batching patches of this
> > series, respectively.  For the baseline, the mm-unstable tree of
> > 2025-03-04[2] has been used.  The 'after/before' column is the ratio
> > of 'after' to 'before'.  An 'after/before' value lower than 1.0 means
> > this patch series increased efficiency over the baseline, and a lower
> > value means better efficiency.
>
> I would recommend to replace the after/before column with a
> percentage, i.e., percentage improvement or degradation.

Thank you for the nice suggestion.  I will do so in the next version.
> >
> >     sz_batches   before       after        after/before
> >     1            102842895    106507398    1.03563204828102
> >     2            73364942     74529223     1.01586971880929
> >     4            58823633     51608504     0.877343022998937
> >     8            47532390     44820223     0.942940655834895
> >     16           43591587     36727177     0.842529018271347
> >     32           44207282     33946975     0.767904595446515
> >     64           41832437     26738286     0.639175910310939
> >     128          40278193     23262940     0.577556694263817
> >     256          41568533     22355103     0.537789077136785
> >     512          41626638     22822516     0.54826709762148
> >     1024         44440870     22676017     0.510251419470411
> >
> > For batch sizes <= 2, tlb flushes batching shows no big difference,
> > only a slight overhead.  I think that is within the error range of
> > this simple micro-benchmark, and can therefore be ignored.
>
> I would recommend to run the experiment multiple times and report
> averages and standard deviations, which will support your error range
> claim.

Again, a good suggestion.  I will do so.

> > Starting from batch size 4, however, tlb flushes batching shows a
> > clear efficiency gain.  The efficiency gain tends to be proportional
> > to the batch size, as expected.  It ranges from about 13 percent with
> > batch size 4 up to 49 percent with batch size 1,024.
> >
> > Please note that this is a very simple microbenchmark, so the real
> > efficiency gain on real workloads could be very different.
>
> I think you are running a single-threaded benchmark on a free machine.
> I expect this series to be much more beneficial on a loaded machine
> and for multi-threaded applications.

Your understanding of my test setup is correct, and I agree with your
expectation.

> No need to test that scenario, but if you have already done that then
> it would be good to report.

I don't have such test results, or plans for those with a specific
timeline, for now.  I will share those if I get a chance, of course.

Thanks,
SJ