Date: Mon, 11 Nov 2024 11:15:16 +0200
From: "Kirill A. Shutemov"
To: Jens Axboe
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org,
	clm@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org
Subject: Re: [PATCH 08/15] mm/filemap: add read support for RWF_UNCACHED
References: <20241110152906.1747545-1-axboe@kernel.dk>
	<20241110152906.1747545-9-axboe@kernel.dk>
In-Reply-To: <20241110152906.1747545-9-axboe@kernel.dk>

On Sun, Nov 10, 2024 at 08:28:00AM -0700, Jens Axboe wrote:
> Add RWF_UNCACHED as a read operation flag, which means that any data
> read will be removed from the page cache upon completion. Uses the page
> cache to synchronize, and simply prunes folios that were instantiated
> when the operation completes. While it would be possible to use private
> pages for this, using the page cache as synchronization is handy for a
> variety of reasons:
>
> 1) No special truncate magic is needed
> 2) Async buffered reads need some place to serialize, using the page
>    cache is a lot easier than writing extra code for this
> 3) The pruning cost is pretty reasonable
>
> and the code to support this is much simpler as a result.
>
> You can think of uncached buffered IO as being the much more attractive
> cousin of O_DIRECT - it has none of the restrictions of O_DIRECT. Yes,
> it will copy the data, but unlike regular buffered IO, it doesn't run
> into the unpredictability of the page cache in terms of reclaim.
> As an example, on a test box with 32 drives, reading them with buffered
> IO looks as follows:
>
> Reading bs 65536, uncached 0
>   1s: 145945MB/sec
>   2s: 158067MB/sec
>   3s: 157007MB/sec
>   4s: 148622MB/sec
>   5s: 118824MB/sec
>   6s: 70494MB/sec
>   7s: 41754MB/sec
>   8s: 90811MB/sec
>   9s: 92204MB/sec
>  10s: 95178MB/sec
>  11s: 95488MB/sec
>  12s: 95552MB/sec
>  13s: 96275MB/sec
>
> where it's quite easy to see where the page cache filled up, and
> performance went from good to erratic, and finally settles at a much
> lower rate. Looking at top while this is ongoing, we see:
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  7535 root      20   0  267004      0      0 S  3199   0.0   8:40.65 uncached
>  3326 root      20   0       0      0      0 R 100.0   0.0   0:16.40 kswapd4
>  3327 root      20   0       0      0      0 R 100.0   0.0   0:17.22 kswapd5
>  3328 root      20   0       0      0      0 R 100.0   0.0   0:13.29 kswapd6
>  3332 root      20   0       0      0      0 R 100.0   0.0   0:11.11 kswapd10
>  3339 root      20   0       0      0      0 R 100.0   0.0   0:16.25 kswapd17
>  3348 root      20   0       0      0      0 R 100.0   0.0   0:16.40 kswapd26
>  3343 root      20   0       0      0      0 R 100.0   0.0   0:16.30 kswapd21
>  3344 root      20   0       0      0      0 R 100.0   0.0   0:11.92 kswapd22
>  3349 root      20   0       0      0      0 R 100.0   0.0   0:16.28 kswapd27
>  3352 root      20   0       0      0      0 R  99.7   0.0   0:11.89 kswapd30
>  3353 root      20   0       0      0      0 R  96.7   0.0   0:16.04 kswapd31
>  3329 root      20   0       0      0      0 R  96.4   0.0   0:11.41 kswapd7
>  3345 root      20   0       0      0      0 R  96.4   0.0   0:13.40 kswapd23
>  3330 root      20   0       0      0      0 S  91.1   0.0   0:08.28 kswapd8
>  3350 root      20   0       0      0      0 S  86.8   0.0   0:11.13 kswapd28
>  3325 root      20   0       0      0      0 S  76.3   0.0   0:07.43 kswapd3
>  3341 root      20   0       0      0      0 S  74.7   0.0   0:08.85 kswapd19
>  3334 root      20   0       0      0      0 S  71.7   0.0   0:10.04 kswapd12
>  3351 root      20   0       0      0      0 R  60.5   0.0   0:09.59 kswapd29
>  3323 root      20   0       0      0      0 R  57.6   0.0   0:11.50 kswapd1
> [...]
>
> which is just showing a partial list of the 32 kswapd threads that are
> running mostly full tilt, burning ~28 full CPU cores.
>
> If the same test case is run with RWF_UNCACHED set for the buffered read,
> the output looks as follows:
>
> Reading bs 65536, uncached 0
>   1s: 153144MB/sec
>   2s: 156760MB/sec
>   3s: 158110MB/sec
>   4s: 158009MB/sec
>   5s: 158043MB/sec
>   6s: 157638MB/sec
>   7s: 157999MB/sec
>   8s: 158024MB/sec
>   9s: 157764MB/sec
>  10s: 157477MB/sec
>  11s: 157417MB/sec
>  12s: 157455MB/sec
>  13s: 157233MB/sec
>  14s: 156692MB/sec
>
> which is just chugging along at ~155GB/sec of read performance. Looking
> at top, we see:
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  7961 root      20   0  267004      0      0 S  3180   0.0   5:37.95 uncached
>  8024 axboe     20   0   14292   4096      0 R   1.0   0.0   0:00.13 top
>
> where just the test app is using CPU, no reclaim is taking place outside
> of the main thread. Not only is performance 65% better, it's also using
> half the CPU to do it.
>
> Signed-off-by: Jens Axboe
> ---
>  mm/filemap.c | 18 ++++++++++++++++--
>  mm/swap.c    |  2 ++
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 38dc94b761b7..bd698340ef24 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2474,6 +2474,8 @@ static int filemap_create_folio(struct kiocb *iocb,
>  	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), min_order);
>  	if (!folio)
>  		return -ENOMEM;
> +	if (iocb->ki_flags & IOCB_UNCACHED)
> +		__folio_set_uncached(folio);
>
>  	/*
>  	 * Protect against truncate / hole punch. Grabbing invalidate_lock
> @@ -2519,6 +2521,8 @@ static int filemap_readahead(struct kiocb *iocb, struct file *file,
>
>  	if (iocb->ki_flags & IOCB_NOIO)
>  		return -EAGAIN;
> +	if (iocb->ki_flags & IOCB_UNCACHED)
> +		ractl.uncached = 1;
>  	page_cache_async_ra(&ractl, folio, last_index - folio->index);
>  	return 0;
>  }
> @@ -2548,6 +2552,8 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  			return -EAGAIN;
>  		if (iocb->ki_flags & IOCB_NOWAIT)
>  			flags = memalloc_noio_save();
> +		if (iocb->ki_flags & IOCB_UNCACHED)
> +			ractl.uncached = 1;
>  		page_cache_sync_ra(&ractl, last_index - index);
>  		if (iocb->ki_flags & IOCB_NOWAIT)
>  			memalloc_noio_restore(flags);
> @@ -2706,8 +2712,16 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>  			}
>  		}
>  put_folios:
> -		for (i = 0; i < folio_batch_count(&fbatch); i++)
> -			folio_put(fbatch.folios[i]);
> +		for (i = 0; i < folio_batch_count(&fbatch); i++) {
> +			struct folio *folio = fbatch.folios[i];
> +
> +			if (folio_test_uncached(folio)) {
> +				folio_lock(folio);
> +				invalidate_complete_folio2(mapping, folio, 0);
> +				folio_unlock(folio);

I am not sure this is safe. What happens if it races with a page fault?

The only current caller of invalidate_complete_folio2() unmaps the folio
explicitly before calling it, and the folio lock prevents re-faulting.

I think we need to give up PG_uncached if we see folio_mapped(), and maybe
also mark the page accessed.
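
The check suggested above can be sketched as a small userspace model. This
is only an illustration under stated assumptions: `struct model_folio` and
`model_drop_uncached()` are hypothetical stand-ins invented here, not the
kernel's `struct folio` or any kernel API, and the model ignores refcounts
and real locking. It shows only the decision: a mapped folio loses its
uncached marking and stays in the cache, an unmapped one is dropped.

```c
/*
 * Userspace model only. All names below are hypothetical stand-ins and
 * do not correspond to kernel structures or functions.
 */
#include <assert.h>
#include <stdbool.h>

struct model_folio {
	bool uncached;   /* stands in for PG_uncached */
	bool locked;     /* stands in for the folio lock */
	bool referenced; /* stands in for "marked accessed" */
	bool in_cache;   /* still present in the modelled page cache */
	int mapcount;    /* > 0 means some page table maps the folio */
};

/* Returns true if the folio was dropped from the modelled page cache. */
static bool model_drop_uncached(struct model_folio *folio)
{
	bool dropped = false;

	if (!folio->uncached)
		return false;

	/* The caller must not already hold the lock in this model. */
	assert(!folio->locked);
	folio->locked = true;	/* in the model, this blocks new faults */

	if (folio->mapcount > 0) {
		/* Someone faulted it in: give up on uncached and keep it. */
		folio->uncached = false;
		folio->referenced = true;
	} else {
		/* Nobody maps it: safe to remove it from the cache. */
		folio->in_cache = false;
		dropped = true;
	}

	folio->locked = false;
	return dropped;
}
```

In the real loop the mapped check would have to happen under the folio
lock, before invalidate_complete_folio2(), for the same reason the model
takes the lock first.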

> +			}
> +			folio_put(folio);
> +		}
>  		folio_batch_init(&fbatch);
>  	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 835bdf324b76..f2457acae383 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -472,6 +472,8 @@ static void folio_inc_refs(struct folio *folio)
>   */
>  void folio_mark_accessed(struct folio *folio)
>  {
> +	if (folio_test_uncached(folio))
> +		return;

	if (folio_test_uncached(folio)) {
		if (folio_mapped(folio))
			folio_clear_uncached(folio);
		else
			return;
	}

>  	if (lru_gen_enabled()) {
>  		folio_inc_refs(folio);
>  		return;
> --
> 2.45.2
>

--
  Kiryl Shutsemau / Kirill A. Shutemov