linux-mm.kvack.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org,
	willy@infradead.org, kirill@shutemov.name, bfoster@redhat.com
Subject: Re: [PATCHSET v8 0/12] Uncached buffered IO
Date: Mon, 13 Jan 2025 08:34:18 -0700
Message-ID: <3cba2c9e-4136-4199-84a6-ddd6ad302875@kernel.dk>
In-Reply-To: <20250107193532.f8518eb71a469b023b6a9220@linux-foundation.org>

(sorry missed this reply!)

On 1/7/25 8:35 PM, Andrew Morton wrote:
> On Fri, 20 Dec 2024 08:47:38 -0700 Jens Axboe <axboe@kernel.dk> wrote:
> 
>> So here's a new approach to the same concept, but using the page cache
>> as synchronization. Due to excessive bike shedding on the naming, this
>> is now named RWF_DONTCACHE, and is less special in that it's just page
>> cache IO, except it prunes the ranges once IO is completed.
>>
>> Why do this, you may ask? The tldr is that device speeds are only
>> getting faster, while reclaim is not. Doing normal buffered IO can be
>> very unpredictable, and suck up a lot of resources on the reclaim side.
>> This leads people to use O_DIRECT as a work-around, which has its own
>> set of restrictions in terms of size, offset, and length of IO. It's
>> also inherently synchronous, and now you need async IO as well. While
>> the latter isn't necessarily a big problem as we have good options
>> available there, it also should not be a requirement when all you want
>> to do is read or write some data without caching.
> 
> Of course, we're doing something here which userspace could itself do:
> drop the pagecache after reading it (with appropriate chunk sizing) and
> for writes, sync the written area then invalidate it.  Possible
> added benefits from using separate threads for this.
> 
> I suggest that diligence requires that we at least justify an in-kernel
> approach at this time, please.

Conceptually yes. But you'd end up doing extra work to do it. Some of
that is not so expensive, like system calls, and some is more so, like
LRU manipulation. Outside of that, I do think it makes sense to expose
it as a generic thing, rather than requiring applications to kick
writeback manually, reclaim manually, etc.
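
For illustration, the userspace approach under discussion could look
roughly like the sketch below. It is not taken from the patchset. The
helper names and the 1 MiB chunk size are made up for the example, and
fdatasync() stands in for a ranged writeback call, since Python's os
module has no sync_file_range() wrapper.

```python
import os

CHUNK = 1 << 20  # 1 MiB, an arbitrary chunk size chosen for illustration


def read_drop_behind(path, chunk=CHUNK):
    """Buffered read that asks the kernel to drop each chunk afterwards."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            data = os.pread(fd, chunk, offset)
            if not data:
                break
            # Advisory only. This targets every clean page in the range,
            # including pages that were cached before this read.
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
            total += len(data)
    finally:
        os.close(fd)
    return total


def write_drop_behind(path, payload, chunk=CHUNK):
    """Buffered write that flushes, then invalidates, each chunk."""
    view = memoryview(payload)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while offset < len(view):
            n = os.pwrite(fd, view[offset:offset + chunk], offset)
            # Dirty pages cannot be dropped, so write them back first.
            # fdatasync() flushes the whole file, not just this range.
            os.fdatasync(fd)
            os.posix_fadvise(fd, offset, n, os.POSIX_FADV_DONTNEED)
            offset += n
    finally:
        os.close(fd)
    return offset
```

Each chunk costs at least one extra system call on the read side and
two on the write side, on top of the page cache work the kernel still
does for every page.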

> And there's a possible middle-ground implementation where the kernel
> itself kicks off threads to do the drop-behind just before the read or
> write syscall returns, which will probably be simpler.  Can we please
> describe why this also isn't acceptable?

That's more of an implementation detail. I didn't test anything like
that, though we surely could. If it's better, there's no reason why it
can't just be changed to do that. My gut tells me you want the task/CPU
that just did the page cache additions to do the pruning too; that should
be more efficient than having a kworker or similar do it.

> Also, it seems wrong for a read(RWF_DONTCACHE) to drop cache if it was
> already present.  Because it was presumably present for a reason.  Does
> this implementation already take care of this?  To make an application
> which does read(/etc/passwd, RWF_DONTCACHE) less annoying?

The implementation doesn't drop pages that were already present, only
pages that got created/added to the page cache for the operation. So
that part should already work as you expect.

> Also, consuming a new page flag isn't a minor thing.  It would be nice
> to see some justification around this, and some description of how many
> we have left.

For sure, though various discussions on this already occurred and Kirill
posted patches for unifying some of this already. It's not something I
wanted to tackle, as I think that should be left to people more familiar
with the page/folio flags and their (sometimes odd) interactions.

-- 
Jens Axboe


Thread overview: 29+ messages
2024-12-20 15:47 Jens Axboe
2024-12-20 15:47 ` [PATCH 01/12] mm/filemap: change filemap_create_folio() to take a struct kiocb Jens Axboe
2024-12-20 16:11   ` Matthew Wilcox
2024-12-20 15:47 ` [PATCH 02/12] mm/filemap: use page_cache_sync_ra() to kick off read-ahead Jens Axboe
2024-12-20 16:12   ` Matthew Wilcox
2024-12-20 15:47 ` [PATCH 03/12] mm/readahead: add folio allocation helper Jens Axboe
2024-12-20 16:12   ` Matthew Wilcox
2024-12-20 15:47 ` [PATCH 04/12] mm: add PG_dropbehind folio flag Jens Axboe
2024-12-20 15:47 ` [PATCH 05/12] mm/readahead: add readahead_control->dropbehind member Jens Axboe
2024-12-20 15:47 ` [PATCH 06/12] mm/truncate: add folio_unmap_invalidate() helper Jens Axboe
2024-12-20 16:21   ` Matthew Wilcox
2024-12-20 16:28     ` Jens Axboe
2025-01-02 20:12       ` Jens Axboe
2024-12-20 15:47 ` [PATCH 07/12] fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag Jens Axboe
2025-01-04  8:39   ` (subset) " Christian Brauner
2025-01-06 15:44     ` Jens Axboe
2024-12-20 15:47 ` [PATCH 08/12] mm/filemap: add read support for RWF_DONTCACHE Jens Axboe
2024-12-20 15:47 ` [PATCH 09/12] mm/filemap: drop streaming/uncached pages when writeback completes Jens Axboe
2025-01-18  3:29   ` Jingbo Xu
2025-03-04  3:12   ` Ritesh Harjani
2024-12-20 15:47 ` [PATCH 10/12] mm/filemap: add filemap_fdatawrite_range_kick() helper Jens Axboe
2025-01-18  3:25   ` Jingbo Xu
2024-12-20 15:47 ` [PATCH 11/12] mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue Jens Axboe
2024-12-20 15:47 ` [PATCH 12/12] mm: add FGP_DONTCACHE folio creation flag Jens Axboe
2025-01-08  3:35 ` [PATCHSET v8 0/12] Uncached buffered IO Andrew Morton
2025-01-13 15:34   ` Jens Axboe [this message]
2025-01-14  0:46     ` Andrew Morton
2025-01-14  0:56       ` Jens Axboe
2025-01-16 10:06       ` Kirill A. Shutemov
