From: "Darrick J. Wong" <djwong@kernel.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org,
willy@infradead.org, kirill@shutemov.name, bfoster@redhat.com
Subject: Re: [PATCH 07/12] fs: add RWF_UNCACHED iocb and FOP_UNCACHED file_operations flag
Date: Fri, 6 Dec 2024 09:35:39 -0800
Message-ID: <20241206173539.GA7816@frogsfrogsfrogs>
In-Reply-To: <20241203153232.92224-9-axboe@kernel.dk>
On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
> If a file system supports uncached buffered IO, it may set FOP_UNCACHED
> and enable RWF_UNCACHED. If RWF_UNCACHED is attempted without the file
> system supporting it, it'll get errored with -EOPNOTSUPP.
>
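For anyone following along from userspace, here is a rough sketch of how a caller might cope with the -EOPNOTSUPP behaviour described above. It assumes the flag value from the uapi hunk quoted in this patch (0x00000080), and the helper names pwrite_uncached() and pwrite_uncached_selftest() are made up for illustration; none of this is part of the series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: value copied from the uapi hunk in this patch. */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080
#endif

/*
 * Hypothetical helper: try an uncached buffered write first, and retry
 * as a regular buffered write if the kernel or the file system rejects
 * the flag with EOPNOTSUPP.  Returns bytes written or -errno.
 */
static ssize_t pwrite_uncached(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL);

	ret = pwritev2(fd, &iov, 1, off, RWF_UNCACHED);
	if (ret < 0 && errno == EOPNOTSUPP)
		ret = pwritev2(fd, &iov, 1, off, 0);	/* plain buffered write */

	return ret < 0 ? -errno : ret;
}

/* Hypothetical round-trip check on a temporary file; 0 on success. */
static int pwrite_uncached_selftest(void)
{
	char path[] = "/tmp/uncached-XXXXXX";
	const char msg[] = "hello";
	char out[sizeof(msg)] = { 0 };
	int fd = mkstemp(path);
	int ret = 0;

	if (fd < 0)
		return -errno;
	if (pwrite_uncached(fd, msg, sizeof(msg), 0) != (ssize_t)sizeof(msg))
		ret = -EIO;
	if (!ret && pread(fd, out, sizeof(out), 0) != (ssize_t)sizeof(out))
		ret = -EIO;
	if (!ret && memcmp(out, msg, sizeof(msg)) != 0)
		ret = -EIO;
	close(fd);
	unlink(path);
	return ret;
}
```

The fallback means the caller gets the same data on disk either way; the only thing lost without file system support is the cache-dropping behaviour.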
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> include/linux/fs.h | 14 +++++++++++++-
> include/uapi/linux/fs.h | 6 +++++-
> 2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7e29433c5ecc..b64a78582f06 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -322,6 +322,7 @@ struct readahead_control;
> #define IOCB_NOWAIT (__force int) RWF_NOWAIT
> #define IOCB_APPEND (__force int) RWF_APPEND
> #define IOCB_ATOMIC (__force int) RWF_ATOMIC
> +#define IOCB_UNCACHED (__force int) RWF_UNCACHED
>
> /* non-RWF related bits - start at 16 */
> #define IOCB_EVENTFD (1 << 16)
> @@ -356,7 +357,8 @@ struct readahead_control;
> { IOCB_SYNC, "SYNC" }, \
> { IOCB_NOWAIT, "NOWAIT" }, \
> { IOCB_APPEND, "APPEND" }, \
> - { IOCB_ATOMIC, "ATOMIC"}, \
> + { IOCB_ATOMIC, "ATOMIC" }, \
> + { IOCB_UNCACHED, "UNCACHED" }, \
> { IOCB_EVENTFD, "EVENTFD"}, \
> { IOCB_DIRECT, "DIRECT" }, \
> { IOCB_WRITE, "WRITE" }, \
> @@ -2127,6 +2129,8 @@ struct file_operations {
> #define FOP_UNSIGNED_OFFSET ((__force fop_flags_t)(1 << 5))
> /* Supports asynchronous lock callbacks */
> #define FOP_ASYNC_LOCK ((__force fop_flags_t)(1 << 6))
> +/* File system supports uncached read/write buffered IO */
> +#define FOP_UNCACHED ((__force fop_flags_t)(1 << 7))
>
> /* Wrap a directory iterator that needs exclusive inode access */
> int wrap_directory_iterator(struct file *, struct dir_context *,
> @@ -3614,6 +3618,14 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
> if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
> return -EOPNOTSUPP;
> }
> + if (flags & RWF_UNCACHED) {
Should FMODE_NOREUSE imply RWF_UNCACHED? I know, I'm dredging this up
again from v3:
https://lore.kernel.org/linux-fsdevel/ZzKn4OyHXq5r6eiI@dread.disaster.area/
but the manpage for fadvise says NOREUSE means "The specified data will
be accessed only once." and I think that fits what you're doing here.

And yeah, it's annoying that people keep asking for moar knobs to tweak
io operations: Let's have a mount option, and a fadvise mode, and a
fcntl mode, and finally per-io flags! (mostly kidding)

Also, one of your replies referenced a PoC to set UNCACHED on NOREUSE
involving willy and yu. Where was that? I've found this:
https://lore.kernel.org/linux-fsdevel/ZzI97bky3Rwzw18C@casper.infradead.org/
but that turned into a documentation discussion.

There were also a few unanswered questions (imo) from the last few
iterations of this patchset.

If someone issues a lot of small appending uncached writes to a file,
does that mean the writes and writeback will now be lockstepping each
other to write out the folio? Or should programs simply not do that?

What if I wanted to do a bunch of small writes to adjacent bytes,
amortize writeback over a single disk io, and not wait for reclaim to
drop the folio? Admittedly that doesn't really fit with "will be
accessed only once" so I think "don't do that" is an acceptable answer.

And, I guess if the application really wants fine-grained control then
it /can/ still pwrite, sync_file_range, and fadvise(DONTNEED). Though
that's three syscalls/uring ops/whatever. But that might be cheaper
than repeated rewrites.
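
To make the three-call alternative concrete, here is a minimal userspace sketch, not a recommendation. The helper names write_sync_drop() and write_sync_drop_selftest() are hypothetical. Note that sync_file_range() does not write out metadata, so this is not a durability guarantee, and POSIX_FADV_DONTNEED only drops the pages on a best-effort basis.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: buffered write, then write back the range and
 * wait for it, then ask the kernel to drop the clean pages.
 * Returns 0 or -errno.
 */
static int write_sync_drop(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t ret;
	int err;

	assert(buf != NULL);

	/* 1: ordinary buffered write into the page cache */
	ret = pwrite(fd, buf, len, off);
	if (ret < 0)
		return -errno;
	if ((size_t)ret != len)
		return -EIO;	/* short write; a real caller would retry */

	/* 2: start writeback on the range and wait for completion */
	if (sync_file_range(fd, off, len,
			    SYNC_FILE_RANGE_WAIT_BEFORE |
			    SYNC_FILE_RANGE_WRITE |
			    SYNC_FILE_RANGE_WAIT_AFTER) < 0)
		return -errno;

	/* 3: drop the range; posix_fadvise returns the error number itself */
	err = posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
	return err ? -err : 0;
}

/* Hypothetical round-trip check on a temporary file; 0 on success. */
static int write_sync_drop_selftest(void)
{
	char path[] = "/tmp/wsd-XXXXXX";
	const char msg[] = "small write";
	char out[sizeof(msg)] = { 0 };
	int fd = mkstemp(path);
	int ret;

	if (fd < 0)
		return -errno;
	ret = write_sync_drop(fd, msg, sizeof(msg), 0);
	if (!ret && pread(fd, out, sizeof(out), 0) != (ssize_t)sizeof(out))
		ret = -EIO;
	if (!ret && memcmp(out, msg, sizeof(msg)) != 0)
		ret = -EIO;
	close(fd);
	unlink(path);
	return ret;
}
```

An application batching small adjacent writes could do several pwrite() calls first and then run steps 2 and 3 once over the whole range, which is the amortization described above.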
--D
> + /* file system must support it */
> + if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
> + return -EOPNOTSUPP;
> + /* DAX mappings not supported */
> + if (IS_DAX(ki->ki_filp->f_mapping->host))
> + return -EOPNOTSUPP;
> + }
> kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
> if (flags & RWF_SYNC)
> kiocb_flags |= IOCB_DSYNC;
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 753971770733..dc77cd8ae1a3 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -332,9 +332,13 @@ typedef int __bitwise __kernel_rwf_t;
> /* Atomic Write */
> #define RWF_ATOMIC ((__force __kernel_rwf_t)0x00000040)
>
> +/* buffered IO that drops the cache after reading or writing data */
> +#define RWF_UNCACHED ((__force __kernel_rwf_t)0x00000080)
> +
> /* mask of flags supported by the kernel */
> #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> - RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC)
> + RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
> + RWF_UNCACHED)
>
> #define PROCFS_IOCTL_MAGIC 'f'
>
> --
> 2.45.2
>
>