Date: Fri, 6 Dec 2024 09:35:39 -0800
From: "Darrick J. Wong"
To: Jens Axboe
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org, kirill@shutemov.name, bfoster@redhat.com
Subject: Re: [PATCH 07/12] fs: add RWF_UNCACHED iocb and FOP_UNCACHED file_operations flag
Message-ID: <20241206173539.GA7816@frogsfrogsfrogs>
References: <20241203153232.92224-2-axboe@kernel.dk> <20241203153232.92224-9-axboe@kernel.dk>
In-Reply-To: <20241203153232.92224-9-axboe@kernel.dk>

On Tue, Dec 03, 2024 at 08:31:43AM -0700, Jens Axboe wrote:
> If a file system supports uncached buffered IO, it may set FOP_UNCACHED
> and enable RWF_UNCACHED. If RWF_UNCACHED is attempted without the file
> system supporting it, it'll get errored with -EOPNOTSUPP.
>
> Signed-off-by: Jens Axboe
> ---
>  include/linux/fs.h      | 14 +++++++++++++-
>  include/uapi/linux/fs.h |  6 +++++-
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7e29433c5ecc..b64a78582f06 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -322,6 +322,7 @@ struct readahead_control;
>  #define IOCB_NOWAIT		(__force int) RWF_NOWAIT
>  #define IOCB_APPEND		(__force int) RWF_APPEND
>  #define IOCB_ATOMIC		(__force int) RWF_ATOMIC
> +#define IOCB_UNCACHED		(__force int) RWF_UNCACHED
>
>  /* non-RWF related bits - start at 16 */
>  #define IOCB_EVENTFD		(1 << 16)
> @@ -356,7 +357,8 @@ struct readahead_control;
>  	{ IOCB_SYNC,		"SYNC" }, \
>  	{ IOCB_NOWAIT,		"NOWAIT" }, \
>  	{ IOCB_APPEND,		"APPEND" }, \
> -	{ IOCB_ATOMIC,		"ATOMIC"}, \
> +	{ IOCB_ATOMIC,		"ATOMIC" }, \
> +	{ IOCB_UNCACHED,	"UNCACHED" }, \
>  	{ IOCB_EVENTFD,		"EVENTFD"}, \
>  	{ IOCB_DIRECT,		"DIRECT" }, \
>  	{ IOCB_WRITE,		"WRITE" }, \
> @@ -2127,6 +2129,8 @@ struct file_operations {
>  #define FOP_UNSIGNED_OFFSET	((__force fop_flags_t)(1 << 5))
>  /* Supports asynchronous lock callbacks */
>  #define FOP_ASYNC_LOCK		((__force fop_flags_t)(1 << 6))
> +/* File system supports uncached read/write buffered IO */
> +#define FOP_UNCACHED		((__force fop_flags_t)(1 << 7))
>
>  /* Wrap a directory iterator that needs exclusive inode access */
>  int wrap_directory_iterator(struct file *, struct dir_context *,
> @@ -3614,6 +3618,14 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
>  		if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
>  			return -EOPNOTSUPP;
>  	}
> +	if (flags & RWF_UNCACHED) {

Should FMODE_NOREUSE imply RWF_UNCACHED?  I know, I'm dredging this up
again from v3:

https://lore.kernel.org/linux-fsdevel/ZzKn4OyHXq5r6eiI@dread.disaster.area/

but the manpage for fadvise says NOREUSE means "The specified data will
be accessed only once." and I think that fits what you're doing here.
And yeah, it's annoying that people keep asking for moar knobs to tweak
io operations: Let's have a mount option, and a fadvise mode, and a
fcntl mode, and finally per-io flags!  (mostly kidding)

Also, one of your replies referenced a poc to set UNCACHED on NOREUSE
involving willy and yu.  Where was that?  I've found this:

https://lore.kernel.org/linux-fsdevel/ZzI97bky3Rwzw18C@casper.infradead.org/

but that turned into a documentation discussion.

There were also a few unanswered questions (imo) from the last few
iterations of this patchset.  If someone issues a lot of small appending
uncached writes to a file, does that mean the writes and writeback will
now be lockstepping each other to write out the folio?  Or should
programs simply not do that?

What if I wanted to do a bunch of small writes to adjacent bytes,
amortize writeback over a single disk io, and not wait for reclaim to
drop the folio?  Admittedly that doesn't really fit with "will be
accessed only once" so I think "don't do that" is an acceptable answer.

And, I guess if the application really wants fine-grained control then
it /can/ still pwrite, sync_file_range, and fadvise(DONTNEED).
Though that's three syscalls/uring ops/whatever.  But that might be
cheaper than repeated rewrites.

--D

> +		/* file system must support it */
> +		if (!(ki->ki_filp->f_op->fop_flags & FOP_UNCACHED))
> +			return -EOPNOTSUPP;
> +		/* DAX mappings not supported */
> +		if (IS_DAX(ki->ki_filp->f_mapping->host))
> +			return -EOPNOTSUPP;
> +	}
>  	kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
>  	if (flags & RWF_SYNC)
>  		kiocb_flags |= IOCB_DSYNC;
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 753971770733..dc77cd8ae1a3 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -332,9 +332,13 @@ typedef int __bitwise __kernel_rwf_t;
>  /* Atomic Write */
>  #define RWF_ATOMIC	((__force __kernel_rwf_t)0x00000040)
>
> +/* buffered IO that drops the cache after reading or writing data */
> +#define RWF_UNCACHED	((__force __kernel_rwf_t)0x00000080)
> +
>  /* mask of flags supported by the kernel */
>  #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> -			 RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC)
> +			 RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
> +			 RWF_UNCACHED)
>
>  #define PROCFS_IOCTL_MAGIC 'f'
>
> --
> 2.45.2
>
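The fine-grained fallback Darrick mentions (pwrite, then sync_file_range, then fadvise with DONTNEED), applied to his "many small adjacent writes, one writeback" case, might look roughly like the C below. This is a minimal sketch assuming Linux with glibc: pwrite(), sync_file_range() and posix_fadvise() are real calls, but flush_and_drop() and batched_write_then_drop() are hypothetical helper names, and nothing here uses RWF_UNCACHED itself. Waiting for writeback before the fadvise call matters because POSIX_FADV_DONTNEED generally does not drop folios that are still dirty or under writeback.

```c
#define _GNU_SOURCE /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>     /* sync_file_range(), posix_fadvise() */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>    /* pwrite() */

/*
 * Hypothetical helper: write back [off, off + len), wait for it, then ask
 * the kernel to drop the now-clean page cache for that range.
 * Returns 0 on success or a positive errno value on failure.
 */
static int flush_and_drop(int fd, off_t off, off_t len)
{
	/* Wait for in-flight writeback, start writeback of dirty pages in
	 * the range, and wait for it to finish, so the folios are clean
	 * by the time we ask for them to be dropped. */
	if (sync_file_range(fd, off, len,
			    SYNC_FILE_RANGE_WAIT_BEFORE |
			    SYNC_FILE_RANGE_WRITE |
			    SYNC_FILE_RANGE_WAIT_AFTER) < 0)
		return errno;

	/* posix_fadvise() returns the error number directly rather than
	 * returning -1 and setting errno. */
	return posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}

/*
 * Hypothetical helper: several small, adjacent buffered writes, then a
 * single flush and cache drop covering all of them, so that cost is paid
 * once per batch rather than once per write.
 */
static int batched_write_then_drop(int fd, const char *const *chunks,
				   size_t nchunks, off_t start)
{
	off_t off = start;

	assert(chunks != NULL || nchunks == 0);

	for (size_t i = 0; i < nchunks; i++) {
		size_t n = strlen(chunks[i]);
		ssize_t ret = pwrite(fd, chunks[i], n, off);

		if (ret < 0)
			return errno;
		if ((size_t)ret != n)
			return EIO; /* this sketch treats short writes as errors */
		off += ret;
	}

	/* A length of 0 means "to end of file" for both calls in
	 * flush_and_drop(), so skip the flush when nothing was written. */
	if (off == start)
		return 0;
	return flush_and_drop(fd, start, off - start);
}
```

Per batch this costs one pwrite() per chunk plus a single sync_file_range() and a single posix_fadvise(); whether that beats per-write uncached I/O presumably depends on how small and how adjacent the writes are.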