From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F7AAD41D41 for ; Tue, 12 Nov 2024 00:57:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 961376B0095; Mon, 11 Nov 2024 19:57:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9112B6B00E9; Mon, 11 Nov 2024 19:57:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 78B4E6B00F4; Mon, 11 Nov 2024 19:57:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 5A8C66B0095 for ; Mon, 11 Nov 2024 19:57:11 -0500 (EST) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id E29F981BA6 for ; Tue, 12 Nov 2024 00:57:10 +0000 (UTC) X-FDA: 82775627886.12.6D75CD6 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) by imf09.hostedemail.com (Postfix) with ESMTP id 7C50C140005 for ; Tue, 12 Nov 2024 00:56:40 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=xfXTWMtq; spf=pass (imf09.hostedemail.com: domain of david@fromorbit.com designates 209.85.214.171 as permitted sender) smtp.mailfrom=david@fromorbit.com; dmarc=pass (policy=quarantine) header.from=fromorbit.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731372856; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ZUwyokbteUzniHEENjrKpRRTwbWjJZjJmULNQblWcik=; b=hbtG9+AmE49TlF1r5M4AjqJTskz8Qu2u4RBFMmuUp4swh+zgHheKrxbrwIzQmUpbm1uzJ3 n5qIz4YAP4hOXw3kZr3OxjrEwdCfGU/YrykpP4co79skayVB/6w0Fm7Mm4VDCdSihTpesJ o/0XOlwogjLoGJ0N8k4LCXeUBM+GiEI= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=xfXTWMtq; spf=pass (imf09.hostedemail.com: domain of david@fromorbit.com designates 209.85.214.171 as permitted sender) smtp.mailfrom=david@fromorbit.com; dmarc=pass (policy=quarantine) header.from=fromorbit.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731372856; a=rsa-sha256; cv=none; b=opfXtGSFBkxyz/XrqeopATfhcfK2DgJqE7gPceOlnsYqRFoCUg8ydUTiuOm5AIbKV224E9 eQThCrZ5ffYJOtlD9rfCEVoA6dD5iZ7pMpkK5qiI7n91/4e9LAKTDhT+poAFILnXsfq0+D von/Fgf7NzQ97nr/OCQARhqmTzQql9c= Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-20c693b68f5so52307785ad.1 for ; Mon, 11 Nov 2024 16:57:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1731373027; x=1731977827; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=ZUwyokbteUzniHEENjrKpRRTwbWjJZjJmULNQblWcik=; b=xfXTWMtqVCxjRKMtVNkwJVQb+MzF+boDkFRJDU3IYuGKBiT5THSJCtoKkP9o1c5dxk xutYOnZAL39S43lrfknsixwueU5zEb3wRHWoRJ4gqD4s4V8LZHIe0kr2DlxZlHTPYlKm Ovheo1VFQP4SawAYybeydvLOCnIy0CI2vade3qnH0CLkGWbkPMAWUknQo9yqLRGJLAdr oVG9cEkcV6monPYUVk8kIGigQf3a/1H9CoBDGVQ/6SyiwMm4NQPcsfr8VSZMYZGvXF2l 2dMM7hq5nRnCtRM4uD9r7Z4emMnUT8gMlK4VICM+FFr1zabtLHfVPyu7ghqS18AcTipY kHzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731373027; x=1731977827; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ZUwyokbteUzniHEENjrKpRRTwbWjJZjJmULNQblWcik=; b=jj7TKQ1cpzj/MtNaxw0g0iBpKKERdVj4e4W3aW9HLOSLacl887lp42R4XgUKuT64Fw EKYQdZCRFU6xeyVIhEDyHY4SlB210IC+wjtXOn2CYuE0W6ZAq91OAWBmkK8OwXejmdCy nZ7NSJCwARcJsrf0BNhG0rBkpQ4nR69jWNUyYRRtmj1c6Bjno1PmagPCTyr/sGbXOWiB r5Xc28qk1Ujtw8zv1+9wXWqMYsZHbJQPwY8TUwvmjFNmfaFIeD3oAkclbxhO6vWHn/n1 +dInT+BmZgcZIzUcY6FP0b9LXItOWWfPxIIf/EYMAiMQ/uZONJD9+VaS3X4LFkl4+jH9 5G0w== X-Gm-Message-State: AOJu0Ywip1XYHJqQzq0Dov7CYOabi3vEWlDsuXdIFZ58l/r19oJP8lxP salS/jvuAuFc2/Oso01f4cw8EgFXji2fQH+2xYwcpUNGEjYxMWNF4Eef4OUDQ7U= X-Google-Smtp-Source: AGHT+IG6t7qQXVQ3LjFCpVZCH1u4sMB615TBElvGr3S07x/zbJIq4pbcacpDwlnZyNM4UL5MtKIurg== X-Received: by 2002:a17:902:d50c:b0:20b:449c:8978 with SMTP id d9443c01a7336-21183559ba4mr207039635ad.31.1731373027524; Mon, 11 Nov 2024 16:57:07 -0800 (PST) Received: from dread.disaster.area (pa49-186-86-168.pa.vic.optusnet.com.au. [49.186.86.168]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-21177e6a3f4sm81430715ad.234.2024.11.11.16.57.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Nov 2024 16:57:07 -0800 (PST) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1tAfDM-00DQ0B-1Q; Tue, 12 Nov 2024 11:57:04 +1100 Date: Tue, 12 Nov 2024 11:57:04 +1100 From: Dave Chinner To: Jens Axboe Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org, kirill@shutemov.name, linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org Subject: Re: [PATCH 10/16] mm/filemap: make buffered writes work with RWF_UNCACHED Message-ID: References: <20241111234842.2024180-1-axboe@kernel.dk> <20241111234842.2024180-11-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20241111234842.2024180-11-axboe@kernel.dk> X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 7C50C140005 X-Stat-Signature: zabcnnq1hujreyg1eiijus1u418dpi4i X-Rspam-User: X-HE-Tag: 1731373000-399121 X-HE-Meta: U2FsdGVkX18k8auWjxrTdJ8pPoWUGh368qCG+5yVDv46cj7GEFXcf+KL3VGY+89a2VT2Lhq+b9U8ZULqbOtylk/KFEFGzbYYmVVMjAFVC7KaDLwvC4/AYDLudVnuNEJLOZkl9ccZqmCrHRJ1zziV/4NekaRIgLKA8m7kmzrLzsp1+Xl3XtUK5/81zMML09iIfPm+eyLDEHMMJPdeeHgiVTK8RGCs/T5JLjFdp9GVqOFq9ipO/AeRRqEGs9ALgyvEeANJnYCb0notSoEsi34SMuvhQ6jB4rfreIOQcsnapNPQaWDUqvcINCiVBTGJtb1sIRMAsYOTx84k8/Gdk497PMx8UUw7ZiHHFzyPtQ9zTRFNvq1Dvz9ECDwAeKteLVHJ+yi00ELHfHEhKdwBchevpNcZysf9rITwnsm6qGmTauY2mjWkjO2sfmHtie0f15obUWnqdVbWgWhEjWbM3icnXC5PltqFPt9fcp59Mc/N1ZlpVOLV7j8NpzyVW3Bjmuoh8WQzpqHZOAFHQdD1d6tUPjcW/Qce3rjiqOcSlbbp//40E/DpbLTx1r/kQEYfCNpNHrI429tkBBiZOUEtRSMyX+7HeRGhQLN3KqQqmQRRp4cGYO1nvRmViCnnO/51wlIveHoVDUQTQuC+eB3FVPEECEnYk1Tv/af67ctiU+7rN+k2jiNvwEArZ1Bl1aFmyHgWObUglTHwj+E/n4GEAuBdAh0eRsoPZqpPjab+EoUvbTtUQlAF+XglCu1yTECOUI3twBuENUmbDRrEYMMeYWImBaJpgkrBiQr83c6Rm/6p2y35LOln649VySk7xKM2c+U3jRSfIkiC8sCxaoQlfFVHKAyHmc4qnGBoVfSk3ZahOUkDzsD1bhbWYEK9sEk8HWM9wsttkO8ewWm/6lmhrBSGhxo7/mekSm8ZGJUF4o0V1ELVCl5UooTTEQXYACu5tJ6Ghi1b8iD6lvTmbtwg82s gq0TMzJm QEo+kicxCBqBsXPvBPVyAWWilFaW6EN993PX+uc5lil1YAtptsktYJxjYTZzP5ql3xT4SZFF6aDSqD5Q0CEw9j3I2LMC39XWT26Wavl+g9uLVYvZueD59VkYZXgeYWUjs4MRZmzEOp6ij94gtQAqNNHMT4aBQa400FCJFMxOpsxJIXhAhbzvpNxf5jJ5NyaFV8eylgrrqOtWeuo02flHoTMRYt5lhGrUkWhTGEQr8+b9lb50FwhcjkhWfTaKi6p/QU2hFcpHxzxaTq26XszvBqju+1BtlIIZBSFpND3aIUEvw6oqc0dGAC4FkyD+TQDTVNYALhfjInMsX4PiPTxE7yz5eG6oNpw4w0ET00F1dphRKNOufTIa56LDosef9EwMwbim+XfwTTQ6ba8F8LjMaJjGRonfUXe9DjO5D58s8vgnFYEYJTsczYxwNVlU7A5FtB53w0bN40d972eOqORjdYZPYXA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Nov 11, 2024 at 04:37:37PM -0700, Jens Axboe wrote: > If RWF_UNCACHED is set for a write, mark new folios being written with > uncached. This is done by passing in the fact that it's an uncached write > through the folio pointer. We can only get there when IOCB_UNCACHED was > allowed, which can only happen if the file system opts in. Opting in means > they need to check for the LSB in the folio pointer to know if it's an > uncached write or not. If it is, then FGP_UNCACHED should be used if > creating new folios is necessary. > > Uncached writes will drop any folios they create upon writeback > completion, but leave folios that may exist in that range alone. Since > ->write_begin() doesn't currently take any flags, and to avoid needing > to change the callback kernel wide, use the foliop being passed in to > ->write_begin() to signal if this is an uncached write or not. File > systems can then use that to mark newly created folios as uncached. > > Add a helper, generic_uncached_write(), that generic_file_write_iter() > calls upon successful completion of an uncached write. This doesn't implement an "uncached" write operation. This implements a cache write-through operation. We've actually been talking about this for some time as a desirable general buffered write trait on fast SSDs. Excessive write-behind caching is a real problem in general, especially when doing streaming sequential writes to pcie 4 and 5 nvme SSDs that can do more than 7GB/s to disk. When the page cache fills up, we see all the same problems you are trying to work around in this series with "uncached" writes. IOWS, what we really want is page cache write-through as an automatic feature for buffered writes. > @@ -70,6 +71,34 @@ static inline int filemap_write_and_wait(struct address_space *mapping) > return filemap_write_and_wait_range(mapping, 0, LLONG_MAX); > } > > +/* > + * generic_uncached_write - start uncached writeback > + * @iocb: the iocb that was written > + * @written: the amount of bytes written > + * > + * When writeback has been handled by write_iter, this helper should be called > + * if the file system supports uncached writes. If %IOCB_UNCACHED is set, it > + * will kick off writeback for the specified range. > + */ > +static inline void generic_uncached_write(struct kiocb *iocb, ssize_t written) > +{ > + if (iocb->ki_flags & IOCB_UNCACHED) { > + struct address_space *mapping = iocb->ki_filp->f_mapping; > + > + /* kick off uncached writeback */ > + __filemap_fdatawrite_range(mapping, iocb->ki_pos, > + iocb->ki_pos + written, WB_SYNC_NONE); > + } > +} Yup, this is basically write-through. > + > +/* > + * Value passed in to ->write_begin() if IOCB_UNCACHED is set for the write, > + * and the ->write_begin() handler on a file system supporting FOP_UNCACHED > + * must check for this and pass FGP_UNCACHED for folio creation. > + */ > +#define foliop_uncached ((struct folio *) 0xfee1c001) > +#define foliop_is_uncached(foliop) (*(foliop) == foliop_uncached) > + > /** > * filemap_set_wb_err - set a writeback error on an address_space > * @mapping: mapping in which to set writeback error > diff --git a/mm/filemap.c b/mm/filemap.c > index 40debe742abe..0d312de4e20c 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -430,6 +430,7 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, > > return filemap_fdatawrite_wbc(mapping, &wbc); > } > +EXPORT_SYMBOL_GPL(__filemap_fdatawrite_range); > > static inline int __filemap_fdatawrite(struct address_space *mapping, > int sync_mode) > @@ -4076,7 +4077,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) > ssize_t written = 0; > > do { > - struct folio *folio; > + struct folio *folio = NULL; > size_t offset; /* Offset into folio */ > size_t bytes; /* Bytes to write to folio */ > size_t copied; /* Bytes copied from user */ > @@ -4104,6 +4105,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) > break; > } > > + /* > + * If IOCB_UNCACHED is set here, we now the file system > + * supports it. And hence it'll know to check folip for being > + * set to this magic value. If so, it's an uncached write. > + * Whenever ->write_begin() changes prototypes again, this > + * can go away and just pass iocb or iocb flags. > + */ > + if (iocb->ki_flags & IOCB_UNCACHED) > + folio = foliop_uncached; > + > status = a_ops->write_begin(file, mapping, pos, bytes, > &folio, &fsdata); > if (unlikely(status < 0)) > @@ -4234,8 +4245,10 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > ret = __generic_file_write_iter(iocb, from); > inode_unlock(inode); > > - if (ret > 0) > + if (ret > 0) { > + generic_uncached_write(iocb, ret); > ret = generic_write_sync(iocb, ret); Why isn't the writethrough check inside generic_write_sync()? Having to add it to every filesystem that supports write-through is unwieldy. If the IO is DSYNC or SYNC, we're going to run WB_SYNC_ALL writeback through the generic_write_sync() path already, so the only time we actually want to run WB_SYNC_NONE write-through here is if the iocb is not marked as dsync. Hence I think this write-through check should be done conditionally inside generic_write_sync(), not in addition to the writeback generic_write_sync() might need to do... That also gives us a common place for adding cache write-through trigger logic (think writebehind trigger logic similar to readahead) and this is also a place where we could automatically tag mapping ranges for reclaim on writeback completion.... -Dave. -- Dave Chinner david@fromorbit.com