From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brian Foster <bfoster@redhat.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 3/7] iomap: optional zero range dirty folio processing
Date: Thu, 5 Jun 2025 13:33:53 -0400
Message-ID: <20250605173357.579720-4-bfoster@redhat.com>
In-Reply-To: <20250605173357.579720-1-bfoster@redhat.com>
References: <20250605173357.579720-1-bfoster@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

The only way zero range can currently process unwritten mappings with
dirty pagecache is to check whether the range is dirty before mapping
lookup and then flush when at least one underlying mapping is
unwritten. This ordering is required to prevent iomap lookup from
racing with folio writeback and reclaim.
Since zero range can skip ranges of unwritten mappings that are clean
in cache, this operation can be improved by allowing the filesystem to
provide a set of dirty folios that require zeroing. In turn, rather
than flush or iterate file offsets, zero range can iterate on folios
in the batch and advance over clean or uncached ranges in between.

Add a folio_batch in struct iomap_iter and provide a helper for
filesystems to populate the batch at lookup time. Update the folio
lookup path to return the next folio in the batch, if provided, and
advance the iter if the folio starts beyond the current offset.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 73 +++++++++++++++++++++++++++++++++++++++---
 fs/iomap/iter.c        |  6 ++++
 include/linux/iomap.h  |  4 +++
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 16499655e7b0..cf2f4f869920 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -750,6 +750,16 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter, size_t len)
 	if (!mapping_large_folio_support(iter->inode->i_mapping))
 		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
 
+	if (iter->fbatch) {
+		struct folio *folio = folio_batch_next(iter->fbatch);
+
+		if (folio) {
+			folio_get(folio);
+			folio_lock(folio);
+		}
+		return folio;
+	}
+
 	if (folio_ops && folio_ops->get_folio)
 		return folio_ops->get_folio(iter, pos, len);
 	else
@@ -811,6 +821,8 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
 	int status = 0;
 
 	len = min_not_zero(len, *plen);
+	*foliop = NULL;
+	*plen = 0;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -819,6 +831,12 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
 
+	/* no folio means we're done with a batch */
+	if (!folio) {
+		WARN_ON_ONCE(!iter->fbatch);
+		return 0;
+	}
+
 	/*
 	 * Now we have a locked folio, before we do anything with it we need to
 	 * check that the iomap we have cached is not stale. The inode extent
@@ -839,6 +857,20 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
 		}
 	}
 
+	/*
+	 * folios in a batch may not be contiguous. If we've skipped forward,
+	 * advance the iter to the pos of the current folio. If the folio starts
+	 * beyond the end of the mapping, it may have been trimmed since the
+	 * lookup for whatever reason. Return a NULL folio to terminate the op.
+	 */
+	if (folio_pos(folio) > iter->pos) {
+		len = min_t(u64, folio_pos(folio) - iter->pos,
+			    iomap_length(iter));
+		status = iomap_iter_advance(iter, &len);
+		if (status || !len)
+			goto out_unlock;
+	}
+
 	pos = iomap_trim_folio_range(iter, folio, poffset, &len);
 	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
 	if (srcmap != &iter->iomap)
@@ -1380,6 +1412,12 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
+		/* a NULL folio means we're done with a folio batch */
+		if (!folio) {
+			status = iomap_iter_advance_full(iter);
+			break;
+		}
+
 		/* warn about zeroing folios beyond eof that won't write back */
 		WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
 
@@ -1401,6 +1439,26 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 	return status;
 }
 
+loff_t
+iomap_fill_dirty_folios(
+	struct iomap_iter	*iter,
+	loff_t			offset,
+	loff_t			length)
+{
+	struct address_space	*mapping = iter->inode->i_mapping;
+	pgoff_t			start = offset >> PAGE_SHIFT;
+	pgoff_t			end = (offset + length - 1) >> PAGE_SHIFT;
+
+	iter->fbatch = kmalloc(sizeof(struct folio_batch), GFP_KERNEL);
+	if (!iter->fbatch)
+		return offset + length;
+	folio_batch_init(iter->fbatch);
+
+	filemap_get_folios_dirty(mapping, &start, end, iter->fbatch);
+	return (start << PAGE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(iomap_fill_dirty_folios);
+
 int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops, void *private)
@@ -1429,7 +1487,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	 * flushing on partial eof zeroing, special case it to zero the
 	 * unaligned start portion if already dirty in pagecache.
 	 */
-	if (off &&
+	if (!iter.fbatch && off &&
 	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
 		iter.len = plen;
 		while ((ret = iomap_iter(&iter, ops)) > 0)
@@ -1445,13 +1503,18 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	 * if dirty and the fs returns a mapping that might convert on
 	 * writeback.
 	 */
-	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
-			iter.pos, iter.pos + iter.len - 1);
+	range_dirty = filemap_range_needs_writeback(mapping, iter.pos,
+			iter.pos + iter.len - 1);
 	while ((ret = iomap_iter(&iter, ops)) > 0) {
 		const struct iomap *srcmap = iomap_iter_srcmap(&iter);
 
-		if (srcmap->type == IOMAP_HOLE ||
-		    srcmap->type == IOMAP_UNWRITTEN) {
+		if (WARN_ON_ONCE(iter.fbatch &&
+				 srcmap->type != IOMAP_UNWRITTEN))
+			return -EIO;
+
+		if (!iter.fbatch &&
+		    (srcmap->type == IOMAP_HOLE ||
+		     srcmap->type == IOMAP_UNWRITTEN)) {
 			s64 status;
 
 			if (range_dirty) {
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index 6ffc6a7b9ba5..89bd5951a6fd 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -9,6 +9,12 @@
 
 static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
 {
+	if (iter->fbatch) {
+		folio_batch_release(iter->fbatch);
+		kfree(iter->fbatch);
+		iter->fbatch = NULL;
+	}
+
 	iter->status = 0;
 	memset(&iter->iomap, 0, sizeof(iter->iomap));
 	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 522644d62f30..0b9b460b2873 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include <linux/pagevec.h>
 
 struct address_space;
 struct fiemap_extent_info;
@@ -239,6 +240,7 @@ struct iomap_iter {
 	unsigned flags;
 	struct iomap iomap;
 	struct iomap srcmap;
+	struct folio_batch *fbatch;
 	void *private;
 };
 
@@ -345,6 +347,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
 bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
+loff_t iomap_fill_dirty_folios(struct iomap_iter *iter, loff_t offset,
+		loff_t length);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops, void *private);
 int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-- 
2.49.0