* [PATCH v3 0/3] tmpfs: zero post-eof ranges on file extension
From: Brian Foster @ 2025-11-21 15:22 UTC (permalink / raw)
To: linux-mm; +Cc: Hugh Dickins, Baolin Wang
Hi all,
Apologies for the quicker than expected respin here, but during
discussion with Baolin on v2 it became apparent that I factored the
first two patches in an unnecessarily confusing way, and I wanted to get
that resolved more quickly. Otherwise there are a few smaller changes
here that I think are incremental improvements.
Baolin, I left off the R-b tag from v2 out of caution, given how I
reworked the first two patches. The end result is mostly similar, but
I'd appreciate a quick look before adding it back. Otherwise most of
the functionality here should be the same as v2.
Thoughts, reviews, flames appreciated.
Brian
v3:
- Reorder changes in patches 1-2 to factor out existing logic first.
- Use local scope variable in shmem_writeout() fallocend block.
- Tweak combined zeroing logic in shmem_writeout().
- Clean up commit logs.
v2: https://lore.kernel.org/linux-mm/20251112162522.412295-1-bfoster@redhat.com/
- Rework to zero uptodate post-eof folios on writeout and full range
  from EOF on size extension.
- Misc. cleanups: variable renames (pos -> end), code relocation,
  comment cleanups/removal.
- Update commit log to call out POSIX requirement.
v1: https://lore.kernel.org/linux-mm/20250625184930.269727-1-bfoster@redhat.com/
Brian Foster (3):
  tmpfs: factor out folio zeroing logic at writeout time
  tmpfs: zero post-eof folio ranges on swapout
  tmpfs: zero post-eof ranges on file extension

 mm/shmem.c | 135 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 114 insertions(+), 21 deletions(-)
--
2.51.1
* [PATCH v3 1/3] tmpfs: factor out folio zeroing logic at writeout time
From: Brian Foster @ 2025-11-21 15:22 UTC (permalink / raw)
To: linux-mm; +Cc: Hugh Dickins, Baolin Wang
tmpfs currently zeroes !uptodate folios at shmem_writeout() time to
ensure they are zeroed in swap. We want to expand this behavior to zero
post-eof ranges to better abide by POSIX file extension requirements. As a
first step, split out the existing zeroing code into a separate block.
No functional changes in this patch.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
mm/shmem.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 0a25ee095b86..651602460770 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1626,22 +1626,23 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
* good idea to continue anyway, once we're pushing into swap. So
* reactivate the folio, and let shmem_fallocate() quit when too many.
*/
+ if (!folio_test_uptodate(folio) && inode->i_private) {
+ struct shmem_falloc *shmem_falloc;
+ spin_lock(&inode->i_lock);
+ shmem_falloc = inode->i_private;
+ if (shmem_falloc &&
+ !shmem_falloc->waitq &&
+ index >= shmem_falloc->start &&
+ index < shmem_falloc->next)
+ shmem_falloc->nr_unswapped += nr_pages;
+ else
+ shmem_falloc = NULL;
+ spin_unlock(&inode->i_lock);
+ if (shmem_falloc)
+ goto redirty;
+ }
+
if (!folio_test_uptodate(folio)) {
- if (inode->i_private) {
- struct shmem_falloc *shmem_falloc;
- spin_lock(&inode->i_lock);
- shmem_falloc = inode->i_private;
- if (shmem_falloc &&
- !shmem_falloc->waitq &&
- index >= shmem_falloc->start &&
- index < shmem_falloc->next)
- shmem_falloc->nr_unswapped += nr_pages;
- else
- shmem_falloc = NULL;
- spin_unlock(&inode->i_lock);
- if (shmem_falloc)
- goto redirty;
- }
folio_zero_range(folio, 0, folio_size(folio));
flush_dcache_folio(folio);
folio_mark_uptodate(folio);
--
2.51.1
* [PATCH v3 2/3] tmpfs: zero post-eof folio ranges on swapout
From: Brian Foster @ 2025-11-21 15:22 UTC (permalink / raw)
To: linux-mm; +Cc: Hugh Dickins, Baolin Wang
Zero post-eof folio ranges at swapout time to help preserve POSIX
semantics and to support efficient post-eof zeroing during extending
operations. This mirrors how pagecache writeback ensures that post-eof
ranges are zeroed on disk.
As a result, file extension zeroing can skip swapped out folios that
might have been preallocated beyond the original i_size.
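For readers following along, the combined zeroing decision that this
patch adds to shmem_writeout() can be modelled in userspace. This is only
a sketch under stated assumptions, not kernel code: the 4096-byte page
size, the struct zero_span type and the writeout_zero_span() helper are
hypothetical names made up for illustration, and the two branches of the
patch are flattened into one if/else chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for the model only: 4K pages. */
#define WB_PAGE_SIZE 4096ULL

/* Byte range [start, end) of the folio to zero; empty when start == end. */
struct zero_span {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical model of the zeroing choice at writeout time.
 * index/nr_pages describe the folio, i_size is the current file size.
 */
struct zero_span writeout_zero_span(uint64_t index, uint64_t nr_pages,
				    bool uptodate, uint64_t i_size)
{
	struct zero_span span = { 0, 0 };
	uint64_t folio_size, end_index, from;

	/* Folios cover a power-of-two number of pages. */
	assert(nr_pages && !(nr_pages & (nr_pages - 1)));

	folio_size = nr_pages * WB_PAGE_SIZE;
	/* Same rounding as DIV_ROUND_UP(i_size, PAGE_SIZE). */
	end_index = (i_size + WB_PAGE_SIZE - 1) / WB_PAGE_SIZE;
	/* Same masking as offset_in_folio(folio, i_size). */
	from = i_size & (folio_size - 1);

	if (!uptodate || index >= end_index) {
		/* Never written, or entirely beyond EOF: zero it all. */
		span.end = folio_size;
	} else if (index + nr_pages >= end_index && from) {
		/* Folio straddles EOF: zero only the post-eof tail. */
		span.start = from;
		span.end = folio_size;
	}
	return span;
}
```

With i_size = 5000, for example, an uptodate single-page folio at index 1
yields the span [904, 4096), while the folio at index 0 is left untouched
because it lies entirely below EOF.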
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
mm/shmem.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 651602460770..97ca2b3dd1b9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1577,6 +1577,8 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
struct inode *inode = mapping->host;
struct shmem_inode_info *info = SHMEM_I(inode);
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
+ loff_t i_size = i_size_read(inode);
+ pgoff_t end_index = DIV_ROUND_UP(i_size, PAGE_SIZE);
pgoff_t index;
int nr_pages;
bool split = false;
@@ -1596,9 +1598,9 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
* (unless fallocate has been used to preallocate beyond EOF).
*/
if (folio_test_large(folio)) {
- index = shmem_fallocend(inode,
- DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
- if ((index > folio->index && index < folio_next_index(folio)) ||
+ pgoff_t fallocend = shmem_fallocend(inode, end_index);
+ if ((fallocend > folio->index &&
+ fallocend < folio_next_index(folio)) ||
!IS_ENABLED(CONFIG_THP_SWAP))
split = true;
}
@@ -1642,8 +1644,20 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
goto redirty;
}
- if (!folio_test_uptodate(folio)) {
- folio_zero_range(folio, 0, folio_size(folio));
+ /*
+ * Ranges beyond EOF must be zeroed at writeout time. This mirrors
+ * traditional writeback behavior and facilitates zeroing on file size
+ * changes without having to swap back in.
+ */
+ if (!folio_test_uptodate(folio) ||
+ folio_next_index(folio) >= end_index) {
+ size_t from = offset_in_folio(folio, i_size);
+
+ if (!folio_test_uptodate(folio) || index >= end_index)
+ folio_zero_segment(folio, 0, folio_size(folio));
+ else if (from)
+ folio_zero_segment(folio, from, folio_size(folio));
+
flush_dcache_folio(folio);
folio_mark_uptodate(folio);
}
--
2.51.1
* [PATCH v3 3/3] tmpfs: zero post-eof ranges on file extension
From: Brian Foster @ 2025-11-21 15:22 UTC (permalink / raw)
To: linux-mm; +Cc: Hugh Dickins, Baolin Wang
POSIX requires that "If the file size is increased, the extended
area shall appear as if it were zero-filled". On tmpfs it is currently
possible to use mmap to write past EOF, and once the file is extended
that data becomes visible instead of zeroes. This behavior is
reproduced by fstests test generic/363.
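The scenario can be probed from userspace with a small helper. This is a
minimal sketch and not the generic/363 test itself: the function name
extension_reads_zero() and the 100/200/300 byte offsets are arbitrary
choices for illustration, and the file is assumed to live on the
filesystem under test.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the extended area reads back as zero, 0 if data written
 * through the mapping past EOF became visible, -1 on a setup error.
 */
int extension_reads_zero(const char *path)
{
	const off_t old_size = 100, new_size = 300;
	const off_t poke = 200;		/* between old and new EOF */
	long page = sysconf(_SC_PAGESIZE);
	char byte;
	char *map;
	int fd, ret = -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, old_size))
		goto out;

	/* Map the page that contains EOF and write beyond EOF inside it. */
	map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;
	map[poke] = 'X';
	munmap(map, page);

	/* Extend the file over the poked byte and read it back. */
	if (ftruncate(fd, new_size))
		goto out;
	if (pread(fd, &byte, 1, poke) != 1)
		goto out;
	ret = (byte == 0);
out:
	close(fd);
	unlink(path);
	return ret;
}
```

Calling this with a path on a tmpfs mount (for instance under /dev/shm,
assuming that is tmpfs on the test system) would be expected to return 0
without this series and 1 with it, going by the description above. Note
that the helper truncates and then unlinks the given path.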
Most traditional filesystems zero any post-eof portion of a folio at
writeback time or when the file size is extended by truncate or
extending writes. This ensures that the previously post-eof range of
the folio is zeroed before it is exposed to the file.
The tmpfs writeout path has been updated to zero post-eof folio
ranges similar to traditional writeback. This ensures post-eof
ranges are zeroed "on disk" and allows size extension zeroing to
skip over swap entries as they are already appropriately zeroed.
To that end, introduce a new zeroing helper for proper zeroing on
file extending operations. This looks up resident folios between the
original and new eof and for those that are uptodate, zeroes them
before the associated ranges are exposed to the file. This preserves
POSIX semantics and allows generic/363 to pass on tmpfs.
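The range arithmetic of the new helper can be sketched in userspace as
follows. This is a simplified model under loud assumptions: every folio
is a single resident, uptodate 4K page, and struct zero_eof_plan and
plan_zero_eof() are hypothetical names that do not exist in the kernel.
It only shows how the range from the old EOF (lstart) to lend splits
into a head piece, fully covered pages, and a tail piece.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the model only: 4K pages, one page per folio. */
#define EXT_PAGE_SHIFT 12
#define EXT_PAGE_SIZE  (1ULL << EXT_PAGE_SHIFT)

struct zero_eof_plan {
	uint64_t head_from;	/* offset within the page holding old EOF */
	uint64_t head_len;	/* bytes zeroed in that page */
	uint64_t tail_len;	/* bytes zeroed from offset 0 of lend's page */
	uint64_t full_first;	/* first page index zeroed in full */
	uint64_t full_count;	/* number of pages zeroed in full */
};

struct zero_eof_plan plan_zero_eof(uint64_t lstart, uint64_t lend)
{
	struct zero_eof_plan p = { 0, 0, 0, 0, 0 };
	uint64_t head_index = lstart >> EXT_PAGE_SHIFT;
	uint64_t tail_index = lend >> EXT_PAGE_SHIFT;
	uint64_t room;

	/* All callers in the patch only extend the file. */
	assert(lend > lstart);

	/* Head: from old EOF to the end of its page, capped at lend. */
	p.head_from = lstart & (EXT_PAGE_SIZE - 1);
	room = EXT_PAGE_SIZE - p.head_from;
	p.head_len = room < lend - lstart ? room : lend - lstart;

	/* Equivalent of !same_folio: lend lies beyond the head page. */
	if (lend >= (head_index + 1) << EXT_PAGE_SHIFT) {
		p.tail_len = lend - (tail_index << EXT_PAGE_SHIFT);
		p.full_first = head_index + 1;
		p.full_count = tail_index - p.full_first;
	}
	return p;
}
```

For lstart = 5000 and lend = 20000 the plan zeroes 3192 bytes in page 1,
all of pages 2 and 3, and the first 3616 bytes of page 4, which adds up
to the 15000 bytes being exposed. The real helper additionally skips
folios that are absent, swapped out, or not uptodate.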
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
mm/shmem.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 79 insertions(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 97ca2b3dd1b9..81dd2bfb0444 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1101,6 +1101,78 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
return folio;
}
+/*
+ * Zero a post-EOF range about to be exposed by size extension. Zero from the
+ * current i_size through lend, the latter of which typically refers to the
+ * start offset of an extending operation. Skip swap entries because associated
+ * folios were zeroed at swapout time.
+ */
+static void shmem_zero_eof(struct inode *inode, loff_t lend)
+{
+ struct address_space *mapping = inode->i_mapping;
+ loff_t lstart = i_size_read(inode);
+ pgoff_t index = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ pgoff_t end = lend >> PAGE_SHIFT;
+ struct folio_batch fbatch;
+ struct folio *folio;
+ int i;
+ bool same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
+
+ folio = filemap_lock_folio(mapping, lstart >> PAGE_SHIFT);
+ if (!IS_ERR(folio)) {
+ same_folio = lend < folio_next_pos(folio);
+ index = folio_next_index(folio);
+
+ if (folio_test_uptodate(folio)) {
+ size_t from = offset_in_folio(folio, lstart);
+ size_t len = min_t(loff_t, folio_size(folio) - from,
+ lend - lstart);
+
+ folio_zero_range(folio, from, len);
+ }
+
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
+ if (!same_folio) {
+ folio = filemap_lock_folio(mapping, lend >> PAGE_SHIFT);
+ if (!IS_ERR(folio)) {
+ end = folio->index;
+
+ if (folio_test_uptodate(folio)) {
+ size_t len = lend - folio_pos(folio);
+ folio_zero_range(folio, 0, len);
+ }
+
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+ }
+
+ /*
+ * Zero uptodate folios fully within the target range. Uptodate folios
+ * beyond EOF are generally unexpected, but can exist if a larger
+ * falloc'd and uptodate EOF folio is split.
+ */
+ folio_batch_init(&fbatch);
+ while (index < end) {
+ if (!filemap_get_folios(mapping, &index, end - 1, &fbatch))
+ break;
+ for (i = 0; i < folio_batch_count(&fbatch); i++) {
+ folio = fbatch.folios[i];
+
+ folio_lock(folio);
+ if (folio_test_uptodate(folio) &&
+ folio->mapping == mapping) {
+ folio_zero_segment(folio, 0, folio_size(folio));
+ }
+ folio_unlock(folio);
+ }
+ folio_batch_release(&fbatch);
+ }
+}
+
/*
* Remove range of pages and swap entries from page cache, and free them.
* If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -1331,6 +1403,8 @@ static int shmem_setattr(struct mnt_idmap *idmap,
oldsize, newsize);
if (error)
return error;
+ if (newsize > oldsize)
+ shmem_zero_eof(inode, newsize);
i_size_write(inode, newsize);
update_mtime = true;
} else {
@@ -3514,6 +3588,8 @@ static ssize_t shmem_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
ret = file_update_time(file);
if (ret)
goto unlock;
+ if (iocb->ki_pos > i_size_read(inode))
+ shmem_zero_eof(inode, iocb->ki_pos);
ret = generic_perform_write(iocb, from);
unlock:
inode_unlock(inode);
@@ -3846,8 +3922,10 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
cond_resched();
}
- if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
+ if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size) {
+ shmem_zero_eof(inode, offset + len);
i_size_write(inode, offset + len);
+ }
undone:
spin_lock(&inode->i_lock);
inode->i_private = NULL;
--
2.51.1
* Re: [PATCH v3 0/3] tmpfs: zero post-eof ranges on file extension
From: Baolin Wang @ 2025-11-25 2:20 UTC (permalink / raw)
To: Brian Foster, linux-mm; +Cc: Hugh Dickins
On 2025/11/21 23:22, Brian Foster wrote:
> Hi all,
>
> Apologies for the quicker than expected respin here, but during
> discussion with Baolin on v2 it became apparent that I factored the
> first two patches in an unnecessarily confusing way, and I wanted to get
> that resolved more quickly. Otherwise there are a few smaller changes
> here that I think are incremental improvements.
>
> Baolin, I left off the R-b tag from v2 just out of caution and how I
> reworked the first two patches. The end result is mostly similar, but
> I'd appreciate a quick look before adding that back. Otherwise most of
> the functionality here should be the same as v2.
>
> Thoughts, reviews, flames appreciated.
Overall LGTM. I ran some tmpfs read/write tests and didn't notice any
obvious regressions. Anyway, let's wait and see if Hugh has any input.
Thanks for your work.