From: Brian Foster <bfoster@redhat.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>,
Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org
Subject: Re: [PATCH] tmpfs: zero post-eof folio range on file extension
Date: Fri, 27 Jun 2025 07:54:52 -0400 [thread overview]
Message-ID: <aF6GjBQLc1YsO0Hh@bfoster> (raw)
In-Reply-To: <fcac9402-04a1-408d-9d7a-8d5a370dac3a@linux.alibaba.com>
On Fri, Jun 27, 2025 at 11:21:56AM +0800, Baolin Wang wrote:
>
>
> On 2025/6/26 20:55, Brian Foster wrote:
> > On Wed, Jun 25, 2025 at 10:35:44PM -0700, Hugh Dickins wrote:
> > > On Wed, 25 Jun 2025, Matthew Wilcox wrote:
> > > > On Wed, Jun 25, 2025 at 02:49:30PM -0400, Brian Foster wrote:
> > > > > Most traditional filesystems zero the post-eof portion of the eof
> > > > > folio at writeback time, or when the file size is extended by
> > > > > truncate or extending writes. This ensures that the previously
> > > > > post-eof range of the folio is zeroed before it is exposed to the
> > > > > file.
> > > > >
> > > > > tmpfs doesn't implement the writeback path the way a traditional
> > > > > filesystem does, so zeroing behavior won't be exactly the same.
> > > > > However, it can still perform explicit zeroing from the various
> > > > > operations that extend a file and expose a post-eof portion of the
> > > > > eof folio. The current lack of zeroing is observed via failure of
> > > > > fstests test generic/363 on tmpfs. This test injects post-eof mapped
> > > > > writes in certain situations to detect gaps in zeroing behavior.
> > > > >
> > > > > Add a new eof zeroing helper for file extending operations. Look up
> > > > > the current eof folio, and if one exists, zero the range about to be
> > > > > exposed. This allows generic/363 to pass on tmpfs.
> > > >
> > > > This seems like what I did here?
> > > >
> > > > https://lore.kernel.org/linux-fsdevel/20230202204428.3267832-4-willy@infradead.org/
> > > >
> > > > Which fix should we prefer?
> > >
> >
> > Quite similar, indeed. This is actually about the same as my initial
> > prototype when I was just trying to confirm the problem for truncate. As
> > Hugh notes, we still need to cover other size extension paths
> > (fallocate, buffered write).
> >
> > > Thank you Brian, thank you Matthew.
> > >
> > > Of course my immediate reaction is to prefer the smaller patch,
> > > Matthew's, but generic/363 still fails with that one: I need to look
> > > into what each of you is doing (and that mail thread from 2023) and
> > > make up my own mind.
> > >
> >
> > FWIW, I started off with something trying to use shmem_undo_range() and
> > eventually moved toward the current patch because I couldn't get it
> > working quite right. Explicit zeroing seemed like less complexity than
> > calling through undo_range() on various operations, primarily due to
> > less risk of unintentional side effect. It's possible (likely :P) that
> > was just user error on my part, so now that I have a functional patch I
> > can revisit that approach if it is preferred.
>
> I also tried using shmem_truncate_range() to fix this issue, but I failed.
> Ultimately, at least for me, I think the fix is simple and works.
>
Ok, thanks.
> > However one thing I wasn't aware of at the time was the additional
> > zeroing needed into the target range on fallocate, so that might require
> > care to avoid not immediately punching out the folios that were
> > fallocated just prior. I suspect this would mean we'd need a helper or
> > something to restrict the range to undo to just the eof folio. That
> > seems like a plausible approach, I'm just not so sure the final result
> > will end up much smaller or simpler than this patch.
> >
> > > (I'm worried why I had no copy of Matthew's 2023 patch: it's sadly
> > > common for me to bury patches for long periods of time, but not
> > > usually to lose them completely. But that is my worry, not yours.)
> > >
> > > I haven't been much concerned by generic/383 failing on tmpfs:
> > > but promise to respect your efforts in due course.
> > >
> >
> > It's certainly not the bug of the century. ;) I added the test somewhat
> > recently because we had bigger zeroing issues on other filesystems and I
> > noticed we had no decent test coverage.
> >
> > > I procrastinate "in due course" because (a) the full correct answwer
> > > will depend on what happens with large folios, and (b) large folio
> > > work in 6.14 changed (I'll say broke) the behaviour of huge=always
> > > at eof (I expected a danger of that, and thought I checked before
> > > 6.14-rc settled, but must have messed up my checking somehow).
> > >
> >
> > Interesting.. I assume huge=always refers to a mount option. I need to
> > give that a test.
>
> I tested your patch by adding the 'transparent_hugepage_tmpfs=always'
> command line parameter, which will change the default huge policy to
> 'huge=always' for tmpfs mounts. Your patch also passed the generic/363 test
> with the tmpfs 'huge=always' mount option.
>
Thanks. I ran a full auto fstests run with huge=always and an uptodate
tweak yesterday and also didn't see any regressions. My change was
pretty much the same as yours below other than I inverted the logic.
I do see two new test failures with huge=always in generic/285 and
generic/436, but that appears to be related to the mount option and not
the patch. Both of those seem to be seek related, FWIW.
> > I'm also curious if either of you have any thoughts on the uptodate
> > question. Does filtering zeroing on uptodate folios ensure we'd zero
> > only folios that were previously written to?
>
> Yes, I think so. Caller will handle the !uptodate folios. So I change your
> patch to:
>
> static void shmem_zero_eof(struct inode *inode, loff_t pos)
> {
> struct folio *folio;
> loff_t i_size = i_size_read(inode);
> size_t from, len;
>
> folio = shmem_get_partial_folio(inode, i_size >> PAGE_SHIFT);
> if (!folio)
> return;
>
> if (folio_test_uptodate(folio)) {
> /* zero to the end of the folio or start of extending
> operation */
> from = offset_in_folio(folio, i_size);
> len = min_t(loff_t, folio_size(folio) - from, pos - i_size);
> folio_zero_range(folio, from, len);
> }
>
> folio_unlock(folio);
> folio_put(folio);
> }
>
> > > So there's more (and more urgent) to sort out before settling on
> > > the right generic/383 fix.
>
> However, let's still wait to see if Hugh has any better ideas.
>
Ack.. I'll give it a bit and then follow up with a v2 later unless I
hear otherwise. Thanks again, appreciate the feedback.
Brian
next prev parent reply other threads:[~2025-06-27 11:51 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-25 18:49 Brian Foster
2025-06-25 19:21 ` Matthew Wilcox
2025-06-26 5:35 ` Hugh Dickins
2025-06-26 12:55 ` Brian Foster
2025-06-27 3:21 ` Baolin Wang
2025-06-27 11:54 ` Brian Foster [this message]
2025-07-09 7:57 ` Hugh Dickins
2025-07-10 6:47 ` Baolin Wang
2025-07-10 22:20 ` Hugh Dickins
2025-07-11 3:50 ` Baolin Wang
2025-07-11 7:50 ` Hugh Dickins
2025-07-11 8:42 ` Baolin Wang
2025-07-11 16:08 ` Brian Foster
2025-07-11 20:15 ` Brian Foster
2025-07-14 3:05 ` Baolin Wang
2025-07-14 14:38 ` Brian Foster
2025-07-10 12:36 ` Brian Foster
2025-07-10 23:02 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aF6GjBQLc1YsO0Hh@bfoster \
--to=bfoster@redhat.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox