From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>, Brian Foster <bfoster@redhat.com>
Cc: linux-mm@kvack.org, Matthew Wilcox <willy@infradead.org>,
Usama Arif <usamaarif64@gmail.com>
Subject: Re: [PATCH] tmpfs: zero post-eof folio range on file extension
Date: Thu, 10 Jul 2025 14:47:20 +0800 [thread overview]
Message-ID: <67f0461b-3359-41e7-a7cd-b059cbef4154@linux.alibaba.com> (raw)
In-Reply-To: <297e44e9-1b58-d7c4-192c-9408204ab1e3@google.com>
On 2025/7/9 15:57, Hugh Dickins wrote:
> Thanks for suggestions, let me come back to comment on your original.
>
> On Wed, 25 Jun 2025, Brian Foster wrote:
>>
>
> Although Matthew's alternative commit was insufficient and wrong
> (because use of truncation risks deleting folios already promised
> by fallocate), I found his brief commit message very very helpful
> in understanding your patch - it makes clear what POSIX promises,
> is a better justification than just passing an xfstest you wrote,
> and explains why you're right to target places where i_size changes:
>
> POSIX requires that "If the file size is increased, the extended area
> shall appear as if it were zero-filled". It is possible to use mmap to
> write past EOF and that data will become visible instead of zeroes.
>
> Please add that paragraph here, right at the head of your commit message.
>
>> Most traditional filesystems zero the post-eof portion of the eof
>> folio at writeback time, or when the file size is extended by
>> truncate or extending writes. This ensures that the previously
>> post-eof range of the folio is zeroed before it is exposed to the
>> file.
>>
>> tmpfs doesn't implement the writeback path the way a traditional
>> filesystem does, so zeroing behavior won't be exactly the same.
>> However, it can still perform explicit zeroing from the various
>> operations that extend a file and expose a post-eof portion of the
>> eof folio. The current lack of zeroing is observed via failure of
>> fstests test generic/363 on tmpfs. This test injects post-eof mapped
>> writes in certain situations to detect gaps in zeroing behavior.
>>
>> Add a new eof zeroing helper for file extending operations. Look up
>> the current eof folio, and if one exists, zero the range about to be
>> exposed. This allows generic/363 to pass on tmpfs.
>>
>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>> ---
>>
>> Hi all,
>>
>> This survives the aforemented reproducer, an fstests regression run, and
>> ~100m fsx operations without issues. Let me know if there are any other
>> recommended tests for tmpfs and I'm happy to run them. Otherwise, a
>> couple notes as I'm not terribly familiar with tmpfs...
>>
>> First, I used _get_partial_folio() because we really only want to zero
>> an eof folio if one has been previously allocated. My understanding is
>> that lookup path will avoid unnecessary folio allocation in such cases,
>> but let me know if that's wrong.
>>
>> Also, it seems that the falloc path leaves newly preallocated folios
>> !uptodate until they are used. This had me wondering if perhaps
>> shmem_zero_eof() could just skip out if the eof folio happens to be
>> !uptodate. Hm?
>
> Yes, you were right to think that it's better to skip the !uptodates,
> and Baolin made good suggestion there (though I'll unravel it a bit).
>
>>
>> Thoughts, reviews, flames appreciated.
>
> Sorry, much as you'd appreciate a flame, I cannot oblige: I think you've
> done a very good job here (but it's not ready yet), and you've done it
> in such a way that huge=always passes generic/363 with no trouble.
>
> I did keep on wanting to change this and that of your patch below; but
> then later came around to seeing why your choices were better than what
> I was going to suggest.
>
> I had difficulty getting deep enough into it, but think I'm there now.
> And have identified one missed aspect, which rather changes around what
> you should do - I'd have preferred to get into that at the end, but
> since it affects what shmem_zero_eof() should look like, I'd better
> talk about it here first.
>
> The problem is with huge pages (or large folios) in shmem_writeout():
> what goes in as a large folio may there have to be split into small
> pages; or it may be swapped out as one large folio, but fragmentation
> at swapin time demand that it be split into small pages when swapped in.
Good point.
> So, if there has been swapout since the large folio was modified beyond
> EOF, the folio that shmem_zero_eof() brings in does not guarantee what
> length needs to be zeroed.
>
> We could set that aside as a deficiency to be fixed later on: that
> would not be unreasonable, but I'm guessing that won't satisfy you.
>
> We could zero the maximum (the remainder of PMD size I believe) in
> shmem_zero_eof(): looping over small folios within the range, skipping
> !uptodate ones (but we do force them uptodate when swapping out, in
> order to keep the space reservation). TBH I've ignored that as a bad
> option, but it doesn't seem so bad to me now: ugly, but maybe not bad.
However, IIUC, if the large folios are split in shmem_writeout(), and
those small folios which beyond EOF will be dropped and freed in
__split_unmapped_folio(), should we still consider them?
Please correct me if I missed anything.
> The solution I've had in mind (and pursue in comments below) is to do
> the EOF zeroing in shmem_writeout() before it splits; and then avoid
> swapin in shmem_zero_eof() when i_size is raised.
>
> That solution partly inspired by the shmem symlink uninit-value bug
> https://lore.kernel.org/linux-mm/670793eb.050a0220.8109b.0003.GAE@google.com/
> which I haven't rushed to fix, but ought to be fixed along with this one
> (by "along with" I don't mean that both have to be fixed in one single
> patch, but it makes sense to consider them together). I was inclined
> not to zero the whole page in shmem_symlink(), but zero before swapout.
>
> It worries me that an EOF page might be swapped out and in a thousand
> times, but i_size set only once: I'm going for a solution which memsets
> a thousand times rather than once? But if that actually shows up as an
> issue in any workload, then we can add a shmem inode flag to say whether
> the EOF folio has been exposed via mmap (or symlink) so may need zeroing.
>
> What's your preference? My comments below assume the latter solution,
> but that may be wrong.
next prev parent reply other threads:[~2025-07-10 6:47 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-25 18:49 Brian Foster
2025-06-25 19:21 ` Matthew Wilcox
2025-06-26 5:35 ` Hugh Dickins
2025-06-26 12:55 ` Brian Foster
2025-06-27 3:21 ` Baolin Wang
2025-06-27 11:54 ` Brian Foster
2025-07-09 7:57 ` Hugh Dickins
2025-07-10 6:47 ` Baolin Wang [this message]
2025-07-10 22:20 ` Hugh Dickins
2025-07-11 3:50 ` Baolin Wang
2025-07-11 7:50 ` Hugh Dickins
2025-07-11 8:42 ` Baolin Wang
2025-07-11 16:08 ` Brian Foster
2025-07-11 20:15 ` Brian Foster
2025-07-14 3:05 ` Baolin Wang
2025-07-14 14:38 ` Brian Foster
2025-07-10 12:36 ` Brian Foster
2025-07-10 23:02 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=67f0461b-3359-41e7-a7cd-b059cbef4154@linux.alibaba.com \
--to=baolin.wang@linux.alibaba.com \
--cc=bfoster@redhat.com \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=usamaarif64@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox