Re: [PATCH] tmpfs: zero post-eof folio range on file extension

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>, Brian Foster <bfoster@redhat.com>
Cc: linux-mm@kvack.org, Matthew Wilcox <willy@infradead.org>,
	Usama Arif <usamaarif64@gmail.com>
Subject: Re: [PATCH] tmpfs: zero post-eof folio range on file extension
Date: Thu, 10 Jul 2025 14:47:20 +0800	[thread overview]
Message-ID: <67f0461b-3359-41e7-a7cd-b059cbef4154@linux.alibaba.com> (raw)
In-Reply-To: <297e44e9-1b58-d7c4-192c-9408204ab1e3@google.com>



On 2025/7/9 15:57, Hugh Dickins wrote:
> Thanks for suggestions, let me come back to comment on your original.
> 
> On Wed, 25 Jun 2025, Brian Foster wrote:
>>   
> 
> Although Matthew's alternative commit was insufficient and wrong
> (because use of truncation risks deleting folios already promised
> by fallocate), I found his brief commit message very very helpful
> in understanding your patch - it makes clear what POSIX promises,
> is a better justification than just passing an xfstest you wrote,
> and explains why you're right to target places where i_size changes:
> 
> POSIX requires that "If the file size is increased, the extended area
> shall appear as if it were zero-filled".  It is possible to use mmap to
> write past EOF and that data will become visible instead of zeroes.
> 
> Please add that paragraph here, right at the head of your commit message.
> 
>> Most traditional filesystems zero the post-eof portion of the eof
>> folio at writeback time, or when the file size is extended by
>> truncate or extending writes. This ensures that the previously
>> post-eof range of the folio is zeroed before it is exposed to the
>> file.
>>
>> tmpfs doesn't implement the writeback path the way a traditional
>> filesystem does, so zeroing behavior won't be exactly the same.
>> However, it can still perform explicit zeroing from the various
>> operations that extend a file and expose a post-eof portion of the
>> eof folio. The current lack of zeroing is observed via failure of
>> fstests test generic/363 on tmpfs. This test injects post-eof mapped
>> writes in certain situations to detect gaps in zeroing behavior.
>>
>> Add a new eof zeroing helper for file extending operations. Look up
>> the current eof folio, and if one exists, zero the range about to be
>> exposed. This allows generic/363 to pass on tmpfs.
>>
>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>> ---
>>
>> Hi all,
>>
>> This survives the aforemented reproducer, an fstests regression run, and
>> ~100m fsx operations without issues. Let me know if there are any other
>> recommended tests for tmpfs and I'm happy to run them. Otherwise, a
>> couple notes as I'm not terribly familiar with tmpfs...
>>
>> First, I used _get_partial_folio() because we really only want to zero
>> an eof folio if one has been previously allocated. My understanding is
>> that lookup path will avoid unnecessary folio allocation in such cases,
>> but let me know if that's wrong.
>>
>> Also, it seems that the falloc path leaves newly preallocated folios
>> !uptodate until they are used. This had me wondering if perhaps
>> shmem_zero_eof() could just skip out if the eof folio happens to be
>> !uptodate. Hm?
> 
> Yes, you were right to think that it's better to skip the !uptodates,
> and Baolin made good suggestion there (though I'll unravel it a bit).
> 
>>
>> Thoughts, reviews, flames appreciated.
> 
> Sorry, much as you'd appreciate a flame, I cannot oblige: I think you've
> done a very good job here (but it's not ready yet), and you've done it
> in such a way that huge=always passes generic/363 with no trouble.
> 
> I did keep on wanting to change this and that of your patch below; but
> then later came around to seeing why your choices were better than what
> I was going to suggest.
> 
> I had difficulty getting deep enough into it, but think I'm there now.
> And have identified one missed aspect, which rather changes around what
> you should do - I'd have preferred to get into that at the end, but
> since it affects what shmem_zero_eof() should look like, I'd better
> talk about it here first.
> 
> The problem is with huge pages (or large folios) in shmem_writeout():
> what goes in as a large folio may there have to be split into small
> pages; or it may be swapped out as one large folio, but fragmentation
> at swapin time demand that it be split into small pages when swapped in.

Good point.

> So, if there has been swapout since the large folio was modified beyond
> EOF, the folio that shmem_zero_eof() brings in does not guarantee what
> length needs to be zeroed.
> 
> We could set that aside as a deficiency to be fixed later on: that
> would not be unreasonable, but I'm guessing that won't satisfy you.
> 
> We could zero the maximum (the remainder of PMD size I believe) in
> shmem_zero_eof(): looping over small folios within the range, skipping
> !uptodate ones (but we do force them uptodate when swapping out, in
> order to keep the space reservation).  TBH I've ignored that as a bad
> option, but it doesn't seem so bad to me now: ugly, but maybe not bad.

However, IIUC, if the large folios are split in shmem_writeout(), and 
those small folios which beyond EOF will be dropped and freed in 
__split_unmapped_folio(), should we still consider them?

Please correct me if I missed anything.

> The solution I've had in mind (and pursue in comments below) is to do
> the EOF zeroing in shmem_writeout() before it splits; and then avoid
> swapin in shmem_zero_eof() when i_size is raised.
> 
> That solution partly inspired by the shmem symlink uninit-value bug
> https://lore.kernel.org/linux-mm/670793eb.050a0220.8109b.0003.GAE@google.com/
> which I haven't rushed to fix, but ought to be fixed along with this one
> (by "along with" I don't mean that both have to be fixed in one single
> patch, but it makes sense to consider them together).  I was inclined
> not to zero the whole page in shmem_symlink(), but zero before swapout.
> 
> It worries me that an EOF page might be swapped out and in a thousand
> times, but i_size set only once: I'm going for a solution which memsets
> a thousand times rather than once?  But if that actually shows up as an
> issue in any workload, then we can add a shmem inode flag to say whether
> the EOF folio has been exposed via mmap (or symlink) so may need zeroing.
> 
> What's your preference?  My comments below assume the latter solution,
> but that may be wrong.

next prev parent reply	other threads:[~2025-07-10  6:47 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-25 18:49 Brian Foster
2025-06-25 19:21 ` Matthew Wilcox
2025-06-26  5:35   ` Hugh Dickins
2025-06-26 12:55     ` Brian Foster
2025-06-27  3:21       ` Baolin Wang
2025-06-27 11:54         ` Brian Foster
2025-07-09  7:57 ` Hugh Dickins
2025-07-10  6:47   ` Baolin Wang [this message]
2025-07-10 22:20     ` Hugh Dickins
2025-07-11  3:50       ` Baolin Wang
2025-07-11  7:50         ` Hugh Dickins
2025-07-11  8:42           ` Baolin Wang
2025-07-11 16:08         ` Brian Foster
2025-07-11 20:15           ` Brian Foster
2025-07-14  3:05             ` Baolin Wang
2025-07-14 14:38               ` Brian Foster
2025-07-10 12:36   ` Brian Foster
2025-07-10 23:02     ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=67f0461b-3359-41e7-a7cd-b059cbef4154@linux.alibaba.com \
    --to=baolin.wang@linux.alibaba.com \
    --cc=bfoster@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=usamaarif64@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox