Re: [PATCH v2 1/3] tmpfs: zero post-eof uptodate folios on swapout

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-mm@kvack.org, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v2 1/3] tmpfs: zero post-eof uptodate folios on swapout
Date: Wed, 19 Nov 2025 11:53:41 +0800	[thread overview]
Message-ID: <9e696091-9c40-46cf-91b9-c95e1d38a1a8@linux.alibaba.com> (raw)
In-Reply-To: <aRyFOsUk--AtbEGM@bfoster>



On 2025/11/18 22:39, Brian Foster wrote:
> On Tue, Nov 18, 2025 at 10:33:44AM +0800, Baolin Wang wrote:
>>
>>
>> On 2025/11/13 00:25, Brian Foster wrote:
>>> As a first step to facilitate efficient post-eof zeroing in tmpfs,
>>> zero post-eof uptodate folios at swap out time. This ensures that
>>> post-eof ranges are zeroed "on disk" (i.e. analogous to traditional
>>> pagecache writeback) and facilitates zeroing on file size changes by
>>> allowing it to not have to swap in.
>>>
>>> Note that shmem_writeout() already zeroes !uptodate folios so this
>>> introduces some duplicate logic. We'll clean this up in the next
>>> patch.
>>>
>>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>>> ---
>>>    mm/shmem.c | 19 +++++++++++++++++--
>>>    1 file changed, 17 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 0a25ee095b86..5fb3c911894f 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -1577,6 +1577,8 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
>>>    	struct inode *inode = mapping->host;
>>>    	struct shmem_inode_info *info = SHMEM_I(inode);
>>>    	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>>> +	loff_t i_size = i_size_read(inode);
>>> +	pgoff_t end_index = DIV_ROUND_UP(i_size, PAGE_SIZE);
>>>    	pgoff_t index;
>>>    	int nr_pages;
>>>    	bool split = false;
>>> @@ -1596,8 +1598,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
>>>    	 * (unless fallocate has been used to preallocate beyond EOF).
>>>    	 */
>>>    	if (folio_test_large(folio)) {
>>> -		index = shmem_fallocend(inode,
>>> -			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
>>> +		index = shmem_fallocend(inode, end_index);
>>>    		if ((index > folio->index && index < folio_next_index(folio)) ||
>>>    		    !IS_ENABLED(CONFIG_THP_SWAP))
>>>    			split = true;
>>> @@ -1647,6 +1648,20 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
>>>    		folio_mark_uptodate(folio);
>>>    	}
>>> +	/*
>>> +	 * Ranges beyond EOF must be zeroed at writeout time. This mirrors
>>> +	 * traditional writeback behavior and facilitates zeroing on file size
>>> +	 * changes without having to swap back in.
>>> +	 */
>>> +	if (folio_next_index(folio) >= end_index) {
>>> +		size_t from = offset_in_folio(folio, i_size);
>>> +
>>> +		if (index >= end_index) {
>>> +			folio_zero_segment(folio, 0, folio_size(folio));
>>> +		} else if (from)
>>> +			folio_zero_segment(folio, from, folio_size(folio));
>>> +	}
>>
>> As I mentioned before[1], if a large folio is beyond EOF, it will be split
>> in shmem_writeout(), and those small folios beyond EOF will be dropped and
>> freed in __folio_split(). Of course, there's another special case as Hugh
>> mentioned: when there's a 'fallocend' beyond i_size (e.g., fallocate()), it
>> will keep the pages allocated beyond EOF after the split. However, your
>> 'end_index' here does not consider 'fallocend,' so it seems to me that this
>> portion of the code doesn't actually take effect.
>>
> 
> Hi Boalin,

s/Boalin/Baolin :)

> 
> So I get that split post-eof folios can fall off depending on fallocate
> status. I'm not sure what you mean by considering fallocend, however.
> ISTM that fallocend contributes to the boundary where we decide to split
> and/or preserve, but i_size is what is relevant for zeroing. It's not
> clear to me if you're suggesting the logic is potentially spurious, or
> this might not actually be zeroing correctly due to falloc interactions.
> Can you clarify the concern please? Thanks.

Sorry for not being clear enough (for my quick response yesterday). 
After thinking more, I want to divide this into 3 cases to clearly 
explain the logic here:

1. Without fallocate(), if a large folio is beyond EOF (i.e. i_size), it 
will be split in shmem_writeout(), and those small folios beyond EOF 
will be dropped and freed in __folio_split(). So, your changes should 
also have no impact, because after the split, ‘folio_next_index(folio)’ 
is definitely <= ‘end_index’. So the logic is correct.

2. With fallocate(), If a large folio is beyond EOF (i.e. i_size) but 
smaller than 'fallocend', the folio will not be split. So, we should 
zero the post-EOF part. Because 'index' (now representing 'fallocend') 
is greater than 'end_index', you are zeroing the entire large folio, 
which does not seem correct to me.

if (index >= end_index) {
	folio_zero_segment(folio, 0, folio_size(folio));
} else if ...

I think you should only zero the range from 'from' to 
'folio_size(folio)' of this large folio in this case. Right?

3. With fallocate(), If a large folio is beyond EOF (i.e. i_size) and 
also beyond 'fallocend', the large folio will be split to small folios. 
If we continue attempting to write out these small folios beyond EOF, we 
need to zero the entire mall folios at this point. So, the logic looks 
correct (because 'index' > 'end_index').

Based on the above analysis, I believe the logic should be:

if (folio_next_index(folio) >= end_index) {
	size_t from = offset_in_folio(folio, i_size);

	if (!folio_test_large(folio) && index >= end_index)
		folio_zero_segment(folio, 0, folio_size(folio));
	else if (from)
		folio_zero_segment(folio, from, folio_size(folio));
}

The logic here is a bit complex, please correct me if I misunderstood you.

next prev parent reply	other threads:[~2025-11-19  3:53 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-12 16:25 [PATCH v2 0/3] tmpfs: zero post-eof ranges on file extension Brian Foster
2025-11-12 16:25 ` [PATCH v2 1/3] tmpfs: zero post-eof uptodate folios on swapout Brian Foster
2025-11-18  2:33   ` Baolin Wang
2025-11-18 14:39     ` Brian Foster
2025-11-19  3:53       ` Baolin Wang [this message]
2025-11-19 14:08         ` Brian Foster
2025-11-20  1:57           ` Baolin Wang
2025-11-20 14:12             ` Brian Foster
2025-11-12 16:25 ` [PATCH v2 2/3] tmpfs: combine !uptodate and post-eof zeroing logic at swapout Brian Foster
2025-11-20  2:56   ` Baolin Wang
2025-11-20 14:14     ` Brian Foster
2025-11-12 16:25 ` [PATCH v2 3/3] tmpfs: zero post-eof ranges on file extension Brian Foster
2025-11-20  5:57   ` Baolin Wang
2025-11-20 14:21     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e696091-9c40-46cf-91b9-c95e1d38a1a8@linux.alibaba.com \
    --to=baolin.wang@linux.alibaba.com \
    --cc=bfoster@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox