linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: Brian Foster <bfoster@redhat.com>,
	linux-mm@kvack.org, Matthew Wilcox <willy@infradead.org>,
	Usama Arif <usamaarif64@gmail.com>
Subject: Re: [PATCH] tmpfs: zero post-eof folio range on file extension
Date: Fri, 11 Jul 2025 16:42:53 +0800	[thread overview]
Message-ID: <4a8f64f8-cdef-4081-8654-7e8c43b6f18b@linux.alibaba.com> (raw)
In-Reply-To: <a42f25c0-94fe-0627-3039-56b81355282b@google.com>



On 2025/7/11 15:50, Hugh Dickins wrote:
> On Fri, 11 Jul 2025, Baolin Wang wrote:
>> On 2025/7/11 06:20, Hugh Dickins wrote:
>>> On Thu, 10 Jul 2025, Baolin Wang wrote:
>>>> On 2025/7/9 15:57, Hugh Dickins wrote:
>>> ...
>>>>>
>>>>> The problem is with huge pages (or large folios) in shmem_writeout():
>>>>> what goes in as a large folio may there have to be split into small
>>>>> pages; or it may be swapped out as one large folio, but fragmentation
>>>>> at swapin time demand that it be split into small pages when swapped in.
>>>>
>>>> Good point.
>>>>
>>>>> So, if there has been swapout since the large folio was modified beyond
>>>>> EOF, the folio that shmem_zero_eof() brings in does not guarantee what
>>>>> length needs to be zeroed.
>>>>>
>>>>> We could set that aside as a deficiency to be fixed later on: that
>>>>> would not be unreasonable, but I'm guessing that won't satisfy you.
>>>>>
>>>>> We could zero the maximum (the remainder of PMD size I believe) in
>>>>> shmem_zero_eof(): looping over small folios within the range, skipping
>>>>> !uptodate ones (but we do force them uptodate when swapping out, in
>>>>> order to keep the space reservation).  TBH I've ignored that as a bad
>>>>> option, but it doesn't seem so bad to me now: ugly, but maybe not bad.
>>>>
>>>> However, IIUC, if the large folios are split in shmem_writeout(), and those
>>>> small folios which beyond EOF will be dropped and freed in
>>>> __split_unmapped_folio(), should we still consider them?
>>>
>>> You're absolutely right about the normal case, and thank you for making
>>> that point.  Had I forgotten that when writing?  Or was I already
>>> jumping ahead to the problem case?  I don't recall, but was certainly
>>> wrong for not mentioning it.
>>>
>>> The abnormal case is when there's a "fallocend" beyond i_size (or beyond
>>> the small page extent spanning i_size) i.e. fallocate() has promised to
>>> keep pages allocated beyond EOF.  In that case, __split_unmapped_folio()
>>> is keeping those pages.
>>
>> Ah, yes, you are right.
>>
>>> There could well be some optimization, involving fallocend, to avoid
>>> zeroing more than necessary; but I wouldn't want to say what in a hurry,
>>> it's quite confusing!
>>
>> Like you said, not only can a large folio split occur during swapout, but it
>> can also happen during a punch hole operation. Moreover, considering the
>> abnormal case of fallocate() you mentioned, we should find a more common
>> approach to mitigate the impact of fallocate().
>>
>> For instance, when splitting, we could clear the 'uptodate' flag for these EOF
>> small folios that are beyond 'i_size' but less than the 'fallocend', so that
>> these EOF small folios will be re-initialized if they are used again. What do
>> you think?
> 
> First impression: that's a great idea, much better than anything I was
> proposing.  Let's hope I don't perceive some drawback overnight and renege.
> 
> I don't love your patch below so much, we would probably want to gather
> the shmem_mapping() peculiarities together better (and seeing that
> repeated i_size_read(): IIRC 32-bit locking doesn't allow it in there).

Absolutely yes, the following code just shows my thought:)

> And there tends to be an assumption (don't ask me where) that a page once
> uptodate remains that way until it's freed: maybe no problem before it's
> inserted in the xarray (as you have), or maybe better before unfreezing,
> or maybe the page lock is already enough.

These EOF small folios are unmapped, frozen, locked, and not yet in the 
xarray. It seems there is no other way to get them (pfn walker could, 
but the page lock or freeze will abort the pfn walker process), so it 
appears to be safe to change the 'uptodate' flag. Anyway, let me still 
do some investigation.

> Those my initial reactions.

Thanks for comments.

>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index ce130225a8e5..2ccb442525d1 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3546,6 +3546,18 @@ static int __split_unmapped_folio(struct folio *folio,
>> int new_order,
>>                          lru_add_split_folio(origin_folio, release, lruvec,
>>                                          list);
>>
>> +                       /*
>> +                        * fallocate() will keep folios allocated beyond EOF,
>> we should
>> +                        * clear the uptodate flag for these folios to re-zero
>> them
>> +                        * if necessary.
>> +                        */
>> +                       if (shmem_mapping(mapping)) {
>> +                               loff_t i_size = i_size_read(mapping->host);
>> +
>> +                               if (i_size < end && release->index >= i_size)
>> +                                       folio_clear_uptodate(release);
>> +                       }
>> +
>>                          /* Some pages can be beyond EOF: drop them from cache
>> */
>>                          if (release->index >= end) {
>>                                  if (shmem_mapping(mapping))



  reply	other threads:[~2025-07-11  8:43 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-25 18:49 Brian Foster
2025-06-25 19:21 ` Matthew Wilcox
2025-06-26  5:35   ` Hugh Dickins
2025-06-26 12:55     ` Brian Foster
2025-06-27  3:21       ` Baolin Wang
2025-06-27 11:54         ` Brian Foster
2025-07-09  7:57 ` Hugh Dickins
2025-07-10  6:47   ` Baolin Wang
2025-07-10 22:20     ` Hugh Dickins
2025-07-11  3:50       ` Baolin Wang
2025-07-11  7:50         ` Hugh Dickins
2025-07-11  8:42           ` Baolin Wang [this message]
2025-07-11 16:08         ` Brian Foster
2025-07-11 20:15           ` Brian Foster
2025-07-14  3:05             ` Baolin Wang
2025-07-14 14:38               ` Brian Foster
2025-07-10 12:36   ` Brian Foster
2025-07-10 23:02     ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a8f64f8-cdef-4081-8654-7e8c43b6f18b@linux.alibaba.com \
    --to=baolin.wang@linux.alibaba.com \
    --cc=bfoster@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=usamaarif64@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox