Re: [PATCH v2] mm: shmem: don't set large-order range for internal shmem mount

From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	willy@infradead.org, akpm@linux-foundation.org, hughd@google.com,
	ljs@kernel.org, lance.yang@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: shmem: don't set large-order range for internal shmem mount
Date: Thu, 16 Apr 2026 10:08:23 +0800
Message-ID: <50a01c86-fbf1-4f93-9557-6e5cc1dd1dd7@linux.alibaba.com>
In-Reply-To: <D29FD39A-50FE-4EE5-8D14-A7B40E565074@nvidia.com>



On 4/16/26 9:52 AM, Zi Yan wrote:
> On 15 Apr 2026, at 21:45, Baolin Wang wrote:
> 
>> On 4/16/26 9:36 AM, Zi Yan wrote:
>>> On 15 Apr 2026, at 21:22, Baolin Wang wrote:
>>>
>>>> On 4/16/26 9:11 AM, Zi Yan wrote:
>>>>> On 15 Apr 2026, at 21:05, Baolin Wang wrote:
>>>>>
>>>>>> On 4/15/26 10:36 PM, David Hildenbrand (Arm) wrote:
>>>>>>> On 4/15/26 12:05, Baolin Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/15/26 5:54 PM, David Hildenbrand (Arm) wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, that makes sense.
>>>>>>>>>>
>>>>>>>>>> However, it’s also possible that the mapping does not support large
>>>>>>>>>> folios, yet anonymous shmem can still allocate large folios via the
>>>>>>>>>> sysfs interfaces. That doesn't make sense, right?
>>>>>>>>>
>>>>>>>>> That's what I am saying: if there could be large folios in there, then
>>>>>>>>> let's tell the world.
>>>>>>>>>
>>>>>>>>> Getting in a scenario where the mapping claims to not support large
>>>>>>>>> folios, but then we have large folios in there is inconsistent, not?
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For the current anonymous shmem (tmpfs is already clear, no questions),
>>>>>>>>>> I don’t think there will be any "will never have/does never allow"
>>>>>>>>>> cases, because it can be changed dynamically via the sysfs interfaces.
>>>>>>>>>
>>>>>>>>> Right. It's about non-anon shmem with huge=off.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If we still want that logic, then for anonymous shmem we can treat it as
>>>>>>>>>> always "might have large folios".
>>>>>>>>
>>>>>>>> OK. To resolve the confusion about 1, the logic should be changed as
>>>>>>>> follows. Does that make sense to you?
>>>>>>>>
>>>>>>>> if (sbinfo->huge || (sb->s_flags & SB_KERNMOUNT))
>>>>>>>>         mapping_set_large_folios(inode->i_mapping);
>>>>>>>
>>>>>>> I think that's better.
>>>>>>
>>>>>> Thanks for your valuable input.
>>>>>>
>>>>>>> But as Willy says, maybe we can just
>>>>>>> unconditionally set it and have it even simpler.
>>>>>>
>>>>>> However, for tmpfs mounts, we should still respect the 'huge=' mount option. See commit 5a90c155defa ("tmpfs: don't enable large folios if not supported").
>>>>>
>>>>> Is it possible to check sbinfo->huge at tmpfs’s folio allocation time, so that
>>>>> even if every tmpfs mapping has mapping_set_large_folios() set, sbinfo->huge can
>>>>> still decide whether huge pages will be allocated for a tmpfs?
>>>>
>>>> Yes, of course. However, the issue isn’t whether tmpfs allows allocating large folios.
>>>>
>>>> The problem commit 5a90c155defa tries to fix is that when tmpfs is mounted with the 'huge=never' option, we will not allocate large folios for it. Then, when writing tmpfs files, generic_perform_write() calls mapping_max_folio_size() to get the chunk size and ends up with an order-9 chunk. However, the tmpfs file is populated only with small folios, resulting in a performance regression.
>>>
>>> IIUC, generic_perform_write() needs to use a small chunk if tmpfs denies huge.
>>> It seems that Kefeng did that in the first try[1]. But willy suggested
>>> the current fix.
>>>
>>> I wonder if we should revisit Kefeng’s first version.
>>>
>>> [1] https://lore.kernel.org/all/20240914140613.2334139-1-wangkefeng.wang@huawei.com/
>>
>> Personally, I still prefer the current fix (commit 5a90c155defa). We should honor the tmpfs mount option. If it explicitly says no large folios, we shouldn’t call mapping_set_large_folios(). Isn’t that more consistent with its semantics?
> 
> Filesystems wishing to turn on large folios in the pagecache should call
> ``mapping_set_large_folios`` when initializing the incore inode.
> 
> You mean tmpfs with huge option set is a FS wishing to turn on large
> folios in the pagecache, otherwise it is a FS wishing not to have large folio
> in the pagecache. tmpfs with different options is seen as different FSes.

What I mean is that tmpfs is somewhat different from other filesystems.
We have tried to make tmpfs behave like other FSes, but differences
remain. For example, see the previous fix to tmpfs’s large folio
allocation policy, commit 69e0a3b49003 ("mm: shmem: fix the strategy for
the tmpfs 'huge=' options").

So the tmpfs-specific 'huge=' mount option is another way it differs
from other filesystems.
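
To make the condition discussed above concrete, here is a minimal userspace
sketch, not kernel code. It models the check quoted earlier in the thread
(sbinfo->huge || (sb->s_flags & SB_KERNMOUNT)) and the effect the mapping's
maximum folio size has on the write chunk. Every name prefixed with model_ or
MODEL_ is hypothetical, the 4 KiB page size is an assumption, and order 9 is
taken from the "order-9" figure mentioned in the thread. The chunk computation
is a simplified stand-in for what generic_perform_write() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these are the kernel's definitions. */
#define MODEL_PAGE_SIZE    4096UL      /* assumed 4 KiB base page */
#define MODEL_MAX_ORDER    9           /* the "order-9" case from the thread */
#define MODEL_SB_KERNMOUNT (1UL << 0)  /* stand-in for SB_KERNMOUNT */

struct model_sb {
	unsigned long s_flags;
	bool huge;              /* stand-in for sbinfo->huge being non-zero */
};

struct model_mapping {
	unsigned int max_order; /* 0 means the mapping claims small folios only */
};

/*
 * Mirrors the proposed condition: advertise large folios when the tmpfs
 * mount asked for them via 'huge=', or when this is the internal
 * (kernel) shmem mount, whose policy can change through sysfs.
 */
static void model_init_mapping(const struct model_sb *sb,
			       struct model_mapping *m)
{
	assert(sb != NULL && m != NULL);
	m->max_order = 0;
	if (sb->huge || (sb->s_flags & MODEL_SB_KERNMOUNT))
		m->max_order = MODEL_MAX_ORDER;
}

/* Largest folio size the mapping claims to support, in bytes. */
static size_t model_max_folio_size(const struct model_mapping *m)
{
	return MODEL_PAGE_SIZE << m->max_order;
}

/*
 * Simplified write-chunk calculation: never cross a chunk boundary,
 * never exceed the bytes remaining in the write.
 */
static size_t model_write_chunk(const struct model_mapping *m,
				size_t pos, size_t remaining)
{
	size_t chunk = model_max_folio_size(m);
	size_t room = chunk - (pos & (chunk - 1));

	return room < remaining ? room : remaining;
}
```

In this model, a mount with huge unset and no MODEL_SB_KERNMOUNT keeps
4 KiB write chunks, which matches the small folios it will actually be
populated with; the other two cases advertise the larger size.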

Thread overview: 17+ messages
2026-04-15  8:22 Baolin Wang
2026-04-15  8:47 ` David Hildenbrand (Arm)
2026-04-15  9:04   ` Baolin Wang
2026-04-15  9:19     ` David Hildenbrand (Arm)
2026-04-15  9:41       ` Baolin Wang
2026-04-15  9:54         ` David Hildenbrand (Arm)
2026-04-15 10:05           ` Baolin Wang
2026-04-15 14:36             ` David Hildenbrand (Arm)
2026-04-16  1:05               ` Baolin Wang
2026-04-16  1:11                 ` Zi Yan
2026-04-16  1:22                   ` Baolin Wang
2026-04-16  1:36                     ` Zi Yan
2026-04-16  1:45                       ` Baolin Wang
2026-04-16  1:52                         ` Zi Yan
2026-04-16  2:08                           ` Baolin Wang [this message]
2026-04-15 13:45 ` Matthew Wilcox
2026-04-16  1:02   ` Baolin Wang
