From: David Hildenbrand <david@redhat.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
	Daniel Gomez <d@kruces.com>, Daniel Gomez <da.gomez@samsung.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>,
	akpm@linux-foundation.org, hughd@google.com,
	wangkefeng.wang@huawei.com, 21cnbao@gmail.com,
	ryan.roberts@arm.com, ioworker0@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC PATCH v3 0/4] Support large folios for tmpfs
Date: Thu, 31 Oct 2024 09:53:34 +0100
Message-ID: <d758a4f4-e0e6-4a78-beb4-e513de229310@redhat.com>
In-Reply-To: <0b7671fd-3fea-4086-8a85-fe063a62fa80@linux.alibaba.com>

>>>
>>> If we don't want to go with the shmem_enabled toggles, we should
>>> probably still extend the documentation to cover "all THP sizes", like
>>> we did elsewhere.
>>>
>>> huge=never: no THPs of any size
>>> huge=always: THPs of any size (fault/write/etc)
>>> huge=fadvise: like "always" but only with fadvise/madvise
>>> huge=within_size: like "fadvise" but respect i_size
>>
>> Thinking some more about that over the weekend, this is likely the way
>> to go, paired with conditionally changing the default to
>> always/within_size. I suggest a kconfig option for that.
> 
> I am still worried about adding a new kconfig option, which might
> complicate the tmpfs controls further.

Why exactly?

If we are changing a default similar to 
CONFIG_TRANSPARENT_HUGEPAGE_NEVER -> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS, 
it would make perfect sense to give people building a kernel control 
over that.

If we want to support this feature in a distro kernel like RHEL we'll 
have to leave the default unmodified. Otherwise I see no way (excluding 
downstream-only hacks) to backport this into distro kernels.

> 
>> That should probably do as a first shot; I assume people will want more
>> control over which size to use, especially during page faults, but that
>> can likely be added later.

I know, it puts you in a bad position because there are different 
opinions floating around. But let's try to find something that is 
reasonable and still acceptable. And let's hope that Hugh will voice an 
opinion :D

> 
> After some discussions, I think the first step is to achieve two goals:
> 1) Try to make tmpfs use large folios like other file systems, that
> means we should avoid adding more complex control options (per Matthew).
> 2) Still need maintain compatibility with the 'huge=' mount option (per
> Kirill), as I also remembered we have customers who use
> 'huge=within_size' to allocate THPs for better performance.

> 
> Based on these considerations, my first step is to neither add a new
> 'huge=' option parameter nor introduce the mTHP interfaces control for
> tmpfs, but rather to change the default huge allocation behavior for
> tmpfs. That is to say, when 'huge=' option is not configured, we will
> allow the huge folios allocation based on the write size. As a result,
> the behavior of huge pages for tmpfs will change as follows:
> no 'huge=' set: can allocate any size huge folios based on write size
> huge=never: no any size huge folios
> huge=always: only PMD sized THP allocation as before
> huge=fadvise: like "always" but only with fadvise/madvise
> huge=within_size: like "fadvise" but respect i_size

I don't like that:

(a) there is no way to explicitly enable/name that new behavior.
(b) "always" etc. are only concerned about PMDs.


So again, I suggest:

huge=never: No THPs of any size
huge=always: THPs of any size
huge=fadvise: like "always" but only with fadvise/madvise 
huge=within_size: like "fadvise" but respect i_size

"huge=" default depends on a Kconfig option.

With that we:

(1) Maximize the cases where we will use large folios of any size
     (which Willy cares about).
(2) Have a way to disable them completely (which I care about).
(3) Allow distros to keep the default unchanged.
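
To illustrate the Kconfig-dependent default, here is a rough userspace 
sketch. It is not kernel code: resolve_huge_policy(), the enum, and the 
TMPFS_HUGE_DEFAULT macro (standing in for whatever the Kconfig option 
would select) are all names made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only; not kernel identifiers. */
enum huge_policy { HUGE_NEVER, HUGE_ALWAYS, HUGE_FADVISE, HUGE_WITHIN_SIZE };

/* Stand-in for the Kconfig choice; a distro would keep HUGE_NEVER here. */
#ifndef TMPFS_HUGE_DEFAULT
#define TMPFS_HUGE_DEFAULT HUGE_NEVER
#endif

/*
 * Resolve the effective policy for one mount.
 * opt is the value given as "huge=<opt>", or NULL if the option is absent.
 * Returns 0 on success, -1 if the value is not recognized.
 */
static int resolve_huge_policy(const char *opt, enum huge_policy *out)
{
	static const struct {
		const char *name;
		enum huge_policy policy;
	} tbl[] = {
		{ "never",       HUGE_NEVER },
		{ "always",      HUGE_ALWAYS },
		{ "fadvise",     HUGE_FADVISE },
		{ "within_size", HUGE_WITHIN_SIZE },
	};
	size_t i;

	/* No "huge=" given: fall back to the build-time default. */
	if (!opt) {
		*out = TMPFS_HUGE_DEFAULT;
		return 0;
	}

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (strcmp(opt, tbl[i].name) == 0) {
			*out = tbl[i].policy;
			return 0;
		}
	}
	return -1;
}
```

The point is only that an explicit "huge=" always wins, and that the 
behavior without it is a build-time decision rather than something 
hard-coded.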

Likely, for now we will only try allocating PMD-sized THPs during page 
faults, and allocate different sizes only during write(). So the effect 
for many use cases (VMs, DBs) that primarily mmap() tmpfs files will be 
completely unchanged even with "huge=always".
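
As a rough sketch of that split between the two paths (userspace C, not 
kernel code): the function name, the enum, and the constants are made up 
for this example, and it assumes 4 KiB base pages with a 2 MiB PMD. The 
rule used for write(), "largest order whose size does not exceed the 
write length", is also only an assumption about how the size would be 
derived.

```c
#include <stddef.h>

#define PAGE_SHIFT	12	/* assumption: 4 KiB base pages */
#define PMD_ORDER	9	/* assumption: 2 MiB PMD, e.g. x86-64 */

/* Hypothetical: which path is asking for a folio. */
enum alloc_path { PATH_FAULT, PATH_WRITE };

/*
 * Highest folio order to try first.
 * Page faults only try PMD-sized folios; write() derives the order
 * from the write length, capped at the PMD order.
 */
static unsigned int highest_order(enum alloc_path path, size_t write_len)
{
	unsigned int order = 0;

	if (path == PATH_FAULT)
		return PMD_ORDER;

	/* Grow the order while the next larger folio still fits the write. */
	while (order < PMD_ORDER &&
	       ((size_t)1 << (PAGE_SHIFT + order + 1)) <= write_len)
		order++;

	return order;
}
```

With these assumptions, a 64 KiB write() would start at order 4, while a 
page fault on an mmap()ed file would try order 9 regardless of the access 
size.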

It will get trickier once we change that behavior as well, but that's 
likely something to figure out on a different day, if it turns out to 
be a real problem :)


I really preferred using the sysfs toggles (as discussed with Hugh in 
the meeting back then), but I can also understand why we at least want 
to try making tmpfs behave more like other file systems. But I'm a bit 
more careful to not ignore the cases where it really isn't like any 
other file system.

If we start making PMD-sized THPs special in any non-configurable way, 
then we are effectively *worse* off than if we allowed configuring them 
properly. If someone says "but we only want PMD-sized ones", the 
next one will say "but we only want cont-pte-sized ones", and then we 
should provide an option to control the actual sizes to use, 
in some way. But let's see if that is even required.

-- 
Cheers,

David / dhildenb


