linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Daniel Gomez <da.gomez@samsung.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hughd@google.com" <hughd@google.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"ioworker0@gmail.com" <ioworker0@gmail.com>,
	"wangkefeng.wang@huawei.com" <wangkefeng.wang@huawei.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"21cnbao@gmail.com" <21cnbao@gmail.com>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"shy828301@gmail.com" <shy828301@gmail.com>,
	"ziy@nvidia.com" <ziy@nvidia.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/8] add mTHP support for anonymous shmem
Date: Wed, 8 May 2024 19:03:57 +0200	[thread overview]
Message-ID: <de9f9d07-6534-419b-86a8-628e13020c1e@redhat.com> (raw)
In-Reply-To: <qye6lbmybiivdrr2vtlwgzzqelp7zoekezgiwo6mpirrl2576z@voszmqjbnm2q>

On 08.05.24 16:28, Daniel Gomez wrote:
> On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
>> On 08.05.24 13:39, Daniel Gomez wrote:
>>> On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
>>>> Anonymous pages have already been supported for multi-size (mTHP) allocation
>>>> through commit 19eaf44954df, that can allow THP to be configured through the
>>>> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>>>
>>>> However, the anonymous shared pages will ignore the anonymous mTHP rule
>>>> configured through the sysfs interface, and can only use the PMD-mapped
>>>> THP, that is not reasonable. Many implement anonymous page sharing through
>>>> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
>>>> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
>>>> also including the anonymous shared pages, in order to enjoy the benefits of
>>>> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
>>>> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
>>>>
>>>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>>>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>>>> which can have all the same values as the top-level
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
>>>> additional "inherit" option. By default all sizes will be set to "never"
>>>> except PMD size, which is set to "inherit". This ensures backward compatibility
>>>> with the shmem enabled of the top level, meanwhile also allows independent
>>>> control of shmem enabled for each mTHP.
>>>
>>> I'm trying to understand the adoption of mTHP and how it fits into the adoption
>>> of (large) folios that the kernel is moving towards. Can you, or anyone involved
>>> here, explain this? How much do they overlap, and can we benefit from having
>>> both? Is there any argument against the adoption of large folios here that I
>>> might have missed?
>>
>> mTHP are implemented using large folios, just like traditional PMD-sized THP
>> are. (you really should explore the history of mTHP and how it all works
>> internally)
> 
> I'll check more in deep the code. By any chance are any of you going to be at
> LSFMM this year? I have this session [1] scheduled for Wednesday and it would
> be nice to get your feedback on it and if you see this working together with
> mTHP/THP.
>

I'll be around and will attend that session! But note that I am still 
scratching my head what to do with "ordinary" shmem, especially because 
of the weird way shmem behaves in contrast to real files (below). Some 
input from Hugh might be very helpful.

Example: you write() to a shmem file and populate a 2M THP. Then, nobody 
touches that file for a long time. There are certainly other mmap() 
users that could better benefit from that THP ... and without swap that 
THP will be trapped there possibly a long time (unless I am missing an 
important piece of shmem THP design :) )? Sure, if we only have THP's 
it's nice, that's just not the reality unfortunately. IIRC, that's one 
of the reasons why THP for shmem can be enabled/disabled. But again, 
still scratching my head ...


Note that this patch set only tackles anonymous shmem 
(MAP_SHARED|MAP_ANON), which is in 99.999% of all cases only accessed 
via page tables (memory allocated during page faults). I think there are 
ways to grab the fd (/proc/self/fd), but IIRC only corner cases 
read/write that.

So in that sense, anonymous shmem (this patch set) behaves mostly like 
ordinary anonymous memory, and likely there is not much overlap with 
other "allocate large folios during read/write/fallocate" as in [1]. 
swap might have an overlap.


The real confusion begins when we have ordinary shmem: some users never 
mmap it and only read/write, some users never read/write it and only 
mmap it and some (less common?) users do both.

And shmem really is special: it looks like "just another file", but 
memory-consumption and reclaim wise it behaves just like anonymous 
memory. It might be swappable ("usually very limited backing disk space 
available") or it might not.

In a subthread here we are discussing what to do with that special 
"shmem_enabled = force" mode ... and it's all complicated I think.

> [1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/
> 
>>
>> The biggest challenge with memory that cannot be evicted on memory pressure
>> to be reclaimed (in contrast to your ordinary files in the pagecache) is
>> memory waste, well, and placement of large chunks of memory in general,
>> during page faults.
>>
>> In the worst case (no swap), you allocate a large chunk of memory once and
>> it will stick around until freed: no reclaim of that memory.
> 
> I can see that path being triggered by some fstests but only for THP (where we
> can actually reclaim memory).

Is that when we punch-hole a partial THP and split it? I'd be interested 
in what that test does.



-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-05-08 17:04 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-06  8:46 Baolin Wang
2024-05-06  8:46 ` [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config Baolin Wang
2024-05-07 10:21   ` Ryan Roberts
2024-05-08  2:13     ` Baolin Wang
2024-05-08  9:06       ` Ryan Roberts
2024-05-08  9:40         ` Baolin Wang
2024-05-06  8:46 ` [PATCH 2/8] mm: memory: extend finish_fault() to support large folio Baolin Wang
2024-05-07 10:37   ` Ryan Roberts
2024-05-08  3:44     ` Baolin Wang
2024-05-08  7:15       ` David Hildenbrand
2024-05-08  9:06         ` Baolin Wang
2024-05-08  8:53       ` Ryan Roberts
2024-05-08  9:31         ` Baolin Wang
2024-05-08 10:47           ` Ryan Roberts
2024-05-09  1:10             ` Baolin Wang
2024-05-06  8:46 ` [PATCH 3/8] mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio() Baolin Wang
2024-05-06  8:46 ` [PATCH 4/8] mm: shmem: add THP validation for PMD-mapped THP related statistics Baolin Wang
2024-05-06  8:46 ` [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem Baolin Wang
2024-05-07 10:52   ` Ryan Roberts
2024-05-08  4:45     ` Baolin Wang
2024-05-08  7:08       ` David Hildenbrand
2024-05-08  7:12         ` David Hildenbrand
2024-05-08  9:02           ` Ryan Roberts
2024-05-08  9:56             ` Baolin Wang
2024-05-08 10:48               ` Ryan Roberts
2024-05-08 12:02             ` David Hildenbrand
2024-05-08 12:10               ` David Hildenbrand
2024-05-08 12:43                 ` Ryan Roberts
2024-05-08 12:44                   ` Ryan Roberts
2024-05-08 12:45                   ` David Hildenbrand
2024-05-08 12:54                     ` Ryan Roberts
2024-05-08 13:07                       ` David Hildenbrand
2024-05-08 13:44                         ` Ryan Roberts
2024-05-06  8:46 ` [PATCH 6/8] mm: shmem: add mTHP support " Baolin Wang
2024-05-07 10:46   ` kernel test robot
2024-05-08  6:03     ` Baolin Wang
2024-05-06  8:46 ` [PATCH 7/8] mm: shmem: add mTHP size alignment in shmem_get_unmapped_area Baolin Wang
2024-05-06  8:46 ` [PATCH 8/8] mm: shmem: add mTHP counters for anonymous shmem Baolin Wang
2024-05-06 10:54 ` [PATCH 0/8] add mTHP support " Lance Yang
2024-05-07  1:47   ` Baolin Wang
2024-05-07  6:50     ` Lance Yang
2024-05-07 10:20 ` Ryan Roberts
2024-05-08  5:45   ` Baolin Wang
     [not found] ` <CGME20240508113934eucas1p13a3972f3f9955365f40155e084a7c7d5@eucas1p1.samsung.com>
2024-05-08 11:39   ` Daniel Gomez
2024-05-08 11:58     ` David Hildenbrand
2024-05-08 14:28       ` Daniel Gomez
2024-05-08 17:03         ` David Hildenbrand [this message]
2024-05-09 19:18           ` Daniel Gomez
2024-05-09  3:08         ` Baolin Wang
2024-05-08 19:23       ` Luis Chamberlain
2024-05-09 17:48         ` David Hildenbrand
2024-05-10 18:53           ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de9f9d07-6534-419b-86a8-628e13020c1e@redhat.com \
    --to=david@redhat.com \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=da.gomez@samsung.com \
    --cc=hughd@google.com \
    --cc=ioworker0@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox