From: Daniel Gomez <da.gomez@samsung.com>
To: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hughd@google.com" <hughd@google.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"ioworker0@gmail.com" <ioworker0@gmail.com>,
	"wangkefeng.wang@huawei.com" <wangkefeng.wang@huawei.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"21cnbao@gmail.com" <21cnbao@gmail.com>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"shy828301@gmail.com" <shy828301@gmail.com>,
	"ziy@nvidia.com" <ziy@nvidia.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/8] add mTHP support for anonymous shmem
Date: Thu, 9 May 2024 19:18:03 +0000
Message-ID: <npn7qn4lmmyn5ed5zilcgoxr7z2immt6cwidma36nlufq2n56j@uq4maeua5yas>
In-Reply-To: <de9f9d07-6534-419b-86a8-628e13020c1e@redhat.com>

On Wed, May 08, 2024 at 07:03:57PM +0200, David Hildenbrand wrote:
> On 08.05.24 16:28, Daniel Gomez wrote:
> > On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
> > > On 08.05.24 13:39, Daniel Gomez wrote:
> > > > On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
> > > > > Anonymous pages have already been supported for multi-size (mTHP) allocation
> > > > > through commit 19eaf44954df, that can allow THP to be configured through the
> > > > > sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
> > > > > 
> > > > > However, the anonymous shared pages will ignore the anonymous mTHP rule
> > > > > configured through the sysfs interface, and can only use the PMD-mapped
> > > > > THP, that is not reasonable. Many implement anonymous page sharing through
> > > > > mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
> > > > > therefore, users expect to apply a unified mTHP strategy for anonymous pages,
> > > > > also including the anonymous shared pages, in order to enjoy the benefits of
> > > > > mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
> > > > > than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
> > > > > 
> > > > > The primary strategy is similar to supporting anonymous mTHP. Introduce
> > > > > a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> > > > > which can have all the same values as the top-level
> > > > > '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> > > > > additional "inherit" option. By default all sizes will be set to "never"
> > > > > except PMD size, which is set to "inherit". This ensures backward compatibility
> > > > > with the shmem enabled of the top level, meanwhile also allows independent
> > > > > control of shmem enabled for each mTHP.
> > > > 
> > > > I'm trying to understand the adoption of mTHP and how it fits into the adoption
> > > > of (large) folios that the kernel is moving towards. Can you, or anyone involved
> > > > here, explain this? How much do they overlap, and can we benefit from having
> > > > both? Is there any argument against the adoption of large folios here that I
> > > > might have missed?
> > > 
> > > mTHP are implemented using large folios, just like traditional PMD-sized THP
> > > are. (you really should explore the history of mTHP and how it all works
> > > internally)
> > 
> > I'll look deeper into the code. By any chance are any of you going to be at
> > LSFMM this year? I have this session [1] scheduled for Wednesday and it would
> > be nice to get your feedback on it and if you see this working together with
> > mTHP/THP.
> > 
> 
> I'll be around and will attend that session! But note that I am still
> scratching my head what to do with "ordinary" shmem, especially because of
> the weird way shmem behaves in contrast to real files (below). Some input
> from Hugh might be very helpful.

I'm looking forward to meeting you there and getting your feedback!

> 
> Example: you write() to a shmem file and populate a 2M THP. Then, nobody
> touches that file for a long time. There are certainly other mmap() users
> that could better benefit from that THP ... and without swap that THP will
> be trapped there possibly a long time (unless I am missing an important
> piece of shmem THP design :) )? Sure, if we only have THP's it's nice,
> that's just not the reality unfortunately. IIRC, that's one of the reasons
> why THP for shmem can be enabled/disabled. But again, still scratching my
> head ...
> 
> 
> Note that this patch set only tackles anonymous shmem (MAP_SHARED|MAP_ANON),
> which is in 99.999% of all cases only accessed via page tables (memory
> allocated during page faults). I think there are ways to grab the fd
> (/proc/self/fd), but IIRC only corner cases read/write that.
> 
> So in that sense, anonymous shmem (this patch set) behaves mostly like
> ordinary anonymous memory, and likely there is not much overlap with other
> "allocate large folios during read/write/fallocate" as in [1]. swap might
> have an overlap.
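
To make the access pattern above concrete, here is a minimal userspace
sketch of anonymous shmem that is only ever touched through page tables.
The function name touch_anon_shmem() is made up for illustration; whether
the kernel backs the range with large folios depends on the shmem_enabled
settings, which this sketch does not check.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous shared memory and
 * populate it purely via page faults (no read()/write() on any fd).
 * Returns 0 on success, -1 on failure.
 */
int touch_anon_shmem(size_t len)
{
	unsigned char *p;
	int ok;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Every store below faults memory in through the page tables. */
	memset(p, 0xaa, len);
	ok = (p[0] == 0xaa && p[len - 1] == 0xaa);

	munmap(p, len);
	return ok ? 0 : -1;
}
```

Calling touch_anon_shmem(4UL << 20), for example, would fault in a 4M
range.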
> 
> 
> The real confusion begins when we have ordinary shmem: some users never mmap
> it and only read/write, some users never read/write it and only mmap it and
> some (less common?) users do both.
> 
> And shmem really is special: it looks like "just another file", but
> memory-consumption and reclaim wise it behaves just like anonymous memory.
> It might be swappable ("usually very limited backing disk space available")
> or it might not.
> 
> In a subthread here we are discussing what to do with that special
> "shmem_enabled = force" mode ... and it's all complicated I think.
> 
> > [1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/
> > 
> > > 
> > > The biggest challenge with memory that cannot be evicted on memory pressure
> > > to be reclaimed (in contrast to your ordinary files in the pagecache) is
> > > memory waste, well, and placement of large chunks of memory in general,
> > > during page faults.
> > > 
> > > In the worst case (no swap), you allocate a large chunk of memory once and
> > > it will stick around until freed: no reclaim of that memory.
> > 
> > I can see that path being triggered by some fstests but only for THP (where we
> > can actually reclaim memory).
> 
> Is that when we punch-hole a partial THP and split it? I'd be interested in
> what that test does.

The reclaim path I'm referring to is triggered when we reach max capacity
(-ENOSPC) in shmem_alloc_and_add_folio(). We reclaim space by splitting large
folios (regardless of whether they are dirty or uptodate).
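
As a rough illustration of that split-and-retry idea, here is a toy
userspace model. It is not the kernel code: struct toy_fs, toy_charge(),
toy_split_one() and toy_alloc_huge() are all hypothetical names, and the
model assumes 512 pages per large folio (2M with 4K pages) and that
splitting a folio releases the pages that hold no data.

```c
#include <errno.h>
#include <stddef.h>

#define PAGES_PER_HUGE 512	/* assumption: 2M folio, 4K base pages */

/* Toy accounting state, loosely modelled on a size-limited tmpfs mount. */
struct toy_fs {
	size_t max_pages;	/* size limit of the mount, in pages */
	size_t charged_pages;	/* pages currently accounted */
	size_t huge_folios;	/* large folios not yet split */
	size_t used_per_huge;	/* pages with data per large folio (<= 512) */
};

/* Account for pages, or fail with -ENOSPC if the limit would be exceeded. */
static int toy_charge(struct toy_fs *fs, size_t pages)
{
	if (fs->charged_pages + pages > fs->max_pages)
		return -ENOSPC;
	fs->charged_pages += pages;
	return 0;
}

/* Split one large folio and release its unused pages; returns pages freed. */
static size_t toy_split_one(struct toy_fs *fs)
{
	size_t freed;

	if (fs->huge_folios == 0)
		return 0;
	freed = PAGES_PER_HUGE - fs->used_per_huge;
	fs->huge_folios--;
	fs->charged_pages -= freed;
	return freed;
}

/* Try to allocate a large folio; on -ENOSPC, split existing ones and retry. */
int toy_alloc_huge(struct toy_fs *fs)
{
	int ret = toy_charge(fs, PAGES_PER_HUGE);

	while (ret == -ENOSPC) {
		if (toy_split_one(fs) == 0)
			return -ENOSPC;	/* nothing left to reclaim */
		ret = toy_charge(fs, PAGES_PER_HUGE);
	}
	fs->huge_folios++;
	return 0;
}
```

In this model a mount limited to 1024 pages that already holds two large
folios with one used page each can still satisfy a third allocation, but
only after both existing folios have been split.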

One of the tests that hits this path is generic/100 (with the huge option
enabled):
- First, it creates a directory structure in $TEMP_DIR (/tmp). The directory
  size is around 26M.
- Then, it tars it up into $TEMP_DIR/temp.tar.
- Finally, it untars the archive into $TEST_DIR (/media/test, which is the
  huge tmpfs mount point).

What happens in generic/100 in the huge=always case is that you fill up the
dedicated space very quickly (this is 1G in xfstests for tmpfs) and then you
start reclaiming.
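
A back-of-the-envelope sketch of why the space fills up so quickly,
assuming a 2M PMD-sized folio and the worst case where every small file
written ends up charged a whole large folio (both are assumptions, and the
helper names are made up):

```c
/* Hypothetical helper: how many large folios fit in a mount of fs_bytes. */
unsigned long long huge_folios_that_fit(unsigned long long fs_bytes,
					unsigned long long folio_bytes)
{
	return fs_bytes / folio_bytes;
}

/* Hypothetical helper: worst-case bytes charged for nfiles small files. */
unsigned long long worst_case_charge(unsigned long long nfiles,
				     unsigned long long folio_bytes)
{
	return nfiles * folio_bytes;
}
```

With those assumptions, huge_folios_that_fit(1ULL << 30, 2ULL << 20) is
512, so roughly 512 small files are enough to hit the 1G limit even though
the tree itself is only around 26M.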

> 
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 
