linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Yin Fengwei <fengwei.yin@intel.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org, jack@suse.cz,
	hughd@google.com, kirill.shutemov@linux.intel.com,
	mhocko@suse.com, ak@linux.intel.com, aarcange@redhat.com,
	npiggin@gmail.com, mgorman@techsingularity.net,
	willy@infradead.org, rppt@kernel.org, dave.hansen@intel.com,
	ying.huang@intel.com, tim.c.chen@intel.com
Subject: Re: [RFC PATCH 0/4] Multiple consecutive page for anonymous mapping
Date: Mon, 9 Jan 2023 18:33:09 +0100
Message-ID: <a79d773c-640a-860c-0d3c-6e1267f39165@redhat.com>
In-Reply-To: <20230109072232.2398464-1-fengwei.yin@intel.com>

On 09.01.23 08:22, Yin Fengwei wrote:
> In a nutshell:  4k is too small and 2M is too big.  We started
> asking ourselves whether there was something in the middle that
> we could do.  This series shows what that middle ground might
> look like.  It provides some of the benefits of THP while
> eliminating some of the downsides.
> 
> This series uses "multiple consecutive pages" (mcpages) of
> between 8K and 2M of base pages for anonymous user space mappings.
> This will lead to less internal fragmentation versus 2M mappings
> and thus less memory consumption and wasted CPU time zeroing
> memory which will never be used.

Hi,

What I understand is that this is some form of faultaround for anonymous
memory, with the special case that we try to allocate the pages
consecutively.

Some thoughts:

(1) Faultaround might be unexpected for some workloads and increase
     memory consumption unnecessarily.

Yes, something like that can happen with THP BUT

(a) THP can be disabled or is frequently only enabled for madvised
     regions -- for example, exactly for this reason.
(b) Some workloads (especially memory ballooning) rely on memory not
     suddenly re-appearing after MADV_DONTNEED. This works even with THP,
     because the 4k MADV_DONTNEED will first PTE-map the THP. Because
     there is a PTE page table, we won't suddenly get a THP populated
     again (unless khugepaged is configured to fill holes).
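
To make (b) concrete, here is a minimal userspace sketch of the
expectation such workloads have. It discards two adjacent 4k pages with
MADV_DONTNEED, faults only the right-hand one back in, and then asks
mincore() whether the left-hand one came back on its own. The helper name
and the result struct are made up for illustration; only mmap(),
madvise(), mincore() and sysconf() are real interfaces.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct discard_result {
    int resident;             /* 1: page reported resident, 0: not, -1: error */
    unsigned char first_byte; /* first byte of the discarded page when read back */
};

/*
 * Populate `npages` of anonymous memory, discard page `victim` and its
 * right-hand neighbor, fault only the neighbor back in, and report
 * whether `victim` re-appeared without being touched.
 */
struct discard_result discard_then_touch_neighbor(size_t npages, size_t victim)
{
    struct discard_result res = { -1, 0xff };
    long psz = sysconf(_SC_PAGESIZE);

    assert(victim + 1 < npages);
    if (psz <= 0)
        return res;

    size_t len = npages * (size_t)psz;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return res;

    memset(p, 0xaa, len);                 /* populate every page */

    unsigned char *hole = p + victim * (size_t)psz;
    if (madvise(hole, 2 * (size_t)psz, MADV_DONTNEED) == 0) {
        hole[psz] = 1;                    /* fault only the neighbor back in */

        unsigned char vec = 0;
        if (mincore(hole, (size_t)psz, &vec) == 0)
            res.resident = vec & 1;       /* did `victim` come back untouched? */

        res.first_byte = hole[0];         /* this read faults in zeroed memory */
    }

    munmap(p, len);
    return res;
}
```

A balloon-style user would expect resident == 0 here. With faultaround
for anonymous memory, touching the neighbor could populate the victim as
well, which is exactly the surprise described above. The old 0xaa
contents must never come back either way.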


I strongly assume we will need something similar to force-disable, 
selectively-enable etc.


(2) This steals consecutive pages to immediately split them up

I know, everybody thinks it might be valuable for their use case to grab
all higher-order pages :) It will be "fun" once all these cases start
competing. TBH, splitting them up again immediately smells like being
the lowest priority among all higher-order users.


(3) All effort will be lost once page compaction gets active, compacts,
     and simply migrates to random 4k pages. This is most probably the
     biggest "issue" of the whole approach AFAIKS: it's only temporary
     because there is no notion of these pages belonging together
     anymore.
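
As a rough illustration of (3), a sketch that measures how much physical
contiguity is left in a range, given the page frame numbers backing
consecutive virtual pages. The function name is made up. How the PFNs
are obtained is left out: /proc/self/pagemap is one possible source, but
it only exposes PFNs to privileged readers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Length of the longest run of physically consecutive page frames,
 * where pfns[i] is the frame backing the i-th virtual page of a range.
 */
size_t longest_contiguous_run(const uint64_t *pfns, size_t n)
{
    assert(pfns != NULL || n == 0);
    if (n == 0)
        return 0;

    size_t best = 1, cur = 1;

    for (size_t i = 1; i < n; i++) {
        /* The run continues only if this frame directly follows the previous one. */
        cur = (pfns[i] == pfns[i - 1] + 1) ? cur + 1 : 1;
        if (cur > best)
            best = cur;
    }
    return best;
}
```

Right after the fault, a 16K allocation would show a run of 4. Once
compaction has migrated the pages individually, nothing ties them
together, so the run can drop to 1 and nobody would notice.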

> 
> In the implementation, we allocate high order page with order of
> mcpage (e.g., order 2 for 16KB mcpage). This makes sure the
> physical contiguous memory is used and benefit sequential memory
> access latency.
> 
> Then split the high order page. By doing this, the sub-page of
> mcpage is just 4K normal page. The current kernel page
> management is applied to "mc" pages without any changes. Batching
> page faults is allowed with mcpage and reduce page faults number.
> 
> There are costs with mcpage. Besides no TLB benefit THP brings, it
> increases memory consumption and latency of allocation page
> comparing to 4K base page.
> 
> This series is the first step of mcpage. Future work can
> enable mcpage for more components like page cache, swapping etc.
> Finally, most pages in the system will be allocated/freed/reclaimed
> with mcpage order.
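
For reference, the size arithmetic quoted above can be sketched in
plain C. Both helpers are made up for illustration and are not taken
from the series: one derives the allocation order from a size (16KB with
4K base pages gives order 2), the other estimates how many bytes get
allocated and zeroed without ever being used when faults are served in
units of a given size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Order such that (page_size << order) == size, or -1 if size is not a
 * power-of-two multiple of page_size.
 */
int order_for_size(size_t size, size_t page_size)
{
    if (page_size == 0 || size < page_size || size % page_size)
        return -1;

    size_t n = size / page_size;

    if (n & (n - 1))            /* not a power of two */
        return -1;

    int order = 0;

    while (n > 1) {
        n >>= 1;
        order++;
    }
    return order;
}

/*
 * Bytes that are allocated and zeroed but never used when `touched`
 * bytes at the start of a range are served in units of `unit` bytes.
 */
size_t untouched_bytes(size_t touched, size_t unit)
{
    assert(unit != 0);
    size_t rounded = (touched + unit - 1) / unit * unit;

    return rounded - touched;
}
```

With 20K actually touched, a 2M unit leaves 2028K unused, a 16K unit
leaves 12K unused, and 4K base pages leave nothing unused.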

I think avoiding new, hard-to-get terminology ("mcpage") might be
better. I know, everybody wants to give their child a name, but the name
is not really future proof: "multiple consecutive pages" might at one
point just be a folio.

I'd summarize the ideas as "faultaround" whereby we try optimizing for 
locality.

Note that a similar (but different) concept already exists (hidden) for
hugetlb, e.g., on arm64. The feature is called "cont-pte" -- a sequence
of PTEs that logically map a hugetlb page.
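
To sketch what such a sequence of PTEs has to satisfy: the check below
assumes a 4K granule, where 16 PTEs form one 64K block. The constants
and the function are made up for illustration and are not the kernel's
definitions. The point is that both the virtual address and the physical
frames must be naturally aligned and consecutive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 4K base pages, 16 PTEs per block. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_CONT_PTES 16u
#define SKETCH_CONT_SIZE (SKETCH_PAGE_SIZE * SKETCH_CONT_PTES)

/*
 * Can the first SKETCH_CONT_PTES pages starting at `va`, backed by
 * pfns[0..n-1], be described by one contiguous block?
 */
bool cont_block_eligible(uint64_t va, const uint64_t *pfns, size_t n)
{
    if (n < SKETCH_CONT_PTES)
        return false;
    if (va % SKETCH_CONT_SIZE)          /* virtual start must be block aligned */
        return false;
    if (pfns[0] % SKETCH_CONT_PTES)     /* physical start must be block aligned */
        return false;

    for (size_t i = 1; i < SKETCH_CONT_PTES; i++) {
        if (pfns[i] != pfns[0] + i)     /* frames must be consecutive */
            return false;
    }
    return true;
}
```

A single migrated 4k page in the middle is enough to make the whole
block ineligible.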

-- 
Thanks,

David / dhildenb



