linux-mm.kvack.org archive mirror
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, jack@suse.cz,
	hughd@google.com, kirill.shutemov@linux.intel.com,
	mhocko@suse.com, ak@linux.intel.com, aarcange@redhat.com,
	npiggin@gmail.com, mgorman@techsingularity.net,
	willy@infradead.org, rppt@kernel.org, dave.hansen@intel.com,
	ying.huang@intel.com, tim.c.chen@intel.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH 0/4] Multiple consecutive page for anonymous mapping
Date: Mon,  9 Jan 2023 15:22:28 +0800
Message-ID: <20230109072232.2398464-1-fengwei.yin@intel.com>

In a nutshell:  4k is too small and 2M is too big.  We started
asking ourselves whether there was something in the middle that
we could do.  This series shows what that middle ground might
look like.  It provides some of the benefits of THP while
eliminating some of the downsides.

This series uses "multiple consecutive pages" (mcpages), between 8K
and 2M in size and built from base pages, for anonymous user space
mappings. This leads to less internal fragmentation than 2M mappings,
and thus to less memory consumption and less CPU time wasted zeroing
memory that will never be used.
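
To make the fragmentation argument concrete, here is a small userspace
sketch (not code from the series). It assumes a region that is touched
contiguously from an aligned start; the helper names are hypothetical.

```c
#include <assert.h>

/* Round len up to a multiple of granule (granule must be a power of two). */
static unsigned long round_up_pow2(unsigned long len, unsigned long granule)
{
	assert(granule != 0 && (granule & (granule - 1)) == 0);
	return (len + granule - 1) & ~(granule - 1);
}

/*
 * Bytes that are allocated and zeroed but never used when `used` bytes
 * are backed at `granule` granularity (illustrative model only).
 */
static unsigned long wasted_bytes(unsigned long used, unsigned long granule)
{
	return round_up_pow2(used, granule) - used;
}
```

For a 20K region under this model, 4K pages waste nothing, a 16K mcpage
wastes 12K, and a 2M mapping wastes 2028K.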

In the implementation, we allocate a high-order page with the order
of the mcpage (e.g., order 2 for a 16KB mcpage). This ensures that
physically contiguous memory is used, which benefits sequential
memory access latency.
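
The size-to-order relationship can be sketched in userspace as below.
This is an illustration only, assuming 4K base pages; size_to_order()
is a hypothetical name, playing a role similar to the kernel's
get_order() helper.

```c
#include <assert.h>

#define BASE_PAGE_SHIFT 12	/* assumption: 4K base pages */

/*
 * Return the allocation order for a power-of-two size, i.e. the n such
 * that size == (4K << n). Hypothetical helper, not from the series.
 */
static unsigned int size_to_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size >= (1UL << BASE_PAGE_SHIFT));
	assert((size & (size - 1)) == 0);

	while ((1UL << (BASE_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}
```

With these assumptions, 8K maps to order 1, 16K to order 2, and 2M to
order 9.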

The high-order page is then split. After the split, each sub-page of
the mcpage is just a normal 4K page, so the current kernel page
management applies to "mc" pages without any changes. mcpage also
allows page faults to be batched, which reduces the number of page
faults.
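
A rough sketch of what fault batching implies, assuming that one fault
populates the whole mcpage-aligned range around the faulting address.
That behaviour and all EX_* names are assumptions for illustration,
not taken from the patches.

```c
#include <assert.h>

#define EX_PAGE_SHIFT	12	/* assumption: 4K base pages */
#define EX_MCPAGE_ORDER	2	/* assumption: 16K mcpage */
#define EX_MCPAGE_SHIFT	(EX_PAGE_SHIFT + EX_MCPAGE_ORDER)
#define EX_MCPAGE_SIZE	(1UL << EX_MCPAGE_SHIFT)
#define EX_MCPAGE_MASK	(~(EX_MCPAGE_SIZE - 1))

/* Start of the mcpage-sized range that contains the faulting address. */
static unsigned long ex_fault_range_start(unsigned long fault_addr)
{
	return fault_addr & EX_MCPAGE_MASK;
}

/*
 * Number of faults needed to populate len bytes when each fault maps
 * one granule (granule must be a power of two).
 */
static unsigned long ex_faults_for_range(unsigned long len,
					 unsigned long granule)
{
	assert(granule != 0 && (granule & (granule - 1)) == 0);
	return (len + granule - 1) / granule;
}
```

Under this model, touching 1M sequentially takes 256 faults with 4K
pages and 64 faults with a 16K mcpage.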

There are costs with mcpage. Besides lacking the TLB benefit that THP
brings, it increases memory consumption and page allocation latency
compared to 4K base pages.

This series is the first step for mcpage. Future work can enable
mcpage for more components such as the page cache, swapping, etc.
Eventually, most pages in the system would be allocated/freed/reclaimed
at mcpage order.

The series is structured as follows:
Patch 1 adds the mcpage size-related definitions and the Kconfig entry
Patch 2 is the main change. It hooks into the anonymous page fault
        handler and applies mcpage to anonymous mappings
Patch 3 adds some mcpage statistics
Patch 4 is specific to x86_64 and aligns the mmap start address to
        the mcpage size
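
For the x86_64 patch that aligns the mmap start address, the alignment
itself can be sketched as below. Whether the series rounds up or down,
and the EX_* names, are assumptions made for illustration only.

```c
#include <assert.h>

#define EX_MCPAGE_SIZE	(1UL << 14)	/* assumption: 16K mcpage */
#define EX_MCPAGE_MASK	(~(EX_MCPAGE_SIZE - 1))

/* Non-zero if addr already sits on an mcpage boundary. */
static int ex_mcpage_aligned(unsigned long addr)
{
	return (addr & ~EX_MCPAGE_MASK) == 0;
}

/* Round addr up to the next mcpage boundary (no overflow handling). */
static unsigned long ex_mcpage_align_up(unsigned long addr)
{
	return (addr + EX_MCPAGE_SIZE - 1) & EX_MCPAGE_MASK;
}
```

An address that is only 4K aligned, such as 0x10001000, would move to
0x10004000, while an already aligned address is left unchanged.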

The overall code change is quite straightforward. What I would most
like to hear is whether this is the right direction to pursue
further.

This series does not leverage compound pages.  This means that
normal kernel code that encounters an 'mcpage' region does not
need to do anything special.  It also does not leverage folios,
although trying to leverage folios is something that we would
like to explore.  We would welcome input on how that might
happen.

Some performance data were collected with a 16K mcpage size and are
shown in patches 2/4 and 4/4. If you have other workloads and would
like to know the impact, just let me know. I can set up the
environment and run the tests.


Yin Fengwei (4):
  mcpage: add size/mask/shift definition for multiple consecutive page
  mcpage: anon page: Use mcpage for anonymous mapping
  mcpage: add vmstat counters for mcpages
  mcpage: get_unmapped_area return mcpage size aligned addr

 arch/x86/kernel/sys_x86_64.c  |   8 ++
 include/linux/gfp.h           |   5 ++
 include/linux/mcpage_mm.h     |  35 +++++++++
 include/linux/mm_types.h      |  11 +++
 include/linux/vm_event_item.h |  10 +++
 mm/Kconfig                    |  19 +++++
 mm/Makefile                   |   1 +
 mm/mcpage_memory.c            | 140 ++++++++++++++++++++++++++++++++++
 mm/memory.c                   |  12 +++
 mm/mempolicy.c                |  51 +++++++++++++
 mm/vmstat.c                   |   7 ++
 11 files changed, 299 insertions(+)
 create mode 100644 include/linux/mcpage_mm.h
 create mode 100644 mm/mcpage_memory.c


base-commit: b7bfaa761d760e72a969d116517eaa12e404c262
-- 
2.30.2




Thread overview: 17+ messages
2023-01-09  7:22 Yin Fengwei [this message]
2023-01-09  7:22 ` [RFC PATCH 1/4] mcpage: add size/mask/shift definition for multiple consecutive page Yin Fengwei
2023-01-09 13:24   ` Matthew Wilcox
2023-01-09 16:30     ` Dave Hansen
2023-01-09 17:01       ` Matthew Wilcox
2023-01-10  2:53     ` Yin, Fengwei
2023-01-09  7:22 ` [RFC PATCH 2/4] mcpage: anon page: Use mcpage for anonymous mapping Yin Fengwei
2023-01-09  7:22 ` [RFC PATCH 3/4] mcpage: add vmstat counters for mcpages Yin Fengwei
2023-01-09  7:22 ` [RFC PATCH 4/4] mcpage: get_unmapped_area return mcpage size aligned addr Yin Fengwei
2023-01-09  8:37 ` [RFC PATCH 0/4] Multiple consecutive page for anonymous mapping Kirill A. Shutemov
2023-01-11  6:13   ` Yin, Fengwei
2023-01-09 17:33 ` David Hildenbrand
2023-01-09 19:11   ` Matthew Wilcox
2023-01-10 14:13     ` David Hildenbrand
2023-01-10  3:57   ` Yin, Fengwei
2023-01-10 14:40     ` David Hildenbrand
2023-01-11  6:12       ` Yin, Fengwei
