From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, jack@suse.cz,
hughd@google.com, kirill.shutemov@linux.intel.com,
mhocko@suse.com, ak@linux.intel.com, aarcange@redhat.com,
npiggin@gmail.com, mgorman@techsingularity.net,
willy@infradead.org, rppt@kernel.org, dave.hansen@intel.com,
ying.huang@intel.com, tim.c.chen@intel.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH 1/4] mcpage: add size/mask/shift definition for multiple consecutive page
Date: Mon, 9 Jan 2023 15:22:29 +0800
Message-ID: <20230109072232.2398464-2-fengwei.yin@intel.com>
In-Reply-To: <20230109072232.2398464-1-fengwei.yin@intel.com>
Huge pages in the current kernel can bring a clear performance improvement
for some workloads through fewer TLB misses and fewer page faults. But the
limited choice of huge page sizes (2M/1G for x86_64) also brings extra
cost, such as larger memory consumption and more CPU cycles for page zeroing.
The idea of the multiple consecutive page (abbreviated as "mcpage") is to use
a collection of physically contiguous 4K pages, rather than a huge page, for
anonymous mapping. The goal is to offer more choices for trading off the pros
and cons of huge pages. Compared to a huge page, an mcpage does not reduce
TLB misses and page faults as much. But it also does not pay as much extra
cost in memory consumption or in the latency introduced by page
compaction, page zeroing, etc.
The size of an mcpage can be configured. The default size of 16K was
picked arbitrarily. Users should choose the value according to the
results of tuning their workload with different mcpage sizes.
To get physically contiguous pages, a high-order page is allocated (the
order is calculated from the mcpage size). Then the high-order page is
split. After the split, each sub-page of the mcpage is just a normal 4K
page, so the current kernel page management infrastructure applies to "mc"
pages without any change.
To reduce the number of page faults, multiple page table entries are populated
in one page fault with the pfns of the mcpage's sub-pages. This also costs
a little extra memory.
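As a rough userspace illustration of the idea (not the code from this
series), the sketch below computes which virtual addresses and pfns would be
paired up when one fault populates a whole mcpage. PAGE_SHIFT = 12, order 2,
struct pte_slot and mcpage_fill_range() are all assumptions made for the
example; the real fault path also has to respect VMA boundaries and existing
page table entries, which are ignored here.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Assumption: 4K base pages (x86_64) and the Kconfig default order of 2. */
#define PAGE_SHIFT   12
#define MCPAGE_ORDER 2
#define MCPAGE_SHIFT (MCPAGE_ORDER + PAGE_SHIFT)
#define MCPAGE_SIZE  (1UL << MCPAGE_SHIFT)
#define MCPAGE_MASK  (~(MCPAGE_SIZE - 1))
#define MCPAGE_NR    (1UL << MCPAGE_ORDER)

static_assert(MCPAGE_NR == 4, "order 2 means four 4K sub-pages");

/* Hypothetical stand-in for one page table entry: address -> pfn. */
struct pte_slot {
	unsigned long vaddr;
	unsigned long pfn;
};

/*
 * Fill out[] (MCPAGE_NR entries) with one (vaddr, pfn) pair per sub-page
 * of the mcpage that covers fault_addr. base_pfn is the pfn of the first
 * sub-page of the physically contiguous block. Returns the entry count.
 */
static size_t mcpage_fill_range(unsigned long fault_addr,
				unsigned long base_pfn,
				struct pte_slot *out)
{
	/* Round the faulting address down to the mcpage boundary. */
	unsigned long start = fault_addr & MCPAGE_MASK;
	size_t i;

	for (i = 0; i < MCPAGE_NR; i++) {
		out[i].vaddr = start + (i << PAGE_SHIFT);
		out[i].pfn = base_pfn + i;
	}
	return MCPAGE_NR;
}
```

With these assumed values, a fault anywhere in a 16K-aligned window yields
four entries, so three later faults on the neighbouring 4K pages are avoided.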
Update Kconfig to allow the user to define the mcpage order. Define macros
for the mcpage mask/shift/nr/size.
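For reference, a small standalone sketch of how these definitions relate,
assuming 4K base pages (PAGE_SHIFT = 12) and the default order of 2. It uses
1UL so that the size and mask are unsigned long, whereas the patch below uses
a plain 1. The helpers mcpage_align_down() and mcpage_order_for_size() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>   /* static_assert (C11) */

/* Assumption: 4K base pages, as on x86_64; the kernel defines PAGE_SHIFT. */
#define PAGE_SHIFT   12
/* Assumption: the Kconfig default order from this patch. */
#define MCPAGE_ORDER 2

#define MCPAGE_SHIFT (MCPAGE_ORDER + PAGE_SHIFT)
#define MCPAGE_SIZE  (1UL << MCPAGE_SHIFT)
#define MCPAGE_MASK  (~(MCPAGE_SIZE - 1))
#define MCPAGE_NR    (1UL << MCPAGE_ORDER)

/* Order 2 with 4K pages gives a 16K mcpage made of four sub-pages. */
static_assert(MCPAGE_SIZE == 16384, "mcpage size should be 16K");
static_assert(MCPAGE_NR == 4, "mcpage should hold four sub-pages");

/* Hypothetical helper: round an address down to its mcpage boundary. */
static unsigned long mcpage_align_down(unsigned long addr)
{
	return addr & MCPAGE_MASK;
}

/*
 * Hypothetical helper: smallest order whose mcpage size covers 'size'
 * bytes (expects 'size' to be at least one base page).
 */
static unsigned int mcpage_order_for_size(unsigned long size)
{
	unsigned int order = 0;

	while ((1UL << (order + PAGE_SHIFT)) < size)
		order++;
	return order;
}
```

Under these assumptions, mcpage_order_for_size(16384) gives order 2, which is
the value that would go into CONFIG_MCPAGE_ORDER for a 16K mcpage.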
In this RFC patch, only Kconfig is used to set the mcpage order, to show the
idea. A runtime parameter will be used if this becomes an official patch in
the future.
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
include/linux/mm_types.h | 11 +++++++++++
mm/Kconfig | 19 +++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3b8475007734..fa561c7b6290 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -71,6 +71,17 @@ struct mem_cgroup;
#define _struct_page_alignment __aligned(sizeof(unsigned long))
#endif
+#ifdef CONFIG_MCPAGE_ORDER
+#define MCPAGE_ORDER CONFIG_MCPAGE_ORDER
+#else
+#define MCPAGE_ORDER 0
+#endif
+
+#define MCPAGE_SIZE (1 << (MCPAGE_ORDER + PAGE_SHIFT))
+#define MCPAGE_MASK (~(MCPAGE_SIZE - 1))
+#define MCPAGE_SHIFT (MCPAGE_ORDER + PAGE_SHIFT)
+#define MCPAGE_NR (1 << (MCPAGE_ORDER))
+
struct page {
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..c202dc99ab6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -650,6 +650,25 @@ config HUGETLB_PAGE_SIZE_VARIABLE
Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
clamped down to MAX_ORDER - 1.
+config MCPAGE
+ bool "multiple consecutive page <mcpage>"
+ default n
+ help
+ Enable multiple consecutive pages: an mcpage is a collection of
+ physically contiguous sub-pages. When mapped to user space, all the
+ sub-pages are mapped to user space in one page fault handler.
+ This is expected to trade off the pros and cons of huge pages, with
+ less unnecessary memory zeroing and less memory consumption,
+ but with no TLB benefit.
+
+config MCPAGE_ORDER
+ int "multiple consecutive page order"
+ default 2
+ depends on X86_64 && MCPAGE
+ help
+ The order of mcpage. Should be chosen carefully by tuning your
+ workload.
+
config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
--
2.30.2