linux-mm.kvack.org archive mirror
* [RFC v2 0/5] mm: introduce THP deferred setting
@ 2025-02-11  0:40 Nico Pache
  2025-02-11  0:40 ` [RFC v2 1/5] mm: defer THP insertion to khugepaged Nico Pache
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Nico Pache @ 2025-02-11  0:40 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, usamaarif642, audra,
	akpm, rostedt, mathieu.desnoyers, tiwai, baolin.wang, corbet,
	shuah

This series is a follow-up to [1], which adds mTHP support to khugepaged.
mTHP khugepaged support was necessary for the global="defer" and
mTHP="inherit" case (and others) to make sense.

We've seen cases where customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.

Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.

For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set transparent_hugepage/enabled=never. This is
in part due to performance degradations and increased memory waste.

This series introduces enabled=defer. This setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the mapping is not MADV_HUGEPAGE, then the page fault handler will
default to the base page size allocation. The caveat is that khugepaged
can still operate on pages that are not MADV_HUGEPAGE.
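
As a rough illustration of the behavior described above, here is a
userspace sketch (not kernel code) of which path may create a THP in each
global mode. The enum and function names are hypothetical, and the model
deliberately ignores per-size mTHP settings, MADV_NOHUGEPAGE and prctl
overrides.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the global "enabled" setting; not a kernel type. */
enum thp_mode {
	THP_MODE_NEVER,
	THP_MODE_MADVISE,
	THP_MODE_DEFER,
	THP_MODE_ALWAYS,
};

/* May the page fault handler install a THP for this mapping? */
static inline bool pf_may_use_thp(enum thp_mode mode, bool madv_hugepage)
{
	switch (mode) {
	case THP_MODE_ALWAYS:
		return true;
	case THP_MODE_MADVISE:
	case THP_MODE_DEFER:
		/* Only mappings marked MADV_HUGEPAGE get a THP at fault time. */
		return madv_hugepage;
	default:
		return false;
	}
}

/* May khugepaged later collapse this mapping into a THP? */
static inline bool khugepaged_may_collapse(enum thp_mode mode, bool madv_hugepage)
{
	switch (mode) {
	case THP_MODE_ALWAYS:
	case THP_MODE_DEFER:
		/* defer differs from madvise here: every mapping is eligible. */
		return true;
	case THP_MODE_MADVISE:
		return madv_hugepage;
	default:
		return false;
	}
}

/* Spot checks of the table encoded above. */
static inline void thp_model_selfcheck(void)
{
	assert(!pf_may_use_thp(THP_MODE_DEFER, false));
	assert(pf_may_use_thp(THP_MODE_DEFER, true));
	assert(khugepaged_may_collapse(THP_MODE_DEFER, false));
	assert(!khugepaged_may_collapse(THP_MODE_MADVISE, false));
	assert(pf_may_use_thp(THP_MODE_ALWAYS, false));
}
```

In this model the only difference between madvise and defer is the
khugepaged column: defer keeps the fault path conservative but leaves
every mapping open to later collapse.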

This allows for two things. One, applications specifically designed to
use hugepages will get them. Two, applications that don't use hugepages
can still benefit from them without THPs being aggressively inserted at
every possible chance. This curbs the memory waste, and defers the use
of hugepages to khugepaged. Khugepaged can then scan the memory for
regions eligible for collapse.

Admins may want to lower max_ptes_none; otherwise, khugepaged may
aggressively collapse single allocations into hugepages.
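
To make the max_ptes_none point concrete, the following is a simplified
userspace sketch of the PMD-level eligibility rule: a range qualifies when
the number of empty (none) PTEs does not exceed max_ptes_none. It assumes
4K base pages and a 2M PMD (512 PTEs); the function name and the constant
are hypothetical, and the real scan checks much more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K base pages and 2M PMD-sized THPs, i.e. 512 PTEs per PMD. */
#define PTES_PER_PMD 512u

/*
 * Simplified model of the khugepaged threshold: the range is eligible
 * when the count of empty (none) PTEs is at most max_ptes_none.
 */
static inline bool range_collapsible(unsigned int present_ptes,
				     unsigned int max_ptes_none)
{
	assert(present_ptes <= PTES_PER_PMD);
	return (PTES_PER_PMD - present_ptes) <= max_ptes_none;
}
```

With the default of 511, range_collapsible(1, 511) is true, so a single
touched 4K page can end up backed by a 2M page. With max_ptes_none=64 at
least 448 of the 512 PTEs must be populated first.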

TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use. These changes have been running in my VM for some time
- redis testing. This test was my original case for the defer mode. What I was
   able to prove was that THP=always leads to increased max_latency cases, which
   is why it is recommended to disable THPs for redis servers. However, with
   'defer' we don't have the max_latency spikes and can still get the system to
   utilize THPs. I further tested this with the mTHP defer setting and found
   that redis (and probably other jemalloc users) can utilize THPs via defer
   (+mTHP defer) without a large latency penalty and with some potential gains.
   I uploaded some mmtest results here [3] which compare:
       stock+thp=never
       stock+(m)thp=always
       khugepaged-mthp + defer (max_ptes_none=64)

  The results show that (m)THPs can cause some throughput regression in some
  cases, but also have gains in other cases. The mTHP+defer results have more
  gains and fewer losses than the (m)THP=always case.

V2 Changes:
- base changes on mTHP khugepaged support
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation

[1] - https://lkml.org/lkml/2025/2/10/1982
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.html

Nico Pache (5):
  mm: defer THP insertion to khugepaged
  mm: document transparent_hugepage=defer usage
  selftests: mm: add defer to thp setting parser
  khugepaged: add defer option to mTHP options
  mm: document mTHP defer setting

 Documentation/admin-guide/mm/transhuge.rst | 40 ++++++++++---
 include/linux/huge_mm.h                    | 18 +++++-
 mm/huge_memory.c                           | 69 +++++++++++++++++++---
 mm/khugepaged.c                            | 10 ++--
 tools/testing/selftests/mm/thp_settings.c  |  1 +
 tools/testing/selftests/mm/thp_settings.h  |  1 +
 6 files changed, 115 insertions(+), 24 deletions(-)

-- 
2.48.1




* [RFC v2 1/5] mm: defer THP insertion to khugepaged
  2025-02-11  0:40 [RFC v2 0/5] mm: introduce THP deferred setting Nico Pache
@ 2025-02-11  0:40 ` Nico Pache
  2025-02-17 14:59   ` Usama Arif
  2025-02-11  0:40 ` [RFC v2 2/5] mm: document transparent_hugepage=defer usage Nico Pache
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Nico Pache @ 2025-02-11  0:40 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, usamaarif642, audra,
	akpm, rostedt, mathieu.desnoyers, tiwai, baolin.wang, corbet,
	shuah

Setting transparent_hugepage/enabled=always allows applications
to benefit from THPs without having to madvise. However, the page fault
handler considers very little when deciding whether or not to actually
use a THP. This can lead to a lot of wasted memory. khugepaged only operates
on memory that was either allocated with enabled=always or MADV_HUGEPAGE.

Introduce the ability to set enabled=defer, which will prevent THPs from
being allocated by the page fault handler unless madvise is set,
leaving it up to khugepaged to decide which allocations will collapse to a
THP. This should allow applications to benefit from THPs, while curbing
some of the memory waste.

Signed-off-by: Nico Pache <npache@redhat.com>
---
 include/linux/huge_mm.h | 15 +++++++++++++--
 mm/huge_memory.c        | 31 +++++++++++++++++++++++++++----
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 93e509b6c00e..fb381ca720ea 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -44,6 +44,7 @@ enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_UNSUPPORTED,
 	TRANSPARENT_HUGEPAGE_FLAG,
 	TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
+	TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
@@ -177,6 +178,7 @@ static inline bool hugepage_global_enabled(void)
 {
 	return transparent_hugepage_flags &
 			((1<<TRANSPARENT_HUGEPAGE_FLAG) |
+			(1<<TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG) |
 			(1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG));
 }
 
@@ -186,6 +188,12 @@ static inline bool hugepage_global_always(void)
 			(1<<TRANSPARENT_HUGEPAGE_FLAG);
 }
 
+static inline bool hugepage_global_defer(void)
+{
+	return transparent_hugepage_flags &
+			(1<<TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG);
+}
+
 static inline int highest_order(unsigned long orders)
 {
 	return fls_long(orders) - 1;
@@ -282,13 +290,16 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 				       unsigned long tva_flags,
 				       unsigned long orders)
 {
+	if ((tva_flags & TVA_IN_PF) && hugepage_global_defer() &&
+			!(vm_flags & VM_HUGEPAGE))
+		return 0;
+
 	/* Optimization to check if required orders are enabled early. */
 	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
 		unsigned long mask = READ_ONCE(huge_anon_orders_always);
-
 		if (vm_flags & VM_HUGEPAGE)
 			mask |= READ_ONCE(huge_anon_orders_madvise);
-		if (hugepage_global_always() ||
+		if (hugepage_global_always() || hugepage_global_defer() ||
 		    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
 			mask |= READ_ONCE(huge_anon_orders_inherit);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3d3ebdc002d5..a5e66a12bae8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -297,12 +297,15 @@ static ssize_t enabled_show(struct kobject *kobj,
 	const char *output;
 
 	if (test_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags))
-		output = "[always] madvise never";
+		output = "[always] madvise defer never";
 	else if (test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
 			  &transparent_hugepage_flags))
-		output = "always [madvise] never";
+		output = "always [madvise] defer never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
+			  &transparent_hugepage_flags))
+		output = "always madvise [defer] never";
 	else
-		output = "always madvise [never]";
+		output = "always madvise defer [never]";
 
 	return sysfs_emit(buf, "%s\n", output);
 }
@@ -315,13 +318,20 @@ static ssize_t enabled_store(struct kobject *kobj,
 
 	if (sysfs_streq(buf, "always")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
+	} else if (sysfs_streq(buf, "defer")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG, &transparent_hugepage_flags);
 	} else if (sysfs_streq(buf, "madvise")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
 	} else if (sysfs_streq(buf, "never")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG, &transparent_hugepage_flags);
 	} else
 		ret = -EINVAL;
 
@@ -943,18 +953,32 @@ static int __init setup_transparent_hugepage(char *str)
 			&transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
 			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
+			  &transparent_hugepage_flags);
 		ret = 1;
+	} else if (!strcmp(str, "defer")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_FLAG,
+			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
+			  &transparent_hugepage_flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
+			  &transparent_hugepage_flags);
+		ret = 1;
 	} else if (!strcmp(str, "madvise")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG,
 			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
+			  &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
-			&transparent_hugepage_flags);
+			  &transparent_hugepage_flags);
 		ret = 1;
 	} else if (!strcmp(str, "never")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG,
 			  &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
 			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
+			  &transparent_hugepage_flags);
 		ret = 1;
 	}
 out:
-- 
2.48.1




* [RFC v2 2/5] mm: document transparent_hugepage=defer usage
  2025-02-11  0:40 [RFC v2 0/5] mm: introduce THP deferred setting Nico Pache
  2025-02-11  0:40 ` [RFC v2 1/5] mm: defer THP insertion to khugepaged Nico Pache
@ 2025-02-11  0:40 ` Nico Pache
  2025-02-17 15:04   ` Usama Arif
  2025-02-11  0:40 ` [RFC v2 3/5] selftests: mm: add defer to thp setting parser Nico Pache
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Nico Pache @ 2025-02-11  0:40 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, usamaarif642, audra,
	akpm, rostedt, mathieu.desnoyers, tiwai, baolin.wang, corbet,
	shuah

The new transparent_hugepage=defer option allows for a more conservative
approach to THPs. Document its usage in the transhuge admin-guide.

Signed-off-by: Nico Pache <npache@redhat.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index dff8d5985f0f..b3b18573bbb4 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -88,8 +88,9 @@ In certain cases when hugepages are enabled system wide, application
 may end up allocating more memory resources. An application may mmap a
 large region but only touch 1 byte of it, in that case a 2M page might
 be allocated instead of a 4k page for no good. This is why it's
-possible to disable hugepages system-wide and to only have them inside
-MADV_HUGEPAGE madvise regions.
+possible to disable hugepages system-wide, only have them inside
+MADV_HUGEPAGE madvise regions, or defer them away from the page fault
+handler to khugepaged.
 
 Embedded systems should enable hugepages only inside madvise regions
 to eliminate any risk of wasting any precious byte of memory and to
@@ -99,6 +100,15 @@ Applications that gets a lot of benefit from hugepages and that don't
 risk to lose memory by using hugepages, should use
 madvise(MADV_HUGEPAGE) on their critical mmapped regions.
 
+Applications that would like to benefit from THPs but would still like a
+more memory conservative approach can choose 'defer'. This avoids
+inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
+Khugepaged will then scan the mappings for potential collapses into PMD
+sized pages. Admins using this the 'defer' setting should consider
+tweaking khugepaged/max_ptes_none. The current default of 511 may
+aggressively collapse your PTEs into PMDs. Lower this value to conserve
+more memory (ie. max_ptes_none=64).
+
 .. _thp_sysfs:
 
 sysfs
@@ -136,6 +146,7 @@ The top-level setting (for use with "inherit") can be set by issuing
 one of the following commands::
 
 	echo always >/sys/kernel/mm/transparent_hugepage/enabled
+	echo defer >/sys/kernel/mm/transparent_hugepage/enabled
 	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
 	echo never >/sys/kernel/mm/transparent_hugepage/enabled
 
@@ -274,7 +285,8 @@ of small pages into one large page::
 A higher value leads to use additional memory for programs.
 A lower value leads to gain less thp performance. Value of
 max_ptes_none can waste cpu time very little, you can
-ignore it.
+ignore it. Consider lowering this value when using
+``transparent_hugepage=defer``.
 
 ``max_ptes_swap`` specifies how many pages can be brought in from
 swap when collapsing a group of pages into a transparent huge page::
@@ -299,8 +311,8 @@ Boot parameters
 
 You can change the sysfs boot time default for the top-level "enabled"
 control by passing the parameter ``transparent_hugepage=always`` or
-``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
-kernel command line.
+``transparent_hugepage=madvise`` or ``transparent_hugepage=defer`` or
+``transparent_hugepage=never`` to the kernel command line.
 
 Alternatively, each supported anonymous THP size can be controlled by
 passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
-- 
2.48.1




* [RFC v2 3/5] selftests: mm: add defer to thp setting parser
  2025-02-11  0:40 [RFC v2 0/5] mm: introduce THP deferred setting Nico Pache
  2025-02-11  0:40 ` [RFC v2 1/5] mm: defer THP insertion to khugepaged Nico Pache
  2025-02-11  0:40 ` [RFC v2 2/5] mm: document transparent_hugepage=defer usage Nico Pache
@ 2025-02-11  0:40 ` Nico Pache
  2025-02-11  0:40 ` [RFC v2 4/5] khugepaged: add defer option to mTHP options Nico Pache
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Nico Pache @ 2025-02-11  0:40 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, usamaarif642, audra,
	akpm, rostedt, mathieu.desnoyers, tiwai, baolin.wang, corbet,
	shuah

Add the defer setting to the selftests library for reading thp settings.

Signed-off-by: Nico Pache <npache@redhat.com>
---
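For readers unfamiliar with the sysfs format this parser deals with: the
"enabled" file lists every value and brackets the active one, e.g.
"always madvise [defer] never". Below is a standalone sketch of pulling
out the bracketed word. The function name is hypothetical and this is
not the selftests implementation, only an illustration of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed word of a sysfs "enabled" line into out.
 * Returns 0 on success, or -1 if no bracketed word is present or it
 * does not fit in out (including the terminating NUL).
 */
static inline int active_thp_setting(const char *line, char *out, size_t outlen)
{
	const char *lb, *rb;
	size_t len;

	assert(line && out);

	lb = strchr(line, '[');
	rb = lb ? strchr(lb, ']') : NULL;
	if (!lb || !rb)
		return -1;

	len = (size_t)(rb - lb - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

The extracted word can then be compared against a string table such as
thp_enabled_strings[], which is why "defer" has to be added there.
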
 tools/testing/selftests/mm/thp_settings.c | 1 +
 tools/testing/selftests/mm/thp_settings.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/testing/selftests/mm/thp_settings.c b/tools/testing/selftests/mm/thp_settings.c
index ad872af1c81a..b2f9f62b302a 100644
--- a/tools/testing/selftests/mm/thp_settings.c
+++ b/tools/testing/selftests/mm/thp_settings.c
@@ -20,6 +20,7 @@ static const char * const thp_enabled_strings[] = {
 	"always",
 	"inherit",
 	"madvise",
+	"defer",
 	NULL
 };
 
diff --git a/tools/testing/selftests/mm/thp_settings.h b/tools/testing/selftests/mm/thp_settings.h
index fc131d23d593..0d52e6d4f754 100644
--- a/tools/testing/selftests/mm/thp_settings.h
+++ b/tools/testing/selftests/mm/thp_settings.h
@@ -11,6 +11,7 @@ enum thp_enabled {
 	THP_ALWAYS,
 	THP_INHERIT,
 	THP_MADVISE,
+	THP_DEFER,
 };
 
 enum thp_defrag {
-- 
2.48.1




* [RFC v2 4/5] khugepaged: add defer option to mTHP options
  2025-02-11  0:40 [RFC v2 0/5] mm: introduce THP deferred setting Nico Pache
                   ` (2 preceding siblings ...)
  2025-02-11  0:40 ` [RFC v2 3/5] selftests: mm: add defer to thp setting parser Nico Pache
@ 2025-02-11  0:40 ` Nico Pache
  2025-02-11  0:40 ` [RFC v2 5/5] mm: document mTHP defer setting Nico Pache
  2025-02-17 14:53 ` [RFC v2 0/5] mm: introduce THP deferred setting Usama Arif
  5 siblings, 0 replies; 14+ messages in thread
From: Nico Pache @ 2025-02-11  0:40 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, usamaarif642, audra,
	akpm, rostedt, mathieu.desnoyers, tiwai, baolin.wang, corbet,
	shuah

Now that we have defer to globally disable THPs at fault time, let's add
a defer setting to the mTHP options. This will allow khugepaged to
operate at that order, while avoiding it at PF time.

Signed-off-by: Nico Pache <npache@redhat.com>
---
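A note on testing the new flag, as a hedged userspace sketch rather than
kernel code. TVA_IN_KHUGEPAGE is defined in this patch as a two-bit mask
that includes the TVA_ENFORCE_SYSFS bit, so a plain "&" test is also true
for a caller that passes only TVA_ENFORCE_SYSFS. The helper names below
are hypothetical; the flag values are copied from this patch. The example
assumes the page fault path passes TVA_IN_PF | TVA_ENFORCE_SYSFS.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/huge_mm.h by this patch. */
#define TVA_IN_PF		(1 << 1)
#define TVA_ENFORCE_SYSFS	(1 << 2)
#define TVA_IN_KHUGEPAGE	((1 << 2) | (1 << 3))

/* True if ANY bit of the mask is set, so TVA_ENFORCE_SYSFS alone matches. */
static inline bool any_khugepaged_bit(unsigned long tva_flags)
{
	return (tva_flags & TVA_IN_KHUGEPAGE) != 0;
}

/* True only if EVERY bit of the mask is set. */
static inline bool all_khugepaged_bits(unsigned long tva_flags)
{
	return (tva_flags & TVA_IN_KHUGEPAGE) == TVA_IN_KHUGEPAGE;
}

/* Spot checks: only the full comparison tells the two callers apart. */
static inline void tva_selfcheck(void)
{
	unsigned long pf_flags = TVA_IN_PF | TVA_ENFORCE_SYSFS; /* assumed PF caller */

	assert(any_khugepaged_bit(pf_flags));
	assert(!all_khugepaged_bits(pf_flags));
	assert(all_khugepaged_bits(TVA_IN_KHUGEPAGE));
}
```

Only the equality form distinguishes a khugepaged caller from one that
merely enforces sysfs, which matters if the defer orders are meant to be
visible to khugepaged alone.
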
 include/linux/huge_mm.h |  5 +++++
 mm/huge_memory.c        | 38 +++++++++++++++++++++++++++++++++-----
 mm/khugepaged.c         | 10 +++++-----
 3 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fb381ca720ea..8173a9ab0f3b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -92,6 +92,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define TVA_SMAPS		(1 << 0)	/* Will be used for procfs */
 #define TVA_IN_PF		(1 << 1)	/* Page fault handler */
 #define TVA_ENFORCE_SYSFS	(1 << 2)	/* Obey sysfs configuration */
+#define TVA_IN_KHUGEPAGE	((1 << 2) | (1 << 3)) /* Khugepaged defer support */
 
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
@@ -173,6 +174,7 @@ extern unsigned long transparent_hugepage_flags;
 extern unsigned long huge_anon_orders_always;
 extern unsigned long huge_anon_orders_madvise;
 extern unsigned long huge_anon_orders_inherit;
+extern unsigned long huge_anon_orders_defer;
 
 static inline bool hugepage_global_enabled(void)
 {
@@ -297,6 +299,9 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 	/* Optimization to check if required orders are enabled early. */
 	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
 		unsigned long mask = READ_ONCE(huge_anon_orders_always);
+
+		if ((tva_flags & TVA_IN_KHUGEPAGE) == TVA_IN_KHUGEPAGE)
+			mask |= READ_ONCE(huge_anon_orders_defer);
 		if (vm_flags & VM_HUGEPAGE)
 			mask |= READ_ONCE(huge_anon_orders_madvise);
 		if (hugepage_global_always() || hugepage_global_defer() ||
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a5e66a12bae8..de45595b0f98 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
+unsigned long huge_anon_orders_defer __read_mostly;
 static bool anon_orders_configured __initdata;
 
 static inline bool file_thp_enabled(struct vm_area_struct *vma)
@@ -505,13 +506,15 @@ static ssize_t anon_enabled_show(struct kobject *kobj,
 	const char *output;
 
 	if (test_bit(order, &huge_anon_orders_always))
-		output = "[always] inherit madvise never";
+		output = "[always] inherit madvise defer never";
 	else if (test_bit(order, &huge_anon_orders_inherit))
-		output = "always [inherit] madvise never";
+		output = "always [inherit] madvise defer never";
 	else if (test_bit(order, &huge_anon_orders_madvise))
-		output = "always inherit [madvise] never";
+		output = "always inherit [madvise] defer never";
+	else if (test_bit(order, &huge_anon_orders_defer))
+		output = "always inherit madvise [defer] never";
 	else
-		output = "always inherit madvise [never]";
+		output = "always inherit madvise defer [never]";
 
 	return sysfs_emit(buf, "%s\n", output);
 }
@@ -527,25 +530,36 @@ static ssize_t anon_enabled_store(struct kobject *kobj,
 		spin_lock(&huge_anon_orders_lock);
 		clear_bit(order, &huge_anon_orders_inherit);
 		clear_bit(order, &huge_anon_orders_madvise);
+		clear_bit(order, &huge_anon_orders_defer);
 		set_bit(order, &huge_anon_orders_always);
 		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "inherit")) {
 		spin_lock(&huge_anon_orders_lock);
 		clear_bit(order, &huge_anon_orders_always);
 		clear_bit(order, &huge_anon_orders_madvise);
+		clear_bit(order, &huge_anon_orders_defer);
 		set_bit(order, &huge_anon_orders_inherit);
 		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "madvise")) {
 		spin_lock(&huge_anon_orders_lock);
 		clear_bit(order, &huge_anon_orders_always);
 		clear_bit(order, &huge_anon_orders_inherit);
+		clear_bit(order, &huge_anon_orders_defer);
 		set_bit(order, &huge_anon_orders_madvise);
 		spin_unlock(&huge_anon_orders_lock);
+	} else if (sysfs_streq(buf, "defer")) {
+		spin_lock(&huge_anon_orders_lock);
+		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_inherit);
+		clear_bit(order, &huge_anon_orders_madvise);
+		set_bit(order, &huge_anon_orders_defer);
+		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "never")) {
 		spin_lock(&huge_anon_orders_lock);
 		clear_bit(order, &huge_anon_orders_always);
 		clear_bit(order, &huge_anon_orders_inherit);
 		clear_bit(order, &huge_anon_orders_madvise);
+		clear_bit(order, &huge_anon_orders_defer);
 		spin_unlock(&huge_anon_orders_lock);
 	} else
 		ret = -EINVAL;
@@ -991,7 +1005,7 @@ static char str_dup[PAGE_SIZE] __initdata;
 static int __init setup_thp_anon(char *str)
 {
 	char *token, *range, *policy, *subtoken;
-	unsigned long always, inherit, madvise;
+	unsigned long always, inherit, madvise, defer;
 	char *start_size, *end_size;
 	int start, end, nr;
 	char *p;
@@ -1003,6 +1017,8 @@ static int __init setup_thp_anon(char *str)
 	always = huge_anon_orders_always;
 	madvise = huge_anon_orders_madvise;
 	inherit = huge_anon_orders_inherit;
+	defer = huge_anon_orders_defer;
+
 	p = str_dup;
 	while ((token = strsep(&p, ";")) != NULL) {
 		range = strsep(&token, ":");
@@ -1042,18 +1058,28 @@ static int __init setup_thp_anon(char *str)
 				bitmap_set(&always, start, nr);
 				bitmap_clear(&inherit, start, nr);
 				bitmap_clear(&madvise, start, nr);
+				bitmap_clear(&defer, start, nr);
 			} else if (!strcmp(policy, "madvise")) {
 				bitmap_set(&madvise, start, nr);
 				bitmap_clear(&inherit, start, nr);
 				bitmap_clear(&always, start, nr);
+				bitmap_clear(&defer, start, nr);
 			} else if (!strcmp(policy, "inherit")) {
 				bitmap_set(&inherit, start, nr);
 				bitmap_clear(&madvise, start, nr);
 				bitmap_clear(&always, start, nr);
+				bitmap_clear(&defer, start, nr);
+			} else if (!strcmp(policy, "defer")) {
+				bitmap_set(&defer, start, nr);
+				bitmap_clear(&madvise, start, nr);
+				bitmap_clear(&always, start, nr);
+				bitmap_clear(&inherit, start, nr);
 			} else if (!strcmp(policy, "never")) {
 				bitmap_clear(&inherit, start, nr);
 				bitmap_clear(&madvise, start, nr);
 				bitmap_clear(&always, start, nr);
+				bitmap_clear(&defer, start, nr);
+
 			} else {
 				pr_err("invalid policy %s in thp_anon boot parameter\n", policy);
 				goto err;
@@ -1064,6 +1090,8 @@ static int __init setup_thp_anon(char *str)
 	huge_anon_orders_always = always;
 	huge_anon_orders_madvise = madvise;
 	huge_anon_orders_inherit = inherit;
+	huge_anon_orders_defer = defer;
+
 	anon_orders_configured = true;
 	return 1;
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fc30698b8e6e..a83bc812ea64 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -488,7 +488,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
 	    hugepage_pmd_enabled()) {
-		if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS,
+		if (thp_vma_allowable_order(vma, vm_flags, TVA_IN_KHUGEPAGE,
 					    PMD_ORDER))
 			__khugepaged_enter(vma->vm_mm);
 	}
@@ -943,7 +943,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 				   struct collapse_control *cc, int order)
 {
 	struct vm_area_struct *vma;
-	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
+	unsigned long tva_flags = cc->is_khugepaged ? TVA_IN_KHUGEPAGE : 0;
 
 	if (unlikely(khugepaged_test_exit_or_disable(mm)))
 		return SCAN_ANY_PROCESS;
@@ -1393,7 +1393,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	bool writable = false;
 	int chunk_none_count = 0;
 	int scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - MIN_MTHP_ORDER);
-	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
+	unsigned long tva_flags = cc->is_khugepaged ? TVA_IN_KHUGEPAGE : 0;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	result = find_pmd_or_thp_or_none(mm, address, &pmd);
@@ -2505,7 +2505,7 @@ static int khugepaged_collapse_single_pmd(unsigned long addr, struct mm_struct *
 				   struct collapse_control *cc)
 {
 	int result = SCAN_FAIL;
-	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
+	unsigned long tva_flags = cc->is_khugepaged ? TVA_IN_KHUGEPAGE : 0;
 
 	if (!*mmap_locked) {
 		mmap_read_lock(mm);
@@ -2595,7 +2595,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 			break;
 		}
 		if (!thp_vma_allowable_order(vma, vma->vm_flags,
-					TVA_ENFORCE_SYSFS, PMD_ORDER)) {
+					TVA_IN_KHUGEPAGE, PMD_ORDER)) {
 skip:
 			progress++;
 			continue;
-- 
2.48.1




* [RFC v2 5/5] mm: document mTHP defer setting
  2025-02-11  0:40 [RFC v2 0/5] mm: introduce THP deferred setting Nico Pache
                   ` (3 preceding siblings ...)
  2025-02-11  0:40 ` [RFC v2 4/5] khugepaged: add defer option to mTHP options Nico Pache
@ 2025-02-11  0:40 ` Nico Pache
  2025-02-17 15:13   ` Usama Arif
  2025-02-17 14:53 ` [RFC v2 0/5] mm: introduce THP deferred setting Usama Arif
  5 siblings, 1 reply; 14+ messages in thread
From: Nico Pache @ 2025-02-11  0:40 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, usamaarif642, audra,
	akpm, rostedt, mathieu.desnoyers, tiwai, baolin.wang, corbet,
	shuah

Now that we have mTHP support in khugepaged, let's add it to the
transhuge admin guide to provide proper guidance.

Signed-off-by: Nico Pache <npache@redhat.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index b3b18573bbb4..99ba3763c1c4 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -63,7 +63,7 @@ often.
 THP can be enabled system wide or restricted to certain tasks or even
 memory ranges inside task's address space. Unless THP is completely
 disabled, there is ``khugepaged`` daemon that scans memory and
-collapses sequences of basic pages into PMD-sized huge pages.
+collapses sequences of basic pages into huge pages.
 
 The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
 interface and using madvise(2) and prctl(2) system calls.
@@ -103,8 +103,8 @@ madvise(MADV_HUGEPAGE) on their critical mmapped regions.
 Applications that would like to benefit from THPs but would still like a
 more memory conservative approach can choose 'defer'. This avoids
 inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
-Khugepaged will then scan the mappings for potential collapses into PMD
-sized pages. Admins using this the 'defer' setting should consider
+Khugepaged will then scan the mappings for potential collapses into (m)THP
+pages. Admins using the 'defer' setting should consider
 tweaking khugepaged/max_ptes_none. The current default of 511 may
 aggressively collapse your PTEs into PMDs. Lower this value to conserve
 more memory (ie. max_ptes_none=64).
@@ -119,11 +119,14 @@ Global THP controls
 
 Transparent Hugepage Support for anonymous memory can be entirely disabled
 (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
-regions (to avoid the risk of consuming more memory resources) or enabled
-system wide. This can be achieved per-supported-THP-size with one of::
+regions (to avoid the risk of consuming more memory resources), deferred to
+khugepaged, or enabled system wide.
+
+This can be achieved per-supported-THP-size with one of::
 
 	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
 	echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
+	echo defer >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
 	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
 
 where <size> is the hugepage size being addressed, the available sizes
@@ -155,6 +158,13 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
 sizes, the kernel will select the most appropriate enabled size for a
 given allocation.
 
+khugepaged use max_ptes_none scaled to the order of the enabled mTHP size to
+determine collapses. When using mTHPs its recommended to set max_ptes_none low.
+Ideally less than HPAGE_PMD_NR / 2 (255 on 4k page size). This will prevent
+undesired "creep" behavior that leads to continously collapsing to a larger
+mTHP size. max_ptes_shared and max_ptes_swap have no effect when collapsing to a
+mTHP, and mTHP collapse will fail on shared or swapped out pages.
+
 It's also possible to limit defrag efforts in the VM to generate
 anonymous hugepages in case they're not immediately free to madvise
 regions or to never try to defrag memory and simply fallback to regular
@@ -318,7 +328,7 @@ Alternatively, each supported anonymous THP size can be controlled by
 passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
 where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
 supported anonymous THP)  and ``<state>`` is one of ``always``, ``madvise``,
-``never`` or ``inherit``.
+``defer``, ``never`` or ``inherit``.
 
 For example, the following will set 16K, 32K, 64K THP to ``always``,
 set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
-- 
2.48.1




* Re: [RFC v2 0/5] mm: introduce THP deferred setting
  2025-02-11  0:40 [RFC v2 0/5] mm: introduce THP deferred setting Nico Pache
                   ` (4 preceding siblings ...)
  2025-02-11  0:40 ` [RFC v2 5/5] mm: document mTHP defer setting Nico Pache
@ 2025-02-17 14:53 ` Usama Arif
  2025-02-17 19:23   ` Nico Pache
  5 siblings, 1 reply; 14+ messages in thread
From: Usama Arif @ 2025-02-17 14:53 UTC (permalink / raw)
  To: Nico Pache, linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, audra, akpm, rostedt,
	mathieu.desnoyers, tiwai, baolin.wang, corbet, shuah



On 11/02/2025 00:40, Nico Pache wrote:
> This series is a follow-up to [1], which adds mTHP support to khugepaged.
> mTHP khugepaged support was necessary for the global="defer" and
> mTHP="inherit" case (and others) to make sense.
> 

Hi Nico,

Thanks for the patches!

Why is mTHP khugepaged a prerequisite for THP=defer?
THP=defer applies to PMD hugepages as well, so they should be independent.






* Re: [RFC v2 1/5] mm: defer THP insertion to khugepaged
  2025-02-11  0:40 ` [RFC v2 1/5] mm: defer THP insertion to khugepaged Nico Pache
@ 2025-02-17 14:59   ` Usama Arif
  2025-02-17 19:24     ` Nico Pache
  0 siblings, 1 reply; 14+ messages in thread
From: Usama Arif @ 2025-02-17 14:59 UTC (permalink / raw)
  To: Nico Pache, linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, audra, akpm, rostedt,
	mathieu.desnoyers, tiwai, baolin.wang, corbet, shuah



On 11/02/2025 00:40, Nico Pache wrote:
> setting /transparent_hugepages/enabled=always allows applications
> to benefit from THPs without having to madvise. However, the pf handler
> takes very few considerations to decide weather or not to actually use a
> THP. This can lead to a lot of wasted memory. khugepaged only operates
> on memory that was either allocated with enabled=always or MADV_HUGEPAGE.
> 
> Introduce the ability to set enabled=defer, which will prevent THPs from
> being allocated by the page fault handler unless madvise is set,
> leaving it up to khugepaged to decide which allocations will collapse to a
> THP. This should allow applications to benefits from THPs, while curbing
> some of the memory waste.
> 
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  include/linux/huge_mm.h | 15 +++++++++++++--
>  mm/huge_memory.c        | 31 +++++++++++++++++++++++++++----
>  2 files changed, 40 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 93e509b6c00e..fb381ca720ea 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -44,6 +44,7 @@ enum transparent_hugepage_flag {
>  	TRANSPARENT_HUGEPAGE_UNSUPPORTED,
>  	TRANSPARENT_HUGEPAGE_FLAG,
>  	TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> +	TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,

No strong preference, but maybe just TRANSPARENT_HUGEPAGE_DEFER_FLAG might be better?




* Re: [RFC v2 2/5] mm: document transparent_hugepage=defer usage
  2025-02-11  0:40 ` [RFC v2 2/5] mm: document transparent_hugepage=defer usage Nico Pache
@ 2025-02-17 15:04   ` Usama Arif
  2025-02-17 19:30     ` Nico Pache
  0 siblings, 1 reply; 14+ messages in thread
From: Usama Arif @ 2025-02-17 15:04 UTC (permalink / raw)
  To: Nico Pache, linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, audra, akpm, rostedt,
	mathieu.desnoyers, tiwai, baolin.wang, corbet, shuah



On 11/02/2025 00:40, Nico Pache wrote:
> The new transparent_hugepage=defer option allows for a more conservative
> approach to THPs. Document its usage in the transhuge admin-guide.
> 
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index dff8d5985f0f..b3b18573bbb4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -88,8 +88,9 @@ In certain cases when hugepages are enabled system wide, application
>  may end up allocating more memory resources. An application may mmap a
>  large region but only touch 1 byte of it, in that case a 2M page might
>  be allocated instead of a 4k page for no good. This is why it's
> -possible to disable hugepages system-wide and to only have them inside
> -MADV_HUGEPAGE madvise regions.
> +possible to disable hugepages system-wide, only have them inside
> +MADV_HUGEPAGE madvise regions, or defer them away from the page fault
> +handler to khugepaged.
>  
>  Embedded systems should enable hugepages only inside madvise regions
>  to eliminate any risk of wasting any precious byte of memory and to
> @@ -99,6 +100,15 @@ Applications that gets a lot of benefit from hugepages and that don't
>  risk to lose memory by using hugepages, should use
>  madvise(MADV_HUGEPAGE) on their critical mmapped regions.
>  
> +Applications that would like to benefit from THPs but would still like a
> +more memory conservative approach can choose 'defer'. This avoids
> +inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> +Khugepaged will then scan the mappings for potential collapses into PMD
> +sized pages. Admins using this the 'defer' setting should consider
> +tweaking khugepaged/max_ptes_none. The current default of 511 may
> +aggressively collapse your PTEs into PMDs. Lower this value to conserve
> +more memory (ie. max_ptes_none=64).
> +

Maybe remove the "(ie. max_ptes_none=64)"; it reads as a recommended value,
but it might not be optimal for every workload.

>  .. _thp_sysfs:
>  
>  sysfs
> @@ -136,6 +146,7 @@ The top-level setting (for use with "inherit") can be set by issuing
>  one of the following commands::
>  
>  	echo always >/sys/kernel/mm/transparent_hugepage/enabled
> +	echo defer >/sys/kernel/mm/transparent_hugepage/enabled
>  	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>  	echo never >/sys/kernel/mm/transparent_hugepage/enabled
>  
> @@ -274,7 +285,8 @@ of small pages into one large page::
>  A higher value leads to use additional memory for programs.
>  A lower value leads to gain less thp performance. Value of
>  max_ptes_none can waste cpu time very little, you can
> -ignore it.
> +ignore it. Consider lowering this value when using
> +``transparent_hugepage=defer``

Doesn't lowering this value make sense even with thp=always, as there might be
cases where the pf handler does not give a THP but the VMA becomes eligible for
khugepaged scanning later? I would remove this line.

>  
>  ``max_ptes_swap`` specifies how many pages can be brought in from
>  swap when collapsing a group of pages into a transparent huge page::
> @@ -299,8 +311,8 @@ Boot parameters
>  
>  You can change the sysfs boot time default for the top-level "enabled"
>  control by passing the parameter ``transparent_hugepage=always`` or
> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> -kernel command line.
> +``transparent_hugepage=madvise`` or ``transparent_hugepage=defer`` or
> +``transparent_hugepage=never`` to the kernel command line.
>  
>  Alternatively, each supported anonymous THP size can be controlled by
>  passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,




* Re: [RFC v2 5/5] mm: document mTHP defer setting
  2025-02-11  0:40 ` [RFC v2 5/5] mm: document mTHP defer setting Nico Pache
@ 2025-02-17 15:13   ` Usama Arif
  2025-02-17 19:40     ` Nico Pache
  0 siblings, 1 reply; 14+ messages in thread
From: Usama Arif @ 2025-02-17 15:13 UTC (permalink / raw)
  To: Nico Pache, linux-kernel, linux-doc, linux-kselftest, linux-mm
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, dave.hansen, will, baohua, jack, srivatsa,
	haowenchao22, hughd, aneesh.kumar, yang, peterx, ioworker0,
	wangkefeng.wang, ziy, jglisse, surenb, vishal.moola, zokeefe,
	zhengqi.arch, jhubbard, 21cnbao, willy, kirill.shutemov, david,
	aarcange, raquini, dev.jain, sunnanyong, audra, akpm, rostedt,
	mathieu.desnoyers, tiwai, baolin.wang, corbet, shuah



On 11/02/2025 00:40, Nico Pache wrote:
> Now that we have mTHP support in khugepaged, lets add it to the
> transhuge admin guide to provide proper guidance.
> 

I think you should move this patch to the mTHP khugepaged series, and just send
THP=defer separately from mTHP khugepaged.

> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index b3b18573bbb4..99ba3763c1c4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -63,7 +63,7 @@ often.
>  THP can be enabled system wide or restricted to certain tasks or even
>  memory ranges inside task's address space. Unless THP is completely
>  disabled, there is ``khugepaged`` daemon that scans memory and
> -collapses sequences of basic pages into PMD-sized huge pages.
> +collapses sequences of basic pages into huge pages.
>  
>  The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
>  interface and using madvise(2) and prctl(2) system calls.
> @@ -103,8 +103,8 @@ madvise(MADV_HUGEPAGE) on their critical mmapped regions.
>  Applications that would like to benefit from THPs but would still like a
>  more memory conservative approach can choose 'defer'. This avoids
>  inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> -Khugepaged will then scan the mappings for potential collapses into PMD
> -sized pages. Admins using this the 'defer' setting should consider
> +Khugepaged will then scan the mappings for potential collapses into (m)THP
> +pages. Admins using this the 'defer' setting should consider
>  tweaking khugepaged/max_ptes_none. The current default of 511 may
>  aggressively collapse your PTEs into PMDs. Lower this value to conserve
>  more memory (ie. max_ptes_none=64).
> @@ -119,11 +119,14 @@ Global THP controls
>  
>  Transparent Hugepage Support for anonymous memory can be entirely disabled
>  (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
> -regions (to avoid the risk of consuming more memory resources) or enabled
> -system wide. This can be achieved per-supported-THP-size with one of::
> +regions (to avoid the risk of consuming more memory resources), defered to
> +khugepaged, or enabled system wide.
> +
> +This can be achieved per-supported-THP-size with one of::
>  
>  	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>  	echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> +	echo defer >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>  	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>  
>  where <size> is the hugepage size being addressed, the available sizes
> @@ -155,6 +158,13 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
>  sizes, the kernel will select the most appropriate enabled size for a
>  given allocation.
>  
> +khugepaged use max_ptes_none scaled to the order of the enabled mTHP size to
> +determine collapses. When using mTHPs its recommended to set max_ptes_none low.
> +Ideally less than HPAGE_PMD_NR / 2 (255 on 4k page size). This will prevent
> +undesired "creep" behavior that leads to continously collapsing to a larger
> +mTHP size. max_ptes_shared and max_ptes_swap have no effect when collapsing to a
> +mTHP, and mTHP collapse will fail on shared or swapped out pages.
> +

This paragraph definitely belongs in the khugepaged series, as it doesn't have anything
to do with THP=defer.

Re "Ideally less than HPAGE_PMD_NR / 2":
what if you are running on AMD and using only 16K and 2M THP=always, as that's where
the most TLB benefit is? Then this recommendation doesn't make sense?

Also, even if you have all mTHP sizes set to always, shouldn't you start by collapsing to
the largest THP size first? (I haven't reviewed the khugepaged series yet, so this might
have been discussed there; I will try to review it.)

Did you see the creep behavior you mentioned in your experiments?


>  It's also possible to limit defrag efforts in the VM to generate
>  anonymous hugepages in case they're not immediately free to madvise
>  regions or to never try to defrag memory and simply fallback to regular
> @@ -318,7 +328,7 @@ Alternatively, each supported anonymous THP size can be controlled by
>  passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
>  where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
>  supported anonymous THP)  and ``<state>`` is one of ``always``, ``madvise``,
> -``never`` or ``inherit``.
> +``defer``, ``never`` or ``inherit``.
>  
>  For example, the following will set 16K, 32K, 64K THP to ``always``,
>  set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M




* Re: [RFC v2 0/5] mm: introduce THP deferred setting
  2025-02-17 14:53 ` [RFC v2 0/5] mm: introduce THP deferred setting Usama Arif
@ 2025-02-17 19:23   ` Nico Pache
  0 siblings, 0 replies; 14+ messages in thread
From: Nico Pache @ 2025-02-17 19:23 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-kernel, linux-doc, linux-kselftest, linux-mm, ryan.roberts,
	anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
	dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
	aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
	jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
	21cnbao, willy, kirill.shutemov, david, aarcange, raquini,
	dev.jain, sunnanyong, audra, akpm, rostedt, mathieu.desnoyers,
	tiwai, baolin.wang, corbet, shuah

On Mon, Feb 17, 2025 at 7:54 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > This series is a follow-up to [1], which adds mTHP support to khugepaged.
> > mTHP khugepaged support was necessary for the global="defer" and
> > mTHP="inherit" case (and others) to make sense.
> >
>
> Hi Nico,
>
> Thanks for the patches!
Hi Usama,

Thank you for the review!

>
> Why is mTHP khugepaged a prerequisite for THP=defer?
> THP=defer applies to PMD hugepages as well, so they should be independent.

It's not a hard prerequisite, but I explained it a little here:
https://lore.kernel.org/lkml/CAA1CXcBPt4jHfH0Ggio5ghSYAQAXf08rO8R6b1faHzdjFf_Ajw@mail.gmail.com/

In general, the sysfs didn't really make sense without it, and given
mTHPs came along right when I was working on defer, I decided to add
it to mTHP too.

I worked on and tested these together so it felt right to sync up the
V2s for both of them.

Cheers,
-- Nico


>
>
>




* Re: [RFC v2 1/5] mm: defer THP insertion to khugepaged
  2025-02-17 14:59   ` Usama Arif
@ 2025-02-17 19:24     ` Nico Pache
  0 siblings, 0 replies; 14+ messages in thread
From: Nico Pache @ 2025-02-17 19:24 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-kernel, linux-doc, linux-kselftest, linux-mm, ryan.roberts,
	anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
	dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
	aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
	jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
	21cnbao, willy, kirill.shutemov, david, aarcange, raquini,
	dev.jain, sunnanyong, audra, akpm, rostedt, mathieu.desnoyers,
	tiwai, baolin.wang, corbet, shuah

On Mon, Feb 17, 2025 at 7:59 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > setting /transparent_hugepages/enabled=always allows applications
> > to benefit from THPs without having to madvise. However, the pf handler
> > takes very few considerations to decide weather or not to actually use a
> > THP. This can lead to a lot of wasted memory. khugepaged only operates
> > on memory that was either allocated with enabled=always or MADV_HUGEPAGE.
> >
> > Introduce the ability to set enabled=defer, which will prevent THPs from
> > being allocated by the page fault handler unless madvise is set,
> > leaving it up to khugepaged to decide which allocations will collapse to a
> > THP. This should allow applications to benefits from THPs, while curbing
> > some of the memory waste.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  include/linux/huge_mm.h | 15 +++++++++++++--
> >  mm/huge_memory.c        | 31 +++++++++++++++++++++++++++----
> >  2 files changed, 40 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 93e509b6c00e..fb381ca720ea 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -44,6 +44,7 @@ enum transparent_hugepage_flag {
> >       TRANSPARENT_HUGEPAGE_UNSUPPORTED,
> >       TRANSPARENT_HUGEPAGE_FLAG,
> >       TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> > +     TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
>
> No strong preference, but maybe just TRANSPARENT_HUGEPAGE_DEFER_FLAG might be better?

Not a bad idea, TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG is pretty long!
>




* Re: [RFC v2 2/5] mm: document transparent_hugepage=defer usage
  2025-02-17 15:04   ` Usama Arif
@ 2025-02-17 19:30     ` Nico Pache
  0 siblings, 0 replies; 14+ messages in thread
From: Nico Pache @ 2025-02-17 19:30 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-kernel, linux-doc, linux-kselftest, linux-mm, ryan.roberts,
	anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
	dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
	aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
	jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
	21cnbao, willy, kirill.shutemov, david, aarcange, raquini,
	dev.jain, sunnanyong, audra, akpm, rostedt, mathieu.desnoyers,
	tiwai, baolin.wang, corbet, shuah

On Mon, Feb 17, 2025 at 8:04 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > The new transparent_hugepage=defer option allows for a more conservative
> > approach to THPs. Document its usage in the transhuge admin-guide.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 22 +++++++++++++++++-----
> >  1 file changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index dff8d5985f0f..b3b18573bbb4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -88,8 +88,9 @@ In certain cases when hugepages are enabled system wide, application
> >  may end up allocating more memory resources. An application may mmap a
> >  large region but only touch 1 byte of it, in that case a 2M page might
> >  be allocated instead of a 4k page for no good. This is why it's
> > -possible to disable hugepages system-wide and to only have them inside
> > -MADV_HUGEPAGE madvise regions.
> > +possible to disable hugepages system-wide, only have them inside
> > +MADV_HUGEPAGE madvise regions, or defer them away from the page fault
> > +handler to khugepaged.
> >
> >  Embedded systems should enable hugepages only inside madvise regions
> >  to eliminate any risk of wasting any precious byte of memory and to
> > @@ -99,6 +100,15 @@ Applications that gets a lot of benefit from hugepages and that don't
> >  risk to lose memory by using hugepages, should use
> >  madvise(MADV_HUGEPAGE) on their critical mmapped regions.
> >
> > +Applications that would like to benefit from THPs but would still like a
> > +more memory conservative approach can choose 'defer'. This avoids
> > +inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> > +Khugepaged will then scan the mappings for potential collapses into PMD
> > +sized pages. Admins using this the 'defer' setting should consider
> > +tweaking khugepaged/max_ptes_none. The current default of 511 may
> > +aggressively collapse your PTEs into PMDs. Lower this value to conserve
> > +more memory (ie. max_ptes_none=64).
> > +
>
> Maybe remove the "(ie. max_ptes_none=64)"; it reads as a recommended value,
> but it might not be optimal for every workload.
>
> >  .. _thp_sysfs:
> >
> >  sysfs
> > @@ -136,6 +146,7 @@ The top-level setting (for use with "inherit") can be set by issuing
> >  one of the following commands::
> >
> >       echo always >/sys/kernel/mm/transparent_hugepage/enabled
> > +     echo defer >/sys/kernel/mm/transparent_hugepage/enabled
> >       echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> >       echo never >/sys/kernel/mm/transparent_hugepage/enabled
> >
> > @@ -274,7 +285,8 @@ of small pages into one large page::
> >  A higher value leads to use additional memory for programs.
> >  A lower value leads to gain less thp performance. Value of
> >  max_ptes_none can waste cpu time very little, you can
> > -ignore it.
> > +ignore it. Consider lowering this value when using
> > +``transparent_hugepage=defer``
>
> Doesn't lowering this value make sense even with thp=always, as there might be
> cases where the pf handler does not give a THP but the VMA becomes eligible for
> khugepaged scanning later? I would remove this line.

Perhaps I should be clearer or create a different section for it.
The point was that defer was created to prevent internal fragmentation
and leave khugepaged to determine when a THP was "useful" (less
wasteful). But to achieve this reduction in waste we should also not be
using the default.

Ideally I would want to change "always" to ignore max_ptes_none (acts
as max_ptes_none=511), and change the max_ptes_none default to 64 or
128. But that's a separate discussion that I didn't want detracting
from these postings.
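
For a rough sense of what these values mean, the arithmetic can be sketched
as below. This is an illustration, not kernel code: it assumes 4K base pages
and a 2M PMD (512 PTEs), and relies only on the documented meaning of
max_ptes_none as the number of not-yet-mapped PTEs khugepaged may fill in
when collapsing. The function names are made up for the example.

```python
PAGE_SIZE = 4096      # assumed 4K base pages
HPAGE_PMD_NR = 512    # PTEs per 2M PMD with 4K base pages


def min_present_ptes(max_ptes_none):
    """Fewest populated PTEs a PMD range needs before a collapse is allowed."""
    return HPAGE_PMD_NR - max_ptes_none


def worst_case_extra_kib(max_ptes_none):
    """Most memory (KiB) a single PMD collapse can add by filling empty PTEs."""
    return max_ptes_none * PAGE_SIZE // 1024


# Compare the current default with the two candidate defaults mentioned above.
for value in (511, 128, 64):
    print(value, min_present_ptes(value), worst_case_extra_kib(value))
```

With the default of 511, a single populated 4K page is enough for a collapse
and up to 2044 KiB is added; with 64, at least 448 of the 512 PTEs must
already be populated and at most 256 KiB is added.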

>
> >
> >  ``max_ptes_swap`` specifies how many pages can be brought in from
> >  swap when collapsing a group of pages into a transparent huge page::
> > @@ -299,8 +311,8 @@ Boot parameters
> >
> >  You can change the sysfs boot time default for the top-level "enabled"
> >  control by passing the parameter ``transparent_hugepage=always`` or
> > -``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> > -kernel command line.
> > +``transparent_hugepage=madvise`` or ``transparent_hugepage=defer`` or
> > +``transparent_hugepage=never`` to the kernel command line.
> >
> >  Alternatively, each supported anonymous THP size can be controlled by
> >  passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
>




* Re: [RFC v2 5/5] mm: document mTHP defer setting
  2025-02-17 15:13   ` Usama Arif
@ 2025-02-17 19:40     ` Nico Pache
  0 siblings, 0 replies; 14+ messages in thread
From: Nico Pache @ 2025-02-17 19:40 UTC (permalink / raw)
  To: Usama Arif
  Cc: linux-kernel, linux-doc, linux-kselftest, linux-mm, ryan.roberts,
	anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
	dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
	aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
	jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
	21cnbao, willy, kirill.shutemov, david, aarcange, raquini,
	dev.jain, sunnanyong, audra, akpm, rostedt, mathieu.desnoyers,
	tiwai, baolin.wang, corbet, shuah

On Mon, Feb 17, 2025 at 8:14 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > Now that we have mTHP support in khugepaged, lets add it to the
> > transhuge admin guide to provide proper guidance.
> >
>
> I think you should move this patch to the mTHP khugepaged series, and just send
> THP=defer separately from mTHP khugepaged.
>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index b3b18573bbb4..99ba3763c1c4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -63,7 +63,7 @@ often.
> >  THP can be enabled system wide or restricted to certain tasks or even
> >  memory ranges inside task's address space. Unless THP is completely
> >  disabled, there is ``khugepaged`` daemon that scans memory and
> > -collapses sequences of basic pages into PMD-sized huge pages.
> > +collapses sequences of basic pages into huge pages.
> >
> >  The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
> >  interface and using madvise(2) and prctl(2) system calls.
> > @@ -103,8 +103,8 @@ madvise(MADV_HUGEPAGE) on their critical mmapped regions.
> >  Applications that would like to benefit from THPs but would still like a
> >  more memory conservative approach can choose 'defer'. This avoids
> >  inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> > -Khugepaged will then scan the mappings for potential collapses into PMD
> > -sized pages. Admins using this the 'defer' setting should consider
> > +Khugepaged will then scan the mappings for potential collapses into (m)THP
> > +pages. Admins using this the 'defer' setting should consider
> >  tweaking khugepaged/max_ptes_none. The current default of 511 may
> >  aggressively collapse your PTEs into PMDs. Lower this value to conserve
> >  more memory (ie. max_ptes_none=64).
> > @@ -119,11 +119,14 @@ Global THP controls
> >
> >  Transparent Hugepage Support for anonymous memory can be entirely disabled
> >  (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
> > -regions (to avoid the risk of consuming more memory resources) or enabled
> > -system wide. This can be achieved per-supported-THP-size with one of::
> > +regions (to avoid the risk of consuming more memory resources), defered to
> > +khugepaged, or enabled system wide.
> > +
> > +This can be achieved per-supported-THP-size with one of::
> >
> >       echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> >       echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> > +     echo defer >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> >       echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> >
> >  where <size> is the hugepage size being addressed, the available sizes
> > @@ -155,6 +158,13 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
> >  sizes, the kernel will select the most appropriate enabled size for a
> >  given allocation.
> >
> > +khugepaged use max_ptes_none scaled to the order of the enabled mTHP size to
> > +determine collapses. When using mTHPs its recommended to set max_ptes_none low.
> > +Ideally less than HPAGE_PMD_NR / 2 (255 on 4k page size). This will prevent
> > +undesired "creep" behavior that leads to continously collapsing to a larger
> > +mTHP size. max_ptes_shared and max_ptes_swap have no effect when collapsing to a
> > +mTHP, and mTHP collapse will fail on shared or swapped out pages.
> > +
>
> This paragraph definitely belongs in the khugepaged series, as it doesn't have anything
> to do with THP=defer.
>
> Re "Ideally less than HPAGE_PMD_NR / 2":
> what if you are running on AMD and using only 16K and 2M THP=always, as that's where
> the most TLB benefit is? Then this recommendation doesn't make sense?

That may be correct. I believe the creep requires two adjacent mTHP
levels (i.e. 512kB and 1024kB) to be enabled for the issue to really
present itself. Although with max_ptes_none=511, you will almost always
satisfy the collapse request, so your 16kB mTHPs will be promoted to
PMDs. I don't believe 511 is a good default if using mTHPs.
>
> Also, even if you have all mTHP sizes set to always, shouldn't you start by collapsing to
> the largest THP size first? (I haven't reviewed the khugepaged series yet, so this might
> have been discussed there; I will try to review it.)

We do start at the largest first. The creep happens on a second pass
of the PMD, not immediately in the same collapse.
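
The creep can be illustrated with a toy model. This is a sketch under stated
assumptions, not the khugepaged implementation: it assumes max_ptes_none is
scaled by a right shift of (HPAGE_PMD_ORDER - order), that a collapsed region
is fully populated afterwards, and it tracks only a single block in an
otherwise empty PMD range. The names scaled_max_ptes_none and final_order
are hypothetical.

```python
HPAGE_PMD_ORDER = 9  # assumed: 2M PMD with 4K base pages


def scaled_max_ptes_none(max_ptes_none, order):
    # Assumed scaling rule: shift the PMD-level tunable down to the mTHP order.
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def final_order(max_ptes_none, start_order, enabled_orders):
    """Toy model of repeated khugepaged passes over one PMD range.

    The range starts with one fully populated, naturally aligned block of
    2**start_order pages and nothing else. Each pass tries the enabled
    orders above the current block, largest first, and collapses when the
    empty PTEs in the candidate region fit within the scaled limit.
    """
    order = start_order
    while True:
        larger = [o for o in sorted(enabled_orders, reverse=True) if o > order]
        grown = next(
            (o for o in larger
             if (1 << o) - (1 << order) <= scaled_max_ptes_none(max_ptes_none, o)),
            None,
        )
        if grown is None:
            return order  # a pass with no collapse: the size is stable
        order = grown     # the collapsed region is now fully populated


all_orders = set(range(2, HPAGE_PMD_ORDER + 1))
print(final_order(255, 2, all_orders))  # just below HPAGE_PMD_NR / 2
print(final_order(256, 2, all_orders))  # exactly HPAGE_PMD_NR / 2
print(final_order(256, 2, {2, 9}))      # only 16K and 2M enabled
print(final_order(511, 2, {2, 9}))      # current default
```

In this model a 16K block stays at order 2 with max_ptes_none=255, creeps one
level per pass up to order 9 with 256 when every size is enabled, stays at
order 2 with 256 when only 16K and 2M are enabled, and jumps straight to the
PMD with 511.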

>
> Did you see the creep behavior you mentioned in your experiments?

Yes, I provided an example of how it happens here.
https://lore.kernel.org/lkml/CAA1CXcDiGLD=dZpFRyAuz4TLrVZZYGp=u7=Z9Q+g9RXbf-s2nA@mail.gmail.com/

>
>
> >  It's also possible to limit defrag efforts in the VM to generate
> >  anonymous hugepages in case they're not immediately free to madvise
> >  regions or to never try to defrag memory and simply fallback to regular
> > @@ -318,7 +328,7 @@ Alternatively, each supported anonymous THP size can be controlled by
> >  passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
> >  where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
> >  supported anonymous THP)  and ``<state>`` is one of ``always``, ``madvise``,
> > -``never`` or ``inherit``.
> > +``defer``, ``never`` or ``inherit``.
> >
> >  For example, the following will set 16K, 32K, 64K THP to ``always``,
> >  set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
>


