* [PATCH v11 mm-new 05/10] mm: thp: enable THP allocation exclusively through khugepaged
@ 2025-10-20 3:16 Yafang Shao
From: Yafang Shao @ 2025-10-20 3:16 UTC
To: akpm, ast, daniel, andrii, martin.lau, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
david, ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
dev.jain, hannes, usamaarif642, gutierrez.asier, willy,
ameryhung, rientjes, corbet, 21cnbao, shakeel.butt, tj,
lance.yang, rdunlap
Cc: bpf, linux-mm, linux-doc, linux-kernel, Yafang Shao
khugepaged_enter_vma() ultimately invokes any attached BPF function with
the TVA_KHUGEPAGED flag set when determining whether to enable khugepaged
THP for a freshly faulted-in VMA.

Currently, on fault, we invoke this in do_huge_pmd_anonymous_page(), which
is called from create_huge_pmd(), and only after we have already checked
that an allowable TVA_PAGEFAULT order is specified.
Since we might want to disallow THP on fault-in but allow it via
khugepaged, we move things around so we always attempt to enter
khugepaged upon fault.
This change is safe because:

- khugepaged operates at the MM level rather than per-VMA. Since the THP
  allocation might fail during a page fault due to transient conditions
  (e.g., memory pressure), it is safe to add this MM to khugepaged for
  subsequent defragmentation (see the sketch below).
- If __thp_vma_allowable_orders(TVA_PAGEFAULT) returns 0, then
  __thp_vma_allowable_orders(TVA_KHUGEPAGED) will also return 0.
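
For reference, a simplified sketch of what khugepaged_enter_vma() amounts
to after this series (helper names abridged; any attached BPF policy is
consulted inside the TVA_KHUGEPAGED order check):

	void khugepaged_enter_vma(struct vm_area_struct *vma)
	{
		/* BPF policy and sysfs settings are both checked here. */
		if (thp_vma_allowable_order(vma, TVA_KHUGEPAGED, PMD_ORDER))
			__khugepaged_enter(vma->vm_mm); /* queue the whole mm */
	}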
While we could also extend prctl() to utilize this new policy, such a
change would require a uAPI modification to PR_SET_THP_DISABLE.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Lance Yang <lance.yang@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
---
mm/huge_memory.c | 1 -
mm/memory.c | 13 ++++++++-----
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e105604868a5..45d13c798525 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1390,7 +1390,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
ret = vmf_anon_prepare(vmf);
if (ret)
return ret;
- khugepaged_enter_vma(vma);
if (!(vmf->flags & FAULT_FLAG_WRITE) &&
!mm_forbids_zeropage(vma->vm_mm) &&
diff --git a/mm/memory.c b/mm/memory.c
index 7a242cb07d56..5007f7526694 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6327,11 +6327,14 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
if (pud_trans_unstable(vmf.pud))
goto retry_pud;
-	if (pmd_none(*vmf.pmd) &&
-	    thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) {
-		ret = create_huge_pmd(&vmf);
-		if (!(ret & VM_FAULT_FALLBACK))
-			return ret;
+	if (pmd_none(*vmf.pmd)) {
+		if (vma_is_anonymous(vma))
+			khugepaged_enter_vma(vma);
+		if (thp_vma_allowable_order(vma, TVA_PAGEFAULT, PMD_ORDER)) {
+			ret = create_huge_pmd(&vmf);
+			if (!(ret & VM_FAULT_FALLBACK))
+				return ret;
+		}
} else {
vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
--
2.47.3
* [PATCH v11 mm-new 06/10] mm: bpf-thp: add support for global mode
From: Yafang Shao @ 2025-10-20 3:16 UTC
The per-process BPF-THP mode is unsuitable for managing shared resources
such as shmem THP and file-backed THP. This aligns with known cgroup
limitations for similar scenarios [0].
Introduce a global BPF-THP mode to address this gap. When registered:
- All existing per-process instances are disabled
- New per-process registrations are blocked
- Existing per-process instances remain registered (no forced unregistration)
The global mode takes precedence over per-process instances. Updates are
type-isolated: global instances can only be updated by new global
instances, and per-process instances by new per-process instances.
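
For illustration, a loader selects global mode by simply leaving pid unset
(a sketch modeled on the selftests added later in this series; the skeleton
and map names come from there):

	struct bpf_link *link;

	/* skel->struct_ops.swap_ops->pid is left at 0, so this registers
	 * the global instance. It fails with -EBUSY if a global instance
	 * is already registered.
	 */
	link = bpf_map__attach_struct_ops(skel->maps.swap_ops);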
Link: https://lore.kernel.org/linux-mm/YwNold0GMOappUxc@slm.duckdns.org/ [0]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
mm/huge_memory_bpf.c | 109 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 107 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory_bpf.c b/mm/huge_memory_bpf.c
index e8894c10d1d9..cad1ca6f59a4 100644
--- a/mm/huge_memory_bpf.c
+++ b/mm/huge_memory_bpf.c
@@ -33,6 +33,28 @@ struct bpf_thp_ops {
};
static DEFINE_SPINLOCK(thp_ops_lock);
+static struct bpf_thp_ops __rcu *bpf_thp_global; /* global mode */
+
+static unsigned long
+bpf_hook_thp_get_orders_global(struct vm_area_struct *vma,
+			       enum tva_type type,
+			       unsigned long orders)
+{
+	struct bpf_thp_ops *bpf_thp;
+	int bpf_order;
+
+	rcu_read_lock();
+	/* The global instance may be unregistered concurrently. */
+	bpf_thp = rcu_dereference(bpf_thp_global);
+	if (!bpf_thp || !bpf_thp->thp_get_order)
+		goto out;
+
+	bpf_order = bpf_thp->thp_get_order(vma, type, orders);
+	orders &= BIT(bpf_order);
+
+out:
+	rcu_read_unlock();
+	return orders;
+}
unsigned long bpf_hook_thp_get_orders(struct vm_area_struct *vma,
enum tva_type type,
@@ -45,6 +67,10 @@ unsigned long bpf_hook_thp_get_orders(struct vm_area_struct *vma,
if (!mm)
return orders;
+ /* Global BPF-THP takes precedence over per-process BPF-THP. */
+ if (rcu_access_pointer(bpf_thp_global))
+ return bpf_hook_thp_get_orders_global(vma, type, orders);
+
rcu_read_lock();
bpf_thp = rcu_dereference(mm->bpf_mm.bpf_thp);
if (!bpf_thp || !bpf_thp->thp_get_order)
@@ -177,6 +203,23 @@ static int bpf_thp_init_member(const struct btf_type *t,
return 0;
}
+static int bpf_thp_reg_global(void *kdata, struct bpf_link *link)
+{
+ struct bpf_thp_ops *ops = kdata;
+
+ /* Protect the global pointer bpf_thp_global from concurrent writes. */
+ spin_lock(&thp_ops_lock);
+ /* Only one instance is allowed. */
+ if (rcu_access_pointer(bpf_thp_global)) {
+ spin_unlock(&thp_ops_lock);
+ return -EBUSY;
+ }
+
+ rcu_assign_pointer(bpf_thp_global, ops);
+ spin_unlock(&thp_ops_lock);
+ return 0;
+}
+
static int bpf_thp_reg(void *kdata, struct bpf_link *link)
{
struct bpf_thp_ops *bpf_thp = kdata;
@@ -187,6 +230,11 @@ static int bpf_thp_reg(void *kdata, struct bpf_link *link)
pid_t pid;
pid = bpf_thp->pid;
+
+	/* Fall back to global mode if pid is not set. */
+	if (!pid)
+		return bpf_thp_reg_global(kdata, link);
+
p = find_get_task_by_vpid(pid);
if (!p)
return -ESRCH;
@@ -207,8 +255,10 @@ static int bpf_thp_reg(void *kdata, struct bpf_link *link)
* might register this task simultaneously.
*/
spin_lock(&thp_ops_lock);
- /* Each process is exclusively managed by a single BPF-THP. */
- if (rcu_access_pointer(mm->bpf_mm.bpf_thp))
+	/*
+	 * Each process is exclusively managed by a single BPF-THP.
+	 * Global mode disables per-process instances.
+	 */
+ if (rcu_access_pointer(mm->bpf_mm.bpf_thp) || rcu_access_pointer(bpf_thp_global))
goto out_lock;
err = 0;
rcu_assign_pointer(mm->bpf_mm.bpf_thp, bpf_thp);
@@ -224,12 +274,33 @@ static int bpf_thp_reg(void *kdata, struct bpf_link *link)
return err;
}
+static void bpf_thp_unreg_global(void *kdata, struct bpf_link *link)
+{
+ struct bpf_thp_ops *bpf_thp;
+
+ spin_lock(&thp_ops_lock);
+ if (!rcu_access_pointer(bpf_thp_global)) {
+ spin_unlock(&thp_ops_lock);
+ return;
+ }
+
+ bpf_thp = rcu_replace_pointer(bpf_thp_global, NULL,
+ lockdep_is_held(&thp_ops_lock));
+ WARN_ON_ONCE(!bpf_thp);
+ spin_unlock(&thp_ops_lock);
+
+ synchronize_rcu();
+}
+
static void bpf_thp_unreg(void *kdata, struct bpf_link *link)
{
struct bpf_thp_ops *bpf_thp = kdata;
struct bpf_mm_ops *bpf_mm;
struct list_head *pos, *n;
+	if (!bpf_thp->pid) {
+		bpf_thp_unreg_global(kdata, link);
+		return;
+	}
+
spin_lock(&thp_ops_lock);
list_for_each_safe(pos, n, &bpf_thp->mm_list) {
bpf_mm = list_entry(pos, struct bpf_mm_ops, bpf_thp_list);
@@ -242,6 +313,31 @@ static void bpf_thp_unreg(void *kdata, struct bpf_link *link)
synchronize_rcu();
}
+static int bpf_thp_update_global(void *kdata, void *old_kdata, struct bpf_link *link)
+{
+ struct bpf_thp_ops *old_bpf_thp = old_kdata;
+ struct bpf_thp_ops *bpf_thp = kdata;
+ struct bpf_thp_ops *old_global;
+
+ if (!old_bpf_thp || !bpf_thp)
+ return -EINVAL;
+
+ spin_lock(&thp_ops_lock);
+ /* BPF-THP global instance has already been removed. */
+ if (!rcu_access_pointer(bpf_thp_global)) {
+ spin_unlock(&thp_ops_lock);
+ return -ENOENT;
+ }
+
+ old_global = rcu_replace_pointer(bpf_thp_global, bpf_thp,
+ lockdep_is_held(&thp_ops_lock));
+ WARN_ON_ONCE(!old_global);
+ spin_unlock(&thp_ops_lock);
+
+ synchronize_rcu();
+ return 0;
+}
+
static int bpf_thp_update(void *kdata, void *old_kdata, struct bpf_link *link)
{
struct bpf_thp_ops *old_bpf_thp = old_kdata;
@@ -249,6 +345,15 @@ static int bpf_thp_update(void *kdata, void *old_kdata, struct bpf_link *link)
struct bpf_mm_ops *bpf_mm;
struct list_head *pos, *n;
+	/*
+	 * Updates are confined to instances of the same scope:
+	 * global to global, process-local to process-local.
+	 */
+ if (!!old_bpf_thp->pid != !!bpf_thp->pid)
+ return -EINVAL;
+
+ if (!old_bpf_thp->pid)
+ return bpf_thp_update_global(kdata, old_kdata, link);
+
INIT_LIST_HEAD(&bpf_thp->mm_list);
/* Could be optimized to a per-instance lock if this lock becomes a bottleneck. */
--
2.47.3
* [PATCH v11 mm-new 07/10] Documentation: add BPF THP
From: Yafang Shao @ 2025-10-20 3:16 UTC
Add admin-guide documentation for BPF THP.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
Documentation/admin-guide/mm/transhuge.rst | 113 +++++++++++++++++++++
1 file changed, 113 insertions(+)
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 1654211cc6cf..4d2941158f09 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -738,3 +738,116 @@ support enabled just fine as always. No difference can be noted in
hugetlbfs other than there will be less overall fragmentation. All
usual features belonging to hugetlbfs are preserved and
unaffected. libhugetlbfs will also work fine as usual.
+
+BPF THP
+=======
+
+:Author: Yafang Shao <laoar.shao@gmail.com>
+:Date: October 2025
+
+Overview
+--------
+
+When the system is configured with "always" or "madvise" THP mode, a BPF program
+can be used to adjust THP allocation policies dynamically. This enables
+fine-grained control over THP decisions based on various factors including
+workload identity, allocation context, and system memory pressure.
+
+Program Interface
+-----------------
+
+This feature implements a struct_ops BPF program with the following interface::
+
+    struct bpf_thp_ops {
+            pid_t pid;
+            thp_order_fn_t *thp_get_order;
+    };
+
+Callback Functions
+------------------
+
+thp_get_order()
+~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+    int thp_get_order(struct vm_area_struct *vma,
+                      enum tva_type type,
+                      unsigned long orders);
+
+Parameters
+^^^^^^^^^^
+
+``vma``
+ ``vm_area_struct`` associated with the THP allocation.
+
+``type``
+ TVA type for the current ``vma``.
+
+``orders``
+ Bitmask of available THP orders for this allocation.
+
+Return value
+^^^^^^^^^^^^
+
+- The suggested THP order for allocation from the BPF program
+- Must be a valid, available order from the provided ``orders`` bitmask
+
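+Example
+^^^^^^^
+
+A minimal policy sketch, modeled on the selftest added later in this series
+(``pmd_order`` is a global variable filled in by the loader)::
+
+    SEC("struct_ops/thp_get_order")
+    int BPF_PROG(thp_not_eligible, struct vm_area_struct *vma,
+                 enum tva_type type, unsigned long orders)
+    {
+            /* Report THPeligible as 0 in /proc/pid/smaps. */
+            if (type == TVA_SMAPS)
+                    return 0;
+            return pmd_order;
+    }
+
+    SEC(".struct_ops.link")
+    struct bpf_thp_ops thp_eligible_ops = {
+            .thp_get_order = (void *)thp_not_eligible,
+    };
+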
+Operation Modes
+---------------
+
+Per Process Mode
+~~~~~~~~~~~~~~~~
+
+When registering a BPF-THP with a specific PID, the program is installed in the
+target task's ``mm_struct``::
+
+    struct mm_struct {
+            struct bpf_thp_ops __rcu *bpf_thp;
+    };
+
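+A registration sketch using libbpf (the skeleton and map names are
+illustrative, following the selftests in this series)::
+
+    /* After opening the skeleton and before loading it: */
+    skel->struct_ops.thp_ops->pid = target_pid;
+
+    /* Registers the per-process instance for target_pid. */
+    link = bpf_map__attach_struct_ops(skel->maps.thp_ops);
+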
+Inheritance Behavior
+^^^^^^^^^^^^^^^^^^^^
+
+- Existing child processes are unaffected
+- Newly forked children inherit the BPF-THP from their parent
+- The BPF-THP persists across execve() calls
+
+Management Rules
+^^^^^^^^^^^^^^^^
+
+- When a BPF-THP instance is unregistered, all managed tasks' ``bpf_thp``
+ pointers are reset to ``NULL``
+- When a BPF-THP instance is updated, all managed tasks' ``bpf_thp`` pointers
+ are automatically updated to the new version
+- Each process can be managed by only one BPF-THP instance at a time
+
+Global Mode
+~~~~~~~~~~~
+
+If no PID is specified during registration, the BPF-THP operates in global mode.
+In this mode, all tasks in the system are managed by the global instance.
+
+Global Mode Precedence
+^^^^^^^^^^^^^^^^^^^^^^
+
+- The global instance takes precedence over all per-process instances
+- All existing per-process instances are disabled when a global instance is
+ registered
+- New per-process registrations are blocked while a global instance is active
+- Existing per-process instances remain registered (no forced unregistration)
+
+Instance Management
+^^^^^^^^^^^^^^^^^^^
+
+- Updates are type-isolated: global instances can only be updated by new global
+ instances, and per-process instances by new per-process instances
+- Only one global BPF-THP can be registered at a time
+- Global instances can be updated dynamically without requiring task restarts
+
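+For example, an existing registration can be replaced through its BPF link
+(a sketch; both maps must have the same scope, otherwise the kernel rejects
+the update with ``-EINVAL``)::
+
+    err = bpf_link__update_map(link, skel->maps.new_policy_ops);
+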
+Implementation Notes
+--------------------
+
+- This is currently an experimental feature
+- ``CONFIG_BPF_THP`` must be enabled to use this functionality
+- The feature depends on proper THP configuration ("always" or "madvise" mode)
--
2.47.3
* [PATCH v11 mm-new 08/10] selftests/bpf: add a simple BPF based THP policy
From: Yafang Shao @ 2025-10-20 3:16 UTC
This test case implements a basic THP policy that sets THPeligible to 0 for
a specific task. I selected THPeligible for verification because its
straightforward nature makes it ideal for validating the BPF THP policy
functionality.
The following configs must be enabled for this test:
CONFIG_BPF_MM=y
CONFIG_BPF_THP=y
CONFIG_TRANSPARENT_HUGEPAGE=y
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
MAINTAINERS | 2 +
tools/testing/selftests/bpf/config | 3 +
.../selftests/bpf/prog_tests/thp_adjust.c | 245 ++++++++++++++++++
.../selftests/bpf/progs/test_thp_adjust.c | 24 ++
4 files changed, 274 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/thp_adjust.c
create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 50faf3860a13..7febdd8b17b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16523,6 +16523,8 @@ F: mm/huge_memory.c
F: mm/huge_memory_bpf.c
F: mm/khugepaged.c
F: mm/mm_slot.h
+F: tools/testing/selftests/bpf/prog_tests/thp_adjust.c
+F: tools/testing/selftests/bpf/progs/test_thp_adjust*
F: tools/testing/selftests/mm/khugepaged.c
F: tools/testing/selftests/mm/split_huge_page_test.c
F: tools/testing/selftests/mm/transhuge-stress.c
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 70b28c1e653e..8e57c449173b 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -7,8 +7,10 @@ CONFIG_BPF_JIT=y
CONFIG_BPF_KPROBE_OVERRIDE=y
CONFIG_BPF_LIRC_MODE2=y
CONFIG_BPF_LSM=y
+CONFIG_BPF_MM=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_THP=y
# CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
CONFIG_CGROUP_BPF=y
CONFIG_CRYPTO_HMAC=y
@@ -115,6 +117,7 @@ CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SYN_COOKIES=y
CONFIG_TEST_BPF=m
+CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_UDMABUF=y
CONFIG_USERFAULTFD=y
CONFIG_VSOCKETS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
new file mode 100644
index 000000000000..2b23e2d08092
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <sys/mman.h>
+#include <test_progs.h>
+#include "test_thp_adjust.skel.h"
+
+#define LEN (16 * 1024 * 1024) /* 16MB */
+#define THP_ENABLED_FILE "/sys/kernel/mm/transparent_hugepage/enabled"
+#define PMD_SIZE_FILE "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+
+static struct test_thp_adjust *skel;
+static char old_mode[32];
+static long pagesize;
+
+static int thp_mode_save(void)
+{
+ const char *start, *end;
+ char buf[128];
+ int fd, err;
+ size_t len;
+
+ fd = open(THP_ENABLED_FILE, O_RDONLY);
+ if (fd == -1)
+ return -1;
+
+	err = read(fd, buf, sizeof(buf) - 1);
+	if (err <= 0) {
+		err = -1;
+		goto close;
+	}
+	/* read() does not NUL-terminate the buffer; strchr() needs it. */
+	buf[err] = '\0';
+
+ start = strchr(buf, '[');
+ end = start ? strchr(start, ']') : NULL;
+ if (!start || !end || end <= start) {
+ err = -1;
+ goto close;
+ }
+
+ len = end - start - 1;
+ if (len >= sizeof(old_mode))
+ len = sizeof(old_mode) - 1;
+ strncpy(old_mode, start + 1, len);
+ old_mode[len] = '\0';
+
+close:
+ close(fd);
+ return err;
+}
+
+static int thp_mode_set(const char *desired_mode)
+{
+ int fd, err;
+
+ fd = open(THP_ENABLED_FILE, O_RDWR);
+ if (fd == -1)
+ return -1;
+
+ err = write(fd, desired_mode, strlen(desired_mode));
+ close(fd);
+ return err;
+}
+
+static int thp_mode_reset(void)
+{
+ int fd, err;
+
+ fd = open(THP_ENABLED_FILE, O_WRONLY);
+ if (fd == -1)
+ return -1;
+
+ err = write(fd, old_mode, strlen(old_mode));
+ close(fd);
+ return err;
+}
+
+static char *thp_alloc(void)
+{
+ char *addr;
+ int err, i;
+
+ addr = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
+ if (addr == MAP_FAILED)
+ return NULL;
+
+ err = madvise(addr, LEN, MADV_HUGEPAGE);
+ if (err == -1)
+ goto unmap;
+
+ /* Accessing a single byte within a page is sufficient to trigger a page fault. */
+ for (i = 0; i < LEN; i += pagesize)
+ addr[i] = 1;
+ return addr;
+
+unmap:
+ munmap(addr, LEN);
+ return NULL;
+}
+
+static void thp_free(char *ptr)
+{
+ munmap(ptr, LEN);
+}
+
+static int get_pmd_order(void)
+{
+ ssize_t bytes_read, size;
+ int fd, order, ret = -1;
+ char buf[64], *endptr;
+
+ fd = open(PMD_SIZE_FILE, O_RDONLY);
+ if (fd < 0)
+ return -1;
+
+ bytes_read = read(fd, buf, sizeof(buf) - 1);
+ if (bytes_read <= 0)
+ goto close_fd;
+
+ /* Remove potential newline character */
+ if (buf[bytes_read - 1] == '\n')
+ buf[bytes_read - 1] = '\0';
+
+ size = strtoul(buf, &endptr, 10);
+ if (endptr == buf || *endptr != '\0')
+ goto close_fd;
+ if (size % pagesize != 0)
+ goto close_fd;
+	ret = size / pagesize;
+	/* A valid PMD size is a power-of-two multiple of the page size. */
+	if (ret & (ret - 1)) {
+		ret = -1;
+		goto close_fd;
+	}
+	order = 0;
+	while (ret > 1) {
+		ret >>= 1;
+		order++;
+	}
+	ret = order;
+
+close_fd:
+ close(fd);
+ return ret;
+}
+
+static int get_thp_eligible(pid_t pid, unsigned long addr)
+{
+ int this_vma = 0, eligible = -1;
+ unsigned long start, end;
+ char smaps_path[64];
+ FILE *smaps_file;
+ char line[4096];
+
+ snprintf(smaps_path, sizeof(smaps_path), "/proc/%d/smaps", pid);
+ smaps_file = fopen(smaps_path, "r");
+ if (!smaps_file)
+ return -1;
+
+ while (fgets(line, sizeof(line), smaps_file)) {
+ if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
+ /* addr is monotonic */
+ if (addr < start)
+ break;
+ this_vma = (addr >= start && addr < end) ? 1 : 0;
+ continue;
+ }
+
+ if (!this_vma)
+ continue;
+
+ if (strstr(line, "THPeligible:")) {
+ sscanf(line, "THPeligible: %d", &eligible);
+ break;
+ }
+ }
+
+ fclose(smaps_file);
+ return eligible;
+}
+
+static void subtest_thp_eligible(void)
+{
+ struct bpf_link *ops_link;
+	int eligible;
+ char *ptr;
+
+ ops_link = bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops);
+ if (!ASSERT_OK_PTR(ops_link, "attach struct_ops"))
+ return;
+
+ ptr = thp_alloc();
+ if (!ASSERT_OK_PTR(ptr, "THP alloc"))
+ goto detach;
+
+	eligible = get_thp_eligible(getpid(), (unsigned long)ptr);
+	ASSERT_EQ(eligible, 0, "THPeligible");
+
+ thp_free(ptr);
+detach:
+ bpf_link__destroy(ops_link);
+}
+
+static int thp_adjust_setup(void)
+{
+ int err = -1, pmd_order;
+
+ pagesize = sysconf(_SC_PAGESIZE);
+ pmd_order = get_pmd_order();
+ if (!ASSERT_NEQ(pmd_order, -1, "get_pmd_order"))
+ return -1;
+
+ if (!ASSERT_NEQ(thp_mode_save(), -1, "THP mode save"))
+ return -1;
+ if (!ASSERT_GE(thp_mode_set("madvise"), 0, "THP mode set"))
+ return -1;
+
+ skel = test_thp_adjust__open();
+ if (!ASSERT_OK_PTR(skel, "open"))
+ goto thp_reset;
+
+ skel->bss->pmd_order = pmd_order;
+ skel->struct_ops.thp_eligible_ops->pid = getpid();
+
+ err = test_thp_adjust__load(skel);
+ if (!ASSERT_OK(err, "load"))
+ goto destroy;
+ return 0;
+
+destroy:
+ test_thp_adjust__destroy(skel);
+thp_reset:
+ ASSERT_GE(thp_mode_reset(), 0, "THP mode reset");
+ return err;
+}
+
+static void thp_adjust_destroy(void)
+{
+ test_thp_adjust__destroy(skel);
+ ASSERT_GE(thp_mode_reset(), 0, "THP mode reset");
+}
+
+void test_thp_adjust(void)
+{
+ if (thp_adjust_setup() == -1)
+ return;
+
+ if (test__start_subtest("thp_eligible"))
+ subtest_thp_eligible();
+
+ thp_adjust_destroy();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust.c b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
new file mode 100644
index 000000000000..b180a7f9b923
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+int pmd_order;
+
+SEC("struct_ops/thp_get_order")
+int BPF_PROG(thp_not_eligible, struct vm_area_struct *vma, enum tva_type type,
+ unsigned long orders)
+{
+ /* THPeligible in /proc/pid/smaps is 0 */
+ if (type == TVA_SMAPS)
+ return 0;
+ return pmd_order;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops thp_eligible_ops = {
+ .thp_get_order = (void *)thp_not_eligible,
+};
--
2.47.3
* [PATCH v11 mm-new 09/10] selftests/bpf: add test case to update THP policy
From: Yafang Shao @ 2025-10-20 3:16 UTC
This test case exercises the BPF THP update mechanism by modifying an
existing policy. It confirms that:
- EBUSY error occurs when attempting to install a BPF program on a process
that already has an active BPF program
- Updates to currently running programs are successfully processed
- Local prog can't be updated by a global prog
- Global prog can't be updated by a local prog
- Global prog can be attached even if there's a local prog
- Local prog can't be attached if there's a global prog
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
.../selftests/bpf/prog_tests/thp_adjust.c | 79 +++++++++++++++++++
.../selftests/bpf/progs/test_thp_adjust.c | 29 +++++++
2 files changed, 108 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
index 2b23e2d08092..0d570cee9006 100644
--- a/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
+++ b/tools/testing/selftests/bpf/prog_tests/thp_adjust.c
@@ -194,6 +194,79 @@ static void subtest_thp_eligible(void)
bpf_link__destroy(ops_link);
}
+static void subtest_thp_policy_update(void)
+{
+ struct bpf_link *old_link, *new_link;
+	int eligible, err, pid;
+ char *ptr;
+
+ pid = getpid();
+	ptr = thp_alloc();
+	if (!ASSERT_OK_PTR(ptr, "THP alloc"))
+		return;
+
+ old_link = bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops);
+ if (!ASSERT_OK_PTR(old_link, "attach_old_link"))
+ goto free;
+
+	eligible = get_thp_eligible(pid, (unsigned long)ptr);
+	ASSERT_EQ(eligible, 0, "THPeligible");
+
+	/* Attaching multiple BPF-THP instances to a single process is rejected. */
+	new_link = bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops2);
+	if (!ASSERT_NULL(new_link, "attach_new_link"))
+		goto destroy_old;
+	ASSERT_EQ(errno, EBUSY, "attach_new_link");
+
+	eligible = get_thp_eligible(pid, (unsigned long)ptr);
+	ASSERT_EQ(eligible, 0, "THPeligible");
+
+ err = bpf_link__update_map(old_link, skel->maps.thp_eligible_ops2);
+ ASSERT_EQ(err, 0, "update_old_link");
+
+	eligible = get_thp_eligible(pid, (unsigned long)ptr);
+	ASSERT_EQ(eligible, 1, "THPeligible");
+
+	/* A per-process prog can't be updated by a global prog. */
+	err = bpf_link__update_map(old_link, skel->maps.swap_ops);
+	ASSERT_EQ(err, -EINVAL, "update_old_link");
+
+destroy_old:
+ bpf_link__destroy(old_link);
+free:
+ thp_free(ptr);
+}
+
+static void subtest_thp_global_policy(void)
+{
+ struct bpf_link *local_link, *global_link;
+ int err;
+
+ local_link = bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops);
+ if (!ASSERT_OK_PTR(local_link, "attach_local_link"))
+ return;
+
+ /* global prog can be attached even if there is a local prog */
+ global_link = bpf_map__attach_struct_ops(skel->maps.swap_ops);
+ if (!ASSERT_OK_PTR(global_link, "attach_global_link")) {
+ bpf_link__destroy(local_link);
+ return;
+ }
+
+ bpf_link__destroy(local_link);
+
+	/* A local prog can't be attached if there is a global prog. */
+	local_link = bpf_map__attach_struct_ops(skel->maps.thp_eligible_ops);
+	if (!ASSERT_NULL(local_link, "attach_new_link"))
+		goto destroy_global;
+	ASSERT_EQ(errno, EBUSY, "attach_new_link");
+
+ /* global prog can't be updated by a local prog */
+ err = bpf_link__update_map(global_link, skel->maps.thp_eligible_ops);
+ ASSERT_EQ(err, -EINVAL, "update_old_link");
+
+destroy_global:
+ bpf_link__destroy(global_link);
+}
+
static int thp_adjust_setup(void)
{
int err = -1, pmd_order;
@@ -214,6 +287,8 @@ static int thp_adjust_setup(void)
skel->bss->pmd_order = pmd_order;
skel->struct_ops.thp_eligible_ops->pid = getpid();
+ skel->struct_ops.thp_eligible_ops2->pid = getpid();
+ /* swap_ops is a global prog since its pid is not set. */
err = test_thp_adjust__load(skel);
if (!ASSERT_OK(err, "load"))
@@ -240,6 +315,10 @@ void test_thp_adjust(void)
if (test__start_subtest("thp_eligible"))
subtest_thp_eligible();
+ if (test__start_subtest("policy_update"))
+ subtest_thp_policy_update();
+ if (test__start_subtest("global_policy"))
+ subtest_thp_global_policy();
thp_adjust_destroy();
}
diff --git a/tools/testing/selftests/bpf/progs/test_thp_adjust.c b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
index b180a7f9b923..44648326819a 100644
--- a/tools/testing/selftests/bpf/progs/test_thp_adjust.c
+++ b/tools/testing/selftests/bpf/progs/test_thp_adjust.c
@@ -22,3 +22,32 @@ SEC(".struct_ops.link")
struct bpf_thp_ops thp_eligible_ops = {
.thp_get_order = (void *)thp_not_eligible,
};
+
+SEC("struct_ops/thp_get_order")
+int BPF_PROG(thp_eligible, struct vm_area_struct *vma, enum tva_type type,
+ unsigned long orders)
+{
+	/* Return pmd_order for all types, including TVA_SMAPS,
+	 * so THPeligible in /proc/pid/smaps is 1.
+	 */
+	return pmd_order;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops thp_eligible_ops2 = {
+ .thp_get_order = (void *)thp_eligible,
+};
+
+SEC("struct_ops/thp_get_order")
+int BPF_PROG(alloc_not_in_swap, struct vm_area_struct *vma, enum tva_type type,
+ unsigned long orders)
+{
+ if (type == TVA_SWAP_PAGEFAULT)
+ return 0;
+ return -1;
+}
+
+SEC(".struct_ops.link")
+struct bpf_thp_ops swap_ops = {
+ .thp_get_order = (void *)alloc_not_in_swap,
+};
--
2.47.3