* [PATCH 0/4] mm: page_ext: Introduce new iteration API
@ 2025-02-19  2:17 Luiz Capitulino
  2025-02-19  2:17 ` [PATCH 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-19  2:17 UTC (permalink / raw)
  To: linux-kernel, linux-mm, david, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song, luizcap

Hi,

  [ Thanks to David Hildenbrand for identifying the root cause of this
    issue and providing guidance on how to fix it. The new API idea, bugs
    and misconceptions are all mine though ]

Currently, trying to reserve 1G pages with page_owner=on and sparsemem
causes a crash. The reproducer is very simple:

 1. Build the kernel with CONFIG_SPARSEMEM=y and page extension support
    (e.g. CONFIG_PAGE_OWNER=y)
 2. Pass 'default_hugepagesz=1G page_owner=on' on the kernel command-line
 3. Reserve one 1G page at run-time; this should crash (see patch 1 for
    the backtrace)

 [ A crash with page_table_check is also possible, but harder to trigger ]

Starting with commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for
gigantic folios"), we pass the full allocation order to page extension
clients, but the page extension implementation assumes that the page
extensions for all PFNs of an allocation range are stored in the same
memory section (which is not true for 1G pages).

To fix this, this series introduces a new iteration API for page extension
objects. The API checks whether the next page extension object can be
retrieved from the current section or whether it has to be looked up in
another section.
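
For illustration, a client loop then looks like this (same example as in
patch 1; my_page_ext/get_my_page_ext_obj are placeholders for a client's
own extension type and accessor):

  struct page_ext_iter iter;
  struct page_ext *page_ext;

  rcu_read_lock();
  for_each_page_ext_order(page, order, page_ext, iter) {
          struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
          ...
  }
  rcu_read_unlock();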

All details are in patch 1. Also, this series is against Linus' tree at
commit 2408a807bfc3f738850ef5ad5e3fd59d66168996.

RFC -> v1
=========

- Revamped the API by introducing for_each_page_ext macros
- Implemented various suggestions from David Hildenbrand, including page_ext
  lookup optimization
- Fixed changelogs

Luiz Capitulino (4):
  mm: page_ext: add an iteration API for page extensions
  mm: page_table_check: use new iteration API
  mm: page_owner: use new iteration API
  mm: page_ext: make page_ext_next() private to page_ext

 include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
 mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
 mm/page_owner.c          | 61 +++++++++++++++++-------------------
 mm/page_table_check.c    | 39 +++++++----------------
 4 files changed, 152 insertions(+), 63 deletions(-)

-- 
2.48.1




* [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
  2025-02-19  2:17 [PATCH 0/4] mm: page_ext: Introduce new iteration API Luiz Capitulino
@ 2025-02-19  2:17 ` Luiz Capitulino
  2025-02-20 10:59   ` David Hildenbrand
  2025-02-19  2:17 ` [PATCH 2/4] mm: page_table_check: use new iteration API Luiz Capitulino
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-19  2:17 UTC (permalink / raw)
  To: linux-kernel, linux-mm, david, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song, luizcap

The page extension implementation assumes that all page extensions of
a given page order are stored in the same memory section. The function
page_ext_next() relies on this assumption by adding an offset to the
current object to return the next adjacent page extension.

This behavior works as expected for flatmem but fails for sparsemem when
using 1G pages. Commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for
gigantic folios") exposes this issue, making crashes possible when using
the page_owner or page_table_check page extensions.

The problem is that for 1G pages, the page extensions may span memory
section boundaries and be stored in different memory sections. This issue
was not visible before commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP
for gigantic folios") because alloc_contig_pages() never passed more than
MAX_PAGE_ORDER to post_alloc_hook(). However, the series introducing that
commit changed this behavior, allowing the full 1G page order to be
passed.
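
As an example, assuming 4 KiB base pages and 128 MiB memory sections (the
exact section size is architecture-dependent), PAGES_PER_SECTION is 32768
while a 1G page covers 262144 PFNs, so the page extensions of a single 1G
page are spread across 8 memory sections.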

Reproducer:

 1. Build the kernel with CONFIG_SPARSEMEM=y and page extension support
    (e.g. CONFIG_PAGE_OWNER=y)
 2. Pass 'default_hugepagesz=1G page_owner=on' on the kernel command-line
 3. Reserve one 1G page at run-time; this should crash (backtrace below)

To address this issue, this commit introduces a new API for iterating
through page extensions. The main iteration loops are for_each_page_ext()
and for_each_page_ext_order(). Both must be called with the RCU read
lock taken. Here's a usage example:

"""
struct page_ext_iter iter;
struct page_ext *page_ext;

...

rcu_read_lock();
for_each_page_ext_order(page, order, page_ext, iter) {
	struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
	...
}
rcu_read_unlock();
"""

Both loop constructs use page_ext_iter_next(), which checks whether the
iteration has crossed a section boundary. If it has, page_ext_iter_next()
retrieves the next page_ext object from the new section with a fresh
lookup.
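
With the example numbers above, the slow lookup is only needed once every
32768 PFNs (that is, at each section boundary); all other iterations take
the fast path and just advance the page_ext pointer.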

Thanks to David Hildenbrand for helping identify the root cause and
providing suggestions on how to fix and optimize the solution (final
implementation and bugs are all mine though).

Lastly, here's the backtrace; without KASAN you can get random crashes
instead:

[   76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
[   76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
[   76.066714]
[   76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
[   76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
[   76.088972] Call trace:
[   76.091411]  show_stack+0x20/0x38 (C)
[   76.095073]  dump_stack_lvl+0x80/0xf8
[   76.098733]  print_address_description.constprop.0+0x88/0x398
[   76.104476]  print_report+0xa8/0x278
[   76.108041]  kasan_report+0xa8/0xf8
[   76.111520]  __asan_report_store4_noabort+0x20/0x30
[   76.116391]  __update_page_owner_handle+0x238/0x298
[   76.121259]  __set_page_owner+0xdc/0x140
[   76.125173]  post_alloc_hook+0x190/0x1d8
[   76.129090]  alloc_contig_range_noprof+0x54c/0x890
[   76.133874]  alloc_contig_pages_noprof+0x35c/0x4a8
[   76.138656]  alloc_gigantic_folio.isra.0+0x2c0/0x368
[   76.143616]  only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
[   76.149353]  alloc_pool_huge_folio+0x11c/0x1f8
[   76.153787]  set_max_huge_pages+0x364/0xca8
[   76.157961]  __nr_hugepages_store_common+0xb0/0x1a0
[   76.162829]  nr_hugepages_store+0x108/0x118
[   76.167003]  kobj_attr_store+0x3c/0x70
[   76.170745]  sysfs_kf_write+0xfc/0x188
[   76.174492]  kernfs_fop_write_iter+0x274/0x3e0
[   76.178927]  vfs_write+0x64c/0x8e0
[   76.182323]  ksys_write+0xf8/0x1f0
[   76.185716]  __arm64_sys_write+0x74/0xb0
[   76.189630]  invoke_syscall.constprop.0+0xd8/0x1e0
[   76.194412]  do_el0_svc+0x164/0x1e0
[   76.197891]  el0_svc+0x40/0xe0
[   76.200939]  el0t_64_sync_handler+0x144/0x168
[   76.205287]  el0t_64_sync+0x1ac/0x1b0

Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
 include/linux/page_ext.h | 66 ++++++++++++++++++++++++++++++++++++++++
 mm/page_ext.c            | 41 +++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index e4b48a0dda244..a99da12e59fa7 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -3,6 +3,7 @@
 #define __LINUX_PAGE_EXT_H
 
 #include <linux/types.h>
+#include <linux/mmzone.h>
 #include <linux/stacktrace.h>
 
 struct pglist_data;
@@ -69,12 +70,26 @@ extern void page_ext_init(void);
 static inline void page_ext_init_flatmem_late(void)
 {
 }
+
+static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
+{
+	/*
+	 * page_ext is allocated per memory section. Once we cross a
+	 * memory section, we have to fetch the new pointer.
+	 */
+	return next_pfn % PAGES_PER_SECTION;
+}
 #else
 extern void page_ext_init_flatmem(void);
 extern void page_ext_init_flatmem_late(void);
 static inline void page_ext_init(void)
 {
 }
+
+static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
+{
+	return true;
+}
 #endif
 
 extern struct page_ext *page_ext_get(const struct page *page);
@@ -93,6 +108,57 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr)
 	return next;
 }
 
+struct page_ext_iter {
+	unsigned long pfn;
+	unsigned long index;
+	struct page_ext *page_ext;
+};
+
+struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
+struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
+
+/**
+ * page_ext_iter_get() - Get current page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no page_ext exists for this iterator.
+ */
+static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
+{
+	return iter->page_ext;
+}
+
+/**
+ * for_each_page_ext(): iterate through page_ext objects.
+ * @__page: the page we're interested in
+ * @__pgcount: how many pages to iterate through
+ * @__page_ext: struct page_ext pointer where the current page_ext
+ *              object is returned
+ * @__iter: struct page_ext_iter object (defined in the stack)
+ *
+ * IMPORTANT: must be called with RCU read lock taken.
+ */
+#define for_each_page_ext(__page, __pgcount, __page_ext, __iter) \
+	__page_ext = page_ext_iter_begin(&__iter, __page);     \
+	for (__iter.index = 0;                                 \
+		__page_ext && __iter.index < __pgcount;        \
+		__page_ext = page_ext_iter_next(&__iter),      \
+		__iter.index++)
+
+/**
+ * for_each_page_ext_order(): iterate through page_ext objects
+ *                            for a given page order
+ * @__page: the page we're interested in
+ * @__order: page order to iterate through
+ * @__page_ext: struct page_ext pointer where the current page_ext
+ *              object is returned
+ * @__iter: struct page_ext_iter object (defined in the stack)
+ *
+ * IMPORTANT: must be called with RCU read lock taken.
+ */
+#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
+	for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
+
 #else /* !CONFIG_PAGE_EXTENSION */
 struct page_ext;
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 641d93f6af4c1..508deb04d5ead 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
 
 	rcu_read_unlock();
 }
+
+/**
+ * page_ext_iter_begin() - Prepare for iterating through page extensions.
+ * @iter: page extension iterator.
+ * @page: The page we're interested in.
+ *
+ * Must be called with RCU read lock taken.
+ *
+ * Return: NULL if no page_ext exists for this page.
+ */
+struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
+{
+	iter->pfn = page_to_pfn(page);
+	iter->page_ext = lookup_page_ext(page);
+
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_next() - Get next page extension
+ * @iter: page extension iterator.
+ *
+ * Must be called with RCU read lock taken.
+ *
+ * Return: NULL if no next page_ext exists.
+ */
+struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
+{
+	if (WARN_ON_ONCE(!iter->page_ext))
+		return NULL;
+
+	iter->pfn++;
+
+	if (page_ext_iter_next_fast_possible(iter->pfn)) {
+		iter->page_ext = page_ext_next(iter->page_ext);
+	} else {
+		iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
+	}
+
+	return iter->page_ext;
+}
-- 
2.48.1




* [PATCH 2/4] mm: page_table_check: use new iteration API
  2025-02-19  2:17 [PATCH 0/4] mm: page_ext: Introduce new iteration API Luiz Capitulino
  2025-02-19  2:17 ` [PATCH 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
@ 2025-02-19  2:17 ` Luiz Capitulino
  2025-02-20 11:05   ` David Hildenbrand
  2025-02-19  2:17 ` [PATCH 3/4] mm: page_owner: " Luiz Capitulino
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-19  2:17 UTC (permalink / raw)
  To: linux-kernel, linux-mm, david, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song, luizcap

The page_ext_next() function assumes that page extension objects for a
page order allocation always reside in the same memory section, which
may not be true and could lead to crashes. Use the new page_ext
iteration API instead.

Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
 mm/page_table_check.c | 39 ++++++++++++---------------------------
 1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 509c6ef8de400..b52e04d31c809 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -62,24 +62,20 @@ static struct page_table_check *get_page_table_check(struct page_ext *page_ext)
  */
 static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
 {
+	struct page_ext_iter iter;
 	struct page_ext *page_ext;
 	struct page *page;
-	unsigned long i;
 	bool anon;
 
 	if (!pfn_valid(pfn))
 		return;
 
 	page = pfn_to_page(pfn);
-	page_ext = page_ext_get(page);
-
-	if (!page_ext)
-		return;
-
 	BUG_ON(PageSlab(page));
 	anon = PageAnon(page);
 
-	for (i = 0; i < pgcnt; i++) {
+	rcu_read_lock();
+	for_each_page_ext(page, pgcnt, page_ext, iter) {
 		struct page_table_check *ptc = get_page_table_check(page_ext);
 
 		if (anon) {
@@ -89,9 +85,8 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
 			BUG_ON(atomic_read(&ptc->anon_map_count));
 			BUG_ON(atomic_dec_return(&ptc->file_map_count) < 0);
 		}
-		page_ext = page_ext_next(page_ext);
 	}
-	page_ext_put(page_ext);
+	rcu_read_unlock();
 }
 
 /*
@@ -102,24 +97,20 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
 static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
 				 bool rw)
 {
+	struct page_ext_iter iter;
 	struct page_ext *page_ext;
 	struct page *page;
-	unsigned long i;
 	bool anon;
 
 	if (!pfn_valid(pfn))
 		return;
 
 	page = pfn_to_page(pfn);
-	page_ext = page_ext_get(page);
-
-	if (!page_ext)
-		return;
-
 	BUG_ON(PageSlab(page));
 	anon = PageAnon(page);
 
-	for (i = 0; i < pgcnt; i++) {
+	rcu_read_lock();
+	for_each_page_ext(page, pgcnt, page_ext, iter) {
 		struct page_table_check *ptc = get_page_table_check(page_ext);
 
 		if (anon) {
@@ -129,9 +120,8 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
 			BUG_ON(atomic_read(&ptc->anon_map_count));
 			BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
 		}
-		page_ext = page_ext_next(page_ext);
 	}
-	page_ext_put(page_ext);
+	rcu_read_unlock();
 }
 
 /*
@@ -140,24 +130,19 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
  */
 void __page_table_check_zero(struct page *page, unsigned int order)
 {
+	struct page_ext_iter iter;
 	struct page_ext *page_ext;
-	unsigned long i;
 
 	BUG_ON(PageSlab(page));
 
-	page_ext = page_ext_get(page);
-
-	if (!page_ext)
-		return;
-
-	for (i = 0; i < (1ul << order); i++) {
+	rcu_read_lock();
+	for_each_page_ext_order(page, order, page_ext, iter) {
 		struct page_table_check *ptc = get_page_table_check(page_ext);
 
 		BUG_ON(atomic_read(&ptc->anon_map_count));
 		BUG_ON(atomic_read(&ptc->file_map_count));
-		page_ext = page_ext_next(page_ext);
 	}
-	page_ext_put(page_ext);
+	rcu_read_unlock();
 }
 
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
-- 
2.48.1




* [PATCH 3/4] mm: page_owner: use new iteration API
  2025-02-19  2:17 [PATCH 0/4] mm: page_ext: Introduce new iteration API Luiz Capitulino
  2025-02-19  2:17 ` [PATCH 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
  2025-02-19  2:17 ` [PATCH 2/4] mm: page_table_check: use new iteration API Luiz Capitulino
@ 2025-02-19  2:17 ` Luiz Capitulino
  2025-02-19  2:17 ` [PATCH 4/4] mm: page_ext: make page_ext_next() private to page_ext Luiz Capitulino
  2025-02-19 23:52 ` [PATCH 0/4] mm: page_ext: Introduce new iteration API Andrew Morton
  4 siblings, 0 replies; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-19  2:17 UTC (permalink / raw)
  To: linux-kernel, linux-mm, david, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song, luizcap

The page_ext_next() function assumes that page extension objects for a
page order allocation always reside in the same memory section, which
may not be true and could lead to crashes. Use the new page_ext
iteration API instead.

Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
 mm/page_owner.c | 61 +++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 32 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 2d6360eaccbb6..9afc62a882813 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -229,17 +229,19 @@ static void dec_stack_record_count(depot_stack_handle_t handle,
 			handle);
 }
 
-static inline void __update_page_owner_handle(struct page_ext *page_ext,
+static inline void __update_page_owner_handle(struct page *page,
 					      depot_stack_handle_t handle,
 					      unsigned short order,
 					      gfp_t gfp_mask,
 					      short last_migrate_reason, u64 ts_nsec,
 					      pid_t pid, pid_t tgid, char *comm)
 {
-	int i;
+	struct page_ext_iter iter;
+	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 
-	for (i = 0; i < (1 << order); i++) {
+	rcu_read_lock();
+	for_each_page_ext_order(page, order, page_ext, iter) {
 		page_owner = get_page_owner(page_ext);
 		page_owner->handle = handle;
 		page_owner->order = order;
@@ -252,20 +254,22 @@ static inline void __update_page_owner_handle(struct page_ext *page_ext,
 			sizeof(page_owner->comm));
 		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 		__set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
-		page_ext = page_ext_next(page_ext);
 	}
+	rcu_read_unlock();
 }
 
-static inline void __update_page_owner_free_handle(struct page_ext *page_ext,
+static inline void __update_page_owner_free_handle(struct page *page,
 						   depot_stack_handle_t handle,
 						   unsigned short order,
 						   pid_t pid, pid_t tgid,
 						   u64 free_ts_nsec)
 {
-	int i;
+	struct page_ext_iter iter;
+	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 
-	for (i = 0; i < (1 << order); i++) {
+	rcu_read_lock();
+	for_each_page_ext_order(page, order, page_ext, iter) {
 		page_owner = get_page_owner(page_ext);
 		/* Only __reset_page_owner() wants to clear the bit */
 		if (handle) {
@@ -275,8 +279,8 @@ static inline void __update_page_owner_free_handle(struct page_ext *page_ext,
 		page_owner->free_ts_nsec = free_ts_nsec;
 		page_owner->free_pid = current->pid;
 		page_owner->free_tgid = current->tgid;
-		page_ext = page_ext_next(page_ext);
 	}
+	rcu_read_unlock();
 }
 
 void __reset_page_owner(struct page *page, unsigned short order)
@@ -293,11 +297,11 @@ void __reset_page_owner(struct page *page, unsigned short order)
 
 	page_owner = get_page_owner(page_ext);
 	alloc_handle = page_owner->handle;
+	page_ext_put(page_ext);
 
 	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
-	__update_page_owner_free_handle(page_ext, handle, order, current->pid,
+	__update_page_owner_free_handle(page, handle, order, current->pid,
 					current->tgid, free_ts_nsec);
-	page_ext_put(page_ext);
 
 	if (alloc_handle != early_handle)
 		/*
@@ -313,19 +317,13 @@ void __reset_page_owner(struct page *page, unsigned short order)
 noinline void __set_page_owner(struct page *page, unsigned short order,
 					gfp_t gfp_mask)
 {
-	struct page_ext *page_ext;
 	u64 ts_nsec = local_clock();
 	depot_stack_handle_t handle;
 
 	handle = save_stack(gfp_mask);
-
-	page_ext = page_ext_get(page);
-	if (unlikely(!page_ext))
-		return;
-	__update_page_owner_handle(page_ext, handle, order, gfp_mask, -1,
+	__update_page_owner_handle(page, handle, order, gfp_mask, -1,
 				   ts_nsec, current->pid, current->tgid,
 				   current->comm);
-	page_ext_put(page_ext);
 	inc_stack_record_count(handle, gfp_mask, 1 << order);
 }
 
@@ -344,26 +342,24 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
 
 void __split_page_owner(struct page *page, int old_order, int new_order)
 {
-	int i;
-	struct page_ext *page_ext = page_ext_get(page);
+	struct page_ext_iter iter;
+	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 
-	if (unlikely(!page_ext))
-		return;
-
-	for (i = 0; i < (1 << old_order); i++) {
+	rcu_read_lock();
+	for_each_page_ext_order(page, old_order, page_ext, iter) {
 		page_owner = get_page_owner(page_ext);
 		page_owner->order = new_order;
-		page_ext = page_ext_next(page_ext);
 	}
-	page_ext_put(page_ext);
+	rcu_read_unlock();
 }
 
 void __folio_copy_owner(struct folio *newfolio, struct folio *old)
 {
-	int i;
 	struct page_ext *old_ext;
 	struct page_ext *new_ext;
+	struct page_ext *page_ext;
+	struct page_ext_iter iter;
 	struct page_owner *old_page_owner;
 	struct page_owner *new_page_owner;
 	depot_stack_handle_t migrate_handle;
@@ -381,7 +377,7 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
 	old_page_owner = get_page_owner(old_ext);
 	new_page_owner = get_page_owner(new_ext);
 	migrate_handle = new_page_owner->handle;
-	__update_page_owner_handle(new_ext, old_page_owner->handle,
+	__update_page_owner_handle(&newfolio->page, old_page_owner->handle,
 				   old_page_owner->order, old_page_owner->gfp_mask,
 				   old_page_owner->last_migrate_reason,
 				   old_page_owner->ts_nsec, old_page_owner->pid,
@@ -391,7 +387,7 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
 	 * will be freed after migration. Keep them until then as they may be
 	 * useful.
 	 */
-	__update_page_owner_free_handle(new_ext, 0, old_page_owner->order,
+	__update_page_owner_free_handle(&newfolio->page, 0, old_page_owner->order,
 					old_page_owner->free_pid,
 					old_page_owner->free_tgid,
 					old_page_owner->free_ts_nsec);
@@ -400,11 +396,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
 	 * for the new one and the old folio otherwise there will be an imbalance
 	 * when subtracting those pages from the stack.
 	 */
-	for (i = 0; i < (1 << new_page_owner->order); i++) {
+	rcu_read_lock();
+	for_each_page_ext_order(&old->page, new_page_owner->order, page_ext, iter) {
+		old_page_owner = get_page_owner(page_ext);
 		old_page_owner->handle = migrate_handle;
-		old_ext = page_ext_next(old_ext);
-		old_page_owner = get_page_owner(old_ext);
 	}
+	rcu_read_unlock();
 
 	page_ext_put(new_ext);
 	page_ext_put(old_ext);
@@ -813,7 +810,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				goto ext_put_continue;
 
 			/* Found early allocated page */
-			__update_page_owner_handle(page_ext, early_handle, 0, 0,
+			__update_page_owner_handle(page, early_handle, 0, 0,
 						   -1, local_clock(), current->pid,
 						   current->tgid, current->comm);
 			count++;
-- 
2.48.1




* [PATCH 4/4] mm: page_ext: make page_ext_next() private to page_ext
  2025-02-19  2:17 [PATCH 0/4] mm: page_ext: Introduce new iteration API Luiz Capitulino
                   ` (2 preceding siblings ...)
  2025-02-19  2:17 ` [PATCH 3/4] mm: page_owner: " Luiz Capitulino
@ 2025-02-19  2:17 ` Luiz Capitulino
  2025-02-19 23:52 ` [PATCH 0/4] mm: page_ext: Introduce new iteration API Andrew Morton
  4 siblings, 0 replies; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-19  2:17 UTC (permalink / raw)
  To: linux-kernel, linux-mm, david, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song, luizcap

Previous commits removed page_ext_next() use from page_ext clients. Make
it private.

Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
 include/linux/page_ext.h | 7 -------
 mm/page_ext.c            | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index a99da12e59fa7..3a4f0f825aa59 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -101,13 +101,6 @@ static inline void *page_ext_data(struct page_ext *page_ext,
 	return (void *)(page_ext) + ops->offset;
 }
 
-static inline struct page_ext *page_ext_next(struct page_ext *curr)
-{
-	void *next = curr;
-	next += page_ext_size;
-	return next;
-}
-
 struct page_ext_iter {
 	unsigned long pfn;
 	unsigned long index;
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 508deb04d5ead..f9e515d353005 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -567,6 +567,13 @@ struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *pa
 	return iter->page_ext;
 }
 
+static struct page_ext *page_ext_next(struct page_ext *curr)
+{
+	void *next = curr;
+	next += page_ext_size;
+	return next;
+}
+
 /**
  * page_ext_iter_next() - Get next page extension
  * @iter: page extension iterator.
-- 
2.48.1




* Re: [PATCH 0/4] mm: page_ext: Introduce new iteration API
  2025-02-19  2:17 [PATCH 0/4] mm: page_ext: Introduce new iteration API Luiz Capitulino
                   ` (3 preceding siblings ...)
  2025-02-19  2:17 ` [PATCH 4/4] mm: page_ext: make page_ext_next() private to page_ext Luiz Capitulino
@ 2025-02-19 23:52 ` Andrew Morton
  2025-02-20 10:49   ` David Hildenbrand
  4 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2025-02-19 23:52 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: linux-kernel, linux-mm, david, yuzhao, pasha.tatashin, hannes,
	muchun.song

On Tue, 18 Feb 2025 21:17:46 -0500 Luiz Capitulino <luizcap@redhat.com> wrote:

> To fix this, this series introduces a new iteration API for page extension
> objects. The API checks whether the next page extension object can be
> retrieved from the current section or whether it has to be looked up in
> another section.
> 
> ...

A regression since 6.12, so we should backport the fix.

> ...
>
>  include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
>  mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
>  mm/page_owner.c          | 61 +++++++++++++++++-------------------
>  mm/page_table_check.c    | 39 +++++++----------------
>  4 files changed, 152 insertions(+), 63 deletions(-)

That's a lot to backport!

Is there some quick-n-dirty fixup we can apply for the sake of -stable
kernels, then work on this long-term approach for future kernels?



* Re: [PATCH 0/4] mm: page_ext: Introduce new iteration API
  2025-02-19 23:52 ` [PATCH 0/4] mm: page_ext: Introduce new iteration API Andrew Morton
@ 2025-02-20 10:49   ` David Hildenbrand
  2025-02-20 20:23     ` Luiz Capitulino
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-20 10:49 UTC (permalink / raw)
  To: Andrew Morton, Luiz Capitulino
  Cc: linux-kernel, linux-mm, yuzhao, pasha.tatashin, hannes, muchun.song

On 20.02.25 00:52, Andrew Morton wrote:
> On Tue, 18 Feb 2025 21:17:46 -0500 Luiz Capitulino <luizcap@redhat.com> wrote:
> 
>> To fix this, this series introduces a new iteration API for page extension
>> objects. The API checks whether the next page extension object can be
>> retrieved from the current section or whether it has to be looked up in
>> another section.
>>
>> ...
> 
> A regression since 6.12, so we should backport the fix.
> 
>> ...
>>
>>   include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
>>   mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
>>   mm/page_owner.c          | 61 +++++++++++++++++-------------------
>>   mm/page_table_check.c    | 39 +++++++----------------
>>   4 files changed, 152 insertions(+), 63 deletions(-)
> 
> That's a lot to backport!
> 
> Is there some quick-n-dirty fixup we can apply for the sake of -stable
> kernels, then work on this long-term approach for future kernels?

I assume we could loop in 
reset_page_owner()/page_table_check_free()/set_page_owner()/page_table_check_alloc(). 
Not-so-nice for upstream, maybe good-enough for stable. Still nasty :)

OTOH, we don't really expect a lot of conflicts.

-- 
Cheers,

David / dhildenb




* Re: [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
  2025-02-19  2:17 ` [PATCH 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
@ 2025-02-20 10:59   ` David Hildenbrand
  2025-02-20 20:36     ` Luiz Capitulino
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-20 10:59 UTC (permalink / raw)
  To: Luiz Capitulino, linux-kernel, linux-mm, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song

On 19.02.25 03:17, Luiz Capitulino wrote:
> The page extension implementation assumes that all page extensions of
> a given page order are stored in the same memory section. The function
> page_ext_next() relies on this assumption by adding an offset to the
> current object to return the next adjacent page extension.
> 
> This behavior works as expected for flatmem but fails for sparsemem when
> using 1G pages. Commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for
> gigantic folios") exposes this issue, making crashes possible when using
> the page_owner or page_table_check page extensions.
> 
> The problem is that for 1G pages, the page extensions may span memory
> section boundaries and be stored in different memory sections. This issue
> was not visible before commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP
> for gigantic folios") because alloc_contig_pages() never passed more than
> MAX_PAGE_ORDER to post_alloc_hook(). However, the series introducing that
> commit changed this behavior, allowing the full 1G page order to be
> passed.
> 
> Reproducer:
> 
>   1. Build the kernel with CONFIG_SPARSEMEM=y and page extension support
>      (e.g. CONFIG_PAGE_OWNER=y)
>   2. Pass 'default_hugepagesz=1G page_owner=on' on the kernel command-line
>   3. Reserve one 1G page at run-time; this should crash (backtrace below)
> 
> To address this issue, this commit introduces a new API for iterating
> through page extensions. The main iteration loops are for_each_page_ext()
> and for_each_page_ext_order(). Both must be called with the RCU read
> lock taken. Here's a usage example:
> 
> """
> struct page_ext_iter iter;
> struct page_ext *page_ext;
> 
> ...
> 
> rcu_read_lock();
> for_each_page_ext_order(page, order, page_ext, iter) {
> 	struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
> 	...
> }
> rcu_read_unlock();
> """
> 

[...]

> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
> +
> +/**
> + * page_ext_iter_get() - Get current page extension
> + * @iter: page extension iterator.
> + *
> + * Return: NULL if no page_ext exists for this iterator.
> + */
> +static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
> +{
> +	return iter->page_ext;
> +}
> +
> +/**
> + * for_each_page_ext(): iterate through page_ext objects.
> + * @__page: the page we're interested in
> + * @__pgcount: how many pages to iterate through
> + * @__page_ext: struct page_ext pointer where the current page_ext
> + *              object is returned
> + * @__iter: struct page_ext_iter object (defined in the stack)
> + *
> + * IMPORTANT: must be called with RCU read lock taken.
> + */
> +#define for_each_page_ext(__page, __pgcount, __page_ext, __iter) \
> +	__page_ext = page_ext_iter_begin(&__iter, __page);     \

Doing stuff out of the loop is dangerous. Assume something does

if (xxx)
	for_each_page_ext()
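
which, with the assignment in front of the for(), roughly expands to

if (xxx)
	__page_ext = page_ext_iter_begin(&__iter, __page);	/* if body ends here */
for (__iter.index = 0; ...)					/* loop always runs */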

Just move that inside the for().

for (__page_ext = page_ext_iter_begin(&__iter, __page), __iter.index = 0)

> +	for (__iter.index = 0;                                 \
> +		__page_ext && __iter.index < __pgcount;        \
> +		__page_ext = page_ext_iter_next(&__iter),      \
> +		__iter.index++)

Hm, if we now have an index, why not turn iter.pfn -> iter.start_pfn, 
and only adjust the index in page_ext_iter_next?

Then you can set the index to 0 in page_ext_iter_begin() and have here

for (__page_ext = page_ext_iter_begin(&__iter, __page);
      __page_ext && __iter.index < __pgcount;
      __page_ext = page_ext_iter_next(&__iter))


A page_ext_iter_reset() could then simply reset the index=0 and
lookup the page_ext(start_pfn + index) == page_ext(start_pfn)

> +
> +/**
> + * for_each_page_ext_order(): iterate through page_ext objects
> + *                            for a given page order
> + * @__page: the page we're interested in
> + * @__order: page order to iterate through
> + * @__page_ext: struct page_ext pointer where the current page_ext
> + *              object is returned
> + * @__iter: struct page_ext_iter object (defined in the stack)
> + *
> + * IMPORTANT: must be called with RCU read lock taken.
> + */
> +#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
> +	for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
> +
>   #else /* !CONFIG_PAGE_EXTENSION */
>   struct page_ext;
>   
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 641d93f6af4c1..508deb04d5ead 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
>   
>   	rcu_read_unlock();
>   }
> +
> +/**
> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
> + * @iter: page extension iterator.
> + * @page: The page we're interested in.
> + *
> + * Must be called with RCU read lock taken.
> + *
> + * Return: NULL if no page_ext exists for this page.
> + */
> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
> +{
> +	iter->pfn = page_to_pfn(page);
> +	iter->page_ext = lookup_page_ext(page);
> +
> +	return iter->page_ext;
> +}
> +
> +/**
> + * page_ext_iter_next() - Get next page extension
> + * @iter: page extension iterator.
> + *
> + * Must be called with RCU read lock taken.
> + *
> + * Return: NULL if no next page_ext exists.
> + */
> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
> +{
> +	if (WARN_ON_ONCE(!iter->page_ext))
> +		return NULL;
> +
> +	iter->pfn++;
> +
> +	if (page_ext_iter_next_fast_possible(iter->pfn)) {
> +		iter->page_ext = page_ext_next(iter->page_ext);
> +	} else {
> +		iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
> +	}
> +
> +	return iter->page_ext;
> +}

We now always have a function call when calling into
page_ext_iter_next(). Could we move that to the header and rather expose
lookup_page_ext()?
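
Something like this sketch, assuming lookup_page_ext() gets exported (same
body as the mm/page_ext.c version, minus the WARN_ON_ONCE(), just inlined):

static inline struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
{
	iter->pfn++;

	if (page_ext_iter_next_fast_possible(iter->pfn))
		iter->page_ext = page_ext_next(iter->page_ext);
	else
		iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));

	return iter->page_ext;
}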

-- 
Cheers,

David / dhildenb




* Re: [PATCH 2/4] mm: page_table_check: use new iteration API
  2025-02-19  2:17 ` [PATCH 2/4] mm: page_table_check: use new iteration API Luiz Capitulino
@ 2025-02-20 11:05   ` David Hildenbrand
  2025-02-20 20:37     ` Luiz Capitulino
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-02-20 11:05 UTC (permalink / raw)
  To: Luiz Capitulino, linux-kernel, linux-mm, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song

On 19.02.25 03:17, Luiz Capitulino wrote:
> The page_ext_next() function assumes that page extension objects for a
> page order allocation always reside in the same memory section, which
> may not be true and could lead to crashes. Use the new page_ext
> iteration API instead.
> 
> Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> ---
>   mm/page_table_check.c | 39 ++++++++++++---------------------------
>   1 file changed, 12 insertions(+), 27 deletions(-)
> 
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 509c6ef8de400..b52e04d31c809 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -62,24 +62,20 @@ static struct page_table_check *get_page_table_check(struct page_ext *page_ext)
>    */
>   static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
>   {
> +	struct page_ext_iter iter;
>   	struct page_ext *page_ext;
>   	struct page *page;
> -	unsigned long i;
>   	bool anon;
>   
>   	if (!pfn_valid(pfn))
>   		return;
>   
>   	page = pfn_to_page(pfn);
> -	page_ext = page_ext_get(page);
> -
> -	if (!page_ext)
> -		return;
> -
>   	BUG_ON(PageSlab(page));
>   	anon = PageAnon(page);
>   
> -	for (i = 0; i < pgcnt; i++) {
> +	rcu_read_lock();
> +	for_each_page_ext(page, pgcnt, page_ext, iter) {
>   		struct page_table_check *ptc = get_page_table_check(page_ext);
>   
>   		if (anon) {
> @@ -89,9 +85,8 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
>   			BUG_ON(atomic_read(&ptc->anon_map_count));
>   			BUG_ON(atomic_dec_return(&ptc->file_map_count) < 0);
>   		}
> -		page_ext = page_ext_next(page_ext);
>   	}
> -	page_ext_put(page_ext);
> +	rcu_read_unlock();
>   }

[...]

>   
>   /*
> @@ -140,24 +130,19 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
>    */
>   void __page_table_check_zero(struct page *page, unsigned int order)
>   {
> +	struct page_ext_iter iter;
>   	struct page_ext *page_ext;
> -	unsigned long i;
>   
>   	BUG_ON(PageSlab(page));
>   
> -	page_ext = page_ext_get(page);
> -
> -	if (!page_ext)
> -		return;
> -
> -	for (i = 0; i < (1ul << order); i++) {
> +	rcu_read_lock();
> +	for_each_page_ext_order(page, order, page_ext, iter) {

I would avoid introducing for_each_page_ext_order() and just pass "1 << 
order" as the page count.

>   		struct page_table_check *ptc = get_page_table_check(page_ext);
>   
>   		BUG_ON(atomic_read(&ptc->anon_map_count));
>   		BUG_ON(atomic_read(&ptc->file_map_count));
> -		page_ext = page_ext_next(page_ext);
>   	}
> -	page_ext_put(page_ext);
> +	rcu_read_unlock();
>   }
>   
>   void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)

Apart from that, this looks very nice to me

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH 0/4] mm: page_ext: Introduce new iteration API
  2025-02-20 10:49   ` David Hildenbrand
@ 2025-02-20 20:23     ` Luiz Capitulino
  0 siblings, 0 replies; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-20 20:23 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton
  Cc: linux-kernel, linux-mm, yuzhao, pasha.tatashin, hannes, muchun.song

On 2025-02-20 05:49, David Hildenbrand wrote:
> On 20.02.25 00:52, Andrew Morton wrote:
>> On Tue, 18 Feb 2025 21:17:46 -0500 Luiz Capitulino <luizcap@redhat.com> wrote:
>>
>>> To fix this, this series introduces a new iteration API for page extension
>>> objects. The API checks whether the next page extension object can be
>>> retrieved from the current section or whether it has to be looked up in
>>> another section.
>>>
>>> ...
>>
>> A regression since 6.12, so we should backport the fix.
>>
>>> ...
>>>
>>>   include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
>>>   mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
>>>   mm/page_owner.c          | 61 +++++++++++++++++-------------------
>>>   mm/page_table_check.c    | 39 +++++++----------------
>>>   4 files changed, 152 insertions(+), 63 deletions(-)
>>
>> That's a lot to backport!
>>
>> Is there some quick-n-dirty fixup we can apply for the sake of -stable
>> kernels, then work on this long-term approach for future kernels?
> 
> I assume we could loop in reset_page_owner()/page_table_check_free()/set_page_owner()/page_table_check_alloc(). Not-so-nice for upstream, maybe good-enough for stable. Still nasty :)

I think Andrew wants to have the quick-n-dirty fix for upstream, so that
it's easier to backport to -stable. Then we work on this solution on top.

> OTOH, we don't really expect a lot of conflicts.

Yes, I was able to apply this series on top of 6.12.15 without conflicts.
Given that -stable does backport a lot of fixes anyways, I would push for
having this on -stable.

But just to answer the original question: I can't think of quick-n-dirty,
but I can think of easy-n-ugly:

  1. We could add a check for MAX_PAGE_ORDER for the first function in a
     call chain calling page_ext_next() (that is, bail out if > MAX_PAGE_ORDER)

  2. We could replace all page_ext_next() calls with a version of
     lookup_page_ext() that takes a PFN

But all these ideas have regression risk as well, so I don't see the advantage.




* Re: [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
  2025-02-20 10:59   ` David Hildenbrand
@ 2025-02-20 20:36     ` Luiz Capitulino
  2025-02-20 20:45       ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-20 20:36 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel, linux-mm, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song

On 2025-02-20 05:59, David Hildenbrand wrote:
> On 19.02.25 03:17, Luiz Capitulino wrote:
>> The page extension implementation assumes that all page extensions of
>> a given page order are stored in the same memory section. The function
>> page_ext_next() relies on this assumption by adding an offset to the
>> current object to return the next adjacent page extension.
>>
>> This behavior works as expected for flatmem but fails for sparsemem when
>> using 1G pages. Commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for
>> gigantic folios") exposes this issue, making crashes possible when using
>> the page_owner or page_table_check page extensions.
>>
>> The problem is that for 1G pages, the page extensions may span memory
>> section boundaries and be stored in different memory sections. This issue
>> was not visible before commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP
>> for gigantic folios") because alloc_contig_pages() never passed more than
>> MAX_PAGE_ORDER to post_alloc_hook(). However, the series introducing that
>> commit changed this behavior, allowing the full 1G page order to be
>> passed.
>>
>> Reproducer:
>>
>>   1. Build the kernel with CONFIG_SPARSEMEM=y and page extension support
>>      (e.g. CONFIG_PAGE_OWNER=y)
>>   2. Pass 'default_hugepagesz=1G page_owner=on' on the kernel command-line
>>   3. Reserve one 1G page at run-time; this should crash (backtrace below)
>>
>> To address this issue, this commit introduces a new API for iterating
>> through page extensions. The main iteration loops are for_each_page_ext()
>> and for_each_page_ext_order(). Both must be called with the RCU read
>> lock taken. Here's a usage example:
>>
>> """
>> struct page_ext_iter iter;
>> struct page_ext *page_ext;
>>
>> ...
>>
>> rcu_read_lock();
>> for_each_page_ext_order(page, order, page_ext, iter) {
>>     struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
>>     ...
>> }
>> rcu_read_unlock();
>> """
>>
> 
> [...]
> 
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
>> +
>> +/**
>> + * page_ext_iter_get() - Get current page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Return: NULL if no page_ext exists for this iterator.
>> + */
>> +static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
>> +{
>> +    return iter->page_ext;
>> +}
>> +
>> +/**
>> + * for_each_page_ext(): iterate through page_ext objects.
>> + * @__page: the page we're interested in
>> + * @__pgcount: how many pages to iterate through
>> + * @__page_ext: struct page_ext pointer where the current page_ext
>> + *              object is returned
>> + * @__iter: struct page_ext_iter object (defined in the stack)
>> + *
>> + * IMPORTANT: must be called with RCU read lock taken.
>> + */
>> +#define for_each_page_ext(__page, __pgcount, __page_ext, __iter) \
>> +    __page_ext = page_ext_iter_begin(&__iter, __page);     \
> 
> Doing stuff out of the loop is dangerous. Assume something does
> 
> if (xxx)
>      for_each_page_ext()
> 
> Just move that inside the for().
> 
> for (__page_ext = page_ext_iter_begin(&__iter, __page), __iter.index = 0)

Oh, good point. Will do the change.

>> +    for (__iter.index = 0;                                 \
>> +        __page_ext && __iter.index < __pgcount;        \
>> +        __page_ext = page_ext_iter_next(&__iter),      \
>> +        __iter.index++)
> 
> Hm, if we now have an index, why not turn iter.pfn -> iter.start_pfn, and only adjust the index in page_ext_iter_next?
> 
> Then you can set the index to 0 in page_ext_iter_begin() and have here
> 
> for (__page_ext = page_ext_iter_begin(&__iter, __page);
>       __page_ext && __iter.index < __pgcount;
>       __page_ext = page_ext_iter_next(&__iter))

I can do this if you feel strongly about it, but I prefer explicit over
implicit. I moved the index into the iter object just to avoid having
to define it in the macro's body. Also, the way I did it allows for
using page_ext_iter_begin()/page_ext_iter_next() on their own if the
need arises.

> A page_ext_iter_reset() could then simply reset the index=0 and
> lookup the page_ext(start_pfn + index) == page_ext(start_pfn)

Just note we don't have page_ext_iter_reset() today (and I guess it's
not needed).

>> +
>> +/**
>> + * for_each_page_ext_order(): iterate through page_ext objects
>> + *                            for a given page order
>> + * @__page: the page we're interested in
>> + * @__order: page order to iterate through
>> + * @__page_ext: struct page_ext pointer where the current page_ext
>> + *              object is returned
>> + * @__iter: struct page_ext_iter object (defined in the stack)
>> + *
>> + * IMPORTANT: must be called with RCU read lock taken.
>> + */
>> +#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
>> +    for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
>> +
>>   #else /* !CONFIG_PAGE_EXTENSION */
>>   struct page_ext;
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 641d93f6af4c1..508deb04d5ead 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
>>       rcu_read_unlock();
>>   }
>> +
>> +/**
>> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
>> + * @iter: page extension iterator.
>> + * @page: The page we're interested in.
>> + *
>> + * Must be called with RCU read lock taken.
>> + *
>> + * Return: NULL if no page_ext exists for this page.
>> + */
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
>> +{
>> +    iter->pfn = page_to_pfn(page);
>> +    iter->page_ext = lookup_page_ext(page);
>> +
>> +    return iter->page_ext;
>> +}
>> +
>> +/**
>> + * page_ext_iter_next() - Get next page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Must be called with RCU read lock taken.
>> + *
>> + * Return: NULL if no next page_ext exists.
>> + */
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
>> +{
>> +    if (WARN_ON_ONCE(!iter->page_ext))
>> +        return NULL;
>> +
>> +    iter->pfn++;
>> +
>> +    if (page_ext_iter_next_fast_possible(iter->pfn)) {
>> +        iter->page_ext = page_ext_next(iter->page_ext);
>> +    } else {
>> +        iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
>> +    }
>> +
>> +    return iter->page_ext;
>> +}
> 
> We now always have a function call when calling into page_ext_iter_next(). Could we move that to the header and rather expose lookup_page_ext() ?

I personally don't like over-using inline functions, also I don't think this
code needs optimization since the current clients make the affected code paths
slow anyways (and this also applies to the likely/unlikely use in page_owner
and page_table_check, I'd drop all of them if you ask me). But again, I can
change if this would prevent you from giving your ACK :)




* Re: [PATCH 2/4] mm: page_table_check: use new iteration API
  2025-02-20 11:05   ` David Hildenbrand
@ 2025-02-20 20:37     ` Luiz Capitulino
  0 siblings, 0 replies; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-20 20:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel, linux-mm, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song

On 2025-02-20 06:05, David Hildenbrand wrote:
> On 19.02.25 03:17, Luiz Capitulino wrote:
>> The page_ext_next() function assumes that page extension objects for a
>> page order allocation always reside in the same memory section, which
>> may not be true and could lead to crashes. Use the new page_ext
>> iteration API instead.
>>
>> Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
>> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
>> ---
>>   mm/page_table_check.c | 39 ++++++++++++---------------------------
>>   1 file changed, 12 insertions(+), 27 deletions(-)
>>
>> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
>> index 509c6ef8de400..b52e04d31c809 100644
>> --- a/mm/page_table_check.c
>> +++ b/mm/page_table_check.c
>> @@ -62,24 +62,20 @@ static struct page_table_check *get_page_table_check(struct page_ext *page_ext)
>>    */
>>   static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
>>   {
>> +    struct page_ext_iter iter;
>>       struct page_ext *page_ext;
>>       struct page *page;
>> -    unsigned long i;
>>       bool anon;
>>       if (!pfn_valid(pfn))
>>           return;
>>       page = pfn_to_page(pfn);
>> -    page_ext = page_ext_get(page);
>> -
>> -    if (!page_ext)
>> -        return;
>> -
>>       BUG_ON(PageSlab(page));
>>       anon = PageAnon(page);
>> -    for (i = 0; i < pgcnt; i++) {
>> +    rcu_read_lock();
>> +    for_each_page_ext(page, pgcnt, page_ext, iter) {
>>           struct page_table_check *ptc = get_page_table_check(page_ext);
>>           if (anon) {
>> @@ -89,9 +85,8 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
>>               BUG_ON(atomic_read(&ptc->anon_map_count));
>>               BUG_ON(atomic_dec_return(&ptc->file_map_count) < 0);
>>           }
>> -        page_ext = page_ext_next(page_ext);
>>       }
>> -    page_ext_put(page_ext);
>> +    rcu_read_unlock();
>>   }
> 
> [...]
> 
>>   /*
>> @@ -140,24 +130,19 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
>>    */
>>   void __page_table_check_zero(struct page *page, unsigned int order)
>>   {
>> +    struct page_ext_iter iter;
>>       struct page_ext *page_ext;
>> -    unsigned long i;
>>       BUG_ON(PageSlab(page));
>> -    page_ext = page_ext_get(page);
>> -
>> -    if (!page_ext)
>> -        return;
>> -
>> -    for (i = 0; i < (1ul << order); i++) {
>> +    rcu_read_lock();
>> +    for_each_page_ext_order(page, order, page_ext, iter) {
> 
> I would avoid introducing for_each_page_ext_order() and just pass "1 << order" as the page count.

OK, I'll drop it.

> 
>>           struct page_table_check *ptc = get_page_table_check(page_ext);
>>           BUG_ON(atomic_read(&ptc->anon_map_count));
>>           BUG_ON(atomic_read(&ptc->file_map_count));
>> -        page_ext = page_ext_next(page_ext);
>>       }
>> -    page_ext_put(page_ext);
>> +    rcu_read_unlock();
>>   }
>>   void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
> 
> Apart from that, this looks very nice to me
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> 




* Re: [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
  2025-02-20 20:36     ` Luiz Capitulino
@ 2025-02-20 20:45       ` David Hildenbrand
  2025-02-20 20:47         ` David Hildenbrand
  2025-02-20 21:12         ` Luiz Capitulino
  0 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-02-20 20:45 UTC (permalink / raw)
  To: Luiz Capitulino, linux-kernel, linux-mm, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song

>>> +    for (__iter.index = 0;                                 \
>>> +        __page_ext && __iter.index < __pgcount;        \
>>> +        __page_ext = page_ext_iter_next(&__iter),      \
>>> +        __iter.index++)
>>
>> Hm, if we now have an index, why not turn iter.pfn -> iter.start_pfn, and only adjust the index in page_ext_iter_next?
>>
>> Then you can set the index to 0 in page_ext_iter_begin() and have here
>>
>> for (__page_ext = page_ext_iter_begin(&__iter, __page);
>>        __page_ext && __iter.index < __pgcount;
>>        __page_ext = page_ext_iter_next(&__iter))
> 
> I can do this if you feel strongly about it, but I prefer explicit over
> implicit. I moved the index into the iter object just to avoid having
> to define it in the macro's body. Also, the way I did it allows for
> using page_ext_iter_begin()/page_ext_iter_next() on their own if the
> need arises.

Ah, I see what you mean.

for (__page_ext = page_ext_iter_begin(&__iter, __page, __pgcount);
      __page_ext;
      __page_ext = page_ext_iter_next(&__iter))

Could do that I guess by moving the count in there as well and 
performing the check+increment in page_ext_iter_next.

That looks very clean to me, but no strong opinion. Having the index in 
there just to make a macro happy is rather weird.
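
Roughly (untested sketch, with the count stored by page_ext_iter_begin()):

struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
{
	if (!iter->page_ext || ++iter->index >= iter->count)
		return NULL;
	/* same section-crossing logic as before */
	...
}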

> 
>> A page_ext_iter_reset() could then simply reset the index=0 and
>> lookup the page_ext(start_pfn + index) == page_ext(start_pfn)
> 
> Just note we don't have page_ext_iter_reset() today (and I guess it's
> not needed).

Right, was writing this before reviewing the other patch.

> 
>>> +
>>> +/**
>>> + * for_each_page_ext_order(): iterate through page_ext objects
>>> + *                            for a given page order
>>> + * @__page: the page we're interested in
>>> + * @__order: page order to iterate through
>>> + * @__page_ext: struct page_ext pointer where the current page_ext
>>> + *              object is returned
>>> + * @__iter: struct page_ext_iter object (defined in the stack)
>>> + *
>>> + * IMPORTANT: must be called with RCU read lock taken.
>>> + */
>>> +#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
>>> +    for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
>>> +
>>>    #else /* !CONFIG_PAGE_EXTENSION */
>>>    struct page_ext;
>>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>>> index 641d93f6af4c1..508deb04d5ead 100644
>>> --- a/mm/page_ext.c
>>> +++ b/mm/page_ext.c
>>> @@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
>>>        rcu_read_unlock();
>>>    }
>>> +
>>> +/**
>>> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
>>> + * @iter: page extension iterator.
>>> + * @page: The page we're interested in.
>>> + *
>>> + * Must be called with RCU read lock taken.
>>> + *
>>> + * Return: NULL if no page_ext exists for this page.
>>> + */
>>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
>>> +{
>>> +    iter->pfn = page_to_pfn(page);
>>> +    iter->page_ext = lookup_page_ext(page);
>>> +
>>> +    return iter->page_ext;
>>> +}
>>> +
>>> +/**
>>> + * page_ext_iter_next() - Get next page extension
>>> + * @iter: page extension iterator.
>>> + *
>>> + * Must be called with RCU read lock taken.
>>> + *
>>> + * Return: NULL if no next page_ext exists.
>>> + */
>>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
>>> +{
>>> +    if (WARN_ON_ONCE(!iter->page_ext))
>>> +        return NULL;
>>> +
>>> +    iter->pfn++;
>>> +
>>> +    if (page_ext_iter_next_fast_possible(iter->pfn)) {
>>> +        iter->page_ext = page_ext_next(iter->page_ext);
>>> +    } else {
>>> +        iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
>>> +    }
>>> +
>>> +    return iter->page_ext;
>>> +}
>>
>> We now always have a function call when calling into page_ext_iter_next(). Could we move that to the header and rather expose lookup_page_ext() ?
> 
> I personally don't like over-using inline functions, also I don't think this
> code needs optimization since the current clients make the affected code paths
> slow anyways (and this also applies to the likely/unlikely use in page_owner
> and page_table_check, I'd drop all of them if you ask me). But again, I can
> change if this would prevent you from giving your ACK :)

Well, 512^512 function calls for a 1 GiB page just to traverse the page 
ext? :)

-- 
Cheers,

David / dhildenb




* Re: [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
  2025-02-20 20:45       ` David Hildenbrand
@ 2025-02-20 20:47         ` David Hildenbrand
  2025-02-20 21:12         ` Luiz Capitulino
  1 sibling, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-02-20 20:47 UTC (permalink / raw)
  To: Luiz Capitulino, linux-kernel, linux-mm, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song


>> I personally don't like over-using inline functions, and I don't think this
>> code needs optimization since the current clients make the affected code paths
>> slow anyway (this also applies to the likely/unlikely use in page_owner
>> and page_table_check; I'd drop all of them if you ask me). But again, I can
>> change it if this would prevent you from giving your ACK :)
> 
> Well, 512^512 function calls for a 1 GiB page just to traverse the page
> ext? :)

Ehm, 512^2 :)
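
(With 4 KiB base pages, 1 GiB / 4 KiB = 512 * 512 = 262144 calls.)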


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] mm: page_ext: add an iteration API for page extensions
  2025-02-20 20:45       ` David Hildenbrand
  2025-02-20 20:47         ` David Hildenbrand
@ 2025-02-20 21:12         ` Luiz Capitulino
  1 sibling, 0 replies; 15+ messages in thread
From: Luiz Capitulino @ 2025-02-20 21:12 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel, linux-mm, yuzhao, pasha.tatashin
  Cc: akpm, hannes, muchun.song

On 2025-02-20 15:45, David Hildenbrand wrote:
>>>> +    for (__iter.index = 0;                                 \
>>>> +        __page_ext && __iter.index < __pgcount;        \
>>>> +        __page_ext = page_ext_iter_next(&__iter),      \
>>>> +        __iter.index++)
>>>
>>> Hm, if we now have an index, why not turn iter.pfn -> iter.start_pfn, and only adjust the index in page_ext_iter_next?
>>>
>>> Then you can set the index to 0 in page_ext_iter_begin() and have here
>>>
>>> for (__page_ext = page_ext_iter_begin(&__iter, __page),
>>>        __page_ext && __iter.index < __pgcount,
>>>        __page_ext = page_ext_iter_next(&__iter);)
>>
>> I can do this if you feel strongly about it, but I prefer explicit over
>> implicit. I moved the index into the iter object just to avoid having
>> to define it in the macro's body. Also, the way I did it allows for
>> using page_ext_iter_begin()/page_ext_iter_next() on their own if the
>> need arises.
> 
> Ah, I see what you mean.
> 
> for (__page_ext = page_ext_iter_begin(&__iter, __page, __pgcount);
>       __page_ext;
>       __page_ext = page_ext_iter_next(&__iter))
> 
> Could do that I guess by moving the count in there as well and performing the check+increment in page_ext_iter_next.
> 
> That looks very clean to me, but no strong opinion. Having the index in there just to make a macro happy is rather weird.

OK, I'll give this a try.
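
i.e. moving the count into the iterator, something like this (untested):

#define for_each_page_ext(__page, __pgcount, __page_ext, __iter)	\
	for (__page_ext = page_ext_iter_begin(&__iter, __page, __pgcount); \
	     __page_ext;						\
	     __page_ext = page_ext_iter_next(&__iter))

with page_ext_iter_begin() storing __pgcount and setting index = 0, and
page_ext_iter_next() returning NULL once the index reaches the count.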

>>> A page_ext_iter_reset() could then simply reset index to 0 and
>>> look up page_ext(start_pfn + index) == page_ext(start_pfn)
>>
>> Just note we don't have page_ext_iter_reset() today (and I guess it's
>> not needed).
> 
> Right, was writing this before reviewing the other patch.
> 
>>
>>>> +
>>>> +/**
>>>> + * for_each_page_ext_order(): iterate through page_ext objects
>>>> + *                            for a given page order
>>>> + * @__page: the page we're interested in
>>>> + * @__order: page order to iterate through
>>>> + * @__page_ext: struct page_ext pointer where the current page_ext
>>>> + *              object is returned
>>>> + * @__iter: struct page_ext_iter object (defined in the stack)
>>>> + *
>>>> + * IMPORTANT: must be called with RCU read lock taken.
>>>> + */
>>>> +#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
>>>> +    for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
>>>> +
>>>>    #else /* !CONFIG_PAGE_EXTENSION */
>>>>    struct page_ext;
>>>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>>>> index 641d93f6af4c1..508deb04d5ead 100644
>>>> --- a/mm/page_ext.c
>>>> +++ b/mm/page_ext.c
>>>> @@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
>>>>        rcu_read_unlock();
>>>>    }
>>>> +
>>>> +/**
>>>> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
>>>> + * @iter: page extension iterator.
>>>> + * @page: The page we're interested in.
>>>> + *
>>>> + * Must be called with RCU read lock taken.
>>>> + *
>>>> + * Return: NULL if no page_ext exists for this page.
>>>> + */
>>>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
>>>> +{
>>>> +    iter->pfn = page_to_pfn(page);
>>>> +    iter->page_ext = lookup_page_ext(page);
>>>> +
>>>> +    return iter->page_ext;
>>>> +}
>>>> +
>>>> +/**
>>>> + * page_ext_iter_next() - Get next page extension
>>>> + * @iter: page extension iterator.
>>>> + *
>>>> + * Must be called with RCU read lock taken.
>>>> + *
>>>> + * Return: NULL if no next page_ext exists.
>>>> + */
>>>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
>>>> +{
>>>> +    if (WARN_ON_ONCE(!iter->page_ext))
>>>> +        return NULL;
>>>> +
>>>> +    iter->pfn++;
>>>> +
>>>> +    if (page_ext_iter_next_fast_possible(iter->pfn)) {
>>>> +        iter->page_ext = page_ext_next(iter->page_ext);
>>>> +    } else {
>>>> +        iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
>>>> +    }
>>>> +
>>>> +    return iter->page_ext;
>>>> +}
>>>
>>> We now always have a function call when calling into page_ext_iter_next(). Could we move that to the header and expose lookup_page_ext() instead?
>>
>> I personally don't like over-using inline functions, and I don't think this
>> code needs optimization since the current clients make the affected code paths
>> slow anyway (this also applies to the likely/unlikely use in page_owner
>> and page_table_check; I'd drop all of them if you ask me). But again, I can
>> change it if this would prevent you from giving your ACK :)
> 
> Well, 512^512 function calls for a 1 GiB page just to traverse the page ext? :)

page_owner may allocate memory, do hash lookups and whatnot from that code path.
But you have a point that other clients (such as page_table_check) may benefit.
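
For example, with patch 2 the page_table_check update path ends up doing
roughly this for every base page (simplified from memory, not the exact
patch code):

	rcu_read_lock();
	for_each_page_ext(page, pgcnt, page_ext, iter) {
		struct page_table_check *ptc = get_page_table_check(page_ext);

		/* one atomic op per base page */
		atomic_inc(&ptc->anon_map_count);
	}
	rcu_read_unlock();

so for workloads that hammer page table updates, the extra function call
per base page could show up.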



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-20 21:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-19  2:17 [PATCH 0/4] mm: page_ext: Introduce new iteration API Luiz Capitulino
2025-02-19  2:17 ` [PATCH 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
2025-02-20 10:59   ` David Hildenbrand
2025-02-20 20:36     ` Luiz Capitulino
2025-02-20 20:45       ` David Hildenbrand
2025-02-20 20:47         ` David Hildenbrand
2025-02-20 21:12         ` Luiz Capitulino
2025-02-19  2:17 ` [PATCH 2/4] mm: page_table_check: use new iteration API Luiz Capitulino
2025-02-20 11:05   ` David Hildenbrand
2025-02-20 20:37     ` Luiz Capitulino
2025-02-19  2:17 ` [PATCH 3/4] mm: page_owner: " Luiz Capitulino
2025-02-19  2:17 ` [PATCH 4/4] mm: page_ext: make page_ext_next() private to page_ext Luiz Capitulino
2025-02-19 23:52 ` [PATCH 0/4] mm: page_ext: Introduce new iteration API Andrew Morton
2025-02-20 10:49   ` David Hildenbrand
2025-02-20 20:23     ` Luiz Capitulino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox