* [RFC 0/4] mm: page_ext: Fix crash when reserving 1G pages
@ 2025-01-24 21:37 Luiz Capitulino
2025-01-24 21:37 ` [RFC 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Luiz Capitulino @ 2025-01-24 21:37 UTC (permalink / raw)
To: linux-kernel, linux-mm, david, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino, luizcap
Hi,
[ Thanks to David Hildenbrand for identifying the root cause of this
issue and providing guidance on how to fix it. The new API idea, bugs
and misconceptions are all mine though ]
Currently, trying to reserve 1G pages with page_owner=on and sparsemem
causes a crash. The reproducer is very simple:
1. Build the kernel with CONFIG_SPARSEMEM=y and the page_owner and
page_table_check page extensions
2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
3. Reserve one 1G page at run-time; this should crash (see patch 1 for
the backtrace)
[ A crash with page_table_check is also possible, but harder to
trigger ]
Apparently, starting with commit e98337d11bbd ("mm/contig_alloc: support
__GFP_COMP") we now pass the full allocation order to page extension
clients, but the page extension implementation assumes that the page
extensions for all PFNs of an allocation range are stored in the same
memory section (which is not true for 1G pages).
To fix this, this series introduces a new iteration API for page extension
objects. The API ensures that we always look up the memory section for
the next page extension in the iteration.
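For reference, the intended usage pattern looks like this (a sketch
distilled from patch 2; the surrounding error handling is trimmed):

	struct page_ext_iter iter;
	struct page_ext *page_ext;
	int i;

	page_ext = page_ext_iter_begin(&iter, page);
	if (unlikely(!page_ext))
		return;

	for (i = 0; i < (1 << order); i++) {
		/* operate on the page_ext for PFN page_to_pfn(page) + i */
		page_ext = page_ext_iter_next(&iter);
	}
	page_ext_iter_end(&iter);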
While this series seems to fix the issue, it's RFC because:
1. page_ext_iter_next() uses brute-force. David suggested maintaining
the current API but only doing a memory section lookup if the next PFN
aligns with PAGES_PER_SECTION (i.e., it's the first PFN of a section).
But I couldn't make it work: I was getting random crashes and RCU
warnings
2. It's very lightly tested
Luiz Capitulino (4):
mm: page_ext: add an iteration API for page extensions
mm: page_owner: use new iteration API
mm: page_table_check: use new iteration API
mm: page_ext: drop page_ext_next()
include/linux/page_ext.h | 15 +++++----
mm/page_ext.c | 55 ++++++++++++++++++++++++++++++++
mm/page_owner.c | 68 ++++++++++++++++++++++++++--------------
mm/page_table_check.c | 21 +++++++------
4 files changed, 120 insertions(+), 39 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 1/4] mm: page_ext: add an iteration API for page extensions
2025-01-24 21:37 [RFC 0/4] mm: page_ext: Fix crash when reserving 1G pages Luiz Capitulino
@ 2025-01-24 21:37 ` Luiz Capitulino
2025-02-11 15:25 ` David Hildenbrand
2025-01-24 21:37 ` [RFC 2/4] mm: page_owner: use new iteration API Luiz Capitulino
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Luiz Capitulino @ 2025-01-24 21:37 UTC (permalink / raw)
To: linux-kernel, linux-mm, david, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino, luizcap
The page extension implementation assumes that all page extensions of a
given page order are stored in the same memory section. The function
page_ext_next() relies on this assumption by adding an offset to the
current object to return the next adjacent page extension.
This behavior works as expected for flatmem but fails for sparsemem when
using 1G pages. Commit e98337d11bbd ("mm/contig_alloc: support
__GFP_COMP") exposed this issue, making it possible to crash the kernel
when the page_owner or page_table_check page extensions are enabled.
The problem is that for 1G pages the page extensions may span memory
section boundaries and be stored in different memory sections (for
example, with the common 128 MiB sections and 4 KiB base pages, a 1G
page covers 262144 PFNs spread over eight sections). This issue was not
visible before commit e98337d11bbd ("mm/contig_alloc: support
__GFP_COMP") because alloc_contig_pages() never passed more than
MAX_PAGE_ORDER to post_alloc_hook(). However, the mentioned commit
changed this behavior, allowing the full 1G page order to be passed.
Reproducer:
1. Build the kernel with CONFIG_SPARSEMEM=y and the page_owner and
page_table_check page extensions
2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
3. Reserve one 1G page at run-time; this should crash (backtrace below)
To address this issue, this commit introduces a new API for iterating
through page extensions. In page_ext_iter_next(), we always go through
page_ext_get() to guarantee that we do a new memory section lookup for
the next page extension. In the future, this API could be used as the
basis for a for_each_page_ext()-style macro.
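For illustration only, such a macro could look like this (hypothetical,
not part of this series; note that it consumes its count argument):

	#define for_each_page_ext(__page, __pgcount, __page_ext, __iter)	\
		for (__page_ext = page_ext_iter_begin(&(__iter), __page);	\
		     __page_ext && (__pgcount)--;				\
		     __page_ext = page_ext_iter_next(&(__iter)))

with page_ext_iter_end() still called by the caller after the loop.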
Thanks to David Hildenbrand for helping identify the root cause and
providing suggestions on how to fix it (final implementation and bugs are
all mine though).
Here's the backtrace; without KASAN you can get random crashes instead:
[ 76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
[ 76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
[ 76.066714]
[ 76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
[ 76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
[ 76.088972] Call trace:
[ 76.091411] show_stack+0x20/0x38 (C)
[ 76.095073] dump_stack_lvl+0x80/0xf8
[ 76.098733] print_address_description.constprop.0+0x88/0x398
[ 76.104476] print_report+0xa8/0x278
[ 76.108041] kasan_report+0xa8/0xf8
[ 76.111520] __asan_report_store4_noabort+0x20/0x30
[ 76.116391] __update_page_owner_handle+0x238/0x298
[ 76.121259] __set_page_owner+0xdc/0x140
[ 76.125173] post_alloc_hook+0x190/0x1d8
[ 76.129090] alloc_contig_range_noprof+0x54c/0x890
[ 76.133874] alloc_contig_pages_noprof+0x35c/0x4a8
[ 76.138656] alloc_gigantic_folio.isra.0+0x2c0/0x368
[ 76.143616] only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
[ 76.149353] alloc_pool_huge_folio+0x11c/0x1f8
[ 76.153787] set_max_huge_pages+0x364/0xca8
[ 76.157961] __nr_hugepages_store_common+0xb0/0x1a0
[ 76.162829] nr_hugepages_store+0x108/0x118
[ 76.167003] kobj_attr_store+0x3c/0x70
[ 76.170745] sysfs_kf_write+0xfc/0x188
[ 76.174492] kernfs_fop_write_iter+0x274/0x3e0
[ 76.178927] vfs_write+0x64c/0x8e0
[ 76.182323] ksys_write+0xf8/0x1f0
[ 76.185716] __arm64_sys_write+0x74/0xb0
[ 76.189630] invoke_syscall.constprop.0+0xd8/0x1e0
[ 76.194412] do_el0_svc+0x164/0x1e0
[ 76.197891] el0_svc+0x40/0xe0
[ 76.200939] el0t_64_sync_handler+0x144/0x168
[ 76.205287] el0t_64_sync+0x1ac/0x1b0
Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
include/linux/page_ext.h | 10 ++++++++
mm/page_ext.c | 55 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index e4b48a0dda244..df904544d3fac 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -93,6 +93,16 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr)
return next;
}
+struct page_ext_iter {
+ unsigned long pfn;
+ struct page_ext *page_ext;
+};
+
+struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
+struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter);
+struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
+void page_ext_iter_end(struct page_ext_iter *iter);
+
#else /* !CONFIG_PAGE_EXTENSION */
struct page_ext;
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 641d93f6af4c1..0b6eb5524cb2c 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -549,3 +549,58 @@ void page_ext_put(struct page_ext *page_ext)
rcu_read_unlock();
}
+
+/**
+ * page_ext_iter_begin() - Prepare for iterating through page extensions.
+ * @iter: page extension iterator.
+ * @page: The page we're interested in.
+ *
+ * Return: NULL if no page_ext exists for this page.
+ */
+struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
+{
+ iter->pfn = page_to_pfn(page);
+ iter->page_ext = page_ext_get(page);
+
+ return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_get() - Get current page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no page_ext exists for this iterator.
+ */
+struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
+{
+ return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_next() - Get next page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no next page_ext exists.
+ */
+struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
+{
+ if (!iter->page_ext)
+ return NULL;
+
+ page_ext_put(iter->page_ext);
+
+ iter->pfn++;
+ iter->page_ext = page_ext_get(pfn_to_page(iter->pfn));
+
+ return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_end() - End iteration through page extensions.
+ * @iter: page extension iterator.
+ */
+void page_ext_iter_end(struct page_ext_iter *iter)
+{
+ page_ext_put(iter->page_ext);
+ iter->page_ext = NULL;
+}
--
2.47.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 2/4] mm: page_owner: use new iteration API
2025-01-24 21:37 [RFC 0/4] mm: page_ext: Fix crash when reserving 1G pages Luiz Capitulino
2025-01-24 21:37 ` [RFC 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
@ 2025-01-24 21:37 ` Luiz Capitulino
2025-02-11 15:15 ` David Hildenbrand
2025-01-24 21:37 ` [RFC 3/4] mm: page_table_check: " Luiz Capitulino
2025-01-24 21:37 ` [RFC 4/4] mm: page_ext: drop page_ext_next() Luiz Capitulino
3 siblings, 1 reply; 10+ messages in thread
From: Luiz Capitulino @ 2025-01-24 21:37 UTC (permalink / raw)
To: linux-kernel, linux-mm, david, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino, luizcap
The page_ext_next() function assumes that page extension objects for a
page order allocation always reside in the same memory section, which
may not be true and could lead to crashes. Use the page_ext_iter API
instead.
Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
mm/page_owner.c | 68 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 44 insertions(+), 24 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 2d6360eaccbb6..14b4ec8ceb83e 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -229,7 +229,7 @@ static void dec_stack_record_count(depot_stack_handle_t handle,
handle);
}
-static inline void __update_page_owner_handle(struct page_ext *page_ext,
+static inline void __update_page_owner_handle(struct page_ext_iter *iter,
depot_stack_handle_t handle,
unsigned short order,
gfp_t gfp_mask,
@@ -237,8 +237,11 @@ static inline void __update_page_owner_handle(struct page_ext *page_ext,
pid_t pid, pid_t tgid, char *comm)
{
int i;
+ struct page_ext *page_ext;
struct page_owner *page_owner;
+ page_ext = page_ext_iter_get(iter);
+
for (i = 0; i < (1 << order); i++) {
page_owner = get_page_owner(page_ext);
page_owner->handle = handle;
@@ -252,19 +255,22 @@ static inline void __update_page_owner_handle(struct page_ext *page_ext,
sizeof(page_owner->comm));
__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
__set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
- page_ext = page_ext_next(page_ext);
+ page_ext = page_ext_iter_next(iter);
}
}
-static inline void __update_page_owner_free_handle(struct page_ext *page_ext,
+static inline void __update_page_owner_free_handle(struct page_ext_iter *iter,
depot_stack_handle_t handle,
unsigned short order,
pid_t pid, pid_t tgid,
u64 free_ts_nsec)
{
int i;
+ struct page_ext *page_ext;
struct page_owner *page_owner;
+ page_ext = page_ext_iter_get(iter);
+
for (i = 0; i < (1 << order); i++) {
page_owner = get_page_owner(page_ext);
/* Only __reset_page_owner() wants to clear the bit */
@@ -275,7 +281,7 @@ static inline void __update_page_owner_free_handle(struct page_ext *page_ext,
page_owner->free_ts_nsec = free_ts_nsec;
page_owner->free_pid = current->pid;
page_owner->free_tgid = current->tgid;
- page_ext = page_ext_next(page_ext);
+ page_ext = page_ext_iter_next(iter);
}
}
@@ -286,8 +292,9 @@ void __reset_page_owner(struct page *page, unsigned short order)
depot_stack_handle_t alloc_handle;
struct page_owner *page_owner;
u64 free_ts_nsec = local_clock();
+ struct page_ext_iter iter;
- page_ext = page_ext_get(page);
+ page_ext = page_ext_iter_begin(&iter, page);
if (unlikely(!page_ext))
return;
@@ -295,9 +302,10 @@ void __reset_page_owner(struct page *page, unsigned short order)
alloc_handle = page_owner->handle;
handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
- __update_page_owner_free_handle(page_ext, handle, order, current->pid,
+ __update_page_owner_free_handle(&iter, handle, order, current->pid,
current->tgid, free_ts_nsec);
- page_ext_put(page_ext);
+
+ page_ext_iter_end(&iter);
if (alloc_handle != early_handle)
/*
@@ -314,18 +322,19 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
gfp_t gfp_mask)
{
struct page_ext *page_ext;
+ struct page_ext_iter iter;
u64 ts_nsec = local_clock();
depot_stack_handle_t handle;
handle = save_stack(gfp_mask);
- page_ext = page_ext_get(page);
+ page_ext = page_ext_iter_begin(&iter, page);
if (unlikely(!page_ext))
return;
- __update_page_owner_handle(page_ext, handle, order, gfp_mask, -1,
+ __update_page_owner_handle(&iter, handle, order, gfp_mask, -1,
ts_nsec, current->pid, current->tgid,
current->comm);
- page_ext_put(page_ext);
+ page_ext_iter_end(&iter);
inc_stack_record_count(handle, gfp_mask, 1 << order);
}
@@ -345,18 +354,21 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
void __split_page_owner(struct page *page, int old_order, int new_order)
{
int i;
- struct page_ext *page_ext = page_ext_get(page);
+ struct page_ext *page_ext;
+ struct page_ext_iter iter;
struct page_owner *page_owner;
+ page_ext = page_ext_iter_begin(&iter, page);
if (unlikely(!page_ext))
return;
for (i = 0; i < (1 << old_order); i++) {
page_owner = get_page_owner(page_ext);
page_owner->order = new_order;
- page_ext = page_ext_next(page_ext);
+ page_ext = page_ext_iter_next(&iter);
}
- page_ext_put(page_ext);
+
+ page_ext_iter_end(&iter);
}
void __folio_copy_owner(struct folio *newfolio, struct folio *old)
@@ -364,24 +376,26 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
int i;
struct page_ext *old_ext;
struct page_ext *new_ext;
+ struct page_ext_iter old_iter;
+ struct page_ext_iter new_iter;
struct page_owner *old_page_owner;
struct page_owner *new_page_owner;
depot_stack_handle_t migrate_handle;
- old_ext = page_ext_get(&old->page);
+ old_ext = page_ext_iter_begin(&old_iter, &old->page);
if (unlikely(!old_ext))
return;
- new_ext = page_ext_get(&newfolio->page);
+ new_ext = page_ext_iter_begin(&new_iter, &newfolio->page);
if (unlikely(!new_ext)) {
- page_ext_put(old_ext);
+ page_ext_iter_end(&old_iter);
return;
}
old_page_owner = get_page_owner(old_ext);
new_page_owner = get_page_owner(new_ext);
migrate_handle = new_page_owner->handle;
- __update_page_owner_handle(new_ext, old_page_owner->handle,
+ __update_page_owner_handle(&new_iter, old_page_owner->handle,
old_page_owner->order, old_page_owner->gfp_mask,
old_page_owner->last_migrate_reason,
old_page_owner->ts_nsec, old_page_owner->pid,
@@ -390,8 +404,13 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
* Do not proactively clear PAGE_EXT_OWNER{_ALLOCATED} bits as the folio
* will be freed after migration. Keep them until then as they may be
* useful.
+ *
+ * Note that we need to re-grab the page_ext iterator since
+ * __update_page_owner_handle changed it.
*/
- __update_page_owner_free_handle(new_ext, 0, old_page_owner->order,
+ page_ext_iter_end(&new_iter);
+ page_ext_iter_begin(&new_iter, &newfolio->page);
+ __update_page_owner_free_handle(&new_iter, 0, old_page_owner->order,
old_page_owner->free_pid,
old_page_owner->free_tgid,
old_page_owner->free_ts_nsec);
@@ -402,12 +421,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
*/
for (i = 0; i < (1 << new_page_owner->order); i++) {
old_page_owner->handle = migrate_handle;
- old_ext = page_ext_next(old_ext);
+ old_ext = page_ext_iter_next(&old_iter);
old_page_owner = get_page_owner(old_ext);
}
- page_ext_put(new_ext);
- page_ext_put(old_ext);
+ page_ext_iter_end(&new_iter);
+ page_ext_iter_end(&old_iter);
}
void pagetypeinfo_showmixedcount_print(struct seq_file *m,
@@ -782,6 +801,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
for (; pfn < block_end_pfn; pfn++) {
struct page *page = pfn_to_page(pfn);
struct page_ext *page_ext;
+ struct page_ext_iter iter;
if (page_zone(page) != zone)
continue;
@@ -804,7 +824,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
if (PageReserved(page))
continue;
- page_ext = page_ext_get(page);
+ page_ext = page_ext_iter_begin(&iter, page);
if (unlikely(!page_ext))
continue;
@@ -813,12 +833,12 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
goto ext_put_continue;
/* Found early allocated page */
- __update_page_owner_handle(page_ext, early_handle, 0, 0,
+ __update_page_owner_handle(&iter, early_handle, 0, 0,
-1, local_clock(), current->pid,
current->tgid, current->comm);
count++;
ext_put_continue:
- page_ext_put(page_ext);
+ page_ext_iter_end(&iter);
}
cond_resched();
}
--
2.47.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 3/4] mm: page_table_check: use new iteration API
2025-01-24 21:37 [RFC 0/4] mm: page_ext: Fix crash when reserving 1G pages Luiz Capitulino
2025-01-24 21:37 ` [RFC 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
2025-01-24 21:37 ` [RFC 2/4] mm: page_owner: use new iteration API Luiz Capitulino
@ 2025-01-24 21:37 ` Luiz Capitulino
2025-02-11 15:09 ` David Hildenbrand
2025-01-24 21:37 ` [RFC 4/4] mm: page_ext: drop page_ext_next() Luiz Capitulino
3 siblings, 1 reply; 10+ messages in thread
From: Luiz Capitulino @ 2025-01-24 21:37 UTC (permalink / raw)
To: linux-kernel, linux-mm, david, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino, luizcap
The page_ext_next() function assumes that page extension objects for a
page order allocation always reside in the same memory section, which
may not be true and could lead to crashes. Use the page_ext_iter API
instead.
Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
mm/page_table_check.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 509c6ef8de400..361322a5bc7ab 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -63,6 +63,7 @@ static struct page_table_check *get_page_table_check(struct page_ext *page_ext)
static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
{
struct page_ext *page_ext;
+ struct page_ext_iter iter;
struct page *page;
unsigned long i;
bool anon;
@@ -71,7 +72,7 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
return;
page = pfn_to_page(pfn);
- page_ext = page_ext_get(page);
+ page_ext = page_ext_iter_begin(&iter, page);
if (!page_ext)
return;
@@ -89,9 +90,9 @@ static void page_table_check_clear(unsigned long pfn, unsigned long pgcnt)
BUG_ON(atomic_read(&ptc->anon_map_count));
BUG_ON(atomic_dec_return(&ptc->file_map_count) < 0);
}
- page_ext = page_ext_next(page_ext);
+ page_ext = page_ext_iter_next(&iter);
}
- page_ext_put(page_ext);
+ page_ext_iter_end(&iter);
}
/*
@@ -103,6 +104,7 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
bool rw)
{
struct page_ext *page_ext;
+ struct page_ext_iter iter;
struct page *page;
unsigned long i;
bool anon;
@@ -111,7 +113,7 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
return;
page = pfn_to_page(pfn);
- page_ext = page_ext_get(page);
+ page_ext = page_ext_iter_begin(&iter, page);
if (!page_ext)
return;
@@ -129,9 +131,9 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
BUG_ON(atomic_read(&ptc->anon_map_count));
BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
}
- page_ext = page_ext_next(page_ext);
+ page_ext = page_ext_iter_next(&iter);
}
- page_ext_put(page_ext);
+ page_ext_iter_end(&iter);
}
/*
@@ -141,11 +143,12 @@ static void page_table_check_set(unsigned long pfn, unsigned long pgcnt,
void __page_table_check_zero(struct page *page, unsigned int order)
{
struct page_ext *page_ext;
+ struct page_ext_iter iter;
unsigned long i;
BUG_ON(PageSlab(page));
- page_ext = page_ext_get(page);
+ page_ext = page_ext_iter_begin(&iter, page);
if (!page_ext)
return;
@@ -155,9 +158,9 @@ void __page_table_check_zero(struct page *page, unsigned int order)
BUG_ON(atomic_read(&ptc->anon_map_count));
BUG_ON(atomic_read(&ptc->file_map_count));
- page_ext = page_ext_next(page_ext);
+ page_ext = page_ext_iter_next(&iter);
}
- page_ext_put(page_ext);
+ page_ext_iter_end(&iter);
}
void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
--
2.47.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 4/4] mm: page_ext: drop page_ext_next()
2025-01-24 21:37 [RFC 0/4] mm: page_ext: Fix crash when reserving 1G pages Luiz Capitulino
` (2 preceding siblings ...)
2025-01-24 21:37 ` [RFC 3/4] mm: page_table_check: " Luiz Capitulino
@ 2025-01-24 21:37 ` Luiz Capitulino
3 siblings, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2025-01-24 21:37 UTC (permalink / raw)
To: linux-kernel, linux-mm, david, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino, luizcap
Previous commits converted all users to the new page extension iteration
API, so drop the now-unused page_ext_next().
TODO: this simple offset-based implementation could still be reused as a
fast path for flatmem.
Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
include/linux/page_ext.h | 7 -------
1 file changed, 7 deletions(-)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index df904544d3fac..4bbc6638fe14f 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -86,13 +86,6 @@ static inline void *page_ext_data(struct page_ext *page_ext,
return (void *)(page_ext) + ops->offset;
}
-static inline struct page_ext *page_ext_next(struct page_ext *curr)
-{
- void *next = curr;
- next += page_ext_size;
- return next;
-}
-
struct page_ext_iter {
unsigned long pfn;
struct page_ext *page_ext;
--
2.47.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 3/4] mm: page_table_check: use new iteration API
2025-01-24 21:37 ` [RFC 3/4] mm: page_table_check: " Luiz Capitulino
@ 2025-02-11 15:09 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-02-11 15:09 UTC (permalink / raw)
To: Luiz Capitulino, linux-kernel, linux-mm, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino
On 24.01.25 22:37, Luiz Capitulino wrote:
> The page_ext_next() function assumes that page extension objects for a
> page order allocation always reside in the same memory section, which
> may not be true and could lead to crashes. Use the page_ext_iter API
> instead.
>
> Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> ---
The usage of the new API looks straightforward here. The
page_owner one is a bit more involved :)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 2/4] mm: page_owner: use new iteration API
2025-01-24 21:37 ` [RFC 2/4] mm: page_owner: use new iteration API Luiz Capitulino
@ 2025-02-11 15:15 ` David Hildenbrand
2025-02-11 16:00 ` Luiz Capitulino
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-02-11 15:15 UTC (permalink / raw)
To: Luiz Capitulino, linux-kernel, linux-mm, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino
On 24.01.25 22:37, Luiz Capitulino wrote:
> The page_ext_next() function assumes that page extension objects for a
> page order allocation always reside in the same memory section, which
> may not be true and could lead to crashes. Use the page_ext_iter API
> instead.
>
> Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> ---
[...]
> void __folio_copy_owner(struct folio *newfolio, struct folio *old)
> @@ -364,24 +376,26 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
> int i;
> struct page_ext *old_ext;
> struct page_ext *new_ext;
> + struct page_ext_iter old_iter;
> + struct page_ext_iter new_iter;
> struct page_owner *old_page_owner;
> struct page_owner *new_page_owner;
> depot_stack_handle_t migrate_handle;
>
> - old_ext = page_ext_get(&old->page);
> + old_ext = page_ext_iter_begin(&old_iter, &old->page);
> if (unlikely(!old_ext))
> return;
>
> - new_ext = page_ext_get(&newfolio->page);
> + new_ext = page_ext_iter_begin(&new_iter, &newfolio->page);
> if (unlikely(!new_ext)) {
> - page_ext_put(old_ext);
> + page_ext_iter_end(&old_iter);
> return;
> }
>
> old_page_owner = get_page_owner(old_ext);
> new_page_owner = get_page_owner(new_ext);
> migrate_handle = new_page_owner->handle;
> - __update_page_owner_handle(new_ext, old_page_owner->handle,
> + __update_page_owner_handle(&new_iter, old_page_owner->handle,
> old_page_owner->order, old_page_owner->gfp_mask,
> old_page_owner->last_migrate_reason,
> old_page_owner->ts_nsec, old_page_owner->pid,
> @@ -390,8 +404,13 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
> * Do not proactively clear PAGE_EXT_OWNER{_ALLOCATED} bits as the folio
> * will be freed after migration. Keep them until then as they may be
> * useful.
> + *
> + * Note that we need to re-grab the page_ext iterator since
> + * __update_page_owner_handle changed it.
> */
> - __update_page_owner_free_handle(new_ext, 0, old_page_owner->order,
> + page_ext_iter_end(&new_iter);
> + page_ext_iter_begin(&new_iter, &newfolio->page);
So a page_ext_iter_reset() that wouldn't drop the RCU
lock could be helpful. With that, we could probably also drop the comment.
> + __update_page_owner_free_handle(&new_iter, 0, old_page_owner->order,
> old_page_owner->free_pid,
> old_page_owner->free_tgid,
> old_page_owner->free_ts_nsec);
> @@ -402,12 +421,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
> */
> for (i = 0; i < (1 << new_page_owner->order); i++) {
> old_page_owner->handle = migrate_handle;
> - old_ext = page_ext_next(old_ext);
> + old_ext = page_ext_iter_next(&old_iter);
> old_page_owner = get_page_owner(old_ext);
> }
>
> - page_ext_put(new_ext);
> - page_ext_put(old_ext);
> + page_ext_iter_end(&new_iter);
> + page_ext_iter_end(&old_iter);
In general, I think we should look into implementing the iterator
without temporarily dropping the RCU lock.
Nothing jumped at me from a quick glimpse, but yes, this usage is not
that easy.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/4] mm: page_ext: add an iteration API for page extensions
2025-01-24 21:37 ` [RFC 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
@ 2025-02-11 15:25 ` David Hildenbrand
2025-02-11 16:07 ` Luiz Capitulino
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-02-11 15:25 UTC (permalink / raw)
To: Luiz Capitulino, linux-kernel, linux-mm, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino
On 24.01.25 22:37, Luiz Capitulino wrote:
> The page extension implementation assumes that all page extensions of a
> given page order are stored in the same memory section. The function
> page_ext_next() relies on this assumption by adding an offset to the
> current object to return the next adjacent page extension.
>
> This behavior works as expected for flatmem but fails for sparsemem when
> using 1G pages. Commit e98337d11bbd ("mm/contig_alloc: support
> __GFP_COMP") exposes this issue, making it possible for a crash when
> using page_owner or page_table_check page extensions.
>
> The problem is that for 1G pages, the page extensions may span memory
> section boundaries and be stored in different memory sections. This issue
> was not visible before commit e98337d11bbd ("mm/contig_alloc: support
> __GFP_COMP") because alloc_contig_pages() never passed more than
> MAX_PAGE_ORDER to post_alloc_hook(). However, the mentioned commit
> changed this behavior allowing the full 1G page order to be passed.
>
> Reproducer:
>
> 1. Build the kernel with CONFIG_SPARSEMEM=y and the page_owner and
> page_table_check page extensions
> 2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
> 3. Reserve one 1G page at run-time; this should crash (backtrace below)
>
> To address this issue, this commit introduces a new API for iterating
> through page extensions. In page_ext_iter_next(), we always go through
> page_ext_get() to guarantee that we do a new memory section lookup for
> the next page extension. In the future, this API could be used as a basis
> to implement for_each_page_ext() type of macro.
>
> Thanks to David Hildenbrand for helping identify the root cause and
> providing suggestions on how to fix it (final implementation and bugs are
> all mine though).
>
> Here's the backtrace, without kasan you can get random crashes:
>
> [ 76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
> [ 76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
> [ 76.066714]
> [ 76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
> [ 76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
> [ 76.088972] Call trace:
> [ 76.091411] show_stack+0x20/0x38 (C)
> [ 76.095073] dump_stack_lvl+0x80/0xf8
> [ 76.098733] print_address_description.constprop.0+0x88/0x398
> [ 76.104476] print_report+0xa8/0x278
> [ 76.108041] kasan_report+0xa8/0xf8
> [ 76.111520] __asan_report_store4_noabort+0x20/0x30
> [ 76.116391] __update_page_owner_handle+0x238/0x298
> [ 76.121259] __set_page_owner+0xdc/0x140
> [ 76.125173] post_alloc_hook+0x190/0x1d8
> [ 76.129090] alloc_contig_range_noprof+0x54c/0x890
> [ 76.133874] alloc_contig_pages_noprof+0x35c/0x4a8
> [ 76.138656] alloc_gigantic_folio.isra.0+0x2c0/0x368
> [ 76.143616] only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
> [ 76.149353] alloc_pool_huge_folio+0x11c/0x1f8
> [ 76.153787] set_max_huge_pages+0x364/0xca8
> [ 76.157961] __nr_hugepages_store_common+0xb0/0x1a0
> [ 76.162829] nr_hugepages_store+0x108/0x118
> [ 76.167003] kobj_attr_store+0x3c/0x70
> [ 76.170745] sysfs_kf_write+0xfc/0x188
> [ 76.174492] kernfs_fop_write_iter+0x274/0x3e0
> [ 76.178927] vfs_write+0x64c/0x8e0
> [ 76.182323] ksys_write+0xf8/0x1f0
> [ 76.185716] __arm64_sys_write+0x74/0xb0
> [ 76.189630] invoke_syscall.constprop.0+0xd8/0x1e0
> [ 76.194412] do_el0_svc+0x164/0x1e0
> [ 76.197891] el0_svc+0x40/0xe0
> [ 76.200939] el0t_64_sync_handler+0x144/0x168
> [ 76.205287] el0t_64_sync+0x1ac/0x1b0
>
> Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> ---
> include/linux/page_ext.h | 10 ++++++++
> mm/page_ext.c | 55 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 65 insertions(+)
>
> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
> index e4b48a0dda244..df904544d3fac 100644
> --- a/include/linux/page_ext.h
> +++ b/include/linux/page_ext.h
> @@ -93,6 +93,16 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr)
> return next;
> }
>
> +struct page_ext_iter {
> + unsigned long pfn;
> + struct page_ext *page_ext;
> +};
> +
> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
> +struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter);
> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
> +void page_ext_iter_end(struct page_ext_iter *iter);
> +
> #else /* !CONFIG_PAGE_EXTENSION */
> struct page_ext;
>
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 641d93f6af4c1..0b6eb5524cb2c 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -549,3 +549,58 @@ void page_ext_put(struct page_ext *page_ext)
>
> rcu_read_unlock();
> }
> +
> +/**
> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
> + * @iter: page extension iterator.
> + * @page: The page we're interested in.
> + *
> + * Return: NULL if no page_ext exists for this page.
> + */
> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
> +{
> + iter->pfn = page_to_pfn(page);
> + iter->page_ext = page_ext_get(page);
> +
> + return iter->page_ext;
> +}
> +
> +/**
> + * page_ext_iter_get() - Get current page extension
> + * @iter: page extension iterator.
> + *
> + * Return: NULL if no page_ext exists for this iterator.
> + */
> +struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
> +{
> + return iter->page_ext;
> +}
> +
> +/**
> + * page_ext_iter_next() - Get next page extension
> + * @iter: page extension iterator.
> + *
> + * Return: NULL if no next page_ext exists.
> + */
> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
> +{
> + if (!iter->page_ext)
> + return NULL;
> +
> + page_ext_put(iter->page_ext);
> +
> + iter->pfn++;
> + iter->page_ext = page_ext_get(pfn_to_page(iter->pfn));
> +
> + return iter->page_ext;
> +}
> +
> +/**
> + * page_ext_iter_end() - End iteration through page extensions.
> + * @iter: page extension iterator.
> + */
> +void page_ext_iter_end(struct page_ext_iter *iter)
> +{
> + page_ext_put(iter->page_ext);
> + iter->page_ext = NULL;
> +}
We should likely move all these to the header file, and implement
page_ext_iter_next() without dropping the RCU lock. Further, we should add
the optimization where possible right away.
Maybe something like this might work:
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index df904544d3fac..4a79820cfe63e 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -69,12 +69,24 @@ extern void page_ext_init(void);
static inline void page_ext_init_flatmem_late(void)
{
}
+static inline bool page_ext_next_possible(unsigned long next_pfn)
+{
+ /*
+ * page_ext is allocated per memory section. Once we cross a
+ * memory section, we have to fetch the new pointer.
+ */
+ return next_pfn % PAGES_PER_SECTION;
+}
#else
extern void page_ext_init_flatmem(void);
extern void page_ext_init_flatmem_late(void);
static inline void page_ext_init(void)
{
}
+static inline bool page_ext_next_possible(unsigned long pfn)
+{
+ return true;
+}
#endif
extern struct page_ext *page_ext_get(const struct page *page);
@@ -100,9 +112,32 @@ struct page_ext_iter {
struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter);
-struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
void page_ext_iter_end(struct page_ext_iter *iter);
+/**
+ * page_ext_iter_next() - Get next page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no next page_ext exists.
+ */
+static inline struct page_ext *__page_ext_iter_next(struct page_ext_iter *iter)
+{
+ if (WARN_ON_ONCE(!iter->page_ext))
+ return NULL;
+
+ iter->pfn++;
+ if (page_ext_next_possible(iter->pfn)) {
+ iter->page_ext = page_ext_next(iter->page_ext);
+ } else {
+ struct page_ext *prev = iter->page_ext;
+
+ /*
+ * Grabbing the new one before putting the old one makes sure
+ * that we won't be dropping the RCU read lock.
+ */
+ iter->page_ext = page_ext_get(pfn_to_page(iter->pfn));
+ page_ext_put(prev);
+ }
+ WARN_ON_ONCE(!iter->page_ext);
+ return iter->page_ext;
+}
+
#else /* !CONFIG_PAGE_EXTENSION */
We should just assume that if the first page_ext exists, then the
other ones must exist as well (thus the WARN_ON_ONCE()).
If the page_ext_get+page_ext_put thingy regarding RCU does not work,
we should have a new function that looks up the page_ext, assuming
we already hold the rcu lock. (but I think this should work)
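Untested sketch of what I mean (it assumes lookup_page_ext(), the
internal helper behind page_ext_get(), is made visible to this code;
the name page_ext_get_locked() is made up):

	static inline struct page_ext *page_ext_get_locked(const struct page *page)
	{
		/* caller must already be in an RCU read-side critical section */
		WARN_ON_ONCE(!rcu_read_lock_held());
		return lookup_page_ext(page);
	}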
In general, I think this iterator API will be a lot cleaner to use.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 2/4] mm: page_owner: use new iteration API
2025-02-11 15:15 ` David Hildenbrand
@ 2025-02-11 16:00 ` Luiz Capitulino
0 siblings, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2025-02-11 16:00 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel, linux-mm, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino
On 2025-02-11 10:15, David Hildenbrand wrote:
> On 24.01.25 22:37, Luiz Capitulino wrote:
>> The page_ext_next() function assumes that page extension objects for a
>> page order allocation always reside in the same memory section, which
>> may not be true and could lead to crashes. Use the page_ext_iter API
>> instead.
>>
>> Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
>> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
>> ---
>
> [...]
>
>> void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>> @@ -364,24 +376,26 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>> int i;
>> struct page_ext *old_ext;
>> struct page_ext *new_ext;
>> + struct page_ext_iter old_iter;
>> + struct page_ext_iter new_iter;
>> struct page_owner *old_page_owner;
>> struct page_owner *new_page_owner;
>> depot_stack_handle_t migrate_handle;
>> - old_ext = page_ext_get(&old->page);
>> + old_ext = page_ext_iter_begin(&old_iter, &old->page);
>> if (unlikely(!old_ext))
>> return;
>> - new_ext = page_ext_get(&newfolio->page);
>> + new_ext = page_ext_iter_begin(&new_iter, &newfolio->page);
>> if (unlikely(!new_ext)) {
>> - page_ext_put(old_ext);
>> + page_ext_iter_end(&old_iter);
>> return;
>> }
>> old_page_owner = get_page_owner(old_ext);
>> new_page_owner = get_page_owner(new_ext);
>> migrate_handle = new_page_owner->handle;
>> - __update_page_owner_handle(new_ext, old_page_owner->handle,
>> + __update_page_owner_handle(&new_iter, old_page_owner->handle,
>> old_page_owner->order, old_page_owner->gfp_mask,
>> old_page_owner->last_migrate_reason,
>> old_page_owner->ts_nsec, old_page_owner->pid,
>> @@ -390,8 +404,13 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>> * Do not proactively clear PAGE_EXT_OWNER{_ALLOCATED} bits as the folio
>> * will be freed after migration. Keep them until then as they may be
>> * useful.
>> + *
>> + * Note that we need to re-grab the page_ext iterator since
>> + * __update_page_owner_handle changed it.
>> */
>> - __update_page_owner_free_handle(new_ext, 0, old_page_owner->order,
>> + page_ext_iter_end(&new_iter);
>> + page_ext_iter_begin(&new_iter, &newfolio->page);
>
> So a page_ext_iter_reset() could be helpful, that wouldn't drop the RCU lock. With that, we could probably also drop the comment.
That's a good suggestion, I'll give it a try.
>
>> + __update_page_owner_free_handle(&new_iter, 0, old_page_owner->order,
>> old_page_owner->free_pid,
>> old_page_owner->free_tgid,
>> old_page_owner->free_ts_nsec);
>> @@ -402,12 +421,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>> */
>> for (i = 0; i < (1 << new_page_owner->order); i++) {
>> old_page_owner->handle = migrate_handle;
>> - old_ext = page_ext_next(old_ext);
>> + old_ext = page_ext_iter_next(&old_iter);
>> old_page_owner = get_page_owner(old_ext);
>> }
>> - page_ext_put(new_ext);
>> - page_ext_put(old_ext);
>> + page_ext_iter_end(&new_iter);
>> + page_ext_iter_end(&old_iter);
>
> In general, we should look into implementing the iterator without temporarily dropping the RCU lock I think.
OK.
>
> Nothing jumped at me from a quick glimpse, but yes, this usage is not that easy.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/4] mm: page_ext: add an iteration API for page extensions
2025-02-11 15:25 ` David Hildenbrand
@ 2025-02-11 16:07 ` Luiz Capitulino
0 siblings, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2025-02-11 16:07 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel, linux-mm, yuzhao
Cc: akpm, hannes, muchun.song, lcapitulino
On 2025-02-11 10:25, David Hildenbrand wrote:
> On 24.01.25 22:37, Luiz Capitulino wrote:
>> The page extension implementation assumes that all page extensions of a
>> given page order are stored in the same memory section. The function
>> page_ext_next() relies on this assumption by adding an offset to the
>> current object to return the next adjacent page extension.
>>
>> This behavior works as expected for flatmem but fails for sparsemem when
>> using 1G pages. Commit e98337d11bbd ("mm/contig_alloc: support
>> __GFP_COMP") exposes this issue, making it possible for a crash when
>> using page_owner or page_table_check page extensions.
>>
>> The problem is that for 1G pages, the page extensions may span memory
>> section boundaries and be stored in different memory sections. This issue
>> was not visible before commit e98337d11bbd ("mm/contig_alloc: support
>> __GFP_COMP") because alloc_contig_pages() never passed more than
>> MAX_PAGE_ORDER to post_alloc_hook(). However, the mentioned commit
>> changed this behavior allowing the full 1G page order to be passed.
>>
>> Reproducer:
>>
>> 1. Build the kernel with CONFIG_SPARSEMEM=y and the page_owner and
>> page_table_check page extensions
>> 2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
>> 3. Reserve one 1G page at run-time; this should crash (backtrace below)
>>
>> To address this issue, this commit introduces a new API for iterating
>> through page extensions. In page_ext_iter_next(), we always go through
>> page_ext_get() to guarantee that we do a new memory section lookup for
>> the next page extension. In the future, this API could be used as a basis
>> to implement for_each_page_ext() type of macro.
>>
>> Thanks to David Hildenbrand for helping identify the root cause and
>> providing suggestions on how to fix it (final implementation and bugs are
>> all mine though).
>>
>> Here's the backtrace, without kasan you can get random crashes:
>>
>> [ 76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
>> [ 76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
>> [ 76.066714]
>> [ 76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
>> [ 76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
>> [ 76.088972] Call trace:
>> [ 76.091411] show_stack+0x20/0x38 (C)
>> [ 76.095073] dump_stack_lvl+0x80/0xf8
>> [ 76.098733] print_address_description.constprop.0+0x88/0x398
>> [ 76.104476] print_report+0xa8/0x278
>> [ 76.108041] kasan_report+0xa8/0xf8
>> [ 76.111520] __asan_report_store4_noabort+0x20/0x30
>> [ 76.116391] __update_page_owner_handle+0x238/0x298
>> [ 76.121259] __set_page_owner+0xdc/0x140
>> [ 76.125173] post_alloc_hook+0x190/0x1d8
>> [ 76.129090] alloc_contig_range_noprof+0x54c/0x890
>> [ 76.133874] alloc_contig_pages_noprof+0x35c/0x4a8
>> [ 76.138656] alloc_gigantic_folio.isra.0+0x2c0/0x368
>> [ 76.143616] only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
>> [ 76.149353] alloc_pool_huge_folio+0x11c/0x1f8
>> [ 76.153787] set_max_huge_pages+0x364/0xca8
>> [ 76.157961] __nr_hugepages_store_common+0xb0/0x1a0
>> [ 76.162829] nr_hugepages_store+0x108/0x118
>> [ 76.167003] kobj_attr_store+0x3c/0x70
>> [ 76.170745] sysfs_kf_write+0xfc/0x188
>> [ 76.174492] kernfs_fop_write_iter+0x274/0x3e0
>> [ 76.178927] vfs_write+0x64c/0x8e0
>> [ 76.182323] ksys_write+0xf8/0x1f0
>> [ 76.185716] __arm64_sys_write+0x74/0xb0
>> [ 76.189630] invoke_syscall.constprop.0+0xd8/0x1e0
>> [ 76.194412] do_el0_svc+0x164/0x1e0
>> [ 76.197891] el0_svc+0x40/0xe0
>> [ 76.200939] el0t_64_sync_handler+0x144/0x168
>> [ 76.205287] el0t_64_sync+0x1ac/0x1b0
>>
>> Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
>> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
>> ---
>> include/linux/page_ext.h | 10 ++++++++
>> mm/page_ext.c | 55 ++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 65 insertions(+)
>>
>> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
>> index e4b48a0dda244..df904544d3fac 100644
>> --- a/include/linux/page_ext.h
>> +++ b/include/linux/page_ext.h
>> @@ -93,6 +93,16 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr)
>> return next;
>> }
>> +struct page_ext_iter {
>> + unsigned long pfn;
>> + struct page_ext *page_ext;
>> +};
>> +
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
>> +struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter);
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
>> +void page_ext_iter_end(struct page_ext_iter *iter);
>> +
>> #else /* !CONFIG_PAGE_EXTENSION */
>> struct page_ext;
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 641d93f6af4c1..0b6eb5524cb2c 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -549,3 +549,58 @@ void page_ext_put(struct page_ext *page_ext)
>> rcu_read_unlock();
>> }
>> +
>> +/**
>> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
>> + * @iter: page extension iterator.
>> + * @page: The page we're interested in.
>> + *
>> + * Return: NULL if no page_ext exists for this page.
>> + */
>> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
>> +{
>> + iter->pfn = page_to_pfn(page);
>> + iter->page_ext = page_ext_get(page);
>> +
>> + return iter->page_ext;
>> +}
>> +
>> +/**
>> + * page_ext_iter_get() - Get current page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Return: NULL if no page_ext exists for this iterator.
>> + */
>> +struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
>> +{
>> + return iter->page_ext;
>> +}
>> +
>> +/**
>> + * page_ext_iter_next() - Get next page extension
>> + * @iter: page extension iterator.
>> + *
>> + * Return: NULL if no next page_ext exists.
>> + */
>> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
>> +{
>> + if (!iter->page_ext)
>> + return NULL;
>> +
>> + page_ext_put(iter->page_ext);
>> +
>> + iter->pfn++;
>> + iter->page_ext = page_ext_get(pfn_to_page(iter->pfn));
>> +
>> + return iter->page_ext;
>> +}
>> +
>> +/**
>> + * page_ext_iter_end() - End iteration through page extensions.
>> + * @iter: page extension iterator.
>> + */
>> +void page_ext_iter_end(struct page_ext_iter *iter)
>> +{
>> + page_ext_put(iter->page_ext);
>> + iter->page_ext = NULL;
>> +}
>
> We should likely move all these to the header file, and implement
> page_ext_iter_next() without dropping the RCU lock. Further, we should add
> the optimization where possible right away.
Agreed.
>
>
> Maybe something like this might work:
>
>
> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
> index df904544d3fac..4a79820cfe63e 100644
> --- a/include/linux/page_ext.h
> +++ b/include/linux/page_ext.h
> @@ -69,12 +69,24 @@ extern void page_ext_init(void);
> static inline void page_ext_init_flatmem_late(void)
> {
> }
> +static inline bool page_ext_next_possible(unsigned long next_pfn)
> +{
> + /*
> + * page_ext is allocated per memory section. Once we cross a
> + * memory section, we have to fetch the new pointer.
> + */
> + return next_pfn % PAGES_PER_SECTION;
> +}
> #else
> extern void page_ext_init_flatmem(void);
> extern void page_ext_init_flatmem_late(void);
> static inline void page_ext_init(void)
> {
> }
> +static inline bool page_ext_next_possible(unsigned long pfn)
> +{
> + return true;
> +}
> #endif
>
> extern struct page_ext *page_ext_get(const struct page *page);
> @@ -100,9 +112,32 @@ struct page_ext_iter {
>
> struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
> struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter);
> -struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
> void page_ext_iter_end(struct page_ext_iter *iter);
>
> +/**
> + * page_ext_iter_next() - Get next page extension
> + * @iter: page extension iterator.
> + *
> + * Return: NULL if no next page_ext exists.
> + */
> +static inline struct page_ext *__page_ext_iter_next(struct page_ext_iter *iter)
> +{
> + if (WARN_ON_ONCE(!iter->page_ext))
> + return NULL;
> +
> + iter->pfn++;
> + if (page_ext_next_possible(iter->pfn)) {
> + iter->page_ext = page_ext_next(iter->page_ext);
> + } else {
> + struct page_ext *prev = iter->page_ext;
> +
> + /*
> + * Grabbing the new one before putting the old one makes sure
> + * that we won't be dropping the RCU read lock.
> + */
> + iter->page_ext = page_ext_get(pfn_to_page(iter->pfn));
> + page_ext_put(prev);
> + }
> + WARN_ON_ONCE(!iter->page_ext);
> + return iter->page_ext;
> +}
> +
> #else /* !CONFIG_PAGE_EXTENSION */
>
>
> We should just assume that if the first page_ext exists, then the
> other ones must exist as well (thus the WARN_ON_ONCE()).
>
> If the page_ext_get+page_ext_put thingy regarding RCU does not work,
> we should have a new function that looks up the page_ext, assuming
> we already hold the rcu lock. (but I think this should work)
>
>
> In general, I think this iterator API will be a lot cleaner to use.
Thanks for looking into this, David.
It's been a few weeks since I last looked at this code, but your
suggestion makes sense to me. I'll give it a try for the next posting.
^ permalink raw reply [flat|nested] 10+ messages in thread
Thread overview: 10+ messages (end of thread: 2025-02-11 16:07 UTC)
2025-01-24 21:37 [RFC 0/4] mm: page_ext: Fix crash when reserving 1G pages Luiz Capitulino
2025-01-24 21:37 ` [RFC 1/4] mm: page_ext: add an iteration API for page extensions Luiz Capitulino
2025-02-11 15:25 ` David Hildenbrand
2025-02-11 16:07 ` Luiz Capitulino
2025-01-24 21:37 ` [RFC 2/4] mm: page_owner: use new iteration API Luiz Capitulino
2025-02-11 15:15 ` David Hildenbrand
2025-02-11 16:00 ` Luiz Capitulino
2025-01-24 21:37 ` [RFC 3/4] mm: page_table_check: " Luiz Capitulino
2025-02-11 15:09 ` David Hildenbrand
2025-01-24 21:37 ` [RFC 4/4] mm: page_ext: drop page_ext_next() Luiz Capitulino