linux-mm.kvack.org archive mirror
* [PATCH] mm/page_owner: fix prematurely released rcu_read_lock()
@ 2025-12-23  9:25 ranxiaokai627
  2025-12-23  9:42 ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 4+ messages in thread
From: ranxiaokai627 @ 2025-12-23  9:25 UTC (permalink / raw)
  To: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy, david, luizcap
  Cc: linux-kernel, linux-mm, ran.xiaokai, ranxiaokai627

From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

In CONFIG_SPARSEMEM systems, page_ext uses RCU to synchronize with
memory hotplug operations, ensuring page_ext memory won't be freed
due to MEM_OFFLINE during page_ext data access.

Since page_owner is part of page_ext, rcu_read_lock() must be held
continuously throughout the entire page_owner access period and
should not be released midway. Otherwise, a use-after-free can
occur. The sequence is as follows:

CPU0                                        CPU1
__folio_copy_owner():                       MEM_OFFLINE:
page_ext = page_ext_get(&old->page);
old_page_owner = ...
page_ext_put(page_ext);

page_ext = page_ext_get(&newfolio->page);
new_page_owner = ...
page_ext_put(page_ext);
                                            __invalidate_page_ext(pfn);
                                            synchronize_rcu();
                                            __free_page_ext(pfn);
old_page_owner->pid
new_page_owner->order   ---> access to freed area
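
For clarity, the rule being enforced is that every dereference of a
page_owner obtained via page_ext must happen between page_ext_get()
and page_ext_put(), since those two calls bracket the RCU read-side
critical section. An illustrative sketch only, not part of the diff
below:

	page_ext = page_ext_get(&folio->page);	/* enters RCU read section */
	if (unlikely(!page_ext))
		return;
	page_owner = get_page_owner(page_ext);
	/* ... all page_owner accesses must happen here ... */
	page_ext_put(page_ext);			/* leaves RCU read section */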

Fixes: 3a812bed3d32a ("mm: page_owner: use new iteration API")
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 mm/page_owner.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index b6a394a130ec..5d6860e54be7 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -375,24 +375,25 @@ void __split_page_owner(struct page *page, int old_order, int new_order)
 void __folio_copy_owner(struct folio *newfolio, struct folio *old)
 {
 	struct page_ext *page_ext;
+	struct page_ext *old_page_ext, *new_page_ext;
 	struct page_ext_iter iter;
 	struct page_owner *old_page_owner;
 	struct page_owner *new_page_owner;
 	depot_stack_handle_t migrate_handle;
 
-	page_ext = page_ext_get(&old->page);
-	if (unlikely(!page_ext))
+	old_page_ext = page_ext_get(&old->page);
+	if (unlikely(!old_page_ext))
 		return;
 
-	old_page_owner = get_page_owner(page_ext);
-	page_ext_put(page_ext);
+	old_page_owner = get_page_owner(old_page_ext);
 
-	page_ext = page_ext_get(&newfolio->page);
-	if (unlikely(!page_ext))
+	new_page_ext = page_ext_get(&newfolio->page);
+	if (unlikely(!new_page_ext)) {
+		page_ext_put(old_page_ext);
 		return;
+	}
 
-	new_page_owner = get_page_owner(page_ext);
-	page_ext_put(page_ext);
+	new_page_owner = get_page_owner(new_page_ext);
 
 	migrate_handle = new_page_owner->handle;
 	__update_page_owner_handle(&newfolio->page, old_page_owner->handle,
@@ -414,12 +415,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
 	 * for the new one and the old folio otherwise there will be an imbalance
 	 * when subtracting those pages from the stack.
 	 */
-	rcu_read_lock();
 	for_each_page_ext(&old->page, 1 << new_page_owner->order, page_ext, iter) {
 		old_page_owner = get_page_owner(page_ext);
 		old_page_owner->handle = migrate_handle;
 	}
-	rcu_read_unlock();
+	page_ext_put(new_page_ext);
+	page_ext_put(old_page_ext);
 }
 
 void pagetypeinfo_showmixedcount_print(struct seq_file *m,
-- 
2.25.1





* Re: [PATCH] mm/page_owner: fix prematurely released rcu_read_lock()
  2025-12-23  9:25 [PATCH] mm/page_owner: fix prematurely released rcu_read_lock() ranxiaokai627
@ 2025-12-23  9:42 ` David Hildenbrand (Red Hat)
  2025-12-25  8:17   ` ranxiaokai627
  0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-23  9:42 UTC (permalink / raw)
  To: ranxiaokai627, akpm, vbabka, surenb, mhocko, jackmanb, hannes,
	ziy, luizcap
  Cc: linux-kernel, linux-mm, ran.xiaokai

On 12/23/25 10:25, ranxiaokai627@163.com wrote:
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> 
> In CONFIG_SPARSEMEM systems, page_ext uses RCU to synchronize with
> memory hotplug operations, ensuring page_ext memory won't be freed
> due to MEM_OFFLINE during page_ext data access.
> 
> Since page_owner is part of page_ext, rcu_read_lock() must be held
> continuously throughout the entire page_owner access period and
> should not be released midway. Otherwise, it may cause the
> use-after-free issue. The sequence is like this:
> 
> CPU0                                        CPU1
> __folio_copy_owner():                       MEM_OFFLINE:
> page_ext = page_ext_get(&old->page);
> old_page_owner = ...
> page_ext_put(page_ext);
> 
> page_ext = page_ext_get(&newfolio->page);
> new_page_owner = ...
> page_ext_put(page_ext);
>                                              __invalidate_page_ext(pfn);
>                                              synchronize_rcu();
>                                              __free_page_ext(pfn);
> old_page_owner->pid
> new_page_owner->order   ---> access to freed area
> 
> Fixes: 3a812bed3d32a ("mm: page_owner: use new iteration API")
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> ---
>   mm/page_owner.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index b6a394a130ec..5d6860e54be7 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -375,24 +375,25 @@ void __split_page_owner(struct page *page, int old_order, int new_order)
>   void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>   {
>   	struct page_ext *page_ext;
> +	struct page_ext *old_page_ext, *new_page_ext;
>   	struct page_ext_iter iter;
>   	struct page_owner *old_page_owner;
>   	struct page_owner *new_page_owner;
>   	depot_stack_handle_t migrate_handle;
>   
> -	page_ext = page_ext_get(&old->page);
> -	if (unlikely(!page_ext))
> +	old_page_ext = page_ext_get(&old->page);
> +	if (unlikely(!old_page_ext))
>   		return;
>   
> -	old_page_owner = get_page_owner(page_ext);
> -	page_ext_put(page_ext);
> +	old_page_owner = get_page_owner(old_page_ext);
>   
> -	page_ext = page_ext_get(&newfolio->page);
> -	if (unlikely(!page_ext))
> +	new_page_ext = page_ext_get(&newfolio->page);
> +	if (unlikely(!new_page_ext)) {
> +		page_ext_put(old_page_ext);
>   		return;
> +	}
>   
> -	new_page_owner = get_page_owner(page_ext);
> -	page_ext_put(page_ext);
> +	new_page_owner = get_page_owner(new_page_ext);
>   
>   	migrate_handle = new_page_owner->handle;
>   	__update_page_owner_handle(&newfolio->page, old_page_owner->handle,
> @@ -414,12 +415,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>   	 * for the new one and the old folio otherwise there will be an imbalance
>   	 * when subtracting those pages from the stack.
>   	 */
> -	rcu_read_lock();
>   	for_each_page_ext(&old->page, 1 << new_page_owner->order, page_ext, iter) {
>   		old_page_owner = get_page_owner(page_ext);
>   		old_page_owner->handle = migrate_handle;
>   	}
> -	rcu_read_unlock();
> +	page_ext_put(new_page_ext);
> +	page_ext_put(old_page_ext);
>   }

How are you possibly able to call into __split_page_owner() while 
concurrently we are already finished with offlining the memory (-> all 
memory freed and isolated in the buddy) and triggering the notifier?

Doesn't make sense, no?

-- 
Cheers

David



* Re: [PATCH] mm/page_owner: fix prematurely released rcu_read_lock()
  2025-12-23  9:42 ` David Hildenbrand (Red Hat)
@ 2025-12-25  8:17   ` ranxiaokai627
  2025-12-30 20:45     ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 4+ messages in thread
From: ranxiaokai627 @ 2025-12-25  8:17 UTC (permalink / raw)
  To: david
  Cc: akpm, hannes, jackmanb, linux-mm, luizcap, mhocko, ran.xiaokai,
	ranxiaokai627, surenb, vbabka, ziy


Hi, David

>On 12/23/25 10:25, ranxiaokai627@163.com wrote:
>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -375,24 +375,25 @@ void __split_page_owner(struct page *page, int old_order, int new_order)
>>   void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>   {
>>   	struct page_ext *page_ext;
>> +	struct page_ext *old_page_ext, *new_page_ext;
>>   	struct page_ext_iter iter;
>>   	struct page_owner *old_page_owner;
>>   	struct page_owner *new_page_owner;
>>   	depot_stack_handle_t migrate_handle;
>>   
>> -	page_ext = page_ext_get(&old->page);
>> -	if (unlikely(!page_ext))
>> +	old_page_ext = page_ext_get(&old->page);
>> +	if (unlikely(!old_page_ext))
>>   		return;
>>   
>> -	old_page_owner = get_page_owner(page_ext);
>> -	page_ext_put(page_ext);
>> +	old_page_owner = get_page_owner(old_page_ext);
>>   
>> -	page_ext = page_ext_get(&newfolio->page);
>> -	if (unlikely(!page_ext))
>> +	new_page_ext = page_ext_get(&newfolio->page);
>> +	if (unlikely(!new_page_ext)) {
>> +		page_ext_put(old_page_ext);
>>   		return;
>> +	}
>>   
>> -	new_page_owner = get_page_owner(page_ext);
>> -	page_ext_put(page_ext);
>> +	new_page_owner = get_page_owner(new_page_ext);
>>   
>>   	migrate_handle = new_page_owner->handle;
>>   	__update_page_owner_handle(&newfolio->page, old_page_owner->handle,
>> @@ -414,12 +415,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>   	 * for the new one and the old folio otherwise there will be an imbalance
>>   	 * when subtracting those pages from the stack.
>>   	 */
>> -	rcu_read_lock();
>>   	for_each_page_ext(&old->page, 1 << new_page_owner->order, page_ext, iter) {
>>   		old_page_owner = get_page_owner(page_ext);
>>   		old_page_owner->handle = migrate_handle;
>>   	}
>> -	rcu_read_unlock();
>> +	page_ext_put(new_page_ext);
>> +	page_ext_put(old_page_ext);
>>   }
>
>How are you possibly able to call into __split_page_owner() while 
>concurrently we are already finished with offlining the memory (-> all 
>memory freed and isolated in the buddy) and triggering the notifier?

This patch does not involve __split_page_owner() – did you perhaps
mean __folio_copy_owner()?

You are right: by the time memory_notify(MEM_OFFLINE, &mem_arg)
is called, memory offlining has already completed.
At that point, all memory in the mem_section has been fully freed
and removed from zone->free_area[], so those pages can no longer
be allocated.

Currently, only read_page_owner() and pagetypeinfo_showmixedcount_print()
genuinely need the RCU read lock to handle concurrency with MEM_OFFLINE
events, because while traversing page frames across the system/zone we
cannot know in advance the hotplug/remove state of those PFNs.
In the other functions, the RCU read lock merely satisfies the assertion
WARN_ON_ONCE(!rcu_read_lock_held()) in lookup_page_ext(). Right?
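
(For reference, my reading of the get/put pair in mm/page_ext.c,
slightly simplified, is roughly:

struct page_ext *page_ext_get(const struct page *page)
{
	struct page_ext *page_ext;

	rcu_read_lock();
	page_ext = lookup_page_ext(page);
	if (!page_ext) {
		rcu_read_unlock();
		return NULL;
	}

	return page_ext;
}

void page_ext_put(struct page_ext *page_ext)
{
	if (unlikely(!page_ext))
		return;

	rcu_read_unlock();
}

so anything derived from the returned page_ext is only RCU-protected
between the two calls.)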

Regarding page_ext, I have a question: semantically, page_ext should
correspond to one structure per folio, not one per 4K base page. On x86_64,
page_ext consumes 88 bytes, more than struct page itself.
As mTHP adoption grows, avoiding page owner metadata setup/cleanup
for tail pages would yield performance gains.
Moreover, with folios gradually replacing struct page, will struct page
eventually be superseded by various dynamically allocated memdesc_xxx_t
structures?

If so, wouldn't it be more reasonable to dynamically allocate a
corresponding page_ext structure after folio allocation?
While this would add overhead to the memory allocation hot path,
I believe we could optimize it, for example by allocating multiple
page frames at once to store page_ext structures, or by establishing
a dedicated kmem_cache for page_ext.
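
Something like the following is what I have in mind. This is purely a
hypothetical sketch: folio_ext_cachep, folio->ext and the two helpers
are made-up names, not existing interfaces.

static struct kmem_cache *folio_ext_cachep;	/* hypothetical per-folio page_ext cache */

static int folio_ext_alloc(struct folio *folio, gfp_t gfp)
{
	struct page_ext *ext;

	/* one page_ext per folio instead of one per 4K base page */
	ext = kmem_cache_zalloc(folio_ext_cachep, gfp);
	if (!ext)
		return -ENOMEM;
	folio->ext = ext;	/* hypothetical field in struct folio */
	return 0;
}

static void folio_ext_free(struct folio *folio)
{
	kmem_cache_free(folio_ext_cachep, folio->ext);
	folio->ext = NULL;
}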

What are your suggestions regarding this approach of
dynamically allocating page_ext structures?

>Doesn't make sense, no?




* Re: [PATCH] mm/page_owner: fix prematurely released rcu_read_lock()
  2025-12-25  8:17   ` ranxiaokai627
@ 2025-12-30 20:45     ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-30 20:45 UTC (permalink / raw)
  To: ranxiaokai627
  Cc: akpm, hannes, jackmanb, linux-mm, luizcap, mhocko, ran.xiaokai,
	surenb, vbabka, ziy

On 12/25/25 09:17, ranxiaokai627@163.com wrote:
> Hi, David
> 
>> On 12/23/25 10:25, ranxiaokai627@163.com wrote:
>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>> --- a/mm/page_owner.c
>>> +++ b/mm/page_owner.c
>>> @@ -375,24 +375,25 @@ void __split_page_owner(struct page *page, int old_order, int new_order)
>>>    void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>>    {
>>>    	struct page_ext *page_ext;
>>> +	struct page_ext *old_page_ext, *new_page_ext;
>>>    	struct page_ext_iter iter;
>>>    	struct page_owner *old_page_owner;
>>>    	struct page_owner *new_page_owner;
>>>    	depot_stack_handle_t migrate_handle;
>>>    
>>> -	page_ext = page_ext_get(&old->page);
>>> -	if (unlikely(!page_ext))
>>> +	old_page_ext = page_ext_get(&old->page);
>>> +	if (unlikely(!old_page_ext))
>>>    		return;
>>>    
>>> -	old_page_owner = get_page_owner(page_ext);
>>> -	page_ext_put(page_ext);
>>> +	old_page_owner = get_page_owner(old_page_ext);
>>>    
>>> -	page_ext = page_ext_get(&newfolio->page);
>>> -	if (unlikely(!page_ext))
>>> +	new_page_ext = page_ext_get(&newfolio->page);
>>> +	if (unlikely(!new_page_ext)) {
>>> +		page_ext_put(old_page_ext);
>>>    		return;
>>> +	}
>>>    
>>> -	new_page_owner = get_page_owner(page_ext);
>>> -	page_ext_put(page_ext);
>>> +	new_page_owner = get_page_owner(new_page_ext);
>>>    
>>>    	migrate_handle = new_page_owner->handle;
>>>    	__update_page_owner_handle(&newfolio->page, old_page_owner->handle,
>>> @@ -414,12 +415,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>>    	 * for the new one and the old folio otherwise there will be an imbalance
>>>    	 * when subtracting those pages from the stack.
>>>    	 */
>>> -	rcu_read_lock();
>>>    	for_each_page_ext(&old->page, 1 << new_page_owner->order, page_ext, iter) {
>>>    		old_page_owner = get_page_owner(page_ext);
>>>    		old_page_owner->handle = migrate_handle;
>>>    	}
>>> -	rcu_read_unlock();
>>> +	page_ext_put(new_page_ext);
>>> +	page_ext_put(old_page_ext);
>>>    }
>>
>> How are you possibly able to call into __split_page_owner() while
>> concurrently we are already finished with offlining the memory (-> all
>> memory freed and isolated in the buddy) and triggering the notifier?
> 
> This patch does not involve __split_page_owner() – did you perhaps
> mean __folio_copy_owner()?

Yes, any kind of folio operation.

Once memory is offline, any operation on the page/folio itself would 
already be fatal.

So if we ever reached this code while the memory was already offline, we 
would be having a bad time, even without any page_ext magic.

> 
> You are right: when memory_notify(MEM_OFFLINE, &mem_arg)
> is called, memory hot-remove has already completed.
> At this stage, all memory in the mem_section has been fully freed
> and removed from zone->free_area[], making them impossible
> to be allocated.
> 
> Currently, only read_page_owner() and pagetypeinfo_showmixedcount_print()
> genuinely require RCU locks to handle concurrency with MEM_OFFLINE events.
> This is because during traversal of page frames across the system/zone,
> we cannot pre-determine the hotplug/remove state of these PFNs.
> In other functions, RCU locks are merely used to satisfy the assertion
> WARN_ON_ONCE(!rcu_read_lock_held()) within lookup_page_ext(). right ?

Exactly.

> 
> Regarding page_ext, I have a question: Semantically, page_ext should
> correspond to one structure per folio, not per 4K base page. In x86_64,
> page_ext consumes 88 bytes—more memory than struct page itself.
> As mTHP adoption grows, avoiding page owner metadata setup/cleanup
> for tail pages would yield performance gains.
> Moreover, with folio gradually replacing struct page, will struct page
> eventually be superseded by dynamically allocated all kinds of memdesc_xxx_t?
> 
> If so, wouldn’t it be more reasonable to dynamically allocate a
> corresponding page_ext structure after folio allocation?

Most (but not all ...) page_ext data will be per folio, yes. Some is per 
allocation (which might not be a folio in the future ...), which is 
where it gets tricky.

But yes, Willy already thought about simply placing some of the page_ext 
stuff into "struct folio" once we dynamically allocate "struct folio".

The question is how we will handle this for other allocations that will 
not be folios (e.g., where to store page owner).

-- 
Cheers

David



end of thread, other threads:[~2025-12-30 20:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-23  9:25 [PATCH] mm/page_owner: fix prematurely released rcu_read_lock() ranxiaokai627
2025-12-23  9:42 ` David Hildenbrand (Red Hat)
2025-12-25  8:17   ` ranxiaokai627
2025-12-30 20:45     ` David Hildenbrand (Red Hat)
