linux-mm.kvack.org archive mirror
From: ranxiaokai627@163.com
To: david@kernel.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
	jackmanb@google.com, linux-mm@kvack.org, luizcap@redhat.com,
	mhocko@suse.com, ran.xiaokai@zte.com.cn, ranxiaokai627@163.com,
	surenb@google.com, vbabka@suse.cz, ziy@nvidia.com
Subject: Re: [PATCH] mm/page_owner: fix prematurely released rcu_read_lock()
Date: Thu, 25 Dec 2025 08:17:43 +0000	[thread overview]
Message-ID: <20251225081753.142479-1-ranxiaokai627@163.com> (raw)
In-Reply-To: <c6b766bf-0f05-47c3-bcc6-2f0e1961a864@kernel.org>


Hi David,

>On 12/23/25 10:25, ranxiaokai627@163.com wrote:
>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -375,24 +375,25 @@ void __split_page_owner(struct page *page, int old_order, int new_order)
>>   void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>   {
>>   	struct page_ext *page_ext;
>> +	struct page_ext *old_page_ext, *new_page_ext;
>>   	struct page_ext_iter iter;
>>   	struct page_owner *old_page_owner;
>>   	struct page_owner *new_page_owner;
>>   	depot_stack_handle_t migrate_handle;
>>   
>> -	page_ext = page_ext_get(&old->page);
>> -	if (unlikely(!page_ext))
>> +	old_page_ext = page_ext_get(&old->page);
>> +	if (unlikely(!old_page_ext))
>>   		return;
>>   
>> -	old_page_owner = get_page_owner(page_ext);
>> -	page_ext_put(page_ext);
>> +	old_page_owner = get_page_owner(old_page_ext);
>>   
>> -	page_ext = page_ext_get(&newfolio->page);
>> -	if (unlikely(!page_ext))
>> +	new_page_ext = page_ext_get(&newfolio->page);
>> +	if (unlikely(!new_page_ext)) {
>> +		page_ext_put(old_page_ext);
>>   		return;
>> +	}
>>   
>> -	new_page_owner = get_page_owner(page_ext);
>> -	page_ext_put(page_ext);
>> +	new_page_owner = get_page_owner(new_page_ext);
>>   
>>   	migrate_handle = new_page_owner->handle;
>>   	__update_page_owner_handle(&newfolio->page, old_page_owner->handle,
>> @@ -414,12 +415,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>   	 * for the new one and the old folio otherwise there will be an imbalance
>>   	 * when subtracting those pages from the stack.
>>   	 */
>> -	rcu_read_lock();
>>   	for_each_page_ext(&old->page, 1 << new_page_owner->order, page_ext, iter) {
>>   		old_page_owner = get_page_owner(page_ext);
>>   		old_page_owner->handle = migrate_handle;
>>   	}
>> -	rcu_read_unlock();
>> +	page_ext_put(new_page_ext);
>> +	page_ext_put(old_page_ext);
>>   }
>
>How are you possibly able to call into __split_page_owner() while 
>concurrently we are already finished with offlining the memory (-> all 
>memory freed and isolated in the buddy) and triggering the notifier?

This patch does not touch __split_page_owner(); did you perhaps
mean __folio_copy_owner()?

You are right: by the time memory_notify(MEM_OFFLINE, &mem_arg)
is called, memory hot-remove has already completed.
At that stage, all memory in the mem_section has been fully freed
and removed from zone->free_area[], so those pages can no longer
be allocated.

Currently, only read_page_owner() and pagetypeinfo_showmixedcount_print()
genuinely need RCU protection against concurrent MEM_OFFLINE events,
because they walk page frames across the whole system/zone and cannot
know in advance the hotplug/remove state of each PFN.
In the other callers, the RCU read lock is taken merely to satisfy the
WARN_ON_ONCE(!rcu_read_lock_held()) assertion in lookup_page_ext(), right?

Regarding page_ext, I have a question: semantically, page_ext should
be one structure per folio, not one per 4K base page. On x86_64,
page_ext consumes 88 bytes, more memory than struct page itself.
As mTHP adoption grows, avoiding page owner metadata setup/cleanup
for tail pages would yield real performance gains.
Moreover, with folios gradually replacing struct page, will struct page
eventually be superseded by various dynamically allocated memdesc types?

If so, wouldn't it be more reasonable to dynamically allocate a
corresponding page_ext structure after folio allocation?
While this would add overhead on the memory allocation hot path,
I believe it could be optimized, for example by allocating multiple
page frames at once to back the page_ext structures, or by
establishing a dedicated kmem_cache for page_ext.

What are your suggestions regarding this approach of
dynamically allocating page_ext structures?

>Doesn't make sense, no?



Thread overview: 4+ messages
2025-12-23  9:25 ranxiaokai627
2025-12-23  9:42 ` David Hildenbrand (Red Hat)
2025-12-25  8:17   ` ranxiaokai627 [this message]
2025-12-30 20:45     ` David Hildenbrand (Red Hat)
