linux-mm.kvack.org archive mirror
From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: ranxiaokai627@163.com
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
	jackmanb@google.com, linux-mm@kvack.org, luizcap@redhat.com,
	mhocko@suse.com, ran.xiaokai@zte.com.cn, surenb@google.com,
	vbabka@suse.cz, ziy@nvidia.com
Subject: Re: [PATCH] mm/page_owner: fix prematurely released rcu_read_lock()
Date: Tue, 30 Dec 2025 21:45:42 +0100	[thread overview]
Message-ID: <530cf16c-ce1b-4ec7-b76e-23b184965501@kernel.org> (raw)
In-Reply-To: <20251225081753.142479-1-ranxiaokai627@163.com>

On 12/25/25 09:17, ranxiaokai627@163.com wrote:
> Hi, David
> 
>> On 12/23/25 10:25, ranxiaokai627@163.com wrote:
>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>> --- a/mm/page_owner.c
>>> +++ b/mm/page_owner.c
>>> @@ -375,24 +375,25 @@ void __split_page_owner(struct page *page, int old_order, int new_order)
>>>    void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>>    {
>>>    	struct page_ext *page_ext;
>>> +	struct page_ext *old_page_ext, *new_page_ext;
>>>    	struct page_ext_iter iter;
>>>    	struct page_owner *old_page_owner;
>>>    	struct page_owner *new_page_owner;
>>>    	depot_stack_handle_t migrate_handle;
>>>    
>>> -	page_ext = page_ext_get(&old->page);
>>> -	if (unlikely(!page_ext))
>>> +	old_page_ext = page_ext_get(&old->page);
>>> +	if (unlikely(!old_page_ext))
>>>    		return;
>>>    
>>> -	old_page_owner = get_page_owner(page_ext);
>>> -	page_ext_put(page_ext);
>>> +	old_page_owner = get_page_owner(old_page_ext);
>>>    
>>> -	page_ext = page_ext_get(&newfolio->page);
>>> -	if (unlikely(!page_ext))
>>> +	new_page_ext = page_ext_get(&newfolio->page);
>>> +	if (unlikely(!new_page_ext)) {
>>> +		page_ext_put(old_page_ext);
>>>    		return;
>>> +	}
>>>    
>>> -	new_page_owner = get_page_owner(page_ext);
>>> -	page_ext_put(page_ext);
>>> +	new_page_owner = get_page_owner(new_page_ext);
>>>    
>>>    	migrate_handle = new_page_owner->handle;
>>>    	__update_page_owner_handle(&newfolio->page, old_page_owner->handle,
>>> @@ -414,12 +415,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
>>>    	 * for the new one and the old folio otherwise there will be an imbalance
>>>    	 * when subtracting those pages from the stack.
>>>    	 */
>>> -	rcu_read_lock();
>>>    	for_each_page_ext(&old->page, 1 << new_page_owner->order, page_ext, iter) {
>>>    		old_page_owner = get_page_owner(page_ext);
>>>    		old_page_owner->handle = migrate_handle;
>>>    	}
>>> -	rcu_read_unlock();
>>> +	page_ext_put(new_page_ext);
>>> +	page_ext_put(old_page_ext);
>>>    }
>>
>> How are you possibly able to call into __split_page_owner() while
>> concurrently we are already finished with offlining the memory (-> all
>> memory freed and isolated in the buddy) and triggering the notifier?
> 
> This patch does not involve __split_page_owner() – did you perhaps
> mean __folio_copy_owner()?

Yes, any kind of folio operation.

Once memory is offline, any operation on the page/folio itself would 
already be fatal.

So if we ever reached this code while the memory was already offline we 
would be having a bad time, even without any page_ext magic.

> 
> You are right: by the time memory_notify(MEM_OFFLINE, &mem_arg)
> is called, memory hot-remove has already completed.
> At this stage, all memory in the mem_section has been fully freed
> and removed from zone->free_area[], so it can no longer be
> allocated.
> 
> Currently, only read_page_owner() and pagetypeinfo_showmixedcount_print()
> genuinely require the RCU lock to handle concurrency with MEM_OFFLINE
> events, because while traversing page frames across the system/zone we
> cannot pre-determine the hotplug/remove state of those PFNs.
> In the other functions, the RCU lock merely satisfies the assertion
> WARN_ON_ONCE(!rcu_read_lock_held()) in lookup_page_ext(), right?

Exactly.

> 
> Regarding page_ext, I have a question: semantically, page_ext should
> correspond to one structure per folio, not one per 4K base page. On x86_64,
> page_ext consumes 88 bytes, more memory than struct page itself.
> As mTHP adoption grows, avoiding page owner metadata setup/cleanup
> for tail pages would yield performance gains.
> Moreover, with folios gradually replacing struct page, will struct page
> eventually be superseded by various dynamically allocated memdesc_xxx_t
> structures?
> 
> If so, wouldn’t it be more reasonable to dynamically allocate a
> corresponding page_ext structure after folio allocation?

Most (but not all ...) page_ext data will be per folio, yes. Some is per 
allocation (which might not be a folio in the future ...), which is 
where it gets tricky.

But yes, Willy already thought about simply placing some of the page_ext 
stuff into "struct folio" once we dynamically allocate "struct folio".

The question is how we will handle this for other allocations that will 
not be folios (e.g., where to store page owner).

-- 
Cheers

David


      reply	other threads: [~2025-12-30 20:45 UTC]

Thread overview: 4+ messages
2025-12-23  9:25 ranxiaokai627
2025-12-23  9:42 ` David Hildenbrand (Red Hat)
2025-12-25  8:17   ` ranxiaokai627
2025-12-30 20:45     ` David Hildenbrand (Red Hat) [this message]
