linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Muchun Song <muchun.song@linux.dev>
To: Oscar Salvador <osalvador@suse.de>
Cc: Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Charan Teja Kalla <quic_charante@quicinc.com>,
	Kairui Song <kasong@tencent.com>, Qi Zheng <qi.zheng@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Barry Song <baohua@kernel.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org
Subject: Re: [PATCH] mm/sparse: Fix race on mem_section->usage in pfn walkers
Date: Wed, 15 Apr 2026 17:45:30 +0800	[thread overview]
Message-ID: <9E4115F4-5787-495F-BCEE-1E9B1113E9E5@linux.dev> (raw)
In-Reply-To: <ad9OZAVnYvhxfh-Z@localhost.localdomain>



> On Apr 15, 2026, at 16:37, Oscar Salvador <osalvador@suse.de> wrote:
> 
> On Wed, Apr 15, 2026 at 10:23:26AM +0800, Muchun Song wrote:
>> When memory is hot-removed, section_deactivate() can tear down
>> mem_section->usage while concurrent pfn walkers still inspect the
>> subsection map via pfn_section_valid() or pfn_section_first_valid().
>> 
>> After commit 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing
>> memory_section->usage") converted the teardown to an RCU-based
>> scheme, the code still relies on SECTION_HAS_MEM_MAP becoming visible
>> to readers before ms->usage is cleared and queued for freeing.
>> 
>> That ordering is not guaranteed. section_deactivate() can clear
>> ms->usage and queue kfree_rcu() before another CPU observes the
>> SECTION_HAS_MEM_MAP clear. A concurrent pfn walker can therefore see
>> valid_section() return true, enter its sched-RCU read-side critical
>> section after kfree_rcu() has already been queued, and then dereference
>> a stale ms->usage pointer.
>> 
>> And pfn_to_online_page() can call pfn_section_valid() without its
>> own sched-RCU read-side critical section, which has similar problem.
>> 
>> The race looks like this:
>> 
>>  compact_zone()                    memunmap_pages
>>  ==============                    ==============
>>                                    __remove_pages()->
>>                                      sparse_remove_section()->
>>                                        section_deactivate():
>>                                          a) [ Clear SECTION_HAS_MEM_MAP
>>                                               is reordered to b) ]
>>                                          kfree_rcu(ms->usage)
>>      __pageblock_pfn_to_page
>>         ......
>>          pfn_valid():
>>            rcu_read_lock_sched()
>>            valid_section() // return true
>>            pfn_section_valid()
>>              [Access ms->usage which is UAF]
>>                                          WRITE_ONCE(ms->usage, NULL)
>>            rcu_read_unlock_sched()       b) Clear SECTION_HAS_MEM_MAP
>> 
>> Fix this by using rcu_replace_pointer() when clearing ms->usage in
>> section_deactivate(), then it does not rely on the order of clearing
>> of SECTION_HAS_MEM_MAP.
> 
> The fix itself does not look too intrusive and I guess it kind of makes
> sense when you think about the ordering issue, so if we want to be
> rock solid, why not.

You are right. Generally speaking, where RCU is used, the pointer should
be cleared first, and then the memory corresponding to the pointer should
be released via kfree_rcu(). Therefore, the correct sequence of use here
should be:

	WRITE_ONCE(ms->usage, NULL);
	kfree_rcu(ms->usage);

The actual code has the order reversed. To prevent such errors, the RCU
mechanism provides the rcu_replace_pointer() interface. If you look at its
implementation, this interface is essentially equivalent to the code sequence
mentioned above.

> Does it slow down operations a lot?

Regarding section_deactivate, I believe there is no functional difference.

> 
> I would also point out that you rcu-protect pfn_section_valid().

rcu_dereference_sched() is equivalent to the previous READ_ONCE(), but for
RCU-protected resources, it is recommended to use the RCU-specific interfaces
to avoid having to manually account for memory ordering issues.

> 
> Regarding the pfn_to_online_page() race, that is something that every now
> and then pops up, but as David said, we never seen that happening in the
> wild so I guess no one really made the time to look into that.

After taking a closer look at commit 5ec8e8ea8b77, it’s clear that it was
intended to resolve a race between __pageblock_pfn_to_page and section_deactivate.
While the issue surfaced this time due to ms->usage, I’m concerned that even
with ms->usage fixed, the underlying race condition in the execution path
still exists. This suggests that subsequent accesses to struct page might
run into similar trouble down the road.

That said, I’d love to hear what everyone thinks about whether this warrants a
full fix.

Thanks,
Muchun.

> 
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs



      reply	other threads:[~2026-04-15  9:46 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-15  2:23 Muchun Song
2026-04-15  5:44 ` Andrew Morton
2026-04-15  6:06   ` Muchun Song
2026-04-15  8:04 ` David Hildenbrand (Arm)
2026-04-15  9:20   ` Muchun Song
2026-04-15  8:37 ` Oscar Salvador
2026-04-15  9:45   ` Muchun Song [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9E4115F4-5787-495F-BCEE-1E9B1113E9E5@linux.dev \
    --to=muchun.song@linux.dev \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=baohua@kernel.org \
    --cc=david@kernel.org \
    --cc=kasong@tencent.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=qi.zheng@linux.dev \
    --cc=quic_charante@quicinc.com \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=songmuchun@bytedance.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=weixugc@google.com \
    --cc=yuanchu@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox