linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: Memory hotplug locking issue: Useless (?) zone span seqlock
Date: Fri, 9 May 2025 11:38:04 +0200	[thread overview]
Message-ID: <f93ef55c-250d-445f-9d77-b619685c419b@redhat.com> (raw)
In-Reply-To: <8a4a5d6e-d2cf-420d-91f3-2797618e7255@efficios.com>

On 08.05.25 15:11, Mathieu Desnoyers wrote:
> On 2025-05-08 06:45, David Hildenbrand wrote:
>> On 07.03.25 21:22, Mathieu Desnoyers wrote:
>>> I'm currently perfecting my understanding of the mm code and reviewing
>>> pieces of it as I go, and stumbled on this:
>>>
>>> commit 27cacaad16c5 ("mm,memory_hotplug: drop unneeded locking")
>>>
>>> This commit removes all users of zone_span_writelock(), thus making
>>> the inline useless, but leaves the now useless
>>> zone_span_seqbegin()/zone_span_seqretry() in place within
>>> page_outside_zone_boundaries().
>>>
>>> So I'm confused. What's going on ?
>>>
>>> And if this commit got things very wrong when removing the
>>> seqlock, I wonder if there are cases where its partial
>>> pgdat_resize_lock() removal can be an issue as well.
>>
>> I stumbled over that myself recently as well. I think I mentioned in the
>> past that we should just store
>>
>> start_pfn + end_pfn
>>
>> instead of
>>
>> start_pfn + nr_pages
>>
>>
>> Then, concurrent resizing could happen (and we could atomically read
>> start_pfn / end_pfn).
>>
>> Right now, when adjusting start_pfn, we always also have to adjust
>> nr_pages. A concurrent reader calculating end_pfn manually could see
>> some crappy result.
>>
>> Having that said, I am not aware of issues in that area, but it all
>> looks like only a partial cleanup to me.
> 
> I wonder if all callers to zone_spans_pfn() prevent concurrent modification
> of the zone start_pfn and nr_pages ?

I think compaction does not protect against that (e.g., no 
mem_hotplug_begin()).

My understanding was that compaction can mostly tolerate it (e.g., 
recheck pfn_to_online_page() + page_zone() check)

> 
> For instance set_pfnblock_flags_mask() (called by set_pageblock_migratetype)
> does:
> 
>          VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> 

Yes, that might indeed trigger in some weird races.

> If we look at zone_spans_pfn():
> 
> static inline unsigned long zone_end_pfn(const struct zone *zone)
> {
>           return zone->zone_start_pfn + zone->spanned_pages;
> }
> 
> static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
> {
>           return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
> }
> 
> A concurrent updater which shrinks a zone by moving its start would have
> to increment zone_start_pfn *and* decrement spanned_pages. If this happens
> concurrently with the loads from zone_spans_pfn(), then it can observe
> an intermediate state (only nr pages reduced or only zone start moved).
> The first scenario at least can lead to false positive VM_BUG_ON().

Jup.

> 
> Likewise if the updater expands the zone by moving its start left. If
> the observer loads an updated start pfn without observing the nr pages
> update, it can lead to false positive VM_BUG_ON().
> 
> I notice that zone_intersects() also uses zone_end_pfn(). It is used for
> instance by default_kernel_zone_for_pfn() without locks. In this case,
> reading both nr pages and start pfn concurrently with update could cause
> a false-positive match, for instance if the start of the range is moved
> but the nr pages prior value is loaded (concurrent shrink). This could
> match a zone outside of the function parameter range.
> 
> Another reader of those fields is update_pgdat_span(), which appears to
> be called only from remove_pfn_range_from_zone with mem_hotplug_lock
> held in write mode. So that one should be fine.
> 
> AFAIU, updates to nr pages and zone start pfn are done by:
> 
> - remove_pfn_range_from_zone (AFAIU always called with mem_hotplug_lock
>     in write mode),
> - shrink_zone_span (called from remove_pfn_range_from_zone),
> - resize_zone_range (__meminit function), called from move_pfn_range_to_zone()
>     also called with mem_hotplug_lock in write mode.

Yes, I refactored that at some point. mem_hotplug_begin() could protect 
against it, but we really don't want to grab that all over the place.

> 
> So I think your idea of keeping track of both zone_start_pfn and zone_end_pfn
> would solve this specific issue, however we'd have to carefully consider what
> happens to users of spanned_pages (e.g. zone_is_empty()) callers, because it
> would then require to calculate the spanned_pages from end - start, which
> then can have similar issues wrt atomicity against concurrent updates.

We could easily just store both things (spanned pages + end_pfn), such 
that we can read both atomically.

-- 
Cheers,

David / dhildenb



      reply	other threads:[~2025-05-09  9:38 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-07 20:22 Mathieu Desnoyers
2025-05-08 10:45 ` David Hildenbrand
2025-05-08 13:11   ` Mathieu Desnoyers
2025-05-09  9:38     ` David Hildenbrand [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f93ef55c-250d-445f-9d77-b619685c419b@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@soleen.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox