linux-mm.kvack.org archive mirror
* Memory hotplug locking issue: Useless (?) zone span seqlock
@ 2025-03-07 20:22 Mathieu Desnoyers
  2025-05-08 10:45 ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Desnoyers @ 2025-03-07 20:22 UTC (permalink / raw)
  To: Andrew Morton, Oscar Salvador
  Cc: David Hildenbrand, Michal Hocko, Anshuman Khandual,
	Vlastimil Babka, Pavel Tatashin, Linus Torvalds, linux-kernel,
	linux-mm

I'm currently perfecting my understanding of the mm code and reviewing
pieces of it as I go, and stumbled on this:

commit 27cacaad16c5 ("mm,memory_hotplug: drop unneeded locking")

This commit removes all users of zone_span_writelock(), thus making
the inline useless, but leaves the now useless
zone_span_seqbegin()/zone_span_seqretry() in place within
page_outside_zone_boundaries().

So I'm confused. What's going on ?

And if this commit got things very wrong when removing the
seqlock, I wonder if there are cases where its partial
pgdat_resize_lock() removal can be an issue as well.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




* Re: Memory hotplug locking issue: Useless (?) zone span seqlock
  2025-03-07 20:22 Memory hotplug locking issue: Useless (?) zone span seqlock Mathieu Desnoyers
@ 2025-05-08 10:45 ` David Hildenbrand
  2025-05-08 13:11   ` Mathieu Desnoyers
  0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand @ 2025-05-08 10:45 UTC (permalink / raw)
  To: Mathieu Desnoyers, Andrew Morton, Oscar Salvador
  Cc: Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin,
	Linus Torvalds, linux-kernel, linux-mm

On 07.03.25 21:22, Mathieu Desnoyers wrote:
> I'm currently perfecting my understanding of the mm code and reviewing
> pieces of it as I go, and stumbled on this:
> 
> commit 27cacaad16c5 ("mm,memory_hotplug: drop unneeded locking")
> 
> This commit removes all users of zone_span_writelock(), thus making
> the inline useless, but leaves the now useless
> zone_span_seqbegin()/zone_span_seqretry() in place within
> page_outside_zone_boundaries().
> 
> So I'm confused. What's going on ?
> 
> And if this commit got things very wrong when removing the
> seqlock, I wonder if there are cases where its partial
> pgdat_resize_lock() removal can be an issue as well.

I stumbled over that myself recently as well. I think I mentioned in the 
past that we should just store

start_pfn + end_pfn

instead of

start_pfn + nr_pages


Then, concurrent resizing could happen (and we could atomically read 
start_pfn / end_pfn).

Right now, when adjusting start_pfn, we always also have to adjust 
nr_pages. A concurrent reader calculating end_pfn manually could see 
some crappy result.
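
For illustration, a minimal sketch of that torn read (made-up helper name,
hypothetical READ_ONCE()/WRITE_ONCE() annotations, not the current kernel
code):

static void shrink_zone_from_start(struct zone *zone, unsigned long nr)
{
	/* Shrinking from the start: bump the start, reduce the span. */
	WRITE_ONCE(zone->zone_start_pfn, zone->zone_start_pfn + nr);
	/*
	 * A reader computing end_pfn right here sees the new start combined
	 * with the old nr_pages, i.e. an end past the real end of the zone.
	 */
	WRITE_ONCE(zone->spanned_pages, zone->spanned_pages - nr);
}

static unsigned long lockless_zone_end_pfn(const struct zone *zone)
{
	return READ_ONCE(zone->zone_start_pfn) + READ_ONCE(zone->spanned_pages);
}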

Having that said, I am not aware of issues in that area, but it all 
looks like only a partial cleanup to me.

-- 
Cheers,

David / dhildenb




* Re: Memory hotplug locking issue: Useless (?) zone span seqlock
  2025-05-08 10:45 ` David Hildenbrand
@ 2025-05-08 13:11   ` Mathieu Desnoyers
  2025-05-09  9:38     ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Desnoyers @ 2025-05-08 13:11 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Oscar Salvador
  Cc: Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin,
	Linus Torvalds, linux-kernel, linux-mm

On 2025-05-08 06:45, David Hildenbrand wrote:
> On 07.03.25 21:22, Mathieu Desnoyers wrote:
>> I'm currently perfecting my understanding of the mm code and reviewing
>> pieces of it as I go, and stumbled on this:
>>
>> commit 27cacaad16c5 ("mm,memory_hotplug: drop unneeded locking")
>>
>> This commit removes all users of zone_span_writelock(), thus making
>> the inline useless, but leaves the now useless
>> zone_span_seqbegin()/zone_span_seqretry() in place within
>> page_outside_zone_boundaries().
>>
>> So I'm confused. What's going on ?
>>
>> And if this commit got things very wrong when removing the
>> seqlock, I wonder if there are cases where its partial
>> pgdat_resize_lock() removal can be an issue as well.
> 
> I stumbled over that myself recently as well. I think I mentioned in the 
> past that we should just store
> 
> start_pfn + end_pfn
> 
> instead of
> 
> start_pfn + nr_pages
> 
> 
> Then, concurrent resizing could happen (and we could atomically read 
> start_pfn / end_pfn).
> 
> Right now, when adjusting start_pfn, we always also have to adjust 
> nr_pages. A concurrent reader calculating end_pfn manually could see 
> some crappy result.
> 
> Having that said, I am not aware of issues in that area, but it all 
> looks like only a partial cleanup to me.

I wonder if all callers to zone_spans_pfn() prevent concurrent modification
of the zone start_pfn and nr_pages ?

For instance set_pfnblock_flags_mask() (called by set_pageblock_migratetype)
does:

        VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

If we look at zone_spans_pfn():

static inline unsigned long zone_end_pfn(const struct zone *zone)
{
         return zone->zone_start_pfn + zone->spanned_pages;
}

static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
         return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
}

A concurrent updater which shrinks a zone by moving its start would have
to increment zone_start_pfn *and* decrement spanned_pages. If this happens
concurrently with the loads from zone_spans_pfn(), the reader can observe
an intermediate state (only nr pages reduced, or only zone start moved).
The first scenario at least can lead to a false-positive VM_BUG_ON().

Likewise if the updater expands the zone by moving its start left: if
the observer loads the updated start pfn without observing the nr pages
update, this can also lead to a false-positive VM_BUG_ON().
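
To make the shrink case concrete (made-up pfn values, and assuming the
updater happens to store the reduced spanned_pages before the new
zone_start_pfn):

  zone before shrink: start = 1000, spanned = 1000   -> span [1000, 2000)
  zone after shrink:  start = 1512, spanned =  488   -> span [1512, 2000)

  pfn 1600 is spanned both before and after the shrink, and yet:

  updater (shrink)             reader: zone_spans_pfn(zone, 1600)
  ----------------             ----------------------------------
  spanned_pages = 488
                               loads zone_start_pfn == 1000, 1000 <= 1600
                               loads end == 1000 + 488 == 1488
                               1600 < 1488 is false -> pfn deemed outside
  zone_start_pfn = 1512        -> spurious VM_BUG_ON_PAGE()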

I notice that zone_intersects() also uses zone_end_pfn(). It is used for
instance by default_kernel_zone_for_pfn() without locks. In this case,
reading nr pages and start pfn concurrently with an update could cause a
false-positive match, for instance if the updated zone start is loaded
together with the prior nr pages value (concurrent shrink). The zone could
then appear to intersect a pfn range which is actually outside of it.
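
For reference, zone_intersects() is roughly the following (paraphrasing
from include/linux/mmzone.h):

static inline bool zone_intersects(struct zone *zone,
		unsigned long start_pfn, unsigned long nr_pages)
{
	if (zone_is_empty(zone))
		return false;
	if (start_pfn >= zone_end_pfn(zone) ||
	    start_pfn + nr_pages <= zone->zone_start_pfn)
		return false;

	return true;
}

So a zone_end_pfn() computed from the new zone_start_pfn plus a stale
(larger) spanned_pages lets the first check pass for ranges that are
actually beyond the real end of the zone.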

Another reader of those fields is update_pgdat_span(), which appears to
be called only from remove_pfn_range_from_zone with mem_hotplug_lock
held in write mode. So that one should be fine.

AFAIU, updates to nr pages and zone start pfn are done by:

- remove_pfn_range_from_zone() (AFAIU always called with mem_hotplug_lock
   held in write mode),
- shrink_zone_span() (called from remove_pfn_range_from_zone()),
- resize_zone_range() (a __meminit function), called from
   move_pfn_range_to_zone(), which is also called with mem_hotplug_lock
   held in write mode.

So I think your idea of keeping track of both zone_start_pfn and zone_end_pfn
would solve this specific issue. However, we'd have to carefully consider what
happens to users of spanned_pages (e.g. zone_is_empty()), because they would
then have to calculate spanned_pages as end - start, which can have similar
atomicity issues wrt concurrent updates.
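
For illustration, the readers could then look something like this (only a
sketch: a hypothetical zone->zone_end_pfn field and READ_ONCE() annotations
that don't exist today):

static inline unsigned long zone_end_pfn(const struct zone *zone)
{
	return READ_ONCE(zone->zone_end_pfn);	/* hypothetical field */
}

static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
	/*
	 * Each boundary is read without tearing. A racing resize can still
	 * make the two loads reflect different states, but each value is an
	 * actual (current or recent) zone boundary rather than a bogus sum.
	 */
	return READ_ONCE(zone->zone_start_pfn) <= pfn &&
	       pfn < zone_end_pfn(zone);
}

static inline bool zone_is_empty(const struct zone *zone)
{
	/* Would become end <= start, with the atomicity caveat above. */
	return zone_end_pfn(zone) <= READ_ONCE(zone->zone_start_pfn);
}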

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



* Re: Memory hotplug locking issue: Useless (?) zone span seqlock
  2025-05-08 13:11   ` Mathieu Desnoyers
@ 2025-05-09  9:38     ` David Hildenbrand
  0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand @ 2025-05-09  9:38 UTC (permalink / raw)
  To: Mathieu Desnoyers, Andrew Morton, Oscar Salvador
  Cc: Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin,
	Linus Torvalds, linux-kernel, linux-mm

On 08.05.25 15:11, Mathieu Desnoyers wrote:
> On 2025-05-08 06:45, David Hildenbrand wrote:
>> On 07.03.25 21:22, Mathieu Desnoyers wrote:
>>> I'm currently perfecting my understanding of the mm code and reviewing
>>> pieces of it as I go, and stumbled on this:
>>>
>>> commit 27cacaad16c5 ("mm,memory_hotplug: drop unneeded locking")
>>>
>>> This commit removes all users of zone_span_writelock(), thus making
>>> the inline useless, but leaves the now useless
>>> zone_span_seqbegin()/zone_span_seqretry() in place within
>>> page_outside_zone_boundaries().
>>>
>>> So I'm confused. What's going on ?
>>>
>>> And if this commit got things very wrong when removing the
>>> seqlock, I wonder if there are cases where its partial
>>> pgdat_resize_lock() removal can be an issue as well.
>>
>> I stumbled over that myself recently as well. I think I mentioned in the
>> past that we should just store
>>
>> start_pfn + end_pfn
>>
>> instead of
>>
>> start_pfn + nr_pages
>>
>>
>> Then, concurrent resizing could happen (and we could atomically read
>> start_pfn / end_pfn).
>>
>> Right now, when adjusting start_pfn, we always also have to adjust
>> nr_pages. A concurrent reader calculating end_pfn manually could see
>> some crappy result.
>>
>> Having that said, I am not aware of issues in that area, but it all
>> looks like only a partial cleanup to me.
> 
> I wonder if all callers to zone_spans_pfn() prevent concurrent modification
> of the zone start_pfn and nr_pages ?

I think compaction does not protect against that (e.g., no
mem_hotplug_begin()).

My understanding was that compaction can mostly tolerate it (e.g., by
rechecking pfn_to_online_page() and doing a page_zone() check).
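
Roughly this kind of reader pattern (illustrative sketch only, made-up
function name, not the actual compaction code):

static struct page *isolate_candidate(struct zone *zone, unsigned long pfn)
{
	struct page *page;

	/* The zone span is only used as a (possibly racy) hint ... */
	if (!zone_spans_pfn(zone, pfn))
		return NULL;

	/* ... and each pfn is re-validated against the page itself. */
	page = pfn_to_online_page(pfn);
	if (!page)
		return NULL;
	if (page_zone(page) != zone)
		return NULL;

	return page;
}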

> 
> For instance set_pfnblock_flags_mask() (called by set_pageblock_migratetype)
> does:
> 
>          VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> 

Yes, that might indeed trigger in some weird races.

> If we look at zone_spans_pfn():
> 
> static inline unsigned long zone_end_pfn(const struct zone *zone)
> {
>           return zone->zone_start_pfn + zone->spanned_pages;
> }
> 
> static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
> {
>           return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
> }
> 
> A concurrent updater which shrinks a zone by moving its start would have
> to increment zone_start_pfn *and* decrement spanned_pages. If this happens
> concurrently with the loads from zone_spans_pfn(), the reader can observe
> an intermediate state (only nr pages reduced, or only zone start moved).
> The first scenario at least can lead to a false-positive VM_BUG_ON().

Jup.

> 
> Likewise if the updater expands the zone by moving its start left: if
> the observer loads the updated start pfn without observing the nr pages
> update, this can also lead to a false-positive VM_BUG_ON().
> 
> I notice that zone_intersects() also uses zone_end_pfn(). It is used for
> instance by default_kernel_zone_for_pfn() without locks. In this case,
> reading nr pages and start pfn concurrently with an update could cause a
> false-positive match, for instance if the updated zone start is loaded
> together with the prior nr pages value (concurrent shrink). The zone could
> then appear to intersect a pfn range which is actually outside of it.
> 
> Another reader of those fields is update_pgdat_span(), which appears to
> be called only from remove_pfn_range_from_zone with mem_hotplug_lock
> held in write mode. So that one should be fine.
> 
> AFAIU, updates to nr pages and zone start pfn are done by:
> 
> - remove_pfn_range_from_zone() (AFAIU always called with mem_hotplug_lock
>     held in write mode),
> - shrink_zone_span() (called from remove_pfn_range_from_zone()),
> - resize_zone_range() (a __meminit function), called from
>     move_pfn_range_to_zone(), which is also called with mem_hotplug_lock
>     held in write mode.

Yes, I refactored that at some point. mem_hotplug_begin() could protect 
against it, but we really don't want to grab that all over the place.

> 
> So I think your idea of keeping track of both zone_start_pfn and zone_end_pfn
> would solve this specific issue. However, we'd have to carefully consider what
> happens to users of spanned_pages (e.g. zone_is_empty()), because they would
> then have to calculate spanned_pages as end - start, which can have similar
> atomicity issues wrt concurrent updates.

We could easily just store both things (spanned pages + end_pfn), such 
that we can read both atomically.
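
For example (sketch only: hypothetical zone->zone_end_pfn field and helper,
with resizes still serialized by mem_hotplug_lock held for write):

static void zone_set_span(struct zone *zone, unsigned long start_pfn,
			  unsigned long end_pfn)
{
	WRITE_ONCE(zone->zone_start_pfn, start_pfn);
	WRITE_ONCE(zone->zone_end_pfn, end_pfn);	/* hypothetical field */
	WRITE_ONCE(zone->spanned_pages, end_pfn - start_pfn);
	/*
	 * Lockless readers then use READ_ONCE() on whichever field they
	 * need and never derive one boundary from the other two.
	 */
}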

-- 
Cheers,

David / dhildenb



