Re: [PATCH] mm: migrate: Fix THP's mapcount on isolation

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Alistair Popple <apopple@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Hugh Dickins <hughd@google.com>, Gavin Shan <gshan@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, william.kucharski@oracle.com,
	ziy@nvidia.com, kirill.shutemov@linux.intel.com,
	zhenyzha@redhat.com, shan.gavin@gmail.com, riel@surriel.com
Subject: Re: [PATCH] mm: migrate: Fix THP's mapcount on isolation
Date: Fri, 02 Dec 2022 09:35:42 +1100	[thread overview]
Message-ID: <87zgc6buoq.fsf@nvidia.com> (raw)
In-Reply-To: <2541344f-61d8-96d1-10e9-ba7e1743a299@redhat.com>


David Hildenbrand <david@redhat.com> writes:

>>>>> I think you're right. Without a page reference I don't think it is even
>>>>> safe to look at struct page, at least not without synchronisation
>>>>> against memory hot unplug which could remove the struct page. From a
>>>>> quick glance I didn't see anything here that obviously did that though.
>>>> Memory hotplug is the offending party here.  It has to make sure
>>>> that
>>>> everything else is definitely quiescent before removing the struct pages.
>>>> Otherwise you can't even try_get a refcount.
>> Sure, I might be missing something but how can memory hotplug
>> possibly
>> synchronise against some process (eg. try_to_compact_pages) doing
>> try_get(pfn_to_page(random_pfn)) without that process either informing
>> memory hotplug that it needs pages to remain valid long enough to get a
>> reference or detecting that memory hotplug is in the process of
>> offlining pages?
>
> It currently doesn't, and it's been mostly a known theoretical problem
> so far. We've been ignoring these kind of problems because they are
> not easy to sort out and so far never popped up in practice.
>
> First, the correct approach is using pfn_to_online_page() instead of
> pfn_to_page() when in doubt whether the page might already be
> offline. While we're using pfn_to_online_page() already in quite some
> PFN walkers, changes are good that we're still missing some.
>
> Second, essentially anybody (PFN walker) doing a pfn_to_online_page()
> is prone to races with memory offlining. I've discussed this in the
> past with Michal and Dan and one idea was to use RCU to synchronize
> PFN walkers and pfn_to_online_page(), essentially synchronizing
> clearing of the SECTION_IS_ONLINE flag.
>
> Nobody worked on that so far because we've never had actual BUG
> reports. These kind of races are rare, but they are theoretically
> possible.

Thanks for providing some background context here. I have been digging
into these details a bit because GPU drivers using device-private memory
tend to excersise memory hot unplug paths more frequently than what I
suspect is normally the case. I haven't actually seen any real BUG
reports caused by this though, and it's likely the places where they're
likely would involve device-private pages at the moment anyway. So agree
such races are rare.

[...]

>> try_to_compact_pages() -> compact_zone_order() -> compact_zone() ->
>> isolate_migratepages() -> isolate_migratepages_block() ->
>> PageHuge(pfn_to_page(pfn))
>> Again I couldn't see anything in that path that would hold the page
>> stable long enough to safely perform the pfn_to_page() and get a
>> reference. And after a bit of fiddling I was able to trigger the below
>> oops running the compaction_test selftest whilst offlining memory so I
>> don't think it is safe:
>
> Thanks for finding a reproducer. This is exactly the kind of BUG that
> we speculated about in the past but that we haven't seen in practice
> yet.

No worries. The reproducer consisted of running the compaction_test
selftest in a loop on a single NUMA node whilst cycling every memory
block in that node online/offline in parallel (ie. not a particularly
realistic real world workload). Eventually that led to a couple of
different kernel OOPS/BUG backtraces.

> Having that said, I'd be very happy if someone would have time to work
> on proper synchronization and I'd be happy to help
> brainstorming/reviewing :)

I would like to see the synchronisation cleaned up here and am happy to
work on it. I'm unlikely to have much bandwidth to do so before the new
year but any brainstorming/review would certainly be appreciated when I
do find some time though!

     prev parent reply	other threads:[~2022-12-01 22:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-23  0:57 Gavin Shan
2022-11-23  4:26 ` Alistair Popple
2022-11-23  5:06   ` Gavin Shan
2022-11-23  5:14 ` Hugh Dickins
2022-11-23  8:56   ` David Hildenbrand
2022-11-23 16:07     ` Matthew Wilcox
2022-11-24  8:50       ` David Hildenbrand
2022-11-24  0:14     ` Gavin Shan
2022-11-24  8:46       ` David Hildenbrand
2022-11-24  9:44         ` Gavin Shan
2022-11-24  1:06     ` Alistair Popple
2022-11-24  3:33       ` Matthew Wilcox
2022-11-24  8:49         ` David Hildenbrand
2022-11-25  0:58           ` Alistair Popple
2022-11-25  8:54             ` David Hildenbrand
2022-12-01 22:35               ` Alistair Popple [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgc6buoq.fsf@nvidia.com \
    --to=apopple@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=gshan@redhat.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@surriel.com \
    --cc=shan.gavin@gmail.com \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    --cc=zhenyzha@redhat.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox