From: "Mika Penttilä" <mpenttil@redhat.com>
To: Balbir Singh <balbirs@nvidia.com>, Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
David Hildenbrand <david@redhat.com>,
Leon Romanovsky <leonro@nvidia.com>
Subject: Re: [RFC PATCH 1/4] mm: use current as mmu notifier's owner
Date: Tue, 19 Aug 2025 07:33:02 +0300 [thread overview]
Message-ID: <614bb7b7-85c4-4e49-95e7-8faed5372cf6@redhat.com> (raw)
In-Reply-To: <e054f622-69ad-4724-b12d-c2545d913170@nvidia.com>
Hi,
On 8/19/25 07:27, Balbir Singh wrote:
> On 8/15/25 17:11, Mika Penttilä wrote:
>> On 8/15/25 08:23, Alistair Popple wrote:
>>
>>> On Thu, Aug 14, 2025 at 08:45:43PM +0300, Mika Penttilä wrote:
>>>> On 8/14/25 20:20, Jason Gunthorpe wrote:
>>>>
>>>>> On Thu, Aug 14, 2025 at 08:00:01PM +0300, Mika Penttilä wrote:
>>>>>> as well as hmm test module with :
>>>>>>
>>>>>> * Ignore invalidation callbacks for device private pages since
>>>>>> * the invalidation is handled as part of the migration process.
>>>>>> */
>>>>>> if (range->event == MMU_NOTIFY_MIGRATE &&
>>>>>> range->owner == dmirror->mdevice)
>>>>>> return true;
>>>>> If I recall this was about a very specific case where migration does a
>>>>> number of invalidations and some of the earlier ones are known to be
>>>>> redundant in this specific case. Redundant means it can be ignored
>>>>> without causing an inconsistency.
>>>>>
>>>>> Alistair would know, but I assumed this works OK because the above
>>>>> invalidation doesn't actually go on to free any pages but keeps them
>>>>> around until a later invalidation?
>> Thanks Alistair for your deep insights!
>>
>>> Right, the pages don't actually get freed because a reference is taken on them
>>> during migrate_vma_setup(). However other device MMU's still need invalidating
>>> because the driver will go on to copy the page after this step. It's just
>>> assumed that the driver is able to be consistent with itself (ie. it will unmap/
>>> invalidate it's own MMU prior to initiating the copy).
>> And reference is taken as well in migrate on fault during hmm_range_fault
>> if migrating.
>>
>>> In practice I suspect what Mika is running into is that the page table
>>> synchronisation for migration works slightly differently for migrate_vma_*().
>>>
>>> Instead of using mmu_interval_notifier's which have a sequence number drivers
>>> typically use normal mmu_notifier's and take a device specific lock to block
>>> page table downgrades (eg. RW -> RO). This ensures it's safe to update the
>>> device page tables with the PFNs/permissions collected in migrate_vma_setup()
>>> (or the new PFN) by blocking other threads from updating the page table.
>>>
>>> The ususal problem with this approach is that when migrate_vma_setup() calls
>>> the mmu_notifier it deadlocks on the device specific lock in the notifier
>>> callback because it already holds the lock, which it can't drop before calling
>>> migrate_vma_setup().
>>>
>>> I think one of the main benefits of a series which consolidates these two
>>> page-table mirroring techniques into common code would also be to make the
>>> mirroring/invalidation logic the same for migration as hmm_range_fault(). Ie. to
>>> move to mmu_interval notifers with sequence numbers for migration, perhaps with
>>> filtering if required/safe and retries
>> Yes with the migrate_vma_setup() and collecting removed, the firing of mmu notifiers
>> and "collecting" are integral part of the hmm_range_fault() flow, so logical to use
>> interval notifiers for migrate also.
>>
>> I have removed the commit with the owner games. I studied it more and seems it was added
>> to mitigate a bug in an early version, which led me to do wrong conclusion of the root cause
>> of the hang. That version had unbalanced mmu_notifier_invalidate_range_start()
>> after returning from hmm_range_fault() with EBUSY (after done a folio split).
>> With that fixed, driving the migrate on fault using the interval notifiers seems to work well,
>> filtering MMU_NOTIFY_MIGRATE for device for retries.
>>
> So this patch can be ignored in the series?
Yes this can be ignored. I will do more testing and repost after a while with this removed and
possibly with other changes if needed.
>
> Balbir
>
--Mika
next prev parent reply other threads:[~2025-08-19 4:33 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-14 7:19 [RFC PATCH 0/4] Migrate on fault for device pages Mika Penttilä
2025-08-14 7:19 ` [RFC PATCH 1/4] mm: use current as mmu notifier's owner Mika Penttilä
2025-08-14 12:40 ` Jason Gunthorpe
2025-08-14 12:53 ` Mika Penttilä
2025-08-14 13:04 ` Jason Gunthorpe
2025-08-14 13:20 ` Mika Penttilä
2025-08-14 14:11 ` Jason Gunthorpe
2025-08-14 17:00 ` Mika Penttilä
2025-08-14 17:20 ` Jason Gunthorpe
2025-08-14 17:45 ` Mika Penttilä
2025-08-15 5:23 ` Alistair Popple
2025-08-15 7:11 ` Mika Penttilä
2025-08-19 4:27 ` Balbir Singh
2025-08-19 4:33 ` Mika Penttilä [this message]
2025-08-14 7:19 ` [RFC PATCH 2/4] mm: unified fault and migrate device page paths Mika Penttilä
2025-08-21 4:30 ` Balbir Singh
2025-08-21 5:10 ` Mika Penttilä
2025-08-22 5:02 ` Alistair Popple
2025-08-14 7:19 ` [RFC PATCH 3/4] mm:/migrate_device.c: remove migrate_vma_collect_*() functions Mika Penttilä
2025-08-14 7:19 ` [RFC PATCH 4/4] mm: add new testcase for the migrate on fault case Mika Penttilä
2025-08-15 11:36 ` [RFC PATCH 0/4] Migrate on fault for device pages Balbir Singh
2025-08-15 11:44 ` Mika Penttilä
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=614bb7b7-85c4-4e49-95e7-8faed5372cf6@redhat.com \
--to=mpenttil@redhat.com \
--cc=apopple@nvidia.com \
--cc=balbirs@nvidia.com \
--cc=david@redhat.com \
--cc=jgg@nvidia.com \
--cc=leonro@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox