Re: [RFC PATCH 1/4] mm: use current as mmu notifier's owner

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Mika Penttilä" <mpenttil@redhat.com>
To: Balbir Singh <balbirs@nvidia.com>, Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Leon Romanovsky <leonro@nvidia.com>
Subject: Re: [RFC PATCH 1/4] mm: use current as mmu notifier's owner
Date: Tue, 19 Aug 2025 07:33:02 +0300	[thread overview]
Message-ID: <614bb7b7-85c4-4e49-95e7-8faed5372cf6@redhat.com> (raw)
In-Reply-To: <e054f622-69ad-4724-b12d-c2545d913170@nvidia.com>

Hi,

On 8/19/25 07:27, Balbir Singh wrote:

> On 8/15/25 17:11, Mika Penttilä wrote:
>> On 8/15/25 08:23, Alistair Popple wrote:
>>
>>> On Thu, Aug 14, 2025 at 08:45:43PM +0300, Mika Penttilä wrote:
>>>> On 8/14/25 20:20, Jason Gunthorpe wrote:
>>>>
>>>>> On Thu, Aug 14, 2025 at 08:00:01PM +0300, Mika Penttilä wrote:
>>>>>> as well as hmm test module with :
>>>>>>
>>>>>>          * Ignore invalidation callbacks for device private pages since
>>>>>>          * the invalidation is handled as part of the migration process.
>>>>>>          */
>>>>>>         if (range->event == MMU_NOTIFY_MIGRATE &&
>>>>>>             range->owner == dmirror->mdevice)
>>>>>>                 return true;
>>>>> If I recall this was about a very specific case where migration does a
>>>>> number of invalidations and some of the earlier ones are known to be
>>>>> redundant in this specific case. Redundant means it can be ignored
>>>>> without causing an inconsistency.
>>>>>
>>>>> Alistair would know, but I assumed this works OK because the above
>>>>> invalidation doesn't actually go on to free any pages but keeps them
>>>>> around until a later invalidation?
>> Thanks Alistair for your deep insights! 
>>
>>> Right, the pages don't actually get freed because a reference is taken on them
>>> during migrate_vma_setup(). However other device MMU's still need invalidating
>>> because the driver will go on to copy the page after this step. It's just
>>> assumed that the driver is able to be consistent with itself (ie. it will unmap/
>>> invalidate it's own MMU prior to initiating the copy).
>> And reference is taken as well in migrate on fault during hmm_range_fault
>> if migrating.
>>
>>> In practice I suspect what Mika is running into is that the page table
>>> synchronisation for migration works slightly differently for migrate_vma_*().
>>>
>>> Instead of using mmu_interval_notifier's which have a sequence number drivers
>>> typically use normal mmu_notifier's and take a device specific lock to block
>>> page table downgrades (eg. RW -> RO). This ensures it's safe to update the
>>> device page tables with the PFNs/permissions collected in migrate_vma_setup()
>>> (or the new PFN) by blocking other threads from updating the page table.
>>>
>>> The ususal problem with this approach is that when migrate_vma_setup() calls
>>> the mmu_notifier it deadlocks on the device specific lock in the notifier
>>> callback because it already holds the lock, which it can't drop before calling
>>> migrate_vma_setup().
>>>
>>> I think one of the main benefits of a series which consolidates these two
>>> page-table mirroring techniques into common code would also be to make the
>>> mirroring/invalidation logic the same for migration as hmm_range_fault(). Ie. to
>>> move to mmu_interval notifers with sequence numbers for migration, perhaps with
>>> filtering if required/safe and retries
>> Yes with the migrate_vma_setup() and collecting removed, the firing of mmu notifiers
>> and "collecting" are integral part of the hmm_range_fault() flow, so logical to use
>> interval notifiers for migrate also.
>>
>> I have removed the commit with the owner games. I studied it more and seems it was added
>> to mitigate a bug in an early version, which led me to do wrong conclusion of the root cause
>> of the hang. That version had unbalanced mmu_notifier_invalidate_range_start()
>> after returning from hmm_range_fault() with EBUSY (after done a folio split).
>> With that fixed, driving the migrate on fault using the interval notifiers seems to work well, 
>> filtering MMU_NOTIFY_MIGRATE for device for retries.
>>
> So this patch can be ignored in the series?

Yes this can be ignored. I will do more testing and repost after a while with this removed and
possibly with other changes if needed.

>
> Balbir
>
--Mika

next prev parent reply	other threads:[~2025-08-19  4:33 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-14  7:19 [RFC PATCH 0/4] Migrate on fault for device pages Mika Penttilä
2025-08-14  7:19 ` [RFC PATCH 1/4] mm: use current as mmu notifier's owner Mika Penttilä
2025-08-14 12:40   ` Jason Gunthorpe
2025-08-14 12:53     ` Mika Penttilä
2025-08-14 13:04       ` Jason Gunthorpe
2025-08-14 13:20         ` Mika Penttilä
2025-08-14 14:11           ` Jason Gunthorpe
2025-08-14 17:00             ` Mika Penttilä
2025-08-14 17:20               ` Jason Gunthorpe
2025-08-14 17:45                 ` Mika Penttilä
2025-08-15  5:23                   ` Alistair Popple
2025-08-15  7:11                     ` Mika Penttilä
2025-08-19  4:27                       ` Balbir Singh
2025-08-19  4:33                         ` Mika Penttilä [this message]
2025-08-14  7:19 ` [RFC PATCH 2/4] mm: unified fault and migrate device page paths Mika Penttilä
2025-08-21  4:30   ` Balbir Singh
2025-08-21  5:10     ` Mika Penttilä
2025-08-22  5:02       ` Alistair Popple
2025-08-14  7:19 ` [RFC PATCH 3/4] mm:/migrate_device.c: remove migrate_vma_collect_*() functions Mika Penttilä
2025-08-14  7:19 ` [RFC PATCH 4/4] mm: add new testcase for the migrate on fault case Mika Penttilä
2025-08-15 11:36 ` [RFC PATCH 0/4] Migrate on fault for device pages Balbir Singh
2025-08-15 11:44   ` Mika Penttilä

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=614bb7b7-85c4-4e49-95e7-8faed5372cf6@redhat.com \
    --to=mpenttil@redhat.com \
    --cc=apopple@nvidia.com \
    --cc=balbirs@nvidia.com \
    --cc=david@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox