From: Alistair Popple <apopple@nvidia.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Sean Christopherson <seanjc@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
robin.murphy@arm.com, will@kernel.org, nicolinc@nvidia.com,
linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
kvmarm@lists.cs.columbia.edu, jgg@nvidia.com
Subject: Re: [PATCH] mmu_notifiers: Notify on pte permission upgrades
Date: Tue, 23 May 2023 14:35:17 +1000
Message-ID: <87ttw37jgt.fsf@nvidia.com>
In-Reply-To: <b99d20a4-ab1e-4e67-37ae-cb22777317ba@nvidia.com>
John Hubbard <jhubbard@nvidia.com> writes:
> On 5/22/23 16:50, Alistair Popple wrote:
> ...
>>> Again from include/linux/mmu_notifier.h, not implementing the start()/end() hooks
>>> is perfectly valid. And AFAICT, the existing invalidate_range() hook is pretty
>>> much a perfect fit for what you want to achieve.
>> Right, I didn't take that approach because it doesn't allow an event
>> type to be passed which would allow them to be filtered on platforms
>> which don't require this.
>> I had also assumed the invalidate_range() callbacks were allowed to
>> sleep, hence couldn't be called under PTL. That's certainly true of mmu
>> interval notifier callbacks, but Catalin reminded me that calls such as
>> ptep_clear_flush_notify() already call invalidate_range() callback under
>> PTL so I guess we already assume drivers don't sleep in their
>> invalidate_range() callbacks. I will update the comments to reflect
>
> This line of reasoning feels very fragile. The range notifiers generally
> do allow sleeping. They are using srcu (sleepable RCU) protection, btw.
Regardless of how well documented this is or isn't (it isn't currently,
but it used to be) it certainly seems to be a well established rule that
the .invalidate_range() callback cannot sleep. The vast majority of
callers do call this holding the PTL, and comments make it explicit that
this is somewhat expected:
E.g. in rmap.c:

    * No need to call mmu_notifier_invalidate_range() it has be
    * done above for all cases requiring it to happen under page
    * table lock before mmu_notifier_invalidate_range_end()

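To make the constraint concrete, here is a minimal userspace sketch of why a callback invoked under the PTL must not sleep. It is a model only: every model_* name is hypothetical, the "spinlock" is just a counter, and model_might_sleep() merely stands in for the kernel's might_sleep() check. model_clear_flush_notify() lumps together a caller that holds the PTL and the notify call made by ptep_clear_flush_notify().

```c
/* Userspace model only: none of these are the kernel's real definitions. */
#include <assert.h>

static int model_atomic_depth;  /* > 0 while a modelled spinlock is held */
static int model_sleep_bugs;    /* times a callback tried to sleep while atomic */

static void model_spin_lock(void)   { model_atomic_depth++; }
static void model_spin_unlock(void) { model_atomic_depth--; }

/* Stand-in for might_sleep(): count the violation instead of warning. */
static void model_might_sleep(void)
{
	if (model_atomic_depth > 0)
		model_sleep_bugs++;
}

typedef void (*model_invalidate_range_fn)(unsigned long start,
					  unsigned long end);

/* Caller holds the "PTL" and invokes the callback before dropping it. */
static void model_clear_flush_notify(model_invalidate_range_fn cb,
				     unsigned long addr,
				     unsigned long page_size)
{
	model_spin_lock();              /* PTL taken */
	/* ... clear the PTE and flush the CPU TLB here ... */
	cb(addr, addr + page_size);     /* secondary TLB invalidate, under PTL */
	model_spin_unlock();            /* PTL dropped */
}

/* Fine: e.g. only posts an invalidate command, never sleeps. */
static void model_cb_nonblocking(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
}

/* Buggy: e.g. would take a mutex, which may sleep. */
static void model_cb_sleeping(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	model_might_sleep();
}
```

Running both callbacks through model_clear_flush_notify() flags only the sleeping one, which is the behaviour the comments in rmap.c implicitly rely on.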
> The fact that existing callers are calling these under PTL just means
> that so far, that has sorta worked. And yes, we can probably make this
> all work. That's not really the ideal way to deduce the API rules, though,
> and it would be good to clarify what they really are.
Of course not. I will update the documentation to clarify this, but see
below for some history which clarifies how we got here.
> Aside from those use cases, I don't see anything justifying a "not allowed
> to sleep" rule for .invalidate_range(), right?
Except that "those use cases" are approximately all of the use cases
AFAICT, and as it turns out this was actually a rule when
.invalidate_range() was added.
Commit 0f0a327fa12c ("mmu_notifier: add the callback for
mmu_notifier_invalidate_range()") included this in the documentation:

    * The invalidate_range() function is called under the ptl
    * spin-lock and not allowed to sleep.

This was later removed in 5ff7091f5a2c ("mm, mmu_notifier: annotate mmu
notifiers with blockable invalidate callbacks") which introduced the
MMU_INVALIDATE_DOES_NOT_BLOCK flag:

    * If this [invalidate_range()] callback cannot block, and invalidate_range_{start,end}
    * cannot block, mmu_notifier_ops.flags should have
    * MMU_INVALIDATE_DOES_NOT_BLOCK set.

However, the removal of the original comment seems wrong:
invalidate_range() was still being called under the ptl and therefore
could not block, regardless of whether MMU_INVALIDATE_DOES_NOT_BLOCK
was set.
Of course the flag and related documentation were removed shortly after
by 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu
notifiers") and 4e15a073a168 ("Revert "mm, mmu_notifier: annotate mmu
notifiers with blockable invalidate callbacks"").
None of those changes actually made it safe for .invalidate_range()
callbacks to sleep, nor was that their goal. They were all about making
sure it was ok for .invalidate_range_start() to sleep AFAICT.
So I think it's perfectly fine to require .invalidate_range() callbacks
to be non-blocking, and if they do block that's a driver bug. Note that this
isn't talking about mmu *interval* notifiers - they are slightly
different and don't hook into the mmu_notifier_invalidate_range() call.
They use start()/end() and as such are allowed to sleep.
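As a rough illustration of that distinction, the following userspace sketch records whether each hook runs while a modelled PTL is held. All seq_* names are hypothetical and the ordering is an assumption about the common PTE update path, not a statement of the kernel's actual implementation.

```c
/* Userspace model only; the names below are illustrative, not kernel API. */
#include <assert.h>
#include <stdbool.h>

enum seq_hook { SEQ_START, SEQ_RANGE, SEQ_END, SEQ_NR_HOOKS };

static bool seq_ptl_held;                       /* modelled PTL state */
static bool seq_called_under_ptl[SEQ_NR_HOOKS]; /* context seen by each hook */

/* Remember whether the hook was invoked with the modelled PTL held. */
static void seq_record(enum seq_hook hook)
{
	seq_called_under_ptl[hook] = seq_ptl_held;
}

/* Assumed ordering of the hooks around a single PTE update. */
static void seq_change_pte(void)
{
	seq_record(SEQ_START);  /* start(): before the PTL is taken */
	seq_ptl_held = true;
	/* ... update the PTE ... */
	seq_record(SEQ_RANGE);  /* invalidate_range(): PTL still held */
	seq_ptl_held = false;
	seq_record(SEQ_END);    /* end(): after the PTL is dropped */
}

/* In this model a hook may sleep only if it ran outside the PTL. */
static bool seq_hook_may_sleep(enum seq_hook hook)
{
	return !seq_called_under_ptl[hook];
}
```

After one call to seq_change_pte(), only the SEQ_RANGE hook reports that it ran under the lock, which matches the rule argued for above.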
- Alistair
> thanks,