linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>, Peter Xu <peterx@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 3/3] mm/uffd: Detect pgtable allocation failures
Date: Thu, 5 Jan 2023 09:59:28 +0100	[thread overview]
Message-ID: <cc6e17ad-1afd-5c52-a06e-1d89d1978368@redhat.com> (raw)
In-Reply-To: <AF984D5D-DC66-4FD3-A749-5AF6B7289E0D@gmail.com>

On 05.01.23 04:10, Nadav Amit wrote:
> 
>> On Jan 4, 2023, at 2:52 PM, Peter Xu <peterx@redhat.com> wrote:
>>
>> Before this patch, when there's any pgtable allocation issues happened
>> during change_protection(), the error will be ignored from the syscall.
>> For shmem, there will be an error dumped into the host dmesg.  Two issues
>> with that:
>>
>>   (1) Doing a trace dump when allocation fails is not anything close to
>>       grace..
>>
>>   (2) The user should be notified with any kind of such error, so the user
>>       can trap it and decide what to do next, either by retrying, or stop
>>       the process properly, or anything else.
>>
>> For userfault users, this will change the API of UFFDIO_WRITEPROTECT when
>> pgtable allocation failure happened.  It should not normally break anyone,
>> though.  If it breaks, then in good ways.
>>
>> One man-page update will be on the way to introduce the new -ENOMEM for
>> UFFDIO_WRITEPROTECT.  Not marking stable so we keep the old behavior on the
>> 5.19-till-now kernels.
> 
> I understand that the current assumption is that change_protection() should
> fully succeed or fail, and I guess this is the current behavior.
> 
> However, to be more “future-proof” perhaps this needs to be revisited.
> 
> For instance, UFFDIO_WRITEPROTECT can benefit from the ability to (based on
> userspace request) prevent write-protection of pages that are pinned. This is
> necessary to allow userspace uffd monitor to avoid write-protection of
> O_DIRECT’d memory, for instance, that might change even if a uffd monitor
> considers it write-protected.

Just a note that this is pretty tricky IMHO, because:

a) We cannot distinguished "pinned readable" from "pinned writable"
b) We can have false positives ("pinned") even for compound pages due to
    concurrent GUP-fast.
c) Synchronizing against GUP-fast is pretty tricky ... as we learned.
    Concurrent pinning is usually problematic.
d) O_DIRECT still uses FOLL_GET and we cannot identify that. (at least
    that should be figured out at one point)

I have a patch lying around for a very long time that removes that 
special-pinned handling from softdirty code, because of the above 
reasons (and because it forgets THP). For now I didn't send it because 
for softdirty, it's acceptable to over-indicate and it hasn't been 
reported to be an actual problem so far.

For existing UFFDIO_WRITEPROTECT users, however, it might be very 
harmful (especially for existing users) to get false protection errors. 
Failing due to ENOMEM is different to failing due to some temporary 
concurrency issues.

Having that said, I started thinking about alternative ways of detecting 
that in that past, without much outcome so far: that latest idea was 
indicating "this MM has had pinned pages at one point, be careful 
because any techniques that use write-protection (softdirty, mprotect, 
uffd-wp) won't be able to catch writes via pinned pages reliably".

Hm.

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2023-01-05  8:59 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-04 22:52 [PATCH 0/3] mm/uffd: Fix missing markers on hugetlb Peter Xu
2023-01-04 22:52 ` [PATCH 1/3] mm/hugetlb: Pre-allocate pgtable pages for uffd wr-protects Peter Xu
2023-01-05  1:50   ` James Houghton
2023-01-05  8:39   ` David Hildenbrand
2023-01-05 18:37   ` Mike Kravetz
2023-01-04 22:52 ` [PATCH 2/3] mm/mprotect: Use long for page accountings and retval Peter Xu
2023-01-05  1:51   ` James Houghton
2023-01-05  8:44   ` David Hildenbrand
2023-01-05 19:22     ` Peter Xu
2023-01-09  8:04       ` David Hildenbrand
2023-01-05 18:48   ` Mike Kravetz
2023-01-04 22:52 ` [PATCH 3/3] mm/uffd: Detect pgtable allocation failures Peter Xu
2023-01-05  1:52   ` James Houghton
2023-01-05  3:10   ` Nadav Amit
2023-01-05  8:59     ` David Hildenbrand [this message]
2023-01-05 18:01       ` Nadav Amit
2023-01-05 19:51         ` Peter Xu
2023-01-18 21:51           ` Nadav Amit
2023-01-09  8:36         ` David Hildenbrand
2023-01-05  8:47   ` David Hildenbrand
2023-01-05  8:16 ` [PATCH 0/3] mm/uffd: Fix missing markers on hugetlb David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cc6e17ad-1afd-5c52-a06e-1d89d1978368@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=jthoughton@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=peterx@redhat.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox