linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: Peter Xu <peterx@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	David Hildenbrand <david@redhat.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Tue, 28 Jun 2022 20:30:46 +0000	[thread overview]
Message-ID: <9B0B584B-DE1B-48E3-B448-9D6C02DBDD20@vmware.com> (raw)
In-Reply-To: <YrtTQ/sLm97R32Eu@xz-m1.local>



> On Jun 28, 2022, at 12:15 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> ⚠ External Email
> 
> On Mon, Jun 27, 2022 at 11:37:32PM +0000, Nadav Amit wrote:
>> 
>> Besides the benefit of avoiding a TLB flush, there is also the benefit
>> of having more precise dirty tracking. You assume UFFDIO_CONTINUE will be
>> preceded by memory write to the shared memory, but that does not have to
>> be the case. Similarly, if in the future userfaultfd would also support
>> memory-backed private mappings, that does not have to be the case either.
> 
> Could I ask what's the not supported private mapping you're talking about
> (and I think you meant "file-backed")? I thought all kinds of private
> mappings are supported on all modes already?

Yes, I meant file-backed. See vma_can_userfault() for the supported
types of memory:

  static inline bool vma_can_userfault(struct vm_area_struct *vma,
                                     unsigned long vm_flags)
  {       
        if (vm_flags & VM_UFFD_MINOR)
                return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
                
#ifndef CONFIG_PTE_MARKER_UFFD_WP
        /*
         * If user requested uffd-wp but not enabled pte markers for
         * uffd-wp, then shmem & hugetlbfs are not supported but only
         * anonymous.
         */             
        if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
                return false;
#endif
        return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
            vma_is_shmem(vma);
  }


> 
> Thanks for explaining, so yes either false positive or false negative on
> pte dirty bit can bring a side effect on things, and the user knows the
> best. Makes sense. I think what I overlooked is the downside of having
> both write==dirty==1 when it doesn't need to.
> 
>> 
>> Putting all of the above aside, there is a bug in my code, but this
>> bug also points why dirty should not be set unconditionally. If someone
>> uses SOFT_DIRTY with userfaultfd, then marking the PTE as dirty (and
>> soft-dirty) might be misleading, causing unnecessary userspace writeback
>> of memory.
> 
> Well I never thought anyone would be using soft-dirty with uffd because
> logically uffd-wp was somehow trying to replace it with a better interface,
> hopefully.

I have heard about some who does (not me). So I do not make it up,
unfortunately.

> 
> If to talk about it, IMHO the most important thing is really not the dirty
> bit but the write bit: even if you leave both the dirty bits cleared (so
> the user specified !WRITE_HINT then we grant write bit only), it means when
> the write happens for sure dirty bit will be set on demand but not
> soft-dirty since we won't generate a page fault at all. AFAICT it'll be a
> false negative.
> 
> The old code is safe in that respect on that we always set dirty so we can
> only have false positive (which is acceptable in this case) but not false
> negative (which is not).

I agree that the PTE should be left RO in this case. I was just pointing
that the user might not be aware of soft-dirty being used, and then it
makes sense to treat the (lack of) write-hint appropriately.

In the current code, when PTEs are unconditionally marked as dirty, even
when they are write-protected, soft-dirty would produce false-positives.
Nothing would break, but it is not good for performance either.

> 
>> 
>> So I do need to fix my code so it would not write-unprotect memory if
>> soft-dirty is enabled and UFFD_FLAGS_WRITE_LIKELY is not provided. But
>> I think it emphasizes the benefit of having UFFD_FLAGS_WRITE_LIKELY.
> 
> Sorry to say that, but to me it seems a proof that dirty bit is even more
> tricky than access bit so we need to have better thoughts, even if the
> WRITE_LIKELY hint sounds straightforward. I now can buy in all the tlb
> flush points you explained, but still IMHO that's too fine granule (some
> random ptes in a batched tlb range flush may not even matter!) and I'm just
> naively wondering whether we need more thoughts for an uABI to be proposed
> like that. I won't ask for some use cases and especially numbers to prove
> assuming I'm asking too much, but really that'll be very persuasive if we
> can get. I'd not worry for ACCESS_LIKELY on it because that's much more
> straightforward to me.

Let me get back to you with numbers.

> 
>> 
>>> 
>>> Maybe with the to be proposed RFC patch for tlb flush we can know whether
>>> that should be something we can rely on. It'll add more dependency on this
>>> work which I'm sorry to say. It's just that IMHO we should think carefully
>>> for the write-hint because this is a solid new uABI we're talking about.
>>> 
>>> The other option is we can introduce the access hint first and think more
>>> on the dirty one (we can always add it when proper). What do you think?
>>> Also, David please chim in anytime if I missed the whole point when you
>>> proposed the idea.
>>> 
>>>> 
>>>> But this discussion made me think that there are two somewhat related
>>>> matters that we may want to address:
>>>> 
>>>> 1. mwriteprotect_range() should use MM_CP_TRY_CHANGE_WRITABLE when !wp
>>>> to proactively make entries writable and save .
>>> 
>>> I'm not sure I'm right here, but I think David's patch should have covered
>>> that case? The new helper only checks pte_uffd_wp() based on my memory,
>>> and when resolving page faults uffd-wp bit should have been gone, so it
>>> should be treated the same as normal ptes.
>> 
>> Let’s see we get to the same page:
>> 
>> mwriteprotect_range() does:
>> 
>> change_protection(&tlb, dst_vma, start, start + len, newprot,
>> enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE)
>> 
>> As you see no use of MM_CP_TRY_CHANGE_WRITABLE.
>> 
>> And then change_pte_range() does:
>> 
>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>> !pte_write(ptent) &&
>> can_change_pte_writable(vma, addr, ptent))
>> ptent = pte_mkwrite(ptent);
>> 
>> If we do not provide MM_CP_TRY_CHANGE_WRITABLE, the PTE would not be
>> writable.
>> 
>> Now for the record, we may want an additional optimization, so if a
>> fault handler is woken because a PTE become writable, we do not retry
>> the page fault (since the PTE is already writable). It is a small change,
>> but let’s see first we agree on the patches so far…
> 
> Ah I see what I missed, thanks.
> 
> Another option is we call can_change_pte_writable() somehow in the
> unprotect branch.

I don’t think that I got this one.

tl;dr: I’ll run some numbers (access when PTE-dirty/clear), address the
aforementioned issues and send v2. Based on the results we can decide
whether it makes sense.



  reply	other threads:[~2022-06-28 20:30 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22 18:50 [PATCH v1 0/5] userfaultfd: support access/write hints Nadav Amit
2022-06-22 18:50 ` [PATCH v1 1/5] userfaultfd: introduce uffd_flags Nadav Amit
2022-06-23 21:57   ` Peter Xu
2022-06-23 22:04     ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations Nadav Amit
2022-06-23 23:24   ` Peter Xu
2022-06-23 23:35     ` Nadav Amit
2022-06-23 23:49       ` Peter Xu
2022-06-24  0:03         ` Nadav Amit
2022-06-24  2:05           ` Peter Xu
2022-06-24  2:42             ` Nadav Amit
2022-06-24 21:58               ` Peter Xu
2022-06-24 22:17                 ` Peter Xu
2022-06-25  7:49                   ` Nadav Amit
2022-06-27 13:12                     ` Peter Xu
2022-06-27 13:27                       ` David Hildenbrand
2022-06-27 14:59                         ` Peter Xu
2022-06-27 23:37                       ` Nadav Amit
2022-06-28 10:55                         ` David Hildenbrand
2022-06-28 19:15                         ` Peter Xu
2022-06-28 20:30                           ` Nadav Amit [this message]
2022-06-28 20:56                             ` Peter Xu
2022-06-28 21:03                               ` Nadav Amit
2022-06-28 21:12                                 ` Peter Xu
2022-06-28 21:15                                   ` Nadav Amit
2022-07-12  6:19   ` Nadav Amit
2022-07-12 14:56     ` Peter Xu
2022-07-13  1:09       ` Nadav Amit
2022-07-13 16:02         ` Peter Xu
2022-07-13 16:49           ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 3/5] userfaultfd: introduce write-likely mode for uffd operations Nadav Amit
2022-06-22 18:50 ` [PATCH v1 4/5] userfaultfd: zero access/write hints Nadav Amit
2022-06-23 23:34   ` Peter Xu
2022-06-22 18:50 ` [PATCH v1 5/5] selftest/userfaultfd: test read/write hints Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9B0B584B-DE1B-48E3-B448-9D6C02DBDD20@vmware.com \
    --to=namit@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox