From: Nadav Amit <namit@vmware.com>
To: Peter Xu <peterx@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Tue, 28 Jun 2022 20:30:46 +0000 [thread overview]
Message-ID: <9B0B584B-DE1B-48E3-B448-9D6C02DBDD20@vmware.com> (raw)
In-Reply-To: <YrtTQ/sLm97R32Eu@xz-m1.local>
> On Jun 28, 2022, at 12:15 PM, Peter Xu <peterx@redhat.com> wrote:
>
> ⚠ External Email
>
> On Mon, Jun 27, 2022 at 11:37:32PM +0000, Nadav Amit wrote:
>>
>> Besides the benefit of avoiding a TLB flush, there is also the benefit
>> of having more precise dirty tracking. You assume UFFDIO_CONTINUE will be
>> preceded by memory write to the shared memory, but that does not have to
>> be the case. Similarly, if in the future userfaultfd would also support
>> memory-backed private mappings, that does not have to be the case either.
>
> Could I ask what's the not supported private mapping you're talking about
> (and I think you meant "file-backed")? I thought all kinds of private
> mappings are supported on all modes already?
Yes, I meant file-backed. See vma_can_userfault() for the supported
types of memory:
static inline bool vma_can_userfault(struct vm_area_struct *vma,
unsigned long vm_flags)
{
if (vm_flags & VM_UFFD_MINOR)
return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
#ifndef CONFIG_PTE_MARKER_UFFD_WP
/*
* If user requested uffd-wp but not enabled pte markers for
* uffd-wp, then shmem & hugetlbfs are not supported but only
* anonymous.
*/
if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
return false;
#endif
return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
vma_is_shmem(vma);
}
>
> Thanks for explaining, so yes either false positive or false negative on
> pte dirty bit can bring a side effect on things, and the user knows the
> best. Makes sense. I think what I overlooked is the downside of having
> both write==dirty==1 when it doesn't need to.
>
>>
>> Putting all of the above aside, there is a bug in my code, but this
>> bug also points why dirty should not be set unconditionally. If someone
>> uses SOFT_DIRTY with userfaultfd, then marking the PTE as dirty (and
>> soft-dirty) might be misleading, causing unnecessary userspace writeback
>> of memory.
>
> Well I never thought anyone would be using soft-dirty with uffd because
> logically uffd-wp was somehow trying to replace it with a better interface,
> hopefully.
I have heard about some who does (not me). So I do not make it up,
unfortunately.
>
> If to talk about it, IMHO the most important thing is really not the dirty
> bit but the write bit: even if you leave both the dirty bits cleared (so
> the user specified !WRITE_HINT then we grant write bit only), it means when
> the write happens for sure dirty bit will be set on demand but not
> soft-dirty since we won't generate a page fault at all. AFAICT it'll be a
> false negative.
>
> The old code is safe in that respect on that we always set dirty so we can
> only have false positive (which is acceptable in this case) but not false
> negative (which is not).
I agree that the PTE should be left RO in this case. I was just pointing
that the user might not be aware of soft-dirty being used, and then it
makes sense to treat the (lack of) write-hint appropriately.
In the current code, when PTEs are unconditionally marked as dirty, even
when they are write-protected, soft-dirty would produce false-positives.
Nothing would break, but it is not good for performance either.
>
>>
>> So I do need to fix my code so it would not write-unprotect memory if
>> soft-dirty is enabled and UFFD_FLAGS_WRITE_LIKELY is not provided. But
>> I think it emphasizes the benefit of having UFFD_FLAGS_WRITE_LIKELY.
>
> Sorry to say that, but to me it seems a proof that dirty bit is even more
> tricky than access bit so we need to have better thoughts, even if the
> WRITE_LIKELY hint sounds straightforward. I now can buy in all the tlb
> flush points you explained, but still IMHO that's too fine granule (some
> random ptes in a batched tlb range flush may not even matter!) and I'm just
> naively wondering whether we need more thoughts for an uABI to be proposed
> like that. I won't ask for some use cases and especially numbers to prove
> assuming I'm asking too much, but really that'll be very persuasive if we
> can get. I'd not worry for ACCESS_LIKELY on it because that's much more
> straightforward to me.
Let me get back to you with numbers.
>
>>
>>>
>>> Maybe with the to be proposed RFC patch for tlb flush we can know whether
>>> that should be something we can rely on. It'll add more dependency on this
>>> work which I'm sorry to say. It's just that IMHO we should think carefully
>>> for the write-hint because this is a solid new uABI we're talking about.
>>>
>>> The other option is we can introduce the access hint first and think more
>>> on the dirty one (we can always add it when proper). What do you think?
>>> Also, David please chim in anytime if I missed the whole point when you
>>> proposed the idea.
>>>
>>>>
>>>> But this discussion made me think that there are two somewhat related
>>>> matters that we may want to address:
>>>>
>>>> 1. mwriteprotect_range() should use MM_CP_TRY_CHANGE_WRITABLE when !wp
>>>> to proactively make entries writable and save .
>>>
>>> I'm not sure I'm right here, but I think David's patch should have covered
>>> that case? The new helper only checks pte_uffd_wp() based on my memory,
>>> and when resolving page faults uffd-wp bit should have been gone, so it
>>> should be treated the same as normal ptes.
>>
>> Let’s see we get to the same page:
>>
>> mwriteprotect_range() does:
>>
>> change_protection(&tlb, dst_vma, start, start + len, newprot,
>> enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE)
>>
>> As you see no use of MM_CP_TRY_CHANGE_WRITABLE.
>>
>> And then change_pte_range() does:
>>
>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>> !pte_write(ptent) &&
>> can_change_pte_writable(vma, addr, ptent))
>> ptent = pte_mkwrite(ptent);
>>
>> If we do not provide MM_CP_TRY_CHANGE_WRITABLE, the PTE would not be
>> writable.
>>
>> Now for the record, we may want an additional optimization, so if a
>> fault handler is woken because a PTE become writable, we do not retry
>> the page fault (since the PTE is already writable). It is a small change,
>> but let’s see first we agree on the patches so far…
>
> Ah I see what I missed, thanks.
>
> Another option is we call can_change_pte_writable() somehow in the
> unprotect branch.
I don’t think that I got this one.
tl;dr: I’ll run some numbers (access when PTE-dirty/clear), address the
aforementioned issues and send v2. Based on the results we can decide
whether it makes sense.
next prev parent reply other threads:[~2022-06-28 20:30 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-22 18:50 [PATCH v1 0/5] userfaultfd: support access/write hints Nadav Amit
2022-06-22 18:50 ` [PATCH v1 1/5] userfaultfd: introduce uffd_flags Nadav Amit
2022-06-23 21:57 ` Peter Xu
2022-06-23 22:04 ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations Nadav Amit
2022-06-23 23:24 ` Peter Xu
2022-06-23 23:35 ` Nadav Amit
2022-06-23 23:49 ` Peter Xu
2022-06-24 0:03 ` Nadav Amit
2022-06-24 2:05 ` Peter Xu
2022-06-24 2:42 ` Nadav Amit
2022-06-24 21:58 ` Peter Xu
2022-06-24 22:17 ` Peter Xu
2022-06-25 7:49 ` Nadav Amit
2022-06-27 13:12 ` Peter Xu
2022-06-27 13:27 ` David Hildenbrand
2022-06-27 14:59 ` Peter Xu
2022-06-27 23:37 ` Nadav Amit
2022-06-28 10:55 ` David Hildenbrand
2022-06-28 19:15 ` Peter Xu
2022-06-28 20:30 ` Nadav Amit [this message]
2022-06-28 20:56 ` Peter Xu
2022-06-28 21:03 ` Nadav Amit
2022-06-28 21:12 ` Peter Xu
2022-06-28 21:15 ` Nadav Amit
2022-07-12 6:19 ` Nadav Amit
2022-07-12 14:56 ` Peter Xu
2022-07-13 1:09 ` Nadav Amit
2022-07-13 16:02 ` Peter Xu
2022-07-13 16:49 ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 3/5] userfaultfd: introduce write-likely mode for uffd operations Nadav Amit
2022-06-22 18:50 ` [PATCH v1 4/5] userfaultfd: zero access/write hints Nadav Amit
2022-06-23 23:34 ` Peter Xu
2022-06-22 18:50 ` [PATCH v1 5/5] selftest/userfaultfd: test read/write hints Nadav Amit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9B0B584B-DE1B-48E3-B448-9D6C02DBDD20@vmware.com \
--to=namit@vmware.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=peterx@redhat.com \
--cc=rppt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox