linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: Peter Xu <peterx@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	David Hildenbrand <david@redhat.com>,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Fri, 24 Jun 2022 00:03:38 +0000	[thread overview]
Message-ID: <18BCC23E-B344-41A8-926D-A49D768485AF@vmware.com> (raw)
In-Reply-To: <YrT8ISwJ+3x6nIKz@xz-m1.local>

On Jun 23, 2022, at 4:49 PM, Peter Xu <peterx@redhat.com> wrote:

> On Thu, Jun 23, 2022 at 11:35:00PM +0000, Nadav Amit wrote:
>> On Jun 23, 2022, at 4:24 PM, Peter Xu <peterx@redhat.com> wrote:
>> 
>>> ⚠ External Email
>>> 
>>> On Wed, Jun 22, 2022 at 11:50:35AM -0700, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@vmware.com>
>>>> 
>>>> Using a PTE on x86 with cleared access-bit (aka young-bit)
>>>> takes ~600 cycles more than when the access bit is set. At the same
>>>> time, setting the access-bit for memory that is not used (e.g.,
>>>> prefetched) can introduce greater overheads, as the prefetched memory is
>>>> reclaimed later than it should be.
>>>> 
>>>> Userfaultfd currently does not set the access-bit (excluding the
>>>> huge-pages case). Arguably, it is best to let the user control whether
>>>> the access bit should be set or not. The expected use is to request
>>>> userfaultfd to set the access-bit when the copy/wp operation is done to
>>>> resolve a page-fault, and not to set the access-bit when the memory is
>>>> prefetched.
>>>> 
>>>> Introduce UFFDIO_[op]_ACCESS_LIKELY to enable userspace to request the
>>>> young bit to be set.
>>>> 
>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Cc: Hugh Dickins <hughd@google.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>>>> Cc: Peter Xu <peterx@redhat.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>> 
>>> Hmm.. is the hugetlb code overlooked (for both of the hints), or maybe I
>>> missed it? Do we need to cover them too? Thanks,
>> 
>> I do not know what the value of not setting the PTE’s access/dirty when it
>> comes to performance. The pages won’t be swapped out, just as you wrote in
>> your comment in hugetlb_mcopy_atomic_pte():
>> 
>>        /*
>>         * Always mark UFFDIO_COPY page dirty; note that this may not be
>>         * extremely important for hugetlbfs for now since swapping is not
>>         * supported, but we should still be clear in that this page cannot be
>>         * thrown away at will, even if write bit not set.
>>         */
>>        _dst_pte = huge_pte_mkdirty(_dst_pte);
>>        _dst_pte = pte_mkyoung(_dst_pte);
>> 
>> If you want for consistency/robustness not to set dirty on read-only
>> entries, that’s something that I can do.
> 
> We have more than one options here I guess.
> 
> We could have the hints not available for hugetlbfs, but then we'll both
> need to add special document for hugetlbfs (when you write the man-page,
> let's say) and also it'll be probably better to fail the ioctls with hints
> when applying to hugetlb vmas.
> 
> Leaving them silent to hugetlbfs is another option, it just sounds weird to
> me without any kind of documentation so I like this one least.
> 
> Or we could make them work for hugetlbfs too.  Not saying that swap will
> work some day (but it still could?!) it just seems all consistent as you
> said.
> 
> So I'd prefer the last one, but only if you agree.

My take is that hints are hints. Following David’s (or was it yours?)
feedback, I fixed the description to indicate that this is merely a hint and
removed all references to dirty/access bits. The kernel therefore can ignore
the hint when it wants to or use it in any other way. I fully agree that
this gives the kernel the ability to change the behavior as needed.

Note that for write-protected 4KB zero-page (where we share the zero-page)
we always set the access-bit, regardless of the hint, because it makes
sense: the zero-page is not swappable and therefore the access-bit is set.

I think that the lesser user-facing documentation there is on how the
feature is *exactly* used by the kernel - is better from an API point of
view.

So I see no reason to fail or be forced not to set a page as young, just
because a hint was *not* provided. This would even be a regression in the
behavior. The hint is actually always respected right now, it is just that
even if you do not provide the hint, the access/dirty is set.

The only consistency I think worth thinking about is with the dirty-bit, and
I can add it if you want. Note that the access-bit (in x86) might be set
speculatively in contrast to the dirty-bit is only set atomically with a
real access. That’s the reason I think it may make sense not to set the
dirty without a hint.

Is that acceptable? Access-bit always set, dirty-bit according to hint?


  reply	other threads:[~2022-06-24  0:03 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22 18:50 [PATCH v1 0/5] userfaultfd: support access/write hints Nadav Amit
2022-06-22 18:50 ` [PATCH v1 1/5] userfaultfd: introduce uffd_flags Nadav Amit
2022-06-23 21:57   ` Peter Xu
2022-06-23 22:04     ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations Nadav Amit
2022-06-23 23:24   ` Peter Xu
2022-06-23 23:35     ` Nadav Amit
2022-06-23 23:49       ` Peter Xu
2022-06-24  0:03         ` Nadav Amit [this message]
2022-06-24  2:05           ` Peter Xu
2022-06-24  2:42             ` Nadav Amit
2022-06-24 21:58               ` Peter Xu
2022-06-24 22:17                 ` Peter Xu
2022-06-25  7:49                   ` Nadav Amit
2022-06-27 13:12                     ` Peter Xu
2022-06-27 13:27                       ` David Hildenbrand
2022-06-27 14:59                         ` Peter Xu
2022-06-27 23:37                       ` Nadav Amit
2022-06-28 10:55                         ` David Hildenbrand
2022-06-28 19:15                         ` Peter Xu
2022-06-28 20:30                           ` Nadav Amit
2022-06-28 20:56                             ` Peter Xu
2022-06-28 21:03                               ` Nadav Amit
2022-06-28 21:12                                 ` Peter Xu
2022-06-28 21:15                                   ` Nadav Amit
2022-07-12  6:19   ` Nadav Amit
2022-07-12 14:56     ` Peter Xu
2022-07-13  1:09       ` Nadav Amit
2022-07-13 16:02         ` Peter Xu
2022-07-13 16:49           ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 3/5] userfaultfd: introduce write-likely mode for uffd operations Nadav Amit
2022-06-22 18:50 ` [PATCH v1 4/5] userfaultfd: zero access/write hints Nadav Amit
2022-06-23 23:34   ` Peter Xu
2022-06-22 18:50 ` [PATCH v1 5/5] selftest/userfaultfd: test read/write hints Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18BCC23E-B344-41A8-926D-A49D768485AF@vmware.com \
    --to=namit@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox