From: Nadav Amit <namit@vmware.com>
To: Peter Xu <peterx@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Fri, 24 Jun 2022 00:03:38 +0000 [thread overview]
Message-ID: <18BCC23E-B344-41A8-926D-A49D768485AF@vmware.com> (raw)
In-Reply-To: <YrT8ISwJ+3x6nIKz@xz-m1.local>
On Jun 23, 2022, at 4:49 PM, Peter Xu <peterx@redhat.com> wrote:
> On Thu, Jun 23, 2022 at 11:35:00PM +0000, Nadav Amit wrote:
>> On Jun 23, 2022, at 4:24 PM, Peter Xu <peterx@redhat.com> wrote:
>>
>>> ⚠ External Email
>>>
>>> On Wed, Jun 22, 2022 at 11:50:35AM -0700, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@vmware.com>
>>>>
>>>> Using a PTE on x86 with cleared access-bit (aka young-bit)
>>>> takes ~600 cycles more than when the access bit is set. At the same
>>>> time, setting the access-bit for memory that is not used (e.g.,
>>>> prefetched) can introduce greater overheads, as the prefetched memory is
>>>> reclaimed later than it should be.
>>>>
>>>> Userfaultfd currently does not set the access-bit (excluding the
>>>> huge-pages case). Arguably, it is best to let the user control whether
>>>> the access bit should be set or not. The expected use is to request
>>>> userfaultfd to set the access-bit when the copy/wp operation is done to
>>>> resolve a page-fault, and not to set the access-bit when the memory is
>>>> prefetched.
>>>>
>>>> Introduce UFFDIO_[op]_ACCESS_LIKELY to enable userspace to request the
>>>> young bit to be set.
>>>>
>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Cc: Hugh Dickins <hughd@google.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>>>> Cc: Peter Xu <peterx@redhat.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>
>>> Hmm.. is the hugetlb code overlooked (for both of the hints), or maybe I
>>> missed it? Do we need to cover them too? Thanks,
>>
>> I do not know what the value of not setting the PTE’s access/dirty when it
>> comes to performance. The pages won’t be swapped out, just as you wrote in
>> your comment in hugetlb_mcopy_atomic_pte():
>>
>> /*
>> * Always mark UFFDIO_COPY page dirty; note that this may not be
>> * extremely important for hugetlbfs for now since swapping is not
>> * supported, but we should still be clear in that this page cannot be
>> * thrown away at will, even if write bit not set.
>> */
>> _dst_pte = huge_pte_mkdirty(_dst_pte);
>> _dst_pte = pte_mkyoung(_dst_pte);
>>
>> If you want for consistency/robustness not to set dirty on read-only
>> entries, that’s something that I can do.
>
> We have more than one options here I guess.
>
> We could have the hints not available for hugetlbfs, but then we'll both
> need to add special document for hugetlbfs (when you write the man-page,
> let's say) and also it'll be probably better to fail the ioctls with hints
> when applying to hugetlb vmas.
>
> Leaving them silent to hugetlbfs is another option, it just sounds weird to
> me without any kind of documentation so I like this one least.
>
> Or we could make them work for hugetlbfs too. Not saying that swap will
> work some day (but it still could?!) it just seems all consistent as you
> said.
>
> So I'd prefer the last one, but only if you agree.
My take is that hints are hints. Following David’s (or was it yours?)
feedback, I fixed the description to indicate that this is merely a hint and
removed all references to dirty/access bits. The kernel therefore can ignore
the hint when it wants to or use it in any other way. I fully agree that
this gives the kernel the ability to change the behavior as needed.
Note that for write-protected 4KB zero-page (where we share the zero-page)
we always set the access-bit, regardless of the hint, because it makes
sense: the zero-page is not swappable and therefore the access-bit is set.
I think that the lesser user-facing documentation there is on how the
feature is *exactly* used by the kernel - is better from an API point of
view.
So I see no reason to fail or be forced not to set a page as young, just
because a hint was *not* provided. This would even be a regression in the
behavior. The hint is actually always respected right now, it is just that
even if you do not provide the hint, the access/dirty is set.
The only consistency I think worth thinking about is with the dirty-bit, and
I can add it if you want. Note that the access-bit (in x86) might be set
speculatively in contrast to the dirty-bit is only set atomically with a
real access. That’s the reason I think it may make sense not to set the
dirty without a hint.
Is that acceptable? Access-bit always set, dirty-bit according to hint?
next prev parent reply other threads:[~2022-06-24 0:03 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-22 18:50 [PATCH v1 0/5] userfaultfd: support access/write hints Nadav Amit
2022-06-22 18:50 ` [PATCH v1 1/5] userfaultfd: introduce uffd_flags Nadav Amit
2022-06-23 21:57 ` Peter Xu
2022-06-23 22:04 ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations Nadav Amit
2022-06-23 23:24 ` Peter Xu
2022-06-23 23:35 ` Nadav Amit
2022-06-23 23:49 ` Peter Xu
2022-06-24 0:03 ` Nadav Amit [this message]
2022-06-24 2:05 ` Peter Xu
2022-06-24 2:42 ` Nadav Amit
2022-06-24 21:58 ` Peter Xu
2022-06-24 22:17 ` Peter Xu
2022-06-25 7:49 ` Nadav Amit
2022-06-27 13:12 ` Peter Xu
2022-06-27 13:27 ` David Hildenbrand
2022-06-27 14:59 ` Peter Xu
2022-06-27 23:37 ` Nadav Amit
2022-06-28 10:55 ` David Hildenbrand
2022-06-28 19:15 ` Peter Xu
2022-06-28 20:30 ` Nadav Amit
2022-06-28 20:56 ` Peter Xu
2022-06-28 21:03 ` Nadav Amit
2022-06-28 21:12 ` Peter Xu
2022-06-28 21:15 ` Nadav Amit
2022-07-12 6:19 ` Nadav Amit
2022-07-12 14:56 ` Peter Xu
2022-07-13 1:09 ` Nadav Amit
2022-07-13 16:02 ` Peter Xu
2022-07-13 16:49 ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 3/5] userfaultfd: introduce write-likely mode for uffd operations Nadav Amit
2022-06-22 18:50 ` [PATCH v1 4/5] userfaultfd: zero access/write hints Nadav Amit
2022-06-23 23:34 ` Peter Xu
2022-06-22 18:50 ` [PATCH v1 5/5] selftest/userfaultfd: test read/write hints Nadav Amit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18BCC23E-B344-41A8-926D-A49D768485AF@vmware.com \
--to=namit@vmware.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=peterx@redhat.com \
--cc=rppt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox