linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: Peter Xu <peterx@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	David Hildenbrand <david@redhat.com>,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Fri, 24 Jun 2022 02:42:21 +0000	[thread overview]
Message-ID: <6EF7D3B4-CF17-407B-A50F-B14D595E99A5@vmware.com> (raw)
In-Reply-To: <YrUb0oS/6/opb5hh@xz-m1.local>



> On Jun 23, 2022, at 7:05 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> On Fri, Jun 24, 2022 at 12:03:38AM +0000, Nadav Amit wrote:
>> My take is that hints are hints. Following David’s (or was it yours?)
>> feedback, I fixed the description to indicate that this is merely a hint and
>> removed all references to dirty/access bits. The kernel therefore can ignore
>> the hint when it wants to or use it in any other way. I fully agree that
>> this gives the kernel the ability to change the behavior as needed.
>> 
>> Note that for write-protected 4KB zero-page (where we share the zero-page)
>> we always set the access-bit, regardless of the hint, because it makes
>> sense: the zero-page is not swappable and therefore the access-bit is set.
> 
> The zero-page example makes sense, and yeah that makes the hugetlb behavior
> making more sense too.
> 
>> 
>> I think that the lesser user-facing documentation there is on how the
>> feature is *exactly* used by the kernel - is better from an API point of
>> view.
>> 
>> So I see no reason to fail or be forced not to set a page as young, just
>> because a hint was *not* provided. This would even be a regression in the
>> behavior. The hint is actually always respected right now, it is just that
>> even if you do not provide the hint, the access/dirty is set.
>> 
>> The only consistency I think worth thinking about is with the dirty-bit, and
>> I can add it if you want. Note that the access-bit (in x86) might be set
>> speculatively in contrast to the dirty-bit is only set atomically with a
>> real access. That’s the reason I think it may make sense not to set the
>> dirty without a hint.
> 
> Sorry to ask if this is (another) naive question: any link/help to explain
> the speculative behavior on access bit? Is it part of speculative
> execution (which, iiuc, would it be reverted if the speculation failed)?

Oh man, it is hard to find a reference. I made this claim it based on my
recollection (and logic).

The access-bit on Intel is set when the PTE is loaded into the TLB, so if you
allow speculative loading of the TLB, that’s what you get.

Googling shows Yu Zhao saying: "IIRC, there are also false positives, i.e.,
the accessed bit is set on entries used by speculative execution only.” [1]

Intel SDM says: "Whenever the processor uses a paging-structure entry as part
of linear-address translation, it sets the accessed flag in that entry...
Whenever there is a write to a linear address, the processor sets the dirty
flag (if it is not already set) in the paging- structure entry..."

You can argue that this indicates that the access-bit is updated
speculatively (translations can be speculative) and dirty-bit is on actual
write. But it is somewhat of a creative reading.

Googling further did not help much, but I found a relevant discussion on
RISC-V, in which they actually consider a similar behavior. [2]

If you want (and care), we can cc Dave Hansen to get a clear answer.

[1] https://lore.kernel.org/lkml/YE7Rk%2FYA1Uj7yFn2@google.com/
[2] https://lists.riscv.org/g/tech-virt-mem/topic/accessed_bit/77699883?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,1,80,77699883

> 
>> 
>> Is that acceptable? Access-bit always set, dirty-bit according to hint?
> 
> I'm still trying to digest what you said above, sorry.
> 
> Aren't both access and dirty bits need an atomic op to be set anyway? Then
> from perf pov should we simply keep setting them both too like what you did
> with this version? because it seems that'll always avoid an extra pgtable
> update access?

I guess by atomic-op you mean atomic-update by the hardware AD-assist.

I agree that if a page is written, the bits would need to be updated and
these would introduce an overhead. However, if the page cannot be written,
well, the dirty bit would never be set.

hugetlb_mcopy_atomic_pte() currently does the following:

        _dst_pte = huge_pte_mkdirty(_dst_pte);
        _dst_pte = pte_mkyoung(_dst_pte);

        if (wp_copy)
                _dst_pte = huge_pte_mkuffd_wp(_dst_pte);

Since you asked to update hugetlb_mcopy_atomic_pte(), I can offer three
options:

1. Do not set dirty if (wp_copy).
2. Do not set dirty if (wp_copy || !write_hint) 
3. Keep it as is.

I am fine with whatever would make you happy. :)


  reply	other threads:[~2022-06-24  2:42 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22 18:50 [PATCH v1 0/5] userfaultfd: support access/write hints Nadav Amit
2022-06-22 18:50 ` [PATCH v1 1/5] userfaultfd: introduce uffd_flags Nadav Amit
2022-06-23 21:57   ` Peter Xu
2022-06-23 22:04     ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations Nadav Amit
2022-06-23 23:24   ` Peter Xu
2022-06-23 23:35     ` Nadav Amit
2022-06-23 23:49       ` Peter Xu
2022-06-24  0:03         ` Nadav Amit
2022-06-24  2:05           ` Peter Xu
2022-06-24  2:42             ` Nadav Amit [this message]
2022-06-24 21:58               ` Peter Xu
2022-06-24 22:17                 ` Peter Xu
2022-06-25  7:49                   ` Nadav Amit
2022-06-27 13:12                     ` Peter Xu
2022-06-27 13:27                       ` David Hildenbrand
2022-06-27 14:59                         ` Peter Xu
2022-06-27 23:37                       ` Nadav Amit
2022-06-28 10:55                         ` David Hildenbrand
2022-06-28 19:15                         ` Peter Xu
2022-06-28 20:30                           ` Nadav Amit
2022-06-28 20:56                             ` Peter Xu
2022-06-28 21:03                               ` Nadav Amit
2022-06-28 21:12                                 ` Peter Xu
2022-06-28 21:15                                   ` Nadav Amit
2022-07-12  6:19   ` Nadav Amit
2022-07-12 14:56     ` Peter Xu
2022-07-13  1:09       ` Nadav Amit
2022-07-13 16:02         ` Peter Xu
2022-07-13 16:49           ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 3/5] userfaultfd: introduce write-likely mode for uffd operations Nadav Amit
2022-06-22 18:50 ` [PATCH v1 4/5] userfaultfd: zero access/write hints Nadav Amit
2022-06-23 23:34   ` Peter Xu
2022-06-22 18:50 ` [PATCH v1 5/5] selftest/userfaultfd: test read/write hints Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6EF7D3B4-CF17-407B-A50F-B14D595E99A5@vmware.com \
    --to=namit@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox