From: Nadav Amit <namit@vmware.com>
To: Peter Xu <peterx@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Tue, 12 Jul 2022 06:19:08 +0000 [thread overview]
Message-ID: <5D85870C-CBDF-45F7-A3A5-5F889521BE41@vmware.com> (raw)
In-Reply-To: <20220622185038.71740-3-namit@vmware.com>
On Jun 22, 2022, at 11:50 AM, Nadav Amit <nadav.amit@gmail.com> wrote:
> From: Nadav Amit <namit@vmware.com>
>
> Using a PTE on x86 with cleared access-bit (aka young-bit)
> takes ~600 cycles more than when the access bit is set. At the same
> time, setting the access-bit for memory that is not used (e.g.,
> prefetched) can introduce greater overheads, as the prefetched memory is
> reclaimed later than it should be.
>
> Userfaultfd currently does not set the access-bit (excluding the
> huge-pages case). Arguably, it is best to let the user control whether
> the access bit should be set or not. The expected use is to request
> userfaultfd to set the access-bit when the copy/wp operation is done to
> resolve a page-fault, and not to set the access-bit when the memory is
> prefetched.
>
> Introduce UFFDIO_[op]_ACCESS_LIKELY to enable userspace to request the
> young bit to be set.
I reply to my own email, but this mostly addresses the concerns that Peter
has raised.
So I ran the test below on my Haswell (x86), which showed two things:
1. Accessing an address using a clean PTE or old PTE takes ~500 cycles
more than with dirty+young (depending on the access, of course: dirty
does not matter for read, dirty+young both matter for write).
2. I made a mistake in my implementation. PTEs are - at least on x86 -
created as young with mk_pte(). So the logic should be similar to
do_set_pte():
if (prefault && arch_wants_old_prefaulted_pte())
entry = pte_mkold(entry);
else
entry = pte_sw_mkyoung(entry);
Based on these results, I will send another version for both young and
dirty. Let me know if these results are not convincing.
I will add, as we discussed (well, I think I raised these things, so
hopefully you agree):
1. On x86, avoid flush if changing WP->RO and PTE is clean.
2. When write-unprotecting entry, if PTE is exclusive, set it as writable.
[ I considered not setting it as writable if write-hint is not provided, but
with the change in (1), it does not provide any real value. ]
---
#define _GNU_SOURCE
#include <sys/types.h>
#include <stdio.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <stdbool.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)
static inline uint64_t rdtscp(void)
{
uint64_t rax, rdx;
uint32_t aux;
asm volatile ("rdtscp" : "=a" (rax), "=d" (rdx), "=c" (aux):: "memory");
}
int main(int argc, char *argv[])
{
long uffd; /* userfaultfd file descriptor */
char *addr; /* Start of region handled by userfaultfd */
unsigned long len; /* Length of region handled by userfaultfd */
pthread_t thr; /* ID of thread that handles page faults */
bool young, dirty, write;
struct uffdio_api uffdio_api;
struct uffdio_register uffdio_register;
int l;
static char *page = NULL;
struct uffdio_copy uffdio_copy;
ssize_t nread;
int page_size;
if (argc != 5) {
fprintf(stderr, "Usage: %s [num-pages] [write] [young] [dirty]\n", argv[0]);
exit(EXIT_FAILURE);
}
page_size = sysconf(_SC_PAGE_SIZE);
len = strtoul(argv[1], NULL, 0) * page_size;
write = !!strtoul(argv[2], NULL, 0);
young = !!strtoul(argv[3], NULL, 0);
dirty = !!strtoul(argv[4], NULL, 0);
page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (page == MAP_FAILED)
errExit("mmap");
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
if (uffd == -1)
errExit("userfaultfd");
uffdio_api.api = UFFD_API;
uffdio_api.features = (1<<11); //UFFD_FEATURE_EXACT_ADDRESS;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
errExit("ioctl-UFFDIO_API");
addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED)
errExit("mmap");
uffdio_register.range.start = (unsigned long) addr;
uffdio_register.range.len = len;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
errExit("ioctl-UFFDIO_REGISTER");
uffdio_copy.src = (unsigned long) page;
uffdio_copy.mode = 0;
if (young)
uffdio_copy.mode |= (1ul << 2);
if (dirty)
uffdio_copy.mode |= (1ul << 3);
uffdio_copy.len = page_size;
uffdio_copy.copy = 0;
for (l = 0; l < len; l += page_size) {
uffdio_copy.dst = (unsigned long)(&addr[l]);
if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
errExit("ioctl-UFFDIO_COPY");
}
for (l = 0; l < len; l += page_size) {
char c;
uint64_t start;
start = rdtscp();
if (write)
addr[l] = 5;
else
c = *(volatile char *)(&addr[l]);
printf("%ld\n", rdtscp() - start);
}
exit(EXIT_SUCCESS);
}
next prev parent reply other threads:[~2022-07-12 6:19 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-22 18:50 [PATCH v1 0/5] userfaultfd: support access/write hints Nadav Amit
2022-06-22 18:50 ` [PATCH v1 1/5] userfaultfd: introduce uffd_flags Nadav Amit
2022-06-23 21:57 ` Peter Xu
2022-06-23 22:04 ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations Nadav Amit
2022-06-23 23:24 ` Peter Xu
2022-06-23 23:35 ` Nadav Amit
2022-06-23 23:49 ` Peter Xu
2022-06-24 0:03 ` Nadav Amit
2022-06-24 2:05 ` Peter Xu
2022-06-24 2:42 ` Nadav Amit
2022-06-24 21:58 ` Peter Xu
2022-06-24 22:17 ` Peter Xu
2022-06-25 7:49 ` Nadav Amit
2022-06-27 13:12 ` Peter Xu
2022-06-27 13:27 ` David Hildenbrand
2022-06-27 14:59 ` Peter Xu
2022-06-27 23:37 ` Nadav Amit
2022-06-28 10:55 ` David Hildenbrand
2022-06-28 19:15 ` Peter Xu
2022-06-28 20:30 ` Nadav Amit
2022-06-28 20:56 ` Peter Xu
2022-06-28 21:03 ` Nadav Amit
2022-06-28 21:12 ` Peter Xu
2022-06-28 21:15 ` Nadav Amit
2022-07-12 6:19 ` Nadav Amit [this message]
2022-07-12 14:56 ` Peter Xu
2022-07-13 1:09 ` Nadav Amit
2022-07-13 16:02 ` Peter Xu
2022-07-13 16:49 ` Nadav Amit
2022-06-22 18:50 ` [PATCH v1 3/5] userfaultfd: introduce write-likely mode for uffd operations Nadav Amit
2022-06-22 18:50 ` [PATCH v1 4/5] userfaultfd: zero access/write hints Nadav Amit
2022-06-23 23:34 ` Peter Xu
2022-06-22 18:50 ` [PATCH v1 5/5] selftest/userfaultfd: test read/write hints Nadav Amit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5D85870C-CBDF-45F7-A3A5-5F889521BE41@vmware.com \
--to=namit@vmware.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=nadav.amit@gmail.com \
--cc=peterx@redhat.com \
--cc=rppt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox