linux-mm.kvack.org archive mirror
From: Nadav Amit <namit@vmware.com>
To: Peter Xu <peterx@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	David Hildenbrand <david@redhat.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Tue, 12 Jul 2022 06:19:08 +0000
Message-ID: <5D85870C-CBDF-45F7-A3A5-5F889521BE41@vmware.com>
In-Reply-To: <20220622185038.71740-3-namit@vmware.com>

On Jun 22, 2022, at 11:50 AM, Nadav Amit <nadav.amit@gmail.com> wrote:

> From: Nadav Amit <namit@vmware.com>
> 
> Using a PTE on x86 with cleared access-bit (aka young-bit)
> takes ~600 cycles more than when the access bit is set. At the same
> time, setting the access-bit for memory that is not used (e.g.,
> prefetched) can introduce greater overheads, as the prefetched memory is
> reclaimed later than it should be.
> 
> Userfaultfd currently does not set the access-bit (excluding the
> huge-pages case). Arguably, it is best to let the user control whether
> the access bit should be set or not. The expected use is to request
> userfaultfd to set the access-bit when the copy/wp operation is done to
> resolve a page-fault, and not to set the access-bit when the memory is
> prefetched.
> 
> Introduce UFFDIO_[op]_ACCESS_LIKELY to enable userspace to request the
> young bit to be set.

I am replying to my own email, but this mostly addresses the concerns that
Peter raised.

So I ran the test below on my Haswell (x86), which showed two things:

1. Accessing an address using a clean PTE or old PTE takes ~500 cycles
more than with dirty+young (depending on the access, of course: dirty
does not matter for read, dirty+young both matter for write).

2. I made a mistake in my implementation. PTEs are - at least on x86 -
created as young with mk_pte(). So the logic should be similar to
do_set_pte():

        if (prefault && arch_wants_old_prefaulted_pte())
                entry = pte_mkold(entry);
        else
                entry = pte_sw_mkyoung(entry);
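
To illustrate the pattern outside the kernel, here is a minimal userspace
sketch. It is not kernel code: the toy_* names, the single-bit PTE layout,
and the mapping of an "access likely" hint onto the non-prefault branch are
all assumptions made for illustration. It only shows the shape of the
decision: start from a PTE that is created young, and clear the young bit
only when no access is expected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-bit layout for a toy PTE; real layouts are arch-specific. */
#define TOY_PTE_YOUNG (1u << 0)

typedef uint32_t toy_pte_t;

/* Models the observation above: a freshly made PTE starts out young. */
static toy_pte_t toy_mk_pte(void)
{
	return TOY_PTE_YOUNG;
}

static toy_pte_t toy_pte_mkold(toy_pte_t pte)
{
	return pte & ~TOY_PTE_YOUNG;
}

static toy_pte_t toy_pte_mkyoung(toy_pte_t pte)
{
	return pte | TOY_PTE_YOUNG;
}

static bool toy_pte_young(toy_pte_t pte)
{
	return (pte & TOY_PTE_YOUNG) != 0;
}

/*
 * Same shape as the do_set_pte() snippet: clear the young bit when the
 * page is installed without an expected access (assumed here to be the
 * "no hint" case), otherwise make sure it is set.
 */
static toy_pte_t toy_uffd_make_pte(bool access_likely)
{
	toy_pte_t pte = toy_mk_pte();

	if (!access_likely)
		pte = toy_pte_mkold(pte);
	else
		pte = toy_pte_mkyoung(pte);
	return pte;
}
```

With this model, toy_uffd_make_pte(false) yields an old PTE even though
toy_mk_pte() created it young, which is the case the original
implementation missed.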

Based on these results, I will send another version for both young and
dirty. Let me know if these results are not convincing.

I will add, as we discussed (well, I think I raised these things, so
hopefully you agree):

1. On x86, avoid flush if changing WP->RO and PTE is clean.

2. When write-unprotecting an entry, if the PTE is exclusive, set it as
writable. [ I considered not setting it as writable if the write-hint is not
provided, but with the change in (1), that does not provide any real value. ]

---

#define _GNU_SOURCE
#include <sys/types.h>
#include <stdio.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <stdbool.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
		       } while (0)

static inline uint64_t rdtscp(void)
{
	uint64_t rax, rdx;
	uint32_t aux;

	asm volatile ("rdtscp" : "=a" (rax), "=d" (rdx), "=c" (aux):: "memory");
	/* The timestamp is returned in EDX:EAX; combine the two halves. */
	return (rdx << 32) | rax;
}

int main(int argc, char *argv[])
{
	long uffd;          /* userfaultfd file descriptor */
	char *addr;         /* Start of region handled by userfaultfd */
	unsigned long len;  /* Length of region handled by userfaultfd */
	pthread_t thr;      /* ID of thread that handles page faults */
	bool young, dirty, write;
	struct uffdio_api uffdio_api;
	struct uffdio_register uffdio_register;
	unsigned long l;    /* Byte offset into the region; same type as len */
	static char *page = NULL;
	struct uffdio_copy uffdio_copy;
	ssize_t nread;
	int page_size;

	if (argc != 5) {
		fprintf(stderr, "Usage: %s [num-pages] [write] [young] [dirty]\n", argv[0]);
		exit(EXIT_FAILURE);
	}

	page_size = sysconf(_SC_PAGE_SIZE);
	len = strtoul(argv[1], NULL, 0) * page_size;
	write = !!strtoul(argv[2], NULL, 0);
	young = !!strtoul(argv[3], NULL, 0);
	dirty = !!strtoul(argv[4], NULL, 0);

	page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (page == MAP_FAILED)
		errExit("mmap");

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd == -1)
		errExit("userfaultfd");

	uffdio_api.api = UFFD_API;
	uffdio_api.features = (1<<11); //UFFD_FEATURE_EXACT_ADDRESS;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
		errExit("ioctl-UFFDIO_API");

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		errExit("mmap");

	uffdio_register.range.start = (unsigned long) addr;
	uffdio_register.range.len = len;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
		errExit("ioctl-UFFDIO_REGISTER");

	uffdio_copy.src = (unsigned long) page;
	uffdio_copy.mode = 0;
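	if (young)	/* access-likely hint bit, as proposed in this series */
		uffdio_copy.mode |= (1ul << 2);
	if (dirty)	/* write-likely hint bit, as proposed in this series */
		uffdio_copy.mode |= (1ul << 3);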

	uffdio_copy.len = page_size;
	uffdio_copy.copy = 0;

	for (l = 0; l < len; l += page_size) {
		uffdio_copy.dst = (unsigned long)(&addr[l]);
		if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
			errExit("ioctl-UFFDIO_COPY");
	}

	for (l = 0; l < len; l += page_size) {
		char c;
		uint64_t start;

		start = rdtscp();
		if (write)
			addr[l] = 5;
		else
			c = *(volatile char *)(&addr[l]);
		printf("%lu\n", (unsigned long)(rdtscp() - start));
	}

	exit(EXIT_SUCCESS);
}
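
The test prints one cycle delta per page, so comparing two runs means
comparing two lists of numbers. A small helper along the following lines
could summarize a run by its median, which is less sensitive than the mean
to outliers such as an interrupt landing inside a measurement. This is only
a sketch and not part of the test above; median_cycles() and cmp_u64() are
hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Sorts the samples in place and returns the median. For an even number
 * of samples the upper-middle element is returned. n must be non-zero.
 */
static uint64_t median_cycles(uint64_t *samples, size_t n)
{
	qsort(samples, n, sizeof(*samples), cmp_u64);
	return samples[n / 2];
}
```

The per-page deltas would be stored in an array instead of being printed
immediately, and median_cycles() called once after the measurement loop.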
