linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ives van Hoorne <ives@codesandbox.io>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte
Date: Fri, 2 Dec 2022 13:07:02 +0100	[thread overview]
Message-ID: <222fc0b2-6ec0-98e7-833f-ea868b248446@redhat.com> (raw)
In-Reply-To: <fc3e3497-053d-8e50-a504-764317b6a49a@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3243 bytes --]

On 02.12.22 12:03, David Hildenbrand wrote:
> On 01.12.22 23:30, Andrew Morton wrote:
>> On Thu, 1 Dec 2022 16:42:52 +0100 David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 01.12.22 16:28, Peter Xu wrote:
>>>>
>>>> I didn't reply here because I have already replied with the question in
>>>> previous version with a few attempts.  Quotting myself:
>>>>
>>>> https://lore.kernel.org/all/Y3KgYeMTdTM0FN5W@x1n/
>>>>
>>>>            The thing is recovering the pte into its original form is the
>>>>            safest approach to me, so I think we need justification on why it's
>>>>            always safe to set the write bit.
>>>>
>>>> I've also got another longer email trying to explain why I think it's the
>>>> other way round to be justfied, rather than justifying removal of the write
>>>> bit for a read migration entry, here:
>>>>
>>>
>>> And I disagree for this patch that is supposed to fix this hunk:
>>>
>>>
>>> @@ -243,11 +243,15 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
>>>                    entry = pte_to_swp_entry(*pvmw.pte);
>>>                    if (is_write_migration_entry(entry))
>>>                            pte = maybe_mkwrite(pte, vma);
>>> +               else if (pte_swp_uffd_wp(*pvmw.pte))
>>> +                       pte = pte_mkuffd_wp(pte);
>>>     
>>>                    if (unlikely(is_zone_device_page(new))) {
>>>                            if (is_device_private_page(new)) {
>>>                                    entry = make_device_private_entry(new, pte_write(pte));
>>>                                    pte = swp_entry_to_pte(entry);
>>> +                               if (pte_swp_uffd_wp(*pvmw.pte))
>>> +                                       pte = pte_mkuffd_wp(pte);
>>>                            }
>>>                    }
>>
>> David, I'm unclear on what you mean by the above.  Can you please
>> expand?
>>
>>>
>>> There is really nothing to justify the other way around here.
>>> If it's broken fix it independently and properly backport it independenty.
>>>
>>> But we don't know about any such broken case.
>>>
>>> I have no energy to spare to argue further ;)
>>
>> This is a silent data loss bug, which is about as bad as it gets.
>> Under obscure conditions, fortunately.  But please let's keep working
>> it.  Let's aim for something minimal for backporting purposes.  We can
>> revisit any cleanliness issues later.
> 
> Okay, you activated my energy reserves.
> 
>>
>> David, do you feel that the proposed fix will at least address the bug
>> without adverse side-effects?
> 
> Usually, when I suspect something is dodgy I unconsciously push back
> harder than I usually would.
> 
> I just looked into the issue once again and realized that this patch
> here (and also my alternative proposal) most likely tackles the
> more-generic issue from the wrong direction. I found yet another such
> bug (most probably two, just too lazy to write another reproducer).
> Migration code does the right thing here -- IMHO -- and the issue should
> be fixed differently.
> 
> I'm testing an alternative patch right now and will share it later
> today, along with a reproducer.
> 

mprotect() reproducer attached.

-- 
Thanks,

David / dhildenb

[-- Attachment #2: uffd-wp-mprotect.c --]
[-- Type: text/x-csrc, Size: 3186 bytes --]

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/memfd.h>
#include <linux/userfaultfd.h>

size_t pagesize;
int uffd;

static void *uffd_thread_fn(void *arg)
{
	static struct uffd_msg msg;
	ssize_t nread;

	while (1) {
		struct pollfd pollfd;
		int nready;

		pollfd.fd = uffd;
		pollfd.events = POLLIN;
		nready = poll(&pollfd, 1, -1);
		if (nready == -1) {
			fprintf(stderr, "poll() failed: %d\n", errno);
			exit(1);
		}

		nread = read(uffd, &msg, sizeof(msg));
		if (nread <= 0)
			continue;

		if (msg.event != UFFD_EVENT_PAGEFAULT ||
		    !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
			printf("FAIL: wrong uffd-wp event fired\n");
			exit(1);
		}

		printf("PASS: uffd-wp fired\n");
		exit(0);
	}
}

static int setup_uffd(char *map)
{
	struct uffdio_api uffdio_api;
	struct uffdio_register uffdio_register;
	struct uffdio_range uffd_range;
	pthread_t thread;

	uffd = syscall(__NR_userfaultfd,
		       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
	if (uffd < 0) {
		fprintf(stderr, "syscall() failed: %d\n", errno);
		return -errno;
	}

	uffdio_api.api = UFFD_API;
	uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
		fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
		return -errno;
	}

	if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
		fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
		return -ENOSYS;
	}

	uffdio_register.range.start = (unsigned long) map;
	uffdio_register.range.len = pagesize;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
		fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
		return -errno;
	}

	pthread_create(&thread, NULL, uffd_thread_fn, NULL);

	return 0;
}

int main(int argc, char **argv)
{
	struct uffdio_writeprotect uffd_writeprotect;
	char *map;
	int fd;

	pagesize = getpagesize();
	fd = memfd_create("test", 0);
	if (fd < 0) {
		fprintf(stderr, "memfd_create() failed\n");
		return -errno;
	}
	if (ftruncate(fd, pagesize)) {
		fprintf(stderr, "ftruncate() failed\n");
		return -errno;
	}

	/* Start out without write protection. */
	map = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}

	if (setup_uffd(map))
		return 1;

	/* Populate a page ... */
	memset(map, 0, pagesize);

	/* ... and write-protect it using uffd-wp. */
	uffd_writeprotect.range.start = (unsigned long) map;
	uffd_writeprotect.range.len = pagesize;
	uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
		fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
		return -errno;
	}

	/* Write-protect the whole mapping temporarily. */
	mprotect(map, pagesize, PROT_READ);
	mprotect(map, pagesize, PROT_READ|PROT_WRITE);

	/* Test if uffd-wp fires. */
	memset(map, 1, pagesize);

	printf("FAIL: uffd-wp did not fire\n");
	return 1;
}

  reply	other threads:[~2022-12-02 12:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-14  0:04 [PATCH v3 0/2] mm/migrate: Fix writable pte for read migration entry Peter Xu
2022-11-14  0:04 ` [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte Peter Xu
2022-11-15 18:17   ` David Hildenbrand
2022-11-30 22:24     ` Andrew Morton
2022-12-01 15:28       ` Peter Xu
2022-12-01 15:42         ` David Hildenbrand
2022-12-01 22:30           ` Andrew Morton
2022-12-02 11:03             ` David Hildenbrand
2022-12-02 12:07               ` David Hildenbrand [this message]
2022-12-02 15:14                 ` Peter Xu
2022-12-02 15:40                   ` David Hildenbrand
2022-12-02 17:18             ` David Hildenbrand
2022-11-14  0:04 ` [PATCH v3 2/2] mm/uffd: Sanity check write bit for uffd-wp protected ptes Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=222fc0b2-6ec0-98e7-833f-ea868b248446@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=ives@codesandbox.io \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox