From: David Hildenbrand <david@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Nadav Amit <nadav.amit@gmail.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Ives van Hoorne <ives@codesandbox.io>,
Axel Rasmussen <axelrasmussen@google.com>,
Alistair Popple <apopple@nvidia.com>,
stable@vger.kernel.org
Subject: Re: [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte
Date: Fri, 2 Dec 2022 13:07:02 +0100 [thread overview]
Message-ID: <222fc0b2-6ec0-98e7-833f-ea868b248446@redhat.com> (raw)
In-Reply-To: <fc3e3497-053d-8e50-a504-764317b6a49a@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3243 bytes --]
On 02.12.22 12:03, David Hildenbrand wrote:
> On 01.12.22 23:30, Andrew Morton wrote:
>> On Thu, 1 Dec 2022 16:42:52 +0100 David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 01.12.22 16:28, Peter Xu wrote:
>>>>
>>>> I didn't reply here because I have already replied with the question in
>>>> previous version with a few attempts. Quotting myself:
>>>>
>>>> https://lore.kernel.org/all/Y3KgYeMTdTM0FN5W@x1n/
>>>>
>>>> The thing is recovering the pte into its original form is the
>>>> safest approach to me, so I think we need justification on why it's
>>>> always safe to set the write bit.
>>>>
>>>> I've also got another longer email trying to explain why I think it's the
>>>> other way round to be justfied, rather than justifying removal of the write
>>>> bit for a read migration entry, here:
>>>>
>>>
>>> And I disagree for this patch that is supposed to fix this hunk:
>>>
>>>
>>> @@ -243,11 +243,15 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
>>> entry = pte_to_swp_entry(*pvmw.pte);
>>> if (is_write_migration_entry(entry))
>>> pte = maybe_mkwrite(pte, vma);
>>> + else if (pte_swp_uffd_wp(*pvmw.pte))
>>> + pte = pte_mkuffd_wp(pte);
>>>
>>> if (unlikely(is_zone_device_page(new))) {
>>> if (is_device_private_page(new)) {
>>> entry = make_device_private_entry(new, pte_write(pte));
>>> pte = swp_entry_to_pte(entry);
>>> + if (pte_swp_uffd_wp(*pvmw.pte))
>>> + pte = pte_mkuffd_wp(pte);
>>> }
>>> }
>>
>> David, I'm unclear on what you mean by the above. Can you please
>> expand?
>>
>>>
>>> There is really nothing to justify the other way around here.
>>> If it's broken fix it independently and properly backport it independenty.
>>>
>>> But we don't know about any such broken case.
>>>
>>> I have no energy to spare to argue further ;)
>>
>> This is a silent data loss bug, which is about as bad as it gets.
>> Under obscure conditions, fortunately. But please let's keep working
>> it. Let's aim for something minimal for backporting purposes. We can
>> revisit any cleanliness issues later.
>
> Okay, you activated my energy reserves.
>
>>
>> David, do you feel that the proposed fix will at least address the bug
>> without adverse side-effects?
>
> Usually, when I suspect something is dodgy I unconsciously push back
> harder than I usually would.
>
> I just looked into the issue once again and realized that this patch
> here (and also my alternative proposal) most likely tackles the
> more-generic issue from the wrong direction. I found yet another such
> bug (most probably two, just too lazy to write another reproducer).
> Migration code does the right thing here -- IMHO -- and the issue should
> be fixed differently.
>
> I'm testing an alternative patch right now and will share it later
> today, along with a reproducer.
>
mprotect() reproducer attached.
--
Thanks,
David / dhildenb
[-- Attachment #2: uffd-wp-mprotect.c --]
[-- Type: text/x-csrc, Size: 3186 bytes --]
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/memfd.h>
#include <linux/userfaultfd.h>
size_t pagesize;
int uffd;
static void *uffd_thread_fn(void *arg)
{
static struct uffd_msg msg;
ssize_t nread;
while (1) {
struct pollfd pollfd;
int nready;
pollfd.fd = uffd;
pollfd.events = POLLIN;
nready = poll(&pollfd, 1, -1);
if (nready == -1) {
fprintf(stderr, "poll() failed: %d\n", errno);
exit(1);
}
nread = read(uffd, &msg, sizeof(msg));
if (nread <= 0)
continue;
if (msg.event != UFFD_EVENT_PAGEFAULT ||
!(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
printf("FAIL: wrong uffd-wp event fired\n");
exit(1);
}
printf("PASS: uffd-wp fired\n");
exit(0);
}
}
static int setup_uffd(char *map)
{
struct uffdio_api uffdio_api;
struct uffdio_register uffdio_register;
struct uffdio_range uffd_range;
pthread_t thread;
uffd = syscall(__NR_userfaultfd,
O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
if (uffd < 0) {
fprintf(stderr, "syscall() failed: %d\n", errno);
return -errno;
}
uffdio_api.api = UFFD_API;
uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
return -errno;
}
if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
return -ENOSYS;
}
uffdio_register.range.start = (unsigned long) map;
uffdio_register.range.len = pagesize;
uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
return -errno;
}
pthread_create(&thread, NULL, uffd_thread_fn, NULL);
return 0;
}
int main(int argc, char **argv)
{
struct uffdio_writeprotect uffd_writeprotect;
char *map;
int fd;
pagesize = getpagesize();
fd = memfd_create("test", 0);
if (fd < 0) {
fprintf(stderr, "memfd_create() failed\n");
return -errno;
}
if (ftruncate(fd, pagesize)) {
fprintf(stderr, "ftruncate() failed\n");
return -errno;
}
/* Start out without write protection. */
map = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -errno;
}
if (setup_uffd(map))
return 1;
/* Populate a page ... */
memset(map, 0, pagesize);
/* ... and write-protect it using uffd-wp. */
uffd_writeprotect.range.start = (unsigned long) map;
uffd_writeprotect.range.len = pagesize;
uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
return -errno;
}
/* Write-protect the whole mapping temporarily. */
mprotect(map, pagesize, PROT_READ);
mprotect(map, pagesize, PROT_READ|PROT_WRITE);
/* Test if uffd-wp fires. */
memset(map, 1, pagesize);
printf("FAIL: uffd-wp did not fire\n");
return 1;
}
next prev parent reply other threads:[~2022-12-02 12:07 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-14 0:04 [PATCH v3 0/2] mm/migrate: Fix writable pte for read migration entry Peter Xu
2022-11-14 0:04 ` [PATCH v3 1/2] mm/migrate: Fix read-only page got writable when recover pte Peter Xu
2022-11-15 18:17 ` David Hildenbrand
2022-11-30 22:24 ` Andrew Morton
2022-12-01 15:28 ` Peter Xu
2022-12-01 15:42 ` David Hildenbrand
2022-12-01 22:30 ` Andrew Morton
2022-12-02 11:03 ` David Hildenbrand
2022-12-02 12:07 ` David Hildenbrand [this message]
2022-12-02 15:14 ` Peter Xu
2022-12-02 15:40 ` David Hildenbrand
2022-12-02 17:18 ` David Hildenbrand
2022-11-14 0:04 ` [PATCH v3 2/2] mm/uffd: Sanity check write bit for uffd-wp protected ptes Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=222fc0b2-6ec0-98e7-833f-ea868b248446@redhat.com \
--to=david@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=axelrasmussen@google.com \
--cc=ives@codesandbox.io \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nadav.amit@gmail.com \
--cc=peterx@redhat.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox