From: David Stevens <stevensd@chromium.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/khugepaged: skip shmem with armed userfaultfd
Date: Thu, 2 Feb 2023 18:30:38 +0900 [thread overview]
Message-ID: <CAD=HUj4=okVbwROVTCPb_WndA57WPK6hYE68un_M8miDbhCGNg@mail.gmail.com> (raw)
In-Reply-To: <20230201230943.fg2q6fmvu7gggxar@box.shutemov.name>
On Thu, Feb 2, 2023 at 8:09 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Wed, Feb 01, 2023 at 12:41:37PM +0900, David Stevens wrote:
> > From: David Stevens <stevensd@chromium.org>
> >
> > Collapsing memory in a vma that has an armed userfaultfd results in
> > zero-filling any missing pages, which breaks user-space paging for those
> > filled pages. Avoid khugepage bypassing userfaultfd by not collapsing
> > pages in shmem reached via scanning a vma with an armed userfaultfd if
> > doing so would zero-fill any pages.
>
> Could you elaborate on the failure? Will zero-filling the page prevent
> userfaultfd from catching future access?
Yes, zero-filling the page causes future major faults to be lost,
since it populates the pages in the backing shmem. The path for
anonymous memory in khugepaged does properly handle userfaultfd_armed,
but the path for shmem does not.
> A test-case would help a lot.
Here's a sample program that demonstrates the issue. On a v6.1 kernel,
no major fault is observed by the monitor thread. I used MADV_COLLAPSE
to exercise khugepaged_scan_file, but you would get the same effect by
replacing the madvise with a sleep and waiting for khugepaged to scan
the test process.
#define _GNU_SOURCE
#include <inttypes.h>
#include <stdio.h>
#include <linux/userfaultfd.h>
#include <threads.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <assert.h>
int had_fault;
int monitor_thread(void *arg) {
int ret, uffd = (int) (uintptr_t) arg;
struct uffd_msg msg;
ret = read(uffd, &msg, sizeof(msg));
assert(ret > 0);
assert(msg.event == UFFD_EVENT_PAGEFAULT);
had_fault = 1;
struct uffdio_zeropage zeropage;
zeropage.range.start = msg.arg.pagefault.address & ~0xfff;
zeropage.range.len = 4096;
zeropage.mode = 0;
ret = ioctl(uffd, UFFDIO_ZEROPAGE, &zeropage);
assert(ret >= 0);
}
int main() {
int ret;
int uffd = syscall(__NR_userfaultfd, 0);
assert(uffd >= 0);
struct uffdio_api uffdio_api;
uffdio_api.api = UFFD_API;
uffdio_api.features = UFFD_FEATURE_MISSING_SHMEM;
ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
assert(ret >= 0);
int memfd = memfd_create("memfd", MFD_CLOEXEC);
assert(memfd >= 0);
ret = ftruncate(memfd, 2 * 1024 * 1024);
assert(ret >= 0);
uint8_t *addr = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_WRITE,
MAP_SHARED, memfd, 0);
assert(addr != MAP_FAILED);
addr[0] = 0xff;
struct uffdio_register uffdio_register;
uffdio_register.range.start = (unsigned long) addr;
uffdio_register.range.len = 2 * 1024 * 1024;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
ret = ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);
assert(ret >= 0);
thrd_t t;
ret = thrd_create(&t, monitor_thread, (void *) (uintptr_t) uffd);
assert(ret >= 0);
ret = madvise(addr, 2 * 1024 * 1024, 25 /* MADV_COLLAPSE */);
printf("madvise ret %d\n", ret);
addr[4096] = 0xff;
printf("%s major fault\n", had_fault ? "had" : "no");
return 0;
}
> And what prevents the same pages be filled (with zeros or otherwise) via
> write(2) bypassing VMA checks? I cannot immediately see it.
There isn't any such check. You can bypass userfaultfd on a shmem with
write syscalls, or simply by writing to the shmem through a different
vma. However, it is definitely possible for userspace to use shmem
plus userfaultfd in a safe way. And the kernel shouldn't come along
and break a setup that should be safe from the perspective of
userspace.
> BTW, there's already a check that prevent establishing PMD in the place if
> VM_UFFD_WP is set.
>
> Maybe just an update of the check in retract_page_tables() from
> userfaultfd_wp() to userfaultfd_armed() would be enough?
It seems like it will be a little more complicated than that, because
the target VM having an armed userfaultfd is a hard error condition.
However, it might not be that difficult to modify collapse_file to
rollback if necessary. I'll consider this approach for v2 of this
patch.
-David
> I have very limited understanding of userfaultfd(). Sorry in advance for
> stupid questions.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
prev parent reply other threads:[~2023-02-02 9:30 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-01 3:41 David Stevens
2023-02-01 17:36 ` Yang Shi
2023-02-01 20:52 ` Peter Xu
2023-02-01 23:57 ` Yang Shi
2023-02-02 20:04 ` Peter Xu
2023-02-02 21:11 ` Yang Shi
2023-02-02 9:56 ` David Stevens
2023-02-02 17:40 ` Yang Shi
2023-02-02 20:22 ` Peter Xu
2023-02-03 6:09 ` David Stevens
2023-02-03 14:56 ` Peter Xu
2023-02-01 23:09 ` Kirill A. Shutemov
2023-02-02 9:30 ` David Stevens [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAD=HUj4=okVbwROVTCPb_WndA57WPK6hYE68un_M8miDbhCGNg@mail.gmail.com' \
--to=stevensd@chromium.org \
--cc=akpm@linux-foundation.org \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox