From: James Houghton <jthoughton@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>,
Peter Xu <peterx@redhat.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hugetlb: unshare some PMDs when splitting VMAs
Date: Wed, 4 Jan 2023 19:10:11 +0000
Message-ID: <CADrL8HV73m0nVJOK3uv4sbyGKOVZhVxSv2+i4pUV7tozu6vW5Q@mail.gmail.com>
In-Reply-To: <Y7Sq+Rs9cpSaHZSk@monkey>
[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]
> > I'll see if I can confirm that this is indeed possible and send a
> > repro if it is.
>
> I think your analysis above is correct. The key being the failure to unshare
> in the non-PUD_SIZE vma after the split.
I do indeed hit the WARN_ON_ONCE (repro attached), and the MADV wasn't
even needed (the UFFDIO_REGISTER does the VMA split before "unsharing
all PMDs"). With the fix, we avoid the WARN_ON_ONCE, but the behavior
is still incorrect: I expect the address range to be write-protected,
but it isn't.
The reason is that hugetlb_change_protection uses huge_pte_offset,
even when it's called for a UFFDIO_WRITEPROTECT with
UFFDIO_WRITEPROTECT_MODE_WP. In that particular case, I'm pretty sure
we should be using huge_pte_alloc, but even so, it's not trivial to
get an allocation failure back up to userspace. The non-hugetlb
implementation of UFFDIO_WRITEPROTECT seems to have the same problem.
Peter, what do you think?
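To illustrate the lookup-vs-allocate distinction above, here is a toy
userspace model. It is not kernel code: the two-level table and every
toy_* name are hypothetical, and toy_pte_offset / toy_pte_alloc are only
loose analogues of huge_pte_offset / huge_pte_alloc. The sketch assumes
that a lookup-only walk returns NULL when the lower-level table is
absent, so a write-protect pass built on it silently skips that range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy two-level table. Hypothetical names, NOT the kernel's structures. */
#define TOY_ENTRIES 4

struct toy_pte { bool wp_marker; };
struct toy_pmd { struct toy_pte pte[TOY_ENTRIES]; };
struct toy_pud { struct toy_pmd *pmd[TOY_ENTRIES]; };

/* Lookup only: NULL if the lower-level table does not exist. */
struct toy_pte *toy_pte_offset(struct toy_pud *pud, size_t idx)
{
	assert(idx < TOY_ENTRIES * TOY_ENTRIES);
	struct toy_pmd *pmd = pud->pmd[idx / TOY_ENTRIES];

	return pmd ? &pmd->pte[idx % TOY_ENTRIES] : NULL;
}

/* Lookup, allocating the lower-level table if missing. NULL on failure. */
struct toy_pte *toy_pte_alloc(struct toy_pud *pud, size_t idx)
{
	assert(idx < TOY_ENTRIES * TOY_ENTRIES);
	struct toy_pmd **slot = &pud->pmd[idx / TOY_ENTRIES];

	if (!*slot)
		*slot = calloc(1, sizeof(**slot));
	return *slot ? &(*slot)->pte[idx % TOY_ENTRIES] : NULL;
}

/*
 * Mark [start, end) write-protected. Returns the number of entries
 * marked, or -1 if use_alloc is set and an allocation failed.
 */
long toy_wp_range(struct toy_pud *pud, size_t start, size_t end,
		  bool use_alloc)
{
	long marked = 0;

	for (size_t i = start; i < end; i++) {
		struct toy_pte *pte = use_alloc ? toy_pte_alloc(pud, i)
						: toy_pte_offset(pud, i);

		if (!pte) {
			if (use_alloc)
				return -1; /* failure has to be reported */
			continue;          /* lookup-only: silently skipped */
		}
		pte->wp_marker = true;
		marked++;
	}
	return marked;
}
```

On an empty table, toy_wp_range(&pud, 0, 4, false) marks nothing and
still looks like success to the caller, while the allocating variant
marks all four entries but gains an error path (-1) that the caller
then has to propagate. That extra error path is the toy counterpart of
the "allocation failure back up to userspace" difficulty.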
>
> To me, the fact it was somewhat difficult to come up with this scenario is an
> argument that we should just unshare at split time as you propose. Who
> knows what other issues may exist.
>
> > 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes") is the
> > commit that introduced the WARN_ON_ONCE; perhaps it's a good choice
> > for a Fixes: tag (if above is indeed true).
>
> If the key issue in your above scenario is indeed the failure of
> hugetlb_unshare_all_pmds in the non-PUD_SIZE vma, then perhaps we tag?
>
> 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when
> register wp")
SGTM. Thanks Mike.
[-- Attachment #2: pmd-share-repro.c --]
[-- Type: text/x-csrc, Size: 2302 bytes --]
#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <linux/memfd.h>
#include <linux/mman.h>
#include <sys/mman.h>
#include <linux/userfaultfd.h>
#include <linux/errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>

#define PAGES_PER_GIG 512
#define HUGE_PAGE_SIZE (2UL << 20)
#define GIG_MASK (~(PAGES_PER_GIG * HUGE_PAGE_SIZE - 1))

/* Write one byte every 4096 bytes to fault in the whole range. */
void fault_in_write(char *mapping, size_t len)
{
	volatile char *mapping_v = mapping;

	for (size_t i = 0; i < len; i += 4096)
		mapping_v[i] = 1;
}

int main()
{
	int fd = memfd_create("test", MFD_HUGETLB);
	size_t len = 2 * PAGES_PER_GIG * HUGE_PAGE_SIZE;
	char *mapping;
	char *mapping_to_use;
	int uffd;
	int pid;
	struct uffdio_api uffdio_api;
	struct uffdio_register uffdio_register;
	struct uffdio_writeprotect uffdio_writeprotect;

	if (ftruncate(fd, len)) {
		perror("ftruncate failed");
		return -1;
	}

	mapping = mmap(NULL, len, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
	if (mapping == MAP_FAILED) {
		perror("mmap failed");
		return -1;
	}

	/*
	 * Use a 1G-aligned start address inside the mapping: if the mapping
	 * itself is not 1G-aligned, round down and add 1G (len/2).
	 */
	mapping_to_use = mapping;
	if ((unsigned long)mapping & ~GIG_MASK) {
		mapping_to_use = (char *)((unsigned long)mapping & GIG_MASK) + (len/2);
	}

	pid = fork();
	if (pid < 0) {
		perror("fork failed");
		return -1;
	}

	/* Both parent and child fault in the whole shared mapping. */
	fault_in_write(mapping, len);

	if (pid > 0) {
		/* Parent: wait for the child and pass its exit status along. */
		int status;

		waitpid(pid, &status, 0);
		return WEXITSTATUS(status);
	}

	/* Child: the rest of the repro runs here. */
	uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
	if (uffd < 0) {
		perror("userfaultfd failed");
		return -1;
	}

	uffdio_api.api = UFFD_API;
	uffdio_api.features = UFFD_FEATURE_SIGBUS;
	uffdio_api.ioctls = 0;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api)) {
		perror("UFFDIO_API failed");
		return -1;
	}

	/* Register only len/4 (512M), which forces a VMA split. */
	uffdio_register.range.start = (unsigned long)mapping_to_use;
	uffdio_register.range.len = len/4;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
		perror("UFFDIO_REGISTER failed");
		return -1;
	}

	uffdio_writeprotect.range.start = (unsigned long)mapping_to_use;
	uffdio_writeprotect.range.len = len/4;
	uffdio_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffdio_writeprotect)) {
		perror("UFFDIO_WRITEPROTECT failed");
		return -1;
	}

	/* If correct: should SIGBUS */
	fault_in_write(mapping_to_use, 1);
	printf("BUG: no sigbus\n");
	return 0;
}