From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>, Peter Xu <peterx@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>, Linux-MM <linux-mm@kvack.org>
Subject: Re: Warning on mremapped uffd-wp memory
Date: Tue, 6 Aug 2024 17:58:37 +0100 [thread overview]
Message-ID: <01fb619b-2086-435b-90e5-79fd36f77da7@arm.com> (raw)
In-Reply-To: <520f4933-7164-4559-b6a9-8f28c1bff0d1@redhat.com>
On 06/08/2024 17:37, David Hildenbrand wrote:
> On 06.08.24 17:15, Ryan Roberts wrote:
>> Hi Peter, David,
>>
>> syzkaller has found an issue (at least on arm64, but I suspect it will be
>> visible on x86_64 too) that triggers the following warning:
>>
>> [ 2291.836518] ------------[ cut here ]------------
>> [ 2291.836528] WARNING: CPU: 3 PID: 9056 at mm/page_table_check.c:207
>> __page_table_check_ptes_set+0x22c/0x248
>> [ 2291.836541] Modules linked in:
>> [ 2291.836549] CPU: 3 UID: 1000 PID: 9056 Comm: bug Tainted: G
>> W 6.11.0-rc2-dirty #2
>> [ 2291.836554] Tainted: [W]=WARN
>> [ 2291.836557] Hardware name: linux,dummy-virt (DT)
>> [ 2291.836559] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 2291.836564] pc : __page_table_check_ptes_set+0x22c/0x248
>> [ 2291.836568] lr : ptep_modify_prot_commit+0x24c/0x2b0
>> [ 2291.836573] sp : ffff80008ca6ba20
>> [ 2291.836575] x29: ffff80008ca6ba20 x28: ffff186392d1eb00 x27: 0000000020ffd000
>> [ 2291.836598] x26: 0010000000000001 x25: 0000000000000001 x24: 0000000000000000
>> [ 2291.836605] x23: 04e800018c738f43 x22: 0000000000000001 x21: ffff1863824163c0
>> [ 2291.836612] x20: 04e800018c738f43 x19: 04e800018c738f43 x18: 0000fffff7f87fff
>> [ 2291.836619] x17: 0000000000000000 x16: 1fffe30c748d22a1 x15: 0060000000000fc3
>> [ 2291.836625] x14: 0000000000000000 x13: 0000000020ffd000 x12: 0000fffff7f87fff
>> [ 2291.836631] x11: 0000000020ffd000 x10: 0000000000000000 x9 : ffffbcab99e3ab84
>> [ 2291.836638] x8 : ffff186382b8f000 x7 : 0000000020ffe000 x6 : 0000000020ffd000
>> [ 2291.836644] x5 : ffff186392d1eb00 x4 : 04e800018c738f43 x3 : 0000000000000001
>> [ 2291.836650] x2 : 04e800018c738f43 x1 : ffff18639fe01fe8 x0 : ffffbcab9ce56780
>> [ 2291.836657] Call trace:
>> [ 2291.836659] __page_table_check_ptes_set+0x22c/0x248
>> [ 2291.836664] ptep_modify_prot_commit+0x24c/0x2b0
>> [ 2291.836667] change_protection+0x8a0/0x1100
>> [ 2291.836672] mprotect_fixup+0x124/0x2d0
>> [ 2291.836675] do_mprotect_pkey.constprop.0+0x29c/0x460
>> [ 2291.836679] __arm64_sys_mprotect+0x24/0xf8
>> [ 2291.836682] invoke_syscall+0x50/0x120
>> [ 2291.836690] el0_svc_common.constprop.0+0x48/0xf0
>> [ 2291.836694] do_el0_svc+0x24/0x38
>> [ 2291.836699] el0_svc+0x34/0xe0
>> [ 2291.836705] el0t_64_sync_handler+0x100/0x130
>> [ 2291.836709] el0t_64_sync+0x190/0x198
>> [ 2291.836713] ---[ end trace 0000000000000000 ]---
>>
>> The generated program (see below) mmaps a 16M region (RWX). It then mlocks all
>> current and future memory.
>>
>> Next, it registers 12K (3 pages) for use with UFFD-WP, and marks 4 pages
>> UFFD-WP'ed. This returns ENOENT because we only registered 3 pages, but those 3
>> pages are still UFFD-WP'ed in their PTE, so this error is not relavent to the
>> bug. At this point, there is a single VMA covering the 12K, with VM_UFFD_WP set,
>> amongst other flags:
>>
>> 20ffb000-20ffe000 rwxp 00000000 00:00 0
>> Size: 12 kB
>> KernelPageSize: 4 kB
>> MMUPageSize: 4 kB
>> Rss: 12 kB
>> Pss: 12 kB
>> Pss_Dirty: 12 kB
>> Shared_Clean: 0 kB
>> Shared_Dirty: 0 kB
>> Private_Clean: 0 kB
>> Private_Dirty: 12 kB
>> Referenced: 12 kB
>> Anonymous: 12 kB
>> KSM: 0 kB
>> LazyFree: 0 kB
>> AnonHugePages: 0 kB
>> ShmemPmdMapped: 0 kB
>> FilePmdMapped: 0 kB
>> Shared_Hugetlb: 0 kB
>> Private_Hugetlb: 0 kB
>> Swap: 0 kB
>> SwapPss: 0 kB
>> Locked: 12 kB
>> THPeligible: 0
>> VmFlags: rd wr ex mr mw me uw lo ac
>>
>> Next we mremap the first page to the address where the last page was previously
>> mapped, with MREMAP_DONTUNMAP. This leads to 2 VMAs, but the new one doesn't
>> have VM_UFFD_WP set (Note also that the original VMA no longer has VM_LOCKED
>> which seems wrong to me, but I'll ignore that for now):
>>
>> 20ffb000-20ffd000 rwxp 00000000 00:00 0
>> Size: 8 kB
>> KernelPageSize: 4 kB
>> MMUPageSize: 4 kB
>> Rss: 4 kB
>> Pss: 4 kB
>> Pss_Dirty: 4 kB
>> Shared_Clean: 0 kB
>> Shared_Dirty: 0 kB
>> Private_Clean: 0 kB
>> Private_Dirty: 4 kB
>> Referenced: 4 kB
>> Anonymous: 4 kB
>> KSM: 0 kB
>> LazyFree: 0 kB
>> AnonHugePages: 0 kB
>> ShmemPmdMapped: 0 kB
>> FilePmdMapped: 0 kB
>> Shared_Hugetlb: 0 kB
>> Private_Hugetlb: 0 kB
>> Swap: 0 kB
>> SwapPss: 0 kB
>> Locked: 0 kB
>> THPeligible: 0
>> VmFlags: rd wr ex mr mw me uw ac
>> 20ffd000-20ffe000 rwxp 00000000 00:00 0
>> Size: 4 kB
>> KernelPageSize: 4 kB
>> MMUPageSize: 4 kB
>> Rss: 4 kB
>> Pss: 4 kB
>> Pss_Dirty: 4 kB
>> Shared_Clean: 0 kB
>> Shared_Dirty: 0 kB
>> Private_Clean: 0 kB
>> Private_Dirty: 4 kB
>> Referenced: 4 kB
>> Anonymous: 4 kB
>> KSM: 0 kB
>> LazyFree: 0 kB
>> AnonHugePages: 0 kB
>> ShmemPmdMapped: 0 kB
>> FilePmdMapped: 0 kB
>> Shared_Hugetlb: 0 kB
>> Private_Hugetlb: 0 kB
>> Swap: 0 kB
>> SwapPss: 0 kB
>> Locked: 4 kB
>> THPeligible: 0
>> VmFlags: rd wr ex mr mw me lo ac
>>
>> Finally we try to mprotect that last 4K region to remove X, and we get the
>> warning saying the PTE has both the UFFD-WP and WRITE bits set.
>>
>> I'm guessing this is because the VM_UFFD_WP flag got spuriously dropped when
>> creating the final 4K VMA and so mprotect's can_change_pte_writable() check
>> incorrectly allowed the pte to be marked writable. But the mremap man page is
>> not very clear on the semantics when interacting with uffd regions; perhaps
>> uffd-wp bit should have been cleared when mremapping the ptes?
>>
>> I'm hoping you can advice on the expected semantics and we can figure out how to
>> solve this?
>>
>>
>> The reproducer is as follows (with a few annotations added by me):
>>
>> """
>> // autogenerated by syzkaller (https://github.com/google/syzkaller)
>>
>> #define _GNU_SOURCE
>>
>> #include <endian.h>
>> #include <stdint.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <sys/syscall.h>
>> #include <sys/types.h>
>> #include <unistd.h>
>>
>> #ifndef __NR_ioctl
>> #define __NR_ioctl 29
>> #endif
>> #ifndef __NR_mlockall
>> #define __NR_mlockall 230
>> #endif
>> #ifndef __NR_mmap
>> #define __NR_mmap 222
>> #endif
>> #ifndef __NR_mprotect
>> #define __NR_mprotect 226
>> #endif
>> #ifndef __NR_mremap
>> #define __NR_mremap 216
>> #endif
>> #ifndef __NR_userfaultfd
>> #define __NR_userfaultfd 282
>> #endif
>>
>> uint64_t r[1] = {0xffffffffffffffff};
>>
>> int main(void)
>> {
>> intptr_t res = 0;
>>
>> syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>> syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
>> /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/7ul,
>> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>> syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>>
>> write(1, "executing program\n", sizeof("executing program\n") - 1);
>>
>> // userfaultfd(UFFD_USER_MODE_ONLY) = 3
>> res = syscall(__NR_userfaultfd, /*flags=UFFD_USER_MODE_ONLY*/1ul);
>> if (res != -1)
>> r[0] = res;
>>
>> // ioctl(3, UFFDIO_API, {api=0xaa, features=0 =>
>> features=UFFD_FEATURE_PAGEFAULT_FLAG_WP|UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID|UFFD_FEATURE_MINOR_HUGETLBFS|UFFD_FEATURE_MINOR_SHMEM|0x1f800, ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0
>> *(uint64_t*)0x20000000 = 0xaa;
>> *(uint64_t*)0x20000008 = 0;
>> *(uint64_t*)0x20000010 = 0;
>> syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa3f, /*arg=*/0x20000000ul);
>>
>> syscall(__NR_mlockall, /*flags=MCL_FUTURE|MCL_CURRENT*/3ul);
>>
>> // ioctl(3, UFFDIO_REGISTER, {range={start=0x20ffb000, len=0x3000},
>> mode=UFFDIO_REGISTER_MODE_WP,
>> ioctls=1<<_UFFDIO_WAKE|1<<_UFFDIO_COPY|1<<_UFFDIO_ZEROPAGE|1<<_UFFDIO_WRITEPROTECT|0x120}) = 0
>> *(uint64_t*)0x20000180 = 0x20ffb000;
>> *(uint64_t*)0x20000188 = 0x3000;
>> *(uint64_t*)0x20000190 = 2;
>> *(uint64_t*)0x20000198 = 0;
>> syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa00, /*arg=*/0x20000180ul);
>>
>> // ioctl(3, UFFDIO_WRITEPROTECT, 0x20000080) = -1 ENOENT (No such file or
>> directory)
>> *(uint64_t*)0x20000080 = 0x20ffb000;
>> *(uint64_t*)0x20000088 = 0x4000;
>> *(uint64_t*)0x20000090 = 1;
>> syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa06, /*arg=*/0x20000080ul);
>>
>> syscall(__NR_mremap, /*addr=*/0x20ffb000ul, /*len=*/0x1000ul,
>> /*newlen=*/0x1000ul,
>> /*flags=MREMAP_DONTUNMAP|MREMAP_FIXED|MREMAP_MAYMOVE*/7ul,
>> /*newaddr=*/0x20ffd000ul);
>> syscall(__NR_mprotect, /*addr=*/0x20ffd000ul, /*len=*/0x1000ul,
>> /*prot=PROT_WRITE|PROT_READ*/3ul);
>>
>> return 0;
>> }
>> """
>>
>> I'd appreciate any thoughts you may have!
>
> Interesting. Either the vma flag shouldn't get dropped or we should un-mark the
> PTEs.
Yes, agreed. But which? I guess Peter is the expert here?
>
> Is the vma flag maybe getting dropped because of some weird interaction with
> UFFD_EVENT_REMAP?
>
next prev parent reply other threads:[~2024-08-06 16:58 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-06 15:15 Ryan Roberts
2024-08-06 16:37 ` David Hildenbrand
2024-08-06 16:58 ` Ryan Roberts [this message]
[not found] ` <ZrKHtOJ7wxk0V9Pl@x1n>
[not found] ` <7e52ca0b-39df-4979-8b16-9880e5a7149c@redhat.com>
[not found] ` <97c7b531-daeb-468f-af2a-31980f6f6a84@redhat.com>
[not found] ` <ZrPELKDCMl6MXupy@x1n>
[not found] ` <29404449-fcbe-4d54-85ce-44da0b202243@arm.com>
2024-10-01 14:27 ` Ryan Roberts
2024-10-01 15:10 ` Peter Xu
2024-10-01 15:31 ` Ryan Roberts
2024-10-01 15:42 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=01fb619b-2086-435b-90e5-79fd36f77da7@arm.com \
--to=ryan.roberts@arm.com \
--cc=david@redhat.com \
--cc=linux-mm@kvack.org \
--cc=mark.rutland@arm.com \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox