From: Ryan Roberts
To: David Hildenbrand, Peter Xu
Cc: Mark Rutland, Linux-MM
Subject: Re: Warning on mremapped uffd-wp memory
Date: Tue, 6 Aug 2024 17:58:37 +0100
Message-ID: <01fb619b-2086-435b-90e5-79fd36f77da7@arm.com>
In-Reply-To: <520f4933-7164-4559-b6a9-8f28c1bff0d1@redhat.com>
References: <810b44a8-d2ae-4107-b665-5a42eae2d948@arm.com> <520f4933-7164-4559-b6a9-8f28c1bff0d1@redhat.com>

On 06/08/2024 17:37, David Hildenbrand wrote:
> On 06.08.24 17:15, Ryan Roberts wrote:
>> Hi Peter, David,
>>
>> syzkaller has found an issue (at least on arm64, but I suspect it will be
>> visible on x86_64 too) that triggers the following warning:
>>
>> [ 2291.836518] ------------[ cut here ]------------
>> [ 2291.836528] WARNING: CPU: 3 PID: 9056 at mm/page_table_check.c:207 __page_table_check_ptes_set+0x22c/0x248
>> [ 2291.836541] Modules linked in:
>> [ 2291.836549] CPU: 3 UID: 1000 PID: 9056 Comm: bug Tainted: G        W          6.11.0-rc2-dirty #2
>> [ 2291.836554] Tainted: [W]=WARN
>> [ 2291.836557] Hardware name: linux,dummy-virt (DT)
>> [ 2291.836559] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 2291.836564] pc : __page_table_check_ptes_set+0x22c/0x248
>> [ 2291.836568] lr : ptep_modify_prot_commit+0x24c/0x2b0
>> [ 2291.836573] sp : ffff80008ca6ba20
>> [ 2291.836575] x29: ffff80008ca6ba20 x28: ffff186392d1eb00 x27: 0000000020ffd000
>> [ 2291.836598] x26: 0010000000000001 x25: 0000000000000001 x24: 0000000000000000
>> [ 2291.836605] x23: 04e800018c738f43 x22: 0000000000000001 x21: ffff1863824163c0
>> [ 2291.836612] x20: 04e800018c738f43 x19: 04e800018c738f43 x18: 0000fffff7f87fff
>> [ 2291.836619] x17: 0000000000000000 x16: 1fffe30c748d22a1 x15: 0060000000000fc3
>> [ 2291.836625] x14: 0000000000000000 x13: 0000000020ffd000 x12: 0000fffff7f87fff
>> [ 2291.836631] x11: 0000000020ffd000 x10: 0000000000000000 x9 : ffffbcab99e3ab84
>> [ 2291.836638] x8 : ffff186382b8f000 x7 : 0000000020ffe000 x6 : 0000000020ffd000
>> [ 2291.836644] x5 : ffff186392d1eb00 x4 : 04e800018c738f43 x3 : 0000000000000001
>> [ 2291.836650] x2 : 04e800018c738f43 x1 : ffff18639fe01fe8 x0 : ffffbcab9ce56780
>> [ 2291.836657] Call trace:
>> [ 2291.836659]  __page_table_check_ptes_set+0x22c/0x248
>> [ 2291.836664]  ptep_modify_prot_commit+0x24c/0x2b0
>> [ 2291.836667]  change_protection+0x8a0/0x1100
>> [ 2291.836672]  mprotect_fixup+0x124/0x2d0
>> [ 2291.836675]  do_mprotect_pkey.constprop.0+0x29c/0x460
>> [ 2291.836679]  __arm64_sys_mprotect+0x24/0xf8
>> [ 2291.836682]  invoke_syscall+0x50/0x120
>> [ 2291.836690]  el0_svc_common.constprop.0+0x48/0xf0
>> [ 2291.836694]  do_el0_svc+0x24/0x38
>> [ 2291.836699]  el0_svc+0x34/0xe0
>> [ 2291.836705]  el0t_64_sync_handler+0x100/0x130
>> [ 2291.836709]  el0t_64_sync+0x190/0x198
>> [ 2291.836713] ---[ end trace 0000000000000000 ]---
>>
>> The generated program (see below) mmaps a 16M region (RWX). It then mlocks all
>> current and future memory.
>>
>> Next, it registers 12K (3 pages) for use with UFFD-WP, and marks 4 pages
>> UFFD-WP'ed. This returns ENOENT because we only registered 3 pages, but those 3
>> pages are still UFFD-WP'ed in their PTE, so this error is not relevant to the
>> bug. At this point, there is a single VMA covering the 12K, with VM_UFFD_WP set,
>> amongst other flags:
>>
>>    20ffb000-20ffe000 rwxp 00000000 00:00 0
>>    Size:                 12 kB
>>    KernelPageSize:        4 kB
>>    MMUPageSize:           4 kB
>>    Rss:                  12 kB
>>    Pss:                  12 kB
>>    Pss_Dirty:            12 kB
>>    Shared_Clean:          0 kB
>>    Shared_Dirty:          0 kB
>>    Private_Clean:         0 kB
>>    Private_Dirty:        12 kB
>>    Referenced:           12 kB
>>    Anonymous:            12 kB
>>    KSM:                   0 kB
>>    LazyFree:              0 kB
>>    AnonHugePages:         0 kB
>>    ShmemPmdMapped:        0 kB
>>    FilePmdMapped:         0 kB
>>    Shared_Hugetlb:        0 kB
>>    Private_Hugetlb:       0 kB
>>    Swap:                  0 kB
>>    SwapPss:               0 kB
>>    Locked:               12 kB
>>    THPeligible:           0
>>    VmFlags: rd wr ex mr mw me uw lo ac
>>
>> Next we mremap the first page to the address where the last page was previously
>> mapped, with MREMAP_DONTUNMAP.
>> This leads to 2 VMAs, but the new one doesn't
>> have VM_UFFD_WP set (Note also that the original VMA no longer has VM_LOCKED
>> which seems wrong to me, but I'll ignore that for now):
>>
>>    20ffb000-20ffd000 rwxp 00000000 00:00 0
>>    Size:                  8 kB
>>    KernelPageSize:        4 kB
>>    MMUPageSize:           4 kB
>>    Rss:                   4 kB
>>    Pss:                   4 kB
>>    Pss_Dirty:             4 kB
>>    Shared_Clean:          0 kB
>>    Shared_Dirty:          0 kB
>>    Private_Clean:         0 kB
>>    Private_Dirty:         4 kB
>>    Referenced:            4 kB
>>    Anonymous:             4 kB
>>    KSM:                   0 kB
>>    LazyFree:              0 kB
>>    AnonHugePages:         0 kB
>>    ShmemPmdMapped:        0 kB
>>    FilePmdMapped:         0 kB
>>    Shared_Hugetlb:        0 kB
>>    Private_Hugetlb:       0 kB
>>    Swap:                  0 kB
>>    SwapPss:               0 kB
>>    Locked:                0 kB
>>    THPeligible:           0
>>    VmFlags: rd wr ex mr mw me uw ac
>>    20ffd000-20ffe000 rwxp 00000000 00:00 0
>>    Size:                  4 kB
>>    KernelPageSize:        4 kB
>>    MMUPageSize:           4 kB
>>    Rss:                   4 kB
>>    Pss:                   4 kB
>>    Pss_Dirty:             4 kB
>>    Shared_Clean:          0 kB
>>    Shared_Dirty:          0 kB
>>    Private_Clean:         0 kB
>>    Private_Dirty:         4 kB
>>    Referenced:            4 kB
>>    Anonymous:             4 kB
>>    KSM:                   0 kB
>>    LazyFree:              0 kB
>>    AnonHugePages:         0 kB
>>    ShmemPmdMapped:        0 kB
>>    FilePmdMapped:         0 kB
>>    Shared_Hugetlb:        0 kB
>>    Private_Hugetlb:       0 kB
>>    Swap:                  0 kB
>>    SwapPss:               0 kB
>>    Locked:                4 kB
>>    THPeligible:           0
>>    VmFlags: rd wr ex mr mw me lo ac
>>
>> Finally we try to mprotect that last 4K region to remove X, and we get the
>> warning saying the PTE has both the UFFD-WP and WRITE bits set.
>>
>> I'm guessing this is because the VM_UFFD_WP flag got spuriously dropped when
>> creating the final 4K VMA and so mprotect's can_change_pte_writable() check
>> incorrectly allowed the pte to be marked writable. But the mremap man page is
>> not very clear on the semantics when interacting with uffd regions; perhaps
>> the uffd-wp bit should have been cleared when mremapping the ptes?
>>
>> I'm hoping you can advise on the expected semantics and we can figure out how to
>> solve this?
>>
>>
>> The reproducer is as follows (with a few annotations added by me):
>>
>> """
>> // autogenerated by syzkaller (https://github.com/google/syzkaller)
>>
>> #define _GNU_SOURCE
>>
>> #include <endian.h>
>> #include <stdint.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <sys/syscall.h>
>> #include <sys/types.h>
>> #include <unistd.h>
>>
>> #ifndef __NR_ioctl
>> #define __NR_ioctl 29
>> #endif
>> #ifndef __NR_mlockall
>> #define __NR_mlockall 230
>> #endif
>> #ifndef __NR_mmap
>> #define __NR_mmap 222
>> #endif
>> #ifndef __NR_mprotect
>> #define __NR_mprotect 226
>> #endif
>> #ifndef __NR_mremap
>> #define __NR_mremap 216
>> #endif
>> #ifndef __NR_userfaultfd
>> #define __NR_userfaultfd 282
>> #endif
>>
>> uint64_t r[1] = {0xffffffffffffffff};
>>
>> int main(void)
>> {
>>     intptr_t res = 0;
>>
>>     syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>>             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>>     syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
>>             /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/7ul,
>>             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>>     syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>>             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>>
>>     write(1, "executing program\n", sizeof("executing program\n") - 1);
>>
>>     // userfaultfd(UFFD_USER_MODE_ONLY)        = 3
>>     res = syscall(__NR_userfaultfd, /*flags=UFFD_USER_MODE_ONLY*/1ul);
>>     if (res != -1)
>>         r[0] = res;
>>
>>     // ioctl(3, UFFDIO_API, {api=0xaa, features=0 =>
>>     // features=UFFD_FEATURE_PAGEFAULT_FLAG_WP|UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_EVENT_REMAP|UFFD_FEATURE_EVENT_REMOVE|UFFD_FEATURE_MISSING_HUGETLBFS|UFFD_FEATURE_MISSING_SHMEM|UFFD_FEATURE_EVENT_UNMAP|UFFD_FEATURE_SIGBUS|UFFD_FEATURE_THREAD_ID|UFFD_FEATURE_MINOR_HUGETLBFS|UFFD_FEATURE_MINOR_SHMEM|0x1f800, ioctls=1<<_UFFDIO_REGISTER|1<<_UFFDIO_UNREGISTER|1<<_UFFDIO_API}) = 0
>>     *(uint64_t*)0x20000000 = 0xaa;
>>     *(uint64_t*)0x20000008 = 0;
>>     *(uint64_t*)0x20000010 = 0;
>>     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa3f, /*arg=*/0x20000000ul);
>>
>>     syscall(__NR_mlockall, /*flags=MCL_FUTURE|MCL_CURRENT*/3ul);
>>
>>     // ioctl(3, UFFDIO_REGISTER, {range={start=0x20ffb000, len=0x3000},
>>     // mode=UFFDIO_REGISTER_MODE_WP,
>>     // ioctls=1<<_UFFDIO_WAKE|1<<_UFFDIO_COPY|1<<_UFFDIO_ZEROPAGE|1<<_UFFDIO_WRITEPROTECT|0x120}) = 0
>>     *(uint64_t*)0x20000180 = 0x20ffb000;
>>     *(uint64_t*)0x20000188 = 0x3000;
>>     *(uint64_t*)0x20000190 = 2;
>>     *(uint64_t*)0x20000198 = 0;
>>     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa00, /*arg=*/0x20000180ul);
>>
>>     // ioctl(3, UFFDIO_WRITEPROTECT, 0x20000080) = -1 ENOENT (No such file or
>>     // directory)
>>     *(uint64_t*)0x20000080 = 0x20ffb000;
>>     *(uint64_t*)0x20000088 = 0x4000;
>>     *(uint64_t*)0x20000090 = 1;
>>     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa06, /*arg=*/0x20000080ul);
>>
>>     syscall(__NR_mremap, /*addr=*/0x20ffb000ul, /*len=*/0x1000ul,
>>             /*newlen=*/0x1000ul,
>>             /*flags=MREMAP_DONTUNMAP|MREMAP_FIXED|MREMAP_MAYMOVE*/7ul,
>>             /*newaddr=*/0x20ffd000ul);
>>     syscall(__NR_mprotect, /*addr=*/0x20ffd000ul, /*len=*/0x1000ul,
>>             /*prot=PROT_WRITE|PROT_READ*/3ul);
>>
>>     return 0;
>> }
>> """
>>
>> I'd appreciate any thoughts you may have!
>
> Interesting.
> Either the vma flag shouldn't get dropped or we should un-mark the
> PTEs.

Yes, agreed. But which? I guess Peter is the expert here?

>
> Is the vma flag maybe getting dropped because of some weird interaction with
> UFFD_EVENT_REMAP?
>
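
To make the suspected failure mode concrete, here is a minimal userspace sketch
of the logic as I currently understand it. It is an illustration only, not
kernel code: every model_* name, struct and flag value is a hypothetical
stand-in. It assumes (per my guess above) that the uffd-wp bit in a PTE only
blocks the writable upgrade when the VMA also carries VM_UFFD_WP, and that the
page table check warns when a PTE ends up both uffd-wp and writable.

```c
/*
 * Userspace model only. The structs, flag values and helper names are
 * simplified, hypothetical stand-ins and do not match kernel definitions.
 */
#include <assert.h>
#include <stdbool.h>

#define MODEL_VM_WRITE    (1u << 0) /* VMA permits writes */
#define MODEL_VM_UFFD_WP  (1u << 1) /* VMA is registered for uffd-wp */

#define MODEL_PTE_WRITE   (1u << 0) /* PTE is writable */
#define MODEL_PTE_UFFD_WP (1u << 1) /* PTE carries the uffd-wp marker */

struct model_vma { unsigned int flags; };
struct model_pte { unsigned int bits; };

/* Assumption: the PTE's uffd-wp bit only counts if the VMA flag is set too. */
static bool model_userfaultfd_pte_wp(const struct model_vma *vma,
                                     struct model_pte pte)
{
    return (vma->flags & MODEL_VM_UFFD_WP) && (pte.bits & MODEL_PTE_UFFD_WP);
}

/* Models the decision of whether mprotect may make the PTE writable. */
static bool model_can_change_pte_writable(const struct model_vma *vma,
                                          struct model_pte pte)
{
    if (!(vma->flags & MODEL_VM_WRITE))
        return false;
    if (model_userfaultfd_pte_wp(vma, pte))
        return false; /* write faults are still needed for uffd-wp */
    return true;
}

/* Models mprotect(PROT_READ|PROT_WRITE) on one PTE; returns the new PTE. */
static struct model_pte model_mprotect_rw(const struct model_vma *vma,
                                          struct model_pte pte)
{
    if (model_can_change_pte_writable(vma, pte))
        pte.bits |= MODEL_PTE_WRITE;
    return pte;
}

/* Models the state the warning complains about: uffd-wp and writable. */
static bool model_pte_is_inconsistent(struct model_pte pte)
{
    return (pte.bits & MODEL_PTE_UFFD_WP) && (pte.bits & MODEL_PTE_WRITE);
}

/*
 * Models the end of the reproducer: the moved PTE still has the uffd-wp bit,
 * and the destination VMA either kept or lost the uffd-wp VMA flag.
 * Returns true if the PTE ends up in the inconsistent state.
 */
static bool model_reproduce(bool vma_keeps_uffd_wp)
{
    struct model_vma vma = { MODEL_VM_WRITE };
    struct model_pte pte = { MODEL_PTE_UFFD_WP };

    if (vma_keeps_uffd_wp)
        vma.flags |= MODEL_VM_UFFD_WP;

    return model_pte_is_inconsistent(model_mprotect_rw(&vma, pte));
}
```

Calling model_reproduce(false) from a main() models the final mprotect on the
VMA that lost the flag and reports the inconsistent PTE; model_reproduce(true)
models the flag being preserved, in which case the PTE stays write-protected.
Clearing the uffd-wp bit at mremap time would be the other way to avoid the
inconsistent state in this model.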