From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3125CC433F5 for ; Fri, 11 Mar 2022 21:17:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AAFED8D0002; Fri, 11 Mar 2022 16:17:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A5FF58D0001; Fri, 11 Mar 2022 16:17:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9010A8D0002; Fri, 11 Mar 2022 16:17:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 7F0E78D0001 for ; Fri, 11 Mar 2022 16:17:03 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id 490B0814E7 for ; Fri, 11 Mar 2022 21:17:03 +0000 (UTC) X-FDA: 79233365526.06.A1EB81F Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) by imf08.hostedemail.com (Postfix) with ESMTP id BCB6F160021 for ; Fri, 11 Mar 2022 21:17:02 +0000 (UTC) Received: by mail-pl1-f173.google.com with SMTP id p17so8665362plo.9 for ; Fri, 11 Mar 2022 13:17:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=XH+5/+cW8VQKO81AORhyT2Hmdy1TzbLeTlOyFWJWYeA=; b=PgciNoUU40YKRk+bsPBh194RRGa5sI3NFQvsgyxxdemnlRf1lRQ/7uF7PvVTwNVjfQ 7Wyakm1uYkqo7VUjKAM5orbK+8q8lq7akwX+fqKy6nN1+sjl1DfS0c8YnPPtBGF5jDg6 fJkE1+lwwLz8zeQVLEywqWQi90ErE7gVWl4dNjehnQZsCox+D81F4cPth65VOI7g0VKD WfK1qgaHCY4UHfnAsl3pcdK/OEzZAO2tLDrXaen8KPpom8gQnUTQbnAxvLrka4s29uMb eS9Up5m0pSRi6L7s5+L8KqFaRtMyIWUWJDKnkL7XPpKByOKQcdqnCmmVV9QkS2tr+VFX 4SGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=XH+5/+cW8VQKO81AORhyT2Hmdy1TzbLeTlOyFWJWYeA=; b=rT60HA2826ToswakUS1oPGB0FVsXY+tF4Pg4LtNThuiyZhsSKmXMzyynvir5GPNCRy ffG5AppLdt+0bOzdxdpGcOs3fiKHcE0ErGw1nzvVn2h1QP/VmqNxm2T2DOhPn6o8JeAD tggzKcgaWw7mMCEh+OCAeBjXcUAyNI/JjCtQVl9IlyxRwx51jAe29koKXqrDR5DmMPNf 0bZoK9DZsHYnhCIL2UNto8WC4BBSSoIthwBS4fRm++tWMSo28maM36U1XF7JSGRSv2Ce sksQznwj2VEVWbclQk6tlct7SseUetP4MiYpNeY0BxVXaeUQUmUTatXjBEk6PjSDkpbd W0kA== X-Gm-Message-State: AOAM533Zqd6Gnn4Uk1H9fbxW6z38jjs3cYcaCNYjZ7Oazq+ufKAzb6n3 LxHhkKjaiiLc1PPxHChXtyE= X-Google-Smtp-Source: ABdhPJwucZP2RjPXu31Wk6NuUNHwx2Rs6nW29uTE/rQOakKnRfuUIrgOwAsq2TZhmEYXGijjSMbAJg== X-Received: by 2002:a17:90b:4ac1:b0:1bf:6d51:1ad9 with SMTP id mh1-20020a17090b4ac100b001bf6d511ad9mr23648278pjb.199.1647033421514; Fri, 11 Mar 2022 13:17:01 -0800 (PST) Received: from smtpclient.apple ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id c14-20020a056a00248e00b004f77e0fbfc0sm6864387pfv.185.2022.03.11.13.17.00 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Mar 2022 13:17:01 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.60.0.1.1\)) Subject: Re: [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault From: Nadav Amit In-Reply-To: <70e08bd5-187a-daee-2822-1d9a437a9cff@intel.com> Date: Fri, 11 Mar 2022 13:16:59 -0800 Cc: Linux-MM , Linux Kernel Mailing List , Andrew Morton , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin , "x86@kernel.org" Content-Transfer-Encoding: quoted-printable Message-Id: References: <20220311190749.338281-1-namit@vmware.com> <20220311190749.338281-3-namit@vmware.com> <70e08bd5-187a-daee-2822-1d9a437a9cff@intel.com> To: Dave Hansen X-Mailer: Apple Mail (2.3693.60.0.1.1) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=PgciNoUU; spf=pass (imf08.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.214.173 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspam-User: X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: BCB6F160021 X-Stat-Signature: x4w3wa16p6t1tzppnq5cisyqc8s6mq9z X-HE-Tag: 1647033422-947169 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On Mar 11, 2022, at 12:59 PM, Dave Hansen = wrote: >=20 > On 3/11/22 12:38, Nadav Amit wrote: >>> On Mar 11, 2022, at 11:41 AM, Dave Hansen = wrote: > ... >>> Can any sane code trigger this? >>=20 >> Well, regarding this question and the previous one: I do not think = that >> this scenario is possible today since mprotect() holds the mmap_lock >> for write. There is no other code that I am aware of that toggles >> the NX bit on a present entry. >>=20 >> But I will not bet my life on it. That=E2=80=99s the reason for the = somewhat >> vague phrasing that I used. >=20 > =46rom the userspace perspective, mmap(MAP_FIXED) can do this too. = But, > sane userspace can't rely on the syscall to have done any work and the > TLB flushing is currently done before the syscall returns. >=20 > I'd put it this way: >=20 > Today, it is possible for a thread to end up in access_error() > for a PF_INSN fault and observe a VM_EXEC VMA. If you are > generous, this could be considered a spurious fault. >=20 > However, the faulting thread would have had to race with the > thread which was changing the PTE and the VMA and is currently > *in* mprotect() (or some other syscall). In other words, the > faulting thread can encounter this situation, but it never had > any assurance from the kernel that it wouldn't fault. This is > because the faulting thread never had a chance to observe the > syscall return. >=20 > There is no evidence that the existing behavior can cause any > issues with sane userspace. Done. Thanks. >=20 >>>> index d0074c6ed31a..ad0ef0a6087a 100644 >>>> --- a/arch/x86/mm/fault.c >>>> +++ b/arch/x86/mm/fault.c >>>> @@ -1107,10 +1107,28 @@ access_error(unsigned long error_code, = struct vm_area_struct *vma) >>>> (error_code & X86_PF_INSTR), = foreign)) >>>> return 1; >>>>=20 >>>> - if (error_code & X86_PF_WRITE) { >>>> + if (error_code & (X86_PF_WRITE | X86_PF_INSTR)) { >>>> + /* >>>> + * CPUs are not expected to set the two error code bits >>>> + * together, but to ensure that hypervisors do not = misbehave, >>>> + * run an additional sanity check. >>>> + */ >>>> + if ((error_code & (X86_PF_WRITE|X86_PF_INSTR)) =3D=3D >>>> + (X86_PF_WRITE|X86_PF_INSTR)) { >>>> + WARN_ON_ONCE(1); >>>> + return 1; >>>> + } >>>=20 >>> access_error() is only used on the do_user_addr_fault() side of = things. >>> Can we stick this check somewhere that also works for kernel address >>> faults? This is a generic sanity check. It can also be in a = separate >>> patch. >>=20 >> I can wrap it in a different function and also call it from >> do_kern_addr_fault() or spurious_kernel_fault(). >>=20 >> Anyhow, spurious_kernel_fault() should handle spurious faults on >> executable code correctly.=20 >=20 > This is really about checking the sanity of the "hardware"-provided > error code. Let's just do it in handle_page_fault(), maybe hidden in = a > function like: >=20 > void check_error_code_sanity(unsigned long error_code) > { > WARN_ON_ONCE(...); > } >=20 > You can leave the X86_PF_PK check in place for now. It's probably = going > away soon anyway. Done. Thanks. But note that removing the check from access_error() means that if the assertion is broken, userspace might crash inadvertently (in contrast to the version I sent, which would have potentially led to infinite stream of page-faults). I don=E2=80=99t know which behavior is = better, so let=E2=80=99s go with your version and just hope it doesn=E2=80=99t = happen. >=20 >>> Also, we should *probably* stop talking about CPUs here. If there's >>> ever something wonky with error code bits, I'd put my money on a = weird >>> hypervisor before any kind of CPU issue. >>=20 >> I thought I manage to convey exactly that in the comment. Can you = provide >> a better phrasing? >=20 > Maybe: >=20 > /* > * X86_PF_INSTR for instruction _fetches_. Fetches never write. > * X86_PF_WRITE should never be set with X86_PF_INSTR. > * > * This is most likely due to a buggy hypervisor. > */ Done, thank you.