From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2FC2C433F5 for ; Fri, 11 Mar 2022 19:42:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1A8868D0002; Fri, 11 Mar 2022 14:42:09 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 156148D0001; Fri, 11 Mar 2022 14:42:09 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F39C58D0002; Fri, 11 Mar 2022 14:42:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id E7BF78D0001 for ; Fri, 11 Mar 2022 14:42:08 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id B99698230A for ; Fri, 11 Mar 2022 19:42:08 +0000 (UTC) X-FDA: 79233126336.10.D3CCEF7 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf23.hostedemail.com (Postfix) with ESMTP id 05B0C140010 for ; Fri, 11 Mar 2022 19:42:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647027727; x=1678563727; h=message-id:date:mime-version:to:cc:references:from: subject:in-reply-to:content-transfer-encoding; bh=mQiNdXMDSm6S2pRJONnpTeru2KszX9Gpzb6dViTFnCk=; b=ixa9Q+THuJ3Mfcrcm+IlXm4LFWw9c5p3W5daM0CZRjHLk7tRGiFVxj6N CACL4w/agzrvS3H7P7CA14xJ2CHV8UXNb4iM2N24bXeeVUVH/XJkqjbp7 Dduv5x/lLe3NzgxY+1dgCelfVzwuriPZjywzb+PuGiqEbcVzI3UvPqneI q9Tgs0N9odgr+guQwX+hnk0SkGE000N6RHm9+ayCu1EUEQUQBt6snAk6h JYT1eNwtFiSbJxQzSDg4HdALFRQsH5HC1jx+J1i2hoi2KF4nHbJuBL5Vw b4HOjog8vPZP/lRAP+Sapwk2yzlCFVHOe7Iv/Wbu31WgMOtMi87Nu69ip Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10283"; a="237813378" X-IronPort-AV: E=Sophos;i="5.90,174,1643702400"; d="scan'208";a="237813378" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Mar 2022 11:42:05 -0800 X-IronPort-AV: E=Sophos;i="5.90,174,1643702400"; d="scan'208";a="645033254" Received: from cpeirce-mobl1.amr.corp.intel.com (HELO [10.212.128.243]) ([10.212.128.243]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Mar 2022 11:42:05 -0800 Message-ID: Date: Fri, 11 Mar 2022 11:41:58 -0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0 Content-Language: en-US To: Nadav Amit , linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Nadav Amit , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin , x86@kernel.org References: <20220311190749.338281-1-namit@vmware.com> <20220311190749.338281-3-namit@vmware.com> From: Dave Hansen Subject: Re: [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault In-Reply-To: <20220311190749.338281-3-namit@vmware.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 05B0C140010 X-Stat-Signature: 6cgy9p7wfe7p8nqs463yyhfsdpconm5z Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=ixa9Q+TH; spf=none (imf23.hostedemail.com: domain of dave.hansen@intel.com has no SPF policy when checking 134.134.136.126) smtp.mailfrom=dave.hansen@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspam-User: X-Rspamd-Server: rspam06 X-HE-Tag: 1647027726-128505 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 3/11/22 11:07, Nadav Amit wrote: > From: Nadav Amit > > access_error() currently does not check for execution permission > violation. As a result, spurious page-faults due to execution permission > violation cause SIGSEGV. This is a bit muddy on the problem statement. I get that spurious faults can theoretically cause this, but *do* they in practice on current kernels? > It appears not to be an issue so far, but the next patches avoid TLB > flushes on permission promotion, which can lead to this scenario. nodejs > for instance crashes when TLB flush is avoided on permission promotion. By "it appears not to be an issue", do you mean that this suboptimal behavior can not be triggered, period? Or, it can be triggered but folks seem not to care that it can be triggered? I *think* these can be triggered today. I think it takes two threads that do something like: Thread 1 Thread 2 ======== ======== ptr = malloc(); memcpy(ptr, &code, len); exec_now = 1; while (!exec_now); call(ptr); // fault mprotect(ptr, PROT_EXEC, len); // fault sees VM_EXEC But that has a bug: exec_now is set before the mprotect(). It's not sane code. Can any sane code trigger this? > Add a check to prevent access_error() from returning mistakenly that > spurious page-faults due to instruction fetch are a reason for an access > error. > > It is assumed that error code bits of "instruction fetch" and "write" in > the hardware error code are mutual exclusive, and the change assumes so. > However, to be on the safe side, especially if hypervisors misbehave, > assert this is the case and warn otherwise. > > Cc: Andrea Arcangeli > Cc: Andrew Cooper > Cc: Andrew Morton > Cc: Andy Lutomirski > Cc: Dave Hansen > Cc: Peter Xu > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: Will Deacon > Cc: Yu Zhao > Cc: Nick Piggin > Cc: x86@kernel.org > Signed-off-by: Nadav Amit > --- > arch/x86/mm/fault.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > index d0074c6ed31a..ad0ef0a6087a 100644 > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -1107,10 +1107,28 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) > (error_code & X86_PF_INSTR), foreign)) > return 1; > > - if (error_code & X86_PF_WRITE) { > + if (error_code & (X86_PF_WRITE | X86_PF_INSTR)) { > + /* > + * CPUs are not expected to set the two error code bits > + * together, but to ensure that hypervisors do not misbehave, > + * run an additional sanity check. > + */ > + if ((error_code & (X86_PF_WRITE|X86_PF_INSTR)) == > + (X86_PF_WRITE|X86_PF_INSTR)) { > + WARN_ON_ONCE(1); > + return 1; > + } access_error() is only used on the do_user_addr_fault() side of things. Can we stick this check somewhere that also works for kernel address faults? This is a generic sanity check. It can also be in a separate patch. Also, we should *probably* stop talking about CPUs here. If there's ever something wonky with error code bits, I'd put my money on a weird hypervisor before any kind of CPU issue. > /* write, present and write, not present: */ > - if (unlikely(!(vma->vm_flags & VM_WRITE))) > + if ((error_code & X86_PF_WRITE) && > + unlikely(!(vma->vm_flags & VM_WRITE))) > + return 1; > + > + /* exec, present and exec, not present: */ > + if ((error_code & X86_PF_INSTR) && > + unlikely(!(vma->vm_flags & VM_EXEC))) > return 1; > + > return 0; > } This is getting really ugly. I think we've gone over this before, but it escapes me. Why do we need a common (X86_PF_WRITE | X86_PF_INSTR) block of code? Why can't we just add a simple X86_PF_INSN if() that mirrors the current X86_PF_WRITE one? if (error_code & X86_PF_INSN) { /* present and not exec: */ if (unlikely(!(vma->vm_flags & VM_EXEC))) return 1; return 0; }