From: Andrew Zaborowski
To: linux-edac@vger.kernel.org, linux-mm@kvack.org
Cc: Kees Cook, Tony Luck, Eric Biederman, Borislav Petkov
Subject: [PATCH 1/3] x86: Add task_struct flag to force SIGBUS on MCE
Date: Wed, 10 Jul 2024 05:54:43 -0700
Message-ID: <20240710125445.564245-1-andrew.zaborowski@intel.com>

Uncorrected memory errors for user pages are signaled to processes
using SIGBUS or, if the error happens in a syscall, an error retval
from the syscall.  The SIGBUS is documented in
Documentation/mm/hwpoison.rst#failure-recovery-modes.

But there are corner cases where we cannot or don't want to return a
plain error from the syscall.  Subsequent commits cover two such cases:
execve and rseq.  Current code, in both places, will kill the task with
a SIGSEGV on error.  While not explicitly stated, it can be argued that
it should be a SIGBUS, for consistency and for the benefit of the
userspace signal handlers.  Even if the process cannot handle the
signal, perhaps the parent process can.  This was the case in the
scenario that motivated this patch.

In both cases, the architecture's exception handler (MCE handler on
x86) will queue a call to memory_failure().  This doesn't work because
the syscall-specific code sees the -EFAULT and terminates the task
before the queued work runs.

To fix this:

1. let pending work run in the error cases in both places.

And

2. on MCE, ensure memory_failure() is passed MF_ACTION_REQUIRED so that
   the SIGBUS is queued.  Normally when the MCE is in a syscall, a
   fixup of return IP and a call to kill_me_never() are what we want.
   But in this case it's necessary to queue kill_me_maybe(), which will
   set MF_ACTION_REQUIRED, which is checked by memory_failure().

To do this the syscall code will set current->kill_on_efault, a new
task_struct flag.  Check that flag in
arch/x86/kernel/cpu/mce/core.c:do_machine_check().

Note: the flag is not x86 specific even if only x86 handling is being
added here.  The definition could be guarded by
#ifdef CONFIG_MEMORY_FAILURE, but it would then need set/clear
utilities.

Signed-off-by: Andrew Zaborowski
---
This is a v2 of
https://lore.kernel.org/linux-mm/20240501015340.3014724-1-andrew.zaborowski@intel.com/

In the v1 the existing flag current->in_execve was being reused instead
of adding a new one.  Kees Cook commented in
https://lore.kernel.org/linux-mm/202405010915.465AF19@keescook/ that
current->in_execve is going away.  Lacking a better idea and seeing
that execve() and rseq() would benefit from using a common mechanism, I
decided to add this new flag.

Perhaps with a better name current->kill_on_efault could replace
bprm->point_of_no_return to offset the pain of having this extra flag.
---
 arch/x86/kernel/cpu/mce/core.c | 18 +++++++++++++++++-
 include/linux/sched.h          |  2 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ad0623b65..13f2ace3d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1611,7 +1611,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 			if (p)
 				SetPageHWPoison(p);
 		}
-	} else {
+	} else if (!current->kill_on_efault) {
 		/*
 		 * Handle an MCE which has happened in kernel space but from
 		 * which the kernel can recover: ex_has_fault_handler() has
@@ -1628,6 +1628,22 @@ noinstr void do_machine_check(struct pt_regs *regs)
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
 			queue_task_work(&m, msg, kill_me_never);
+	} else {
+		/*
+		 * Even with recovery code extra handling is required when
+		 * we're not returning to userspace after error (e.g. in
+		 * execve() beyond the point of no return) to ensure that
+		 * a SIGBUS is delivered.
+		 */
+		if (m.kflags & MCE_IN_KERNEL_RECOV) {
+			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
+				mce_panic("Failed kernel mode recovery", &m, msg);
+		}
+
+		if (!mce_usable_address(&m))
+			queue_task_work(&m, msg, kill_me_now);
+		else
+			queue_task_work(&m, msg, kill_me_maybe);
 	}
 
 out:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6e..0cde1ba11 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -975,6 +975,8 @@ struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned			in_thrashing:1;
 #endif
+	/* Kill task on user memory access error */
+	unsigned			kill_on_efault:1;
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
-- 
2.43.0
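
For review purposes, the choice of task_work callback in the kernel-mode
branch of do_machine_check() after this patch can be modelled as a small
pure function.  This is only a userspace sketch of the selection logic as
it reads from the diff: the enum, the struct and kernel_mode_action() are
hypothetical names invented for illustration, the fixup_exception() step
and the user-mode branch are left out, and none of this is kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the callbacks named in the patch. */
enum mce_action {
	ACT_NONE,		/* no task work queued */
	ACT_KILL_ME_NEVER,	/* models queueing kill_me_never() */
	ACT_KILL_ME_NOW,	/* models queueing kill_me_now() */
	ACT_KILL_ME_MAYBE,	/* models queueing kill_me_maybe() */
};

/* Simplified inputs; the real code reads these from m and current. */
struct mce_model {
	bool in_kernel_copyin;	/* m.kflags & MCE_IN_KERNEL_COPYIN */
	bool usable_address;	/* mce_usable_address(&m) */
	bool kill_on_efault;	/* current->kill_on_efault */
};

/*
 * Pick the callback for an MCE taken in kernel mode.  Without the
 * flag the pre-patch behaviour is kept; with the flag set, the same
 * choice as for a user-mode MCE is made so that a SIGBUS can follow.
 */
static enum mce_action kernel_mode_action(const struct mce_model *m)
{
	if (!m->kill_on_efault)
		return m->in_kernel_copyin ? ACT_KILL_ME_NEVER : ACT_NONE;

	return m->usable_address ? ACT_KILL_ME_MAYBE : ACT_KILL_ME_NOW;
}
```

In this model, a copy-in fault with the flag clear still maps to
kill_me_never(), while the same fault with the flag set maps to
kill_me_maybe(), or to kill_me_now() if the reported address is not
usable.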