From: Tony Luck
To: Borislav Petkov
Cc: Tony Luck, x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andy Lutomirski, Aili Yao, HORIGUCHI NAOYA (堀口 直也)
Subject: [PATCH 4/4] x86/mce: Avoid infinite loop for copy from user recovery
Date: Thu, 25 Mar 2021 17:02:35 -0700
Message-Id: <20210326000235.370514-5-tony.luck@intel.com>
In-Reply-To: <20210326000235.370514-1-tony.luck@intel.com>
References: <20210326000235.370514-1-tony.luck@intel.com>

When get_user() triggers a machine check, the recovery action uses the
fixup path to make get_user() return -EFAULT. queue_task_work() also
arranges for kill_me_maybe() to be called on return to user mode to send
a SIGBUS to the current process.

But there are places in the kernel where the calling code assumes that
this -EFAULT return simply means a page was not present, takes some
action to fix that, and then retries the access. The retry results in a
second machine check.

While processing this second machine check, queue_task_work() is called
again. But since it reuses the same callback_head structure that was used
in the first call, the net result is an entry on the current->task_works
list that points to itself. When task_work_run() is called, it loops
forever in this code:

	do {
		next = work->next;
		work->func(work);
		work = next;
		cond_resched();
	} while (work);

Add a counter (current->mce_count) to keep track of repeated machine
checks before the queued task_work callback is executed. The first
machine check saves the address information and calls task_work_add().
Subsequent machine checks taken before that callback runs check that the
address is in the same page as the first machine check (since the
callback will offline exactly one page).

Expected worst case is two machine checks before moving on (e.g. one user
access with page faults disabled, then a repeat to the same address with
page faults enabled). Just in case there is some code that loops forever,
enforce a limit of 10.
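
To make the expected two-machine-check sequence concrete, here is a
minimal sketch of the kind of caller that triggers it. The helper
read_user_buf() is hypothetical, invented only for illustration (the real
offenders are the usual copy-from-user retry paths), but the shape is the
same: probe with page faults disabled, treat failure as a missing page,
fault it in, retry:

	#include <linux/uaccess.h>	/* copy_from_user(), pagefault_disable() */
	#include <linux/pagemap.h>	/* fault_in_pages_readable() */

	/* Illustration only: not an existing kernel function */
	static ssize_t read_user_buf(void *dst, const char __user *uaddr, size_t len)
	{
		size_t left;

		pagefault_disable();
		/* Poisoned page: #MC 1, the fixup makes this look like an ordinary fault */
		left = copy_from_user(dst, uaddr, len);
		pagefault_enable();

		if (left) {
			/* Caller assumes the page simply was not mapped ... */
			if (fault_in_pages_readable(uaddr, len))
				return -EFAULT;
			/* ... and retries, reading the same poisoned page again: #MC 2 */
			left = copy_from_user(dst, uaddr, len);
		}

		return left ? -EFAULT : len;
	}

Each access that consumes poison enters do_machine_check(), so without
mce_count the same current->mce_kill_me callback_head would be queued
twice, producing the self-referential task_works list shown above. With
this patch the second machine check only verifies that the address is in
the same page as the first and returns without calling task_work_add()
again.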

Signed-off-by: Tony Luck
---
 arch/x86/kernel/cpu/mce/core.c | 40 ++++++++++++++++++++++++++--------
 include/linux/sched.h          |  1 +
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1570310cadab..999fd7f0330b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1250,6 +1250,9 @@ static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin
 
 static void kill_me_now(struct callback_head *ch)
 {
+	struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me);
+
+	p->mce_count = 0;
 	force_sig(SIGBUS);
 }
 
@@ -1258,6 +1261,7 @@ static void kill_me_maybe(struct callback_head *cb)
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 	int flags = MF_ACTION_REQUIRED;
 
+	p->mce_count = 0;
 	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
 
 	if (!p->mce_ripv)
@@ -1277,18 +1281,36 @@ static void kill_me_never(struct callback_head *cb)
 {
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 
+	p->mce_count = 0;
 	pr_err("Kernel accessed poison in user space at %llx\n", p->mce_addr);
 	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, 0))
 		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
 }
 
-static void queue_task_work(struct mce *m, void (*func)(struct callback_head *))
+static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
 {
-	current->mce_addr = m->addr;
-	current->mce_kflags = m->kflags;
-	current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
-	current->mce_whole_page = whole_page(m);
-	current->mce_kill_me.func = func;
+	int count = ++current->mce_count;
+
+	/* First call, save all the details */
+	if (count == 1) {
+		current->mce_addr = m->addr;
+		current->mce_kflags = m->kflags;
+		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
+		current->mce_whole_page = whole_page(m);
+		current->mce_kill_me.func = func;
+	}
+
+	/* Ten is likely overkill. Don't expect more than two faults before task_work() */
+	if (count > 10)
+		mce_panic("Too many machine checks while accessing user data", m, msg);
+
+	/* Second or later call, make sure page address matches the one from first call */
+	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
+		mce_panic("Machine checks to different user pages", m, msg);
+
+	/* Do not call task_work_add() more than once */
+	if (count > 1)
+		return;
 
 	task_work_add(current, &current->mce_kill_me, TWA_RESUME);
 }
@@ -1427,9 +1449,9 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
 		if (kill_current_task)
-			queue_task_work(&m, kill_me_now);
+			queue_task_work(&m, msg, kill_me_now);
 		else
-			queue_task_work(&m, kill_me_maybe);
+			queue_task_work(&m, msg, kill_me_maybe);
 
 	} else {
 		/*
@@ -1447,7 +1469,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, kill_me_never);
+			queue_task_work(&m, msg, kill_me_never);
 	}
 out:
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2d213b52730c..8f9dc91498cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1364,6 +1364,7 @@ struct task_struct {
 					mce_whole_page : 1,
 					__mce_reserved : 62;
 	struct callback_head		mce_kill_me;
+	int				mce_count;
 #endif
 
 #ifdef CONFIG_KRETPROBES
-- 
2.29.2