Date: Tue, 26 Jan 2021 14:36:05 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: x86@kernel.org, Andrew Morton, Peter Zijlstra, Darren Hart, Andy Lutomirski, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user recovery
Message-ID: <20210126223605.GA14355@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20210126110314.GC6514@zn.tnic>

On Tue, Jan 26, 2021 at 12:03:14PM +0100, Borislav Petkov wrote:
> On Mon, Jan 25, 2021 at 02:55:09PM -0800, Luck, Tony wrote:
> > And now I've changed it back to non-atomic (but keeping the
> > slightly cleaner looking code style that I used for the atomic
> > version). This one also works for thousands of injections and
> > recoveries. Maybe take it now before it stops working again :-)
>
> Hmm, so the only differences I see between your v4 and this are:
>
> -@@ -1238,6 +1238,7 @@ static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin
> +@@ -1238,6 +1238,9 @@ static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin
>
>    static void kill_me_now(struct callback_head *ch)
>    {
> ++     struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me);
> ++
>  +     p->mce_count = 0;
>        force_sig(SIGBUS);
>    }
>
> Could the container_of() macro have changed something?

That change was to fix my brown paper bag moment (does not
compile without a variable named "p" in scope to be used on
next line.)

> Because we don't know yet (right?) why would it fail? Would it read
> stale ->mce_count data? If so, then a barrier is missing somewhere.

I don't see how a barrier would make a difference. In the common case
all this code is executed on the same logical CPU. Return from the
do_machine_check() tries to return to user mode and finds that there
is some "task_work" to execute first.

In some cases Linux might context switch to something else. Perhaps
this task even gets picked up by another CPU to run the task work
queued functions. But I imagine that the context switch should act
as a barrier ... shouldn't it?

> Or what is the failure exactly?

After a few cycles of the test injection to user mode, I saw an
overflow in the machine check bank. As if it hadn't been cleared
from the previous iteration ...
but all the banks are cleared as soon as we find that the machine
check is recoverable. A while before getting to the code I changed.

When the tests were failing, code was on top of v5.11-rc3. Latest
experiments moved to -rc5. There's just a tracing fix from PeterZ
between rc3 and rc5 to mce/core.c:

737495361d44 ("x86/mce: Remove explicit/superfluous tracing")

which doesn't appear to be a candidate for the problems I saw.

> Because if I take it now without us knowing what the issue is, it will
> start failing somewhere - Murphy's our friend - and then we'll have to
> deal with breaking people's boxes. Not fun.

Fair point.

> The other difference is:
>
> @@ -76,8 +71,10 @@ index 13d3f1cbda17..5460c146edb5 100644
>  -     current->mce_kflags = m->kflags;
>  -     current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
>  -     current->mce_whole_page = whole_page(m);
> ++     int count = ++current->mce_count;
> ++
>  +     /* First call, save all the details */
> -+     if (current->mce_count++ == 0) {
> ++     if (count == 1) {
>  +             current->mce_addr = m->addr;
>  +             current->mce_kflags = m->kflags;
>  +             current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
>
> Hmm, a local variable and a pre-increment. Can that have an effect somehow?

This is the bit that changed during my detour using atomic_t mce_count.
I added the local variable to capture value from atomic_inc_return(), then
used it later, instead of a bunch of atomic_read() calls.

I kept it this way because "if (count == 1)" is marginally easier to read
than "if (current->mce_count++ == 0)"

> > + /* Ten is likley overkill. Don't expect more than two faults before task_work() */
>
> Typo: likely.

Oops. Fixed.

-Tony
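
For anyone following the thread, here is a minimal userspace sketch (not
kernel code) of the two changes discussed above. It assumes a trimmed-down,
hypothetical "struct task" in place of task_struct, a simplified
container_of() without the kernel's type checking, and hypothetical helper
names queue_v4() and queue_v5(). It only illustrates that, on a single
thread, the post-increment test and the pre-increment-into-a-local test
take the "first call" branch on exactly the same call, and that the
callback can recover its task from the embedded callback_head. It says
nothing about the failure seen under injection testing.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct callback_head. */
struct callback_head {
	void (*func)(struct callback_head *);
};

/* Hypothetical, trimmed-down task: only the fields discussed here. */
struct task {
	int mce_count;
	unsigned long long mce_addr;
	struct callback_head mce_kill_me;
};

/* Simplified container_of(); the kernel version adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* v4 style: post-increment compared with 0. Returns 1 on first call. */
int queue_v4(struct task *t, unsigned long long addr)
{
	if (t->mce_count++ == 0) {
		t->mce_addr = addr;	/* first call, save the details */
		return 1;
	}
	return 0;
}

/* v5 style: pre-increment captured in a local, compared with 1. */
int queue_v5(struct task *t, unsigned long long addr)
{
	int count = ++t->mce_count;

	if (count == 1) {
		t->mce_addr = addr;	/* first call, save the details */
		return 1;
	}
	return 0;
}

/*
 * Shaped like the quoted kill_me_now(): recover the owning task from
 * the embedded callback and reset the counter. Signal delivery is
 * omitted in this sketch.
 */
void kill_me_now(struct callback_head *ch)
{
	struct task *p;

	assert(ch != NULL);
	p = container_of(ch, struct task, mce_kill_me);
	p->mce_count = 0;
}
```

Calling queue_v4() and queue_v5() twice each on separate zeroed tasks
leaves both with mce_count == 2 and the address from the first call;
kill_me_now(&t.mce_kill_me) then resets the count to zero.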