From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC542C43334 for ; Sat, 18 Jun 2022 12:52:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4D6366B0071; Sat, 18 Jun 2022 08:52:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 486D76B0072; Sat, 18 Jun 2022 08:52:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3261D6B0073; Sat, 18 Jun 2022 08:52:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 20CA66B0071 for ; Sat, 18 Jun 2022 08:52:35 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id ED80534D4E for ; Sat, 18 Jun 2022 12:52:34 +0000 (UTC) X-FDA: 79591345428.04.99137F4 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf22.hostedemail.com (Postfix) with ESMTP id 665CEC00B1 for ; Sat, 18 Jun 2022 12:52:34 +0000 (UTC) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B43C5113E; Sat, 18 Jun 2022 05:52:33 -0700 (PDT) Received: from FVFF77S0Q05N (unknown [10.57.35.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1F5253F792; Sat, 18 Jun 2022 05:52:28 -0700 (PDT) Date: Sat, 18 Jun 2022 13:52:24 +0100 From: Mark Rutland To: Tong Tiangen Cc: James Morse , Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Robin Murphy , Dave Hansen , Catalin Marinas , Will Deacon , Alexander Viro , Michael Ellerman , Benjamin Herrenschmidt , Paul Mackerras , x86@kernel.org, "H . Peter Anvin" , linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Kefeng Wang , Xie XiuQi , Guohanjun Subject: Re: [PATCH -next v5 6/8] arm64: add support for machine check error safe Message-ID: References: <20220528065056.1034168-1-tongtiangen@huawei.com> <20220528065056.1034168-7-tongtiangen@huawei.com> <4aa8b109-c79b-8da0-db89-85ca128f1049@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4aa8b109-c79b-8da0-db89-85ca128f1049@huawei.com> ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655556754; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4K0c0FGYB67OXTPpQQapOQLWfaqxguXyN0ybNLBUuLE=; b=Wm9Te8Fts18wnyIZXSwIgel4zzBFLdQmp1RuAT+14nt7ALmgM5H8y/9kcIsIp3DNnCjYDF kUTQgocoWXtO4YEV66R+u7Vrtd3UQQLmYDPwVR37pfFI+I8JZAlnG2jRaCoDEf2N00DHME cKoqo3K7oLAU/5S6pz41kE8sgSknx4c= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655556754; a=rsa-sha256; cv=none; b=F7LVhQmyuHkcT9Wcgz0tdftIW4gEKXl93u8ai72pk3ogLslyAwuagYqarM3sPTVKLpSILl ZQXW+c/H8wsRvKU0eImjni/nQaoPhSpTCOlIWHUfDDiDwAbFj+4P4kcjOHNg0uW4qkDeQE Eu7J48VNy/eoxTEnCU8GXTSuKHWWKZk= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf22.hostedemail.com: domain of mark.rutland@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=mark.rutland@arm.com X-Stat-Signature: ordindo7z6c49f3gfnrpuaycinnbzhfi X-Rspamd-Queue-Id: 665CEC00B1 Authentication-Results: imf22.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf22.hostedemail.com: domain of mark.rutland@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=mark.rutland@arm.com X-Rspamd-Server: rspam01 X-Rspam-User: X-HE-Tag: 1655556754-919352 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Jun 18, 2022 at 05:18:55PM +0800, Tong Tiangen wrote: > 在 2022/6/17 16:55, Mark Rutland 写道: > > On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote: > > > +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr, > > > + struct pt_regs *regs, int sig, int code) > > > +{ > > > + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC)) > > > + return false; > > > + > > > + if (user_mode(regs) || !current->mm) > > > + return false; > > > > What's the `!current->mm` check for? > > At first, I considered that only user processes have the opportunity to > recover when they trigger memory error. > > But it seems that this restriction is unreasonable. When the kernel thread > triggers memory error, it can also be recovered. for instance: > > https://lore.kernel.org/linux-mm/20220527190731.322722-1-jiaqiyan@google.com/ > > And i think if(!current->mm) shoud be added below: > > if(!current->mm) { > set_thread_esr(0, esr); > arm64_force_sig_fault(...); > } > return true; Why does 'current->mm' have anything to do with this, though? There can be kernel threads with `current->mm` set in unusual circumstances (and there's a lot of kernel code out there which handles that wrong), so if you want to treat user tasks differently, we should be doing something like checking PF_KTHREAD, or adding something like an is_user_task() helper. [...] > > > + > > > + if (apei_claim_sea(regs) < 0) > > > + return false; > > > + > > > + if (!fixup_exception_mc(regs)) > > > + return false; > > > > I thought we still wanted to signal the task in this case? Or do you expect to > > add that into `fixup_exception_mc()` ? > > Yeah, here return false and will signal to task in do_sea() -> > arm64_notify_die(). I mean when we do the fixup. I thought the idea was to apply the fixup (to stop the kernel from crashing), but still to deliver a fatal signal to the user task since we can't do what the user task asked us to. > > > + > > > + set_thread_esr(0, esr); > > > > Why are we not setting the address? Is that deliberate, or an oversight? > > Here set fault_address to 0, i refer to the logic of arm64_notify_die(). > > void arm64_notify_die(...) > { > if (user_mode(regs)) { > WARN_ON(regs != current_pt_regs()); > current->thread.fault_address = 0; > current->thread.fault_code = err; > > arm64_force_sig_fault(signo, sicode, far, str); > } else { > die(str, regs, err); > } > } > > I don't know exactly why and do you know why arm64_notify_die() did this? :) To be honest, I don't know, and that looks equally suspicious to me. Looking at the git history, that was added in commit: 9141300a5884b57c ("arm64: Provide read/write fault information in compat signal handlers") ... so maybe Catalin recalls why. Perhaps the assumption is just that this will be fatal and so unimportant? ... but in that case the same logic would apply to the ESR value, so it's not clear to me. Mark.