Message-ID: <6dc1b117-020e-be9e-7e5e-a349ffb7d00a@huawei.com>
Date: Thu, 20 Apr 2023 10:59:54 +0800
Subject: Re: [PATCH v2] mm: hwpoison: coredump: support recovery from dump_user_range()
From: Kefeng Wang
To: Jane Chu, HORIGUCHI NAOYA(堀口 直也), Thomas Gleixner
CC: Alexander Viro, Christian Brauner, "linux-fsdevel@vger.kernel.org", "linux-mm@kvack.org", Andrew Morton, Miaohe Lin, "linux-kernel@vger.kernel.org", Tong Tiangen, Jens Axboe
In-Reply-To: <7d0c38a9-ed2a-a221-0c67-4a2f3945d48b@oracle.com>

On 2023/4/20 10:03, Jane Chu wrote:
>
> On 4/19/2023 5:03 AM, Kefeng Wang wrote:
>>
>>
>> On 2023/4/19 15:25, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Tue, Apr 18, 2023 at 05:45:06PM +0800, Kefeng Wang wrote:
>>>>
>>>> ...
>>>>>> @@ -371,6 +372,14 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>>>>>>    EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
>>>>>>    #endif /* CONFIG_ARCH_HAS_COPY_MC */
>>>>>> +static void *memcpy_from_iter(struct iov_iter *i, void *to, const void *from,
>>>>>> +                 size_t size)
>>>>>> +{
>>>>>> +    if (iov_iter_is_copy_mc(i))
>>>>>> +        return (void *)copy_mc_to_kernel(to, from, size);
>>>>>
>>>>> Is it helpful to call memory_failure_queue() if copy_mc_to_kernel() fails
>>>>> due to a memory error?
>>>>
>>>> For dump_user_range(), the task is dying, if copy incomplete size, the
>>>> coredump will fail and task will exit, also memory_failure will
>>>> be called by kill_me_maybe(),
>>>>
>>>>   CPU: 0 PID: 1418 Comm: test Tainted: G   M               6.3.0-rc5 #29
>>>>   Call Trace:
>>>>
>>>>    dump_stack_lvl+0x37/0x50
>>>>    memory_failure+0x51/0x970
>>>>    kill_me_maybe+0x5b/0xc0
>>>>    task_work_run+0x5a/0x90
>>>>    exit_to_user_mode_prepare+0x194/0x1a0
>>>>    irqentry_exit_to_user_mode+0x9/0x30
>>>>    noist_exc_machine_check+0x40/0x80
>>>>    asm_exc_machine_check+0x33/0x40
>>>
>>> Is this call trace printed out when copy_mc_to_kernel() failed by finding
>>> a memory error (or in some testcase using error injection)?
>>
>> I add dump_stack() into memory_failure() to check whether the poisoned
>> memory is called or not, and the call trace shows it do call
>> memory_failure(), but I get confused when do the test.
>>
>>> In my understanding, an MCE should not be triggered when MC-safe copy tries
>>> to access to a memory error.  So I feel that we might be talking about
>>> different scenarios.
>>>
>>> When I questioned previously, I thought about the following scenario:
>>>
>>>    - a process terminates abnormally for any reason like segmentation fault,
>>>    - then, kernel tries to create a coredump,
>>>    - during this, the copying routine accesses to corrupted page to read.
>>>
>> Yes, we tested like your described,
>>
>> 1) inject memory error into a process
>> 2) send a SIGABT/SIGBUS to process to trigger the coredump
>>
>> Without patch, the system panic, and with patch only process exits.
>>
>>> In this case the corrupted page should not be handled by memory_failure()
>>> yet (because otherwise properly handled hwpoisoned page should be ignored
>>> by coredump process).  The coredump process would exit with failure with
>>> your patch, but then, the corrupted page is still left unhandled and can
>>> be reused, so any other thread can easily access to it again.
>>
>> As shown above, the corrupted page will be handled by
>> memory_failure(), but what I'm wondering,
>> 1) memory_failure() is not always called
>> 2) look at the above call trace, it looks like from asynchronous
>>     interrupt, not from synchronous exception, right?
>>
>>>
>>> You can find a few other places (like __wp_page_copy_user and ksm_might_need_to_copy)
>>> to call memory_failure_queue() to cope with such unhandled error pages.
>>> So does memcpy_from_iter() do the same?
>>
>> I add some debug print in do_machine_check() on x86:
>>
>> 1) COW,
>>    m.kflags: MCE_IN_KERNEL_RECOV
>>    fixup_type: EX_TYPE_DEFAULT_MCE_SAFE
>>
>>    CPU: 11 PID: 2038 Comm: einj_mem_uc
>>    Call Trace:
>>     <#MC>
>>     dump_stack_lvl+0x37/0x50
>>     do_machine_check+0x7ad/0x840
>>     exc_machine_check+0x5a/0x90
>>     asm_exc_machine_check+0x1e/0x40
>>    RIP: 0010:copy_mc_fragile+0x35/0x62
>>
>>    if (m.kflags & MCE_IN_KERNEL_RECOV) {
>>            if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
>>                    mce_panic("Failed kernel mode recovery", &m, msg);
>>    }
>>
>>    if (m.kflags & MCE_IN_KERNEL_COPYIN)
>>            queue_task_work(&m, msg, kill_me_never);
>>
>> There is no memory_failure() called when
>> EX_TYPE_DEFAULT_MCE_SAFE, also EX_TYPE_FAULT_MCE_SAFE too,
>> so we manually add a memory_failure_queue() to handle with
>> the poisoned page.
>>
>> 2) Coredump,  nothing print about m.kflags and fixup_type,
>> with above check, add a memory_failure_queue() or memory_failure() seems
>> to be needed for memcpy_from_iter(), but it is totally different from
>> the COW scenario
>>
>>
>> Another question, other copy_mc_to_kernel() callers, eg,
>> nvdimm/dm-writecache/dax, there are not call memory_failure_queue(),
>> should they need a memory_failure_queue(), if so, why not add it into
>> do_machine_check() ?
>

What I mean is that EX_TYPE_DEFAULT_MCE_SAFE/EX_TYPE_FAULT_MCE_SAFE are
designed to identify fixups that allow in-kernel #MC recovery; that is,
the caller of copy_mc_to_kernel() must know the source is a user address,
so we could add MCE_IN_KERNEL_COPYIN for the MCE_SAFE types.

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..63e94484c5d6 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -293,12 +293,11 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	case EX_TYPE_COPY:
 		if (!copy_user)
 			return IN_KERNEL;
-		m->kflags |= MCE_IN_KERNEL_COPYIN;
 		fallthrough;
 
 	case EX_TYPE_FAULT_MCE_SAFE:
 	case EX_TYPE_DEFAULT_MCE_SAFE:
-		m->kflags |= MCE_IN_KERNEL_RECOV;
+		m->kflags |= MCE_IN_KERNEL_RECOV | MCE_IN_KERNEL_COPYIN;
 		return IN_KERNEL_RECOV;
 
 	default:

Then we could drop memory_failure_queue(pfn, flags) from the COW/KSM copy
paths; otherwise every machine-check-safe memory copy will need a
memory_failure_xx() call.

+Thomas, who added the two types: could you share some comments about
this? Thanks.

> In the dax case, if the source address is poisoned, and we do follow up
> with memory_failure_queue(pfn, flags), what should the value of the
> 'flags' be ?

I think flags = 0 is enough for all copy_mc_xxx callers to isolate the
poisoned page.

Thanks.
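
To make the intent of the diff above easier to check, here is a rough
userspace model of the flag assignment before and after the change. It is
only a sketch, not kernel code: the flag values, the EX_TYPE_OTHER
placeholder, the "no flags" result for the default case, and the helper
names kflags_before(), kflags_after() and queues_task_work() are all
made up for illustration. Only the switch structure and the
MCE_IN_KERNEL_COPYIN check are taken from the snippets quoted in this
thread.

```c
#include <stdbool.h>

/* Illustrative flag values; the kernel's real definitions may differ. */
enum {
	MCE_IN_KERNEL_RECOV  = 1 << 0,
	MCE_IN_KERNEL_COPYIN = 1 << 1,
};

/* Only the fixup types named in the diff; the rest are lumped together. */
enum fixup_type {
	EX_TYPE_OTHER,		/* hypothetical placeholder for "default:" */
	EX_TYPE_COPY,
	EX_TYPE_FAULT_MCE_SAFE,
	EX_TYPE_DEFAULT_MCE_SAFE,
};

/* Model of the switch before the change: COPYIN only for EX_TYPE_COPY. */
static unsigned int kflags_before(enum fixup_type type, bool copy_user)
{
	unsigned int kflags = 0;

	switch (type) {
	case EX_TYPE_COPY:
		if (!copy_user)
			return 0;
		kflags |= MCE_IN_KERNEL_COPYIN;
		/* fall through */
	case EX_TYPE_FAULT_MCE_SAFE:
	case EX_TYPE_DEFAULT_MCE_SAFE:
		kflags |= MCE_IN_KERNEL_RECOV;
		return kflags;
	default:
		return 0;	/* assumption: no flags set here */
	}
}

/* Model of the switch after the change: COPYIN for the MCE_SAFE types too. */
static unsigned int kflags_after(enum fixup_type type, bool copy_user)
{
	switch (type) {
	case EX_TYPE_COPY:
		if (!copy_user)
			return 0;
		/* fall through */
	case EX_TYPE_FAULT_MCE_SAFE:
	case EX_TYPE_DEFAULT_MCE_SAFE:
		return MCE_IN_KERNEL_RECOV | MCE_IN_KERNEL_COPYIN;
	default:
		return 0;	/* assumption: no flags set here */
	}
}

/* Mirrors the quoted check that guards queue_task_work(..., kill_me_never). */
static bool queues_task_work(unsigned int kflags)
{
	return (kflags & MCE_IN_KERNEL_COPYIN) != 0;
}
```

In this model, queues_task_work(kflags_before(EX_TYPE_DEFAULT_MCE_SAFE, false))
is false while the same call with kflags_after() is true, and the
EX_TYPE_COPY results are unchanged. That is the behaviour change I am
asking about; whether it is actually correct for every MCE_SAFE user is
the open question.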