Subject: Re: [PATCH v3 1/3] mm: handle poisoning of pfn without struct pages
From: Miaohe Lin
Date: Fri, 24 Oct 2025 14:34:09 +0800
Message-ID: <421a271d-ef7b-7c00-830c-85f18a6a7afa@huawei.com>
In-Reply-To: <20251021102327.199099-2-ankita@nvidia.com>
References: <20251021102327.199099-1-ankita@nvidia.com> <20251021102327.199099-2-ankita@nvidia.com>

On 2025/10/21 18:23, ankita@nvidia.com wrote:
> From: Ankit Agrawal
>
> The kernel MM currently does not handle ECC errors / poison on a memory
> region that is not backed by struct pages. If a memory region mapped
> using remap_pfn_range() for example, but not added to the kernel, MM
> will not have associated struct pages. Add a new mechanism to handle
> memory failure on such memory.
>
> Make kernel MM expose a function to allow modules managing the device
> memory to register the device memory SPA and the address space associated
> it. MM maintains this information as an interval tree. On poison, MM can
> search for the range that the poisoned PFN belong and use the address_space
> to determine the mapping VMA.
>
> In this implementation, kernel MM follows the following sequence that is
> largely similar to the memory_failure() handler for struct page backed
> memory:
> 1. memory_failure() is triggered on reception of a poison error. An
> absence of struct page is detected and consequently memory_failure_pfn()
> is executed.
> 2. memory_failure_pfn() collects the processes mapped to the PFN.
> 3. memory_failure_pfn() sends SIGBUS to all the processes mapping the
> poisoned PFN using kill_procs().
>
> Note that there is one primary difference versus the handling of the
> poison on struct pages, which is to skip unmapping to the faulty PFN.
> This is done to handle the huge PFNMAP support added recently [1] that
> enables VM_PFNMAP vmas to map in either PMD level. Otherwise, a poison
> to a PFN would need breaking the PMD mapping into PTEs to unmap only
> the poisoned PFN. This will have a major performance impact.
>
> Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
>
> Signed-off-by: Ankit Agrawal

Thanks for your patch. Some comments below.

> ---
>  MAINTAINERS                    |   1 +
>  include/linux/memory-failure.h |  17 +++++
>  include/linux/mm.h             |   1 +
>  include/ras/ras_event.h        |   1 +
>  mm/Kconfig                     |   1 +
>  mm/memory-failure.c            | 128 ++++++++++++++++++++++++++++++++-
>  6 files changed, 148 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/memory-failure.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 520fb4e379a3..463d062d0386 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11359,6 +11359,7 @@ M: Miaohe Lin
>  R: Naoya Horiguchi
>  L: linux-mm@kvack.org
>  S: Maintained
> +F: include/linux/memory-failure.h
>  F: mm/hwpoison-inject.c
>  F: mm/memory-failure.c
>
> diff --git a/include/linux/memory-failure.h b/include/linux/memory-failure.h
> new file mode 100644
> index 000000000000..bc326503d2d2
> --- /dev/null
> +++ b/include/linux/memory-failure.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_MEMORY_FAILURE_H
> +#define _LINUX_MEMORY_FAILURE_H
> +
> +#include
> +
> +struct pfn_address_space;

Do we need this declaration?

> +
> +struct pfn_address_space {
> +	struct interval_tree_node node;
> +	struct address_space *mapping;
> +};
> +
> +static int memory_failure_pfn(unsigned long pfn, int flags)
> +{
> +	struct interval_tree_node *node;
> +	LIST_HEAD(tokill);
> +
> +	mutex_lock(&pfn_space_lock);
> +	/*
> +	 * Modules registers with MM the address space mapping to the device memory they
> +	 * manage. Iterate to identify exactly which address space has mapped to this
> +	 * failing PFN.
> +	 */
> +	for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
> +	     node = interval_tree_iter_next(node, pfn, pfn)) {
> +		struct pfn_address_space *pfn_space =
> +			container_of(node, struct pfn_address_space, node);
> +
> +		collect_procs_pfn(pfn_space->mapping, pfn, &tokill);
> +	}
> +	mutex_unlock(&pfn_space_lock);
> +
> +	/*
> +	 * Unlike System-RAM there is no possibility to swap in a different
> +	 * physical page at a given virtual address, so all userspace
> +	 * consumption of direct PFN memory necessitates SIGBUS (i.e.
> +	 * MF_MUST_KILL)
> +	 */
> +	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
> +
> +	kill_procs(&tokill, true, pfn, flags);
> +

If the pfn doesn't belong to any registered address space mapping, is it
still counted as MF_RECOVERED?

> +	return action_result(pfn, MF_MSG_PFN_MAP, MF_RECOVERED);
> +}
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -2259,6 +2380,11 @@ int memory_failure(unsigned long pfn, int flags)
>  	if (!(flags & MF_SW_SIMULATED))
>  		hw_memory_failure = true;
>
> +	if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {

It would be better to add a comment explaining this case.

> +		res = memory_failure_pfn(pfn, flags);
> +		goto unlock_mutex;
> +	}
> +
>  	p = pfn_to_online_page(pfn);
>  	if (!p) {
>  		res = arch_memory_failure(pfn, flags);

Can we move the memory_failure_pfn block above down to here? I'm worried
that too many scenario branches might lead to confusion.

Thanks.
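
P.S. To make the MF_RECOVERED question above a bit more concrete, here is a
rough userspace sketch, not kernel code and not a proposed patch. It only
models the control flow I have in mind: remember whether any registered
range actually contained the pfn, and report something other than
"recovered" when none did. All names in it (pfn_range_model,
count_matches(), memory_failure_pfn_model(), the MODEL_MF_* values) are made
up for illustration, the linear scan merely stands in for the
interval_tree_iter_first()/interval_tree_iter_next() loop, and whether the
no-match case should map to MF_IGNORED, MF_FAILED or an error return is an
open choice that I leave to you.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for illustration only, NOT the kernel's enum mf_result. */
enum mf_result_model { MODEL_MF_IGNORED, MODEL_MF_FAILED, MODEL_MF_RECOVERED };

/* Hypothetical model of one registered device memory range. */
struct pfn_range_model {
	unsigned long start;	/* first PFN of the range, inclusive */
	unsigned long last;	/* last PFN of the range, inclusive */
};

/*
 * Count how many registered ranges contain @pfn. A plain linear scan is
 * used here in place of the interval tree lookup in the patch.
 */
static size_t count_matches(const struct pfn_range_model *ranges, size_t n,
			    unsigned long pfn)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		assert(ranges[i].start <= ranges[i].last);
		if (ranges[i].start <= pfn && pfn <= ranges[i].last)
			hits++;
	}
	return hits;
}

/*
 * Model of the suggested flow: if nobody registered the pfn there is
 * nothing to collect or kill, so do not report the event as recovered.
 */
static enum mf_result_model
memory_failure_pfn_model(const struct pfn_range_model *ranges, size_t n,
			 unsigned long pfn)
{
	if (count_matches(ranges, n, pfn) == 0)
		return MODEL_MF_IGNORED;	/* assumption: could also be FAILED */

	/* In the real code, collect_procs_pfn() and kill_procs() run here. */
	return MODEL_MF_RECOVERED;
}
```

With two overlapping ranges registered, a pfn inside either one would still
be reported as recovered, while a pfn outside both (or an empty registry)
would not.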