Subject: Re: [PATCH v2 1/1] mm/ksm: recover from memory failure on KSM page by migrating to healthy duplicate
From: Miaohe Lin
To: Longlong Xia
Cc: Longlong Xia
Date: Wed, 29 Oct 2025 14:40:14 +0800
References: <20251016101813.484565-1-xialonglong2025@163.com> <20251016101813.484565-2-xialonglong2025@163.com> <7c069611-21e1-40df-bdb1-a3144c54507e@163.com>
In-Reply-To: <7c069611-21e1-40df-bdb1-a3144c54507e@163.com>

On 2025/10/28 15:54, Longlong Xia wrote:
> Thanks for the reply.
>
> On 2025/10/23 19:54, Miaohe Lin wrote:
>> On 2025/10/16 18:18, Longlong Xia wrote:
>>> From: Longlong Xia
>>>
>>> When a hardware memory error occurs on a KSM page, the current
>>> behavior is to kill all processes mapping that page. This can
>>> be overly aggressive when KSM has multiple duplicate pages in
>>> a chain where other duplicates are still healthy.
>>>
>>> This patch introduces a recovery mechanism that attempts to
>>> migrate mappings from the failing KSM page to a newly
>>> allocated KSM page or another healthy duplicate already
>>> present in the same chain, before falling back to the
>>> process-killing procedure.
>>>
>>> The recovery process works as follows:
>>> 1. Identify whether the failing KSM page belongs to a stable node chain.
>>> 2. Locate a healthy duplicate KSM page within the same chain.
>>> 3. For each process mapping the failing page:
>>>    a. Attempt to allocate a new KSM page copied from the healthy
>>>       duplicate KSM page. If successful, migrate the mapping to this
>>>       new KSM page.
>>>    b. If allocation fails, migrate the mapping to the existing healthy
>>>       duplicate KSM page.
>>> 4. If all migrations succeed, remove the failing KSM page from the chain.
>>> 5. Only if recovery fails (e.g., no healthy duplicate found or migration
>>>    error) does the kernel fall back to killing the affected processes.
>>>
>>> Signed-off-by: Longlong Xia
>>
>> Thanks for your patch. Some comments below.
>>
>>> ---
>>>  mm/ksm.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 246 insertions(+)
>>>
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index 160787bb121c..9099bad1ab35 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -3084,6 +3084,246 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
>>>  }
>>>
>>>  #ifdef CONFIG_MEMORY_FAILURE
>>> +static struct ksm_stable_node *find_chain_head(struct ksm_stable_node *dup_node)
>>> +{
>>> +	struct ksm_stable_node *stable_node, *dup;
>>> +	struct rb_node *node;
>>> +	int nid;
>>> +
>>> +	if (!is_stable_node_dup(dup_node))
>>> +		return NULL;
>>> +
>>> +	for (nid = 0; nid < ksm_nr_node_ids; nid++) {
>>> +		node = rb_first(root_stable_tree + nid);
>>> +		for (; node; node = rb_next(node)) {
>>> +			stable_node = rb_entry(node,
>>> +					struct ksm_stable_node,
>>> +					node);
>>> +
>>> +			if (!is_stable_node_chain(stable_node))
>>> +				continue;
>>> +
>>> +			hlist_for_each_entry(dup, &stable_node->hlist,
>>> +					hlist_dup) {
>>> +				if (dup == dup_node)
>>> +					return stable_node;
>>> +			}
>>> +		}
>>> +	}
>>
>> Would the multiple loops above take a long time in some corner cases?
>
> Thanks for the concern.
>
> I did some simple tests.
>
> Test 1: 10 Virtual Machines (Real-world Scenario)
> Environment: 10 VMs (256MB each) with KSM enabled
>
> KSM State:
> pages_sharing: 262,802 (≈1GB)
> pages_shared: 17,374 (≈68MB)
> pages_unshared: 124,057 (≈485MB)
> total: ≈1.5GB
> chain_count = 9, not_chain_count = 17,152
> Red-black tree nodes to traverse:
> 17,161 (9 chains + 17,152 non-chains)
>
> Performance:
> find_chain: 898 μs (0.9 ms)
> collect_procs_ksm: 4,409 μs (4.4 ms)
> Total memory failure handling: 6,135 μs (6.1 ms)
>
> Test 2: 10GB Single Process (Extreme Case)
> Environment: Single process with 10GB of memory,
> 1,310,720 page pairs (each pair identical, different from all others)
>
> KSM State:
> pages_sharing: 1,311,740 (≈5GB)
> pages_shared: 1,310,724 (≈5GB)
> pages_unshared: 0
> total: ≈10GB
> Red-black tree nodes to traverse:
> 1,310,721 (1 chain + 1,310,720 non-chains)
>
> Performance:
> find_chain: 28,822 μs (28.8 ms)
> collect_procs_ksm: 45,944 μs (45.9 ms)
> Total memory failure handling: 46,594 μs (46.6 ms)

Thanks for your test.
>
> Summary:
> The find_chain function shows approximately linear scaling with the number
> of red-black tree nodes. With a 76x increase in nodes (17,161 → 1,310,721),
> latency increased by 32x (898 μs → 28,822 μs), with find_chain alone
> representing 62% of the total memory failure handling time (46.6 ms).
> However, since memory failures are rare events, this latency may be
> acceptable, as it does not impact normal system performance and only
> affects error recovery paths.

IMHO, the execution time of a kernel function must not be too long without
any scheduling points. Otherwise it may affect the normal scheduling of the
system and lead to something like performance fluctuation. Or am I missing
something?

Thanks.
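
For illustration only, below is a rough sketch (untested, and just a sketch
of the idea, not a tested fix) of the same walk with an explicit scheduling
point added via cond_resched(). Whether sleeping is actually safe at this
point depends on which locks the memory-failure path holds around the call,
so please treat it purely as an illustration:

static struct ksm_stable_node *find_chain_head(struct ksm_stable_node *dup_node)
{
	struct ksm_stable_node *stable_node, *dup;
	struct rb_node *node;
	int nid;

	if (!is_stable_node_dup(dup_node))
		return NULL;

	for (nid = 0; nid < ksm_nr_node_ids; nid++) {
		for (node = rb_first(root_stable_tree + nid); node;
		     node = rb_next(node)) {
			/*
			 * Give the scheduler a chance: on huge stable trees
			 * this walk can take tens of milliseconds (see Test 2
			 * above). Assumes the tree cannot change underneath
			 * us while we sleep, e.g. because the caller holds a
			 * mutex rather than a spinlock.
			 */
			cond_resched();

			stable_node = rb_entry(node, struct ksm_stable_node, node);
			if (!is_stable_node_chain(stable_node))
				continue;

			hlist_for_each_entry(dup, &stable_node->hlist, hlist_dup) {
				if (dup == dup_node)
					return stable_node;
			}
		}
	}

	return NULL;
}

An alternative worth considering would be to store a back-pointer from each
dup to its chain head when the dup is added to the chain, which would make
this lookup O(1) and avoid the full-tree walk entirely.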