Re: [PATCH v2 1/1] mm/ksm: recover from memory failure on KSM page by migrating to healthy duplicate

From: Long long Xia <xialonglong2025@163.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: markus.elfring@web.de, nao.horiguchi@gmail.com,
	akpm@linux-foundation.org, wangkefeng.wang@huawei.com,
	qiuxu.zhuo@intel.com, xu.xin16@zte.com.cn,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Longlong Xia <xialonglong@kylinos.cn>,
	david@redhat.com, lance.yang@linux.dev
Subject: Re: [PATCH v2 1/1] mm/ksm: recover from memory failure on KSM page by migrating to healthy duplicate
Date: Wed, 29 Oct 2025 15:12:45 +0800
Message-ID: <394cb428-c37b-44c7-8367-4f76514a6322@163.com>
In-Reply-To: <db70f612-bbb1-0f9a-3dd6-884b1d64ab61@huawei.com>

Thanks for the reply.


On 2025/10/29 14:40, Miaohe Lin wrote:
> On 2025/10/28 15:54, Long long Xia wrote:
>> Thanks for the reply.
>>
>> On 2025/10/23 19:54, Miaohe Lin wrote:
>>> On 2025/10/16 18:18, Longlong Xia wrote:
>>>> From: Longlong Xia <xialonglong@kylinos.cn>
>>>>
>>>> When a hardware memory error occurs on a KSM page, the current
>>>> behavior is to kill all processes mapping that page. This can
>>>> be overly aggressive when KSM has multiple duplicate pages in
>>>> a chain where other duplicates are still healthy.
>>>>
>>>> This patch introduces a recovery mechanism that attempts to
>>>> migrate mappings from the failing KSM page to a newly
>>>> allocated KSM page or another healthy duplicate already
>>>> present in the same chain, before falling back to the
>>>> process-killing procedure.
>>>>
>>>> The recovery process works as follows:
>>>> 1. Identify if the failing KSM page belongs to a stable node chain.
>>>> 2. Locate a healthy duplicate KSM page within the same chain.
>>>> 3. For each process mapping the failing page:
>>>>      a. Attempt to allocate a new KSM page copy from healthy duplicate
>>>>         KSM page. If successful, migrate the mapping to this new KSM page.
>>>>      b. If allocation fails, migrate the mapping to the existing healthy
>>>>         duplicate KSM page.
>>>> 4. If all migrations succeed, remove the failing KSM page from the chain.
>>>> 5. Only if recovery fails (e.g., no healthy duplicate found or migration
>>>>      error) does the kernel fall back to killing the affected processes.
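The five steps above can be sketched as a small user-space toy model. This is only an illustration, not the patch code: `page_model`, `find_healthy_dup()` and `recover_model()` are hypothetical names, the chain is a plain array, a mapping is a pointer to the page it maps, and a single caller-supplied `fresh` page (NULL on allocation failure) stands in for the newly allocated copy. Migration errors are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a KSM page in a stable node chain (not a kernel structure). */
struct page_model {
    bool poisoned;   /* a hardware memory error was reported on this page */
    bool in_chain;   /* still linked into the chain */
};

enum recover_result { RECOVERED, FALLBACK_KILL };

/* Step 2: find a healthy duplicate other than the failing page. */
static struct page_model *find_healthy_dup(struct page_model *chain, size_t n,
                                           const struct page_model *failing)
{
    for (size_t i = 0; i < n; i++) {
        if (&chain[i] != failing && chain[i].in_chain && !chain[i].poisoned)
            return &chain[i];
    }
    return NULL;
}

/*
 * Steps 3-5. maps[i] is the page that mapping i points at. fresh models the
 * result of allocating a new copy: NULL means the allocation failed.
 */
static enum recover_result recover_model(struct page_model *chain, size_t n,
                                         struct page_model *failing,
                                         struct page_model **maps, size_t nmaps,
                                         struct page_model *fresh)
{
    struct page_model *healthy = find_healthy_dup(chain, n, failing);

    if (!healthy)
        return FALLBACK_KILL;              /* step 5: nothing to migrate to */

    for (size_t i = 0; i < nmaps; i++) {
        if (maps[i] != failing)
            continue;                      /* mapping is not affected */
        maps[i] = fresh ? fresh : healthy; /* step 3a, otherwise step 3b */
    }

    failing->in_chain = false;             /* step 4: drop the failing page */
    return RECOVERED;
}
```

In this model the kill fallback is taken only when no healthy duplicate exists, which is the case the patch description names explicitly.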
>>>>
>>>> Signed-off-by: Longlong Xia <xialonglong@kylinos.cn>
>>> Thanks for your patch. Some comments below.
>>>
>>>> ---
>>>>    mm/ksm.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 246 insertions(+)
>>>>
>>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>>> index 160787bb121c..9099bad1ab35 100644
>>>> --- a/mm/ksm.c
>>>> +++ b/mm/ksm.c
>>>> @@ -3084,6 +3084,246 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
>>>>    }
>>>>      #ifdef CONFIG_MEMORY_FAILURE
>>>> +static struct ksm_stable_node *find_chain_head(struct ksm_stable_node *dup_node)
>>>> +{
>>>> +    struct ksm_stable_node *stable_node, *dup;
>>>> +    struct rb_node *node;
>>>> +    int nid;
>>>> +
>>>> +    if (!is_stable_node_dup(dup_node))
>>>> +        return NULL;
>>>> +
>>>> +    for (nid = 0; nid < ksm_nr_node_ids; nid++) {
>>>> +        node = rb_first(root_stable_tree + nid);
>>>> +        for (; node; node = rb_next(node)) {
>>>> +            stable_node = rb_entry(node,
>>>> +                    struct ksm_stable_node,
>>>> +                    node);
>>>> +
>>>> +            if (!is_stable_node_chain(stable_node))
>>>> +                continue;
>>>> +
>>>> +            hlist_for_each_entry(dup, &stable_node->hlist,
>>>> +                    hlist_dup) {
>>>> +                if (dup == dup_node)
>>>> +                    return stable_node;
>>>> +            }
May I add cond_resched() here?
>>>> +        }
>>>> +    }
>>> Would above multiple loops take a long time in some corner cases?
>> Thanks for the concern.
>>
>> I ran some simple tests.
>>
>> Test 1: 10 Virtual Machines (Real-world Scenario)
>> Environment: 10 VMs (256MB each) with KSM enabled
>>
>> KSM State:
>> pages_sharing: 262,802 (≈1GB)
>> pages_shared: 17,374 (≈68MB)
>> pages_unshared: 124,057 (≈485MB)
>> total ≈1.5GB
>> chain_count: 9, not_chain_count: 17,152
>> Red-black tree nodes to traverse:
>> 17,161 (9 chains + 17,152 non-chains)
>>
>> Performance:
>> find_chain: 898 μs (0.9 ms)
>> collect_procs_ksm: 4,409 μs (4.4 ms)
>> Total memory failure handling: 6,135 μs (6.1 ms)
>>
>>
>> Test 2: 10GB Single Process (Extreme Case)
>> Environment: Single process with 10GB memory,
>> 1,310,720 page pairs (each pair identical, different from others)
>>
>> KSM State:
>> pages_sharing: 1,311,740 (≈5GB)
>> pages_shared: 1,310,724 (≈5GB)
>> pages_unshared: 0
>> total ≈10GB
>> Red-black tree nodes to traverse:
>> 1,310,721 (1 chain + 1,310,720 non-chains)
>>
>> Performance:
>> find_chain: 28,822 μs (28.8 ms)
>> collect_procs_ksm: 45,944 μs (45.9 ms)
>> Total memory failure handling: 46,594 μs (46.6 ms)
> Thanks for your test.
>
>> Summary:
>> The find_chain latency grows with the number of red-black tree nodes, somewhat less
>> than linearly in these two tests: a 76x increase in nodes (17,161 → 1,310,721)
>> increased latency by 32x (898 μs → 28,822 μs). In Test 2, find_chain accounts for
>> 62% of the total memory failure handling time (28.8 ms of 46.6 ms).
>> However, since memory failures are rare events, this latency may be acceptable:
>> it does not affect normal system performance, only the error recovery path.
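As a rough cross-check of the figures in the summary, the ratios can be recomputed from the raw measurements of the two tests. This is a minimal sketch: `ksm_sample`, `ratio()`, `ns_per_node()` and `print_summary()` are names made up for the illustration, and only the numbers are taken from the test results above.

```c
#include <assert.h>
#include <stdio.h>

/* Measurements copied from the two tests quoted above. */
struct ksm_sample {
    long rb_nodes;       /* red-black tree nodes traversed */
    long find_chain_us;  /* find_chain latency, microseconds */
    long total_us;       /* total memory failure handling, microseconds */
};

static const struct ksm_sample test1 = { 17161, 898, 6135 };
static const struct ksm_sample test2 = { 1310721, 28822, 46594 };

/* Ratio of two counts as a double. */
static double ratio(long num, long den)
{
    return (double)num / (double)den;
}

/* Average find_chain cost per red-black tree node, in nanoseconds. */
static double ns_per_node(const struct ksm_sample *s)
{
    return 1000.0 * (double)s->find_chain_us / (double)s->rb_nodes;
}

/* Print the derived figures: growth factors, share of total, per-node cost. */
static void print_summary(void)
{
    printf("node growth:     %.1fx\n", ratio(test2.rb_nodes, test1.rb_nodes));
    printf("latency growth:  %.1fx\n",
           ratio(test2.find_chain_us, test1.find_chain_us));
    printf("share of total:  %.0f%% (test 2)\n",
           100.0 * ratio(test2.find_chain_us, test2.total_us));
    printf("per-node cost:   %.1f ns (test 1), %.1f ns (test 2)\n",
           ns_per_node(&test1), ns_per_node(&test2));
}
```

Calling `print_summary()` from a `main()` reproduces the 76x, 32x and 62% figures. The per-node cost comes out lower in Test 2 than in Test 1, so the growth between these two data points is below linear.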
>>
> IMHO, a kernel function must not run for too long without any scheduling points.
> Otherwise it may affect the normal scheduling of the system and lead to something like
> performance fluctuation. Or am I missing something?
>
> Thanks.
> .

I will add cond_resched() in the red-black tree loop to allow
scheduling in find_chain(). Would that be enough?
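To make the placement concrete, here is a user-space analogue of the traversal with one scheduling point per tree node. It is a sketch under assumptions, not the kernel code: the stable tree is flattened into an array, `tree_node`, `dup_node`, `resched_point()` and `find_chain_head_sketch()` are hypothetical names, and POSIX sched_yield() stands in for cond_resched() (which, unlike sched_yield(), only reschedules when needed). One detail the sketch tries to show: the scheduling point sits where non-chain nodes reach it too, because a `continue` for non-chain nodes would jump straight to the next iteration and skip a call placed after the inner loop.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* One duplicate hanging off a chain (toy stand-in for a stable node dup). */
struct dup_node {
    struct dup_node *next;
};

/* One stable tree entry; is_chain marks entries that carry a dup list. */
struct tree_node {
    int is_chain;
    struct dup_node *dups;
};

/* Counts how often the traversal offered to give up the CPU. */
static unsigned long yield_points;

/* User-space stand-in for cond_resched(); an assumption of this sketch. */
static void resched_point(void)
{
    yield_points++;
    sched_yield();
}

/* Return the chain that contains target, or NULL if no chain does. */
static struct tree_node *find_chain_head_sketch(struct tree_node *nodes,
                                                size_t n,
                                                const struct dup_node *target)
{
    for (size_t i = 0; i < n; i++) {
        struct tree_node *tn = &nodes[i];

        if (tn->is_chain) {
            for (struct dup_node *d = tn->dups; d; d = d->next) {
                if (d == target)
                    return tn;
            }
        }
        /* Reached for chain and non-chain nodes alike. */
        resched_point();
    }
    return NULL;
}
```

With mostly non-chain nodes in the tree, as in Test 2, this keeps the number of nodes visited between two scheduling points at one.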

Best regards,
Longlong Xia

Thread overview: 17+ messages
2025-10-16 10:18 [PATCH v2 0/1] " Longlong Xia
2025-10-16 10:18 ` [PATCH v2 1/1] " Longlong Xia
2025-10-16 14:37   ` [PATCH v2] " Markus Elfring
2025-10-17  3:09   ` [PATCH v2 1/1] " kernel test robot
2025-10-23 11:54   ` Miaohe Lin
2025-10-28  7:54     ` Long long Xia
2025-10-29  6:40       ` Miaohe Lin
2025-10-29  7:12         ` Long long Xia [this message]
2025-10-30  2:56           ` Miaohe Lin
2025-10-28  9:44   ` David Hildenbrand
2025-11-03 15:15     ` [PATCH v3 0/2] mm/ksm: try " Longlong Xia
2025-11-03 15:16       ` [PATCH v3 1/2] mm/ksm: add helper to allocate and initialize stable node duplicates Longlong Xia
2025-11-03 15:16       ` [PATCH v3 2/2] mm/ksm: try recover from memory failure on KSM page by migrating to healthy duplicate Longlong Xia
2025-10-16 10:46 ` [PATCH v2 0/1] mm/ksm: " David Hildenbrand
2025-10-21 14:00   ` Long long Xia
2025-10-23 16:16     ` David Hildenbrand
2025-10-16 11:01 ` Markus Elfring
