From: Zi Yan <ziy@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>,
akpm@linux-foundation.org, linmiaohe@huawei.com,
linux-mm@kvack.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
Date: Thu, 12 Jun 2025 12:48:18 -0400 [thread overview]
Message-ID: <45D4230C-F5CC-49B2-B672-C65E9D99BB3F@nvidia.com> (raw)
In-Reply-To: <94438931-d78f-4d5d-be4e-86938225c7c8@redhat.com>
On 12 Jun 2025, at 11:50, David Hildenbrand wrote:
> On 12.06.25 17:35, Zi Yan wrote:
>> On 12 Jun 2025, at 3:53, David Hildenbrand wrote:
>>
>>> On 11.06.25 19:52, Zi Yan wrote:
>>>> On 11 Jun 2025, at 13:34, David Hildenbrand wrote:
>>>>
>>>>>> So __folio_split() has an implicit rule that:
>>>>>> 1. if the given list is not NULL, the folio cannot be on LRU;
>>>>>> 2. if the given list is NULL, the folio is on LRU.
>>>>>>
>>>>>> And the rule is buried deeply in lru_add_split_folio().
>>>>>>
>>>>>> Should we add some checks in __folio_split()?
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index d3e66136e41a..8ce2734c9ca0 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -3732,6 +3732,11 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>> VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>>>>>> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>>>>>>
>>>>>> + if (list && folio_test_lru(folio))
>>>>>> + return -EINVAL;
>>>>>> + if (!list && !folio_test_lru(folio))
>>>>>> + return -EINVAL;
>>>>>> +
>>>>>
>>>>> I guess we currently don't run into that, because whenever a folio is otherwise isolated, there is an additional reference or a page table mapping, so it cannot get split either way (e.g., freezing the refcount fails).
>>>>>
>>>>> So maybe these checks would be too early and they should happen after we froze the refcount?
>>>>
>>>> But if the caller does the isolation, the additional refcount is OK and
>>>> can_split_folio() will return true. In addition, __folio_split() does not
>>>> change folio LRU state, so these two checks are orthogonal to refcount
>>>> check, right? The placement of them does not matter, but earlier the better
>>>> to avoid unnecessary work. I see these are sanity checks for callers.
>>>
>>> In light of the discussion in this thread, if you have someone that takes the folio off the LRU concurrently, I think we could still run into a race here. Because that could happen just after we passed the test in __folio_split().
>>>
>>> That's why I think the test would have to happen when there are no such races possible anymore.
>>
>> Make sense. Thanks for the explanation.
>>
>>>
>>> But the real question is if it is okay to remove the folio from the LRU as done in the patch discussed here ...
>>
>> I just read through the email thread. IIUC, when deferred_split_scan() split
>> a THP, it expects the THP is on LRU list. I think it makes sense since
>> all these THPs are in both the deferred_split_queue and LRU list.
>> And deferred_split_scan() uses split_folio() without providing a list
>> to store the after-split folios.
>>
>> In terms of the patch, since unmap_poisoned_folio() does not handle large
>> folios, why not just split the large folios and add the after-split folios
>> to folio_list?
>
> That's what I raised, but apparently it might not be worth it for that corner case (splitting might fail).
OK, the reason to not split is that handling split failures is too much work
and it is better to be done in memory_failure().
In terms of this patch, returning large poisoned folio back to LRU list
seems to be the right thing to do. My thought is that the reclaim code
is trying to free this folio, but it cannot reclaim a large poisoned folio,
so it puts the folio back like it cannot reclaim an in-use folio.
>
> Then, the while loop will go over all the after-split folios
>> one by one.
>>
>> BTW, unmap_poisoned_folio() is also used in do_migrate_range() from
>> memory_hotplug.c and there is no guard for large folios either. That
>> also needs a fix?
>
> Yes, that was mentioned, and I was hoping we could let unmap_poisoned_folio() check+fail in that case.
For this case, if unmap_poisoned_folio() fails, the whole range cannot
be offline?
Best Regards,
Yan, Zi
next prev parent reply other threads:[~2025-06-12 16:48 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-11 7:46 Jinjiang Tu
2025-06-11 7:59 ` David Hildenbrand
2025-06-11 8:29 ` Jinjiang Tu
2025-06-11 8:35 ` David Hildenbrand
2025-06-11 9:00 ` Jinjiang Tu
2025-06-11 9:20 ` David Hildenbrand
2025-06-11 9:24 ` David Hildenbrand
2025-06-11 14:30 ` Zi Yan
2025-06-11 17:34 ` David Hildenbrand
2025-06-11 17:52 ` Zi Yan
2025-06-12 7:53 ` David Hildenbrand
2025-06-12 15:35 ` Zi Yan
2025-06-12 15:50 ` David Hildenbrand
2025-06-12 16:48 ` Zi Yan [this message]
2025-06-16 11:34 ` Jinjiang Tu
2025-06-16 11:33 ` Jinjiang Tu
2025-06-16 19:27 ` David Hildenbrand
2025-06-17 6:43 ` Jinjiang Tu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45D4230C-F5CC-49B2-B672-C65E9D99BB3F@nvidia.com \
--to=ziy@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linmiaohe@huawei.com \
--cc=linux-mm@kvack.org \
--cc=tujinjiang@huawei.com \
--cc=wangkefeng.wang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox