linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Jinjiang Tu <tujinjiang@huawei.com>,
	akpm@linux-foundation.org, linmiaohe@huawei.com
Cc: linux-mm@kvack.org, wangkefeng.wang@huawei.com, Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
Date: Wed, 11 Jun 2025 11:24:43 +0200	[thread overview]
Message-ID: <62e1f100-0e0e-40bc-9dc3-fcaf8f8d343f@redhat.com> (raw)
In-Reply-To: <a5b77c94-bc8f-4a79-9e45-95dffbaaf280@redhat.com>

On 11.06.25 11:20, David Hildenbrand wrote:
> On 11.06.25 11:00, Jinjiang Tu wrote:
>>
>>> On 2025/6/11 16:35, David Hildenbrand wrote:
>>> On 11.06.25 10:29, Jinjiang Tu wrote:
>>>>
>>>>> On 2025/6/11 15:59, David Hildenbrand wrote:
>>>>> On 11.06.25 09:46, Jinjiang Tu wrote:
>>>>>> In shrink_folio_list(), the hwpoisoned folio may be a large folio,
>>>>>> which can't be handled by unmap_poisoned_folio().
>>>>>>
>>>>>> Since UCE is rare in the real world, and a race with reclamation is
>>>>>> even rarer, just skipping the hwpoisoned large folio is enough.
>>>>>> memory_failure() will handle it if the UCE is triggered again.
>>>>>>
>>>>>> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
>>>>>
>>>>> Please also add
>>>>>
>>>>> Closes:
>>>>>
>>>>> with a link to the report
>>>> Thanks, I will add it.
>>>>>
>>>>>> Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
>>>>>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>>>>>> ---
>>>>>>      mm/vmscan.c | 8 ++++++++
>>>>>>      1 file changed, 8 insertions(+)
>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>> index b6f4db6c240f..3a4e8d7419ae 100644
>>>>>> --- a/mm/vmscan.c
>>>>>> +++ b/mm/vmscan.c
>>>>>> @@ -1131,6 +1131,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>>>>                  goto keep;
>>>>>>                if (folio_contain_hwpoisoned_page(folio)) {
>>>>>> +            /*
>>>>>> +             * unmap_poisoned_folio() can't handle a large
>>>>>> +             * folio, so just skip it. memory_failure() will
>>>>>> +             * handle it if the UCE is triggered again.
>>>>>> +             */
>>>>>> +            if (folio_test_large(folio))
>>>>>> +                goto keep_locked;
>>>>>> +
>>>>>>                  unmap_poisoned_folio(folio, folio_pfn(folio), false);
>>>>>>                  folio_unlock(folio);
>>>>>>                  folio_put(folio);
>>>>>
>>>>> Why not handle that in unmap_poisoned_folio(), to make that limitation
>>>>> clear and avoid it?
>>>> I tried to put the check in unmap_poisoned_folio(), but other issues
>>>> still exist.
>>>
>>>
>>>
>>>> The call trace in the v6.6 kernel:
>>>>
>>>> Unable to handle kernel paging request at virtual address fbd5200000000024
>>>> KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
>>>> pc : __list_add_valid_or_report+0x50/0x158 lib/list_debug.c:32
>>>> lr : __list_add_valid include/linux/list.h:88 [inline]
>>>> lr : __list_add include/linux/list.h:150 [inline]
>>>> lr : list_add_tail include/linux/list.h:183 [inline]
>>>> lr : lru_add_page_tail.constprop.0+0x4ac/0x640 mm/huge_memory.c:3187
>>>> Call trace:
>>>>      __list_add_valid_or_report+0x50/0x158 lib/list_debug.c:32
>>>>      __list_add_valid include/linux/list.h:88 [inline]
>>>>      __list_add include/linux/list.h:150 [inline]
>>>>      list_add_tail include/linux/list.h:183 [inline]
>>>>      lru_add_page_tail.constprop.0+0x4ac/0x640 mm/huge_memory.c:3187
>>>>      __split_huge_page_tail.isra.0+0x344/0x508 mm/huge_memory.c:3286
>>>>      __split_huge_page+0x244/0x1270 mm/huge_memory.c:3317
>>>>      split_huge_page_to_list_to_order+0x1038/0x1620 mm/huge_memory.c:3625
>>>>      split_folio_to_list_to_order include/linux/huge_mm.h:638 [inline]
>>>>      split_folio_to_order include/linux/huge_mm.h:643 [inline]
>>>>      deferred_split_scan+0x5f8/0xb70 mm/huge_memory.c:3778
>>>>      do_shrink_slab+0x2a0/0x828 mm/vmscan.c:927
>>>>      shrink_slab_memcg+0x2c0/0x558 mm/vmscan.c:996
>>>>      shrink_slab+0x228/0x250 mm/vmscan.c:1075
>>>>      shrink_node_memcgs+0x34c/0x6a0 mm/vmscan.c:6630
>>>>      shrink_node+0x21c/0x1378 mm/vmscan.c:6664
>>>>      shrink_zones.constprop.0+0x24c/0xab0 mm/vmscan.c:6906
>>>>      do_try_to_free_pages+0x150/0x880 mm/vmscan.c:6968
>>>>
>>>>
>>>> The folio is deleted from the LRU, so folio->lru can no longer be
>>>> accessed. If the folio is split later, lru_add_split_folio() assumes
>>>> the folio is still on the LRU.
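A note on the trace above: the 0xdead... addresses are the kernel's list
poison values. shrink_folio_list() does list_del(&folio->lru) when it takes
a folio off its private list, so an isolated folio's ->lru pointers hold
LIST_POISON values; the later list_add_tail() in lru_add_page_tail() then
dereferences them, which KASAN reports as a wild memory access. A simplified
excerpt, roughly following include/linux/poison.h and include/linux/list.h:

/* POISON_POINTER_DELTA is 0xdead000000000000 on typical 64-bit configs. */
#define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2  ((void *) 0x122 + POISON_POINTER_DELTA)

static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	/* Any later __list_add() against this entry faults on these: */
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}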
>>>
>>> Not sure if something like the following would be appropriate:
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index b91a33fb6c694..fdd58c8ba5254 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1566,6 +1566,9 @@ int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
>>>           enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
>>>           struct address_space *mapping;
>>>
>>> +       if (folio_test_large(folio) && !folio_test_hugetlb(folio))
>>> +               return -EBUSY;
>>> +
>>>           if (folio_test_swapcache(folio)) {
>>>                   pr_err("%#lx: keeping poisoned page in swap cache\n", pfn);
>>>                   ttu &= ~TTU_HWPOISON;
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index f8dfd2864bbf4..6a3426bc9e9d7 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1138,7 +1138,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>                           goto keep;
>>>
>>>                   if (folio_contain_hwpoisoned_page(folio)) {
>>> -                       unmap_poisoned_folio(folio, folio_pfn(folio), false);
>>> +                       if (unmap_poisoned_folio(folio, folio_pfn(folio), false))
>>> +                               list_add(&folio->lru, &ret_folios);
>>>                           folio_unlock(folio);
>>>                           folio_put(folio);
>>>                           continue;
>>
>> So the expected behaviour is to keep the folio on the LRU if
>> unmap_poisoned_folio() fails?
> 
> Good question, it's a mess.
> 
> If we keep the LRU bit cleared (kept isolated), we wouldn't have to add
> it to the list.
> 
> But now I wonder where deferred_split_scan() would check for the LRU flag?
> 
> It seems to trylock the folio and then call split_folio().
> 
> In __folio_split(), I don't find any checks for the lru flag ... :(
> 
> We call lru_add_split_folio() where we
> VM_BUG_ON_FOLIO(folio_test_lru(new_folio), folio);

Oh, that's for the new folio. So the checks in the other two paths apply.

When we come through the deferred-split path, we don't expect a "list" and
consequently assume that the folio is still on the LRU.

	VM_WARN_ON(!folio_test_lru(folio));

So not adding it back to the LRU will be problematic for 
lru_add_split_folio() when called through deferred shrinking, I guess ...
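
For reference, the list handling discussed here looks roughly like the
following in lru_add_split_folio() (lru_add_page_tail() in older kernels);
a simplified sketch, not verbatim upstream code:

static void lru_add_split_folio(struct folio *folio, struct folio *new_folio,
				struct lruvec *lruvec, struct list_head *list)
{
	VM_BUG_ON_FOLIO(folio_test_lru(new_folio), folio);

	if (list) {
		/* Reclaim path: the caller keeps the folios on its own list. */
		VM_WARN_ON(folio_test_lru(folio));
		folio_get(new_folio);
		list_add_tail(&new_folio->lru, list);
	} else {
		/*
		 * Deferred-split path: no private list, so the original folio
		 * is assumed to still be on an LRU list, and the new folio is
		 * linked in right next to it via folio->lru.
		 */
		VM_WARN_ON(!folio_test_lru(folio));
		list_add_tail(&new_folio->lru, &folio->lru);
		folio_set_lru(new_folio);
	}
}

So a folio left isolated (LRU flag clear, ->lru pointers poisoned) hits
exactly that else branch.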

-- 
Cheers,

David / dhildenb



Thread overview: 18+ messages
2025-06-11  7:46 Jinjiang Tu
2025-06-11  7:59 ` David Hildenbrand
2025-06-11  8:29   ` Jinjiang Tu
2025-06-11  8:35     ` David Hildenbrand
2025-06-11  9:00       ` Jinjiang Tu
2025-06-11  9:20         ` David Hildenbrand
2025-06-11  9:24           ` David Hildenbrand [this message]
2025-06-11 14:30             ` Zi Yan
2025-06-11 17:34               ` David Hildenbrand
2025-06-11 17:52                 ` Zi Yan
2025-06-12  7:53                   ` David Hildenbrand
2025-06-12 15:35                     ` Zi Yan
2025-06-12 15:50                       ` David Hildenbrand
2025-06-12 16:48                         ` Zi Yan
2025-06-16 11:34                           ` Jinjiang Tu
2025-06-16 11:33                         ` Jinjiang Tu
2025-06-16 19:27                           ` David Hildenbrand
2025-06-17  6:43                             ` Jinjiang Tu
