在 2025/8/8 16:21, David Hildenbrand 写道:
> On 07.08.25 13:13, Jinjiang Tu wrote:
>>
>> 在 2025/8/6 20:41, David Hildenbrand 写道:
>>> On 06.08.25 05:24, Jinjiang Tu wrote:
>>>>
>>>> 在 2025/8/6 11:05, Miaohe Lin 写道:
>>>>> On 2025/8/6 10:05, Jinjiang Tu wrote:
>>>>>> When memory_failure() is called for a already hwpoisoned pfn,
>>>>>> kill_accessing_process() will be called to kill current task. 
>>>>>> However, if
>>>>> Thanks for your patch.
>>>>>
>>>>>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() 
>>>>>> will skip
>>>>>> the vma in walk_page_test() and return 0.
>>>>>>
>>>>>> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to 
>>>>>> processes
>>>>>> with recovered clean pages"), kill_accessing_process() will 
>>>>>> return EFAULT.
>>>>> I'm not sure but pfn_to_online_page should return NULL for 
>>>>> VM_PFNMAP pages?
>>>>> So memory_failure_dev_pagemap should handle these pages?
>>>>
>>>> We could call remap_pfn_range() for those pfns with struct page. 
>>>> IIUC, VM_PFNMAP
>>>> means we should assume the pfn doesn't have struct page, but it can 
>>>> have.
>>>>
>>>>>> For x86, the current task will be killed in kill_me_maybe().
>>>>>>
>>>>>> However, after this commit, kill_accessing_process() simplies 
>>>>>> return 0,
>>>>>> that means UCE is handled properly, but it doesn't actually. In 
>>>>>> such case,
>>>>>> the user task will trigger UCE infinitely.
>>>>> Did you ever trigger this loop?
>>>>
>>>> Yes. Our test is as follow steps:
>>>> 1) create a user task allocates a clean anonymous page, wihout 
>>>> accessing it.
>>>> 2) use einj to inject UCE for the page
>>>> 3) create task devmem to use /dev/mem to map the pfn and keep 
>>>> accessing it.
>>>
>>> What is the use case for that? It sounds extremely questionable.
>>>
>> This case is only for test, and is strange indeed.
>>
>> But considering another case, a driver may map same RAM pfn to 
>> several processes with remap_pfn_range().
>> If the first task triggers UCE when accessing the pfn, the task will 
>> be killed. But the other tasks couldn't be killed
>> and triggers UCE infinitely.
>
> Yes, the "anon page" example is confusing though. We really just want 
> to test here if the PFN is mapped. And I would agree that your patch 
> is correct in that case.
>
> For memory poisoning handling you really need a "struct page". 
> struct-less memory is only handled in special ways for DAX (see 
> pfn_to_online_page() logic in memory_failure()).
>
> So what you describe here really only works when a process uses 
> remap_pfn_range() to VM_PFNMAP a struct-page-backed PFN.
>
>
> Likely your patch description should be:
>
> "
> mm/memory-failure: fix infinite UCE for VM_PFNMAP'ed page
>
> When memory_failure() is called for an already hardware poisoned page,
> kill_accessing_process() will conditionally send a SIGBUS to the 
> current (triggering) process if it still maps the page.
>
> However, in case the page is not ordinarily mapped, but was mapped 
> through remap_pfn_range(), kill_accessing_process() would not identify 
> it as mapped even though hwpoison_pte_range() would be prepared to 
> handle it, because walk_page_range() will skip VM_PFNMAP as default in 
> walk_page_test().
>
> walk_page_range() will return 0, assuming "not mapped" and the SIGBUS 
> will be skipped. In this case, the user task will trigger UCE 
> infinitely because it will not receive a SIGBUS on access and simply 
> retry.
>
>
> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to 
> processes with recovered clean pages"), kill_accessing_process() would 
> return EFAULT in that case, and on x86, the current task would be 
> killed in kill_me_maybe().
>
> Let's fix it by adding our custom .test_walk callback that will also
> process VM_PFNMAP VMAs.
> "
>
Thanks, I will update the patch description to emphasize the pfn is backed with struct page.