linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Joao Martins <joao.m.martins@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jane Chu <jane.chu@oracle.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [RFC] Missing compound_head() in memory-failure
Date: Tue, 1 Feb 2022 18:34:03 +0000	[thread overview]
Message-ID: <dceac808-cb08-53db-b9fb-4e5502c25008@oracle.com> (raw)
In-Reply-To: <YflyKDTO4kC4FbVj@casper.infradead.org>

On 2/1/22 17:47, Matthew Wilcox wrote:
> On Tue, Feb 01, 2022 at 05:40:57PM +0000, Joao Martins wrote:
>> On 2/1/22 16:01, Joao Martins wrote:
>>> On 2/1/22 15:46, Matthew Wilcox wrote:
>>>> On Mon, Jan 31, 2022 at 08:54:39PM +0000, Joao Martins wrote:
>>>>> On 1/31/22 20:29, Matthew Wilcox wrote:
>>>>>> Unless I am mistaken, you have to pass the compound head of the page
>>>>>> which has the error to collect_procs().  Am I mistaken?
>>>>>>
>>>>> -rc2 already has a fix for it:
>>>>>
>>>>> https://lore.kernel.org/linux-mm/20220129021420.PgBIZm-q9%25akpm@linux-foundation.org/
>>>>>
>>>>> Earlier in that function there's a:
>>>>>
>>>>> 	page = compound_head(page);
>>>>>
>>>>> So the @page passed to collect_procs() already is a head page.
>>>>
>>>> It's wrong though ;-(  You set the HWPoison bit on the page after
>>>> calling compound_head(), so you set the bit on the head page instead
>>>> of the precise page that had the poison.
>>>>
>>> Considering that on device-dax we would unmap the whole 2M page regardless
>>> of the poisoned subpage isn't that actually representative still?
>>>
>>
>> To say this another way. We do set the HWPoison on the head page,
>> and not the subpage as you say, but we end up propagating the resultant
>> MCE action on the superset of pages in the whole PMD or PUD.
>>
>> What I was trying to say in perhaps a convoluted way is that device-dax
>> case isn't different than HugeTLB that only wants to poison head. If there's
>> a head page, there's likely a PMD or PUD populated (depending what the device
>> was onlined with) and thus that's what gets unmapped. There's no idea of
>> subpages being treated any differently, at least as far as device-dax is
>> concerned -- unless I miss auditing some other code path.
> 
> Using HugeTLB as a model is not a good idea.  THP is the model to
> aim for; one can choose to map pages askew (ie not aligned with a
> PMD), and I don't think you'll find all the mappings with the current
> code (eg if someone has mapped a single page of the hugepage).
> 

Don't know if that last sentence referred to THP or DAX -- but to be extra sure,
on device-dax current code you can't map or remap subpages of a hugepage.
Even if you open a device-dax fd and touch just a subpage, what gets mapped
in the user mm is the entirety of the PMD (or PUD). You can't even map a
single random page, it always needs to be aligned /at a minimum/ to the
namespace boundary (which is the page table granularity i.e. PTE, PMD or PUD).

>> fsdax IIUC seems to rely more on the subpage bit being flagged but no
>> functional change here for fsdax as there's only base pages there (no heads).
>>
>>>> I'm fixing this up as part of the folio patches, but you may wish to
>>>> fix it earlier than that.
>> (+Dan in case I misrepresented or missed something)
>>
>> Should we deem it a problem, I'll fix for the next -rc.
>> Just in case, here's the diff stashed:
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 2e2f740c63dc..661c23df8115 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1577,7 +1577,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>                 struct dev_pagemap *pgmap)
>>  {
>> -       struct page *page = pfn_to_page(pfn);
>> +       struct page *page = pfn_to_page(pfn), *subpage = page;
>>         unsigned long size = 0;
>>         struct to_kill *tk;
>>         LIST_HEAD(tokill);
>> @@ -1631,7 +1631,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>          * Use this flag as an indication that the dax page has been
>>          * remapped UC to prevent speculative consumption of poison.
>>          */
>> -       SetPageHWPoison(page);
>> +       SetPageHWPoison(subpage);
>>
>>         /*
>>          * Unlike System-RAM there is no possibility to swap in a


  reply	other threads:[~2022-02-01 18:34 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-31 20:29 Matthew Wilcox
2022-01-31 20:54 ` Joao Martins
2022-02-01  2:34   ` Matthew Wilcox
2022-02-01 15:46   ` Matthew Wilcox
2022-02-01 16:01     ` Joao Martins
2022-02-01 17:40       ` Joao Martins
2022-02-01 17:47         ` Matthew Wilcox
2022-02-01 18:34           ` Joao Martins [this message]
2022-02-01 21:11     ` Jane Chu
2022-02-01 22:04       ` Joao Martins
2022-02-01 22:20         ` Dan Williams
2022-02-01 23:04           ` Jane Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dceac808-cb08-53db-b9fb-4e5502c25008@oracle.com \
    --to=joao.m.martins@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=jane.chu@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox