linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Date: Thu, 17 Mar 2022 13:31:34 +0000	[thread overview]
Message-ID: <20220317133133.GA32934@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <7362f9ee-81fa-702a-7a03-1a91ecf0b58e@oracle.com>

On Wed, Mar 16, 2022 at 03:51:35PM -0700, Mike Kravetz wrote:
> On 3/16/22 05:07, Naoya Horiguchi wrote:
> > From: Miaohe Lin <linmiaohe@huawei.com>
> > 
> > There is a race condition between memory_failure_hugetlb() and hugetlb
> > free/demotion, which causes setting PageHWPoison flag on the wrong page.
> > The one simple result is that wrong processes can be killed, but another
> > (more serious) one is that the actual error is left unhandled, so no one
> > prevents later access to it, and that might lead to more serious results
> > like consuming corrupted data.
> > 
> > Think about the below race window:
> > 
> >   CPU 1                                   CPU 2
> >   memory_failure_hugetlb
> >   struct page *head = compound_head(p);
> >                                           hugetlb page might be freed to
> >                                           buddy, or even changed to another
> >                                           compound page.
> > 
> >   get_hwpoison_page -- page is not what we want now...
> > 
> > The compound_head is called outside hugetlb_lock, so the head is not
> > reliable.
> > 
> > So set PageHWPoison flag after passing prechecks. And to detect
> > potential violation, this patch also introduces a new action type
> > MF_MSG_DIFFERENT_PAGE_SIZE.
> 
> Thanks for squashing these patches.
> 
> In my testing, there is a change in behavior that may not be intended.
> 
> My test strategy is:
> - allocate two hugetlb pages
> - create a mapping which reserves those two pages, but does not fault them in
>   - as a result, the pages are on the free list but can not be freed
> - inject error on a subpage of one of the huge pages
>   - echo 0xYYY > /sys/kernel/debug/hwpoison/corrupt-pfn
> - memory error code will call dissolve_free_huge_page
>   - dissolve_free_huge_page returns -EBUSY because
>     h->free_huge_pages - h->resv_huge_pages == 0
> - We never end up setting Poison on the page with error or head page
> - Huge page sitting on free list with error in subpage and not marked
> - huge page with error could be given to an application or returned to buddy
> 
> Prior to this change, Poison would be set on the head page

You're right, this is not intended.
I should've kept current behavior for this case (set PageHWPoison
flag on the head page, and no refcnt taken), so I'll update the patch.

In this case the hwpoisoned hugepage remains in free list, but
dequeue_huge_page_node_exact() checks the PageHWPoison flag not to be
allocated, that's OK.  It might not be optimal that the hwpoisoned free
hugepage is in the list for long, so some mechanism to handle it in a
delayed manner.  One way is to call dissolve_free_huge_page() when a
hwpoisoned page is found in dequeue_huge_page_node_exact().  If the
dissolving succeeds, it's OK.  If it fails, keep it as-is expecting to
be dissolved next time.

Another related problem is that when hwpoison hugepage is listed in
free list, the information about raw error offset is lost.  So if
hugepage pool is shrinked and the hwpoison hugepage is freed to buddy,
the PageHWPoison flag remains on the ex-head page.  So we need somehow
keep error offset.

Anyway, this will be done in separate work, I'll just fix the problem
you mentioned.

Thank you very much,
Naoya Horiguchi

> 
> I do not think this was an intended change in behavior.  But, perhaps it is
> all we can do in this case?  Sorry for not being able to look more closely
> at the code right now.   
> -- 
> Mike Kravetz

      parent reply	other threads:[~2022-03-17 13:31 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-16 12:07 Naoya Horiguchi
2022-03-16 22:51 ` Mike Kravetz
2022-03-17  9:28   ` Miaohe Lin
2022-03-17 14:44     ` HORIGUCHI NAOYA(堀口 直也)
2022-03-17 13:31   ` HORIGUCHI NAOYA(堀口 直也) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220317133133.GA32934@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=shy828301@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox