linux-mm.kvack.org archive mirror
From: Yang Shi <shy828301@gmail.com>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Linux MM <linux-mm@kvack.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	 Miaohe Lin <linmiaohe@huawei.com>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
Date: Wed, 9 Mar 2022 16:30:50 -0800
Message-ID: <CAHbLzkpWV4oP86oe4BXop20KMJqwEmSkpFmZfT+q38hs90oqKA@mail.gmail.com>
In-Reply-To: <20220310000024.GA1577304@hori.linux.bs1.fc.nec.co.jp>

On Wed, Mar 9, 2022 at 4:01 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote:
> > On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
> > <naoya.horiguchi@linux.dev> wrote:
> > >
> > > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > >
> > > > There is a race condition between memory_failure_hugetlb() and hugetlb
> > > > free/demotion, which causes the PageHWPoison flag to be set on the wrong
> > > > page (one that was a hugetlb page when memory_failure() was called, but was
> > > > removed or demoted by the time memory_failure_hugetlb() is called).  This
> > > > results in killing the wrong processes.  So set the PageHWPoison flag while
> > > > holding the page lock.
> > >
> > > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > ---
> > >  mm/memory-failure.c | 27 ++++++++++++---------------
> > >  1 file changed, 12 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index ac6492e36978..fe25eee8f9d6 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> > >         int res;
> > >         unsigned long page_flags;
> > >
> > > -       if (TestSetPageHWPoison(head)) {
> > > -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > > -                      pfn);
> > > -               res = -EHWPOISON;
> > > -               if (flags & MF_ACTION_REQUIRED)
> > > -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
> > > -               return res;
> > > -       }
> > > -
> > > -       num_poisoned_pages_inc();
> > > -
> > >         if (!(flags & MF_COUNT_INCREASED)) {
> > >                 res = get_hwpoison_page(p, flags);
> >
> > > I'm not an expert on hugetlb, so I may be wrong, but I'm wondering how this
> > > could solve the race. Is the race below still possible?
> >
> > __get_hwpoison_page()
> >   head = compound_head(page)
> >
> > hugetlb demotion (1G --> 2M)
> >   get_hwpoison_huge_page(head, &hugetlb);
>
> Thanks for the comment.
> > I assume Miaohe's patch below introduces an additional check to detect the
> > race.  The patch calls compound_head() for the raw error page again, so
> > the demotion case should be detected.  I'll make the dependency clear in
> > the commit log.
>
> https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/
>
> >
> >
> > Then the head may point to a 2M page, but the hwpoisoned subpage is
> > not in that 2M range?
> >
> >
> > >                 if (!res) {
> > >                         lock_page(head);
> > >                         if (hwpoison_filter(p)) {
> > > -                               if (TestClearPageHWPoison(head))
> > > -                                       num_poisoned_pages_dec();
> > >                                 unlock_page(head);
> > >                                 return -EOPNOTSUPP;
> > >                         }
> > > @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> > >         page_flags = head->flags;
> > >
> > >         if (hwpoison_filter(p)) {
> > > -               if (TestClearPageHWPoison(head))
> > > -                       num_poisoned_pages_dec();
> > >                 put_page(p);
> > >                 res = -EOPNOTSUPP;
> > >                 goto out;
> > >         }
> > >
> > > +       if (TestSetPageHWPoison(head))
> >
> > And I don't think "head" is still the head you expected if the race
> > happened. I think we need to re-retrieve the head once the page
> > refcount is bumped and locked.
>
> I think the above justification works for this.
> When the kernel reaches this line, the hugepage is properly pinned without being
> freed or demoted, so "head" is still pointing to the same head page as expected.

I think Mike's comment in the earlier email works for this too. The
huge page may get demoted before the page is pinned and locked, so the
actual hwpoisoned subpage may belong to another smaller huge page now.
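
To make the concern concrete, here is a rough userspace sketch (not kernel
code, and not what the patch does). All names in it (struct region,
head_of(), demote(), covers(), pin_stable_head()) are made up for
illustration: head_of() only models compound_head(), demote() models a
1G --> 2M demotion, and I'm assuming 4K base pages. It shows that a head
looked up before the demotion no longer covers the poisoned pfn, and that
looking the head up again after the pin/lock step catches this.

```c
#include <stdbool.h>

#define PAGES_PER_1G 262144UL /* 1 GiB / 4 KiB base pages (assumed) */
#define PAGES_PER_2M 512UL    /* 2 MiB / 4 KiB base pages (assumed) */

/* Hypothetical model of one 1G-aligned hugetlb region. */
struct region {
	unsigned long base_pfn;    /* first pfn of the original 1G page */
	unsigned long hpage_pages; /* base pages per huge page right now */
};

/* Models compound_head(): head pfn of the huge page containing pfn. */
static unsigned long head_of(const struct region *r, unsigned long pfn)
{
	unsigned long off = pfn - r->base_pfn;

	return r->base_pfn + (off / r->hpage_pages) * r->hpage_pages;
}

/* Models hugetlb demotion (1G --> 2M). */
static void demote(struct region *r)
{
	r->hpage_pages = PAGES_PER_2M;
}

/* Does the huge page starting at head still contain pfn? */
static bool covers(const struct region *r, unsigned long head,
		   unsigned long pfn)
{
	return pfn >= head && pfn < head + r->hpage_pages;
}

/*
 * Look up the head, "pin and lock" it, then look it up again.
 * If race is true, a demotion sneaks in between the first lookup
 * and the pin, which is the window discussed above.
 */
static unsigned long pin_stable_head(struct region *r, unsigned long pfn,
				     bool race)
{
	for (;;) {
		unsigned long head = head_of(r, pfn);

		if (race) {
			demote(r);
			race = false;
		}
		/* pin + lock would happen here; no demotion after this */
		if (head_of(r, pfn) == head)
			return head;
		/* stale head: unlock, unpin, and retry */
	}
}
```

With base_pfn = 0x40000 and the poisoned pfn at base_pfn + 1000, the head
seen before demotion is 0x40000, but after demotion that 2M page only covers
the first 512 pfns, so the poisoned subpage belongs to the 2M page at
0x40200 instead. pin_stable_head() returns 0x40200 in the racy case because
the second lookup disagrees with the first and it retries.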


>
> Thanks,
> Naoya Horiguchi
>
> >
> > > +               goto already_hwpoisoned;
> > > +
> > > +       num_poisoned_pages_inc();
> > > +
> > >         /*
> > >          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> > >          * simply disable it. In order to make it work properly, we need
> > > @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> > >  out:
> > >         unlock_page(head);
> > >         return res;
> > > +already_hwpoisoned:
> > > +       unlock_page(head);
> > > +       pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > > +       res = -EHWPOISON;
> > > +       if (flags & MF_ACTION_REQUIRED)
> > > +               res = kill_accessing_process(current, page_to_pfn(head), flags);
> > > +       return res;
> > >  }
> > >
> > >  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> > > --
> > > 2.25.1
> > >

