From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 401A7C433F5 for ; Thu, 10 Mar 2022 00:31:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BDAE08D0002; Wed, 9 Mar 2022 19:31:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B3ADD8D0001; Wed, 9 Mar 2022 19:31:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 98E368D0002; Wed, 9 Mar 2022 19:31:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 8346E8D0001 for ; Wed, 9 Mar 2022 19:31:03 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 565DF24C03 for ; Thu, 10 Mar 2022 00:31:03 +0000 (UTC) X-FDA: 79226596806.02.99BB9C1 Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) by imf15.hostedemail.com (Postfix) with ESMTP id C1202A0015 for ; Thu, 10 Mar 2022 00:31:02 +0000 (UTC) Received: by mail-pf1-f172.google.com with SMTP id t2so997829pfj.10 for ; Wed, 09 Mar 2022 16:31:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=XAK6UQcVdFywQafnZbczeEpSjkkUEsUm28p2DeHOwuM=; b=aoVvvC+wR80TWp9+dvnTBG5a42cYKc+R2euTCMM0LN6Bn/vS8rni+MvQX38gdCnzfS DZU0kDthmOQkY+sUUcFddx3aN0c40RMlhJ7AmQE4LbyIfXUa0qKAy85CHAQtmU5UmtSH EKSmccSOJGNAa4vMPQvBCTEDI8iecqzjbYoQouY4ht3OJH6Ktyo15JtYBJ1Wowax9Yyz r4LIrLXxUwKt1TnsWc4ytKxn2K3VF7ZdWdMWlhy7BFAu026Hdh84P4akKiM9dCpfEwm2 9hRxdNNUcKh+DI69MFRL9RN0yVrcbWYA/nhuNbQ8pRFge3GCArtSQQNk5WUYkqqHsVJy 3GZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=XAK6UQcVdFywQafnZbczeEpSjkkUEsUm28p2DeHOwuM=; b=idEVkU00vLn6SgsHwwJ6BQvBorLCKgs80eodJjZsxdkvPqvAoSHZJ07Esho1/mG9QY cyj2aWrefXZK8OJrDo4oAUqdVGhhxm4b9xG80uui686kr2d1L+N+9LiqJM6Nkyqt+4l8 spoLeLtH5F38bjwQWYyrgwYNZOoEbPgcJ1cr7BxCAGjJR5F+rgkfIezUZ2lV9HbWt2iy qgDMwMCj+xN8HIFdN3Pyhkx3hduHkEXkzrG1W/eOIwkjPj4hZ2/1HbevzviAVOlS82N1 QpibxX/dC/SjSs+KyT8RJaxGZXjdTCBWzKKDX7QsIq0jtsmmaEcyUZsyZ0hDjOYQILjQ +9Wg== X-Gm-Message-State: AOAM531W4zuxuXf08kEIpeQTXmcvOA5RA1P0nW3IYQO295NNdnDuPJBN qYPF0Ow/ToStABbkUHAe1cpe6+tSSFRHFizjqtY= X-Google-Smtp-Source: ABdhPJzYlVySOdrZmPs/c1gnoQZpE19IYyHyHgeqi5Z01bSBVV+Uw5v2Q7r51Ur9JJtyQEvf4KoN64b2+uL0ZvCvRmo= X-Received: by 2002:a62:8c44:0:b0:4c4:8072:e588 with SMTP id m65-20020a628c44000000b004c48072e588mr2112278pfd.11.1646872261851; Wed, 09 Mar 2022 16:31:01 -0800 (PST) MIME-Version: 1.0 References: <20220309091449.2753904-1-naoya.horiguchi@linux.dev> <20220310000024.GA1577304@hori.linux.bs1.fc.nec.co.jp> In-Reply-To: <20220310000024.GA1577304@hori.linux.bs1.fc.nec.co.jp> From: Yang Shi Date: Wed, 9 Mar 2022 16:30:50 -0800 Message-ID: Subject: Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= Cc: Naoya Horiguchi , Linux MM , Andrew Morton , Mike Kravetz , Miaohe Lin , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: C1202A0015 X-Stat-Signature: 7wg7oqb8s517p8aemaqimbigiknqi51s Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=aoVvvC+w; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf15.hostedemail.com: domain of shy828301@gmail.com designates 209.85.210.172 as permitted sender) smtp.mailfrom=shy828301@gmail.com X-HE-Tag: 1646872262-753076 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 9, 2022 at 4:01 PM HORIGUCHI NAOYA(=E5=A0=80=E5=8F=A3=E3=80=80= =E7=9B=B4=E4=B9=9F) wrote: > > On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote: > > On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi > > wrote: > > > > > > From: Naoya Horiguchi > > > > > > There is a race condition between memory_failure_hugetlb() and hugetl= b > > > free/demotion, which causes setting PageHWPoison flag on the wrong pa= ge > > > (which was a hugetlb when memory_failrue() was called, but was remove= d > > > or demoted when memory_failure_hugetlb() is called). This results in > > > killing wrong processes. So set PageHWPoison flag with holding page = lock, > > > > > > Signed-off-by: Naoya Horiguchi > > > --- > > > mm/memory-failure.c | 27 ++++++++++++--------------- > > > 1 file changed, 12 insertions(+), 15 deletions(-) > > > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > > > index ac6492e36978..fe25eee8f9d6 100644 > > > --- a/mm/memory-failure.c > > > +++ b/mm/memory-failure.c > > > @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned lo= ng pfn, int flags) > > > int res; > > > unsigned long page_flags; > > > > > > - if (TestSetPageHWPoison(head)) { > > > - pr_err("Memory failure: %#lx: already hardware poison= ed\n", > > > - pfn); > > > - res =3D -EHWPOISON; > > > - if (flags & MF_ACTION_REQUIRED) > > > - res =3D kill_accessing_process(current, page_= to_pfn(head), flags); > > > - return res; > > > - } > > > - > > > - num_poisoned_pages_inc(); > > > - > > > if (!(flags & MF_COUNT_INCREASED)) { > > > res =3D get_hwpoison_page(p, flags); > > > > I'm not an expert of hugetlb, I may be wrong. I'm wondering how this > > could solve the race? Is the below race still possible? > > > > __get_hwpoison_page() > > head =3D compound_head(page) > > > > hugetlb demotion (1G --> 2M) > > get_hwpoison_huge_page(head, &hugetlb); > > Thanks for the comment. > I assume Miaohe's patch below introduces additional check to detect the > race. The patch calls compound_head() for the raw error page again, so > the demotion case should be detected. I'll make the dependency clear in > the commit log. > > https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.= com/ > > > > > > > Then the head may point to a 2M page, but the hwpoisoned subpage is > > not in that 2M range? > > > > > > > if (!res) { > > > lock_page(head); > > > if (hwpoison_filter(p)) { > > > - if (TestClearPageHWPoison(head)) > > > - num_poisoned_pages_dec(); > > > unlock_page(head); > > > return -EOPNOTSUPP; > > > } > > > @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned lo= ng pfn, int flags) > > > page_flags =3D head->flags; > > > > > > if (hwpoison_filter(p)) { > > > - if (TestClearPageHWPoison(head)) > > > - num_poisoned_pages_dec(); > > > put_page(p); > > > res =3D -EOPNOTSUPP; > > > goto out; > > > } > > > > > > + if (TestSetPageHWPoison(head)) > > > > And I don't think "head" is still the head you expected if the race > > happened. I think we need to re-retrieve the head once the page > > refcount is bumped and locked. > > I think the above justification works for this. > When the kernel reaches this line, the hugepage is properly pinned withou= t being > freed or demoted, so "head" is still pointing to the same head page as ex= pected. I think Mike's comment in the earlier email works for this too. The huge page may get demoted before the page is pinned and locked, so the actual hwpoisoned subpage may belong to another smaller huge page now. > > Thanks, > Naoya Horiguchi > > > > > > + goto already_hwpoisoned; > > > + > > > + num_poisoned_pages_inc(); > > > + > > > /* > > > * TODO: hwpoison for pud-sized hugetlb doesn't work right no= w, so > > > * simply disable it. In order to make it work properly, we n= eed > > > @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned lon= g pfn, int flags) > > > out: > > > unlock_page(head); > > > return res; > > > +already_hwpoisoned: > > > + unlock_page(head); > > > + pr_err("Memory failure: %#lx: already hardware poisoned\n", p= fn); > > > + res =3D -EHWPOISON; > > > + if (flags & MF_ACTION_REQUIRED) > > > + res =3D kill_accessing_process(current, page_to_pfn(h= ead), flags); > > > + return res; > > > } > > > > > > static int memory_failure_dev_pagemap(unsigned long pfn, int flags, > > > -- > > > 2.25.1 > > >