From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <1568994684-1425-1-git-send-email-hqjagain@gmail.com>
 <2b9d2a82-ad48-f493-b53e-b34de28980c7@linux.ibm.com>
In-Reply-To:
 <2b9d2a82-ad48-f493-b53e-b34de28980c7@linux.ibm.com>
From: =?UTF-8?B?6buE56eL6ZKn?=
Date: Sat, 21 Sep 2019 00:32:38 +0800
Message-ID:
Subject: Re: [PATCH 3/3] mm:fix gup_pud_range
To: "Aneesh Kumar K.V"
Cc: akpm@linux-foundation.org, ira.weiny@intel.com, jgg@ziepe.ca,
 dan.j.williams@intel.com, rppt@linux.ibm.com, jhubbard@nvidia.com,
 keith.busch@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Content-Type: multipart/alternative; boundary="000000000000052fe50592fe9f44"
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

--000000000000052fe50592fe9f44
Content-Type: text/plain; charset="UTF-8"

On Fri, Sep 20, 2019 at 11:58 PM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
> On 9/20/19 9:21 PM, Qiujun Huang wrote:
>> __get_user_pages_fast try to walk the page table but the
>> hugepage pte is replace by hwpoison swap entry by mca path.
>> ...
>
> Can you describe this in more details. I guess you are facing the issue
> with respect PUD level PTE entry that got updated by hwpoison as a swap
> entry. Since we don't specifically check for pud_present(), we walk the
> page table with wrong values and that results in corruption?

Yes, in the case using 2G hugepage.

>> [15798.177437] mce: Uncorrected hardware memory error in
>>                 user-access at 224f1761c0
>> [15798.180171] MCE 0x224f176: Killing pal_main:6784 due to
>>                 hardware memory corruption
>> [15798.180176] MCE 0x224f176: Killing qemu-system-x86:167336
>>                 due to hardware memory corruption
>> ...
>> [15798.180206] BUG: unable to handle kernel
>> [15798.180226] paging request at ffff891200003000
>> [15798.180236] IP: [<ffffffff8106edae>] gup_pud_range+
>>                 0x13e/0x1e0
>> ...
>>
>> We need to skip the hwpoison entry in gup_pud_range.
>>
>> Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
>> ---
>>  mm/gup.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 98f13ab..6157ed9 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2230,6 +2230,8 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
>>  		next = pud_addr_end(addr, end);
>>  		if (pud_none(pud))
>>  			return 0;
>> +		if (unlikely(!pud_present(pud)))
>> +			return 0;
>
> You should be able to remove that if (pud_none(pud)) check and just keep
> the pud_present() check?

indeed

>>  		if (unlikely(pud_huge(pud))) {
>>  			if (!gup_huge_pud(pud, pudp, addr, next, flags,
>>  					  pages, nr))
>>

diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab..2e3a1d3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2228,7 +2228,7 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
 		pud_t pud = READ_ONCE(*pudp);
 
 		next = pud_addr_end(addr, end);
-		if (pud_none(pud))
+		if (!pud_present(pud))
 			return 0;
 		if (unlikely(pud_huge(pud))) {
 			if (!gup_huge_pud(pud, pudp, addr, next, flags,

--000000000000052fe50592fe9f44
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000052fe50592fe9f44--
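
For anyone following the thread, the difference between the two checks can be illustrated outside the kernel. What follows is only a toy user-space model, not kernel code: model_pud_t, MODEL_PRESENT, the model_* helpers and the *_check_descends() functions are all hypothetical names, and the layout (bit 0 marks a present entry, any other non-zero value is a swap-style entry) is a simplifying assumption rather than the real x86 encoding. It sketches why a non-present but non-empty entry, such as the hwpoison swap entry discussed above, gets past a pud_none() test but is rejected by a pud_present() test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for pud_t; NOT the kernel's definition. */
typedef struct {
	uint64_t val;
} model_pud_t;

/* Assumed layout for this sketch only: bit 0 set means "present". */
#define MODEL_PRESENT 0x1ULL

/* Models pud_none(): true only for a completely empty entry. */
static inline bool model_pud_none(model_pud_t pud)
{
	return pud.val == 0;
}

/* Models pud_present(): true only when the present bit is set. */
static inline bool model_pud_present(model_pud_t pud)
{
	return (pud.val & MODEL_PRESENT) != 0;
}

/*
 * Old behaviour: the walker bails out only on empty entries, so a
 * non-present swap-style entry is treated as if it pointed to a
 * lower-level table.
 */
static inline bool old_check_descends(model_pud_t pud)
{
	return !model_pud_none(pud);
}

/*
 * Behaviour after the change: the walker bails out on anything that is
 * not present, which covers both empty and swap-style entries.
 */
static inline bool new_check_descends(model_pud_t pud)
{
	return model_pud_present(pud);
}
```

With a made-up poisoned entry such as { 0x224f176ULL << 12 } (non-zero, present bit clear), old_check_descends() returns true while new_check_descends() returns false. For an empty entry both return false, and for an entry with MODEL_PRESENT set both return true, which is why the single !pud_present() test can replace the pud_none() test in this model.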