Date: Fri, 18 Oct 2019 08:06:35 +0200
From: Michal Hocko
To: Naoya Horiguchi
Cc: Qian Cai, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 David Hildenbrand, Mike Kravetz
Subject: Re: memory offline infinite loop after soft offline
Message-ID: <20191018060635.GA5017@dhcp22.suse.cz>
References: <1570829564.5937.36.camel@lca.pw>
 <20191014083914.GA317@dhcp22.suse.cz>
 <20191017093410.GA19973@hori.linux.bs1.fc.nec.co.jp>
 <20191017100106.GF24485@dhcp22.suse.cz>
 <1571335633.5937.69.camel@lca.pw>
 <20191017182759.GN24485@dhcp22.suse.cz>
 <20191018021906.GA24978@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20191018021906.GA24978@hori.linux.bs1.fc.nec.co.jp>

On Fri 18-10-19 02:19:06, Naoya Horiguchi wrote:
> On Thu, Oct 17, 2019 at 08:27:59PM +0200, Michal Hocko wrote:
> > On Thu 17-10-19 14:07:13, Qian Cai wrote:
> > > On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> > > > On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > > > > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> > > >
> > > > [...]
> > > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > > > index 89c19c0feadb..5fb3fee16fde 100644
> > > > > > --- a/mm/page_isolation.c
> > > > > > +++ b/mm/page_isolation.c
> > > > > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > > > > >  		 * simple way to verify that as VM_BUG_ON(), though.
> > > > > >  		 */
> > > > > >  		pfn += 1 << page_order(page);
> > > > > > -	else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > > > > +	else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > > > > >  		/* A HWPoisoned page cannot be also PageBuddy */
> > > > > >  		pfn++;
> > > > > >  	else
> > > > >
> > > > > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > > > > we seem to have this issue since the following commit,
> > > >
> > > > Thanks a lot for double checking Naoya!
> > > >
> > > > > commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> > > > > Author: Wen Congyang
> > > > > Date:   Tue Dec 11 16:00:45 2012 -0800
> > > > >
> > > > >     memory-hotplug: skip HWPoisoned page when offlining pages
> > > > >
> > > > > and extension of LTP coverage finally discovered this.
> > > >
> > > > Qian, could you give the patch some testing?
> > >
> > > Unfortunately, this does not solve the problem. It looks to me that in
> > > soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
> > > PG_hwpoison for buddy pages, so even the compound_head() has no PG_hwpoison
> > > set.
> > >
> > > 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
> > > 			if (!TestSetPageHWPoison(page))
> > > 				hwpoisoned = true;
> >
> > This is more than unexpected. How are we supposed to find out that the
> > page is poisoned? Any idea Naoya?
>
> # sorry for my poor review...
>
> We set PG_hwpoison bit only on the head page for hugetlb, that's because
> we handle multiple pages as a single one for hugetlb. So it's enough
> to check isolation only on the head page.  Simply skipping pfn cursor to
> the page after the hugepage should avoid the infinite loop:

But the page dump Qian provided shows that the head page doesn't have
the HWPoison bit either. If it had, then going one pfn at a time should
just work because all tail pages would be skipped. Or am I missing
something?
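The argument above can be checked against a toy model of the scan loop. This is a hypothetical user-space sketch, not kernel code: struct sim_page, sim_head(), sim_scan(), sim_run() and the SIM_* modes are invented for illustration, and only PageBuddy, PG_hwpoison and the compound head/order are modeled. In the assumed layout, pfns 0-3 form an order-2 compound page and pfns 4-7 a free order-2 buddy block. sim_scan() returns the pfn where the walk stopped; a result short of the end stands in for the range failing the isolation check that the offline path in this thread keeps retrying.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the pfn walk in
 * __test_page_isolated_in_pageblock(). struct sim_page is NOT
 * struct page; it carries only the bits discussed in this thread.
 */
struct sim_page {
	bool buddy;         /* stands in for PageBuddy() on a free block's first page */
	bool hwpoison;      /* stands in for PG_hwpoison on this exact page */
	unsigned int order; /* buddy order, or compound order on a head; 0 on tails */
	int head;           /* index of the compound head, or -1 if not compound */
};

enum sim_mode {
	SIM_CHECK_RAW,       /* original code: PageHWPoison(page), pfn++ */
	SIM_CHECK_HEAD,      /* first patch: PageHWPoison(compound_head(page)), pfn++ */
	SIM_CHECK_HEAD_SKIP, /* Naoya's patch: same check, pfn += 1 << order */
};

/* Model of compound_head(): the head for compound pages, else the page itself. */
static const struct sim_page *sim_head(const struct sim_page *pages, int pfn)
{
	return pages[pfn].head >= 0 ? &pages[pages[pfn].head] : &pages[pfn];
}

/* Returns the pfn where the walk stopped; end_pfn means "all isolated". */
static int sim_scan(const struct sim_page *pages, int pfn, int end_pfn,
		    enum sim_mode mode)
{
	while (pfn < end_pfn) {
		const struct sim_page *page = &pages[pfn];
		const struct sim_page *checked =
			(mode == SIM_CHECK_RAW) ? page : sim_head(pages, pfn);

		if (page->buddy)
			pfn += 1 << page->order;  /* skip the whole free block */
		else if (checked->hwpoison)
			pfn += (mode == SIM_CHECK_HEAD_SKIP) ? (1 << page->order) : 1;
		else
			break;                    /* page not accounted for */
	}
	return pfn;
}

/*
 * pfns 0-3: order-2 compound page with its head at pfn 0.
 * pfns 4-7: one free order-2 buddy block.
 * PG_hwpoison sits on the head if poison_head, else only on tail pfn 2.
 */
int sim_run(bool poison_head, enum sim_mode mode)
{
	struct sim_page pages[8] = {0};

	assert(mode <= SIM_CHECK_HEAD_SKIP);
	for (int i = 0; i < 4; i++)
		pages[i].head = 0;
	pages[0].order = 2;
	for (int i = 4; i < 8; i++)
		pages[i].head = -1;
	pages[4].buddy = true;
	pages[4].order = 2;
	pages[poison_head ? 0 : 2].hwpoison = true;

	return sim_scan(pages, 0, 8, mode);
}
```

Under these assumptions, with the bit on the head, sim_run(true, SIM_CHECK_RAW) stops at pfn 1, while SIM_CHECK_HEAD and SIM_CHECK_HEAD_SKIP both reach pfn 8: checking compound_head() alone already walks past the tail pages one pfn at a time. With the bit only on the tail at pfn 2, all three modes stop at pfn 0, so jumping by the compound order changes nothing when the head carries no flag.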

> @@ -274,9 +274,13 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  		 * simple way to verify that as VM_BUG_ON(), though.
>  		 */
>  		pfn += 1 << page_order(page);
> -	else if (skip_hwpoisoned_pages && PageHWPoison(page))
> -		/* A HWPoisoned page cannot be also PageBuddy */
> -		pfn++;
> +	else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> +		/*
> +		 * A HWPoisoned page cannot be also PageBuddy.
> +		 * PG_hwpoison could be set only on the head page in
> +		 * hugetlb case, so no need to check tail pages.
> +		 */
> +		pfn += 1 << compound_order(page);
>  	else
>  		break;
>  	}
>
> Qian, could you please try this?
>
> Thanks,
> Naoya Horiguchi

-- 
Michal Hocko
SUSE Labs