Message-ID: <1571335633.5937.69.camel@lca.pw>
Subject: Re: memory offline infinite loop after soft offline
From: Qian Cai
To: Michal Hocko, Naoya Horiguchi
Cc: "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", David Hildenbrand, Mike Kravetz
Date: Thu, 17 Oct 2019 14:07:13 -0400
In-Reply-To: <20191017100106.GF24485@dhcp22.suse.cz>
References: <1570829564.5937.36.camel@lca.pw> <20191014083914.GA317@dhcp22.suse.cz> <20191017093410.GA19973@hori.linux.bs1.fc.nec.co.jp> <20191017100106.GF24485@dhcp22.suse.cz>

On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
>
> [...]
> > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > index 89c19c0feadb..5fb3fee16fde 100644
> > > --- a/mm/page_isolation.c
> > > +++ b/mm/page_isolation.c
> > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > >  			 */
> > >  			pfn += 1 << page_order(page);
> > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > >  			pfn++;
> > >  		else
> >
> > This fix looks good to me.
> > The original code only addresses hwpoisoned 4kB-page,
> > we seem to have this issue since the following commit,
>
> Thanks a lot for double checking Naoya!
>
> >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> >   Author: Wen Congyang
> >   Date:   Tue Dec 11 16:00:45 2012 -0800
> >
> >       memory-hotplug: skip HWPoisoned page when offlining pages
> >
> > and extension of LTP coverage finally discovered this.
>
> Qian, could you give the patch some testing?

Unfortunately, this does not solve the problem. It looks to me that in
soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
PG_hwpoison for buddy pages, so even the compound_head() has no
PG_hwpoison set.

		if (PageBuddy(page_head) && page_order(page_head) >= order) {
			if (!TestSetPageHWPoison(page))
				hwpoisoned = true;

Below is the dump_page() output for the compound_head().

[  113.796632][ T8907] page:c00c000800458040 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[  113.796716][ T8907] flags: 0x83fffc000000000()
[  113.796764][ T8907] raw: 083fffc000000000 0000000000000000 ffffffff00450500 0000000000000000
[  113.796870][ T8907] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  113.796959][ T8907] page dumped because: soft offline compound_head()
[  113.797037][ T8907] page_owner tracks the page as freed
[  113.797086][ T8907] page last allocated via order 5, migratetype Movable, gfp_mask 0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_THISNODE)
[  113.797262][ T8907]  prep_new_page+0x3c0/0x440
[  113.797316][ T8907]  get_page_from_freelist+0x2568/0x2bb0
[  113.797395][ T8907]  __alloc_pages_nodemask+0x1b4/0x670
[  113.797443][ T8907]  alloc_fresh_huge_page+0xb8/0x300
[  113.797493][ T8907]  alloc_migrate_huge_page+0x30/0x70
[  113.797550][ T8907]  alloc_new_node_page+0xc4/0x380
[  113.797609][ T8907]  migrate_pages+0x3b4/0x19e0
[  113.797649][ T8907]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
[  113.797713][ T8907]  kernel_move_pages+0x498/0xfc0
[  113.797784][ T8907]  sys_move_pages+0x28/0x40
[  113.797843][ T8907]  system_call+0x5c/0x68
[  113.797895][ T8907] page last free stack trace:
[  113.797947][ T8907]  __free_pages_ok+0xa4c/0xd40
[  113.797991][ T8907]  update_and_free_page+0x2dc/0x5b0
[  113.798059][ T8907]  free_huge_page+0x2dc/0x740
[  113.798103][ T8907]  __put_compound_page+0x64/0xc0
[  113.798171][ T8907]  putback_active_hugepage+0x228/0x390
[  113.798219][ T8907]  migrate_pages+0xa78/0x19e0
[  113.798273][ T8907]  soft_offline_page+0x314/0x1050
[  113.798319][ T8907]  sys_madvise+0x1068/0x1080
[  113.798381][ T8907]  system_call+0x5c/0x68

> ---
>
> From 441a9515dcdb29bb0ca39ff995632907d959032f Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Thu, 17 Oct 2019 11:49:15 +0200
> Subject: [PATCH] hugetlb, memory_hotplug: fix HWPoisoned tail pages properly
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Qian Cai has noticed that hwpoisoned hugetlb pages prevent memory
> offlining from making a forward progress.
> He has nailed down the issue
> to be __test_page_isolated_in_pageblock always returning EBUSY because
> of soft offlined page:
> [  101.665160][ T8885] pfn = 77501, end_pfn = 78000
> [  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
> [  101.665329][ T8885] flags: 0x3fffc000000000()
> [  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500 0000000000000000
> [  101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [  101.665588][ T8885] page dumped because: soft_offline
> [  101.665639][ T8885] page_owner tracks the page as freed
> [  101.665697][ T8885] page last allocated via order 5, migratetype Movable, gfp_mask 0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_THISNODE)
> [  101.665924][ T8885]  prep_new_page+0x3c0/0x440
> [  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
> [  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
> [  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
> [  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
> [  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
> [  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
> [  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
> [  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
> [  101.666520][ T8885]  sys_move_pages+0x28/0x40
> [  101.666643][ T8885]  system_call+0x5c/0x68
> [  101.666665][ T8885] page last free stack trace:
> [  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
> [  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
> [  101.666821][ T8885]  free_huge_page+0x2dc/0x740
> [  101.666875][ T8885]  __put_compound_page+0x64/0xc0
> [  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
> [  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
> [  101.667048][ T8885]  soft_offline_page+0x314/0x1050
> [  101.667117][ T8885]  sys_madvise+0x1068/0x1080
> [  101.667185][ T8885]  system_call+0x5c/0x68
>
> The reason is that __test_page_isolated_in_pageblock doesn't recognize
> hugetlb tail pages as the HWPoison bit is not transferred from the head
> page. Pfn walker then doesn't recognize those pages and so EBUSY is
> returned up the call chain.
>
> The proper fix would be to handle HWPoison throughout the huge page but
> considering there is a WIP to rework that code considerably let's go
> with a simple and easily backportable workaround and simply check
> the head of a compound page for the HWPoison flag.
>
> Reported-and-analyzed-by: Qian Cai
> Fixes: b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> Cc: stable
> Signed-off-by: Michal Hocko
> ---
>  mm/page_isolation.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 89c19c0feadb..5fb3fee16fde 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  			 * simple way to verify that as VM_BUG_ON(), though.
>  			 */
>  			pfn += 1 << page_order(page);
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>  			/* A HWPoisoned page cannot be also PageBuddy */
>  			pfn++;
>  		else
> --
> 2.20.1
>
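
To make the failure mode concrete, here is a minimal userspace sketch of
the pfn walk in the hunk quoted above. It is not kernel code and only
models the three branches of that hunk: struct toy_page, toy_init(),
toy_make_compound() and toy_test_isolated() are made-up names, the page
flags are plain booleans, and skip_hwpoisoned_pages is assumed to be
true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
	bool buddy;      /* models PageBuddy() */
	bool hwpoison;   /* models PageHWPoison() */
	unsigned order;  /* models page_order(), meaningful only if buddy */
	size_t head;     /* index of the compound head, own index otherwise */
};

enum check_mode {
	CHECK_PAGE,      /* test the flag on the page itself (old code) */
	CHECK_HEAD       /* test the flag on compound_head() (the patch) */
};

/* Reset n pages to "in use, not poisoned, not compound". */
static void toy_init(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i] = (struct toy_page){ false, false, 0, i };
}

/* Make pages [first, first + n) one compound page headed by 'first'. */
static void toy_make_compound(struct toy_page *pages, size_t first, size_t n)
{
	for (size_t i = first; i < first + n; i++)
		pages[i].head = first;
}

/*
 * Walk [pfn, end) with the same branch order as the quoted hunk and
 * return the first pfn that is neither buddy nor (per 'mode') poisoned.
 * A return value below 'end' is the case the thread describes as EBUSY.
 */
static size_t toy_test_isolated(const struct toy_page *pages, size_t pfn,
				size_t end, enum check_mode mode)
{
	assert(pfn <= end);

	while (pfn < end) {
		const struct toy_page *page = &pages[pfn];
		const struct toy_page *probe =
			(mode == CHECK_HEAD) ? &pages[page->head] : page;

		if (page->buddy)
			pfn += (size_t)1 << page->order;  /* skip the free block */
		else if (probe->hwpoison)
			pfn++;                            /* skip a poisoned page */
		else
			break;                            /* page still in the way */
	}
	return pfn;
}
```

In this model, a compound page whose head alone is poisoned stops
CHECK_PAGE at the first tail, while CHECK_HEAD walks the whole range.
If no page in the range carries the flag at all, which is what the
dump_page() output above suggests, both modes stop at the first pfn, so
checking compound_head() cannot help.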