Subject: Re: [PATCH v1 5/6] mm/hwpoison: make some kernel pages handlable
From: Ding Hui
To: Naoya Horiguchi, mike.kravetz@oracle.com
Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Michal Hocko, Tony Luck, "Aneesh Kumar K.V", Naoya Horiguchi, linux-kernel@vger.kernel.org, linux-mm@kvack.org, huangcun@sangfor.com.cn
Date: Wed, 28 Jul 2021 18:59:37 +0800
Message-ID: <271d0f41-0599-9d5d-0555-47189f476243@sangfor.com.cn>
In-Reply-To: <20210614021212.223326-6-nao.horiguchi@gmail.com>
References: <20210614021212.223326-1-nao.horiguchi@gmail.com> <20210614021212.223326-6-nao.horiguchi@gmail.com>

On
2021/6/14 10:12, Naoya Horiguchi wrote:
> From: Naoya Horiguchi
>
> HWPoisonHandlable() introduced by patch "mm,hwpoison: fix race with hugetlb
> page allocation" filters error events by page type, and only limited events
> reach get_page_unless_zero() to avoid race
>

I want to report a bug related to "mm,hwpoison: fix race with hugetlb
page allocation", hugetlb PMD sharing, and this patch.

Recently, while testing hugetlb together with soft offline, I hit a
crash like this:

[449901.638605] huge_test[16596]: segfault at 8 ip 00007f5f64c39a12 sp 00007fff2105c020 error 4 in ld-2.23.so[7f5f64c2a000+26000]
[449901.638612] Code: 48 8d 35 2c 03 01 00 48 8d 3d 31 03 01 00 ba b5 00 00 00 e8 f0 a5 00 00 53 49 89 fa 89 f6 48 8d 14 76 48 83 ec 10 48 8b 47 68 <48> 8b 78 08 49 8b 82 f8 00 00 00 48 8b 40 08 4c 8d 04 d0 49 8b 42
[449901.638885] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:0 val:358
[449901.638894] ------------[ cut here ]------------
[449901.638962] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:1 val:26
[449901.638966] BUG: non-zero pgtables_bytes on freeing mm: 28672
[449901.639045] kernel BUG at fs/hugetlbfs/inode.c:443!
[449901.639193] invalid opcode: 0000 [#1] SMP NOPTI

After a few days of digging and reproducing, it turns out that
get_hwpoison_page() conflicts with hugetlb PMD sharing:

huge_pmd_unshare() uses page_count() to decide whether the PMD page is
shared, which is not safe.

In my case, get_hwpoison_page() raised the refcount of that same page
shortly before the check

    if (page_count(virt_to_page(ptep)) == 1)

in huge_pmd_unshare(), so huge_pmd_unshare() took the wrong branch.

> Actually this is too restictive because get_hwpoison_page always fails
> to take refcount for any types of kernel page, leading to
> MF_MSG_KERNEL_HIGH_ORDER. This is not critical (no panic), but less
> informative than MF_MSG_SLAB or MF_MSG_PAGETABLE, so extend
> HWPoisonHandlable() to some basic types of kernel pages (slab, pgtable,
> and reserved pages).
>

After "mm,hwpoison: fix race with hugetlb page allocation", PageTable(page)
pages can no longer reach get_page_unless_zero() because of the
"restictive" filter, so this bug was fixed only as a side effect.

> The "handling" for these types are still primitive (just taking refcount
> and setting PG_hwpoison) and some more aggressive actions for memory
> error containment are possible and wanted. But compared to the older code,
> these cases never enter the code block of page locks (note that
> page locks is not well-defined on these pages), so it's a little safer
> for functions intended for user pages not to be called for kernel pages.
>

But the root cause still exists, and the bug may come back at any time
without anyone noticing. This patch is an example: once PageTable(page)
pages are allowed to reach get_page_unless_zero() again, the risk returns.

I'm not sure whether there is another way to determine if the PMD page
is shared, so I have added Mike Kravetz here and am reporting the risk
to you.
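
The conflict described above can be sketched in userspace. This is only a toy model under stated assumptions, not kernel code: the toy_* names, struct toy_page, and the mapped_ptes field are hypothetical and exist purely for illustration. Only the "count == 1 means not shared" test mirrors the check quoted from huge_pmd_unshare().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct page backing a PMD page table. */
struct toy_page {
	int refcount;    /* models page_count() */
	int mapped_ptes; /* entries still installed in this PMD page */
};

/* Models the transient reference taken by get_hwpoison_page(). */
static void toy_get_hwpoison_page(struct toy_page *pg)
{
	pg->refcount++;
}

static void toy_put_page(struct toy_page *pg)
{
	pg->refcount--;
}

/*
 * Models the decision in huge_pmd_unshare(): a count of 1 is taken to
 * mean "not shared". Returns true when the caller is told the PMD page
 * was shared and has been detached, so the caller skips its entries.
 */
static bool toy_huge_pmd_unshare(struct toy_page *pg)
{
	assert(pg->refcount > 0);
	if (pg->refcount == 1)
		return false;	/* private: caller unmaps the entries itself */
	toy_put_page(pg);	/* treated as shared: only drop our reference */
	return true;
}

/* Models the unmap path; returns the number of entries left behind. */
static int toy_unmap(struct toy_page *pg)
{
	if (!toy_huge_pmd_unshare(pg))
		pg->mapped_ptes = 0;
	return pg->mapped_ptes;
}

/* Unmaps a private (unshared) PMD page, optionally with a racing pin. */
int toy_leaked_entries(bool racing_pin)
{
	struct toy_page pg = { .refcount = 1, .mapped_ptes = 4 };

	if (racing_pin)
		toy_get_hwpoison_page(&pg);
	return toy_unmap(&pg);
}
```

Calling toy_leaked_entries(false) models the normal path, where the private PMD page is torn down completely. Calling toy_leaked_entries(true) models the race: the extra reference makes the private page look shared, so its entries are never unmapped, which is consistent with the "Bad rss-counter state" and "non-zero pgtables_bytes" messages in the log.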
> Signed-off-by: Naoya Horiguchi
> ---
>  mm/memory-failure.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
> index b986936e50eb..0d51067f0129 100644
> --- v5.13-rc5/mm/memory-failure.c
> +++ v5.13-rc5_patched/mm/memory-failure.c
> @@ -1113,7 +1113,8 @@ static int page_action(struct page_state *ps, struct page *p,
>   */
>  static inline bool HWPoisonHandlable(struct page *page)
>  {
> -	return PageLRU(page) || __PageMovable(page);
> +	return PageLRU(page) || __PageMovable(page) ||
> +		PageSlab(page) || PageTable(page) || PageReserved(page);
>  }
>
>  static int __get_hwpoison_page(struct page *page)
> @@ -1260,12 +1261,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	struct page *hpage = *hpagep;
>  	bool mlocked = PageMlocked(hpage);
>
> -	/*
> -	 * Here we are interested only in user-mapped pages, so skip any
> -	 * other types of pages.
> -	 */
> -	if (PageReserved(p) || PageSlab(p))
> -		return true;
>  	if (!(PageLRU(hpage) || PageHuge(p)))
>  		return true;
>
> @@ -1670,7 +1665,10 @@ int memory_failure(unsigned long pfn, int flags)
>  				action_result(pfn, MF_MSG_BUDDY, res);
>  				res = res == MF_RECOVERED ? 0 : -EBUSY;
>  			} else {
> -				action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> +				if (PageCompound(p))
> +					action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> +				else
> +					action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>  				res = -EBUSY;
>  			}
>  			goto unlock_mutex;
> @@ -1681,6 +1679,20 @@ int memory_failure(unsigned long pfn, int flags)
>  		}
>  	}
>
> +	if (PageSlab(p)) {
> +		action_result(pfn, MF_MSG_SLAB, MF_IGNORED);
> +		res = -EBUSY;
> +		goto unlock_mutex;
> +	} else if (PageTable(p)) {
> +		action_result(pfn, MF_MSG_PAGETABLE, MF_IGNORED);
> +		res = -EBUSY;
> +		goto unlock_mutex;
> +	} else if (PageReserved(p)) {
> +		action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> +		res = -EBUSY;
> +		goto unlock_mutex;
> +	}
> +
>  	if (PageTransHuge(hpage)) {
>  		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
>  			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>

-- 
Thanks,
- Ding Hui