From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFD34C433E7 for ; Tue, 13 Oct 2020 14:43:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 537692487A for ; Tue, 13 Oct 2020 14:43:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 537692487A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5627C6B0062; Tue, 13 Oct 2020 10:43:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 516F46B0068; Tue, 13 Oct 2020 10:43:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 42E386B006C; Tue, 13 Oct 2020 10:43:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0059.hostedemail.com [216.40.44.59]) by kanga.kvack.org (Postfix) with ESMTP id 346D96B0062 for ; Tue, 13 Oct 2020 10:43:39 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 6F8EA3624 for ; Tue, 13 Oct 2020 14:43:37 +0000 (UTC) X-FDA: 77367170874.16.bat77_4c11a5127204 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id E06E7100688A8 for ; Tue, 13 Oct 2020 14:43:36 +0000 (UTC) X-HE-Tag: bat77_4c11a5127204 X-Filterd-Recvd-Size: 6160 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Tue, 13 Oct 2020 14:43:35 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 35DAFB19A; Tue, 13 Oct 2020 14:43:34 +0000 (UTC) From: Oscar Salvador To: n-horiguchi@ah.jp.nec.com Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador , Naoya Horiguchi Subject: [PATCH v5 2/4] mm,hwpoison: take free pages off the buddy freelists Date: Tue, 13 Oct 2020 16:44:45 +0200 Message-Id: <20201013144447.6706-3-osalvador@suse.de> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20201013144447.6706-1-osalvador@suse.de> References: <20201013144447.6706-1-osalvador@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The crux of the matter is that historically we left poisoned pages in the buddy system because we have some checks in place when allocating a page that are gatekeeper for poisoned pages. Unfortunately, we do have other users (e.g: compaction [1]) that scan buddy freelists and try to get a page from there without checking whether the page is HWPoison. As I stated already, I think it is fundamentally wrong to keep HWPoison pages within the buddy systems, checks in place or not. Let us fix this the same way we did for soft_offline [2], taking the page off the buddy freelist so it is completely unreachable. Note that this is fairly simple to trigger, as we only need to poison fre= e buddy pages (madvise MADV_HWPOISON) and then run some sort of memory stre= ss system. Just for a matter of reference, I put a dump_page() in compaction_alloc() to trigger for HWPoison patches: kernel: page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000= 000 index:0x1 pfn:0x1d5db kernel: flags: 0xfffffc0800000(hwpoison) kernel: raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 000000000= 0000000 kernel: raw: 0000000000000001 0000000000000000 00000001ffffffff 000000000= 0000000 kernel: page dumped because: compaction_alloc kernel: CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G E 5.9.= 0-rc2-mm1-1-default+ #5 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1= .10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 kernel: Call Trace: kernel: dump_stack+0x6d/0x8b kernel: compaction_alloc+0xb2/0xc0 kernel: migrate_pages+0x2a6/0x12a0 kernel: ? isolate_freepages+0xc80/0xc80 kernel: ? __ClearPageMovable+0xb0/0xb0 kernel: compact_zone+0x5eb/0x11c0 kernel: ? finish_task_switch+0x74/0x300 kernel: ? lock_timer_base+0xa8/0x170 kernel: proactive_compact_node+0x89/0xf0 kernel: ? kcompactd+0x2d0/0x3a0 kernel: kcompactd+0x2d0/0x3a0 kernel: ? finish_wait+0x80/0x80 kernel: ? kcompactd_do_work+0x350/0x350 kernel: kthread+0x118/0x130 kernel: ? kthread_associate_blkcg+0xa0/0xa0 kernel: ret_from_fork+0x22/0x30 After that, if e.g: a process faults in the page, it will get killed unexpectedly. Fix it by containing the page immediatelly. Besides that, two more changes can be noticed: * MF_DELAYED no longer suits as we are fixing the issue by containing the page immediately, so it does no longer rely on the allocation-time checks to stop HWPoison to be handed over. gain unless it is unpoisoned, so we fixed the situation. Because of that, let us use MF_RECOVERED from now on. * The second block that handles PageBuddy pages is no longer needed: We call shake_page and then check whether the page is Buddy because shake_page calls drain_all_pages, which sends pcp-pages back to the buddy freelists, so we could have a chance to handle free pages. Currently, get_hwpoison_page already calls drain_all_pages, and we call get_hwpoison_page right before coming here, so we should be on the safe side. [1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u [2] https://patchwork.kernel.org/cover/11792607/ Signed-off-by: Oscar Salvador Acked-by: Naoya Horiguchi --- mm/memory-failure.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index e2f12410c594..181bed890c16 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1341,6 +1341,7 @@ int memory_failure(unsigned long pfn, int flags) struct dev_pagemap *pgmap; int res; unsigned long page_flags; + bool retry =3D true; =20 if (!sysctl_memory_failure_recovery) panic("Memory failure on page %lx", pfn); @@ -1358,6 +1359,7 @@ int memory_failure(unsigned long pfn, int flags) return -ENXIO; } =20 +try_again: if (PageHuge(p)) return memory_failure_hugetlb(pfn, flags); if (TestSetPageHWPoison(p)) { @@ -1382,8 +1384,21 @@ int memory_failure(unsigned long pfn, int flags) */ if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) { if (is_free_buddy_page(p)) { - action_result(pfn, MF_MSG_BUDDY, MF_DELAYED); - return 0; + if (take_page_off_buddy(p)) { + page_ref_inc(p); + res =3D MF_RECOVERED; + } else { + /* We lost the race, try again */ + if (retry) { + ClearPageHWPoison(p); + num_poisoned_pages_dec(); + retry =3D false; + goto try_again; + } + res =3D MF_FAILED; + } + action_result(pfn, MF_MSG_BUDDY, res); + return res =3D=3D MF_RECOVERED ? 0 : -EBUSY; } else { action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED); return -EBUSY; @@ -1407,14 +1422,6 @@ int memory_failure(unsigned long pfn, int flags) * walked by the page reclaim code, however that's not a big loss. */ shake_page(p, 0); - /* shake_page could have turned it free. */ - if (!PageLRU(p) && is_free_buddy_page(p)) { - if (flags & MF_COUNT_INCREASED) - action_result(pfn, MF_MSG_BUDDY, MF_DELAYED); - else - action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED); - return 0; - } =20 lock_page(p); =20 --=20 2.26.2