From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3BF13C433E2 for ; Thu, 17 Sep 2020 13:40:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9921B2075E for ; Thu, 17 Sep 2020 13:40:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9921B2075E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F0BE46B0055; Thu, 17 Sep 2020 09:40:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EBBB06B005A; Thu, 17 Sep 2020 09:40:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DD2136B005C; Thu, 17 Sep 2020 09:40:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0058.hostedemail.com [216.40.44.58]) by kanga.kvack.org (Postfix) with ESMTP id C732D6B0055 for ; Thu, 17 Sep 2020 09:40:11 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 850168249980 for ; Thu, 17 Sep 2020 13:40:11 +0000 (UTC) X-FDA: 77272662222.26.ship90_08129d327123 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 4DF981804B656 for ; Thu, 17 Sep 2020 13:40:11 +0000 (UTC) X-HE-Tag: ship90_08129d327123 X-Filterd-Recvd-Size: 4042 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf47.hostedemail.com (Postfix) with ESMTP for ; Thu, 17 Sep 2020 13:40:10 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 11DBEB13C; Thu, 17 Sep 2020 13:40:43 +0000 (UTC) Date: Thu, 17 Sep 2020 15:40:06 +0200 From: Oscar Salvador To: HORIGUCHI =?utf-8?B?TkFPWUEo5aCA5Y+j44CA55u05LmfKQ==?= Cc: "akpm@linux-foundation.org" , "aris@ruivo.org" , "mhocko@kernel.org" , "tony.luck@intel.com" , "cai@lca.pw" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" Subject: Re: [PATCH v4 0/7] HWpoison: further fixes and cleanups Message-ID: <20200917133959.GA2504@linux> References: <20200917081049.27428-1-osalvador@suse.de> <20200917113920.GA19898@hori.linux.bs1.fc.nec.co.jp> <20200917130948.GA1812@linux> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200917130948.GA1812@linux> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Queue-Id: 4DF981804B656 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Sep 17, 2020 at 03:09:52PM +0200, Oscar Salvador wrote: > static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) > { > if (release) { > put_page(page); > drain_all_pages(page_zone(page)); > } > > ... > SetPageHWPoison(page); > page_ref_inc(page); > > 1) You are freeing the page first, which means it goes to buddy > 2) Then you set it as poisoned and you update its refcount. > > Now we have a page sitting in Buddy with a refcount = 1 and poisoned, and that is quite wrong. Hi Naoya, Ok, I tested it and with the following changes on top I cannot reproduce the issue: diff --git a/mm/memory-failure.c b/mm/memory-failure.c index f68cb5e3b320..4ffaaa5c2603 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -67,11 +67,6 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) { - if (release) { - put_page(page); - drain_all_pages(page_zone(page)); - } - if (hugepage_or_freepage) { /* * Doing this check for free pages is also fine since dissolve_free_huge_page @@ -89,6 +84,12 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo } SetPageHWPoison(page); + + if (release) { + put_page(page); + drain_all_pages(page_zone(page)); + } + page_ref_inc(page); num_poisoned_pages_inc(); return true; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0d9f9bd0e06c..8a6ab05f074c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1173,6 +1173,16 @@ static __always_inline bool free_pages_prepare(struct page *page, trace_mm_page_free(page, order); + if (unlikely(PageHWPoison(page)) && !order) { + /* + * Untie memcg state and reset page's owner + */ + if (memcg_kmem_enabled() && PageKmemcg(page)) + __memcg_kmem_uncharge_page(page, order); + reset_page_owner(page, order); + return false; + } + /* * Check tail pages before head page information is cleared to * avoid checking PageCompound for order-0 pages. # sh tmp_run_ksm_madv.sh p1 0x7f6b6b08e000 p2 0x7f6b529ee000 madvise(p1) 0 madvise(p2) 0 writing p1 ... done writing p2 ... done soft offline soft offline returns 0 OK Can you try to re-run your tests and see if they come clean? If they do, I will try to see if Andrew can squezee above changes into [1], where they belong to. Otherwise I will craft a v5 containing all fixups (pretty unfortunate). [1] https://patchwork.kernel.org/patch/11704099/ -- Oscar Salvador SUSE L3