From: Muchun Song
Date: Thu, 11 Mar 2021 17:13:12 +0800
Subject: Re: [External] Re: [PATCH v18 5/9] mm: hugetlb: set the PageHWPoison to the raw error page
To: Michal Hocko
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, Ingo Molnar, bp@alien8.de,
    x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
    Peter Zijlstra, Alexander Viro, Andrew Morton, paulmck@kernel.org,
    mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, Randy Dunlap,
    oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry,
    David Rientjes, Matthew Wilcox, Oscar Salvador, "Song Bao Hua (Barry Song)",
    David Hildenbrand, HORIGUCHI NAOYA(堀口 直也), Joao Martins, Xiongchun duan,
    linux-doc@vger.kernel.org, LKML, Linux Memory Management List, linux-fsdevel,
    Chen Huang, Bodeddula Balasubramaniam
References: <20210308102807.59745-1-songmuchun@bytedance.com> <20210308102807.59745-6-songmuchun@bytedance.com>

On Thu, Mar 11, 2021 at 4:50 PM Michal Hocko wrote:
>
> On Thu 11-03-21 14:34:04, Muchun Song wrote:
> > On Wed, Mar 10, 2021 at 11:28 PM Michal Hocko wrote:
> > >
> > > On Mon 08-03-21 18:28:03, Muchun Song wrote:
> > > > Because we reuse the first tail vmemmap page frame and remap it
> > > > with read-only, we cannot set the PageHWPoison on some tail pages.
> > > > So we can use the head[4].private (There are at least 128 struct
> > > > page structures associated with the optimized HugeTLB page, so
> > > > using head[4].private is safe) to record the real error page index
> > > > and set the raw error page PageHWPoison later.
> > >
> > > Can we have more poisoned tail pages? Also who does consume that index
> > > and set the HWPoison on the proper tail page?
> >
> > Good point. I look at the routine of memory failure closely.
> > If we do not clear the HWPoison of the head page, we cannot
> > poison another tail page.
> >
> > So we should not set the destructor of the huge page from
> > HUGETLB_PAGE_DTOR to NULL_COMPOUND_DTOR
> > before calling alloc_huge_page_vmemmap(). In this case,
> > the below check of PageHuge() always returns true.
> >
> > I need to fix this in the previous patch.
> >
> > memory_failure()
> >     if (PageHuge(page))
> >         memory_failure_hugetlb()
> >             head = compound_head(page)
> >             if (TestSetPageHWPoison(head))
> >                 return
>
> I have to say that I am not fully familiar with hwpoisoning code
> (especially after recent changes) but IIRC it does rely on hugetlb page
> dissolving. With the new code this operation can fail which is a new
> situation. Unless I am misunderstanding this can lead to a lost memory
> failure operation on other tail pages.
>
> Anyway the above answers the question why a single slot is sufficient so
> it would be great to mention that in a changelog along with the caveat
> that some pages might miss their poisoning.

OK. I will update the changelog. Thanks for your suggestions.

> --
> Michal Hocko
> SUSE Labs
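
For illustration only, the reason a single head[4].private slot is
sufficient can be shown with a small userspace model of the call chain
quoted above. This is a sketch under stated assumptions, not kernel code:
struct toy_hugepage, toy_memory_failure() and the TOY_* constants are
hypothetical names made up for this example, and the subpage count assumes
a 2 MiB HugeTLB page built from 4 KiB base pages. The model mirrors only
the TestSetPageHWPoison(head) gate and ignores dissolving and unpoisoning.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 MiB HugeTLB page made of 4 KiB base pages. */
#define TOY_NR_SUBPAGES 512
#define TOY_NO_ERROR_INDEX (-1L)

/* Hypothetical userspace model; this is NOT the kernel's struct page. */
struct toy_hugepage {
	bool head_hwpoison;    /* stands in for PageHWPoison(head) */
	long raw_error_index;  /* stands in for the single head[4].private slot */
};

/*
 * Models the TestSetPageHWPoison(head) gate from the quoted call chain.
 * Returns true if this report was recorded, false if the head page was
 * already poisoned and the report was dropped.
 */
bool toy_memory_failure(struct toy_hugepage *hp, long subpage_index)
{
	assert(subpage_index >= 0 && subpage_index < TOY_NR_SUBPAGES);

	if (hp->head_hwpoison)
		return false;  /* early return: the slot is never overwritten */

	hp->head_hwpoison = true;
	hp->raw_error_index = subpage_index;
	return true;
}
```

Starting from a toy_hugepage initialised to { false, TOY_NO_ERROR_INDEX },
a report for subpage 7 is recorded and a later report for subpage 300 is
dropped, so the slot still holds 7. That second, dropped report is the
caveat raised in the thread: an error on another tail page can be missed
while the head page stays poisoned.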