linux-mm.kvack.org archive mirror
From: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>
To: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: "naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
	"linmiaohe@huawei.com" <linmiaohe@huawei.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Yin, Fengwei" <fengwei.yin@intel.com>
Subject: RE: [PATCH 1/1] mm: memory-failure: Re-split hw-poisoned huge page on -EAGAIN
Date: Wed, 20 Dec 2023 08:44:29 +0000
Message-ID: <CY8PR11MB7134D0CB9A22A24E1B7E928B8996A@CY8PR11MB7134.namprd11.prod.outlook.com>
In-Reply-To: <20231219021723.GA158136@ik1-406-35019.vs.sakura.ne.jp>

Hi Naoya Horiguchi,

Thanks for the review. 
See the comments below.

> From: Naoya Horiguchi <naoya.horiguchi@linux.dev>
> Sent: Tuesday, December 19, 2023 10:17 AM
> ...
> > The kernel log (before):
> >   [ 1116.862895] Memory failure: 0x4097fa7: recovery action for unsplit thp: Ignored
> >
> > The kernel log (after):
> >   [  793.573536] Memory failure: 0x2100dda: recovery action for unsplit thp: Delayed
> >   [  793.574666] Memory failure: 0x2100dda: split unsplit thp successfully.
> 
> I'm unclear about the user-visible benefit of ensuring that the error thp is
> split.
> So could you explain about it?

During our testing, we observed that the hardware-poisoned huge page was mapped as
the victim application's text and was also present in the file cache.
When we tried to restart the application without splitting the thp, the restart
failed. This was likely because its text was remapped to the same hardware-poisoned
huge page from the file cache, which triggered another MCE and quickly terminated
the application.

So, after successfully re-splitting the thp (which drops the text mapping), the
application restarts successfully. I'll also add this description to the commit
message in v2.

> I think that the raw error page is not unmapped (with hwpoisoned entry)
> after delayed re-splitting, so recovery action seems not complete even with
> this patch.
> So this patch seems to just convert a hwpoisoned unrecovered thp into a
> hwpoisoned unrecovered raw page.

You're correct. Thanks for catching this.
Instead of creating a new work item just to split the thp, in v2 I'll leverage the
existing memory_failure_queue() to re-split the thp. That reruns the memory_failure()
handling on the page, which should make the recovery action more complete.
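
To illustrate the difference between "just split" and "rerun the whole handler",
here is a minimal user-space model of the retry-on-EAGAIN idea. It is only a
sketch under stated assumptions: retry_queue, fake_thp, handle_poisoned() and
drain() are hypothetical names, and the real memory_failure_queue() path in the
kernel is not modeled here. The point it shows is that a requeued retry runs both
the split and the unmap of the raw error page, so a successful retry leaves the
page fully handled rather than as a poisoned, still-mapped raw page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * User-space model only: every name below is hypothetical and merely
 * mirrors "queue the pfn and rerun the whole handler on -EAGAIN".
 * None of it is kernel code.
 */
#define RQ_CAP 16

struct retry_queue {
	unsigned long pfn[RQ_CAP];
	int head, tail;			/* ring-buffer indices */
};

struct fake_thp {
	int busy_splits;		/* remaining attempts that fail with -EAGAIN */
	bool split;			/* huge page already split? */
	bool raw_unmapped;		/* poisoned raw page unmapped? */
};

static int rq_push(struct retry_queue *q, unsigned long pfn)
{
	int next = (q->tail + 1) % RQ_CAP;

	if (next == q->head)
		return -ENOSPC;		/* queue full */
	q->pfn[q->tail] = pfn;
	q->tail = next;
	return 0;
}

static int rq_pop(struct retry_queue *q, unsigned long *pfn)
{
	if (q->head == q->tail)
		return -ENOENT;		/* queue empty */
	*pfn = q->pfn[q->head];
	q->head = (q->head + 1) % RQ_CAP;
	return 0;
}

/* Whole handler: split first, then unmap the raw error page. */
static int handle_poisoned(struct fake_thp *thp)
{
	if (!thp->split) {
		if (thp->busy_splits > 0) {
			thp->busy_splits--;
			return -EAGAIN;	/* transient failure, caller may retry */
		}
		thp->split = true;
	}
	thp->raw_unmapped = true;	/* splitting alone is not full recovery */
	return 0;
}

/*
 * Drain the queue, requeuing a pfn whenever the handler returns -EAGAIN.
 * Single-page model: every queued pfn refers to the same `thp`.
 * Returns the number of handler invocations.
 */
static int drain(struct retry_queue *q, struct fake_thp *thp, int max_tries)
{
	unsigned long pfn;
	int tries = 0;

	while (tries < max_tries && rq_pop(q, &pfn) == 0) {
		tries++;
		if (handle_poisoned(thp) == -EAGAIN) {
			/* We just popped this pfn, so pushing it back cannot overflow. */
			int ret = rq_push(q, pfn);

			assert(ret == 0);
			(void)ret;
			printf("%#lx: split busy, requeued\n", pfn);
		} else {
			printf("%#lx: split and unmapped\n", pfn);
		}
	}
	return tries;
}
```

For example, with busy_splits = 2, queuing pfn 0x2100dda and calling drain()
requeues the pfn twice and then splits and unmaps it on the third attempt.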
 
-Qiuxu



Thread overview:
2023-12-15  8:12 Qiuxu Zhuo
2023-12-19  2:17 ` Naoya Horiguchi
2023-12-20  8:44   ` Zhuo, Qiuxu [this message]
2023-12-19 11:50 ` Miaohe Lin
2023-12-20  8:56   ` Zhuo, Qiuxu
2023-12-22  6:27 ` [PATCH v2 1/2] mm: memory-failure: Make memory_failure_queue_delayed() helper Qiuxu Zhuo
2023-12-22  6:27   ` [PATCH v2 2/2] mm: memory-failure: Re-split hw-poisoned huge page on -EAGAIN Qiuxu Zhuo
2023-12-22 19:42     ` Andrew Morton
2024-01-02  2:41       ` Zhuo, Qiuxu
2024-01-03  2:47         ` Miaohe Lin
