linux-mm.kvack.org archive mirror
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: zhenwei pi <pizhenwei@bytedance.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page
Date: Mon, 6 Jun 2022 09:15:05 +0000
Message-ID: <20220606091503.GA1337789@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <3b58adbf-a8b2-8dba-71a7-123ba3850c10@bytedance.com>

On Mon, Jun 06, 2022 at 03:20:27PM +0800, zhenwei pi wrote:
> 
> 
> On 6/6/22 12:32, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Sun, Jun 05, 2022 at 12:24:24PM +0800, zhenwei pi wrote:
> > > 
> > > 
> > > On 6/5/22 02:56, Andrew Morton wrote:
> > > > On Sat,  4 Jun 2022 18:32:29 +0800 zhenwei pi <pizhenwei@bytedance.com> wrote:
> > > > 
> > > > > Currently unpoison_memory(unsigned long pfn) is designed for soft
> > > > > poison(hwpoison-inject) only. Unpoisoning a hardware corrupted page
> > > > > puts page back buddy only, this leads BUG during accessing on the
> > > > > corrupted KPTE.
> > 
> > Thank you for the patch. I think this will be helpful for integration testing.
> > 
> > You mention "hardware corrupted page" as the condition of this bug, and I
> > think that it means a real hardware error, but this BUG seems to be
> > triggered when we use mce-inject or APEI (these are also software injection
> > without corrupting the memory physically). So the actual condition is
> > "when memory_failure() is called by MCE handler"?
> > 
> 
> Yes, I use QEMU to emulate a 'real hardware error' by command:
> virsh qemu-monitor-command vm --hmp mce 0 9 0xbd000000000000c0 0xd
> 0x61234000 0x8c
> 
> > > > > 
> > > > > Do not allow to unpoison hardware corrupted page in unpoison_memory()
> > > > > to avoid BUG like this:
> > > > > 
> > > > >    Unpoison: Software-unpoisoned page 0x61234
> > > > >    BUG: unable to handle page fault for address: ffff888061234000
> > > > 
> > > > Thanks.
> > > > 
> > > > > --- a/mm/memory-failure.c
> > > > > +++ b/mm/memory-failure.c
> > > > > @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
> > > > >    {
> > > > >    	struct page *page;
> > > > >    	struct page *p;
> > > > > +	pte_t *kpte;
> > > > >    	int ret = -EBUSY;
> > > > >    	int freeit = 0;
> > > > >    	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > > > @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
> > > > >    	p = pfn_to_page(pfn);
> > > > >    	page = compound_head(p);
> > > > > +	kpte = virt_to_kpte((unsigned long)page_to_virt(p));
> > > > > +	if (kpte && !pte_present(*kpte)) {
> > > > > +		unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
> > > > > +				 pfn, &unpoison_rs);
> > 
> > This can prevent unpoison for hwpoison on 4kB pages, but not for hugetlb pages,
> > where I see the similar BUG as follows (even with applying your patch):
> > 
> >    [  917.806712] BUG: unable to handle page fault for address: ffff9f7bb3201000
> >    [  917.810144] #PF: supervisor write access in kernel mode
> >    [  917.812588] #PF: error_code(0x0002) - not-present page
> >    [  917.815007] PGD 104801067 P4D 104801067 PUD 10006b063 PMD 1052d0063 PTE 800ffffeccdfe062
> >    [  917.818768] Oops: 0002 [#1] PREEMPT SMP PTI
> >    [  917.820759] CPU: 0 PID: 7774 Comm: test_alloc_gene Tainted: G   M       OE     5.18.0-v5.18-220606-0942-029-ge4dcc+ #47
> >    [  917.825720] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> >    [  917.829762] RIP: 0010:clear_page_erms+0x7/0x10
> >    [  917.831867] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 48 85 ff 0f 84 d3 00 00 00 0f b6 0f 4c
> >    [  917.840540] RSP: 0000:ffffab49c25ebdf0 EFLAGS: 00010246
> >    [  917.842839] RAX: 0000000000000000 RBX: ffffd538c4cc8000 RCX: 0000000000001000
> >    [  917.845835] RDX: 0000000080000000 RSI: 00007f2aeb600000 RDI: ffff9f7bb3201000
> >    [  917.848687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> >    [  917.851377] R10: 0000000000000002 R11: ffff9f7b87e3a2a0 R12: 0000000000000000
> >    [  917.854035] R13: 0000000000000001 R14: ffffd538c4cc8000 R15: ffff9f7bc002a5d8
> >    [  917.856539] FS:  00007f2aebad3740(0000) GS:ffff9f7bbbc00000(0000) knlGS:0000000000000000
> >    [  917.859229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >    [  917.861149] CR2: ffff9f7bb3201000 CR3: 0000000107726003 CR4: 0000000000170ef0
> >    [  917.863433] Call Trace:
> >    [  917.864266]  <TASK>
> >    [  917.864961]  clear_huge_page+0x147/0x270
> >    [  917.866236]  hugetlb_fault+0x440/0xad0
> >    [  917.867366]  handle_mm_fault+0x270/0x290
> >    [  917.868532]  do_user_addr_fault+0x1c3/0x680
> >    [  917.869768]  exc_page_fault+0x6c/0x160
> >    [  917.870912]  ? asm_exc_page_fault+0x8/0x30
> >    [  917.872082]  asm_exc_page_fault+0x1e/0x30
> >    [  917.873220] RIP: 0033:0x7f2aeb8ba367
> > 
> > I don't think of a workaround for this now ...
> > 
> 
> Could you please tell me how to reproduce this issue?

You are familiar with qemu-monitor-command, so the following procedure
should work for you:

  - run a process using hugepages on your VM,
  - check the guest physical address of the hugepage (page-types.c is helpful for this),
  - inject an MCE with virsh qemu-monitor-command at that guest physical address, then
  - unpoison the injected physical address.
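
The last two steps could look roughly like the sketch below, which only
prints the commands instead of running them. It assumes 4kB base pages
(a page shift of 12, as on x86-64), a libvirt domain named "vm", the MCE
arguments quoted earlier in this thread, and that hwpoison-inject is
loaded in the guest with debugfs mounted at /sys/kernel/debug. The helper
names pa_to_pfn and plan are made up for illustration.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the inject and unpoison commands.
# Assumption: 4kB base pages, so PFN = physical address >> 12.
PAGE_SHIFT=12

# Hypothetical helper: convert a physical address to a page frame number.
pa_to_pfn() {
    printf '0x%x\n' $(( $1 >> $PAGE_SHIFT ))
}

# Hypothetical helper: show what to run on the host and in the guest.
# $1 = libvirt domain name, $2 = guest physical address of the page.
plan() {
    domain=$1
    gpa=$2
    # MCE arguments (cpu, bank, status, mcgstatus, addr, misc) are the
    # ones quoted earlier in this thread; only the address is substituted.
    echo "host:  virsh qemu-monitor-command $domain --hmp mce 0 9 0xbd000000000000c0 0xd $gpa 0x8c"
    # unpoison-pfn is the debugfs file provided by hwpoison-inject.
    echo "guest: echo $(pa_to_pfn "$gpa") > /sys/kernel/debug/hwpoison/unpoison-pfn"
}

plan vm 0x61234000
```

For the address used earlier in this thread, 0x61234000, the computed PFN
is 0x61234, which matches the "Software-unpoisoned page 0x61234" line in
the log above.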


That may be enough, but just in case, here is my procedure using
my own test tool.

  $ git clone https://github.com/nhoriguchi/mm_regression
  $ cd mm_regression
  $ ...               # Make sure the prerequisites (see README.md) are met.
  $ make              # Some files may fail to build, but it's ok if
                      # test_alloc_generic.c is built.
  $ ./run.sh prepare debug
  $ ./run.sh recipe list | grep mce/uc/srao/backend-hugetlb > work/debug/recipelist
  $ RUN_MODE=all ./run.sh project run
  $ RUN_MODE=all ./run.sh project run -a   # when you want to rerun

I don't want to burden you with learning this tool, so if something goes
wrong, feel free to let me know.

Thanks,
Naoya Horiguchi
