linux-mm.kvack.org archive mirror
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory
Date: Mon, 17 Aug 2015 04:32:08 +0000
Message-ID: <1439785924-27885-1-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <BLU436-SMTP2235CDFEDA4DEB534BF8C85807C0@phx.gbl>

On Fri, Aug 14, 2015 at 05:01:34PM +0800, Wanpeng Li wrote:
> On 8/14/15 4:38 PM, Naoya Horiguchi wrote:
> > On Fri, Aug 14, 2015 at 03:59:21PM +0800, Wanpeng Li wrote:
> >> On 8/14/15 3:54 PM, Wanpeng Li wrote:
> >>> [...]
> >>>> OK, then I'll rethink how to handle the race in unpoison_memory().
> >>>>
> >>>> Currently, properly contained/hwpoisoned pages should have page refcount 1
> >>>> (when the memory error hits LRU pages or hugetlb pages) or refcount 0
> >>>> (when the memory error hits a buddy page). The current unpoison_memory()
> >>>> implicitly assumes this, because otherwise the unpoisoned page has no place
> >>>> to go and is simply leaked.
> >>>> So to avoid the kernel panic, adding prechecks of refcount and mapcount
> >>>> to restrict unpoisoning to unpoisonable pages looks OK to me.
> >>>> A page under soft offlining always has refcount >= 2 and/or mapcount > 0,
> >>>> so such pages should be filtered out.
> >>>>
> >>>> Here's a patch. In my testing (running soft offline stress tests while
> >>>> repeatedly unpoisoning in the background), the reported (or a similar) bug
> >>>> doesn't happen. Can I have your comments?
> >>> As page_action() reports, a page may still be referenced by some users
> >>> even though PageHWPoison has already been set. So you will leak many poisoned pages.
> >>>
> >> Anyway, the bug is still there.
> >>
> >> [  944.387559] BUG: Bad page state in process expr  pfn:591e3
> >> [  944.393053] page:ffffea00016478c0 count:-1 mapcount:0 mapping: (null) index:0x2
> >> [  944.401147] flags: 0x1fffff80000000()
> >> [  944.404819] page dumped because: nonzero _count
> > Hmm, no luck :(
> >
> > To investigate further, I'd like to test exactly the same kernel as yours, so
> > could you share the kernel info (.config, base kernel, and which patches
> > you applied)? Or push your tree somewhere like GitHub?
> > # if you like, sending it to me privately is fine.
> >
> > I think I tested v4.2-rc6 + <your recent 7 hwpoison patches> +
> > "mm/hwpoison: fix race between soft_offline_page and unpoison_memory",
> > but I hit some conflicts when applying your patches for some reason,
> > so we may be testing different kernels.
> 
> I don't have a special config or tree. The latest mmotm has already
> merged my recent 8 hwpoison patches, so you can test based on it.

OK, so I wrote the next version against mmotm-2015-08-13-15-29 (sent as
replies to this email). It moves the SetPageHWPoison part into the migration
code, which should close the reported race window and minimize the other,
revived race window of reusing offlined pages, so I feel it's a good
compromise between the two races.
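The trade-off between the two races can be illustrated with a toy model that only varies the order of two steps at the end of soft offlining: setting the poison flag and dropping the last reference held by the offlining path. Everything here (`struct src_page`, `run()`, the step names) is hypothetical and far simpler than the real migration code; it just counts after how many steps each race window is open. Setting the flag first leaves a window in which unpoison can see a flagged page that is still referenced; setting it last (roughly what moving the flag into the migration path trades toward) closes that window but briefly exposes a free, unflagged page that could be reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, highly simplified model of the source page at the end
 * of soft offlining. Not kernel code: it only tracks the two facts a
 * concurrent unpoison or the page allocator would care about. */
struct src_page {
	int refcount;   /* references still held by the offlining path */
	bool hwpoison;  /* stands in for PageHWPoison() */
};

enum step { SET_POISON, PUT_REF };

struct windows {
	int unpoison;   /* steps after which the page is flagged but still referenced */
	int reuse;      /* steps after which the page is free but not yet flagged */
};

/* Apply the steps in order, starting from one reference and no flag,
 * and count after how many steps each race window is open. */
static struct windows run(const enum step *steps, size_t n)
{
	struct src_page p = { .refcount = 1, .hwpoison = false };
	struct windows w = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (steps[i] == SET_POISON)
			p.hwpoison = true;
		else
			p.refcount--;
		assert(p.refcount >= 0);

		if (p.hwpoison && p.refcount > 0)
			w.unpoison++;   /* unpoison could clear the flag and drop a ref it does not own */
		if (!p.hwpoison && p.refcount == 0)
			w.reuse++;      /* allocator could hand the bad page out again */
	}
	return w;
}
```

With the flag-first order `{ SET_POISON, PUT_REF }`, run() reports one step with the unpoison window open and none with the reuse window; with the flag-last order `{ PUT_REF, SET_POISON }` it is the other way around.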

My testing shows no kernel panic with these patches (the same testing easily
caused panics on bare mmotm-2015-08-13-15-29), so they should work. But I'd
appreciate it if you could help double-check.

Thanks,
Naoya Horiguchi

Thread overview: 18+ messages
2015-08-13  7:09 Wanpeng Li
2015-08-13  8:53 ` Naoya Horiguchi
2015-08-13  9:18   ` Wanpeng Li
2015-08-13 10:04     ` Naoya Horiguchi
2015-08-13 10:27       ` Wanpeng Li
2015-08-14  4:19         ` Naoya Horiguchi
2015-08-14  5:03           ` Wanpeng Li
2015-08-14  7:26             ` Naoya Horiguchi
2015-08-14  7:54               ` Wanpeng Li
2015-08-14  7:59                 ` Wanpeng Li
2015-08-14  8:38                   ` Naoya Horiguchi
2015-08-14  9:01                     ` Wanpeng Li
2015-08-17  4:32                       ` Naoya Horiguchi [this message]
2015-08-17  4:32                         ` [PATCH v2 1/3] mm/hwpoison: introduce num_poisoned_pages wrappers Naoya Horiguchi
2015-08-17  4:32                         ` [PATCH v2 2/3] mm/hwpoison: fix race between soft_offline_page and unpoison_memory Naoya Horiguchi
2015-08-17  4:32                         ` [PATCH v2 3/3] mm/hwpoison: don't try to unpoison containment-failed pages Naoya Horiguchi
2015-08-17  5:29                         ` [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory Wanpeng Li
2015-08-14  8:02                 ` Naoya Horiguchi
