linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: "mhocko@kernel.org" <mhocko@kernel.org>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"vbabka@suse.cz" <vbabka@suse.cz>
Subject: Re: poisoned pages do not play well in the buddy allocator
Date: Tue, 27 Aug 2019 01:34:29 +0000	[thread overview]
Message-ID: <20190827013429.GA5125@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20190826104144.GA7849@linux>

On Mon, Aug 26, 2019 at 12:41:50PM +0200, Oscar Salvador wrote:
> Hi,
> 
> When analyzing a problem reported by one of our customers, I stumbbled upon an issue
> that origins from the fact that poisoned pages end up in the buddy allocator.
> 
> Let me break down the stepts that lie to the problem:
> 
> 1) We soft-offline a page
> 2) Page gets flagged as HWPoison and is being sent to the buddy allocator.
>    This is done through set_hwpoison_free_buddy_page().
> 3) Kcompactd wakes up in order to perform some compaction.
> 4) compact_zone() will call migrate_pages()
> 5) migrate_pages() will try to get a new page from compaction_alloc() to migrate to
> 6) if cc->freelist is empty, compaction_alloc() will call isolate_free_pagesblock()
> 7) isolate_free_pagesblock only checks for PageBuddy() to assume that a page is OK
>    to be used to migrate to. Since HWPoisoned page are also PageBuddy, we add
>    the page to the list. (same problem exists in fast_isolate_freepages()).
> 
> The outcome of that is that we end up happily handing poisoned pages in compaction_alloc,
> so if we ever got a fault on that page through *_fault, we will return VM_FAULT_HWPOISON,
> and the process will be killed.
> 
> I first though that I could get away with it by checking PageHWPoison in
> {fast_isolate_freepages/isolate_free_pagesblock}, but not really.
> It might be that the page we are checking is an order > 0 page, so the first page
> might not be poisoned, but the one the follows might be, and we end up in the
> same situation.

Yes, this is a whole point of the current implementation.

> 
> After some more thought, I really came to the conclusion that HWPoison pages should not
> really be in the buddy allocator, as this is only asking for problems.
> In this case it is only compaction code, but it could be happening somewhere else,
> and one would expect that the pages you got from the buddy allocator are __ready__ to use.
> 
> I __think__ that we thought we were safe to put HWPoison pages in the buddy allocator as we
> perform healthy checks when getting a page from there, so we skip poisoned pages
> 
> Of course, this is not the end of the story, now that someone got a page, if he frees it,
> there is a high chance that this page ends up in a pcplist (I saw that).
> Unless we are on CONFIG_VM_DEBUG, we do not check for the health of pages got from pcplist,
> as we do when getting a page from the buddy allocator.
> 
> I checked [1], and it seems that [2] was going towards fixing this kind of issue.
> 
> I think it is about time to revamp the whole thing.
> 
> @Naoya: I could give it a try if you are busy.

Thanks for raising hand. That's really wonderful. I think that the series [1] is not
merge yet but not rejected yet, so feel free to reuse/update/revamp it.

> 
> [1] https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] https://lore.kernel.org/linux-mm/1541746035-13408-9-git-send-email-n-horiguchi@ah.jp.nec.com/
> 
> -- 
> Oscar Salvador
> SUSE L3
> 


  parent reply	other threads:[~2019-08-27  1:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-26 10:41 Oscar Salvador
2019-08-26 11:14 ` Michal Hocko
2019-08-27  1:34 ` Naoya Horiguchi [this message]
2019-08-27  7:28   ` Oscar Salvador
2019-08-30 10:45     ` Oscar Salvador
2019-08-30 12:36       ` Michal Hocko
2019-08-30 13:21         ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190827013429.GA5125@hori.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=osalvador@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox