From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Michal Hocko <mhocko@kernel.org>,
"xishi.qiuxishi@alibaba-inc.com" <xishi.qiuxishi@alibaba-inc.com>,
"zy.zhengyi@alibaba-inc.com" <zy.zhengyi@alibaba-inc.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v1 1/2] mm: fix race on soft-offlining free huge pages
Date: Tue, 17 Jul 2018 00:25:19 +0000
Message-ID: <20180717002518.GA4673@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20180713133545.658173ca953e7d2a8a4ee6bd@linux-foundation.org>
On Fri, Jul 13, 2018 at 01:35:45PM -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2018 12:26:05 +0900 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
>
> > There's a race condition between soft offline and hugetlb_fault which
> > causes unexpected process killing and/or hugetlb allocation failure.
> >
> > The process killing is caused by the following flow:
> >
> > CPU 0                   CPU 1                      CPU 2
> >
> > soft offline
> >   get_any_page
> >   // find the hugetlb is free
> >                         mmap a hugetlb file
> >                         page fault
> >                           ...
> >                             hugetlb_fault
> >                               hugetlb_no_page
> >                                 alloc_huge_page
> >                                 // succeed
> >   soft_offline_free_page
> >   // set hwpoison flag
> >                                                    mmap the hugetlb file
> >                                                    page fault
> >                                                      ...
> >                                                        hugetlb_fault
> >                                                          hugetlb_no_page
> >                                                            find_lock_page
> >                                                            return VM_FAULT_HWPOISON
> >                                                      mm_fault_error
> >                                                        do_sigbus
> >                                                        // kill the process
> >
> >
> > The hugetlb allocation failure comes from the following flow:
> >
> > CPU 0                         CPU 1
> >
> >                               mmap a hugetlb file
> >                               // reserve all free page but don't fault-in
> > soft offline
> >   get_any_page
> >   // find the hugetlb is free
> >   soft_offline_free_page
> >   // set hwpoison flag
> >   dissolve_free_huge_page
> >   // fail because all free hugepages are reserved
> >                               page fault
> >                                 ...
> >                                   hugetlb_fault
> >                                     hugetlb_no_page
> >                                       alloc_huge_page
> >                                         ...
> >                                           dequeue_huge_page_node_exact
> >                                           // ignore hwpoisoned hugepage
> >                                           // and finally fail due to no-mem
> >
> > The root cause of this is that the current soft-offline code is written
> > on the assumption that the PageHWPoison flag should be set first to
> > avoid accessing the corrupted data. This makes sense for memory_failure()
> > or hard offline, but not for soft offline, which handles corrected (not
> > uncorrected) errors and is safe from data loss.
> > This patch changes the soft offline semantics so that the PageHWPoison
> > flag is set only after containment of the error page completes
> > successfully.
> >
> > ...
> >
> > --- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c
> > +++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
> > @@ -1598,8 +1598,18 @@ static int soft_offline_huge_page(struct page *page, int flags)
> >                  if (ret > 0)
> >                          ret = -EIO;
> >          } else {
> > -                if (PageHuge(page))
> > -                        dissolve_free_huge_page(page);
> > +                /*
> > +                 * We set PG_hwpoison only when the migration source hugepage
> > +                 * was successfully dissolved, because otherwise hwpoisoned
> > +                 * hugepage remains on free hugepage list, then userspace will
> > +                 * find it as SIGBUS by allocation failure. That's not expected
> > +                 * in soft-offlining.
> > +                 */
>
> This comment is unclear. What happens if there's a hwpoisoned page on
> the freelist? The allocator just skips it and looks for another page?
Yes, this is what the allocator does.
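
For reference, the skip happens in the hugetlb free-list walk, roughly
like this (a simplified sketch of dequeue_huge_page_node_exact() in
mm/hugetlb.c; locking and the surrounding dequeue logic are elided):

    static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
    {
            struct page *page;

            /* Walk the per-node free list, skipping hwpoisoned hugepages. */
            list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
                    if (!PageHWPoison(page))
                            break;

            /* No usable free hugepage was found, so the allocation fails. */
            if (&h->hugepage_freelists[nid] == &page->lru)
                    return NULL;

            list_move(&page->lru, &h->hugepage_activelist);
            set_page_refcounted(page);
            h->free_huge_pages--;
            h->free_huge_pages_node[nid]--;
            return page;
    }

So a hwpoisoned hugepage is never handed out, but while it sits on the
free list it still consumes the pool, which is how the no-mem kill in
the second flow above can happen.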
> Or does the allocator return the poisoned page, it gets mapped and
> userspace gets a SIGBUS when accessing it? If the latter (or the
> former!), why does the comment mention allocation failure?
The mention of allocation failure was unclear. I'd like to replace it
with the comment below; is it clearer?
+                /*
+                 * We set PG_hwpoison only when the migration source hugepage
+                 * was successfully dissolved, because otherwise a hwpoisoned
+                 * hugepage remains on the free hugepage list. The allocator
+                 * ignores such a hwpoisoned page, so it is never allocated,
+                 * but its presence can cause a process to be killed by memory
+                 * shortage rather than by hwpoison. Soft offline should never
+                 * affect userspace, so this is undesirable.
+                 */
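
In other words, the else branch dissolves first and sets the flag only
on success; roughly (a sketch of the intended ordering, error handling
elided, see the patch for the exact form):

    } else {
            /*
             * Dissolve first, and mark hwpoison only if the hugepage
             * was actually dissolved, so that a hwpoisoned hugepage
             * never remains on the free hugepage list.
             */
            if (PageHuge(page) && !dissolve_free_huge_page(page)) {
                    SetPageHWPoison(page);
                    num_poisoned_pages_inc();
            }
    }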
Thanks,
Naoya Horiguchi
Thread overview: 9+ messages
2018-07-13 3:26 [PATCH v1 0/2] mm: soft-offline: fix race against page allocation Naoya Horiguchi
2018-07-13 3:26 ` [PATCH v1 1/2] mm: fix race on soft-offlining free huge pages Naoya Horiguchi
2018-07-13 5:23 ` Re: [PATCH " 裘稀石(稀石)
2018-07-13 5:28 ` 裘稀石(稀石)
2018-07-13 20:35 ` [PATCH " Andrew Morton
2018-07-17 0:25 ` Naoya Horiguchi [this message]
2018-07-13 3:26 ` [PATCH v1 2/2] mm: soft-offline: close the race against page allocation Naoya Horiguchi
2018-07-13 20:40 ` Andrew Morton
2018-07-17 0:27 ` Naoya Horiguchi