From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Andi Kleen <tatsu@ab.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Wu Fengguang <fengguang.wu@intel.com>, Mel Gorman <mel@csn.ul.ie>,
Christoph Lameter <cl@linux-foundation.org>,
Huang Ying <ying.huang@intel.com>,
Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>,
tony.luck@intel.com, LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>
Subject: [PATCH 5/7] hugetlb: fix race condition between hugepage soft offline and page fault
Date: Fri, 21 Jan 2011 15:28:58 +0900
Message-ID: <1295591340-1862-6-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1295591340-1862-1-git-send-email-n-horiguchi@ah.jp.nec.com>
When hugepage soft offline succeeds, the old hugepage is expected
to be temporarily enqueued on the free hugepage list and then dequeued
as a HWPOISONed hugepage.
But there is a race window that can corrupt the reference counting.
See the following sequence:

  soft offline                       page fault

  soft_offline_huge_page
    migrate_huge_pages
      unmap_and_move_huge_page
        lock_page
        try_to_unmap
        move_to_new_page
          migrate_page
            migrate_page_copy
                                     hugetlb_fault
                                       migration_hugepage_entry_wait
                                         get_page_unless_zero
                                         wait_on_page_locked
          remove_migration_ptes
        unlock_page
  -------------------------------------------------------------------
        put_page                         put_page
    dequeue_hwpoisoned_huge_page

The two put_page() calls below the horizontal line race with each other.
If the put_page() from soft offline comes first, the HWPOISONed hugepage
remains on the free hugepage list, which leaves a poisoned page available
for allocation.
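
The bad ordering can be replayed with a small userspace model. This is
only a sketch: struct toy_page and the toy_* helpers are hypothetical
stand-ins, not kernel code, and it assumes that the page holds two
references at the horizontal line (one per side) and that the HWPOISON
flag is set after soft offline drops its reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the old hugepage; not the kernel's struct page. */
struct toy_page {
	int refcount;
	bool hwpoison;
	bool on_freelist;
};

/* Unpatched behaviour: the last put enqueues the page unconditionally. */
void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->on_freelist = true;	/* models enqueue_huge_page() */
}

/* Succeeds only if the page has already reached the free list. */
int toy_dequeue_hwpoisoned(struct toy_page *p)
{
	if (!p->on_freelist)
		return -EBUSY;
	p->on_freelist = false;
	p->refcount = 1;		/* models set_page_refcounted() */
	return 0;
}

/* Replays the ordering in which soft offline drops its reference first. */
struct toy_page toy_replay_bad_order(int *dequeue_ret)
{
	/* Assumed: one reference from soft offline, one from the fault. */
	struct toy_page p = { .refcount = 2 };

	toy_put_page(&p);				/* soft offline: 2 -> 1 */
	p.hwpoison = true;				/* page marked HWPOISONed */
	*dequeue_ret = toy_dequeue_hwpoisoned(&p);	/* not on the list yet */
	toy_put_page(&p);				/* page fault: 1 -> 0 */
	return p;
}
```

In this model the dequeue attempt fails because the page is not yet on
the free list, and the later put from the fault side then enqueues a
page that is already marked HWPOISONed.
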
It is hard to fix this problem with locking, because the page lock
cannot be used to control the page fault path.
So this patch just adds a HWPOISON check to free_huge_page(),
which ensures that the last user of the old hugepage dequeues it
from the free hugepage list.
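
The effect of the check can be sketched with a userspace model. This is
an illustration under assumptions, not the kernel implementation:
struct model_page and the model_* helpers are hypothetical, locking is
omitted, and the HWPOISON flag is assumed to be set after soft offline
drops its reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the old hugepage; not the kernel's struct page. */
struct model_page {
	int refcount;
	bool hwpoison;
	bool on_freelist;
};

/* Models __dequeue_hwpoisoned_huge_page(); the caller "holds the lock". */
int model_dequeue_hwpoisoned(struct model_page *p)
{
	if (!p->on_freelist)
		return -1;
	p->on_freelist = false;
	p->refcount = 1;
	return 0;
}

/* Models the patched free path: enqueue, then pull poisoned pages back. */
void model_free_huge_page(struct model_page *p)
{
	p->on_freelist = true;
	if (p->hwpoison)
		model_dequeue_hwpoisoned(p);
}

void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		model_free_huge_page(p);
}

/* Runs both sides; the flag selects which put happens first. */
struct model_page model_run(bool soft_offline_first)
{
	struct model_page p = { .refcount = 2 };

	if (soft_offline_first) {
		model_put_page(&p);		/* soft offline: 2 -> 1 */
		p.hwpoison = true;
		model_dequeue_hwpoisoned(&p);	/* fails: not on the list */
		model_put_page(&p);		/* fault: freed, then dequeued */
	} else {
		model_put_page(&p);		/* fault: 2 -> 1 */
		model_put_page(&p);		/* soft offline: 1 -> 0, enqueued */
		p.hwpoison = true;
		model_dequeue_hwpoisoned(&p);	/* succeeds */
	}
	return p;
}
```

In this model both orderings end with the poisoned page off the free
list, because whichever side drops the last reference performs the
dequeue.
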
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
mm/hugetlb.c | 28 +++++++++++++++++++++-------
1 files changed, 21 insertions(+), 7 deletions(-)
diff --git v2.6.38-rc1/mm/hugetlb.c v2.6.38-rc1/mm/hugetlb.c
index d3b856a..b777c81 100644
--- v2.6.38-rc1/mm/hugetlb.c
+++ v2.6.38-rc1/mm/hugetlb.c
@@ -524,6 +524,8 @@ struct hstate *size_to_hstate(unsigned long size)
 	return NULL;
 }
 
+static int __dequeue_hwpoisoned_huge_page(struct page *hpage, struct hstate *h);
+
 static void free_huge_page(struct page *page)
 {
 	/*
@@ -548,6 +550,8 @@ static void free_huge_page(struct page *page)
 		h->surplus_huge_pages_node[nid]--;
 	} else {
 		enqueue_huge_page(h, page);
+		if (unlikely(PageHWPoison(page)))
+			__dequeue_hwpoisoned_huge_page(page, h);
 	}
 	spin_unlock(&hugetlb_lock);
 	if (mapping)
@@ -2932,17 +2936,11 @@ static int is_hugepage_on_freelist(struct page *hpage)
 	return 0;
 }
 
-/*
- * This function is called from memory failure code.
- * Assume the caller holds page lock of the head page.
- */
-int dequeue_hwpoisoned_huge_page(struct page *hpage)
+static int __dequeue_hwpoisoned_huge_page(struct page *hpage, struct hstate *h)
 {
-	struct hstate *h = page_hstate(hpage);
 	int nid = page_to_nid(hpage);
 	int ret = -EBUSY;
 
-	spin_lock(&hugetlb_lock);
 	if (is_hugepage_on_freelist(hpage)) {
 		list_del(&hpage->lru);
 		set_page_refcounted(hpage);
@@ -2950,6 +2948,22 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
 		h->free_huge_pages_node[nid]--;
 		ret = 0;
 	}
+	return ret;
+}
+
+/*
+ * This function is called from memory failure code.
+ * Assume the caller holds page lock of the head page.
+ */
+int dequeue_hwpoisoned_huge_page(struct page *hpage)
+{
+	struct hstate *h = page_hstate(hpage);
+	int ret;
+
+	if (!h)
+		return 0;
+	spin_lock(&hugetlb_lock);
+	ret = __dequeue_hwpoisoned_huge_page(hpage, h);
 	spin_unlock(&hugetlb_lock);
 	return ret;
 }
--
1.7.3.4