From: Hugh Dickins <hughd@google.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Paul Turner <pjt@google.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>,
	Zhouping Liu <zliu@redhat.com>
Subject: [PATCH 2/2] sched, numa, mm: Fixes and cleanups in do_huge_pmd_numa_page()
Date: Tue, 13 Nov 2012 18:29:43 -0800 (PST)
Message-ID: <alpine.LNX.2.00.1211131828020.29612@eggly.anvils>
In-Reply-To: <alpine.LNX.2.00.1211131759390.29612@eggly.anvils>

Refuse to migrate a THPage if its page count is raised: that covers both
the case when it is mapped into another address space, and the case
when it has been pinned by get_user_pages(), fast or slow.

Repeat this check each time we recheck pmd_same() under page_table_lock:
it is unclear how necessary this is, perhaps once after lock_page() or
once after isolate_lru_page() would be enough; but normal page migration
certainly checks page count, and we often think "ah, this is safe against
page migration because its page count is raised" - though sadly without
thinking through what serialization supports that.
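(Purely illustrative, not part of the patch: the arithmetic behind
the rechecks, for the usual case of an anon THPage mapped in exactly
one mm.  The mapping holds one reference and our get_page() a second;
isolate_lru_page() later takes an implicit third.  The label name
below is illustrative only.)

	get_page(page);			/* mapping ref + our ref: count 2 */
	if (page_count(page) != 2)	/* mapped or pinned elsewhere */
		goto back_out;
	...
	if (isolate_lru_page(page))	/* implicit get_page(): count 3 */
		goto back_out;
	...
	if (page_count(page) != 3)	/* count raised since isolation */
		goto back_out;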

Do not proceed with migration when PageLRU is unset: such a page may
well be on a private list or in a pagevec, about to be added to LRU at
any instant; checking PageLRU under the zone lock, as isolate_lru_page()
does, is essential before it is safe to proceed.
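
(The old test versus the new, as in the diff below:)

	/* Old, racy: if PageLRU happened to be unset at this instant,
	 * migration went ahead with no LRU isolation at all. */
	lru = PageLRU(page);
	if (lru && isolate_lru_page(page))
		goto alloc_fail;

	/* New, safe: isolate_lru_page() rechecks PageLRU under the
	 * zone's lru_lock and takes its own reference, failing if
	 * the page is not on an LRU list. */
	if (isolate_lru_page(page)) {
		put_page(new_page);
		goto alloc_fail;
	}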

Replace the trylock_page() and BUG() by __set_page_locked(): here the
page was allocated only a few lines earlier, so nothing else can hold
it locked.  And set PageSwapBacked at the same point: it is set later
anyway, but an error path may need it already set.
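
(Why the nonatomic __set_page_locked() is safe here: new_page has
just come from alloc_pages_node(), so ours is the only reference and
nothing else can yet be touching its flags.)

	new_page = alloc_pages_node(node, ..., HPAGE_PMD_ORDER);
	...
	__set_page_locked(new_page);	/* cannot race: page is ours alone */
	SetPageSwapBacked(new_page);	/* anon: set before any error path
					 * might care about it */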

On the error path, reverse the Active, Unevictable, Mlocked changes made
by migrate_page_copy().  Update mlock_migrate_page() to account for
THPages correctly now that it can get called on them.
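For example, assuming x86_64 with 4k base pages and 2MB transparent
huge pages, hpage_nr_pages() returns HPAGE_PMD_NR == 512 for a THPage,
so the updated

	__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);

moves the NR_MLOCK counter by 512 base pages, where the old
__dec_zone_page_state(page, NR_MLOCK) wrongly moved it by just 1.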

Cleanup: rearrange unwinding slightly, removing a few blank lines.

Previous-Version-Tested-by: Zhouping Liu <zliu@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
I did not understand at all how pmd_page(entry) might be NULL, but
assumed that check is there for good reason and did not remove it.

 mm/huge_memory.c |   60 +++++++++++++++++++++++----------------------
 mm/internal.h    |    5 ++-
 2 files changed, 34 insertions(+), 31 deletions(-)

--- mmotm/mm/huge_memory.c	2012-11-13 14:51:04.000321370 -0800
+++ linux/mm/huge_memory.c	2012-11-13 15:01:01.892335579 -0800
@@ -751,9 +751,9 @@ void do_huge_pmd_numa_page(struct mm_str
 {
 	unsigned long haddr = address & HPAGE_PMD_MASK;
 	struct mem_cgroup *memcg = NULL;
-	struct page *new_page = NULL;
+	struct page *new_page;
 	struct page *page = NULL;
-	int node, lru;
+	int node = -1;
 
 	spin_lock(&mm->page_table_lock);
 	if (unlikely(!pmd_same(*pmd, entry)))
@@ -770,7 +770,17 @@ void do_huge_pmd_numa_page(struct mm_str
 		VM_BUG_ON(!PageCompound(page) || !PageHead(page));
 
 		get_page(page);
-		node = mpol_misplaced(page, vma, haddr);
+		/*
+		 * Do not migrate this page if it is mapped anywhere else.
+		 * Do not migrate this page if its count has been raised.
+		 * Our caller's down_read of mmap_sem excludes fork raising
+		 * mapcount; but recheck page count below whenever we take
+		 * page_table_lock - although it's unclear what pin we are
+		 * protecting against, since get_user_pages() or GUP fast
+		 * would have to fault it present before they could proceed.
+		 */
+		if (page_count(page) == 2)
+			node = mpol_misplaced(page, vma, haddr);
 		if (node != -1)
 			goto migrate;
 	}
@@ -794,7 +804,7 @@ migrate:
 
 	lock_page(page);
 	spin_lock(&mm->page_table_lock);
-	if (unlikely(!pmd_same(*pmd, entry))) {
+	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
 		spin_unlock(&mm->page_table_lock);
 		unlock_page(page);
 		put_page(page);
@@ -803,19 +813,17 @@ migrate:
 	spin_unlock(&mm->page_table_lock);
 
 	new_page = alloc_pages_node(node,
-	    (GFP_TRANSHUGE | GFP_THISNODE) & ~__GFP_WAIT,
-	    HPAGE_PMD_ORDER);
-
+	    (GFP_TRANSHUGE | GFP_THISNODE) & ~__GFP_WAIT, HPAGE_PMD_ORDER);
 	if (!new_page)
 		goto alloc_fail;
 
-	lru = PageLRU(page);
-
-	if (lru && isolate_lru_page(page)) /* does an implicit get_page() */
+	if (isolate_lru_page(page)) {	/* Does an implicit get_page() */
+		put_page(new_page);
 		goto alloc_fail;
+	}
 
-	if (!trylock_page(new_page))
-		BUG();
+	__set_page_locked(new_page);
+	SetPageSwapBacked(new_page);
 
 	/* anon mapping, we can simply copy page->mapping to the new page: */
 	new_page->mapping = page->mapping;
@@ -826,19 +834,22 @@ migrate:
 	WARN_ON(PageLRU(new_page));
 
 	spin_lock(&mm->page_table_lock);
-	if (unlikely(!pmd_same(*pmd, entry))) {
+	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 3)) {
 		spin_unlock(&mm->page_table_lock);
-		if (lru)
-			putback_lru_page(page);
+
+		/* Reverse changes made by migrate_page_copy() */
+		if (TestClearPageActive(new_page))
+			SetPageActive(page);
+		if (TestClearPageUnevictable(new_page))
+			SetPageUnevictable(page);
+		mlock_migrate_page(page, new_page);
 
 		unlock_page(new_page);
-		ClearPageActive(new_page);	/* Set by migrate_page_copy() */
-		new_page->mapping = NULL;
 		put_page(new_page);		/* Free it */
 
 		unlock_page(page);
+		putback_lru_page(page);
 		put_page(page);			/* Drop the local reference */
-
 		return;
 	}
 	/*
@@ -867,26 +878,17 @@ migrate:
 	mem_cgroup_end_migration(memcg, page, new_page, true);
 	spin_unlock(&mm->page_table_lock);
 
-	put_page(page);			/* Drop the rmap reference */
-
 	task_numa_fault(node, HPAGE_PMD_NR);
 
-	if (lru)
-		put_page(page);		/* drop the LRU isolation reference */
-
 	unlock_page(new_page);
-
 	unlock_page(page);
+	put_page(page);			/* Drop the rmap reference */
+	put_page(page);			/* Drop the LRU isolation reference */
 	put_page(page);			/* Drop the local reference */
-
 	return;
 
 alloc_fail:
-	if (new_page)
-		put_page(new_page);
-
 	unlock_page(page);
-
 	spin_lock(&mm->page_table_lock);
 	if (unlikely(!pmd_same(*pmd, entry))) {
 		put_page(page);
--- mmotm/mm/internal.h	2012-11-09 09:43:46.896046342 -0800
+++ linux/mm/internal.h	2012-11-13 15:01:01.892335579 -0800
@@ -218,11 +218,12 @@ static inline void mlock_migrate_page(st
 {
 	if (TestClearPageMlocked(page)) {
 		unsigned long flags;
+		int nr_pages = hpage_nr_pages(page);
 
 		local_irq_save(flags);
-		__dec_zone_page_state(page, NR_MLOCK);
+		__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 		SetPageMlocked(newpage);
-		__inc_zone_page_state(newpage, NR_MLOCK);
+		__mod_zone_page_state(page_zone(newpage), NR_MLOCK, nr_pages);
 		local_irq_restore(flags);
 	}
 }
