linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: linux-mm@kvack.org
Cc: sergey.senozhatsky@gmail.com, minchan@kernel.org,
	nadav.amit@gmail.com, mgorman@suse.de, riel@redhat.com,
	luto@kernel.org, Nadav Amit <namit@vmware.com>
Subject: [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending
Date: Thu, 27 Jul 2017 04:40:15 -0700	[thread overview]
Message-ID: <20170727114015.3452-3-namit@vmware.com> (raw)
In-Reply-To: <20170727114015.3452-1-namit@vmware.com>

Reading tlb_flush_pending while the page-table lock is taken does not
require a barrier, since the lock/unlock already acts as a barrier.
Removing the barrier in mm_tlb_flush_pending() to address this issue.

However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
while the page-table lock is already released, which may present a
problem on architectures with weak memory model (PPC). To deal with this
case, a new parameter is added to mm_tlb_flush_pending() to indicate
if it is read without the page-table lock taken, and calling
smp_mb__after_unlock_lock() in this case.

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/arm/include/asm/pgtable.h   |  3 ++-
 arch/arm64/include/asm/pgtable.h |  3 ++-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/mm_types.h         | 31 +++++++++++++++++++++++--------
 mm/migrate.c                     |  2 +-
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1c462381c225..2e0608a8049d 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -223,7 +223,8 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pte_none(pte)		(!pte_val(pte))
 #define pte_present(pte)	(pte_isset((pte), L_PTE_PRESENT))
 #define pte_valid(pte)		(pte_isset((pte), L_PTE_VALID))
-#define pte_accessible(mm, pte)	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))
+#define pte_accessible(mm, pte)	(mm_tlb_flush_pending(mm, true) ? \
+					pte_present(pte) : pte_valid(pte))
 #define pte_write(pte)		(pte_isclear((pte), L_PTE_RDONLY))
 #define pte_dirty(pte)		(pte_isset((pte), L_PTE_DIRTY))
 #define pte_young(pte)		(pte_isset((pte), L_PTE_YOUNG))
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c213fdbd056c..47f934d378ca 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -108,7 +108,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
  * remapped as PROT_NONE but are yet to be flushed from the TLB.
  */
 #define pte_accessible(mm, pte)	\
-	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
+	(mm_tlb_flush_pending(mm, true) ? pte_present(pte) : \
+					  pte_valid_young(pte))
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f5af95a0c6b8..da16793203dd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -642,7 +642,7 @@ static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
 		return true;
 
 	if ((pte_flags(a) & _PAGE_PROTNONE) &&
-			mm_tlb_flush_pending(mm))
+			mm_tlb_flush_pending(mm, true))
 		return true;
 
 	return false;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36f4ec589544..57ab8061a2c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -522,12 +522,21 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 /*
  * Memory barriers to keep this state in sync are graciously provided by
  * the page table locks, outside of which no page table modifications happen.
- * The barriers below prevent the compiler from re-ordering the instructions
- * around the memory barriers that are already present in the code.
+ * The barriers are used to ensure the order between tlb_flush_pending updates,
+ * which happen while the lock is not taken, and the PTE updates, which happen
+ * while the lock is taken, are serialized.
  */
-static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
+static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked)
 {
-	barrier();
+	/*
+	 * mm_tlb_flush_pending() is safe if it is executed while the page-table
+	 * lock is taken. But if the lock was already released, there does not
+	 * seem to be a guarantee that a memory barrier. A memory barrier is
+	 * therefore needed on architectures with weak memory models.
+	 */
+	if (!pt_locked)
+		smp_mb__after_unlock_lock();
+
 	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
 static inline void set_tlb_flush_pending(struct mm_struct *mm)
@@ -535,19 +544,25 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 	atomic_inc(&mm->tlb_flush_pending);
 
 	/*
-	 * Guarantee that the tlb_flush_pending store does not leak into the
+	 * Guarantee that the tlb_flush_pending increase does not leak into the
 	 * critical section updating the page tables
 	 */
 	smp_mb__before_spinlock();
 }
-/* Clearing is done after a TLB flush, which also provides a barrier. */
+
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
-	barrier();
+	/*
+	 * Guarantee that the tlb_flush_pending does not not leak into the
+	 * critical section, since we must order the PTE change and changes to
+	 * the pending TLB flush indication. We could have relied on TLB flush
+	 * as a memory barrier, but this behavior is not clearly documented.
+	 */
+	smp_mb__before_atomic();
 	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
-static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
+static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked)
 {
 	return false;
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index 89a0a1707f4c..169c3165be41 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1939,7 +1939,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * We are not sure a pending tlb flush here is for a huge page
 	 * mapping or not. Hence use the tlb range variant
 	 */
-	if (mm_tlb_flush_pending(mm))
+	if (mm_tlb_flush_pending(mm, false))
 		flush_tlb_range(vma, mmun_start, mmun_end);
 
 	/* Prepare a page as a migration target */
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2017-07-27 18:50 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-27 11:40 [PATCH v3 0/2] mm: fixes of tlb_flush_pending races Nadav Amit
2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
2017-07-28  1:34   ` Sergey Senozhatsky
2017-07-28  1:44     ` Nadav Amit
2017-07-28  1:42   ` Sergey Senozhatsky
2017-07-28  2:28   ` Rik van Riel
2017-07-27 11:40 ` Nadav Amit [this message]
2017-07-28  7:42   ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Mel Gorman
2017-07-28 16:40     ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170727114015.3452-3-namit@vmware.com \
    --to=namit@vmware.com \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=riel@redhat.com \
    --cc=sergey.senozhatsky@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox