* [PATCH v2 0/2] mm: fixes of tlb_flush_pending
@ 2017-07-26 15:02 Nadav Amit
  2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-26 15:02 ` [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
  0 siblings, 2 replies; 6+ messages in thread
From: Nadav Amit @ 2017-07-26 15:02 UTC (permalink / raw)
  To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, Nadav Amit

These two patches address tlb_flush_pending issues. The first one
addresses a race when accessing tlb_flush_pending and is the important
one.

The second patch addresses Andrew Morton's question regarding the
barriers. This patch is not really related to the first one: the atomic
operations atomic_read() and atomic_inc() do not act as memory
barriers, and replacing the existing barriers with
smp_mb__after_atomic() did not seem beneficial. Yet, while reviewing
the memory barriers around the use of tlb_flush_pending, a few issues
were identified.

v1 -> v2:
  - Explain the implications of the race (Andrew)
  - Mark the patch that addresses the race as stable (Andrew)
  - Add another patch to clean up the use of barriers (Andrew)

Nadav Amit (2):
  mm: migrate: prevent racy access to tlb_flush_pending
  mm: migrate: fix barriers around tlb_flush_pending

 include/linux/mm_types.h | 26 ++++++++++++++++----------
 kernel/fork.c            |  2 +-
 mm/debug.c               |  2 +-
 mm/migrate.c             |  9 +++++++++
 4 files changed, 27 insertions(+), 12 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread
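As an editorial aside, the race the cover letter alludes to can be
reproduced in a minimal userspace model. This is an illustration only,
not kernel code, and every name in it is made up: two threads both
raise and drop a single shared flag, so whichever thread finishes first
drops the flag while the other is still mid-batch, exactly as two
overlapping change_protection_range() callers could with the pre-patch
bool tlb_flush_pending.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool flush_pending;	/* models the pre-patch bool flag */
static atomic_int early_clears;		/* flag seen clear mid-"batch" */

static void *writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < 200000; i++) {
		atomic_store(&flush_pending, true);	/* ~set_tlb_flush_pending() */
		for (volatile int spin = 0; spin < 64; spin++)
			;				/* "batch PTE changes" */
		/* still mid-batch: the flag must still read as pending here */
		if (!atomic_load(&flush_pending))
			atomic_fetch_add(&early_clears, 1);
		atomic_store(&flush_pending, false);	/* ~clear_tlb_flush_pending() */
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, writer, NULL);
	pthread_create(&b, NULL, writer, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	/* nonzero on most runs: a bool cannot track overlapping setters */
	printf("flag observed clear while a batch was in flight: %d times\n",
	       atomic_load(&early_clears));
	return 0;
}

Compile with "gcc -pthread"; the early-clear count is typically nonzero,
which is the lost-update behavior that patch 1 below removes by turning
the flag into a counter.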
* [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-26 15:02 [PATCH v2 0/2] mm: fixes of tlb_flush_pending Nadav Amit
@ 2017-07-26 15:02 ` Nadav Amit
  2017-07-27  6:48   ` Mel Gorman
  2017-07-29 23:50   ` kbuild test robot
  2017-07-26 15:02 ` [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
  1 sibling, 2 replies; 6+ messages in thread
From: Nadav Amit @ 2017-07-26 15:02 UTC (permalink / raw)
  To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, stable, Nadav Amit

From: Nadav Amit <nadav.amit@gmail.com>

Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work(). If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending. The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was confirmed
by adding an assertion to check that tlb_flush_pending is not set by
two threads, adding artificial latency in change_protection_range(),
and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")

Cc: stable@vger.kernel.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/mm_types.h | 8 ++++----
 kernel/fork.c            | 2 +-
 mm/debug.c               | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45cdb27791a3..36f4ec589544 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
 	 * can move process memory needs to flush the TLB when moving a
 	 * PROT_NONE or PROT_NUMA mapped page.
 	 */
-	bool tlb_flush_pending;
+	atomic_t tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
 #ifdef CONFIG_HUGETLB_PAGE
@@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	return mm->tlb_flush_pending;
+	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
 static inline void set_tlb_flush_pending(struct mm_struct *mm)
 {
-	mm->tlb_flush_pending = true;
+	atomic_inc(&mm->tlb_flush_pending);
 
 	/*
 	 * Guarantee that the tlb_flush_pending store does not leak into the
@@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	mm->tlb_flush_pending = false;
+	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index e53770d2bf95..5a7ecfbb7420 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
-	clear_tlb_flush_pending(mm);
+	atomic_set(&mm->tlb_flush_pending, 0);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
diff --git a/mm/debug.c b/mm/debug.c
index db1cd26d8752..d70103bb4731 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
 #endif
 #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
-		mm->tlb_flush_pending,
+		atomic_read(&mm->tlb_flush_pending),
 #endif
 		mm->def_flags, &mm->def_flags
 	);
-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread
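The shape of the fix — a boolean replaced by a reference count, so the
"pending" window only closes when the last concurrent setter finishes —
can be modeled in a few lines of portable C. This is an editorial
sketch, not the kernel's atomic_t implementation:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int tlb_flush_pending;	/* models the patched atomic_t field */

/* Each thread batching PTE changes brackets its work with inc/dec. */
static inline void set_pending(void)   { atomic_fetch_add(&tlb_flush_pending, 1); }
static inline void clear_pending(void) { atomic_fetch_sub(&tlb_flush_pending, 1); }

/* "Pending" now means: at least one batch is still in flight. */
static inline bool is_pending(void)
{
	return atomic_load(&tlb_flush_pending) > 0;
}

Rerunning the earlier userspace sketch with these helpers in place of
the bool stores keeps the early-clear count at zero: a second setter can
no longer erase the first one's pending state.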
* Re: [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-27  6:48   ` Mel Gorman
  2017-07-29 23:50   ` kbuild test robot
  1 sibling, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2017-07-27 6:48 UTC (permalink / raw)
  To: Nadav Amit; +Cc: linux-mm, nadav.amit, riel, luto, stable

On Wed, Jul 26, 2017 at 08:02:13AM -0700, Nadav Amit wrote:
> From: Nadav Amit <nadav.amit@gmail.com>
> 
> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in
> task_numa_work(). If this happens, tlb_flush_pending might be cleared
> while one of the threads still changes PTEs and batches TLB flushes.
> 
> This can lead to the same race between migration and
> change_protection_range() that led to the introduction of
> tlb_flush_pending. The result of this race was data corruption, which
> means that this patch also addresses a theoretically possible data
> corruption.
> 
> An actual data corruption was not observed, yet the race was confirmed
> by adding an assertion to check that tlb_flush_pending is not set by
> two threads, adding artificial latency in change_protection_range(),
> and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
> 
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
> change_protection_range")
> 
> Cc: stable@vger.kernel.org
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-27  6:48   ` Mel Gorman
@ 2017-07-29 23:50   ` kbuild test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2017-07-29 23:50 UTC (permalink / raw)
  To: Nadav Amit; +Cc: kbuild-all, linux-mm, nadav.amit, mgorman, riel, luto, stable

[-- Attachment #1: Type: text/plain, Size: 4595 bytes --]

Hi Nadav,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc2 next-20170728]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nadav-Amit/mm-fixes-of-tlb_flush_pending/20170728-034608
config: blackfin-BF537-STAMP_defconfig (attached as .config)
compiler: bfin-uclinux-gcc (GCC) 6.2.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=blackfin

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:4:0,
                    from arch/blackfin/include/asm/bug.h:71,
                    from include/linux/bug.h:4,
                    from include/linux/mmdebug.h:4,
                    from include/linux/gfp.h:4,
                    from include/linux/slab.h:14,
                    from kernel/fork.c:14:
   kernel/fork.c: In function 'mm_init':
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
     atomic_set(&mm->tlb_flush_pending, 0);
                    ^
   include/linux/compiler.h:329:17: note: in definition of macro 'WRITE_ONCE'
     union { typeof(x) __val; char __c[1]; } __u = \
                     ^
   kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
     atomic_set(&mm->tlb_flush_pending, 0);
     ^~~~~~~~~~
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
     atomic_set(&mm->tlb_flush_pending, 0);
                    ^
   include/linux/compiler.h:330:30: note: in definition of macro 'WRITE_ONCE'
     { .__val = (__force typeof(x)) (val) }; \
                              ^
   kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
     atomic_set(&mm->tlb_flush_pending, 0);
     ^~~~~~~~~~
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
     atomic_set(&mm->tlb_flush_pending, 0);
                    ^
   include/linux/compiler.h:331:22: note: in definition of macro 'WRITE_ONCE'
     __write_once_size(&(x), __u.__c, sizeof(x)); \
                      ^
   kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
     atomic_set(&mm->tlb_flush_pending, 0);
     ^~~~~~~~~~
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
     atomic_set(&mm->tlb_flush_pending, 0);
                    ^
   include/linux/compiler.h:331:42: note: in definition of macro 'WRITE_ONCE'
     __write_once_size(&(x), __u.__c, sizeof(x)); \
                                          ^
   kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
     atomic_set(&mm->tlb_flush_pending, 0);
     ^~~~~~~~~~

vim +810 kernel/fork.c

   787	
   788	static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
   789		struct user_namespace *user_ns)
   790	{
   791		mm->mmap = NULL;
   792		mm->mm_rb = RB_ROOT;
   793		mm->vmacache_seqnum = 0;
   794		atomic_set(&mm->mm_users, 1);
   795		atomic_set(&mm->mm_count, 1);
   796		init_rwsem(&mm->mmap_sem);
   797		INIT_LIST_HEAD(&mm->mmlist);
   798		mm->core_state = NULL;
   799		atomic_long_set(&mm->nr_ptes, 0);
   800		mm_nr_pmds_init(mm);
   801		mm->map_count = 0;
   802		mm->locked_vm = 0;
   803		mm->pinned_vm = 0;
   804		memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
   805		spin_lock_init(&mm->page_table_lock);
   806		mm_init_cpumask(mm);
   807		mm_init_aio(mm);
   808		mm_init_owner(mm, p);
   809		mmu_notifier_mm_init(mm);
 > 810		atomic_set(&mm->tlb_flush_pending, 0);
   811	#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
   812		mm->pmd_huge_pte = NULL;
   813	#endif
   814	
   815		if (current->mm) {
   816			mm->flags = current->mm->flags & MMF_INIT_MASK;
   817			mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
   818		} else {
   819			mm->flags = default_dump_filter;
   820			mm->def_flags = 0;
   821		}
   822	
   823		if (mm_alloc_pgd(mm))
   824			goto fail_nopgd;
   825	
   826		if (init_new_context(p, mm))
   827			goto fail_nocontext;
   828	
   829		mm->user_ns = get_user_ns(user_ns);
   830		return mm;
   831	
   832	fail_nocontext:
   833		mm_free_pgd(mm);
   834	fail_nopgd:
   835		free_mm(mm);
   836		return NULL;
   837	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14145 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
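The root cause is visible in the patch itself: the new atomic_t
tlb_flush_pending field only exists when CONFIG_NUMA_BALANCING or
CONFIG_COMPACTION is set (blackfin defconfig has neither), while the
old clear_tlb_flush_pending() call that mm_init() used had a no-op stub
for other configs — the #else branch in the mm_types.h hunk. One fix
along those lines, shown here as an editorial sketch only and not
necessarily the follow-up that was actually posted, would hide the raw
atomic_set() behind the same guard:

/*
 * Hypothetical helper (illustrative name): keep the initialization
 * behind the same config guard as the field itself, with a no-op stub
 * mirroring the existing clear_tlb_flush_pending() stub.
 */
#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
static inline void init_tlb_flush_pending(struct mm_struct *mm)
{
	atomic_set(&mm->tlb_flush_pending, 0);
}
#else
static inline void init_tlb_flush_pending(struct mm_struct *mm) { }
#endif

mm_init() would then call init_tlb_flush_pending(mm) instead of the raw
atomic_set(), and configs without the field would compile again.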
* [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-26 15:02 [PATCH v2 0/2] mm: fixes of tlb_flush_pending Nadav Amit
  2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-26 15:02 ` Nadav Amit
  2017-07-27  0:04   ` Minchan Kim
  1 sibling, 1 reply; 6+ messages in thread
From: Nadav Amit @ 2017-07-26 15:02 UTC (permalink / raw)
  To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, Nadav Amit

Reading tlb_flush_pending while the page-table lock is taken does not
require a barrier, since the lock/unlock already acts as one. Remove
the barrier in mm_tlb_flush_pending() accordingly.

However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
while the page-table lock is already released, which may present a
problem on architectures with a weak memory model (e.g., PPC). Use
smp_mb__after_unlock_lock() in that case.

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/mm_types.h | 18 ++++++++++++------
 mm/migrate.c             |  9 +++++++++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36f4ec589544..312eec5690d4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -522,12 +522,12 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 /*
  * Memory barriers to keep this state in sync are graciously provided by
  * the page table locks, outside of which no page table modifications happen.
- * The barriers below prevent the compiler from re-ordering the instructions
- * around the memory barriers that are already present in the code.
+ * The barriers are used to ensure that tlb_flush_pending updates, which
+ * happen while the lock is not taken, and the PTE updates, which happen
+ * while the lock is taken, are serialized.
  */
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
-	barrier();
 	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
 static inline void set_tlb_flush_pending(struct mm_struct *mm)
@@ -535,15 +535,21 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 	atomic_inc(&mm->tlb_flush_pending);
 
 	/*
-	 * Guarantee that the tlb_flush_pending store does not leak into the
+	 * Guarantee that the tlb_flush_pending increase does not leak into the
 	 * critical section updating the page tables
 	 */
 	smp_mb__before_spinlock();
 }
-/* Clearing is done after a TLB flush, which also provides a barrier. */
+
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
-	barrier();
+	/*
+	 * Guarantee that the tlb_flush_pending decrease does not leak into the
+	 * critical section, since we must order the PTE change and changes to
+	 * the pending TLB flush indication. We could have relied on TLB flush
+	 * as a memory barrier, but this behavior is not clearly documented.
+	 */
+	smp_mb__before_atomic();
 	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
diff --git a/mm/migrate.c b/mm/migrate.c
index 89a0a1707f4c..85c7134d70cc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1935,6 +1935,15 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		put_page(new_page);
 		goto out_fail;
 	}
+
+	/*
+	 * mm_tlb_flush_pending() is safe if it is executed while the
+	 * page-table lock is taken. But here, it is executed while the
+	 * page-table lock is already released. This requires a full memory
+	 * barrier on architectures with weak memory models.
+	 */
+	smp_mb__after_unlock_lock();
+
 	/*
 	 * We are not sure a pending tlb flush here is for a huge page
 	 * mapping or not. Hence use the tlb range variant
-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread
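As an editorial aside, the ordering contract this patch documents can
be modeled loosely in portable C11. This is a sketch only: the plain
seq_cst fences below merely stand in for smp_mb__before_spinlock(),
smp_mb__before_atomic() and smp_mb__after_unlock_lock(), whose real
kernel semantics are subtler, and the names are illustrative.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int tlb_flush_pending;
static atomic_int pte;			/* stand-in for the PTEs */

static void change_protection_side(void)
{
	atomic_fetch_add_explicit(&tlb_flush_pending, 1,
				  memory_order_relaxed);
	/* the increment must be visible before any PTE update */
	atomic_thread_fence(memory_order_seq_cst);

	atomic_store_explicit(&pte, 1, memory_order_relaxed);	/* "update PTEs" */
	/* ... TLB flush would go here ... */

	/* the PTE updates must be visible before the decrement */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_sub_explicit(&tlb_flush_pending, 1,
				  memory_order_relaxed);
}

static bool migration_side_unlocked(void)
{
	/* reader running outside the page-table lock needs its own fence */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&tlb_flush_pending,
				    memory_order_relaxed) > 0;
}

The point of the patch is that the reader-side fence is only needed
when the check runs outside the page-table lock, as in
migrate_misplaced_transhuge_page(); under the lock, the lock/unlock
pair already provides the ordering, so mm_tlb_flush_pending() itself
can drop its barrier.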
* Re: [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-26 15:02 ` [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
@ 2017-07-27  0:04   ` Minchan Kim
  0 siblings, 0 replies; 6+ messages in thread
From: Minchan Kim @ 2017-07-27 0:04 UTC (permalink / raw)
  To: Nadav Amit; +Cc: linux-mm, nadav.amit, mgorman, riel, luto

On Wed, Jul 26, 2017 at 08:02:14AM -0700, Nadav Amit wrote:
> Reading tlb_flush_pending while the page-table lock is taken does not
> require a barrier, since the lock/unlock already acts as one. Remove
> the barrier in mm_tlb_flush_pending() accordingly.
> 
> However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
> while the page-table lock is already released, which may present a
> problem on architectures with a weak memory model (e.g., PPC). Use
> smp_mb__after_unlock_lock() in that case.
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  include/linux/mm_types.h | 18 ++++++++++++------
>  mm/migrate.c             |  9 +++++++++
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 36f4ec589544..312eec5690d4 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -522,12 +522,12 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
>  /*
>   * Memory barriers to keep this state in sync are graciously provided by
>   * the page table locks, outside of which no page table modifications happen.
> - * The barriers below prevent the compiler from re-ordering the instructions
> - * around the memory barriers that are already present in the code.
> + * The barriers are used to ensure that tlb_flush_pending updates, which
> + * happen while the lock is not taken, and the PTE updates, which happen
> + * while the lock is taken, are serialized.
>   */
>  static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
>  {
> -	barrier();
>  	return atomic_read(&mm->tlb_flush_pending) > 0;
>  }
>  static inline void set_tlb_flush_pending(struct mm_struct *mm)
> @@ -535,15 +535,21 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>  	atomic_inc(&mm->tlb_flush_pending);
>  
>  	/*
> -	 * Guarantee that the tlb_flush_pending store does not leak into the
> +	 * Guarantee that the tlb_flush_pending increase does not leak into the
>  	 * critical section updating the page tables
>  	 */
>  	smp_mb__before_spinlock();
>  }
> -/* Clearing is done after a TLB flush, which also provides a barrier. */
> +
>  static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>  {
> -	barrier();
> +	/*
> +	 * Guarantee that the tlb_flush_pending decrease does not leak into the
> +	 * critical section, since we must order the PTE change and changes to
> +	 * the pending TLB flush indication. We could have relied on TLB flush
> +	 * as a memory barrier, but this behavior is not clearly documented.
> +	 */
> +	smp_mb__before_atomic();
>  	atomic_dec(&mm->tlb_flush_pending);
>  }
>  #else
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 89a0a1707f4c..85c7134d70cc 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1935,6 +1935,15 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  		put_page(new_page);
>  		goto out_fail;
>  	}
> +
> +	/*
> +	 * mm_tlb_flush_pending() is safe if it is executed while the
> +	 * page-table lock is taken. But here, it is executed while the
> +	 * page-table lock is already released. This requires a full memory
> +	 * barrier on architectures with weak memory models.
> +	 */
> +	smp_mb__after_unlock_lock();
> +

As you saw in my work, I will use mm_tlb_flush_pending() in
tlb_finish_mmu(), where the page-table lock is already released, so I
should use the same comment/barrier there, too.

Like that, every mm_tlb_flush_pending() caller should be aware of
whether the call happens inside or outside the pte lock. I think it
would be better to express that in the function interface. IOW,

	bool mm_tlb_flush_pending(bool pte_locked)

Otherwise, at least, I hope the comment you wrote here moves into
mm_tlb_flush_pending() so that its users can catch it.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread
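For concreteness, the interface Minchan sketches above might look
something like the following. The signature is his suggestion; the body
is an editorial guess at how it would compose with patch 2's barriers,
not code from this thread or from what was eventually merged.

/*
 * Hypothetical variant of mm_tlb_flush_pending() taking Minchan's
 * proposed pte_locked argument (illustrative only).
 */
static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pte_locked)
{
	/*
	 * Under the page-table lock, the lock/unlock pair already orders
	 * this read against the PTE updates. Outside the lock, a full
	 * barrier is needed before reading the counter.
	 */
	if (!pte_locked)
		smp_mb__after_unlock_lock();

	return atomic_read(&mm->tlb_flush_pending) > 0;
}

With this shape, the open-coded barrier in
migrate_misplaced_transhuge_page() would collapse into a
mm_tlb_flush_pending(mm, false) call, and callers holding the lock
would pass true.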
end of thread, other threads: [~2017-07-29 23:52 UTC | newest]

Thread overview: 6+ messages
2017-07-26 15:02 [PATCH v2 0/2] mm: fixes of tlb_flush_pending Nadav Amit
2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
2017-07-27  6:48   ` Mel Gorman
2017-07-29 23:50   ` kbuild test robot
2017-07-26 15:02 ` [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
2017-07-27  0:04   ` Minchan Kim