* [PATCH 0/2][RFC] mm callback for batched pte updates
From: Martin Schwidefsky @ 2016-07-05 12:00 UTC
To: linux-mm; +Cc: Martin Schwidefsky
Hello,
there is another s390 peculiarity I would like to exploit: the range
option of the IPTE instruction. This extension allows setting the invalid
bit and clearing the associated TLB entry for multiple page table entries
with a single instruction, instead of issuing one IPTE per pte. Each IPTE
or IPTE-range is a quiescing operation, essentially an IPI to all other
CPUs to coordinate the pte invalidation.
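To illustrate, a rough sketch of what the range option replaces, using
the __ptep_ipte / __ptep_ipte_range helpers from arch/s390 (the loop is
illustrative only):

	/* without the range option: one quiescing IPTE per pte */
	for (i = 0; i < nr; i++)
		__ptep_ipte(address + i * PAGE_SIZE, ptep + i);

	/* with the range option: one quiescing operation for up to
	 * 256 ptes of a single page table */
	__ptep_ipte_range(address, nr - 1, ptep);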
IPTE-range is useful in multi-threaded programs for a fork or an
mprotect/munmap/mremap affecting large memory areas, where s390 cannot
simply update the ptes and defer the TLB flush.
In order to add the IPTE range optimization another mm callback is
needed in copy_page_range, unmap_page_range, move_page_tables, and
change_protection_range. The proposed name is 'ptep_prepare_range';
suggestions for a better name are welcome.
With the two patches the update of the ptes inside a single page table
is done in two steps. First, ptep_prepare_range invalidates all ptes;
this makes the address range inaccessible to all CPUs. The pages are
still marked as present and could be revalidated if the page table lock
were released, but this does not happen with the current code.
The second step is the usual update loop over the individual ptes.
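A minimal sketch of the pattern, condensed from the zap_pte_range hunk
in patch 1 (pte details and error handling omitted):

	start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	pte = start_pte;
	arch_enter_lazy_mmu_mode();
	/* step 1: optimistically invalidate the whole range, making
	 * it inaccessible to all CPUs */
	ptep_prepare_range(mm, addr, end, pte, tlb->fullmm);
	/* step 2: the usual loop over the individual ptes; ptes that
	 * are already invalid need no further IPTE/IPI */
	do {
		pte_t ptent = *pte;
		/* ... */
	} while (pte++, addr += PAGE_SIZE, addr != end);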
For a multi-threaded program, a fork or an mprotect/munmap/mremap of a
large address range now needs fewer IPTEs / IPIs, by a factor of up to
256 (an s390 page table holds 256 entries, so one IPTE-range replaces up
to 256 individual IPTEs).
My mprotect stress test runs faster by an order of magnitude.
Martin Schwidefsky (2):
mm: add callback to prepare the update of multiple page table entries
s390/mm: use ipte range to invalidate multiple page table entries
arch/s390/include/asm/pgtable.h | 25 +++++++++++++++++++++++++
arch/s390/include/asm/setup.h | 2 ++
arch/s390/kernel/early.c | 2 ++
arch/s390/mm/pageattr.c | 2 +-
arch/s390/mm/pgtable.c | 17 +++++++++++++++++
include/asm-generic/pgtable.h | 4 ++++
mm/memory.c | 2 ++
mm/mprotect.c | 1 +
mm/mremap.c | 1 +
9 files changed, 55 insertions(+), 1 deletion(-)
--
2.6.6
* [PATCH 1/2] mm: add callback to prepare the update of multiple page table entries
From: Martin Schwidefsky @ 2016-07-05 12:00 UTC
To: linux-mm; +Cc: Martin Schwidefsky
Add a new callback 'ptep_prepare_range' to allow the architecture
code to optimize the modification of multiple page table entries.
The background for the callback is an instruction found on s390:
the IPTE-range instruction can invalidate up to 256 ptes with a single
IPI, including the flush of the TLB entries associated with the address
range.
This is similar to arch_[enter|leave]_lazy_mmu_mode, but targets a more
specific situation: ptep_prepare_range is called for the update of a
block of ptes.
ptep_prepare_range is called optimistically; the callback may choose
to do nothing. In that case the individual per-pte operations and the
arch_[enter|leave]_lazy_mmu_mode mechanics have to deal with the
invalidation and the associated TLB flush.
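As an example of how the per-pte path can handle both cases, a sketch of
a clear operation that flushes only when the range operation has not
already invalidated the pte (the function name is illustrative, not
taken from the patch):

	static inline pte_t ptep_clear_sketch(struct mm_struct *mm,
					      unsigned long addr, pte_t *ptep)
	{
		pte_t pte = *ptep;

		/* flush only if a preceding ptep_prepare_range did
		 * not already invalidate this pte */
		if (!(pte_val(pte) & _PAGE_INVALID))
			__ptep_ipte(addr, ptep);
		pte_val(*ptep) = _PAGE_INVALID;
		return pte;
	}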
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
include/asm-generic/pgtable.h | 4 ++++
mm/memory.c | 2 ++
mm/mprotect.c | 1 +
mm/mremap.c | 1 +
4 files changed, 8 insertions(+)
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 9401f48..b29f360 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -192,6 +192,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
}
#endif
+#ifndef ptep_prepare_range
+#define ptep_prepare_range(mm, start, end, ptep, full) do {} while (0)
+#endif
+
#ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline void pmdp_set_wrprotect(struct mm_struct *mm,
diff --git a/mm/memory.c b/mm/memory.c
index 07493e3..eeecb92 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -934,6 +934,7 @@ again:
orig_src_pte = src_pte;
orig_dst_pte = dst_pte;
arch_enter_lazy_mmu_mode();
+ ptep_prepare_range(src_mm, addr, end, src_pte, 0);
do {
/*
@@ -1114,6 +1115,7 @@ again:
start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte = start_pte;
arch_enter_lazy_mmu_mode();
+ ptep_prepare_range(mm, addr, end, pte, tlb->fullmm);
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b650c54..3fa15b5 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -74,6 +74,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
return 0;
arch_enter_lazy_mmu_mode();
+ ptep_prepare_range(mm, addr, end, pte, 0);
do {
oldpte = *pte;
if (pte_present(oldpte)) {
diff --git a/mm/mremap.c b/mm/mremap.c
index 3fa0a467..5f4d0af 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -135,6 +135,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
arch_enter_lazy_mmu_mode();
+ ptep_prepare_range(mm, old_addr, old_end, old_pte, 0);
for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
new_pte++, new_addr += PAGE_SIZE) {
--
2.6.6
* [PATCH 2/2] s390/mm: use ipte range to invalidate multiple page table entries
From: Martin Schwidefsky @ 2016-07-05 12:00 UTC
To: linux-mm; +Cc: Martin Schwidefsky
The IPTE instruction with the range option can invalidate up to 256 page
table entries at once. This speeds up the mprotect, munmap, mremap and
fork operations for multi-threaded programs.
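The detect-and-fall-back pattern, condensed from the ipte_range helper
in arch/s390/mm/pageattr.c that this patch converts to the new machine
flag:

	if (MACHINE_HAS_IPTE_RANGE) {
		__ptep_ipte_range(address, nr - 1, pte);
	} else {
		/* facility 13 not installed: one IPTE per pte */
		for (i = 0; i < nr; i++) {
			__ptep_ipte(address, pte);
			address += PAGE_SIZE;
			pte++;
		}
	}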
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/include/asm/pgtable.h | 25 +++++++++++++++++++++++++
arch/s390/include/asm/setup.h | 2 ++
arch/s390/kernel/early.c | 2 ++
arch/s390/mm/pageattr.c | 2 +-
arch/s390/mm/pgtable.c | 17 +++++++++++++++++
5 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 20e5f7d..2caf726 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -997,6 +997,31 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
return 1;
}
+void ptep_invalidate_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, pte_t *ptep);
+
+static inline void ptep_prepare_range(struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end,
+ pte_t *ptep, int full)
+{
+ if (!full)
+ ptep_invalidate_range(mm, start, end, ptep);
+}
+#define ptep_prepare_range ptep_prepare_range
+
+#define __HAVE_ARCH_MOVE_PTE
+static inline pte_t move_pte(pte_t pte, pgprot_t prot,
+ unsigned long old_addr,
+ unsigned long new_addr)
+{
+ if ((pte_val(pte) & _PAGE_PRESENT) &&
+ (pte_val(pte) & _PAGE_READ) &&
+ (pte_val(pte) & _PAGE_YOUNG))
+ pte_val(pte) &= ~_PAGE_INVALID;
+ return pte;
+}
+
/*
* Additional functions to handle KVM guest page tables
*/
diff --git a/arch/s390/include/asm/setup.h b/arch/s390/include/asm/setup.h
index c0f0efb..58b13e0 100644
--- a/arch/s390/include/asm/setup.h
+++ b/arch/s390/include/asm/setup.h
@@ -30,6 +30,7 @@
#define MACHINE_FLAG_TLB_LC _BITUL(12)
#define MACHINE_FLAG_VX _BITUL(13)
#define MACHINE_FLAG_CAD _BITUL(14)
+#define MACHINE_FLAG_IPTE_RANGE _BITUL(15)
#define LPP_MAGIC _BITUL(31)
#define LPP_PFAULT_PID_MASK _AC(0xffffffff, UL)
@@ -71,6 +72,7 @@ extern void detect_memory_memblock(void);
#define MACHINE_HAS_TLB_LC (S390_lowcore.machine_flags & MACHINE_FLAG_TLB_LC)
#define MACHINE_HAS_VX (S390_lowcore.machine_flags & MACHINE_FLAG_VX)
#define MACHINE_HAS_CAD (S390_lowcore.machine_flags & MACHINE_FLAG_CAD)
+#define MACHINE_HAS_IPTE_RANGE (S390_lowcore.machine_flags & MACHINE_FLAG_IPTE_RANGE)
/*
* Console mode. Override with conmode=
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 717b03a..ebf69c4 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -339,6 +339,8 @@ static __init void detect_machine_facilities(void)
S390_lowcore.machine_flags |= MACHINE_FLAG_EDAT1;
__ctl_set_bit(0, 23);
}
+ if (test_facility(13))
+ S390_lowcore.machine_flags |= MACHINE_FLAG_IPTE_RANGE;
if (test_facility(78))
S390_lowcore.machine_flags |= MACHINE_FLAG_EDAT2;
if (test_facility(3))
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 7104ffb..91809d9 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -306,7 +306,7 @@ static void ipte_range(pte_t *pte, unsigned long address, int nr)
{
int i;
- if (test_facility(13)) {
+ if (MACHINE_HAS_IPTE_RANGE) {
__ptep_ipte_range(address, nr - 1, pte);
return;
}
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 74f8f2a..3dd85ec 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -283,6 +283,23 @@ void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
}
EXPORT_SYMBOL(ptep_modify_prot_commit);
+void ptep_invalidate_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, pte_t *ptep)
+{
+ unsigned long nr;
+
+ if (!MACHINE_HAS_IPTE_RANGE || mm_has_pgste(mm))
+ return;
+ preempt_disable();
+ nr = (end - start) >> PAGE_SHIFT;
+ /* If the flush is likely to be local skip the ipte range */
+ if (nr && !cpumask_equal(mm_cpumask(mm),
+ cpumask_of(smp_processor_id())))
+ __ptep_ipte_range(start, nr - 1, ptep);
+ preempt_enable();
+}
+EXPORT_SYMBOL(ptep_invalidate_range);
+
static inline pmd_t pmdp_flush_direct(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
{
--
2.6.6