* Re: [PATCH v2 00/11] Introduces new count-based method for monitoring lockless pagetable wakls
From: John Hubbard @ 2019-09-20 21:24 UTC
To: Leonardo Bras, linuxppc-dev, linux-kernel
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Arnd Bergmann, Aneesh Kumar K.V, Christophe Leroy, Andrew Morton,
Dan Williams, Nicholas Piggin, Mahesh Salgaonkar,
Thomas Gleixner, Richard Fontana, Ganesh Goudar, Allison Randal,
Greg Kroah-Hartman, Mike Rapoport, YueHaibing, Ira Weiny,
Jason Gunthorpe, Keith Busch, Linux-MM
On 9/20/19 1:12 PM, Leonardo Bras wrote:
> If a process (qemu) with a lot of CPUs (128) tries to munmap() a large
> chunk of memory (496GB) mapped with THP, it takes an average of 275
> seconds, which can cause a lot of problems for the workload (in the
> qemu case, the guest will lock up for this time).
>
> Trying to find the source of this bug, I found out that most of this
> time is spent in serialize_against_pte_lookup(). This function takes a
> lot of time in smp_call_function_many() if there are more than a couple
> of CPUs running the user process. Since it has to happen for every THP
> mapped, it takes a very long time for large amounts of memory.
>
> By the docs, serialize_against_pte_lookup() is needed to prevent the
> pmd_t to pte_t casting inside find_current_mm_pte(), or in any lockless
> pagetable walk, from happening concurrently with THP splitting/collapsing.
>
> It does so by calling do_nothing() on each CPU in mm->cpu_bitmap[],
> which runs only after interrupts are re-enabled on that CPU.
> Since interrupts are (usually) disabled during a lockless pagetable
> walk, and serialize_against_pte_lookup() will only return after
> interrupts are enabled, the walk is protected.
>
> So, as far as I understand, if there is no lockless pagetable walk
> running, there is no need to call serialize_against_pte_lookup().
>
> So, to avoid the cost of running serialize_against_pte_lookup(), I
> propose a counter that keeps track of how many find_current_mm_pte()
> calls are currently running and, if there are none, just skips
> smp_call_function_many().
Just noticed that this really should also include linux-mm. Maybe
it's best to repost the patchset with them included?
In particular, there is likely to be some feedback about adding more
calls, in addition to local_irq_disable/enable, around the gup_fast() path,
separately from my questions about the synchronization cases in ppc.
thanks,
--
John Hubbard
NVIDIA
>
> The related functions are:
> start_lockless_pgtbl_walk(mm)
> 	Insert before starting any lockless pgtable walk
> end_lockless_pgtbl_walk(mm)
> 	Insert after the end of any lockless pgtable walk
> 	(Mostly after the ptep is last used)
> running_lockless_pgtbl_walk(mm)
> 	Returns the number of lockless pgtable walks running
>
>
> On my workload (qemu), I saw munmap() time drop from 275 seconds
> to 418 ms.
>
>> Leonardo Bras (11):
>>   powerpc/mm: Adds counting method to monitor lockless pgtable walks
>>   asm-generic/pgtable: Adds dummy functions to monitor lockless pgtable
>>     walks
>>   mm/gup: Applies counting method to monitor gup_pgd_range
>>   powerpc/mce_power: Applies counting method to monitor lockless pgtbl
>>     walks
>>   powerpc/perf: Applies counting method to monitor lockless pgtbl walks
>>   powerpc/mm/book3s64/hash: Applies counting method to monitor lockless
>>     pgtbl walks
>>   powerpc/kvm/e500: Applies counting method to monitor lockless pgtbl
>>     walks
>>   powerpc/kvm/book3s_hv: Applies counting method to monitor lockless
>>     pgtbl walks
>>   powerpc/kvm/book3s_64: Applies counting method to monitor lockless
>>     pgtbl walks
>>   powerpc/book3s_64: Enables counting method to monitor lockless pgtbl
>>     walk
>>   powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing
>>
>> arch/powerpc/include/asm/book3s/64/mmu.h | 3 +++
>> arch/powerpc/include/asm/book3s/64/pgtable.h | 5 +++++
>> arch/powerpc/kernel/mce_power.c | 13 ++++++++++---
>> arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 ++
>> arch/powerpc/kvm/book3s_64_mmu_radix.c | 20 ++++++++++++++++++--
>> arch/powerpc/kvm/book3s_64_vio_hv.c | 4 ++++
>> arch/powerpc/kvm/book3s_hv_nested.c | 8 ++++++++
>> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 ++++++++-
>> arch/powerpc/kvm/e500_mmu_host.c | 4 ++++
>> arch/powerpc/mm/book3s64/hash_tlb.c | 2 ++
>> arch/powerpc/mm/book3s64/hash_utils.c | 7 +++++++
>> arch/powerpc/mm/book3s64/mmu_context.c | 1 +
>> arch/powerpc/mm/book3s64/pgtable.c | 20 +++++++++++++++++++-
>> arch/powerpc/perf/callchain.c | 5 ++++-
>> include/asm-generic/pgtable.h | 9 +++++++++
>> mm/gup.c | 4 ++++
>> 16 files changed, 108 insertions(+), 8 deletions(-)
>>
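
For readers following along, the counting scheme described in the quoted
cover letter can be sketched roughly as below. This is a userspace
illustration under stated assumptions, not the code from the patches:
struct mm_sketch, its field names, and the _sketch suffix on the serialize
function are hypothetical, the IPI broadcast is simulated by a plain
counter, and the default sequentially consistent C11 atomics stand in for
whatever ordering the real powerpc code would need (which is exactly the
synchronization question raised above).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's mm_struct; only the counter matters. */
struct mm_sketch {
	atomic_int lockless_pgtbl_walk_count;	/* assumed field name */
	int ipi_broadcasts;			/* simulated smp_call_function_many() calls */
};

/* Call before starting any lockless pagetable walk. */
static inline void start_lockless_pgtbl_walk(struct mm_sketch *mm)
{
	atomic_fetch_add(&mm->lockless_pgtbl_walk_count, 1);
}

/* Call after the walk ends, i.e. after the ptep is last used. */
static inline void end_lockless_pgtbl_walk(struct mm_sketch *mm)
{
	int prev = atomic_fetch_sub(&mm->lockless_pgtbl_walk_count, 1);

	assert(prev > 0);	/* an end without a matching start is a bug */
}

/* Returns the number of lockless pagetable walks currently running. */
static inline int running_lockless_pgtbl_walk(struct mm_sketch *mm)
{
	return atomic_load(&mm->lockless_pgtbl_walk_count);
}

/*
 * Sketch of the proposed fast path: skip the expensive broadcast when no
 * walk is in flight. Returns true if the (simulated) broadcast was issued;
 * the bool return exists only to make the sketch observable.
 */
static inline bool serialize_against_pte_lookup_sketch(struct mm_sketch *mm)
{
	if (running_lockless_pgtbl_walk(mm) == 0)
		return false;

	mm->ipi_broadcasts++;	/* stands in for smp_call_function_many() */
	return true;
}
```

With a zero-initialized struct mm_sketch, a serialize call made while no
walk is running returns false without touching ipi_broadcasts; a call made
between start_lockless_pgtbl_walk() and end_lockless_pgtbl_walk() issues
the simulated broadcast. The sketch says nothing about whether the counter
check is correctly ordered against the page table update itself.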
* Re: [PATCH v2 00/11] Introduces new count-based method for monitoring lockless pagetable wakls
From: John Hubbard @ 2019-09-23 20:51 UTC
To: Leonardo Bras, linuxppc-dev, linux-kernel, Linux-MM
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Arnd Bergmann, Aneesh Kumar K.V, Christophe Leroy, Andrew Morton,
Dan Williams, Nicholas Piggin, Mahesh Salgaonkar,
Thomas Gleixner, Richard Fontana, Ganesh Goudar, Allison Randal,
Greg Kroah-Hartman, Mike Rapoport, YueHaibing, Ira Weiny,
Jason Gunthorpe, Keith Busch
On 9/20/19 1:12 PM, Leonardo Bras wrote:
...
>> arch/powerpc/include/asm/book3s/64/mmu.h | 3 +++
>> arch/powerpc/include/asm/book3s/64/pgtable.h | 5 +++++
>> arch/powerpc/kernel/mce_power.c | 13 ++++++++++---
>> arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 ++
>> arch/powerpc/kvm/book3s_64_mmu_radix.c | 20 ++++++++++++++++++--
>> arch/powerpc/kvm/book3s_64_vio_hv.c | 4 ++++
>> arch/powerpc/kvm/book3s_hv_nested.c | 8 ++++++++
>> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 ++++++++-
>> arch/powerpc/kvm/e500_mmu_host.c | 4 ++++
>> arch/powerpc/mm/book3s64/hash_tlb.c | 2 ++
>> arch/powerpc/mm/book3s64/hash_utils.c | 7 +++++++
>> arch/powerpc/mm/book3s64/mmu_context.c | 1 +
>> arch/powerpc/mm/book3s64/pgtable.c | 20 +++++++++++++++++++-
>> arch/powerpc/perf/callchain.c | 5 ++++-
>> include/asm-generic/pgtable.h | 9 +++++++++
>> mm/gup.c | 4 ++++
>> 16 files changed, 108 insertions(+), 8 deletions(-)
>>
Also, which tree do these patches apply to, please?
thanks,
--
John Hubbard
NVIDIA