* [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
@ 2009-03-27 20:31 Jeremy Fitzhardinge
2009-03-28 3:48 ` Avi Kivity
0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-27 20:31 UTC (permalink / raw)
To: Nick Piggin
Cc: Avi Kivity, Linux Kernel Mailing List,
Linux Memory Management List, the arch/x86 maintainers
get_user_pages_fast() relies on cross-cpu tlb flushes being a barrier
between clearing and setting a pte, and before freeing a pagetable page.
It usually does this by disabling interrupts to hold off IPIs, but
some tlb flush implementations don't use IPIs for tlb flushes, and
must use another mechanism.
In this change, add in_gup_cpumask, which is a cpumask of cpus currently
performing a get_user_pages_fast traversal of a pagetable. A cross-cpu
tlb flush function can use this to determine whether it should hold-off
on the flush until the gup_fast has finished.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 80a1dee..b2e23e2 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -23,4 +23,6 @@ static inline void leave_mm(int cpu)
}
#endif
+extern cpumask_var_t in_gup_cpumask;
+
#endif /* _ASM_X86_MMU_H */
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index be54176..a937b46 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -4,13 +4,17 @@
* Copyright (C) 2008 Nick Piggin
* Copyright (C) 2008 Novell Inc.
*/
+#include <linux/init.h>
#include <linux/sched.h>
+#include <linux/cpumask.h>
#include <linux/mm.h>
#include <linux/vmstat.h>
#include <linux/highmem.h>
#include <asm/pgtable.h>
+cpumask_var_t in_gup_cpumask;
+
static inline pte_t gup_get_pte(pte_t *ptep)
{
#ifndef CONFIG_X86_PAE
@@ -227,6 +231,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
unsigned long next;
pgd_t *pgdp;
int nr = 0;
+ int cpu;
start &= PAGE_MASK;
addr = start;
@@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
* address down to the the page and take a ref on it.
*/
local_irq_disable();
+
+ cpu = smp_processor_id();
+ cpumask_set_cpu(cpu, in_gup_cpumask);
+
pgdp = pgd_offset(mm, addr);
do {
pgd_t pgd = *pgdp;
@@ -265,6 +274,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
goto slow;
} while (pgdp++, addr = next, addr != end);
+
+ cpumask_clear_cpu(cpu, in_gup_cpumask);
+
local_irq_enable();
VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
@@ -274,6 +286,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
int ret;
slow:
+ cpumask_clear_cpu(cpu, in_gup_cpumask);
local_irq_enable();
slow_irqon:
/* Try to get the remaining pages with get_user_pages */
@@ -296,3 +309,9 @@ slow_irqon:
return ret;
}
}
+
+static int __init gup_mask_init(void)
+{
+ return alloc_cpumask_var(&in_gup_cpumask, GFP_KERNEL);
+}
+core_initcall(gup_mask_init);
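[Editorial note: the flush side that consumes this mask is not in this patch. A rough userspace model of how a non-IPI TLB-flush implementation might consult it is sketched below; the atomic bitmask stands in for the kernel cpumask, and every name here is illustrative, not a real kernel API.]

```c
/* Userspace sketch (assumption, not from the patch): CPUs doing a
 * gup_fast walk set their bit; a cross-cpu TLB flush holds off on
 * freeing a pagetable page while any bit is set. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong in_gup_cpumask_sim;   /* bit n = CPU n inside gup_fast */

static void gup_enter(int cpu)
{
    atomic_fetch_or(&in_gup_cpumask_sim, 1UL << cpu);
}

static void gup_exit(int cpu)
{
    atomic_fetch_and(&in_gup_cpumask_sim, ~(1UL << cpu));
}

/* Flush side: may the pagetable page be freed now, or must we wait? */
static bool tlb_flush_may_free_now(void)
{
    return atomic_load(&in_gup_cpumask_sim) == 0;
}
```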
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in

the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
* Re: [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
2009-03-27 20:31 [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag Jeremy Fitzhardinge
@ 2009-03-28 3:48 ` Avi Kivity
2009-03-28 5:01 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-03-28 3:48 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Nick Piggin, Linux Kernel Mailing List,
Linux Memory Management List, the arch/x86 maintainers
Jeremy Fitzhardinge wrote:
> get_user_pages_fast() relies on cross-cpu tlb flushes being a barrier
> between clearing and setting a pte, and before freeing a pagetable page.
> It usually does this by disabling interrupts to hold off IPIs, but
> some tlb flush implementations don't use IPIs for tlb flushes, and
> must use another mechanism.
>
> In this change, add in_gup_cpumask, which is a cpumask of cpus currently
> performing a get_user_pages_fast traversal of a pagetable. A cross-cpu
> tlb flush function can use this to determine whether it should hold-off
> on the flush until the gup_fast has finished.
>
> @@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start, int
> nr_pages, int write,
> * address down to the the page and take a ref on it.
> */
> local_irq_disable();
> +
> + cpu = smp_processor_id();
> + cpumask_set_cpu(cpu, in_gup_cpumask);
> +
This will bounce a cacheline, every time. Please wrap in CONFIG_XEN and
skip at runtime if Xen is not enabled.
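[Editorial note: the runtime gating Avi asks for might look roughly like this userspace sketch; `xen_pv_active` and the counter are made-up stand-ins for a real "running on Xen PV" check and the cpumask update.]

```c
/* Sketch (assumption): only touch the shared cpumask cacheline when
 * the hypervisor's flush path actually needs the hold-off; bare metal
 * and KVM guests skip the atomic update entirely. */
#include <stdbool.h>

static bool xen_pv_active;       /* stand-in for a real Xen-PV runtime check */
static int gup_mask_updates;     /* counts shared-cacheline touches */

static void gup_enter_gated(int cpu)
{
    (void)cpu;                   /* a real version would pass cpu through */
    if (xen_pv_active)
        gup_mask_updates++;      /* would be cpumask_set_cpu(cpu, ...) */
}
```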
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
2009-03-28 3:48 ` Avi Kivity
@ 2009-03-28 5:01 ` Jeremy Fitzhardinge
2009-03-28 7:54 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-28 5:01 UTC (permalink / raw)
To: Avi Kivity
Cc: Nick Piggin, Linux Kernel Mailing List,
Linux Memory Management List, the arch/x86 maintainers
Avi Kivity wrote:
> Jeremy Fitzhardinge wrote:
>> get_user_pages_fast() relies on cross-cpu tlb flushes being a barrier
>> between clearing and setting a pte, and before freeing a pagetable page.
>> It usually does this by disabling interrupts to hold off IPIs, but
>> some tlb flush implementations don't use IPIs for tlb flushes, and
>> must use another mechanism.
>>
>> In this change, add in_gup_cpumask, which is a cpumask of cpus currently
>> performing a get_user_pages_fast traversal of a pagetable. A cross-cpu
>> tlb flush function can use this to determine whether it should hold-off
>> on the flush until the gup_fast has finished.
>>
>> @@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start, int
>> nr_pages, int write,
>> * address down to the the page and take a ref on it.
>> */
>> local_irq_disable();
>> +
>> + cpu = smp_processor_id();
>> + cpumask_set_cpu(cpu, in_gup_cpumask);
>> +
>
> This will bounce a cacheline, every time. Please wrap in CONFIG_XEN
> and skip at runtime if Xen is not enabled.
Every time? Only when running successive gup_fasts on different cpus,
and only twice per gup_fast. (What's the typical page count? I see that
kvm and lguest are page-at-a-time users, but presumably direct IO has
larger batches.)
Alternatively, it could have per-cpu flags and the other side could
construct the mask (I originally had that, but this was simpler).
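[Editorial note: the per-cpu-flag alternative Jeremy mentions can be sketched as below; each CPU writes only its own flag (no shared cacheline on the fast path), and the flush side reconstructs the mask by scanning. All names are hypothetical; a kernel version would use real per-cpu variables.]

```c
/* Userspace sketch (assumption): one flag per CPU instead of one
 * shared cpumask, so gup_fast never dirties a shared cacheline. */
#include <stdbool.h>

#define NR_CPUS_SIM 8

static volatile bool in_gup_flag[NR_CPUS_SIM];  /* one flag per CPU */

static void gup_enter_percpu(int cpu) { in_gup_flag[cpu] = true;  }
static void gup_exit_percpu(int cpu)  { in_gup_flag[cpu] = false; }

/* Flush side pays the cost: build the mask from the per-cpu flags. */
static unsigned long build_in_gup_mask(void)
{
    unsigned long mask = 0;
    for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
        if (in_gup_flag[cpu])
            mask |= 1UL << cpu;
    return mask;
}
```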
J
* Re: [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
2009-03-28 5:01 ` Jeremy Fitzhardinge
@ 2009-03-28 7:54 ` Eric Dumazet
2009-03-28 12:31 ` Peter Zijlstra
2009-03-28 9:54 ` Avi Kivity
2009-03-28 12:27 ` Peter Zijlstra
2 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2009-03-28 7:54 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Avi Kivity, Nick Piggin, Linux Kernel Mailing List,
Linux Memory Management List, the arch/x86 maintainers
Jeremy Fitzhardinge wrote:
> Avi Kivity wrote:
>> Jeremy Fitzhardinge wrote:
>>> get_user_pages_fast() relies on cross-cpu tlb flushes being a barrier
>>> between clearing and setting a pte, and before freeing a pagetable page.
>>> It usually does this by disabling interrupts to hold off IPIs, but
>>> some tlb flush implementations don't use IPIs for tlb flushes, and
>>> must use another mechanism.
>>>
>>> In this change, add in_gup_cpumask, which is a cpumask of cpus currently
>>> performing a get_user_pages_fast traversal of a pagetable. A cross-cpu
>>> tlb flush function can use this to determine whether it should hold-off
>>> on the flush until the gup_fast has finished.
>>>
>>> @@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start, int
>>> nr_pages, int write,
>>> * address down to the the page and take a ref on it.
>>> */
>>> local_irq_disable();
>>> +
>>> + cpu = smp_processor_id();
>>> + cpumask_set_cpu(cpu, in_gup_cpumask);
>>> +
>>
>> This will bounce a cacheline, every time. Please wrap in CONFIG_XEN
>> and skip at runtime if Xen is not enabled.
>
> Every time? Only when running successive gup_fasts on different cpus,
> and only twice per gup_fast. (What's the typical page count? I see that
> kvm and lguest are page-at-a-time users, but presumably direct IO has
> larger batches.)
If I am not mistaken, shared futexes were hitting the mm semaphore hard.
Then gup_fast was introduced in kernel/futex.c to remove this contention point.
Yet, this contention point was process specific, not a global one :)
And now, you want to add a global hot spot that would slow
down unrelated processes, only because they use shared futexes thousands
of times per second...
>
> Alternatively, it could have per-cpu flags and the other side could
> construct the mask (I originally had that, but this was simpler).
Simpler, but it would be a regression for legacy applications still using
shared futexes (because they are statically linked with an old libc).
* Re: [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
2009-03-28 7:54 ` Eric Dumazet
@ 2009-03-28 12:31 ` Peter Zijlstra
0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2009-03-28 12:31 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jeremy Fitzhardinge, Avi Kivity, Nick Piggin,
Linux Kernel Mailing List, Linux Memory Management List,
the arch/x86 maintainers
On Sat, 2009-03-28 at 08:54 +0100, Eric Dumazet wrote:
> Jeremy Fitzhardinge wrote:
> > Avi Kivity wrote:
> >> Jeremy Fitzhardinge wrote:
> >>> get_user_pages_fast() relies on cross-cpu tlb flushes being a barrier
> >>> between clearing and setting a pte, and before freeing a pagetable page.
> >>> It usually does this by disabling interrupts to hold off IPIs, but
> >>> some tlb flush implementations don't use IPIs for tlb flushes, and
> >>> must use another mechanism.
> >>>
> >>> In this change, add in_gup_cpumask, which is a cpumask of cpus currently
> >>> performing a get_user_pages_fast traversal of a pagetable. A cross-cpu
> >>> tlb flush function can use this to determine whether it should hold-off
> >>> on the flush until the gup_fast has finished.
> >>>
> >>> @@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start, int
> >>> nr_pages, int write,
> >>> * address down to the the page and take a ref on it.
> >>> */
> >>> local_irq_disable();
> >>> +
> >>> + cpu = smp_processor_id();
> >>> + cpumask_set_cpu(cpu, in_gup_cpumask);
> >>> +
> >>
> >> This will bounce a cacheline, every time. Please wrap in CONFIG_XEN
> >> and skip at runtime if Xen is not enabled.
> >
> > Every time? Only when running successive gup_fasts on different cpus,
> > and only twice per gup_fast. (What's the typical page count? I see that
> > kvm and lguest are page-at-a-time users, but presumably direct IO has
> > larger batches.)
>
> If I am not mistaken, shared futexes were hitting the mm semaphore hard.
> Then gup_fast was introduced in kernel/futex.c to remove this contention point.
>
> Yet, this contention point was process specific, not a global one :)
>
> And now, you want to add a global hot spot that would slow
> down unrelated processes, only because they use shared futexes thousands
> of times per second...
Yet another reason to turn all this virt muck off :-) I just wish I
could turn off the paravirt code impact; it makes finding functions in
the x86 code a terrible pain.
> > Alternatively, it could have per-cpu flags and the other side could
> > construct the mask (I originally had that, but this was simpler).
>
> Simpler, but it would be a regression for legacy applications still using
> shared futexes (because they are statically linked with an old libc).
Still doesn't help those apps that really use shared futexes.
* Re: [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
2009-03-28 5:01 ` Jeremy Fitzhardinge
2009-03-28 7:54 ` Eric Dumazet
@ 2009-03-28 9:54 ` Avi Kivity
2009-03-28 12:27 ` Peter Zijlstra
2 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-03-28 9:54 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Nick Piggin, Linux Kernel Mailing List,
Linux Memory Management List, the arch/x86 maintainers
Jeremy Fitzhardinge wrote:
>>> @@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start,
>>> int nr_pages, int write,
>>> * address down to the the page and take a ref on it.
>>> */
>>> local_irq_disable();
>>> +
>>> + cpu = smp_processor_id();
>>> + cpumask_set_cpu(cpu, in_gup_cpumask);
>>> +
>>
>> This will bounce a cacheline, every time. Please wrap in CONFIG_XEN
>> and skip at runtime if Xen is not enabled.
>
> Every time? Only when running successive gup_fasts on different cpus,
> and only twice per gup_fast. (What's the typical page count? I see
> that kvm and lguest are page-at-a-time users, but presumably direct IO
> has larger batches.)
Databases will often issue I/Os of 1 or 2 pages. But not regressing kvm
should be sufficient motivation.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
2009-03-28 5:01 ` Jeremy Fitzhardinge
2009-03-28 7:54 ` Eric Dumazet
2009-03-28 9:54 ` Avi Kivity
@ 2009-03-28 12:27 ` Peter Zijlstra
2 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2009-03-28 12:27 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Avi Kivity, Nick Piggin, Linux Kernel Mailing List,
Linux Memory Management List, the arch/x86 maintainers
On Fri, 2009-03-27 at 22:01 -0700, Jeremy Fitzhardinge wrote:
> Avi Kivity wrote:
> > Jeremy Fitzhardinge wrote:
> >> get_user_pages_fast() relies on cross-cpu tlb flushes being a barrier
> >> between clearing and setting a pte, and before freeing a pagetable page.
> >> It usually does this by disabling interrupts to hold off IPIs, but
> >> some tlb flush implementations don't use IPIs for tlb flushes, and
> >> must use another mechanism.
> >>
> >> In this change, add in_gup_cpumask, which is a cpumask of cpus currently
> >> performing a get_user_pages_fast traversal of a pagetable. A cross-cpu
> >> tlb flush function can use this to determine whether it should hold-off
> >> on the flush until the gup_fast has finished.
> >>
> >> @@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start, int
> >> nr_pages, int write,
> >> * address down to the the page and take a ref on it.
> >> */
> >> local_irq_disable();
> >> +
> >> + cpu = smp_processor_id();
> >> + cpumask_set_cpu(cpu, in_gup_cpumask);
> >> +
> >
> > This will bounce a cacheline, every time. Please wrap in CONFIG_XEN
> > and skip at runtime if Xen is not enabled.
>
> Every time? Only when running successive gup_fasts on different cpus,
> and only twice per gup_fast. (What's the typical page count? I see that
> kvm and lguest are page-at-a-time users, but presumably direct IO has
> larger batches.)
The larger the batch, the longer the irq-off latency, I've just proposed
adding a batch mechanism to gup_fast() to limit this.
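[Editorial note: the batching Peter refers to can be sketched as below; instead of walking all of nr_pages with interrupts disabled, each irq-off window covers at most a fixed number of pages. The names and batch size are illustrative, not from his proposal.]

```c
/* Userspace sketch (assumption): bound the irq-off latency of a
 * gup_fast-style walk by splitting the work into fixed-size batches. */
#define GUP_BATCH 32

static int gup_fast_batched(unsigned long start, int nr_pages)
{
    int done = 0;
    (void)start;                 /* address arithmetic elided in this sketch */
    while (done < nr_pages) {
        int batch = nr_pages - done;
        if (batch > GUP_BATCH)
            batch = GUP_BATCH;
        /* local_irq_disable();  walk `batch` pages here;  local_irq_enable(); */
        done += batch;           /* irq-off window bounded by GUP_BATCH pages */
    }
    return done;
}
```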