linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention
@ 2014-04-10 16:40 Richard Yao
  2014-04-10 16:51 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Richard Yao @ 2014-04-10 16:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Zhang Yanfei, Wanpeng Li, Johannes Weiner,
	HATAYAMA Daisuke, linux-mm, linux-kernel, kernel, Matthew Thode,
	Richard Yao

Performance analysis of software compilation by Gentoo portage on an
Intel E5-2620 with 64GB of RAM revealed that a sizeable amount of time,
anywhere from 5% to 15%, was spent in get_vmalloc_info(), with at least
40% of that time spent in the _raw_spin_lock() invoked by it.

The spinlock call is done on vmap_area_lock to protect vmap_area_list,
but changes to vmap_area_list are made under RCU. The only consumer that
requires a spinlock on an RCU-ified list is /proc/vmallocinfo. That is
only intended for use by kernel developers doing debugging, but even few
kernel developers appear to use it. Introducing DEBUG_VMALLOCINFO allows
us to fully RCU-ify the list, which eliminates this list as a source of
contention.

This patch brings a substantial reduction in time spent in spinlocks on
my system. Flame graphs from my early analysis are available on my
developer space. They were created by profiling the system under
concurrent package builds done by emerge at a sample rate of 99Hz for 10
seconds and using Brendan Gregg's scripts to process the data:

http://dev.gentoo.org/~ryao/emerge.svg
http://dev.gentoo.org/~ryao/emerge-patched.svg

In this example, 6.64% of system time is spent in get_vmalloc_info()
with 2.59% spent in the spinlock. The patched version sees only 0.50% of
time spent in get_vmalloc_info() with neligible time spent in spin
locks. The low utilization of get_vmalloc_info() in this is partly
attributable to measurement error, but the reduction in time spent
spinning is clear.

Signed-off-by: Richard Yao <ryao@gentoo.org>
---
 lib/Kconfig.debug | 11 +++++++++++
 mm/vmalloc.c      | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dd7f885..a3e6967 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -492,6 +492,17 @@ config DEBUG_STACK_USAGE
 
 	  This option will slow down process creation somewhat.
 
+config DEBUG_VMALLOCINFO
+	bool "Provide /proc/vmallocinfo"
+	depends on PROC_FS
+	help
+	  Provides a userland interface to view kernel virtual memory mappings.
+	  Enabling this places a RCU-ified list under spinlock protection. That
+	  hurts performance in concurrent workloads.
+
+	  If unsure, say N.
+
+
 config DEBUG_VM
 	bool "Debug VM"
 	depends on DEBUG_KERNEL
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bf233b2..12ab34b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1988,8 +1988,13 @@ long vread(char *buf, char *addr, unsigned long count)
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
 
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 	spin_lock(&vmap_area_lock);
 	list_for_each_entry(va, &vmap_area_list, list) {
+#else
+	rcu_read_lock();
+	list_for_each_entry_rcu(va, &vmap_area_list, list) {
+#endif
 		if (!count)
 			break;
 
@@ -2020,7 +2025,11 @@ long vread(char *buf, char *addr, unsigned long count)
 		count -= n;
 	}
 finished:
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 	spin_unlock(&vmap_area_lock);
+#else
+	rcu_read_unlock();
+#endif
 
 	if (buf == buf_start)
 		return 0;
@@ -2070,8 +2079,13 @@ long vwrite(char *buf, char *addr, unsigned long count)
 		count = -(unsigned long) addr;
 	buflen = count;
 
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 	spin_lock(&vmap_area_lock);
 	list_for_each_entry(va, &vmap_area_list, list) {
+#else
+	rcu_read_lock();
+	list_for_each_entry_rcu(va, &vmap_area_list, list) {
+#endif
 		if (!count)
 			break;
 
@@ -2101,7 +2115,11 @@ long vwrite(char *buf, char *addr, unsigned long count)
 		count -= n;
 	}
 finished:
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 	spin_unlock(&vmap_area_lock);
+#else
+	rcu_read_unlock();
+#endif
 	if (!copied)
 		return 0;
 	return buflen;
@@ -2531,6 +2549,7 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #endif	/* CONFIG_SMP */
 
 #ifdef CONFIG_PROC_FS
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_area_lock)
 {
@@ -2677,6 +2696,7 @@ static int __init proc_vmalloc_init(void)
 	return 0;
 }
 module_init(proc_vmalloc_init);
+#endif
 
 void get_vmalloc_info(struct vmalloc_info *vmi)
 {
@@ -2689,14 +2709,22 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 
 	prev_end = VMALLOC_START;
 
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 	spin_lock(&vmap_area_lock);
+#else
+	rcu_read_lock();
+#endif
 
 	if (list_empty(&vmap_area_list)) {
 		vmi->largest_chunk = VMALLOC_TOTAL;
 		goto out;
 	}
 
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 	list_for_each_entry(va, &vmap_area_list, list) {
+#else
+	list_for_each_entry_rcu(va, &vmap_area_list, list) {
+#endif
 		unsigned long addr = va->va_start;
 
 		/*
@@ -2723,7 +2751,11 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 		vmi->largest_chunk = VMALLOC_END - prev_end;
 
 out:
+#ifdef CONFIG_DEBUG_VMALLOCINFO
 	spin_unlock(&vmap_area_lock);
+#else
+	rcu_read_unlock();
+#endif
 }
 #endif
 
-- 
1.8.3.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention
  2014-04-10 16:40 [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention Richard Yao
@ 2014-04-10 16:51 ` Andi Kleen
  2014-04-10 17:09   ` Richard Yao
  2014-04-16  0:28 ` Joonsoo Kim
  2014-04-17 23:18 ` Andrew Morton
  2 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2014-04-10 16:51 UTC (permalink / raw)
  To: Richard Yao
  Cc: Andrew Morton, Joonsoo Kim, Zhang Yanfei, Wanpeng Li,
	Johannes Weiner, HATAYAMA Daisuke, linux-mm, linux-kernel,
	kernel, Matthew Thode

Richard Yao <ryao@gentoo.org> writes:

> Performance analysis of software compilation by Gentoo portage on an
> Intel E5-2620 with 64GB of RAM revealed that a sizeable amount of time,
> anywhere from 5% to 15%, was spent in get_vmalloc_info(), with at least
> 40% of that time spent in the _raw_spin_lock() invoked by it.

I don't think that's the right fix. We want to be able 
to debug kernels without having to recompile them.

And switching locking around dynamically like this is very
ugly and hard to maintain.

Besides are you sure the spin lock is not needed elsewhere?

How are writers to the list protected?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention
  2014-04-10 16:51 ` Andi Kleen
@ 2014-04-10 17:09   ` Richard Yao
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Yao @ 2014-04-10 17:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Joonsoo Kim, Zhang Yanfei, Wanpeng Li,
	Johannes Weiner, HATAYAMA Daisuke, linux-mm, linux-kernel,
	kernel, Matthew Thode

[-- Attachment #1: Type: text/plain, Size: 1833 bytes --]

On 04/10/2014 12:51 PM, Andi Kleen wrote:
> Richard Yao <ryao@gentoo.org> writes:
> 
>> Performance analysis of software compilation by Gentoo portage on an
>> Intel E5-2620 with 64GB of RAM revealed that a sizeable amount of time,
>> anywhere from 5% to 15%, was spent in get_vmalloc_info(), with at least
>> 40% of that time spent in the _raw_spin_lock() invoked by it.
> 
> I don't think that's the right fix. We want to be able 
> to debug kernels without having to recompile them.

There are plenty of other features for debugging the VM subsystem that
are disabled in production kernels because they are too expensive. I see
no reason why this should not be one of them.

If someone reading this has a use for this functionality in production
systems, I would love to hear about it. I am having trouble finding uses
for this in production.

That being said, we are clearly spending plenty of time blocked on list
traversal. I imagine that we could use an extent tree to track free
space for even bigger gains, but I have difficulty seeing why
/proc/vmallocinfo should be available on non-debug kernels. Allowing
userland to hold a critical lock indefinitely on production systems is a
deadlock waiting to happen.

> And switching locking around dynamically like this is very
> ugly and hard to maintain.

I welcome suggestions on how to make the changes I have made in this
patch more maintainable.

> Besides are you sure the spin lock is not needed elsewhere?
> 
> How are writers to the list protected?

The spinlock is needed elsewhere, but not to protect this list.
Modifications to this list are done under RCU. The only thing stopping
RCU from being enough to avoid a spinlock is /proc/vmallocinfo, which
does locking to prevent modification while userland is reading the list.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention
  2014-04-10 16:40 [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention Richard Yao
  2014-04-10 16:51 ` Andi Kleen
@ 2014-04-16  0:28 ` Joonsoo Kim
  2014-04-17 23:18 ` Andrew Morton
  2 siblings, 0 replies; 5+ messages in thread
From: Joonsoo Kim @ 2014-04-16  0:28 UTC (permalink / raw)
  To: Richard Yao
  Cc: Andrew Morton, Zhang Yanfei, Wanpeng Li, Johannes Weiner,
	HATAYAMA Daisuke, linux-mm, linux-kernel, kernel, Matthew Thode

On Thu, Apr 10, 2014 at 12:40:58PM -0400, Richard Yao wrote:
> Performance analysis of software compilation by Gentoo portage on an
> Intel E5-2620 with 64GB of RAM revealed that a sizeable amount of time,
> anywhere from 5% to 15%, was spent in get_vmalloc_info(), with at least
> 40% of that time spent in the _raw_spin_lock() invoked by it.
> 
> The spinlock call is done on vmap_area_lock to protect vmap_area_list,
> but changes to vmap_area_list are made under RCU. The only consumer that
> requires a spinlock on an RCU-ified list is /proc/vmallocinfo. That is

Why only '/proc/vmallocinfo' needs the spinlock?
List iterators which access va->vm such as vread() and vwrite() needs
the spinlock too.
But, I think that get_vmalloc_info() doesn't need it, so you can use
rcu list iteration on that function and it would fix your problem.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention
  2014-04-10 16:40 [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention Richard Yao
  2014-04-10 16:51 ` Andi Kleen
  2014-04-16  0:28 ` Joonsoo Kim
@ 2014-04-17 23:18 ` Andrew Morton
  2 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2014-04-17 23:18 UTC (permalink / raw)
  To: Richard Yao
  Cc: Joonsoo Kim, Zhang Yanfei, Wanpeng Li, Johannes Weiner,
	HATAYAMA Daisuke, linux-mm, linux-kernel, kernel, Matthew Thode

On Thu, 10 Apr 2014 12:40:58 -0400 Richard Yao <ryao@gentoo.org> wrote:

> Performance analysis of software compilation by Gentoo portage on an
> Intel E5-2620 with 64GB of RAM revealed that a sizeable amount of time,
> anywhere from 5% to 15%, was spent in get_vmalloc_info(), with at least
> 40% of that time spent in the _raw_spin_lock() invoked by it.

This means that something in userspace is beating the crap out of
/proc/meminfo.  What is it and why is it doing this?

/proc/meminfo reads a large amount of stuff and gathering it will
always be expensive.  I don't think we really want to be doing
significant work and adding significant complexity to optimize meminfo.

If there really is a legitimate need to be reading meminfo with this
frequency then it would be pretty simple to optimise
get_vmalloc_info(): all it does is to return two ulongs and we could
maintain those at vmalloc/vfree time rather than doing the big list
walk.

If we can address these things then the vmap_area_lock problem should
just go away - the kernel shouldn't be calling vmalloc/vfree at high
frequency, especially during a compilation workload.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-04-17 23:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10 16:40 [PATCH] mm/vmalloc: Introduce DEBUG_VMALLOCINFO to reduce spinlock contention Richard Yao
2014-04-10 16:51 ` Andi Kleen
2014-04-10 17:09   ` Richard Yao
2014-04-16  0:28 ` Joonsoo Kim
2014-04-17 23:18 ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox