Subject: Re: [PATCH V4] Eliminate task stack trace duplication.
From: Andrew Bresticker <abrestic@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura, Balbir Singh, Tejun Heo,
    Pavel Emelyanov, KAMEZAWA Hiroyuki, Li Zefan, Mel Gorman, Christoph Lameter,
    Johannes Weiner, Rik van Riel, Hugh Dickins, Michal Hocko, Dave Hansen,
    Zhu Yanhai, linux-mm@kvack.org, Ying Han
Date: Thu, 28 Jul 2011 14:14:54 -0700

Hi Andrew,

Thanks for reviewing the patch!

On Wed, Jul 27, 2011 at 4:19 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 19 Jul 2011 12:31:22 -0700
> Andrew Bresticker <abrestic@google.com> wrote:
>
> > The problem with a small dmesg ring buffer like 512k is that only a limited
> > number of task traces will be logged.  Sometimes we lose important
> > information only because of too many duplicated stack traces.  This problem
> > occurs when dumping lots of stacks in a single operation, such as sysrq-T.
> >
> > This patch reduces the duplication of task stack traces in the dump message
> > by hashing the task stacks.  The hash table is a 32k buffer pre-allocated
> > during bootup, and each task stack is hashed with a stack depth of 32
> > entries.  Whenever we find a task trace identical to one already in the
> > table, we dump only the pid of the task whose trace was already dumped, so
> > it is easy to track back to the full stack via that pid.
> >
> > [   58.469730] kworker/0:0     S 0000000000000000     0     4      2 0x00000000
> > [   58.469735]  ffff88082fcfde80 0000000000000046 ffff88082e9d8000 ffff88082fcfc010
> > [   58.469739]  ffff88082fce9860 0000000000011440 ffff88082fcfdfd8 ffff88082fcfdfd8
> > [   58.469743]  0000000000011440 0000000000000000 ffff88082fcee180 ffff88082fce9860
> > [   58.469747] Call Trace:
> > [   58.469751]  [<ffffffff8108525a>] worker_thread+0x24b/0x250
> > [   58.469754]  [<ffffffff8108500f>] ? manage_workers+0x192/0x192
> > [   58.469757]  [<ffffffff810885bd>] kthread+0x82/0x8a
> > [   58.469760]  [<ffffffff8141aed4>] kernel_thread_helper+0x4/0x10
> > [   58.469763]  [<ffffffff8108853b>] ? kthread_worker_fn+0x112/0x112
> > [   58.469765]  [<ffffffff8141aed0>] ? gs_change+0xb/0xb
> > [   58.469768] kworker/u:0     S 0000000000000004     0     5      2 0x00000000
> > [   58.469773]  ffff88082fcffe80 0000000000000046 ffff880800000000 ffff88082fcfe010
> > [   58.469777]  ffff88082fcea080 0000000000011440 ffff88082fcfffd8 ffff88082fcfffd8
> > [   58.469781]  0000000000011440 0000000000000000 ffff88082fd4e9a0 ffff88082fcea080
> > [   58.469785] Call Trace:
> > [   58.469786] <Same stack as pid 4>
> > [   58.470235] kworker/0:1     S 0000000000000000     0    13      2 0x00000000
> > [   58.470255]  ffff88082fd3fe80 0000000000000046 ffff880800000000 ffff88082fd3e010
> > [   58.470279]  ffff88082fcee180 0000000000011440 ffff88082fd3ffd8 ffff88082fd3ffd8
> > [   58.470301]  0000000000011440 0000000000000000 ffffffff8180b020 ffff88082fcee180
> > [   58.470325] Call Trace:
> > [   58.470332] <Same stack as pid 4>
>
> That looks nice.
>
> > ...
> >
> > Signed-off-by: Ying Han <yinghan@google.com>
> > Signed-off-by: Andrew Bresticker <abrestic@google.com>
> > ---
> >  arch/x86/Kconfig                  |    3 +
> >  arch/x86/include/asm/stacktrace.h |    6 ++-
> >  arch/x86/kernel/dumpstack.c       |   24 ++++++--
> >  arch/x86/kernel/dumpstack_32.c    |    7 ++-
> >  arch/x86/kernel/dumpstack_64.c    |   11 +++-
> >  arch/x86/kernel/stacktrace.c      |  106 +++++++++++++++++++++++++++++++++++++
> >  drivers/tty/sysrq.c               |    2 +-
> >  include/linux/sched.h             |    3 +-
> >  include/linux/stacktrace.h        |    2 +
> >  kernel/debug/kdb/kdb_bt.c         |    8 ++--
> >  kernel/rtmutex-debug.c            |    2 +-
> >  kernel/sched.c                    |   20 ++++++-
> >  kernel/stacktrace.c               |   10 ++++
> >  13 files changed, 180 insertions(+), 24 deletions(-)
>
> This is all pretty x86-centric.  I wonder if the code could/should be
> implemented in a fashion which would permit other architectures to use
> it?

With this interface we would need to modify show_stack() on each architecture
since we added the dup_stack_pid argument.  I'll look into changing the
interface so that we don't have to do this.  Do you have any suggestions?

> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -103,6 +103,9 @@ config LOCKDEP_SUPPORT
> >  config STACKTRACE_SUPPORT
> >        def_bool y
> >
> > +config STACKTRACE
> > +     def_bool y
> > +
>
> What's this change for?

We don't need this any more.  I'll get rid of it.

> >  config HAVE_LATENCYTOP_SUPPORT
> >        def_bool y
> >
> >
> > ...
> >
> > +static unsigned int stack_trace_lookup(int len)
> > +{
> > +     int j;
> > +     int index = 0;
> > +     unsigned int ret = 0;
> > +     struct task_stack *stack;
> > +
> > +     index = task_stack_hash(cur_stack, len) % DEDUP_STACK_LAST_ENTRY;
> > +
> > +     for (j = 0; j < DEDUP_HASH_MAX_ITERATIONS; j++) {
> > +             stack = stack_hash_table + (index + (1 << j)) %
> > +                                        DEDUP_STACK_LAST_ENTRY;
> > +             if (stack->entries[0] == 0x0) {
> > +                     memcpy(stack, cur_stack, sizeof(*cur_stack));
> > +                     ret = 0;
> > +                     break;
> > +             } else {
> > +                     if (memcmp(stack->entries, cur_stack->entries,
> > +                                sizeof(stack->entries)) == 0) {
> > +                             ret = stack->pid;
> > +                             break;
> > +                     }
> > +             }
> > +     }
> > +     memset(cur_stack, 0, sizeof(struct task_stack));
> > +
> > +     return ret;
> > +}
>
> I can kinda see what this function is doing - maintaining an LRU ring
> of task stacks.  Or something.  I didn't look very hard because I
> shouldn't have to ;) Please comment this function: tell us what it's
> doing and why it's doing it?
>
> What surprises me about this patch is that it appears to be maintaining
> an array of entire stack traces.  Why not just generate a good hash of
> the stack contents and assume that if one task's hash is equal to
> another task's hash, then the two tasks have the same stack trace?
>
> That way,
>
>         struct task_stack {
>                 pid_t pid;
>                 unsigned long entries[DEDUP_MAX_STACK_DEPTH];
>         };
>
> becomes
>
>         struct task_stack {
>                 pid_t pid;
>                 unsigned long stack_hash;
>         };

I'll clean this up for the next version.
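Something like the following, perhaps -- only an untested sketch, where the
table size, the pid/entries parameter list, and the use of jhash2() are
placeholder assumptions on my part, not necessarily what the next version
will actually do:

#include <linux/jhash.h>
#include <linux/types.h>

/* Placeholder sizes; the real ones would be derived from the 32k buffer. */
#define DEDUP_STACK_LAST_ENTRY          2048
#define DEDUP_HASH_MAX_ITERATIONS       10

struct task_stack {
        pid_t pid;
        u32 stack_hash;                 /* 0 means the slot is unused */
};

static struct task_stack stack_hash_table[DEDUP_STACK_LAST_ENTRY];

/*
 * Return 0 if this trace has not been seen before (and remember it),
 * otherwise return the pid of the task whose identical trace was already
 * dumped.  The corner case of a trace hashing to 0 is ignored here; such a
 * trace is simply never deduplicated.
 */
static unsigned int stack_trace_lookup(pid_t pid, unsigned long *entries, int len)
{
        u32 hash = jhash2((u32 *)entries,
                          len * sizeof(unsigned long) / sizeof(u32), 0);
        unsigned int index = hash % DEDUP_STACK_LAST_ENTRY;
        int j;

        for (j = 0; j < DEDUP_HASH_MAX_ITERATIONS; j++) {
                struct task_stack *slot =
                        &stack_hash_table[(index + (1 << j)) % DEDUP_STACK_LAST_ENTRY];

                if (slot->stack_hash == 0) {    /* empty slot: record this trace */
                        slot->stack_hash = hash;
                        slot->pid = pid;
                        return 0;
                }
                if (slot->stack_hash == hash)   /* assume equal hash => identical trace */
                        return slot->pid;
        }

        return 0;       /* probe sequence exhausted; caller just dumps the full trace */
}

Storing only the hash also shrinks each slot from DEDUP_MAX_STACK_DEPTH words
to a single word, so the same 32k buffer can hold many more distinct traces.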
> >
> > ...
> >

Thanks,
Andrew