* Re: Enhanced profiling support (was Re: vm lock contention reduction)
@ 2002-07-10 14:28 Richard J Moore
2002-07-10 20:30 ` Karim Yaghmour
0 siblings, 1 reply; 13+ messages in thread
From: Richard J Moore @ 2002-07-10 14:28 UTC (permalink / raw)
To: John Levon
Cc: Andrew Morton, Andrea Arcangeli, bob, Karim Yaghmour,
linux-kernel, linux-mm, mjbligh, John Levon, Rik van Riel,
Linus Torvalds
>Sure, there are all sorts of things where some tracing can come in
>useful. The question is whether it's really something the mainline
>kernel should be doing, and if the gung-ho approach is nice or not.
>
>> The fact that so many kernel subsystems already have their own tracing
>> built-in (see other posting)
>
>Your list was almost entirely composed of per-driver debug routines.
>This is not the same thing as logging trap entry/exits, syscalls etc
>etc, on any level, and I'm a bit perplexed that you're making such an
>association.
There's a balance to be struck with tracing. First we should point out that
the recording mechanism doesn't have to intrude within the kernel unless you
want init-time tracing. The bigger point of contention seems to be that of
instrumentation. Yes, it is very ugly to have thousands of trace points
littering the source. On the other hand, for basic serviceability a minimal
set should be present in a production system - these would typically allow
the external interface of any component to be traced. For low-level
tracing - i.e. internal routines etc. - dynamic tracing can be used. This
requires no modification to source. The tracepoint is implemented
dynamically in executing code. DProbes+LTT provides this capability.
Some level of tracing (along with other complementary PD tools, e.g. crash
dump) needs to be readily available to deal with those types of problem we
see with mature systems deployed in the production environment. Typically
such problems are not readily recreatable nor even predictable. I've often
had to solve problems which impact a business environment severely, where
one server out of 2000 gets hit each day, but it's a different one each day.
It's under those circumstances that tracing, along with other automated
data-capturing problem determination tools, becomes invaluable. And it's a
fact of life that only those types of difficult problem remain once we've
beaten a system to death in development and test. Being able to use a common
set of tools whatever the components under investigation greatly eases problem
determination. This is especially so where you have the ability to use
DProbes with LTT to provide ad hoc tracepoints that were not originally
included by the developers.
Richard J Moore CEng, MIEE, Consulting IT Specialist, TSM
RAS Project Lead - Linux Technology Centre (ATS-PIC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
The IBM Academy will hold a Conference on Performance Engineering in
Toronto July 8-10. A High Availability Conference follows July 10-12.
Details on http://w3.ibm.com/academy/
From: John Levon <movement@marcelothewonderpenguin.com>
Sent by: John Levon <moz@compsoc.man.ac.uk>
Date: 10/07/2002 00:38
To: Karim Yaghmour <karim@opersys.com>
Cc: Linus Torvalds <torvalds@transmeta.com>, Andrew Morton <akpm@zip.com.au>,
    Andrea Arcangeli <andrea@suse.de>, Rik van Riel <riel@conectiva.com.br>,
    "linux-mm@kvack.org" <linux-mm@kvack.org>, mjbligh@linux.ibm.com,
    linux-kernel@vger.kernel.org, Richard J Moore/UK/IBM@IBMGB,
    bob <bob@watson.ibm.com>
Subject: Re: Enhanced profiling support (was Re: vm lock contention reduction)
Please respond to John Levon
On Wed, Jul 10, 2002 at 12:16:05AM -0400, Karim Yaghmour wrote:
[snip]
> And the list goes on.
Sure, there are all sorts of things where some tracing can come in
useful. The question is whether it's really something the mainline
kernel should be doing, and if the gung-ho approach is nice or not.
> The fact that so many kernel subsystems already have their own tracing
> built-in (see other posting)
Your list was almost entirely composed of per-driver debug routines.
This is not the same thing as logging trap entry/exits, syscalls etc
etc, on any level, and I'm a bit perplexed that you're making such an
association.
> expect user-space developers to efficiently use the kernel if they
> have
> absolutely no idea about the dynamic interaction their processes have
> with the kernel and how this interaction is influenced by and
> influences
> the interaction with other processes?
This is clearly an exaggeration. And seeing as something like LTT
doesn't (and cannot) tell the "whole story" either, I could throw the
same argument directly back at you. The point is, there comes a point of
no return where usefulness gets outweighed by ugliness. For the very few
cases that such detailed information is really useful, the user can
usually install the needed special-case tools.
In contrast a profiling mechanism that improves on the poor lot that
currently exists (gprof, readprofile) has a truly general utility, and
can hopefully be done without too much ugliness.
The primary reason I want to see something like this is to kill the ugly
code I have to maintain.
> > The entry.S examine-the-registers approach is simple enough, but
> > it's
> > not much more tasteful than sys_call_table hackery IMHO
>
> I guess we won't agree on this. From my point of view it is much
> better
> to have the code directly within entry.S for all to see instead of
> having some external software play around with the syscall table in a
> way kernel users can't trace back to the kernel's own code.
Eh ? I didn't say sys_call_table hackery was better. I said the entry.S
thing wasn't much better ...
regards
john
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-10 14:28 Enhanced profiling support (was Re: vm lock contention reduction) Richard J Moore
@ 2002-07-10 20:30 ` Karim Yaghmour
2002-07-10 21:41 ` Andrea Arcangeli
0 siblings, 1 reply; 13+ messages in thread
From: Karim Yaghmour @ 2002-07-10 20:30 UTC (permalink / raw)
To: Richard J Moore
Cc: John Levon, Andrew Morton, Andrea Arcangeli, bob, linux-kernel,
linux-mm, mjbligh, John Levon, Rik van Riel, Linus Torvalds
Richard J Moore wrote:
> Some level of tracing (along with other complementary PD tools, e.g. crash
> dump) needs to be readily available to deal with those types of problem we
> see with mature systems deployed in the production environment. Typically
> such problems are not readily recreatable nor even predictable. I've often
> had to solve problems which impact a business environment severely, where
> one server out of 2000 gets hit each day, but it's a different one each day.
> It's under those circumstances that tracing, along with other automated
> data-capturing problem determination tools, becomes invaluable. And it's a
> fact of life that only those types of difficult problem remain once we've
> beaten a system to death in development and test. Being able to use a common
> set of tools whatever the components under investigation greatly eases problem
> determination. This is especially so where you have the ability to use
> DProbes with LTT to provide ad hoc tracepoints that were not originally
> included by the developers.
I definitely agree.
One case which perfectly illustrates how extreme these situations can be is
the Mars Pathfinder. The folks at the Jet Propulsion Lab used a tracing tool
very similar to LTT to locate the priority inversion problem the Pathfinder
had while it was on Mars.
The full account gives an interesting read (sorry for the link being on
MS's website but its author works for MS research ...):
http://research.microsoft.com/research/os/mbj/Mars_Pathfinder/Authoritative_Account.html
Karim
===================================================
Karim Yaghmour
karim@opersys.com
Embedded and Real-Time Linux Expert
===================================================
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-10 20:30 ` Karim Yaghmour
@ 2002-07-10 21:41 ` Andrea Arcangeli
2002-07-11 4:47 ` Karim Yaghmour
0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2002-07-10 21:41 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Richard J Moore, John Levon, Andrew Morton, bob, linux-kernel,
linux-mm, mjbligh, John Levon, Rik van Riel, Linus Torvalds
On Wed, Jul 10, 2002 at 04:30:42PM -0400, Karim Yaghmour wrote:
>
> Richard J Moore wrote:
> > Some level of tracing (along with other complementary PD tools, e.g. crash
> > dump) needs to be readily available to deal with those types of problem we
> > see with mature systems deployed in the production environment. Typically
> > such problems are not readily recreatable nor even predictable. I've often
> > had to solve problems which impact a business environment severely, where
> > one server out of 2000 gets hit each day, but it's a different one each day.
> > It's under those circumstances that tracing, along with other automated
> > data-capturing problem determination tools, becomes invaluable. And it's a
> > fact of life that only those types of difficult problem remain once we've
> > beaten a system to death in development and test. Being able to use a common
> > set of tools whatever the components under investigation greatly eases problem
> > determination. This is especially so where you have the ability to use
> > DProbes with LTT to provide ad hoc tracepoints that were not originally
> > included by the developers.
>
> I definitely agree.
>
> One case which perfectly illustrates how extreme these situations can be is
> the Mars Pathfinder. The folks at the Jet Propulsion Lab used a tracing tool
> very similar to LTT to locate the priority inversion problem the Pathfinder
> had while it was on Mars.
btw, on the topic, with our semaphores there's no way to handle priority
inversion with SCHED_RR tasks; if there's more than one task that runs
at RT priority we may fall into starvation of RT tasks too, the same way.
No starvation can happen of course if all tasks in the system belong
to the same scheduler policy (nice levels can have effects but they're
not indefinite delays).
The fix Ingo used for SCHED_IDLE is to have a special call to the
scheduler while returning to userspace, as that is the only place where we
know the kernel isn't holding any lock. But while it's going to only
generate some minor unexpected cpu load with SCHED_IDLE, generalizing
that hack to make all tasks scheduling inside the kernel run with RT
priority isn't going to provide a nice/fair behaviour (some task in fact could
run way too much if it's very system-hungry, in particular with
-preempt, which could again generate starvation of userspace, even if
no longer because of kernel locks). Maybe I'm overlooking something
simple, but I guess it's not going to be easy to fix; for the
semaphores it isn't too bad, they could learn how to raise the priority of a
special holder when needed, but for any semaphore-by-hand (for example
spinlock based) it would require some major auditing.
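[Editorial note: the "raise priority of a special holder" idea above is classic
priority inheritance. The following is a minimal user-space sketch of just the
bookkeeping, with all names invented for illustration; it is not kernel code.]

```c
#include <string.h>

/* Higher number = higher priority. */
struct task {
    int base_prio;   /* static priority the task was given    */
    int eff_prio;    /* effective priority after any boosting */
};

struct pi_lock {
    struct task *holder;
    int top_waiter_prio;   /* highest priority among blocked waiters */
};

static void pi_lock_init(struct pi_lock *l)
{
    memset(l, 0, sizeof(*l));
}

/* A task blocks on the lock: boost the holder so a middle-priority task
 * cannot starve it (the classic Pathfinder-style inversion). */
static void pi_block_on(struct pi_lock *l, struct task *waiter)
{
    if (waiter->eff_prio > l->top_waiter_prio)
        l->top_waiter_prio = waiter->eff_prio;
    if (l->holder && l->holder->eff_prio < l->top_waiter_prio)
        l->holder->eff_prio = l->top_waiter_prio;   /* inherit */
}

/* The holder releases the lock: drop back to its base priority. */
static void pi_release(struct pi_lock *l)
{
    if (l->holder)
        l->holder->eff_prio = l->holder->base_prio; /* de-boost */
    l->holder = NULL;
    l->top_waiter_prio = 0;
}
```

As Andrea notes, this is tractable for real semaphores (one known holder to
boost) but hopeless for ad hoc spinlock-based "semaphores by hand", where no
common lock structure exists to hang the boosting on.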
Andrea
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-10 21:41 ` Andrea Arcangeli
@ 2002-07-11 4:47 ` Karim Yaghmour
2002-07-11 4:59 ` Karim Yaghmour
0 siblings, 1 reply; 13+ messages in thread
From: Karim Yaghmour @ 2002-07-11 4:47 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Richard J Moore, John Levon, Andrew Morton, bob, linux-kernel,
linux-mm, mjbligh, John Levon, Rik van Riel, Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 4771 bytes --]
Andrea Arcangeli wrote:
> btw, on the topic, with our semaphores there's no way to handle priority
> inversion with SCHED_RR tasks; if there's more than one task that runs
> at RT priority we may fall into starvation of RT tasks too, the same way.
>
> No starvation can happen of course if all tasks in the system belong
> to the same scheduler policy (nice levels can have effects but they're
> not indefinite delays).
>
> The fix Ingo used for SCHED_IDLE is to have a special call to the
> scheduler while returning to userspace, as that is the only place where we
> know the kernel isn't holding any lock. But while it's going to only
> generate some minor unexpected cpu load with SCHED_IDLE, generalizing
> that hack to make all tasks scheduling inside the kernel run with RT
> priority isn't going to provide a nice/fair behaviour (some task in fact could
> run way too much if it's very system-hungry, in particular with
> -preempt, which could again generate starvation of userspace, even if
> no longer because of kernel locks). Maybe I'm overlooking something
> simple, but I guess it's not going to be easy to fix; for the
> semaphores it isn't too bad, they could learn how to raise the priority of a
> special holder when needed, but for any semaphore-by-hand (for example
> spinlock based) it would require some major auditing.
I wasn't aware of this particular problem, but I certainly can see LTT
as being helpful in trying to understand the actual interactions. The
custom event creation API is rather simple to use if you would like
to instrument some of the semaphore code (the Sys V IPC code is already
instrumented at a basic level):
trace_create_event()
trace_create_owned_event()
trace_destroy_event()
trace_std_formatted_event()
trace_raw_event()
Here's the complete description:
int trace_create_event(char* pm_event_type,
                       char* pm_event_desc,
                       int pm_format_type,
                       char* pm_format_data)

    pm_event_type is a short string describing the type of event.
    pm_event_desc is required if you are using
    trace_std_formatted_event(), more on this below.
    pm_format_type and pm_format_data are required if you are
    using trace_raw_event(). The function returns a unique
    custom event ID.

int trace_create_owned_event(char* pm_event_type,
                             char* pm_event_desc,
                             int pm_format_type,
                             char* pm_format_data,
                             pid_t pm_owner_pid)

    Same as trace_create_event() except that all events created
    using this call will be deleted once the process with pm_owner_pid
    exits. Not really useful in kernel space, but essential for
    providing user-space events.

void trace_destroy_event(int pm_event_id)

    Destroys the event with given pm_event_id.

int trace_std_formatted_event(int pm_event_id, ...)

    Instead of having a slew of printk's all around the code,
    pm_event_desc is filled with a printk-like string at the
    event creation and the actual params used to fill this
    string are passed to trace_std_formatted_event(). Be aware
    that this function uses va_start/vsprintf/va_end. The
    resulting string is logged in the trace as such and is visible
    in the trace much like a printk in a log. Except that you can
    place this in paths where printk's can't go and you can be sure
    that events logged with this are delivered even in high
    throughput situations (granted the trace buffer size is adequate).

int trace_raw_event(int pm_event_id,
                    int pm_event_size,
                    void* pm_event_data)

    This is the easiest way to log tons of binary data. Just give it
    the size of the data (pm_event_size) and a pointer to it
    (pm_event_data) and it gets written in the trace. This is what
    DProbes uses. The binary data is then formatted in userspace.
All events logged using this API can easily be extracted using LibLTT's
API. There is no requirement to use LTT's own visualization tool,
although you can still use it to see your own custom events.
As with the other traced events, each custom event gets timestamped
(do_gettimeofday) and placed in order of occurrence in the trace buffer.
I've attached an example module that uses the custom event API. See
the Examples directory of the LTT package for an example custom trace
reader which uses LibLTT.
One clear advantage of this API is that you can avoid using any
"#ifdef TRACE" or "#ifdef DEBUG". If the tracing module isn't loaded
or if the trace daemon isn't running, then nothing gets logged.
Cheers,
Karim
===================================================
Karim Yaghmour
karim@opersys.com
Embedded and Real-Time Linux Expert
===================================================
[-- Attachment #2: custom1.c --]
[-- Type: image/x-xbitmap, Size: 2287 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-11 4:47 ` Karim Yaghmour
@ 2002-07-11 4:59 ` Karim Yaghmour
0 siblings, 0 replies; 13+ messages in thread
From: Karim Yaghmour @ 2002-07-11 4:59 UTC (permalink / raw)
To: Andrea Arcangeli, Richard J Moore, John Levon, Andrew Morton,
bob, linux-kernel, linux-mm, mjbligh, John Levon, Rik van Riel,
Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 252 bytes --]
Sorry, the attachment never made it ...
===================================================
Karim Yaghmour
karim@opersys.com
Embedded and Real-Time Linux Expert
===================================================
[-- Attachment #2: example.c --]
[-- Type: text/plain, Size: 2325 bytes --]
/* Example usage of custom event API */
#define MODULE
#if 0
#define CONFIG_TRACE
#endif
#include <linux/config.h>
#include <linux/module.h>
#include <linux/trace.h>
#include <asm/string.h>
struct delta_event
{
    int an_int;
    char a_char;
};

static int alpha_id, omega_id, theta_id, delta_id, rho_id;

int init_module(void)
{
    uint8_t a_byte;
    char a_char;
    int an_int;
    int a_hex;
    char* a_string = "We are initializing the module";
    struct delta_event a_delta_event;

    /* Create events */
    alpha_id = trace_create_event("Alpha",
                                  "Number %d, String %s, Hex %08X",
                                  CUSTOM_EVENT_FORMAT_TYPE_STR,
                                  NULL);
    omega_id = trace_create_event("Omega",
                                  "Number %d, Char %c",
                                  CUSTOM_EVENT_FORMAT_TYPE_STR,
                                  NULL);
    theta_id = trace_create_event("Theta",
                                  "Plain string",
                                  CUSTOM_EVENT_FORMAT_TYPE_STR,
                                  NULL);
    delta_id = trace_create_event("Delta",
                                  NULL,
                                  CUSTOM_EVENT_FORMAT_TYPE_HEX,
                                  NULL);
    rho_id = trace_create_event("Rho",
                                NULL,
                                CUSTOM_EVENT_FORMAT_TYPE_HEX,
                                NULL);

    /* Trace events */
    an_int = 1;
    a_hex = 0xFFFFAAAA;
    trace_std_formatted_event(alpha_id, an_int, a_string, a_hex);

    an_int = 25;
    a_char = 'c';
    trace_std_formatted_event(omega_id, an_int, a_char);

    trace_std_formatted_event(theta_id);

    memset(&a_delta_event, 0, sizeof(a_delta_event));
    trace_raw_event(delta_id, sizeof(a_delta_event), &a_delta_event);

    a_byte = 0x12;
    trace_raw_event(rho_id, sizeof(a_byte), &a_byte);

    return 0;
}

void cleanup_module(void)
{
    uint8_t a_byte;
    char a_char;
    int an_int;
    int a_hex;
    char* a_string = "We are cleaning up the module";
    struct delta_event a_delta_event;

    /* Trace events */
    an_int = 324;
    a_hex = 0xABCDEF10;
    trace_std_formatted_event(alpha_id, an_int, a_string, a_hex);

    an_int = 789;
    a_char = 's';
    trace_std_formatted_event(omega_id, an_int, a_char);

    trace_std_formatted_event(theta_id);

    memset(&a_delta_event, 0xFF, sizeof(a_delta_event));
    trace_raw_event(delta_id, sizeof(a_delta_event), &a_delta_event);

    a_byte = 0xA4;
    trace_raw_event(rho_id, sizeof(a_byte), &a_byte);

    /* Destroy the events created */
    trace_destroy_event(alpha_id);
    trace_destroy_event(omega_id);
    trace_destroy_event(theta_id);
    trace_destroy_event(delta_id);
    trace_destroy_event(rho_id);
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vm lock contention reduction
@ 2002-07-07 2:50 Andrew Morton
2002-07-07 3:05 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2002-07-07 2:50 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linus Torvalds, Rik van Riel, linux-mm, Martin J. Bligh
Andrea Arcangeli wrote:
>
> On Thu, Jul 04, 2002 at 11:33:45PM -0700, Andrew Morton wrote:
> > Well. First locks first. kmap_lock is a bad one on x86.
>
> Actually I thought about kmap_lock and the per-process kmaps a bit more
> with Martin (cc'ed) during OLS and there is an easy process-scalable
> solution to drop:
Martin is being bitten by the global invalidate more than by the lock.
He increased the size of the kmap pool just to reduce the invalidate
frequency and saw 40% speedups of some stuff.
Those invalidates don't show up nicely on profiles.
> the kmap_lock
> in turn the global pool
> in turn the global tlb flush
>
> The only problem is that it's not anymore both atomic *and* persistent,
> it's only persistent. It's also atomic if the mm_count == 1, but the
> kernel cannot rely on it, it has to assume it's a blocking operation
> always (you find it out if it's blocking only at runtime).
I was discussing this with sct a few days back. iiuc, the proposal
was to create a small per-cpu pool (say, 4-8 pages) which is a
"front-end" to regular old kmap().
Any time you have one of these pages in use, the process gets
pinned onto the current CPU. If we run out of per-cpu kmaps,
just fall back to traditional kmap().
It does mean that this variant of kmap() couldn't just return
a `struct page *' - it would have to return something richer
than that.
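[Editorial note: a "richer than struct page *" handle could look something like
the following user-space sketch. All names and the slot-based pool are invented
for illustration; this is not the real kernel API, just the shape of the idea:
try a small per-cpu pool first, fall back to the traditional global kmap(), and
record in the handle which path was taken.]

```c
#include <stddef.h>

/* Hypothetical per-cpu kmap front-end, simulated in user space. */

#define PCPU_KMAP_SLOTS 4

struct page;                                 /* opaque here */

struct kmap_handle {
    void *vaddr;                             /* mapped address            */
    int   slot;                              /* pool slot, -1 on fallback */
};

static struct page *pool[PCPU_KMAP_SLOTS];   /* one CPU's pool, simplified */
static char backing[PCPU_KMAP_SLOTS][4096];  /* stand-in for fixmap slots  */

/* Stand-in for the traditional global kmap() slow path. */
static void *global_kmap(struct page *p) { return (void *)p; }

/* Try the per-cpu pool first; while a slot is held, the task would be
 * pinned to this CPU so no cross-CPU invalidate is ever needed. */
static struct kmap_handle kmap_percpu(struct page *p)
{
    struct kmap_handle h = { NULL, -1 };
    for (int i = 0; i < PCPU_KMAP_SLOTS; i++) {
        if (!pool[i]) {
            pool[i] = p;
            h.vaddr = backing[i];
            h.slot = i;
            return h;
        }
    }
    h.vaddr = global_kmap(p);                /* pool exhausted: old path */
    return h;
}

static void kunmap_percpu(struct kmap_handle *h)
{
    if (h->slot >= 0)
        pool[h->slot] = NULL;                /* free slot, unpin task */
    /* else: traditional kunmap of the global mapping */
}
```

The handle is exactly why plain `struct page *` no longer suffices: the caller
must know at unmap time whether it holds a per-cpu slot or a global mapping.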
> In short the same design of the per-process kmaps will work just fine if
> we add a semaphore to the mm_struct. Then before starting using the kmap
> entry we must acquire the semaphore. This way all the global locking and
> global tlb flush goes away completely for normal tasks, but still
> remains the contention of that per-mm semaphore with threads doing
> simultaneous pte manipulation or simultaneous pagecache I/O though.
> Furthermore this I/O will be serialized; threaded benchmarks like dbench
> may perform poorly that way I suspect, or we should add a pool of
> userspace pages so more than 1 thread is allowed to go ahead, but still
> we may cacheline-bounce in the synchronization of the pool across
> threads (similar to what we do now in the global pool).
>
> Then there's the problem the pagecache/FS API should be changed to pass
> the vaddr through the stack because page->virtual would go away, the
> virtual address would be per-process protected by the mm->kmap_sem so we
> couldn't store it in a global, all tasks can kmap the same page at the
> same time at virtual vaddr. This as well will break some common code.
>
> Last but not least, I hope in 2.6 production I won't be running
> benchmarks and profiling using a 32bit cpu anymore anyways.
>
> So I'm not very motivated anymore in doing that change after the comment
> from Linus about the issue with threads.
I believe that IBM have 32gig, 8- or 16-CPU ia32 machines just
coming into production now. Presumably, they're not the only
ones. We're stuck with this mess for another few years.
-
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vm lock contention reduction
2002-07-07 2:50 vm lock contention reduction Andrew Morton
@ 2002-07-07 3:05 ` Linus Torvalds
2002-07-07 3:47 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2002-07-07 3:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andrea Arcangeli, Rik van Riel, linux-mm, Martin J. Bligh
On Sat, 6 Jul 2002, Andrew Morton wrote:
>
> Martin is being bitten by the global invalidate more than by the lock.
> He increased the size of the kmap pool just to reduce the invalidate
> frequency and saw 40% speedups of some stuff.
>
> Those invalidates don't show up nicely on profiles.
I'd like to enhance the profiling support a bit, to create some
infrastructure for doing different kinds of profiles, not just the current
timer-based one (and not just for the kernel).
There's also the P4 native support for "event buffers" or whatever intel
calls them, that allows profiling at a lower level by interrupting not for
every event, but only when the hw buffer overflows.
I haven't had much time to look at the oprofile thing, but what I _have_
seen has made me rather unhappy (especially the horrid system call
tracking kludges).
I'd rather have some generic hooks (a notion of a "profile buffer" and
events that cause us to have to synchronize with it, like process
switches, mmap/munmap - oprofile wants these too), and some generic helper
routines for profiling (turn any eip into a "dentry + offset" pair
together with ways to tag specific dentries as being "worthy" of
profiling).
Depending on the regular timer interrupt will never give good profiles,
simply because it can't be NMI, but also because you then don't have the
choice of using other counters (cache miss etc).
oprofile does much of this, but in a damn ugly manner.
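[Editorial note: the generic "profile buffer" plus synchronizing events that
Linus sketches could be reduced to something like this user-space sketch. All
names are invented for illustration; samples are (cookie, offset) pairs
appended cheaply from the interrupt path, flushed to the reader on overflow or
when a synchronizing event such as munmap or process exit arrives.]

```c
#include <stdint.h>
#include <stddef.h>

struct profile_sample {
    uint64_t cookie;   /* identifies the mapped file (e.g. dentry address) */
    uint32_t offset;   /* eip offset within that file                      */
};

#define PROF_BUF_ENTRIES 8

static struct profile_sample buf[PROF_BUF_ENTRIES];
static size_t buf_pos;
static size_t flushes;          /* how many times the reader was handed data */

static void profile_flush(void) /* hand the buffer contents to userspace */
{
    flushes++;
    buf_pos = 0;
}

/* Called from the profiling interrupt (NMI-safe in a real implementation):
 * a cheap append, with a flush only when the hw/sw buffer overflows. */
static void profile_hit(uint64_t cookie, uint32_t offset)
{
    buf[buf_pos].cookie = cookie;
    buf[buf_pos].offset = offset;
    if (++buf_pos == PROF_BUF_ENTRIES)
        profile_flush();
}

/* Called on synchronizing events (process switch, mmap/munmap, exit) so
 * userspace can resolve cookies before the mapping disappears. */
static void profile_sync(void)
{
    if (buf_pos)
        profile_flush();
}
```

This mirrors the P4 "event buffer" idea too: interrupt only on overflow, not
per event, which is what makes non-timer counters (cache misses etc.) cheap
enough to profile with.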
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vm lock contention reduction
2002-07-07 3:05 ` Linus Torvalds
@ 2002-07-07 3:47 ` Andrew Morton
2002-07-08 11:39 ` Enhanced profiling support (was Re: vm lock contention reduction) John Levon
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2002-07-07 3:47 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrea Arcangeli, Rik van Riel, linux-mm, Martin J. Bligh, John Levon
Linus Torvalds wrote:
>
> On Sat, 6 Jul 2002, Andrew Morton wrote:
> >
> > Martin is being bitten by the global invalidate more than by the lock.
> > He increased the size of the kmap pool just to reduce the invalidate
> > frequency and saw 40% speedups of some stuff.
> >
> > Those invalidates don't show up nicely on profiles.
>
> I'd like to enhance the profiling support a bit, to create some
> infrastructure for doing different kinds of profiles, not just the current
> timer-based one (and not just for the kernel).
>
> There's also the P4 native support for "event buffers" or whatever intel
> calls them, that allows profiling at a lower level by interrupting not for
> every event, but only when the hw buffer overflows.
>
> I haven't had much time to look at the oprofile thing, but what I _have_
> seen has made me rather unhappy (especially the horrid system call
> tracking kludges).
>
> I'd rather have some generic hooks (a notion of a "profile buffer" and
> events that cause us to have to synchronize with it, like process
> switches, mmap/munmap - oprofile wants these too), and some generic helper
> routines for profiling (turn any eip into a "dentry + offset" pair
> together with ways to tag specific dentries as being "worthy" of
> profiling).
>
> Depending on the regular timer interrupt will never give good profiles,
> simply because it can't be NMI, but also because you then don't have the
> choice of using other counters (cache miss etc).
>
> oprofile does much of this, but in a damn ugly manner.
>
I pinged John about an oprofile merge just the other day actually.
He agrees with you on the syscall table thing. I think he says
that it could be cleaned up if oprofile was in the tree. Ditto
the issue with mmap.
I was able to isolate and fix some fairly hairy performance problems
at work with oprofile. It's a great tool - I use it all the time. And
it profiles the entire system - right down to file-n-line in some random
shared object. With NMIs. It is not just a kernel tool.
So. John. Get coding :-)
-
^ permalink raw reply [flat|nested] 13+ messages in thread
* Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-07 3:47 ` Andrew Morton
@ 2002-07-08 11:39 ` John Levon
2002-07-08 17:52 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: John Levon @ 2002-07-08 11:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Andrea Arcangeli, Rik van Riel, linux-mm,
Martin J. Bligh, linux-kernel
[Excuse the quoting, I was out of the loop on this ...]
On Sat, Jul 06, 2002 at 08:47:54PM -0700, Andrew Morton wrote:
> Linus Torvalds wrote:
> >
> > I haven't had much time to look at the oprofile thing, but what I _have_
> > seen has made me rather unhappy (especially the horrid system call
> > tracking kludges).
It makes me very unhappy too. There are a number of horrible things
there, mostly for the sake of convenience and performance.
sys_call_table is just the most obviously foul thing. I'm glad to hear
there is interest in getting some kernel support for such things to be
done tastefully.
> > I'd rather have some generic hooks (a notion of a "profile buffer" and
> > events that cause us to have to synchronize with it, like process
> > switches, mmap/munmap - oprofile wants these too), and some generic helper
> > routines for profiling (turn any eip into a "dentry + offset" pair
> > together with ways to tag specific dentries as being "worthy" of
> > profiling).
How do you see such dentry names being exported to user-space for the
profiling daemon to access ? The current oprofile scheme is, um, less
than ideal ...
> So. John. Get coding :-)
I'm interested in doing so but I'd like to hear some more on how people
perceive this working. It essentially means a fork for a lot of the
kernel-side code, so it'd mean a lot more work for us (at least until I
can drop the 2.2/2.4 versions).
regards
john
--
"If a thing is not diminished by being shared, it is not rightly owned if
it is only owned & not shared."
- St. Augustine
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-08 11:39 ` Enhanced profiling support (was Re: vm lock contention reduction) John Levon
@ 2002-07-08 17:52 ` Linus Torvalds
2002-07-08 18:41 ` Karim Yaghmour
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2002-07-08 17:52 UTC (permalink / raw)
To: John Levon
Cc: Andrew Morton, Andrea Arcangeli, Rik van Riel, linux-mm,
Martin J. Bligh, linux-kernel
On Mon, 8 Jul 2002, John Levon wrote:
>
> > I'd rather have some generic hooks (a notion of a "profile buffer" and
> > events that cause us to have to synchronize with it, like process
> > switches, mmap/munmap - oprofile wants these too), and some generic helper
> > routines for profiling (turn any eip into a "dentry + offset" pair
> > together with ways to tag specific dentries as being "worthy" of
> > profiling).
>
> How do you see such dentry names being exported to user-space for the
> profiling daemon to access ? The current oprofile scheme is, um, less
> than ideal ...
Ok, I'll outline my personal favourite interface, but I'd also better
point out that while I've thought a bit about what I'd like to have and
how it could be implemented in the kernel, I have _not_ actually tried any
of it out, much less thought about what the user level stuff really needs.
Anyway, here goes a straw-man:
- I'd associate each profiling event with a dentry/offset pair, simply
because that's the highest-level thing that the kernel knows about and
that is "static".
- I'd suggest that the profiler explicitly mark the dentries it wants
profiled, so that the kernel can throw away events that we're not
interested in. The marking function would return a cookie to user
space, and increment the dentry count (along with setting the
"profile" flag in the dentry)
- the "cookie" (which would most easily just be the kernel address of the
dentry) would be the thing that we give to user-space (along with
offset) on profile read. The user app can turn it back into a filename.
Whether it is the original "mark this file for profiling" phase that saves
away the cookie<->filename association, or whether we also have a system
call for "return the path of this cookie", I don't much care about.
Details, details.
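As a userspace sketch only, the marking scheme described above might look roughly like this — the struct fields and function names here are hypothetical illustrations, not a real kernel API:

```c
/* Userspace sketch of the cookie scheme outlined above -- not kernel code.
 * Marking a dentry for profiling pins it (refcount) and sets a flag; the
 * cookie handed to user space is simply the dentry's kernel address. */
#include <assert.h>
#include <stdint.h>

struct dentry {
    int d_count;        /* reference count */
    int d_profile;      /* "worthy of profiling" flag */
    const char *d_name;
};

/* Mark a dentry: bump the count, set the flag, return its address as the
 * cookie given back to user space. */
static uintptr_t profile_mark_dentry(struct dentry *d)
{
    d->d_count++;
    d->d_profile = 1;
    return (uintptr_t)d;
}

/* The kernel can then drop any event whose dentry was never marked. */
static int profile_event_wanted(const struct dentry *d)
{
    return d->d_profile;
}
```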
Anyway, what would be the preferred interface from user level?
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/

* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-08 17:52 ` Linus Torvalds
@ 2002-07-08 18:41 ` Karim Yaghmour
2002-07-10 2:22 ` John Levon
0 siblings, 1 reply; 13+ messages in thread
From: Karim Yaghmour @ 2002-07-08 18:41 UTC (permalink / raw)
To: Linus Torvalds
Cc: John Levon, Andrew Morton, Andrea Arcangeli, Rik van Riel,
linux-mm, Martin J. Bligh, linux-kernel, Richard Moore, bob
Linus Torvalds wrote:
> On Mon, 8 Jul 2002, John Levon wrote:
> > How do you see such dentry names being exported to user-space for the
> > profiling daemon to access ? The current oprofile scheme is, um, less
> > than ideal ...
>
> Ok, I'll outline my personal favourite interface, but I'd also better
> point out that while I've thought a bit about what I'd like to have and
> how it could be implemented in the kernel, I have _not_ actually tried any
> of it out, much less thought about what the user level stuff really needs.
Sure. I've done some work on profiling using trace hooks. Hopefully the
following is useful.
> Anyway, here goes a straw-man:
>
> - I'd associate each profiling event with a dentry/offset pair, simply
> because that's the highest-level thing that the kernel knows about and
> that is "static".
dentry + offset: on a 32-bit machine, this is 8 bytes total per event being
profiled. This is a lot of information if you are trying to sustain a
high-volume throughput. You can almost always skip the dentry, since you know
about scheduling changes and since you can catch a system-state snapshot at
the beginning of the profiling. After that, the eip is sufficient and can
easily be correlated to a meaningful entry in a file in user-space.
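The eip-only scheme described above could be sketched like this in userspace C — record each mapping's (start, end, name) once at trace start, then resolve raw eip samples against that snapshot after the fact. The names and layout here are illustrative, not LTT's actual format:

```c
/* Sketch: resolve an eip sample against a system-state snapshot taken at
 * the beginning of profiling. Illustrative only. */
#include <assert.h>
#include <stddef.h>

struct map_snapshot {
    unsigned long start, end;   /* mapping's address range */
    const char *name;           /* file backing the mapping */
};

/* Return the mapping covering eip and its offset within it, or NULL if
 * no snapshot entry covers the address. */
static const struct map_snapshot *
resolve_eip(const struct map_snapshot *maps, size_t n,
            unsigned long eip, unsigned long *offset)
{
    for (size_t i = 0; i < n; i++) {
        if (eip >= maps[i].start && eip < maps[i].end) {
            *offset = eip - maps[i].start;
            return &maps[i];
        }
    }
    return NULL;
}
```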
> - I'd suggest that the profiler explicitly mark the dentries it wants
> profiled, so that the kernel can throw away events that we're not
> interested in. The marking function would return a cookie to user
> space, and increment the dentry count (along with setting the
> "profile" flag in the dentry)
Or the kernel can completely ignore this sort of selection and leave it
all to the agent responsible for collecting the events. This is what is
done in LTT. Currently, you can select one PID, GID, UID, but this
is easily extendable to include many. Of course, if you agree to the
task struct having a "trace" or "profile" field, then this would be
much easier.
> - the "cookie" (which would most easily just be the kernel address of the
> dentry) would be the thing that we give to user-space (along with
> offset) on profile read. The user app can turn it back into a filename.
That's the typical scheme and the one possible with the data retrieved
by LTT.
> Whether it is the original "mark this file for profiling" phase that saves
> away the cookie<->filename association, or whether we also have a system
> call for "return the path of this cookie", I don't much care about.
> Details, details.
>
> Anyway, what would be the preferred interface from user level?
The approach LTT takes is that no part in the kernel should actually care
about the user level needs. Anything in user level that has to modify
the tracing/profiling makes its requests to the trace driver, /dev/tracer.
No additional system calls, no special cases in the main kernel code. Only
3 main files:
kernel/trace.c: The main entry point for all events (trace_event())
drivers/trace/tracer.c: The trace driver
include/linux/trace.h: The trace hook definitions
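A minimal userspace sketch of a trace_event()-style single entry point with filtering done at collection time, loosely modeled on the layout described above — the event IDs, filter fields, and buffer shape are illustrative, not LTT's actual ABI:

```c
/* Sketch: one funnel function for all instrumentation points, with the
 * pid filter applied centrally rather than at each hook. Illustrative. */
#include <assert.h>

#define TRACE_BUF_SLOTS 64

struct trace_event_rec {
    int event_id;
    int pid;
    unsigned long data;
};

static struct trace_event_rec trace_buf[TRACE_BUF_SLOTS];
static int trace_head;            /* next free slot */
static int trace_filter_pid = -1; /* -1 means: accept every pid */

/* Every trace point calls this; returns 1 if recorded, 0 if filtered
 * out, -1 if the buffer is full. */
static int trace_event(int event_id, int pid, unsigned long data)
{
    if (trace_filter_pid != -1 && pid != trace_filter_pid)
        return 0;
    if (trace_head >= TRACE_BUF_SLOTS)
        return -1;
    trace_buf[trace_head].event_id = event_id;
    trace_buf[trace_head].pid = pid;
    trace_buf[trace_head].data = data;
    trace_head++;
    return 1;
}
```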
Cheers,
Karim
===================================================
Karim Yaghmour
karim@opersys.com
Embedded and Real-Time Linux Expert
===================================================
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-08 18:41 ` Karim Yaghmour
@ 2002-07-10 2:22 ` John Levon
2002-07-10 4:16 ` Karim Yaghmour
0 siblings, 1 reply; 13+ messages in thread
From: John Levon @ 2002-07-10 2:22 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, Rik van Riel,
linux-mm, Martin J. Bligh, linux-kernel, Richard Moore, bob
On Mon, Jul 08, 2002 at 02:41:00PM -0400, Karim Yaghmour wrote:
> dentry + offset: on a 32bit machine, this is 8 bytes total per event being
> profiled. This is a lot of information if you are trying to sustain a
> high-volume throughput.
I haven't found that to be significant in profiling overhead, mainly
because the hash table removes some of the "sting" of high sampling
rates (and the interrupt handler dwarfs all other aspects). The
situation is probably different for more general tracing purposes, but
I'm dubious as to the utility of a general tracing mechanism.
(besides, a profile buffer needs a sample context value too, for things
like CPU number and perfctr event number).
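A sample record carrying that extra context might look like the following sketch — the field names and layout are illustrative, not oprofile's actual record format:

```c
/* Sketch of a profile-buffer sample carrying the (cookie, offset) pair
 * plus the per-sample context mentioned above. Illustrative only. */
#include <assert.h>
#include <stdint.h>

struct profile_sample {
    uintptr_t cookie;      /* marked dentry's address */
    unsigned long offset;  /* eip offset within that mapping */
    uint16_t cpu;          /* CPU the sample was taken on */
    uint16_t event;        /* perfctr event number */
};
```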
> You can almost always skip the dentry, since you know about scheduling
> changes and since you can catch a system-state snapshot at the beginning of
> the profiling. After that, the eip is sufficient and can easily be correlated
> to a meaningful entry in a file in user-space.
But as I point out in my other post, dentry-offset alone is not as
useful as it could be...
I just don't see a really good reason to introduce insidious tracing
throughout. Both tracing and profiling are ugly ugly things to be doing
by their very nature, and I'd much prefer to keep such intrusions to a
bare minimum.
The entry.S examine-the-registers approach is simple enough, but it's
not much more tasteful than sys_call_table hackery IMHO
regards
john
--
"I know I believe in nothing but it is my nothing"
- Manic Street Preachers
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-10 2:22 ` John Levon
@ 2002-07-10 4:16 ` Karim Yaghmour
2002-07-10 4:38 ` John Levon
0 siblings, 1 reply; 13+ messages in thread
From: Karim Yaghmour @ 2002-07-10 4:16 UTC (permalink / raw)
To: John Levon
Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, Rik van Riel,
linux-mm, Martin J. Bligh, linux-kernel, Richard Moore, bob
John Levon wrote:
> I'm dubious as to the utility of a general tracing mechanism.
...
> I just don't see a really good reason to introduce insidious tracing
> throughout. Both tracing and profiling are ugly ugly things to be doing
> by their very nature, and I'd much prefer to keep such intrusions to a
> bare minimum.
Tracing is essential for an entire category of problems which can only
be solved by obtaining the raw data that describes the dynamic behavior
of the kernel.
Have you ever tried to solve an inter-process synchronization problem
using strace or gdb? In reality, only tracing built into the kernel can
enable a developer to solve such problems.
Have you ever tried to follow the exact reaction applications have
to kernel input? It took lots of ad-hoc experimentation to isolate the
thundering herd problem. Tracing would have shown this immediately.
Can you list the exact sequence of processes that are scheduled
in reaction to input when you press the keyboard while running a
terminal in X? This is out of reach of most user-space programmers but
a trace shows this quite nicely.
Ever had a box saturated with IRQs and still showing 0% CPU usage? This
problem has been reported time and again. Lately someone was asking
about the utility which soaks up CPU cycles to show this sort of
situation. Once more, tracing shows this right away ... without soaking
up CPU cycles.
Ever tried to get the exact time spent by an application in user-space
vs. kernel space? Even better, can you tell the actual syscall which
cost the most in kernel time? You can indeed get closer using random sampling,
but it's just one more thing tracing gives you without any difficulty.
And the list goes on.
The fact that so many kernel subsystems already have their own tracing
built-in (see other posting) clearly shows that there is a fundamental
need for such a utility even for driver developers. If many driver
developers can't develop drivers adequately without tracing, can we
expect user-space developers to efficiently use the kernel if they have
absolutely no idea about the dynamic interaction their processes have
with the kernel and how this interaction is influenced by and influences
the interaction with other processes?
> The entry.S examine-the-registers approach is simple enough, but it's
> not much more tasteful than sys_call_table hackery IMHO
I guess we won't agree on this. From my point of view it is much better
to have the code directly within entry.S for all to see instead of
having some external software play around with the syscall table in a
way kernel users can't trace back to the kernel's own code.
Cheers,
Karim
===================================================
Karim Yaghmour
karim@opersys.com
Embedded and Real-Time Linux Expert
===================================================
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-10 4:16 ` Karim Yaghmour
@ 2002-07-10 4:38 ` John Levon
2002-07-10 5:46 ` Karim Yaghmour
2002-07-10 13:10 ` bob
0 siblings, 2 replies; 13+ messages in thread
From: John Levon @ 2002-07-10 4:38 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, Rik van Riel,
linux-mm, Martin J. Bligh, linux-kernel, Richard Moore, bob
On Wed, Jul 10, 2002 at 12:16:05AM -0400, Karim Yaghmour wrote:
[snip]
> And the list goes on.
Sure, there are all sorts of things where some tracing can come in
useful. The question is whether it's really something the mainline
kernel should be doing, and if the gung-ho approach is nice or not.
> The fact that so many kernel subsystems already have their own tracing
> built-in (see other posting)
Your list was almost entirely composed of per-driver debug routines.
This is not the same thing as logging trap entry/exits, syscalls etc
etc, on any level, and I'm a bit perplexed that you're making such an
association.
> expect user-space developers to efficiently use the kernel if they
> have
> absolutely no idea about the dynamic interaction their processes have
> with the kernel and how this interaction is influenced by and
> influences
> the interaction with other processes?
This is clearly an exaggeration. And seeing as something like LTT
doesn't (and cannot) tell the "whole story" either, I could throw the
same argument directly back at you. The point is, there comes a point of
no return where usefulness gets outweighed by ugliness. For the very few
cases that such detailed information is really useful, the user can
usually install the needed special-case tools.
In contrast a profiling mechanism that improves on the poor lot that
currently exists (gprof, readprofile) has a truly general utility, and
can hopefully be done without too much ugliness.
The primary reason I want to see something like this is to kill the ugly
code I have to maintain.
> > The entry.S examine-the-registers approach is simple enough, but
> > it's
> > not much more tasteful than sys_call_table hackery IMHO
>
> I guess we won't agree on this. From my point of view it is much
> better
> to have the code directly within entry.S for all to see instead of
> having some external software play around with the syscall table in a
> way kernel users can't trace back to the kernel's own code.
Eh ? I didn't say sys_call_table hackery was better. I said the entry.S
thing wasn't much better ...
regards
john
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-10 4:38 ` John Levon
@ 2002-07-10 5:46 ` Karim Yaghmour
2002-07-10 13:10 ` bob
1 sibling, 0 replies; 13+ messages in thread
From: Karim Yaghmour @ 2002-07-10 5:46 UTC (permalink / raw)
To: John Levon
Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, Rik van Riel,
linux-mm, Martin J. Bligh, linux-kernel, Richard Moore, bob
John Levon wrote:
> > And the list goes on.
>
> Sure, there are all sorts of things where some tracing can come in
> useful. The question is whether it's really something the mainline
> kernel should be doing, and if the gung-ho approach is nice or not.
I'm not sure how to read "gung-ho approach" but I would point out that
LTT wasn't built overnight. It's been out there for 3 years now.
As for whether the mainline kernel should have it or not, then I
think the standard has always been set using the actual purpose:
Is this useful to users or is this only useful to kernel developers?
Whenever it was the latter, the answer has almost always been NO.
In the case of tracing I think that not only is it "useful" to users,
but I would say "essential". Would you ask a user to recompile his
kernel with a ptrace patch because he needs to use gdb? I don't
think so, and I don't think application developers or system
administrators should have to recompile their kernel either in
order to understand the system's dynamics.
If facts are of interest, then I would point out that LTT is already
part of a number of distributions out there. Most other distros that
don't have it included find it very useful but don't want to get
involved in maintaining it until it's part of the kernel.
> > The fact that so many kernel subsystems already have their own tracing
> > built-in (see other posting)
>
> Your list was almost entirely composed of per-driver debug routines.
I don't see any contradiction here. This is part of what I'm pointing out.
Mainly that understanding the dynamic behavior of the system is essential
to software development, especially when complex interactions, such as
in the Linux kernel, take place.
> This is not the same thing as logging trap entry/exits, syscalls etc
> etc, on any level, and I'm a bit perplexed that you're making such an
> association.
In regards to trap entry/exits, there was a talk a couple of years ago
by Nat Friedman, I think, which discussed GNU rope. Basically, it
identified the location of page fault misses at the start of programs
and used this information to reorder the binary in order to accelerate
its load time. That's just one example where traps are of interest.
Not to mention that traps can result in scheduling changes and that
knowing why a process has been scheduled out is all part of understanding
the system's dynamics.
As for syscalls, oprofile, for one, really needs this sort of info. So
I don't see your point.
There are 2 things LTT provides in the kernel:
1- A unified tracing and high-throughput data-collection system
2- A select list of trace points in the kernel
Item #1 can easily replace the existing ad-hoc implementation while
serving oprofile, DProbes and others. Item #2 is of prime interest
to application developers and system administrators.
> > expect user-space developers to efficiently use the kernel if they
> > have
> > absolutely no idea about the dynamic interaction their processes have
> > with the kernel and how this interaction is influenced by and
> > influences
> > the interaction with other processes?
>
> This is clearly an exaggeration. And seeing as something like LTT
> doesn't (and cannot) tell the "whole story" either, I could throw the
> same argument directly back at you. The point is, there comes a point of
> no return where usefulness gets outweighed by ugliness. For the very few
> cases that such detailed information is really useful, the user can
> usually install the needed special-case tools.
I'd really be interested in seeing what you mean by "ugliness". If there's
a list of grievances you have with LTT then I'm all ears.
Anything inserted by LTT is clean and clear. It doesn't change the
kernel's normal behavior, and once a trace point is inserted it
requires almost zero maintenance. Take for example the scheduling change
trace point (kernel/sched.c:schedule()):
TRACE_SCHEDCHANGE(prev, next);
I don't see why this should be ugly. It's an inline that results in
zero lines of code if you configure tracing as off.
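The compile-to-nothing behavior described above can be sketched like this — the macro structure is illustrative of the technique, not LTT's actual definitions:

```c
/* Sketch: a trace point that expands to a real hook when tracing is
 * configured on, and to nothing at all when it is off. Illustrative. */
#include <assert.h>

static int sched_events; /* bumped by the hook when tracing is on */

#ifdef CONFIG_TRACE
static void trace_schedchange(int prev, int next)
{
    (void)prev;
    (void)next;
    sched_events++;
}
#define TRACE_SCHEDCHANGE(prev, next) trace_schedchange((prev), (next))
#else
/* Tracing configured off: the macro contributes zero lines of code. */
#define TRACE_SCHEDCHANGE(prev, next) do { } while (0)
#endif
```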
The cases I presented earlier clearly show the usefulness of this
information. The developer who needs to solve his synchronization
problem or the system administrator who wants to understand why his
box is so slow doesn't really want to patch/recompile/reboot to fix
his problem.
You would like to paint these as "very few cases". Unfortunately
these cases are much more common than you say they are.
> In contrast a profiling mechanism that improves on the poor lot that
> currently exists (gprof, readprofile) has a truly general utility, and
> can hopefully be done without too much ugliness.
So it is not justifiable to add a feature the like of which didn't
exist before, but it is justifiable to add a feature which only "improves"
on existing tools?
As I said before, LTT and the information it provides has a truly
general utility: Oprofile can use it, DProbes uses it, a slew of existing
ad-hoc tracing systems in the kernel can be replaced by it, application
developers can isolate synchronization problems with it, system
administrators can identify performance bottlenecks with it, etc.
An argument over which is more useful between LTT and oprofile is bound
to be useless. If nothing else, I see LTT as having a more general use
than oprofile. But let's not get into that. What I'm really interested in
here is the possibility of having one unified tracing and data collection
system which serves many purposes instead of having each subsystem or
profiler have its own tracing and data collection mechanism.
> The primary reason I want to see something like this is to kill the ugly
> code I have to maintain.
I can't say that my goals are any different.
> > > The entry.S examine-the-registers approach is simple enough, but
> > > it's
> > > not much more tasteful than sys_call_table hackery IMHO
> >
> > I guess we won't agree on this. From my point of view it is much
> > better
> > to have the code directly within entry.S for all to see instead of
> > having some external software play around with the syscall table in a
> > way kernel users can't trace back to the kernel's own code.
>
> Eh ? I didn't say sys_call_table hackery was better. I said the entry.S
> thing wasn't much better ...
I know you weren't saying that. I said that in _my point of view_ the entry.S
"thing" is much better.
Cheers,
Karim
===================================================
Karim Yaghmour
karim@opersys.com
Embedded and Real-Time Linux Expert
===================================================
* Re: Enhanced profiling support (was Re: vm lock contention reduction)
2002-07-10 4:38 ` John Levon
2002-07-10 5:46 ` Karim Yaghmour
@ 2002-07-10 13:10 ` bob
1 sibling, 0 replies; 13+ messages in thread
From: bob @ 2002-07-10 13:10 UTC (permalink / raw)
To: John Levon
Cc: Karim Yaghmour, Linus Torvalds, Andrew Morton, Andrea Arcangeli,
Rik van Riel, linux-mm, Martin J. Bligh, linux-kernel,
Richard Moore, bob, okrieg
John,
I have been cc'ed on this email as I was an active participant at the
RAS performance monitoring/tracing session at OLS. Let me preface by
saying that my view may be a bit biased, as I have worked on the core
tracing infrastructure that went into IRIX in the mid-90s as well as the
tracing infrastructure for K42, our research OS (see
http://www.research.ibm.com/K42/index.html), and in both cases helped solve
performance issues that would not have been otherwise solved. From the
below it doesn't appear that anyone is arguing that tracing is not useful.
The debate (except for some of the details) appears over whether it should
be included in the kernel in a first-class manner or individual mechanisms
put in on an ad-hoc basis. While this is indeed philosophical, let me
share some experiences and benefits from other systems I've worked on:
1) The mechanism proposed is very non-invasive, a single line of
code (some TRACE_XXX macro or the like) is added to the area of interest.
Further, at OLS, the proposal was to add only a few trace points.
Programming-wise this does not clutter the code - in fact having a single
well-known unified mechanism is cleaner coding than a set of one-off
ways, as when anyone sees a trace macro it will be clear what it is.
2) In the end, there will be less intrusion with a single unified
approach. With a series of different mechanisms over time multiple events
will get added in the same place creating performance issues and more
importantly causing confusion.
3) A unified approach will uncover performance issues not explicitly being
searched for, and allow ones of known interest to be tracked down without
adding a patch (which may be cumbersome to maintain) and recompiling.
4) In both my experiences, I have had resistance to adding this
tracing infrastructure, and in both experiences other kernel developers
have come back after the fact and thanked me for my persistence :-), as it
helped them solve timing sensitive or other such issues they were having
great difficulty with.
If there is interest, I would happy to set up a conference number so people
who are interested could all speak.
-bob
Robert Wisniewski
The K42 MP OS Project
Advanced Operating Systems
Scalable Parallel Systems
IBM T.J. Watson Research Center
914-945-3181
http://www.research.ibm.com/K42/
bob@watson.ibm.com
----
John Levon writes:
> On Wed, Jul 10, 2002 at 12:16:05AM -0400, Karim Yaghmour wrote:
>
> [snip]
>
> > And the list goes on.
>
> Sure, there are all sorts of things where some tracing can come in
> useful. The question is whether it's really something the mainline
> kernel should be doing, and if the gung-ho approach is nice or not.
>
> > The fact that so many kernel subsystems already have their own tracing
> > built-in (see other posting)
>
> Your list was almost entirely composed of per-driver debug routines.
> This is not the same thing as logging trap entry/exits, syscalls etc
> etc, on any level, and I'm a bit perplexed that you're making such an
> association.
>
> > expect user-space developers to efficiently use the kernel if they
> > have
> > absolutely no idea about the dynamic interaction their processes have
> > with the kernel and how this interaction is influenced by and
> > influences
> > the interaction with other processes?
>
> This is clearly an exaggeration. And seeing as something like LTT
> doesn't (and cannot) tell the "whole story" either, I could throw the
> same argument directly back at you. The point is, there comes a point of
> no return where usefulness gets outweighed by ugliness. For the very few
> cases that such detailed information is really useful, the user can
> usually install the needed special-case tools.
>
> In contrast a profiling mechanism that improves on the poor lot that
> currently exists (gprof, readprofile) has a truly general utility, and
> can hopefully be done without too much ugliness.
>
> The primary reason I want to see something like this is to kill the ugly
> code I have to maintain.
>
> > > The entry.S examine-the-registers approach is simple enough, but
> > > it's
> > > not much more tasteful than sys_call_table hackery IMHO
> >
> > I guess we won't agree on this. From my point of view it is much
> > better
> > to have the code directly within entry.S for all to see instead of
> > having some external software play around with the syscall table in a
> > way kernel users can't trace back to the kernel's own code.
>
> Eh ? I didn't say sys_call_table hackery was better. I said the entry.S
> thing wasn't much better ...
>
> regards
> john
>
end of thread, other threads:[~2002-07-11 4:59 UTC | newest]
Thread overview: 13+ messages
2002-07-10 14:28 Enhanced profiling support (was Re: vm lock contention reduction) Richard J Moore
2002-07-10 20:30 ` Karim Yaghmour
2002-07-10 21:41 ` Andrea Arcangeli
2002-07-11 4:47 ` Karim Yaghmour
2002-07-11 4:59 ` Karim Yaghmour
-- strict thread matches above, loose matches on Subject: below --
2002-07-07 2:50 vm lock contention reduction Andrew Morton
2002-07-07 3:05 ` Linus Torvalds
2002-07-07 3:47 ` Andrew Morton
2002-07-08 11:39 ` Enhanced profiling support (was Re: vm lock contention reduction) John Levon
2002-07-08 17:52 ` Linus Torvalds
2002-07-08 18:41 ` Karim Yaghmour
2002-07-10 2:22 ` John Levon
2002-07-10 4:16 ` Karim Yaghmour
2002-07-10 4:38 ` John Levon
2002-07-10 5:46 ` Karim Yaghmour
2002-07-10 13:10 ` bob