linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
	cl@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, rdunlap@xenotime.net,
	mpm@selenic.com, Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC PATCH 1/4] kmemtrace: Core implementation.
Date: Mon, 28 Jul 2008 20:35:54 +0300	[thread overview]
Message-ID: <20080728173549.GA5185@localhost> (raw)
In-Reply-To: <20080728162916.GD17823@Krystal>

[-- Attachment #1: Type: text/plain, Size: 9416 bytes --]

On Mon, Jul 28, 2008 at 12:29:17PM -0400, Mathieu Desnoyers wrote:
> Hmm ? why record an invalid event ?? I see it's not used in the code, is
> that actually used in some way because the memory is set to 0 ?

The relay interface is really inconsistent and produces erroneous output
if it's not used in a specific way. It's nice to be able to catch these
errors if they occur (e.g. in case we have a regression).

> > > +	Target CPU	(4 bytes)	Signed integer, valid for event id 1.
> > > +					If equal to -1, target CPU is the same
> > > +					as origin CPU, but the reverse might
> > > +					not be true.
> 
> If only valid for event ID 1 and only in NUMA case, please don't waste
> space in each event header and make that a event-specific field... ?

Yes, this would probably be a better approach.

> > > +	Caller address	(8 bytes)	Return address to the caller.
> 
> Not true on 32 bits machines. You are wasting 4 bytes on those archs.

Pekka suggested we use types that have constant size on every arch. I
could change this.

> 8 bytes for GFP flags ?? Whoah, that's a lot of one-hot bits ! :) I knew
> that some allocators were bloated, bit not that much. :)

This could change too, but if the number of GFP flags is too close to
32, I'd rather keep the ABI stable, providing for a larger number of GFP
flags.

> > > +	Timestamp	(8 bytes)	Signed integer representing timestamp.
> > > +
> 
> With a heartbeat, as lttng does, you can cut that to a 4 bytes field.

Hmm, I'll look at lttng's code and see what exactly you are talking
about. For now, I'm not sure how 32-bit timestamps perform.

> > > +The data is made available in the same endianness the machine has.
> > > +
> 
> Using a magic number in the trace header lets you deal with
> cross-endianness.

Why? I mean I can do this in the userspace app when I record the data.

> Saving the type sizes in the trace header lets you deal with different
> int/long/pointer type sizes.
> 
> > > +Other event ids and type ids may be defined and added. Other fields may be
> > > +added by increasing event size. Every modification to the ABI, including
> > > +new id definitions, are followed by bumping the ABI version by one.
> > > +
> 
> I personally prefer a self-describing trace :)

ASCII/text? :)
I'm not sure what you meant, but non-binary traces would result in huge
amounts of data.

> Not currently true : cross-endianness/wastes space for 32 bits archs.

Sure, cross-endianness is not even currently implemented in the
userspace app.

> > > +	- be fast and anticipate usage in high-load environments (*)
> 
> LTTng will be faster though : per-cpu atomic ops instead of interrupt
> disable makes the probe faster.

I'm not sure how one could record a timestamp orderly into relay buffers
using only atomic ops and no locks.

> > > +	- be reasonably extensible
> 
> Automatic description of markers and dynamic assignation of IDs to
> markers should provide a bit more flexibility here.

Dynamic assignation makes it hard to preserve ABI compatibility with the
userspace app. And Pekka suggested it's important to preserve it in
order to allow distros to include kmemtrace.
 
> > > +	- make it possible for GNU/Linux distributions to have kmemtrace
> > > +	included in their repositories
> > > +
> > > +(*) - one of the reasons Pekka Enberg's original userspace data analysis
> > > +    tool's code was rewritten from Perl to C (although this is more than a
> > > +    simple conversion)
> > > +
> > > +
> > > +III. Quick usage guide
> > > +======================
> > > +
> > > +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
> > > +CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED).
> > > +
> > > +2) Get the userspace tool and build it:
> > > +$ git-clone git://repo.or.cz/kmemtrace-user.git		# current repository
> > > +$ cd kmemtrace-user/
> > > +$ ./autogen.sh
> > > +$ ./configure
> > > +$ make
> > > +
> > > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
> > > +'single' runlevel (so that relay buffers don't fill up easily), and run
> > > +kmemtrace:
> > > +# '$' does not mean user, but root here.
> 
> Change the documentation to prefix a root command line by "#" instead of
> leaving this weird comment.

Yes, it's probably better. I just wanted to avoid a user taking that as
a comment.

> What in the world can be causing that ? Shouldn't it be fixed ? It might
> be due to unexpected allocator behavior, non-instrumented alloc/free
> code or broken tracer....

Of course it will be fixed. But this FAQ entry also serves as a warning
that future allocator patches could introduce untraced functions.
 
> > > +struct kmemtrace_event {
> > > +	u8		event_id;	/* Allocate or free? */
> > > +	u8		type_id;	/* Kind of allocation/free. */
> > > +	u16		event_size;	/* Size of event */
> > > +	s32		node;		/* Target CPU. */
> > > +	u64		call_site;	/* Caller address. */
> > > +	u64		ptr;		/* Pointer to allocation. */
> > > +	u64		bytes_req;	/* Number of bytes requested. */
> > > +	u64		bytes_alloc;	/* Number of bytes allocated. */
> > > +	u64		gfp_flags;	/* Requested flags. */
> > > +	s64		timestamp;	/* When the operation occured in ns. */
> > > +} __attribute__ ((__packed__));
> > > +
> 
> See below for detail, but this event record is way too big and not
> adapted to 32 bits architectures.

Pekka, what do you think?

> > > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> > > +				       unsigned long call_site,
> > > +				       const void *ptr)
> > > +{
> > > +	trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu",
> > > +		   type_id, call_site, (unsigned long) ptr);
> > > +}
> 
> This could be trivially turned into a tracepoint probe.

Okay, will rebase my patches on a tracepoints-enabled branch. How close
are they to mainline?

> > > +
> > > +#define KMEMTRACE_SUBBUF_SIZE	(8192 * sizeof(struct kmemtrace_event))
> > > +#define KMEMTRACE_N_SUBBUFS	20
> > > +
> 
> Isn't this overridable by a command line param ? Shouldn't it be called
> "DEFAULT_KMEMTRACE_*" then ?

I wanted to avoid using too long macro names. But I can change this.

> > > +static struct rchan *kmemtrace_chan;
> > > +static u32 kmemtrace_buf_overruns;
> > > +
> > > +static unsigned int kmemtrace_n_subbufs;
> > > +#ifdef CONFIG_KMEMTRACE_DEFAULT_ENABLED
> > > +static unsigned int kmemtrace_enabled = 1;
> > > +#else
> > > +static unsigned int kmemtrace_enabled = 0;
> > > +#endif
> 
> Hrm, I'd leave that as a kernel command line option, not config option.
> If you ever want to _aways_ have it on, then change your lilo/grub file.

Not quite true. I saw a few kernel subsystems that provide compile-time options
for those arches where supplying command-line options is hard/impossible.

> > > +	 * Don't convert this to use structure initializers,
> > > +	 * C99 does not guarantee the rvalues evaluation order.
> > > +	 */
> > > +	ev.event_id = KMEMTRACE_EVENT_ALLOC;
> > > +	ev.type_id = va_arg(*args, int);
> > > +	ev.event_size = sizeof(struct kmemtrace_event);
> > > +	ev.call_site = va_arg(*args, unsigned long);
> > > +	ev.ptr = va_arg(*args, unsigned long);
> 
> Argh, and you do a supplementary copy here. You could simply alias the
> buffers and write directly to them after reserving the correct amount of
> space.

Oh, good point. I could use relay_reserve() here.

> > > +	/* Don't trace ignored allocations. */
> > > +	if (!ev.ptr)
> > > +		return;
> > > +	ev.bytes_req = va_arg(*args, unsigned long);
> > > +	ev.bytes_alloc = va_arg(*args, unsigned long);
> > > +	/* ev.timestamp set below, to preserve event ordering. */
> > > +	ev.gfp_flags = va_arg(*args, unsigned long);
> > > +	ev.node = va_arg(*args, int);
> > > +
> > > +	/* We disable IRQs for timestamps to match event ordering. */
> > > +	local_irq_save(flags);
> > > +	ev.timestamp = ktime_to_ns(ktime_get());
> 
> ktime_get is monotonic, but with potentially coarse granularity. I see
> that you use ktime_to_ns here, which gives you a resolution of 1 timer
> tick in the case where the TSCs are not synchronized. While it should be
> "good enough" for the scheduler, I doubt it's enough for a tracer.
> 
> It also takes the xtime seqlock, which adds a potentially big delay to
> the tracing code (if you read the clock while the writer lock is taken).
> 
> Also, when NTP modifies the clock, although it stays monotonic, the rate
> at which it increments can dramatically change. I doubt you want to use
> that as a reference for performance analysis.

What would you suggest instead?

Please keep in mind timer resolution is not that critical, we're not
benchmarking the allocators cycle-wise; instead we're merely looking at
allocation lifetimes, fragmentation, patterns etc.. Timestamps are most
important for reordering events in userspace.

> > > +static int __init kmemtrace_set_boot_enabled(char *str)
> > > +{
> > > +	if (!str)
> > > +		return -EINVAL;
> > > +
> > > +	if (!strcmp(str, "yes"))
> 
> I think the standard is to use =0, =1 here, not =yes, =no ?

Okay.

> Mathieu
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

Thanks for your comments.


	Cheers,
	Eduard


[-- Attachment #2: Type: application/pgp-signature, Size: 827 bytes --]

  parent reply	other threads:[~2008-07-28 17:37 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-22 18:36 [RFC PATCH 0/4] kmemtrace RFC (resend 2, fixed wrong Cc) Eduard - Gabriel Munteanu
2008-07-22 18:36 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu
2008-07-22 18:36   ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Eduard - Gabriel Munteanu
2008-07-22 18:36     ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Eduard - Gabriel Munteanu
2008-07-22 18:36       ` [RFC PATCH 4/4] kmemtrace: SLOB hooks Eduard - Gabriel Munteanu
2008-07-22 20:53         ` Matt Mackall
2008-07-22 21:07           ` Eduard - Gabriel Munteanu
2008-07-28  9:41         ` Pekka Enberg
2008-07-28  9:40       ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Pekka Enberg
2008-07-28  9:37     ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Pekka Enberg
2008-07-28  9:24   ` [RFC PATCH 1/4] kmemtrace: Core implementation Pekka Enberg
2008-07-28 16:29     ` Mathieu Desnoyers
2008-07-28 17:09       ` Matt Mackall
2008-07-28 17:35       ` Eduard - Gabriel Munteanu [this message]
2008-07-29  8:25         ` Pekka Enberg
  -- strict thread matches above, loose matches on Subject: below --
2008-07-22 18:31 [RFC PATCH 0/4] kmemtrace RFC (resend 2) Eduard - Gabriel Munteanu
2008-07-22 18:31 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu
2008-07-22 21:28   ` Frank Ch. Eigler
2008-07-23  0:50     ` Eduard - Gabriel Munteanu
2008-07-23  0:55       ` Eduard - Gabriel Munteanu
2008-07-23  1:26       ` Frank Ch. Eigler
2008-07-17  0:46 [RFC PATCH 0/4] kmemtrace RFC (resubmit 1) Eduard - Gabriel Munteanu
2008-07-17  0:46 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu
2008-07-17  8:01   ` Pekka Enberg
2008-07-17 18:32     ` Eduard - Gabriel Munteanu
2008-07-18  8:48       ` Pekka J Enberg
2008-07-18 10:13         ` Eduard - Gabriel Munteanu
2008-07-18 14:38           ` Pekka Enberg
2008-07-18 19:40             ` Eduard - Gabriel Munteanu
2008-07-18 20:07               ` Matt Mackall
2008-07-17 21:34   ` Randy Dunlap
2008-07-17 23:49     ` Eduard - Gabriel Munteanu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080728173549.GA5185@localhost \
    --to=eduard.munteanu@linux360.ro \
    --cc=cl@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mpm@selenic.com \
    --cc=penberg@cs.helsinki.fi \
    --cc=rdunlap@xenotime.net \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox