* [RFC PATCH 0/4] kmemtrace RFC (resubmit 1) @ 2008-07-17 0:46 Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 0:46 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel Hello everybody, I hopefully fixed previous complaints. Also wrote some documentation and fixed some missing stuff in SLAB. Please take a look and comment. BTW, see Documentation/vm/kmemtrace.txt for details on how to use this and for info on design details. Eduard Eduard - Gabriel Munteanu (4): kmemtrace: Core implementation. kmemtrace: SLAB hooks. kmemtrace: SLUB hooks. kmemtrace: SLOB hooks. Documentation/kernel-parameters.txt | 6 + Documentation/vm/kmemtrace.txt | 96 ++++++++++++++++ MAINTAINERS | 6 + include/linux/kmemtrace.h | 110 ++++++++++++++++++ include/linux/slab_def.h | 56 ++++++++- include/linux/slub_def.h | 9 ++- init/main.c | 2 + lib/Kconfig.debug | 4 + mm/Makefile | 2 +- mm/kmemtrace.c | 208 +++++++++++++++++++++++++++++++++++ mm/slab.c | 61 +++++++++- mm/slob.c | 37 +++++- mm/slub.c | 47 +++++++- 13 files changed, 617 insertions(+), 27 deletions(-) create mode 100644 Documentation/vm/kmemtrace.txt create mode 100644 include/linux/kmemtrace.h create mode 100644 mm/kmemtrace.c -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-17 0:46 [RFC PATCH 0/4] kmemtrace RFC (resubmit 1) Eduard - Gabriel Munteanu @ 2008-07-17 0:46 ` Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Eduard - Gabriel Munteanu ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 0:46 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel kmemtrace provides tracing for slab allocator functions, such as kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed to the userspace application in order to analyse allocation hotspots, internal fragmentation and so on, making it possible to see how well an allocator performs, as well as debug and profile kernel code. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> --- Documentation/kernel-parameters.txt | 6 + Documentation/vm/kmemtrace.txt | 96 ++++++++++++++++ MAINTAINERS | 6 + include/linux/kmemtrace.h | 110 ++++++++++++++++++ init/main.c | 2 + lib/Kconfig.debug | 4 + mm/Makefile | 2 +- mm/kmemtrace.c | 208 +++++++++++++++++++++++++++++++++++ 8 files changed, 433 insertions(+), 1 deletions(-) create mode 100644 Documentation/vm/kmemtrace.txt create mode 100644 include/linux/kmemtrace.h create mode 100644 mm/kmemtrace.c diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b52f47d..b230aff 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -49,6 +49,7 @@ parameter is applicable: ISAPNP ISA PnP code is enabled. ISDN Appropriate ISDN support is enabled. JOY Appropriate joystick support is enabled. + KMEMTRACE kmemtrace is enabled. LIBATA Libata driver is enabled LP Printer support is enabled. LOOP Loopback device support is enabled. @@ -941,6 +942,11 @@ and is between 256 and 4096 characters. It is defined in the file use the HighMem zone if it exists, and the Normal zone if it does not. + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of + subbufs kmemtrace's relay channel has. Set this + higher than default (KMEMTRACE_N_SUBBUFS in code) if + you experience buffer overruns. + movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter is similar to kernelcore except it specifies the amount of memory used for migratable allocations. diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt new file mode 100644 index 0000000..1147ecb --- /dev/null +++ b/Documentation/vm/kmemtrace.txt @@ -0,0 +1,96 @@ + kmemtrace - Kernel Memory Tracer + + by Eduard - Gabriel Munteanu + <eduard.munteanu@linux360.ro> + + +I. Design and goals +=================== + +kmemtrace was designed to handle rather large amounts of data. Thus, it uses +the relay interface to export whatever is logged to userspace, which then +stores it. Analysis and reporting is done asynchronously, that is, after the +data is collected and stored. By design, it allows one to log and analyse +on different machines and different arches. + +As this is a debugging feature, kmemtrace's ABI is not designed to be very +stable, although this may happen in the future if it's deemed mature and +sufficient. So the userspace tool does not contain a copy of the kernel +header. Instead, the ABI allows checking if the logged data matches the +userspace tool. Well, what I said about ABI stability isn't totally true: +while I've tried hard to cover all possible (and useful) use cases, I don't +want it frozen in the current state. 
I anticipate the ABI will be _quite_ +stable, even across multiple stable kernel versions, but I don't make any +guarantees regarding this matter. + +Summary of design goals: + - allow logging and analysis to be done across different machines + - be fast and anticipate usage in high-load environments (*) + - be reasonably extensible + - have a _reasonably_ (not completely) stable ABI + +(*) - one of the reasons Pekka Enberg's original userspace data analysis + tool's code was rewritten from Perl to C (although this is more than a + simple conversion) + + +II. Quick usage guide +===================== + +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable +CONFIG_KMEMTRACE). + +2) Get the userspace tool and build it: +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository +$ cd kmemtrace-user/ +$ autoreconf +$ ./configure # Supply KERNEL_SOURCES=/path/to/sources/ if you're + # _not_ running this on a kmemtrace-enabled kernel. +$ make + +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the +'single' runlevel (so that relay buffers don't fill up easily), and run +kmemtrace: +# '$' does not mean user, but root here. +$ mount -t debugfs none /debug +$ mount -t proc none /proc +$ cd path/to/kmemtrace-user/ +$ ./kmemtraced +Wait a bit, then stop it with CTRL+C. +$ cat /debug/kmemtrace/total_overruns # Check if we didn't overrun, should + # be zero. +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to + check its correctness] +$ ./kmemtrace-report + +Now you should have a nice and short summary of how the allocator performs. + +III. FAQ and known issues +========================= +Q: 'cat /debug/kmemtrace/total_overruns' is non-zero, how do I fix this? +Should I worry? +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how +large the number is. You can fix it by supplying a higher +'kmemtrace.subbufs=N' kernel parameter. +--- + +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? +A: This is a bug and should be reported. It can occur for a variety of +reasons: + - possible bugs in relay code + - possible misuse of relay by kmemtrace + - timestamps being collected unorderly +Or you may fix it yourself and send us a patch. +--- + +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? +A: This is a known issue and I'm working on it. These might be true errors +in kernel code, which may have inconsistent behavior (e.g. allocating memory +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed +out this behavior may work with SLAB, but may fail with other allocators. + +It may also be due to lack of tracing in some unusual allocator functions. + +We don't want bug reports regarding this issue yet. +--- + diff --git a/MAINTAINERS b/MAINTAINERS index 56a2f67..e967bc2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2425,6 +2425,12 @@ M: jason.wessel@windriver.com L: kgdb-bugreport@lists.sourceforge.net S: Maintained +KMEMTRACE +P: Eduard - Gabriel Munteanu +M: eduard.munteanu@linux360.ro +L: linux-kernel@vger.kernel.org +S: Maintained + KPROBES P: Ananth N Mavinakayanahalli M: ananth@in.ibm.com diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h new file mode 100644 index 0000000..da69d22 --- /dev/null +++ b/include/linux/kmemtrace.h @@ -0,0 +1,110 @@ +/* + * Copyright (C) 2008 Eduard - Gabriel Munteanu + * + * This file is released under GPL version 2. 
+ */ + +#ifndef _LINUX_KMEMTRACE_H +#define _LINUX_KMEMTRACE_H + +#include <linux/types.h> + +/* ABI definition starts here. */ + +#define KMEMTRACE_ABI_VERSION 1 + +enum kmemtrace_event_id { + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ + KMEMTRACE_EVENT_ALLOC, + KMEMTRACE_EVENT_FREE, +}; + +enum kmemtrace_type_id { + KMEMTRACE_TYPE_KERNEL = 0, /* kmalloc() / kfree(). */ + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ +}; + +struct kmemtrace_event { + __u16 event_id; /* Allocate or free? */ + __u16 type_id; /* Kind of allocation/free. */ + __s32 node; /* Target CPU. */ + __u64 call_site; /* Caller address. */ + __u64 ptr; /* Pointer to allocation. */ + __u64 bytes_req; /* Number of bytes requested. */ + __u64 bytes_alloc; /* Number of bytes allocated. */ + __u64 gfp_flags; /* Requested flags. */ + __s64 timestamp; /* When the operation occured in ns. */ +} __attribute__ ((__packed__)); + +/* End of ABI definition. */ + +#ifdef __KERNEL__ + +#include <linux/marker.h> + +#ifdef CONFIG_KMEMTRACE + +extern void kmemtrace_init(void); + +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + unsigned long gfp_flags, + int node) +{ + trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu " + "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d", + type_id, call_site, (unsigned long) ptr, + bytes_req, bytes_alloc, gfp_flags, node); +} + +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr) +{ + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", + type_id, call_site, (unsigned long) ptr); +} + +#else /* CONFIG_KMEMTRACE */ + +static inline void kmemtrace_init(void) +{ +} + +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + unsigned long gfp_flags, + int node) +{ +} + +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr) +{ +} + +#endif /* CONFIG_KMEMTRACE */ + +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + unsigned long gfp_flags) +{ + kmemtrace_mark_alloc_node(type_id, call_site, ptr, + bytes_req, bytes_alloc, gfp_flags, -1); +} + +#endif /* __KERNEL__ */ + +#endif /* _LINUX_KMEMTRACE_H */ + diff --git a/init/main.c b/init/main.c index 057f364..c00659c 100644 --- a/init/main.c +++ b/init/main.c @@ -66,6 +66,7 @@ #include <asm/setup.h> #include <asm/sections.h> #include <asm/cacheflush.h> +#include <linux/kmemtrace.h> #ifdef CONFIG_X86_LOCAL_APIC #include <asm/smp.h> @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void) enable_debug_pagealloc(); cpu_hotplug_init(); kmem_cache_init(); + kmemtrace_init(); debug_objects_mem_init(); idr_init_cache(); setup_per_cpu_pageset(); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index d2099f4..6bacab5 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -674,6 +674,10 @@ config FIREWIRE_OHCI_REMOTE_DMA If unsure, say N. 
+config KMEMTRACE + bool "Kernel memory tracer" + depends on RELAY && DEBUG_FS && MARKERS + source "samples/Kconfig" source "lib/Kconfig.kgdb" diff --git a/mm/Makefile b/mm/Makefile index 18c143b..d88a3bc 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o - +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c new file mode 100644 index 0000000..9258010 --- /dev/null +++ b/mm/kmemtrace.c @@ -0,0 +1,208 @@ +/* + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu + * + * This file is released under GPL version 2. + */ + +#include <linux/string.h> +#include <linux/debugfs.h> +#include <linux/relay.h> +#include <linux/module.h> +#include <linux/marker.h> +#include <linux/gfp.h> +#include <linux/kmemtrace.h> + +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) +#define KMEMTRACE_N_SUBBUFS 20 + +static struct rchan *kmemtrace_chan; +static u32 kmemtrace_buf_overruns; +static unsigned int kmemtrace_n_subbufs; + +static inline void kmemtrace_log_event(struct kmemtrace_event *event) +{ + relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event)); +} + +static void kmemtrace_probe_alloc(void *probe_data, void *call_data, + const char *format, va_list *args) +{ + unsigned long flags; + struct kmemtrace_event ev; + + /* + * Don't convert this to use structure initializers, + * C99 does not guarantee the rvalues evaluation order. + */ + ev.event_id = KMEMTRACE_EVENT_ALLOC; + ev.type_id = va_arg(*args, int); + ev.call_site = va_arg(*args, unsigned long); + ev.ptr = va_arg(*args, unsigned long); + /* Don't trace ignored allocations. */ + if (!ev.ptr) + return; + ev.bytes_req = va_arg(*args, unsigned long); + ev.bytes_alloc = va_arg(*args, unsigned long); + /* ev.timestamp set below, to preserve event ordering. */ + ev.gfp_flags = va_arg(*args, unsigned long); + ev.node = va_arg(*args, int); + + local_irq_save(flags); + ev.timestamp = ktime_to_ns(ktime_get()); + kmemtrace_log_event(&ev); + local_irq_restore(flags); +} + +static void kmemtrace_probe_free(void *probe_data, void *call_data, + const char *format, va_list *args) +{ + unsigned long flags; + struct kmemtrace_event ev; + + /* + * Don't convert this to use structure initializers, + * C99 does not guarantee the rvalues evaluation order. + */ + ev.event_id = KMEMTRACE_EVENT_FREE; + ev.type_id = va_arg(*args, int); + ev.call_site = va_arg(*args, unsigned long); + ev.ptr = va_arg(*args, unsigned long); + /* Don't trace ignored allocations. */ + if (!ev.ptr) + return; + /* ev.timestamp set below, to preserve event ordering. 
*/ + + local_irq_save(flags); + ev.timestamp = ktime_to_ns(ktime_get()); + kmemtrace_log_event(&ev); + local_irq_restore(flags); +} + +static struct dentry * +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, + int mode, struct rchan_buf *buf, int *is_global) +{ + return debugfs_create_file(filename, mode, parent, buf, + &relay_file_operations); +} + +static int kmemtrace_remove_buf_file(struct dentry *dentry) +{ + debugfs_remove(dentry); + + return 0; +} + +static int kmemtrace_count_overruns(struct rchan_buf *buf, + void *subbuf, void *prev_subbuf, + size_t prev_padding) +{ + if (relay_buf_full(buf)) { + kmemtrace_buf_overruns++; + return 0; + } + + return 1; +} + +static struct rchan_callbacks relay_callbacks = { + .create_buf_file = kmemtrace_create_buf_file, + .remove_buf_file = kmemtrace_remove_buf_file, + .subbuf_start = kmemtrace_count_overruns, +}; + +static struct dentry *kmemtrace_dir; +static struct dentry *kmemtrace_overruns_dentry; + +static void kmemtrace_cleanup(void) +{ + relay_close(kmemtrace_chan); + marker_probe_unregister("kmemtrace_alloc", + kmemtrace_probe_alloc, NULL); + marker_probe_unregister("kmemtrace_free", + kmemtrace_probe_free, NULL); + if (kmemtrace_overruns_dentry) + debugfs_remove(kmemtrace_overruns_dentry); +} + +static int __init kmemtrace_setup_late(void) +{ + if (!kmemtrace_chan) + goto failed; + + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); + if (!kmemtrace_dir) + goto cleanup; + + kmemtrace_overruns_dentry = + debugfs_create_u32("total_overruns", S_IRUSR, + kmemtrace_dir, &kmemtrace_buf_overruns); + if (!kmemtrace_overruns_dentry) + goto dir_cleanup; + + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) + goto overrun_cleanup; + + printk(KERN_INFO "kmemtrace: fully up.\n"); + + return 0; + +overrun_cleanup: + debugfs_remove(kmemtrace_overruns_dentry); + kmemtrace_overruns_dentry = NULL; +dir_cleanup: + debugfs_remove(kmemtrace_dir); +cleanup: + kmemtrace_cleanup(); +failed: + return 1; +} +late_initcall(kmemtrace_setup_late); + +static int __init kmemtrace_set_subbuf_size(char *str) +{ + get_option(&str, &kmemtrace_n_subbufs); + return 0; +} +early_param("kmemtrace.subbufs", kmemtrace_set_subbuf_size); + +void kmemtrace_init(void) +{ + int err; + + if (!kmemtrace_n_subbufs) + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; + + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, + kmemtrace_n_subbufs, &relay_callbacks, + NULL); + if (!kmemtrace_chan) { + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); + return; + } + + err = marker_probe_register("kmemtrace_alloc", "type_id %d " + "call_site %lu ptr %lu " + "bytes_req %lu bytes_alloc %lu " + "gfp_flags %lu node %d", + kmemtrace_probe_alloc, NULL); + if (err) + goto probe_alloc_fail; + err = marker_probe_register("kmemtrace_free", "type_id %d " + "call_site %lu ptr %lu", + kmemtrace_probe_free, NULL); + if (err) + goto probe_free_fail; + + printk(KERN_INFO "kmemtrace: early init successful.\n"); + return; + +probe_free_fail: + err = marker_probe_unregister("kmemtrace_alloc", + kmemtrace_probe_alloc, NULL); + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); +probe_alloc_fail: + relay_close(kmemtrace_chan); + kmemtrace_chan = NULL; +} + -- 1.5.6.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org ^ permalink raw reply [flat|nested] 30+ messages in thread
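For readers who want to consume the relay output directly, a minimal userspace decoder for the record format defined in this patch might look like the sketch below. It is not part of the posted series: the per-CPU dump name (cpuN.out, as written by kmemtraced) comes from the documentation above, the field layout is copied from struct kmemtrace_event, and the dump is assumed to be parsed on a machine with the same endianness as the traced kernel.

/*
 * Minimal sketch of a decoder for kmemtrace's per-CPU dumps (e.g. the
 * cpu0.out files written by kmemtraced). Field layout mirrors struct
 * kmemtrace_event from the patch; same-endianness capture is assumed.
 */
#include <stdio.h>
#include <stdint.h>

struct kmemtrace_event {
	uint16_t event_id;	/* KMEMTRACE_EVENT_ALLOC = 1, _FREE = 2 */
	uint16_t type_id;	/* KMEMTRACE_TYPE_KERNEL/_CACHE/_PAGES */
	int32_t  node;		/* NUMA node, -1 if unspecified */
	uint64_t call_site;	/* address of the caller */
	uint64_t ptr;		/* address of the object */
	uint64_t bytes_req;	/* size the caller asked for */
	uint64_t bytes_alloc;	/* size the allocator handed out */
	uint64_t gfp_flags;
	int64_t  timestamp;	/* nanoseconds */
} __attribute__ ((__packed__));

int main(int argc, char *argv[])
{
	struct kmemtrace_event ev;
	FILE *f;

	if (argc < 2 || !(f = fopen(argv[1], "rb"))) {
		fprintf(stderr, "usage: %s cpuN.out\n", argv[0]);
		return 1;
	}

	/* Records are written back-to-back by relay_write(). */
	while (fread(&ev, sizeof(ev), 1, f) == 1)
		printf("%s type=%u site=0x%llx ptr=0x%llx req=%llu alloc=%llu\n",
		       ev.event_id == 1 ? "ALLOC" : "FREE ", ev.type_id,
		       (unsigned long long)ev.call_site,
		       (unsigned long long)ev.ptr,
		       (unsigned long long)ev.bytes_req,
		       (unsigned long long)ev.bytes_alloc);

	fclose(f);
	return 0;
}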
* [RFC PATCH 2/4] kmemtrace: SLAB hooks. 2008-07-17 0:46 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu @ 2008-07-17 0:46 ` Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Eduard - Gabriel Munteanu 2008-07-17 7:38 ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Pekka Enberg 2008-07-17 8:01 ` [RFC PATCH 1/4] kmemtrace: Core implementation Pekka Enberg 2008-07-17 21:34 ` Randy Dunlap 2 siblings, 2 replies; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 0:46 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel This adds hooks for the SLAB allocator, to allow tracing with kmemtrace. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> --- include/linux/slab_def.h | 56 +++++++++++++++++++++++++++++++++++++----- mm/slab.c | 61 +++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 104 insertions(+), 13 deletions(-) diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index 39c3a5e..77b8045 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -14,6 +14,7 @@ #include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */ #include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */ #include <linux/compiler.h> +#include <linux/kmemtrace.h> /* Size description struct for general caches. */ struct cache_sizes { @@ -28,8 +29,20 @@ extern struct cache_sizes malloc_sizes[]; void *kmem_cache_alloc(struct kmem_cache *, gfp_t); void *__kmalloc(size_t size, gfp_t flags); +#ifdef CONFIG_KMEMTRACE +extern void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags); +#else +static inline void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, + gfp_t flags) +{ + return kmem_cache_alloc(cachep, flags); +} +#endif + static inline void *kmalloc(size_t size, gfp_t flags) { + void *ret; + if (__builtin_constant_p(size)) { int i = 0; @@ -50,10 +63,17 @@ static inline void *kmalloc(size_t size, gfp_t flags) found: #ifdef CONFIG_ZONE_DMA if (flags & GFP_DMA) - return kmem_cache_alloc(malloc_sizes[i].cs_dmacachep, - flags); + ret = kmem_cache_alloc_notrace( + malloc_sizes[i].cs_dmacachep, flags); + else #endif - return kmem_cache_alloc(malloc_sizes[i].cs_cachep, flags); + ret = kmem_cache_alloc_notrace( + malloc_sizes[i].cs_cachep, flags); + + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, ret, + size, malloc_sizes[i].cs_size, flags); + + return ret; } return __kmalloc(size, flags); } @@ -62,8 +82,23 @@ found: extern void *__kmalloc_node(size_t size, gfp_t flags, int node); extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node); +#ifdef CONFIG_KMEMTRACE +extern void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep, + gfp_t flags, + int nodeid); +#else +static inline void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep, + gfp_t flags, + int nodeid) +{ + return kmem_cache_alloc_node(cachep, flags, nodeid); +} +#endif + static inline void *kmalloc_node(size_t size, gfp_t flags, int node) { + void *ret; + if (__builtin_constant_p(size)) { int i = 0; @@ -84,11 +119,18 @@ static inline void *kmalloc_node(size_t size, gfp_t flags, int node) found: #ifdef CONFIG_ZONE_DMA if (flags & GFP_DMA) - return kmem_cache_alloc_node(malloc_sizes[i].cs_dmacachep, - flags, node); + ret = kmem_cache_alloc_node_notrace( + malloc_sizes[i].cs_dmacachep, flags, node); + else #endif - return kmem_cache_alloc_node(malloc_sizes[i].cs_cachep, - flags, node); + ret = kmem_cache_alloc_node_notrace( + malloc_sizes[i].cs_cachep, flags, node); + + 
kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, + ret, size, malloc_sizes[i].cs_size, + flags, node); + + return ret; } return __kmalloc_node(size, flags, node); } diff --git a/mm/slab.c b/mm/slab.c index 046607f..e9a61ac 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -111,6 +111,7 @@ #include <linux/rtmutex.h> #include <linux/reciprocal_div.h> #include <linux/debugobjects.h> +#include <linux/kmemtrace.h> #include <asm/cacheflush.h> #include <asm/tlbflush.h> @@ -3621,10 +3622,23 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp) */ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) { - return __cache_alloc(cachep, flags, __builtin_return_address(0)); + void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0)); + + kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, + obj_size(cachep), obj_size(cachep), flags); + + return ret; } EXPORT_SYMBOL(kmem_cache_alloc); +#ifdef CONFIG_KMEMTRACE +void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags) +{ + return __cache_alloc(cachep, flags, __builtin_return_address(0)); +} +EXPORT_SYMBOL(kmem_cache_alloc_notrace); +#endif + /** * kmem_ptr_validate - check if an untrusted pointer might be a slab entry. * @cachep: the cache we're checking against @@ -3669,20 +3683,44 @@ out: #ifdef CONFIG_NUMA void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid) { - return __cache_alloc_node(cachep, flags, nodeid, - __builtin_return_address(0)); + void *ret = __cache_alloc_node(cachep, flags, nodeid, + __builtin_return_address(0)); + + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, + obj_size(cachep), obj_size(cachep), + flags, nodeid); + + return ret; } EXPORT_SYMBOL(kmem_cache_alloc_node); +#ifdef CONFIG_KMEMTRACE +void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep, + gfp_t flags, + int nodeid) +{ + return __cache_alloc_node(cachep, flags, nodeid, + __builtin_return_address(0)); +} +EXPORT_SYMBOL(kmem_cache_alloc_node_notrace); +#endif + static __always_inline void * __do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller) { struct kmem_cache *cachep; + void *ret; cachep = kmem_find_general_cachep(size, flags); if (unlikely(ZERO_OR_NULL_PTR(cachep))) return cachep; - return kmem_cache_alloc_node(cachep, flags, node); + ret = kmem_cache_alloc_node_notrace(cachep, flags, node); + + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, + (unsigned long) caller, ret, + size, cachep->buffer_size, flags, node); + + return ret; } #ifdef CONFIG_DEBUG_SLAB @@ -3718,6 +3756,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags, void *caller) { struct kmem_cache *cachep; + void *ret; /* If you want to save a few bytes .text space: replace * __ with kmem_. 
@@ -3727,11 +3766,17 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags, cachep = __find_general_cachep(size, flags); if (unlikely(ZERO_OR_NULL_PTR(cachep))) return cachep; - return __cache_alloc(cachep, flags, caller); + ret = __cache_alloc(cachep, flags, caller); + + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, + (unsigned long) caller, ret, + size, cachep->buffer_size, flags); + + return ret; } -#ifdef CONFIG_DEBUG_SLAB +#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_KMEMTRACE) void *__kmalloc(size_t size, gfp_t flags) { return __do_kmalloc(size, flags, __builtin_return_address(0)); @@ -3770,6 +3815,8 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp) debug_check_no_obj_freed(objp, obj_size(cachep)); __cache_free(cachep, objp); local_irq_restore(flags); + + kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp); } EXPORT_SYMBOL(kmem_cache_free); @@ -3796,6 +3843,8 @@ void kfree(const void *objp) debug_check_no_obj_freed(objp, obj_size(c)); __cache_free(c, (void *)objp); local_irq_restore(flags); + + kmemtrace_mark_free(KMEMTRACE_TYPE_KERNEL, _RET_IP_, objp); } EXPORT_SYMBOL(kfree); -- 1.5.6.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
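Since the SLAB hooks record _THIS_IP_ / _RET_IP_ as call_site, a reporting tool can attribute events to source functions by resolving those addresses against a symbol table of the traced kernel. The helper below is a hypothetical sketch, not code from the series or from kmemtrace-user; it assumes System.map / kallsyms style input (hex address, type, name per line, sorted by address).

/*
 * Hypothetical helper for the analysis side: map a call_site address to
 * the nearest preceding kernel symbol. Input is assumed to be in
 * System.map / kallsyms format ("<hex address> <type> <name>"), sorted
 * by address.
 */
#include <stdio.h>
#include <stdlib.h>

struct ksym {
	unsigned long long addr;
	char name[128];
};

static struct ksym *syms;
static size_t nr_syms;

static int load_symbols(const char *path)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		struct ksym s, *tmp;
		char type;

		if (sscanf(line, "%llx %c %127s", &s.addr, &type, s.name) != 3)
			continue;
		tmp = realloc(syms, (nr_syms + 1) * sizeof(*syms));
		if (!tmp)
			break;
		syms = tmp;
		syms[nr_syms++] = s;
	}
	fclose(f);
	return nr_syms ? 0 : -1;
}

/* Binary search for the last symbol at or below call_site. */
static const char *resolve(unsigned long long call_site)
{
	size_t lo = 0, hi = nr_syms;

	if (!nr_syms || call_site < syms[0].addr)
		return "<unknown>";
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= call_site)
			lo = mid;
		else
			hi = mid;
	}
	return syms[lo].name;
}

int main(int argc, char *argv[])
{
	unsigned long long addr;

	if (argc < 3 || load_symbols(argv[1])) {
		fprintf(stderr, "usage: %s System.map <call_site-hex>\n", argv[0]);
		return 1;
	}
	sscanf(argv[2], "%llx", &addr);
	printf("0x%llx is in %s\n", addr, resolve(addr));
	return 0;
}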
* [RFC PATCH 3/4] kmemtrace: SLUB hooks. 2008-07-17 0:46 ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Eduard - Gabriel Munteanu @ 2008-07-17 0:46 ` Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 4/4] kmemtrace: SLOB hooks Eduard - Gabriel Munteanu 2008-07-17 7:46 ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Pekka Enberg 2008-07-17 7:38 ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Pekka Enberg 1 sibling, 2 replies; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 0:46 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel This adds hooks for the SLUB allocator, to allow tracing with kmemtrace. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> --- include/linux/slub_def.h | 9 +++++++- mm/slub.c | 47 ++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 49 insertions(+), 7 deletions(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index d117ea2..0cef121 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -10,6 +10,7 @@ #include <linux/gfp.h> #include <linux/workqueue.h> #include <linux/kobject.h> +#include <linux/kmemtrace.h> enum stat_item { ALLOC_FASTPATH, /* Allocation from cpu slab */ @@ -205,7 +206,13 @@ void *__kmalloc(size_t size, gfp_t flags); static __always_inline void *kmalloc_large(size_t size, gfp_t flags) { - return (void *)__get_free_pages(flags | __GFP_COMP, get_order(size)); + unsigned int order = get_order(size); + void *ret = (void *) __get_free_pages(flags, order); + + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, ret, + size, PAGE_SIZE << order, flags); + + return ret; } static __always_inline void *kmalloc(size_t size, gfp_t flags) diff --git a/mm/slub.c b/mm/slub.c index 315c392..a6f930f 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -23,6 +23,7 @@ #include <linux/kallsyms.h> #include <linux/memory.h> #include <linux/math64.h> +#include <linux/kmemtrace.h> /* * Lock order: @@ -1652,14 +1653,25 @@ static __always_inline void *slab_alloc(struct kmem_cache *s, void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags) { - return slab_alloc(s, gfpflags, -1, __builtin_return_address(0)); + void *ret = slab_alloc(s, gfpflags, -1, __builtin_return_address(0)); + + kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, + s->objsize, s->size, gfpflags); + + return ret; } EXPORT_SYMBOL(kmem_cache_alloc); #ifdef CONFIG_NUMA void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node) { - return slab_alloc(s, gfpflags, node, __builtin_return_address(0)); + void *ret = slab_alloc(s, gfpflags, node, + __builtin_return_address(0)); + + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, + s->objsize, s->size, gfpflags, node); + + return ret; } EXPORT_SYMBOL(kmem_cache_alloc_node); #endif @@ -1771,6 +1783,8 @@ void kmem_cache_free(struct kmem_cache *s, void *x) page = virt_to_head_page(x); slab_free(s, page, x, __builtin_return_address(0)); + + kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x); } EXPORT_SYMBOL(kmem_cache_free); @@ -2676,6 +2690,7 @@ static struct kmem_cache *get_slab(size_t size, gfp_t flags) void *__kmalloc(size_t size, gfp_t flags) { struct kmem_cache *s; + void *ret; if (unlikely(size > PAGE_SIZE)) return kmalloc_large(size, flags); @@ -2685,7 +2700,12 @@ void *__kmalloc(size_t size, gfp_t flags) if (unlikely(ZERO_OR_NULL_PTR(s))) return s; - return slab_alloc(s, flags, -1, __builtin_return_address(0)); + ret = slab_alloc(s, flags, -1, __builtin_return_address(0)); + + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _RET_IP_, ret, + size, 
(size_t) s->size, (unsigned long) flags); + + return ret; } EXPORT_SYMBOL(__kmalloc); @@ -2704,16 +2724,29 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) void *__kmalloc_node(size_t size, gfp_t flags, int node) { struct kmem_cache *s; + void *ret; - if (unlikely(size > PAGE_SIZE)) - return kmalloc_large_node(size, flags, node); + if (unlikely(size > PAGE_SIZE)) { + ret = kmalloc_large_node(size, flags, node); + + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, _RET_IP_, ret, + size, PAGE_SIZE << get_order(size), + (unsigned long) flags, node); + + return ret; + } s = get_slab(size, flags); if (unlikely(ZERO_OR_NULL_PTR(s))) return s; - return slab_alloc(s, flags, node, __builtin_return_address(0)); + ret = slab_alloc(s, flags, node, __builtin_return_address(0)); + + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, _RET_IP_, ret, + size, s->size, (unsigned long) flags, node); + + return ret; } EXPORT_SYMBOL(__kmalloc_node); #endif @@ -2771,6 +2804,8 @@ void kfree(const void *x) return; } slab_free(page->slab, page, object, __builtin_return_address(0)); + + kmemtrace_mark_free(KMEMTRACE_TYPE_KERNEL, _RET_IP_, x); } EXPORT_SYMBOL(kfree); -- 1.5.6.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
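Because each alloc hook logs both bytes_req (s->objsize here) and bytes_alloc (s->size), a report tool can rank call sites by allocation volume and estimate per-site internal fragmentation. The sketch below is illustrative rather than the actual kmemtrace-report code; it would normally be fed from a decode loop like the one sketched after patch 1, and the small main() only pushes synthetic events so the example compiles and runs on its own.

/*
 * Sketch of per-call-site accounting: sum requested vs. allocated bytes
 * for ALLOC events so hotspots and internal fragmentation show up per
 * caller. A flat array with linear search keeps the sketch short; a
 * real tool would hash by call_site.
 */
#include <stdio.h>
#include <stdint.h>

#define MAX_SITES 4096

struct site_stat {
	uint64_t call_site;
	uint64_t count;
	uint64_t bytes_req;
	uint64_t bytes_alloc;
};

static struct site_stat sites[MAX_SITES];
static unsigned int nr_sites;

static void account_alloc(uint64_t call_site, uint64_t req, uint64_t alloc)
{
	unsigned int i;

	for (i = 0; i < nr_sites; i++)
		if (sites[i].call_site == call_site)
			break;
	if (i == nr_sites) {
		if (nr_sites == MAX_SITES)
			return;		/* sketch: silently drop overflow */
		sites[nr_sites++].call_site = call_site;
	}
	sites[i].count++;
	sites[i].bytes_req += req;
	sites[i].bytes_alloc += alloc;
}

static void report(void)
{
	unsigned int i;

	for (i = 0; i < nr_sites; i++)
		printf("site 0x%llx: %llu allocs, %llu requested, %llu allocated, %llu wasted\n",
		       (unsigned long long)sites[i].call_site,
		       (unsigned long long)sites[i].count,
		       (unsigned long long)sites[i].bytes_req,
		       (unsigned long long)sites[i].bytes_alloc,
		       (unsigned long long)(sites[i].bytes_alloc - sites[i].bytes_req));
}

int main(void)
{
	/* Two synthetic events standing in for decoded ALLOC records. */
	account_alloc(0xffffffff81234567ull, 60, 64);
	account_alloc(0xffffffff81234567ull, 60, 64);
	report();
	return 0;
}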
* [RFC PATCH 4/4] kmemtrace: SLOB hooks. 2008-07-17 0:46 ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Eduard - Gabriel Munteanu @ 2008-07-17 0:46 ` Eduard - Gabriel Munteanu 2008-07-17 7:43 ` Pekka Enberg 2008-07-17 7:46 ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Pekka Enberg 1 sibling, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 0:46 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel This adds hooks for the SLOB allocator, to allow tracing with kmemtrace. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> --- mm/slob.c | 37 +++++++++++++++++++++++++++++++------ 1 files changed, 31 insertions(+), 6 deletions(-) diff --git a/mm/slob.c b/mm/slob.c index a3ad667..0335c01 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -65,6 +65,7 @@ #include <linux/module.h> #include <linux/rcupdate.h> #include <linux/list.h> +#include <linux/kmemtrace.h> #include <asm/atomic.h> /* @@ -463,27 +464,38 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node) { unsigned int *m; int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN); + void *ret; if (size < PAGE_SIZE - align) { if (!size) return ZERO_SIZE_PTR; m = slob_alloc(size + align, gfp, align, node); + if (!m) return NULL; *m = size; - return (void *)m + align; + ret = (void *)m + align; + + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, + _RET_IP_, ret, + size, size + align, gfp, node); } else { - void *ret; + unsigned int order = get_order(size); - ret = slob_new_page(gfp | __GFP_COMP, get_order(size), node); + ret = slob_new_page(gfp | __GFP_COMP, order, node); if (ret) { struct page *page; page = virt_to_page(ret); page->private = size; } - return ret; + + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, + _RET_IP_, ret, + size, PAGE_SIZE << order, gfp, node); } + + return ret; } EXPORT_SYMBOL(__kmalloc_node); @@ -501,6 +513,8 @@ void kfree(const void *block) slob_free(m, *m + align); } else put_page(&sp->page); + + kmemtrace_mark_free(KMEMTRACE_TYPE_KERNEL, _RET_IP_, block); } EXPORT_SYMBOL(kfree); @@ -569,10 +583,19 @@ void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node) { void *b; - if (c->size < PAGE_SIZE) + if (c->size < PAGE_SIZE) { b = slob_alloc(c->size, flags, c->align, node); - else + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, + _RET_IP_, b, c->size, + SLOB_UNITS(c->size) * SLOB_UNIT, + flags, node); + } else { b = slob_new_page(flags, get_order(c->size), node); + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, + _RET_IP_, b, c->size, + PAGE_SIZE << get_order(c->size), + flags, node); + } if (c->ctor) c->ctor(c, b); @@ -608,6 +631,8 @@ void kmem_cache_free(struct kmem_cache *c, void *b) } else { __kmem_cache_free(b, c->size); } + + kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b); } EXPORT_SYMBOL(kmem_cache_free); -- 1.5.6.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
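With all four allocators hooked, post-processing can pair ALLOC and FREE events by pointer, which is how a checker could flag the cross-API usage discussed later in the thread (memory obtained with kmem_cache_alloc() but released with kfree()). The sketch below is hypothetical and not the kmemtrace_check implementation: it assumes events are replayed in timestamp order and that the fixed-size table never fills.

/*
 * Hypothetical checker sketch: remember the type each live pointer was
 * allocated with and complain when the free side uses a different API
 * (e.g. kmem_cache_alloc() paired with kfree()). Events are assumed to
 * arrive in timestamp order; the table is assumed never to fill.
 */
#include <stdio.h>
#include <stdint.h>

#define SLOTS (1u << 16)		/* power of two, open addressing */

struct obj {
	uint64_t ptr;			/* 0 = slot never used */
	uint16_t type_id;		/* KMEMTRACE_TYPE_* at allocation */
	uint16_t live;			/* 1 between ALLOC and FREE */
};

static struct obj table[SLOTS];

static struct obj *find_slot(uint64_t ptr)
{
	/* crude multiplicative hash, good enough for a sketch */
	unsigned int i = (unsigned int)((ptr >> 4) * 2654435761u) & (SLOTS - 1);

	while (table[i].ptr && table[i].ptr != ptr)
		i = (i + 1) & (SLOTS - 1);
	return &table[i];
}

static void on_alloc(uint64_t ptr, uint16_t type_id)
{
	struct obj *o = find_slot(ptr);

	o->ptr = ptr;			/* address reuse just updates the slot */
	o->type_id = type_id;
	o->live = 1;
}

static void on_free(uint64_t ptr, uint16_t type_id)
{
	struct obj *o = find_slot(ptr);

	if (!o->ptr || !o->live) {
		printf("free of untracked or already freed 0x%llx\n",
		       (unsigned long long)ptr);
		return;
	}
	if (o->type_id != type_id)
		printf("0x%llx allocated as type %u but freed as type %u\n",
		       (unsigned long long)ptr, o->type_id, type_id);
	o->live = 0;
}

int main(void)
{
	/* Synthetic example: cache allocation (type 1) freed via kfree (type 0). */
	on_alloc(0xffff880012345678ull, 1);
	on_free(0xffff880012345678ull, 0);
	return 0;
}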
* Re: [RFC PATCH 4/4] kmemtrace: SLOB hooks. 2008-07-17 0:46 ` [RFC PATCH 4/4] kmemtrace: SLOB hooks Eduard - Gabriel Munteanu @ 2008-07-17 7:43 ` Pekka Enberg 2008-07-17 15:46 ` Matt Mackall 0 siblings, 1 reply; 30+ messages in thread From: Pekka Enberg @ 2008-07-17 7:43 UTC (permalink / raw) To: Eduard - Gabriel Munteanu; +Cc: cl, linux-mm, linux-kernel, Matt Mackall Hi, [Adding Matt as cc.] On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> wrote: > This adds hooks for the SLOB allocator, to allow tracing with kmemtrace. > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> > --- > mm/slob.c | 37 +++++++++++++++++++++++++++++++------ > 1 files changed, 31 insertions(+), 6 deletions(-) > > diff --git a/mm/slob.c b/mm/slob.c > index a3ad667..0335c01 100644 > --- a/mm/slob.c > +++ b/mm/slob.c > @@ -65,6 +65,7 @@ > #include <linux/module.h> > #include <linux/rcupdate.h> > #include <linux/list.h> > +#include <linux/kmemtrace.h> > #include <asm/atomic.h> > > /* > @@ -463,27 +464,38 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node) > { > unsigned int *m; > int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN); > + void *ret; > > if (size < PAGE_SIZE - align) { > if (!size) > return ZERO_SIZE_PTR; > > m = slob_alloc(size + align, gfp, align, node); > + > if (!m) > return NULL; > *m = size; > - return (void *)m + align; > + ret = (void *)m + align; > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, > + _RET_IP_, ret, > + size, size + align, gfp, node); > } else { > - void *ret; > + unsigned int order = get_order(size); > > - ret = slob_new_page(gfp | __GFP_COMP, get_order(size), node); > + ret = slob_new_page(gfp | __GFP_COMP, order, node); > if (ret) { > struct page *page; > page = virt_to_page(ret); > page->private = size; > } > - return ret; > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, > + _RET_IP_, ret, > + size, PAGE_SIZE << order, gfp, node); > } > + > + return ret; > } > EXPORT_SYMBOL(__kmalloc_node); > > @@ -501,6 +513,8 @@ void kfree(const void *block) > slob_free(m, *m + align); > } else > put_page(&sp->page); > + > + kmemtrace_mark_free(KMEMTRACE_TYPE_KERNEL, _RET_IP_, block); > } > EXPORT_SYMBOL(kfree); > > @@ -569,10 +583,19 @@ void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node) > { > void *b; > > - if (c->size < PAGE_SIZE) > + if (c->size < PAGE_SIZE) { > b = slob_alloc(c->size, flags, c->align, node); > - else > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, > + _RET_IP_, b, c->size, > + SLOB_UNITS(c->size) * SLOB_UNIT, > + flags, node); > + } else { > b = slob_new_page(flags, get_order(c->size), node); > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, > + _RET_IP_, b, c->size, > + PAGE_SIZE << get_order(c->size), > + flags, node); > + } > > if (c->ctor) > c->ctor(c, b); > @@ -608,6 +631,8 @@ void kmem_cache_free(struct kmem_cache *c, void *b) > } else { > __kmem_cache_free(b, c->size); > } > + > + kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b); > } > EXPORT_SYMBOL(kmem_cache_free); > > -- > 1.5.6.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/4] kmemtrace: SLOB hooks. 2008-07-17 7:43 ` Pekka Enberg @ 2008-07-17 15:46 ` Matt Mackall 0 siblings, 0 replies; 30+ messages in thread From: Matt Mackall @ 2008-07-17 15:46 UTC (permalink / raw) To: Pekka Enberg; +Cc: Eduard - Gabriel Munteanu, cl, linux-mm, linux-kernel On Thu, 2008-07-17 at 10:43 +0300, Pekka Enberg wrote: > Hi, > > [Adding Matt as cc.] > > On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu > <eduard.munteanu@linux360.ro> wrote: > > This adds hooks for the SLOB allocator, to allow tracing with kmemtrace. > > > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > > Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Matt Mackall <mpm@selenic.com> -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 3/4] kmemtrace: SLUB hooks. 2008-07-17 0:46 ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 4/4] kmemtrace: SLOB hooks Eduard - Gabriel Munteanu @ 2008-07-17 7:46 ` Pekka Enberg 2008-07-17 18:06 ` Eduard - Gabriel Munteanu 1 sibling, 1 reply; 30+ messages in thread From: Pekka Enberg @ 2008-07-17 7:46 UTC (permalink / raw) To: Eduard - Gabriel Munteanu; +Cc: cl, linux-mm, linux-kernel On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> wrote: > This adds hooks for the SLUB allocator, to allow tracing with kmemtrace. > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > --- > include/linux/slub_def.h | 9 +++++++- > mm/slub.c | 47 ++++++++++++++++++++++++++++++++++++++++----- > 2 files changed, 49 insertions(+), 7 deletions(-) > > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h > index d117ea2..0cef121 100644 > --- a/include/linux/slub_def.h > +++ b/include/linux/slub_def.h > @@ -10,6 +10,7 @@ > #include <linux/gfp.h> > #include <linux/workqueue.h> > #include <linux/kobject.h> > +#include <linux/kmemtrace.h> > > enum stat_item { > ALLOC_FASTPATH, /* Allocation from cpu slab */ > @@ -205,7 +206,13 @@ void *__kmalloc(size_t size, gfp_t flags); > > static __always_inline void *kmalloc_large(size_t size, gfp_t flags) > { > - return (void *)__get_free_pages(flags | __GFP_COMP, get_order(size)); > + unsigned int order = get_order(size); > + void *ret = (void *) __get_free_pages(flags, order); > + > + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, ret, > + size, PAGE_SIZE << order, flags); > + > + return ret; > } > > static __always_inline void *kmalloc(size_t size, gfp_t flags) > diff --git a/mm/slub.c b/mm/slub.c > index 315c392..a6f930f 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -23,6 +23,7 @@ > #include <linux/kallsyms.h> > #include <linux/memory.h> > #include <linux/math64.h> > +#include <linux/kmemtrace.h> > > /* > * Lock order: > @@ -1652,14 +1653,25 @@ static __always_inline void *slab_alloc(struct kmem_cache *s, > > void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags) > { > - return slab_alloc(s, gfpflags, -1, __builtin_return_address(0)); > + void *ret = slab_alloc(s, gfpflags, -1, __builtin_return_address(0)); > + > + kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, > + s->objsize, s->size, gfpflags); > + > + return ret; > } > EXPORT_SYMBOL(kmem_cache_alloc); > > #ifdef CONFIG_NUMA > void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node) > { > - return slab_alloc(s, gfpflags, node, __builtin_return_address(0)); > + void *ret = slab_alloc(s, gfpflags, node, > + __builtin_return_address(0)); > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, > + s->objsize, s->size, gfpflags, node); > + > + return ret; > } > EXPORT_SYMBOL(kmem_cache_alloc_node); > #endif > @@ -1771,6 +1783,8 @@ void kmem_cache_free(struct kmem_cache *s, void *x) > page = virt_to_head_page(x); > > slab_free(s, page, x, __builtin_return_address(0)); > + > + kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x); > } > EXPORT_SYMBOL(kmem_cache_free); > > @@ -2676,6 +2690,7 @@ static struct kmem_cache *get_slab(size_t size, gfp_t flags) > void *__kmalloc(size_t size, gfp_t flags) > { > struct kmem_cache *s; > + void *ret; > > if (unlikely(size > PAGE_SIZE)) > return kmalloc_large(size, flags); > @@ -2685,7 +2700,12 @@ void *__kmalloc(size_t size, gfp_t flags) > if (unlikely(ZERO_OR_NULL_PTR(s))) > return s; > > - 
return slab_alloc(s, flags, -1, __builtin_return_address(0)); > + ret = slab_alloc(s, flags, -1, __builtin_return_address(0)); > + > + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _RET_IP_, ret, > + size, (size_t) s->size, (unsigned long) flags); What are these casts doing here? I think you can just drop them, no? > + > + return ret; > } > EXPORT_SYMBOL(__kmalloc); > > @@ -2704,16 +2724,29 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) > void *__kmalloc_node(size_t size, gfp_t flags, int node) > { > struct kmem_cache *s; > + void *ret; > > - if (unlikely(size > PAGE_SIZE)) > - return kmalloc_large_node(size, flags, node); > + if (unlikely(size > PAGE_SIZE)) { > + ret = kmalloc_large_node(size, flags, node); > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, _RET_IP_, ret, > + size, PAGE_SIZE << get_order(size), > + (unsigned long) flags, node); Don't cast flags to unsigned long. The kmemtrace core should use gfp_t as the parameter type. > + > + return ret; > + } > > s = get_slab(size, flags); > > if (unlikely(ZERO_OR_NULL_PTR(s))) > return s; > > - return slab_alloc(s, flags, node, __builtin_return_address(0)); > + ret = slab_alloc(s, flags, node, __builtin_return_address(0)); > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, _RET_IP_, ret, > + size, s->size, (unsigned long) flags, node); Another cast here. > + > + return ret; > } > EXPORT_SYMBOL(__kmalloc_node); > #endif > @@ -2771,6 +2804,8 @@ void kfree(const void *x) > return; > } > slab_free(page->slab, page, object, __builtin_return_address(0)); > + > + kmemtrace_mark_free(KMEMTRACE_TYPE_KERNEL, _RET_IP_, x); > } > EXPORT_SYMBOL(kfree); > > -- > 1.5.6.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
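For reference, the change being asked for amounts to passing gfp_t through the hook unchanged and narrowing it once inside the probe, so the call sites need no casts. A possible shape of that, sketched against the header from patch 1 rather than taken from any posted code:

/* Sketch only: take gfp_t directly and convert it once, inside the hook. */
static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
					     unsigned long call_site,
					     const void *ptr,
					     size_t bytes_req,
					     size_t bytes_alloc,
					     gfp_t gfp_flags,
					     int node)
{
	trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu "
		   "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d",
		   type_id, call_site, (unsigned long) ptr,
		   bytes_req, bytes_alloc, (unsigned long) gfp_flags, node);
}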
* Re: [RFC PATCH 3/4] kmemtrace: SLUB hooks. 2008-07-17 7:46 ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Pekka Enberg @ 2008-07-17 18:06 ` Eduard - Gabriel Munteanu 0 siblings, 0 replies; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 18:06 UTC (permalink / raw) To: Pekka Enberg; +Cc: cl, linux-mm, linux-kernel On Thu, Jul 17, 2008 at 10:46:51AM +0300, Pekka Enberg wrote: > On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu > <eduard.munteanu@linux360.ro> wrote: > > This adds hooks for the SLUB allocator, to allow tracing with kmemtrace. > > > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > > --- > > include/linux/slub_def.h | 9 +++++++- > > mm/slub.c | 47 ++++++++++++++++++++++++++++++++++++++++----- > > 2 files changed, 49 insertions(+), 7 deletions(-) > > What are these casts doing here? I think you can just drop them, no? Yes, I should cut all these casts off. Will resubmit soon. Eduard -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/4] kmemtrace: SLAB hooks. 2008-07-17 0:46 ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 3/4] kmemtrace: SLUB hooks Eduard - Gabriel Munteanu @ 2008-07-17 7:38 ` Pekka Enberg 1 sibling, 0 replies; 30+ messages in thread From: Pekka Enberg @ 2008-07-17 7:38 UTC (permalink / raw) To: Eduard - Gabriel Munteanu; +Cc: cl, linux-mm, linux-kernel Hi Eduard-Gabriel, On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> wrote: > This adds hooks for the SLAB allocator, to allow tracing with kmemtrace. > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > --- > include/linux/slab_def.h | 56 +++++++++++++++++++++++++++++++++++++----- > mm/slab.c | 61 +++++++++++++++++++++++++++++++++++++++++---- > 2 files changed, 104 insertions(+), 13 deletions(-) > > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h > index 39c3a5e..77b8045 100644 > --- a/include/linux/slab_def.h > +++ b/include/linux/slab_def.h > @@ -14,6 +14,7 @@ > #include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */ > #include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */ > #include <linux/compiler.h> > +#include <linux/kmemtrace.h> > > /* Size description struct for general caches. */ > struct cache_sizes { > @@ -28,8 +29,20 @@ extern struct cache_sizes malloc_sizes[]; > void *kmem_cache_alloc(struct kmem_cache *, gfp_t); > void *__kmalloc(size_t size, gfp_t flags); > > +#ifdef CONFIG_KMEMTRACE > +extern void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags); > +#else > +static inline void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, > + gfp_t flags) > +{ > + return kmem_cache_alloc(cachep, flags); > +} > +#endif > + > static inline void *kmalloc(size_t size, gfp_t flags) > { > + void *ret; > + > if (__builtin_constant_p(size)) { > int i = 0; > > @@ -50,10 +63,17 @@ static inline void *kmalloc(size_t size, gfp_t flags) > found: > #ifdef CONFIG_ZONE_DMA > if (flags & GFP_DMA) > - return kmem_cache_alloc(malloc_sizes[i].cs_dmacachep, > - flags); > + ret = kmem_cache_alloc_notrace( > + malloc_sizes[i].cs_dmacachep, flags); > + else > #endif > - return kmem_cache_alloc(malloc_sizes[i].cs_cachep, flags); > + ret = kmem_cache_alloc_notrace( > + malloc_sizes[i].cs_cachep, flags); > + > + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, ret, > + size, malloc_sizes[i].cs_size, flags); We have malloc_sizes[i].cs_size here as the _allocated_ size (which seems wrong to be btw). 
> + > + return ret; > } > return __kmalloc(size, flags); > } > @@ -62,8 +82,23 @@ found: > extern void *__kmalloc_node(size_t size, gfp_t flags, int node); > extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node); > > +#ifdef CONFIG_KMEMTRACE > +extern void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep, > + gfp_t flags, > + int nodeid); > +#else > +static inline void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep, > + gfp_t flags, > + int nodeid) > +{ > + return kmem_cache_alloc_node(cachep, flags, nodeid); > +} > +#endif > + > static inline void *kmalloc_node(size_t size, gfp_t flags, int node) > { > + void *ret; > + > if (__builtin_constant_p(size)) { > int i = 0; > > @@ -84,11 +119,18 @@ static inline void *kmalloc_node(size_t size, gfp_t flags, int node) > found: > #ifdef CONFIG_ZONE_DMA > if (flags & GFP_DMA) > - return kmem_cache_alloc_node(malloc_sizes[i].cs_dmacachep, > - flags, node); > + ret = kmem_cache_alloc_node_notrace( > + malloc_sizes[i].cs_dmacachep, flags, node); > + else > #endif > - return kmem_cache_alloc_node(malloc_sizes[i].cs_cachep, > - flags, node); > + ret = kmem_cache_alloc_node_notrace( > + malloc_sizes[i].cs_cachep, flags, node); > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, > + ret, size, malloc_sizes[i].cs_size, > + flags, node); And here. > + > + return ret; > } > return __kmalloc_node(size, flags, node); > } > > +#ifdef CONFIG_KMEMTRACE > +void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep, > + gfp_t flags, > + int nodeid) > +{ > + return __cache_alloc_node(cachep, flags, nodeid, > + __builtin_return_address(0)); > +} > +EXPORT_SYMBOL(kmem_cache_alloc_node_notrace); > +#endif > + > static __always_inline void * > __do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller) > { > struct kmem_cache *cachep; > + void *ret; > > cachep = kmem_find_general_cachep(size, flags); > if (unlikely(ZERO_OR_NULL_PTR(cachep))) > return cachep; > - return kmem_cache_alloc_node(cachep, flags, node); > + ret = kmem_cache_alloc_node_notrace(cachep, flags, node); > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL, > + (unsigned long) caller, ret, > + size, cachep->buffer_size, flags, node); But here we use cachep->buffer_size and... > + > + return ret; > } > > #ifdef CONFIG_DEBUG_SLAB > @@ -3718,6 +3756,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags, > void *caller) > { > struct kmem_cache *cachep; > + void *ret; > > /* If you want to save a few bytes .text space: replace > * __ with kmem_. > @@ -3727,11 +3766,17 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags, > cachep = __find_general_cachep(size, flags); > if (unlikely(ZERO_OR_NULL_PTR(cachep))) > return cachep; > - return __cache_alloc(cachep, flags, caller); > + ret = __cache_alloc(cachep, flags, caller); > + > + kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, > + (unsigned long) caller, ret, > + size, cachep->buffer_size, flags); ...here as well. Why? 
Also, > diff --git a/mm/slab.c b/mm/slab.c > index 046607f..e9a61ac 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -111,6 +111,7 @@ > #include <linux/rtmutex.h> > #include <linux/reciprocal_div.h> > #include <linux/debugobjects.h> > +#include <linux/kmemtrace.h> > > #include <asm/cacheflush.h> > #include <asm/tlbflush.h> > @@ -3621,10 +3622,23 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp) > */ > void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) > { > - return __cache_alloc(cachep, flags, __builtin_return_address(0)); > + void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0)); > + > + kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, > + obj_size(cachep), obj_size(cachep), flags); Here.... > + > + return ret; > } > EXPORT_SYMBOL(kmem_cache_alloc); > > +#ifdef CONFIG_KMEMTRACE > +void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags) > +{ > + return __cache_alloc(cachep, flags, __builtin_return_address(0)); > +} > +EXPORT_SYMBOL(kmem_cache_alloc_notrace); > +#endif > + > /** > * kmem_ptr_validate - check if an untrusted pointer might be a slab entry. > * @cachep: the cache we're checking against > @@ -3669,20 +3683,44 @@ out: > #ifdef CONFIG_NUMA > void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid) > { > - return __cache_alloc_node(cachep, flags, nodeid, > - __builtin_return_address(0)); > + void *ret = __cache_alloc_node(cachep, flags, nodeid, > + __builtin_return_address(0)); > + > + kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret, > + obj_size(cachep), obj_size(cachep), > + flags, nodeid); ...and here, we use obj_size(). > + > + return ret; > } > EXPORT_SYMBOL(kmem_cache_alloc_node); AFAICT, you should always use ->buffer_size as the_allocated_ size. Hmm? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
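Concretely, the convention Pekka is suggesting could look like the snippet below, which is an illustration of the review comment rather than the posted patch: report the size the caller asked for via obj_size() and the size the cache actually hands out via ->buffer_size, at every SLAB call site.

/* Illustration of the suggested size convention for the SLAB hooks. */
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
	void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));

	kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
			     obj_size(cachep),		/* requested */
			     cachep->buffer_size,	/* allocated */
			     flags);
	return ret;
}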
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-17 0:46 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Eduard - Gabriel Munteanu @ 2008-07-17 8:01 ` Pekka Enberg 2008-07-17 18:32 ` Eduard - Gabriel Munteanu 2008-07-17 21:34 ` Randy Dunlap 2 siblings, 1 reply; 30+ messages in thread From: Pekka Enberg @ 2008-07-17 8:01 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: cl, linux-mm, linux-kernel, Randy Dunlap, Matt Mackall Hi, [Adding Randy to cc for the Documentation/ parts and Matt for the core.] On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> wrote: > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > to the userspace application in order to analyse allocation hotspots, > internal fragmentation and so on, making it possible to see how well an > allocator performs, as well as debug and profile kernel code. > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > --- > Documentation/kernel-parameters.txt | 6 + > Documentation/vm/kmemtrace.txt | 96 ++++++++++++++++ > MAINTAINERS | 6 + > include/linux/kmemtrace.h | 110 ++++++++++++++++++ > init/main.c | 2 + > lib/Kconfig.debug | 4 + > mm/Makefile | 2 +- > mm/kmemtrace.c | 208 +++++++++++++++++++++++++++++++++++ > 8 files changed, 433 insertions(+), 1 deletions(-) > create mode 100644 Documentation/vm/kmemtrace.txt > create mode 100644 include/linux/kmemtrace.h > create mode 100644 mm/kmemtrace.c > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > index b52f47d..b230aff 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -49,6 +49,7 @@ parameter is applicable: > ISAPNP ISA PnP code is enabled. > ISDN Appropriate ISDN support is enabled. > JOY Appropriate joystick support is enabled. > + KMEMTRACE kmemtrace is enabled. > LIBATA Libata driver is enabled > LP Printer support is enabled. > LOOP Loopback device support is enabled. > @@ -941,6 +942,11 @@ and is between 256 and 4096 characters. It is defined in the file > use the HighMem zone if it exists, and the Normal > zone if it does not. > > + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of > + subbufs kmemtrace's relay channel has. Set this > + higher than default (KMEMTRACE_N_SUBBUFS in code) if > + you experience buffer overruns. > + > movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter > is similar to kernelcore except it specifies the > amount of memory used for migratable allocations. > diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt > new file mode 100644 > index 0000000..1147ecb > --- /dev/null > +++ b/Documentation/vm/kmemtrace.txt > @@ -0,0 +1,96 @@ > + kmemtrace - Kernel Memory Tracer > + > + by Eduard - Gabriel Munteanu > + <eduard.munteanu@linux360.ro> > + A chapter on what kmemtrace is here would probably be helpful. > + > +I. Design and goals > +=================== > + > +kmemtrace was designed to handle rather large amounts of data. Thus, it uses > +the relay interface to export whatever is logged to userspace, which then > +stores it. Analysis and reporting is done asynchronously, that is, after the > +data is collected and stored. By design, it allows one to log and analyse > +on different machines and different arches. 
> + > +As this is a debugging feature, kmemtrace's ABI is not designed to be very > +stable, although this may happen in the future if it's deemed mature and > +sufficient. So the userspace tool does not contain a copy of the kernel > +header. Instead, the ABI allows checking if the logged data matches the > +userspace tool. Well, what I said about ABI stability isn't totally true: > +while I've tried hard to cover all possible (and useful) use cases, I don't > +want it frozen in the current state. I anticipate the ABI will be _quite_ > +stable, even across multiple stable kernel versions, but I don't make any > +guarantees regarding this matter. > + > +Summary of design goals: > + - allow logging and analysis to be done across different machines > + - be fast and anticipate usage in high-load environments (*) > + - be reasonably extensible > + - have a _reasonably_ (not completely) stable ABI > + > +(*) - one of the reasons Pekka Enberg's original userspace data analysis > + tool's code was rewritten from Perl to C (although this is more than a > + simple conversion) > + > + > +II. Quick usage guide > +===================== > + > +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable > +CONFIG_KMEMTRACE). > + > +2) Get the userspace tool and build it: > +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository > +$ cd kmemtrace-user/ > +$ autoreconf > +$ ./configure # Supply KERNEL_SOURCES=/path/to/sources/ if you're > + # _not_ running this on a kmemtrace-enabled kernel. > +$ make As I mentioned in private, I would prefer we drop autoconf from the userspace tool, but maybe that's just my personal preference. > + > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the > +'single' runlevel (so that relay buffers don't fill up easily), and run > +kmemtrace: > +# '$' does not mean user, but root here. > +$ mount -t debugfs none /debug > +$ mount -t proc none /proc > +$ cd path/to/kmemtrace-user/ > +$ ./kmemtraced > +Wait a bit, then stop it with CTRL+C. > +$ cat /debug/kmemtrace/total_overruns # Check if we didn't overrun, should > + # be zero. > +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to > + check its correctness] > +$ ./kmemtrace-report > + > +Now you should have a nice and short summary of how the allocator performs. > + > +III. FAQ and known issues > +========================= > +Q: 'cat /debug/kmemtrace/total_overruns' is non-zero, how do I fix this? > +Should I worry? > +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how > +large the number is. You can fix it by supplying a higher > +'kmemtrace.subbufs=N' kernel parameter. > +--- > + > +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? > +A: This is a bug and should be reported. It can occur for a variety of > +reasons: > + - possible bugs in relay code > + - possible misuse of relay by kmemtrace > + - timestamps being collected unorderly > +Or you may fix it yourself and send us a patch. > +--- > + > +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? > +A: This is a known issue and I'm working on it. These might be true errors > +in kernel code, which may have inconsistent behavior (e.g. allocating memory > +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed > +out this behavior may work with SLAB, but may fail with other allocators. > + > +It may also be due to lack of tracing in some unusual allocator functions. 
> + > +We don't want bug reports regarding this issue yet. > +--- I think you're supposed to document the actual filesystem in Documentation/ABI as well. > + > diff --git a/MAINTAINERS b/MAINTAINERS > index 56a2f67..e967bc2 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -2425,6 +2425,12 @@ M: jason.wessel@windriver.com > L: kgdb-bugreport@lists.sourceforge.net > S: Maintained > > +KMEMTRACE > +P: Eduard - Gabriel Munteanu > +M: eduard.munteanu@linux360.ro > +L: linux-kernel@vger.kernel.org > +S: Maintained > + > KPROBES > P: Ananth N Mavinakayanahalli > M: ananth@in.ibm.com > diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h > new file mode 100644 > index 0000000..da69d22 > --- /dev/null > +++ b/include/linux/kmemtrace.h > @@ -0,0 +1,110 @@ > +/* > + * Copyright (C) 2008 Eduard - Gabriel Munteanu > + * > + * This file is released under GPL version 2. > + */ > + > +#ifndef _LINUX_KMEMTRACE_H > +#define _LINUX_KMEMTRACE_H > + > +#include <linux/types.h> > + > +/* ABI definition starts here. */ > + > +#define KMEMTRACE_ABI_VERSION 1 > + > +enum kmemtrace_event_id { > + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ I don't think this is used anywhere so why not drop it? > + KMEMTRACE_EVENT_ALLOC, > + KMEMTRACE_EVENT_FREE, > +}; > + > +enum kmemtrace_type_id { > + KMEMTRACE_TYPE_KERNEL = 0, /* kmalloc() / kfree(). */ > + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ > + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ I still think kernel vs. cache is confusing because both allocations *are* for the kernel. So perhaps kmalloc vs. cache? > +}; > + > +struct kmemtrace_event { So why don't we have the ABI version embedded here like blktrace has so that user-space can check if the format matches its expectations? That should be future-proof as well: as long as y ou keep the existing fields where they're at now, you can always add new fields at the end of the struct. > + __u16 event_id; /* Allocate or free? */ > + __u16 type_id; /* Kind of allocation/free. */ > + __s32 node; /* Target CPU. */ > + __u64 call_site; /* Caller address. */ > + __u64 ptr; /* Pointer to allocation. */ > + __u64 bytes_req; /* Number of bytes requested. */ > + __u64 bytes_alloc; /* Number of bytes allocated. */ > + __u64 gfp_flags; /* Requested flags. */ > + __s64 timestamp; /* When the operation occured in ns. */ > +} __attribute__ ((__packed__)); > + > +/* End of ABI definition. 
*/ > + > +#ifdef __KERNEL__ > + > +#include <linux/marker.h> > + > +#ifdef CONFIG_KMEMTRACE > + > +extern void kmemtrace_init(void); > + > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr, > + size_t bytes_req, > + size_t bytes_alloc, > + unsigned long gfp_flags, > + int node) > +{ > + trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu " > + "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d", > + type_id, call_site, (unsigned long) ptr, > + bytes_req, bytes_alloc, gfp_flags, node); > +} > + > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr) > +{ > + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", > + type_id, call_site, (unsigned long) ptr); > +} > + > +#else /* CONFIG_KMEMTRACE */ > + > +static inline void kmemtrace_init(void) > +{ > +} > + > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr, > + size_t bytes_req, > + size_t bytes_alloc, > + unsigned long gfp_flags, > + int node) > +{ > +} > + > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr) > +{ > +} > + > +#endif /* CONFIG_KMEMTRACE */ > + > +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr, > + size_t bytes_req, > + size_t bytes_alloc, > + unsigned long gfp_flags) > +{ > + kmemtrace_mark_alloc_node(type_id, call_site, ptr, > + bytes_req, bytes_alloc, gfp_flags, -1); > +} > + > +#endif /* __KERNEL__ */ > + > +#endif /* _LINUX_KMEMTRACE_H */ > + > diff --git a/init/main.c b/init/main.c > index 057f364..c00659c 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -66,6 +66,7 @@ > #include <asm/setup.h> > #include <asm/sections.h> > #include <asm/cacheflush.h> > +#include <linux/kmemtrace.h> > > #ifdef CONFIG_X86_LOCAL_APIC > #include <asm/smp.h> > @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void) > enable_debug_pagealloc(); > cpu_hotplug_init(); > kmem_cache_init(); > + kmemtrace_init(); > debug_objects_mem_init(); > idr_init_cache(); > setup_per_cpu_pageset(); > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index d2099f4..6bacab5 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -674,6 +674,10 @@ config FIREWIRE_OHCI_REMOTE_DMA > > If unsure, say N. > > +config KMEMTRACE > + bool "Kernel memory tracer" > + depends on RELAY && DEBUG_FS && MARKERS > + > source "samples/Kconfig" > > source "lib/Kconfig.kgdb" > diff --git a/mm/Makefile b/mm/Makefile > index 18c143b..d88a3bc 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o > obj-$(CONFIG_SMP) += allocpercpu.o > obj-$(CONFIG_QUICKLIST) += quicklist.o > obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o > - > +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o > diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c > new file mode 100644 > index 0000000..9258010 > --- /dev/null > +++ b/mm/kmemtrace.c > @@ -0,0 +1,208 @@ > +/* > + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu > + * > + * This file is released under GPL version 2. 
> + */ > + > +#include <linux/string.h> > +#include <linux/debugfs.h> > +#include <linux/relay.h> > +#include <linux/module.h> > +#include <linux/marker.h> > +#include <linux/gfp.h> > +#include <linux/kmemtrace.h> > + > +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) > +#define KMEMTRACE_N_SUBBUFS 20 > + > +static struct rchan *kmemtrace_chan; > +static u32 kmemtrace_buf_overruns; > +static unsigned int kmemtrace_n_subbufs; > + > +static inline void kmemtrace_log_event(struct kmemtrace_event *event) > +{ > + relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event)); > +} > + > +static void kmemtrace_probe_alloc(void *probe_data, void *call_data, > + const char *format, va_list *args) > +{ > + unsigned long flags; > + struct kmemtrace_event ev; > + > + /* > + * Don't convert this to use structure initializers, > + * C99 does not guarantee the rvalues evaluation order. > + */ > + ev.event_id = KMEMTRACE_EVENT_ALLOC; > + ev.type_id = va_arg(*args, int); > + ev.call_site = va_arg(*args, unsigned long); > + ev.ptr = va_arg(*args, unsigned long); > + /* Don't trace ignored allocations. */ > + if (!ev.ptr) > + return; > + ev.bytes_req = va_arg(*args, unsigned long); > + ev.bytes_alloc = va_arg(*args, unsigned long); > + /* ev.timestamp set below, to preserve event ordering. */ > + ev.gfp_flags = va_arg(*args, unsigned long); > + ev.node = va_arg(*args, int); > + > + local_irq_save(flags); Why do we disable local irqs here? (Perhaps a comment is in order.) > + ev.timestamp = ktime_to_ns(ktime_get()); > + kmemtrace_log_event(&ev); > + local_irq_restore(flags); > +} > + > +static void kmemtrace_probe_free(void *probe_data, void *call_data, > + const char *format, va_list *args) > +{ > + unsigned long flags; > + struct kmemtrace_event ev; > + > + /* > + * Don't convert this to use structure initializers, > + * C99 does not guarantee the rvalues evaluation order. > + */ > + ev.event_id = KMEMTRACE_EVENT_FREE; > + ev.type_id = va_arg(*args, int); > + ev.call_site = va_arg(*args, unsigned long); > + ev.ptr = va_arg(*args, unsigned long); > + /* Don't trace ignored allocations. */ > + if (!ev.ptr) > + return; > + /* ev.timestamp set below, to preserve event ordering. 
*/ > + > + local_irq_save(flags); (same here) > + ev.timestamp = ktime_to_ns(ktime_get()); > + kmemtrace_log_event(&ev); > + local_irq_restore(flags); > +} > + > +static struct dentry * > +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, > + int mode, struct rchan_buf *buf, int *is_global) > +{ > + return debugfs_create_file(filename, mode, parent, buf, > + &relay_file_operations); > +} > + > +static int kmemtrace_remove_buf_file(struct dentry *dentry) > +{ > + debugfs_remove(dentry); > + > + return 0; > +} > + > +static int kmemtrace_count_overruns(struct rchan_buf *buf, > + void *subbuf, void *prev_subbuf, > + size_t prev_padding) > +{ > + if (relay_buf_full(buf)) { > + kmemtrace_buf_overruns++; > + return 0; > + } > + > + return 1; > +} > + > +static struct rchan_callbacks relay_callbacks = { > + .create_buf_file = kmemtrace_create_buf_file, > + .remove_buf_file = kmemtrace_remove_buf_file, > + .subbuf_start = kmemtrace_count_overruns, > +}; > + > +static struct dentry *kmemtrace_dir; > +static struct dentry *kmemtrace_overruns_dentry; > + > +static void kmemtrace_cleanup(void) > +{ > + relay_close(kmemtrace_chan); > + marker_probe_unregister("kmemtrace_alloc", > + kmemtrace_probe_alloc, NULL); > + marker_probe_unregister("kmemtrace_free", > + kmemtrace_probe_free, NULL); > + if (kmemtrace_overruns_dentry) > + debugfs_remove(kmemtrace_overruns_dentry); > +} > + > +static int __init kmemtrace_setup_late(void) > +{ > + if (!kmemtrace_chan) > + goto failed; > + > + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); > + if (!kmemtrace_dir) > + goto cleanup; > + > + kmemtrace_overruns_dentry = > + debugfs_create_u32("total_overruns", S_IRUSR, > + kmemtrace_dir, &kmemtrace_buf_overruns); > + if (!kmemtrace_overruns_dentry) > + goto dir_cleanup; > + > + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) > + goto overrun_cleanup; > + > + printk(KERN_INFO "kmemtrace: fully up.\n"); > + > + return 0; > + > +overrun_cleanup: > + debugfs_remove(kmemtrace_overruns_dentry); > + kmemtrace_overruns_dentry = NULL; > +dir_cleanup: > + debugfs_remove(kmemtrace_dir); > +cleanup: > + kmemtrace_cleanup(); > +failed: > + return 1; > +} > +late_initcall(kmemtrace_setup_late); > + > +static int __init kmemtrace_set_subbuf_size(char *str) > +{ > + get_option(&str, &kmemtrace_n_subbufs); > + return 0; > +} > +early_param("kmemtrace.subbufs", kmemtrace_set_subbuf_size); > + > +void kmemtrace_init(void) > +{ > + int err; > + > + if (!kmemtrace_n_subbufs) > + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; > + > + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, > + kmemtrace_n_subbufs, &relay_callbacks, > + NULL); > + if (!kmemtrace_chan) { > + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); > + return; > + } > + > + err = marker_probe_register("kmemtrace_alloc", "type_id %d " > + "call_site %lu ptr %lu " > + "bytes_req %lu bytes_alloc %lu " > + "gfp_flags %lu node %d", > + kmemtrace_probe_alloc, NULL); > + if (err) > + goto probe_alloc_fail; > + err = marker_probe_register("kmemtrace_free", "type_id %d " > + "call_site %lu ptr %lu", > + kmemtrace_probe_free, NULL); > + if (err) > + goto probe_free_fail; > + > + printk(KERN_INFO "kmemtrace: early init successful.\n"); > + return; > + > +probe_free_fail: > + err = marker_probe_unregister("kmemtrace_alloc", > + kmemtrace_probe_alloc, NULL); > + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); > +probe_alloc_fail: > + relay_close(kmemtrace_chan); > + kmemtrace_chan = NULL; > +} > + > 
-- > 1.5.6.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
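As a rough illustration of how the exported per-CPU data can be consumed, here is a minimal userspace decoder for the packed v1 event layout quoted in the patch above (two 16-bit ids, a 32-bit node, then 64-bit fields). The struct mirrors the patch; the file handling and output format are purely illustrative, the real tool lives in the kmemtrace-user repository:

#include <stdio.h>
#include <stdint.h>

/* Matches the packed struct kmemtrace_event from the v1 patch. */
struct kmemtrace_event_v1 {
	uint16_t event_id;
	uint16_t type_id;
	int32_t  node;
	uint64_t call_site;
	uint64_t ptr;
	uint64_t bytes_req;
	uint64_t bytes_alloc;
	uint64_t gfp_flags;
	int64_t  timestamp;
} __attribute__ ((__packed__));

int main(int argc, char **argv)
{
	struct kmemtrace_event_v1 ev;
	FILE *f = fopen(argc > 1 ? argv[1] : "cpu0.out", "rb");

	if (!f)
		return 1;
	while (fread(&ev, sizeof(ev), 1, f) == 1) {
		if (ev.event_id == 1)		/* KMEMTRACE_EVENT_ALLOC */
			printf("alloc %llu/%llu bytes at %#llx from %#llx\n",
			       (unsigned long long) ev.bytes_req,
			       (unsigned long long) ev.bytes_alloc,
			       (unsigned long long) ev.ptr,
			       (unsigned long long) ev.call_site);
		else if (ev.event_id == 2)	/* KMEMTRACE_EVENT_FREE */
			printf("free  %#llx from %#llx\n",
			       (unsigned long long) ev.ptr,
			       (unsigned long long) ev.call_site);
	}
	fclose(f);
	return 0;
}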
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-17 8:01 ` [RFC PATCH 1/4] kmemtrace: Core implementation Pekka Enberg @ 2008-07-17 18:32 ` Eduard - Gabriel Munteanu 2008-07-18 8:48 ` Pekka J Enberg 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 18:32 UTC (permalink / raw) To: Pekka Enberg; +Cc: cl, linux-mm, linux-kernel, Randy Dunlap, Matt Mackall On Thu, Jul 17, 2008 at 11:01:21AM +0300, Pekka Enberg wrote: > Hi, > > [Adding Randy to cc for the Documentation/ parts and Matt for the core.] > > On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu > <eduard.munteanu@linux360.ro> wrote: > > diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt > > new file mode 100644 > > index 0000000..1147ecb > > --- /dev/null > > +++ b/Documentation/vm/kmemtrace.txt > > @@ -0,0 +1,96 @@ > > + kmemtrace - Kernel Memory Tracer > > + > > + by Eduard - Gabriel Munteanu > > + <eduard.munteanu@linux360.ro> > > + > > A chapter on what kmemtrace is here would probably be helpful. > Will do. > > +2) Get the userspace tool and build it: > > +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository > > +$ cd kmemtrace-user/ > > +$ autoreconf > > +$ ./configure # Supply KERNEL_SOURCES=/path/to/sources/ if you're > > + # _not_ running this on a kmemtrace-enabled kernel. > > +$ make > > As I mentioned in private, I would prefer we drop autoconf from the > userspace tool, but maybe that's just my personal preference. > Yes, I'm working on a legible plain Makefile. However, I'd leave both the autoconf variant and the plain Makefile in the package for now. Most developers can use autoconf since it's part of the standard toolset for regular userspace. > > +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? > > +A: This is a known issue and I'm working on it. These might be true errors > > +in kernel code, which may have inconsistent behavior (e.g. allocating memory > > +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed > > +out this behavior may work with SLAB, but may fail with other allocators. > > + > > +It may also be due to lack of tracing in some unusual allocator functions. > > + > > +We don't want bug reports regarding this issue yet. > > +--- > > I think you're supposed to document the actual filesystem in > Documentation/ABI as well. Sounds like a good idea, I'll get on it. > > +enum kmemtrace_event_id { > > + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ > > I don't think this is used anywhere so why not drop it? > We keep this here because we see all-zeros events when relay errors occur. I'd like to keep it until I'm sure the relay problem was solved (although I've not seen these errors in a while since I patched kmemtraced). > > + KMEMTRACE_EVENT_ALLOC, > > + KMEMTRACE_EVENT_FREE, > > +}; > > + > > +enum kmemtrace_type_id { > > + KMEMTRACE_TYPE_KERNEL = 0, /* kmalloc() / kfree(). */ > > + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ > > + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ > > I still think kernel vs. cache is confusing because both allocations > *are* for the kernel. So perhaps kmalloc vs. cache? > Okay, will s/TYPE_KERNEL/TYPE_KMALLOC/. > > +}; > > + > > +struct kmemtrace_event { > > So why don't we have the ABI version embedded here like blktrace has > so that user-space can check if the format matches its expectations? 
> That should be future-proof as well: as long as y ou keep the existing > fields where they're at now, you can always add new fields at the end > of the struct. > You can't add fields at the end, because the struct size will change and reads will be erroneous. Also, stamping every 'packet' with ABI version looks like a huge waste of space. > > + __u16 event_id; /* Allocate or free? */ > > + __u16 type_id; /* Kind of allocation/free. */ > > + __s32 node; /* Target CPU. */ > > + __u64 call_site; /* Caller address. */ > > + __u64 ptr; /* Pointer to allocation. */ > > + __u64 bytes_req; /* Number of bytes requested. */ > > + __u64 bytes_alloc; /* Number of bytes allocated. */ > > + __u64 gfp_flags; /* Requested flags. */ > > + __s64 timestamp; /* When the operation occured in ns. */ > > +} __attribute__ ((__packed__)); > > + > > + ev.bytes_req = va_arg(*args, unsigned long); > > + ev.bytes_alloc = va_arg(*args, unsigned long); > > + /* ev.timestamp set below, to preserve event ordering. */ > > + ev.gfp_flags = va_arg(*args, unsigned long); > > + ev.node = va_arg(*args, int); > > + > > + local_irq_save(flags); > > Why do we disable local irqs here? (Perhaps a comment is in order.) > We do it to preserve ordering of timestamps. Otherwise, the CPU might get preempted (by IRQs or otherwise) and the event might not be logged in the order timestamps were taken. I thought the previous comment about 'ev.timestamp' was enough. I'll make things more explicit. > > + ev.timestamp = ktime_to_ns(ktime_get()); > > + kmemtrace_log_event(&ev); > > + local_irq_restore(flags); > > +} > > + > > +static void kmemtrace_probe_free(void *probe_data, void *call_data, > > + const char *format, va_list *args) > > +{ > > + unsigned long flags; > > + struct kmemtrace_event ev; > > + > > + /* > > + * Don't convert this to use structure initializers, > > + * C99 does not guarantee the rvalues evaluation order. > > + */ > > + ev.event_id = KMEMTRACE_EVENT_FREE; > > + ev.type_id = va_arg(*args, int); > > + ev.call_site = va_arg(*args, unsigned long); > > + ev.ptr = va_arg(*args, unsigned long); > > + /* Don't trace ignored allocations. */ > > + if (!ev.ptr) > > + return; > > + /* ev.timestamp set below, to preserve event ordering. 
*/ > > + > > + local_irq_save(flags); > > (same here) > > > + ev.timestamp = ktime_to_ns(ktime_get()); > > + kmemtrace_log_event(&ev); > > + local_irq_restore(flags); > > +} > > + > > +static struct dentry * > > +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, > > + int mode, struct rchan_buf *buf, int *is_global) > > +{ > > + return debugfs_create_file(filename, mode, parent, buf, > > + &relay_file_operations); > > +} > > + > > +static int kmemtrace_remove_buf_file(struct dentry *dentry) > > +{ > > + debugfs_remove(dentry); > > + > > + return 0; > > +} > > + > > +static int kmemtrace_count_overruns(struct rchan_buf *buf, > > + void *subbuf, void *prev_subbuf, > > + size_t prev_padding) > > +{ > > + if (relay_buf_full(buf)) { > > + kmemtrace_buf_overruns++; > > + return 0; > > + } > > + > > + return 1; > > +} > > + > > +static struct rchan_callbacks relay_callbacks = { > > + .create_buf_file = kmemtrace_create_buf_file, > > + .remove_buf_file = kmemtrace_remove_buf_file, > > + .subbuf_start = kmemtrace_count_overruns, > > +}; > > + > > +static struct dentry *kmemtrace_dir; > > +static struct dentry *kmemtrace_overruns_dentry; > > + > > +static void kmemtrace_cleanup(void) > > +{ > > + relay_close(kmemtrace_chan); > > + marker_probe_unregister("kmemtrace_alloc", > > + kmemtrace_probe_alloc, NULL); > > + marker_probe_unregister("kmemtrace_free", > > + kmemtrace_probe_free, NULL); > > + if (kmemtrace_overruns_dentry) > > + debugfs_remove(kmemtrace_overruns_dentry); > > +} > > + > > +static int __init kmemtrace_setup_late(void) > > +{ > > + if (!kmemtrace_chan) > > + goto failed; > > + > > + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); > > + if (!kmemtrace_dir) > > + goto cleanup; > > + > > + kmemtrace_overruns_dentry = > > + debugfs_create_u32("total_overruns", S_IRUSR, > > + kmemtrace_dir, &kmemtrace_buf_overruns); > > + if (!kmemtrace_overruns_dentry) > > + goto dir_cleanup; > > + > > + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) > > + goto overrun_cleanup; > > + > > + printk(KERN_INFO "kmemtrace: fully up.\n"); > > + > > + return 0; > > + > > +overrun_cleanup: > > + debugfs_remove(kmemtrace_overruns_dentry); > > + kmemtrace_overruns_dentry = NULL; > > +dir_cleanup: > > + debugfs_remove(kmemtrace_dir); > > +cleanup: > > + kmemtrace_cleanup(); > > +failed: > > + return 1; > > +} > > +late_initcall(kmemtrace_setup_late); > > + > > +static int __init kmemtrace_set_subbuf_size(char *str) > > +{ > > + get_option(&str, &kmemtrace_n_subbufs); > > + return 0; > > +} > > +early_param("kmemtrace.subbufs", kmemtrace_set_subbuf_size); > > + > > +void kmemtrace_init(void) > > +{ > > + int err; > > + > > + if (!kmemtrace_n_subbufs) > > + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; > > + > > + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, > > + kmemtrace_n_subbufs, &relay_callbacks, > > + NULL); > > + if (!kmemtrace_chan) { > > + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); > > + return; > > + } > > + > > + err = marker_probe_register("kmemtrace_alloc", "type_id %d " > > + "call_site %lu ptr %lu " > > + "bytes_req %lu bytes_alloc %lu " > > + "gfp_flags %lu node %d", > > + kmemtrace_probe_alloc, NULL); > > + if (err) > > + goto probe_alloc_fail; > > + err = marker_probe_register("kmemtrace_free", "type_id %d " > > + "call_site %lu ptr %lu", > > + kmemtrace_probe_free, NULL); > > + if (err) > > + goto probe_free_fail; > > + > > + printk(KERN_INFO "kmemtrace: early init successful.\n"); > > + return; > > + > > 
+probe_free_fail: > > + err = marker_probe_unregister("kmemtrace_alloc", > > + kmemtrace_probe_alloc, NULL); > > + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); > > +probe_alloc_fail: > > + relay_close(kmemtrace_chan); > > + kmemtrace_chan = NULL; > > +} > > + > > -- > > 1.5.6.1 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-17 18:32 ` Eduard - Gabriel Munteanu @ 2008-07-18 8:48 ` Pekka J Enberg 2008-07-18 10:13 ` Eduard - Gabriel Munteanu 0 siblings, 1 reply; 30+ messages in thread From: Pekka J Enberg @ 2008-07-18 8:48 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: cl, linux-mm, linux-kernel, Randy Dunlap, Matt Mackall Hi Eduard-Gabriel, On Thu, 17 Jul 2008, Eduard - Gabriel Munteanu wrote: > > > +struct kmemtrace_event { > > > > So why don't we have the ABI version embedded here like blktrace has > > so that user-space can check if the format matches its expectations? > > That should be future-proof as well: as long as y ou keep the existing > > fields where they're at now, you can always add new fields at the end > > of the struct. > > You can't add fields at the end, because the struct size will change and > reads will be erroneous. Also, stamping every 'packet' with ABI version > looks like a huge waste of space. It's an ABI so you want to make it backwards compatible and extensible. Yes, it's just for debugging, so the rules are bit more relaxed here but that's not an excuse for not designing the ABI properly. I really wish we would follow the example set by blktrace here. It uses a fixed-length header that knows the length of the rest of the packet. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
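For reference, the blktrace-style scheme described above boils down to something like the sketch below: a small fixed header that records the total event size, so an old reader can decode what it knows and skip trailing fields it does not. The struct and function names here are made up for illustration only:

#include <stddef.h>
#include <stdint.h>

/* Fixed-size header every event starts with; new fields can be appended
 * after the known payload without breaking existing readers. */
struct trace_event_hdr {
	uint16_t abi_version;	/* bumped whenever the payload grows */
	uint16_t event_size;	/* header + payload, in bytes */
} __attribute__ ((__packed__));

/* Reader side: decode the fields you understand, then advance by
 * event_size so unknown trailing fields are simply ignored. */
static size_t consume_one_event(const unsigned char *buf, size_t len)
{
	const struct trace_event_hdr *hdr = (const struct trace_event_hdr *) buf;

	if (len < sizeof(*hdr) || len < hdr->event_size)
		return 0;	/* incomplete event, wait for more data */

	/* ... decode the known fields following the header ... */

	return hdr->event_size;
}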
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-18 8:48 ` Pekka J Enberg @ 2008-07-18 10:13 ` Eduard - Gabriel Munteanu 2008-07-18 14:38 ` Pekka Enberg 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-18 10:13 UTC (permalink / raw) To: Pekka J Enberg; +Cc: cl, linux-mm, linux-kernel, Randy Dunlap, Matt Mackall On Fri, Jul 18, 2008 at 11:48:03AM +0300, Pekka J Enberg wrote: > Hi Eduard-Gabriel, > > On Thu, 17 Jul 2008, Eduard - Gabriel Munteanu wrote: > > > > +struct kmemtrace_event { > > > > > > So why don't we have the ABI version embedded here like blktrace has > > > so that user-space can check if the format matches its expectations? > > > That should be future-proof as well: as long as y ou keep the existing > > > fields where they're at now, you can always add new fields at the end > > > of the struct. > > > > You can't add fields at the end, because the struct size will change and > > reads will be erroneous. Also, stamping every 'packet' with ABI version > > looks like a huge waste of space. > > It's an ABI so you want to make it backwards compatible and extensible. > Yes, it's just for debugging, so the rules are bit more relaxed here but > that's not an excuse for not designing the ABI properly. I do expect to keep things source-compatible, but even binary-compatible? Developers debug and write patches on the latest kernel, not on a 6-month-old kernel. Isn't it reasonable that they would recompile kmemtrace along with the kernel? I would deem one ABI or another stable, but then we have to worry about not breaking it, which leads to either bloating the kernel, or keeping improvements away from kmemtrace. Should we do it just because this is an ABI? > I really wish we would follow the example set by blktrace here. It uses a > fixed-length header that knows the length of the rest of the packet. I'd rather export the header length through a separate debugfs entry, rather than add this to every packet. I don't think we need variable length packets, unless we intend to export the whole stack trace, for example. By the way, do you anticipate the need for such a stack trace? It would seem nice, but is it worth the trouble? (/me writes this down as a possible future improvement) > Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
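A sketch of the userspace check this proposal would enable: read the ABI version once from a separate debugfs file and refuse to parse if it does not match what the tool was built against. The file name follows the ABI document added later in this thread; error handling is kept minimal for brevity:

#include <stdio.h>

static int kmemtrace_abi_matches(unsigned int expected)
{
	unsigned int version = 0;
	FILE *f = fopen("/sys/kernel/debug/kmemtrace/abi_version", "r");

	if (!f)
		return 0;
	if (fscanf(f, "%u", &version) != 1)
		version = 0;
	fclose(f);

	return version == expected;
}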
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-18 10:13 ` Eduard - Gabriel Munteanu @ 2008-07-18 14:38 ` Pekka Enberg 2008-07-18 19:40 ` Eduard - Gabriel Munteanu 0 siblings, 1 reply; 30+ messages in thread From: Pekka Enberg @ 2008-07-18 14:38 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: cl, linux-mm, linux-kernel, Randy Dunlap, Matt Mackall Hi Eduard-Gabriel, On Fri, Jul 18, 2008 at 11:48:03AM +0300, Pekka J Enberg wrote: >> It's an ABI so you want to make it backwards compatible and extensible. >> Yes, it's just for debugging, so the rules are bit more relaxed here but >> that's not an excuse for not designing the ABI properly. On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> wrote: > I do expect to keep things source-compatible, but even > binary-compatible? Developers debug and write patches on the latest kernel, > not on a 6-month-old kernel. Isn't it reasonable that they would > recompile kmemtrace along with the kernel? Yes, I do think it's unreasonable. I, for one, am hoping distributions will pick up the kmemtrace userspace at some point after which I don't need to ever compile it myself. On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> wrote: > I would deem one ABI or another stable, but then we have to worry about > not breaking it, which leads to either bloating the kernel, or keeping > improvements away from kmemtrace. Should we do it just because this is an ABI? Like I've said before, it's debugging/tracing infrastructure so the rules are bit more relaxed. That said, what we should do is (1) make the ABI as future-proof as we can, (2) explicitly mark it as unstable by documenting it in Documentation/ABI/testing and (3) at some point in time move it in Documentation/ABI/stable and hopefully never break it again. But sure, we probably don't need to keep any "bloat" around like we do with the syscall interface, for example. And hopefully, the ABI is good enough to allow adding *new* tracing events while retaining the old ones nicely in a backwards compatible way. On Fri, Jul 18, 2008 at 11:48:03AM +0300, Pekka J Enberg wrote: >> I really wish we would follow the example set by blktrace here. It uses a >> fixed-length header that knows the length of the rest of the packet. On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> wrote: > I'd rather export the header length through a separate debugfs entry, > rather than add this to every packet. I don't think we need variable > length packets, unless we intend to export the whole stack trace, for > example. Sure, makes sense. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-18 14:38 ` Pekka Enberg @ 2008-07-18 19:40 ` Eduard - Gabriel Munteanu 2008-07-18 20:07 ` Matt Mackall 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-18 19:40 UTC (permalink / raw) To: Pekka Enberg; +Cc: cl, linux-mm, linux-kernel, Randy Dunlap, Matt Mackall On Fri, Jul 18, 2008 at 05:38:04PM +0300, Pekka Enberg wrote: > Hi Eduard-Gabriel, > > I do expect to keep things source-compatible, but even > > binary-compatible? Developers debug and write patches on the latest kernel, > > not on a 6-month-old kernel. Isn't it reasonable that they would > > recompile kmemtrace along with the kernel? > > Yes, I do think it's unreasonable. I, for one, am hoping distributions > will pick up the kmemtrace userspace at some point after which I don't > need to ever compile it myself. Ok, I agree it's nice to have it in distros. I wasn't planning for this, but it's good to know others' expectations. Then I'll also add a turn-off mechanism, so maybe it makes it into distro kernels too (either debug or not). And we don't need to include kernel headers from userspace anymore and I'll just provide a copy. BTW, I also expect the kmemtrace-user git repo to become stable soon (i.e. no more revision history rewrites). > On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu > <eduard.munteanu@linux360.ro> wrote: > > I would deem one ABI or another stable, but then we have to worry about > > not breaking it, which leads to either bloating the kernel, or keeping > > improvements away from kmemtrace. Should we do it just because this is an ABI? > > Like I've said before, it's debugging/tracing infrastructure so the > rules are bit more relaxed. That said, what we should do is (1) make > the ABI as future-proof as we can, (2) explicitly mark it as unstable > by documenting it in Documentation/ABI/testing and (3) at some point > in time move it in Documentation/ABI/stable and hopefully never break > it again. But sure, we probably don't need to keep any "bloat" around > like we do with the syscall interface, for example. > > And hopefully, the ABI is good enough to allow adding *new* tracing > events while retaining the old ones nicely in a backwards compatible > way. Sounds like a good plan. I'll also update the docs (Documentation/ABI/ and Documentation/vm/kmemtrace.txt) to reflect this. > On Fri, Jul 18, 2008 at 11:48:03AM +0300, Pekka J Enberg wrote: > >> I really wish we would follow the example set by blktrace here. It uses a > >> fixed-length header that knows the length of the rest of the packet. > > On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu > <eduard.munteanu@linux360.ro> wrote: > > I'd rather export the header length through a separate debugfs entry, > > rather than add this to every packet. I don't think we need variable > > length packets, unless we intend to export the whole stack trace, for > > example. > > Sure, makes sense. > Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-18 19:40 ` Eduard - Gabriel Munteanu @ 2008-07-18 20:07 ` Matt Mackall 0 siblings, 0 replies; 30+ messages in thread From: Matt Mackall @ 2008-07-18 20:07 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: Pekka Enberg, cl, linux-mm, linux-kernel, Randy Dunlap On Fri, 2008-07-18 at 22:40 +0300, Eduard - Gabriel Munteanu wrote: > On Fri, Jul 18, 2008 at 05:38:04PM +0300, Pekka Enberg wrote: > > Hi Eduard-Gabriel, > > > I do expect to keep things source-compatible, but even > > > binary-compatible? Developers debug and write patches on the latest kernel, > > > not on a 6-month-old kernel. Isn't it reasonable that they would > > > recompile kmemtrace along with the kernel? > > > > Yes, I do think it's unreasonable. I, for one, am hoping distributions > > will pick up the kmemtrace userspace at some point after which I don't > > need to ever compile it myself. > > Ok, I agree it's nice to have it in distros. I wasn't planning for this, > but it's good to know others' expectations. It's worth pointing out that this is one of the big downfalls of things like systemtap. If a tool can't just work out of the box for a distro, it's basically a non-starter for most users. -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-17 0:46 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu 2008-07-17 0:46 ` [RFC PATCH 2/4] kmemtrace: SLAB hooks Eduard - Gabriel Munteanu 2008-07-17 8:01 ` [RFC PATCH 1/4] kmemtrace: Core implementation Pekka Enberg @ 2008-07-17 21:34 ` Randy Dunlap 2008-07-17 23:49 ` Eduard - Gabriel Munteanu 2 siblings, 1 reply; 30+ messages in thread From: Randy Dunlap @ 2008-07-17 21:34 UTC (permalink / raw) To: Eduard - Gabriel Munteanu; +Cc: penberg, cl, linux-mm, linux-kernel On Thu, 17 Jul 2008 03:46:45 +0300 Eduard - Gabriel Munteanu wrote: > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > to the userspace application in order to analyse allocation hotspots, > internal fragmentation and so on, making it possible to see how well an > allocator performs, as well as debug and profile kernel code. > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > --- > Documentation/kernel-parameters.txt | 6 + > Documentation/vm/kmemtrace.txt | 96 ++++++++++++++++ > MAINTAINERS | 6 + > include/linux/kmemtrace.h | 110 ++++++++++++++++++ > init/main.c | 2 + > lib/Kconfig.debug | 4 + > mm/Makefile | 2 +- > mm/kmemtrace.c | 208 +++++++++++++++++++++++++++++++++++ > 8 files changed, 433 insertions(+), 1 deletions(-) > create mode 100644 Documentation/vm/kmemtrace.txt > create mode 100644 include/linux/kmemtrace.h > create mode 100644 mm/kmemtrace.c > > diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt > new file mode 100644 > index 0000000..1147ecb > --- /dev/null > +++ b/Documentation/vm/kmemtrace.txt > +II. Quick usage guide > +===================== > + > +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable > +CONFIG_KMEMTRACE). > + > +2) Get the userspace tool and build it: > +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository > +$ cd kmemtrace-user/ > +$ autoreconf > +$ ./configure # Supply KERNEL_SOURCES=/path/to/sources/ if you're > + # _not_ running this on a kmemtrace-enabled kernel. > +$ make > + > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the > +'single' runlevel (so that relay buffers don't fill up easily), and run > +kmemtrace: > +# '$' does not mean user, but root here. > +$ mount -t debugfs none /debug Please mount at /sys/kernel/debug, i.e., the expected debugfs mount point. > +$ mount -t proc none /proc > +$ cd path/to/kmemtrace-user/ > +$ ./kmemtraced > +Wait a bit, then stop it with CTRL+C. > +$ cat /debug/kmemtrace/total_overruns # Check if we didn't overrun, should > + # be zero. > +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to > + check its correctness] > +$ ./kmemtrace-report > + > +Now you should have a nice and short summary of how the allocator performs. Otherwise looks nice. Thanks. --- ~Randy Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA http://linuxplumbersconf.org/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-17 21:34 ` Randy Dunlap @ 2008-07-17 23:49 ` Eduard - Gabriel Munteanu 0 siblings, 0 replies; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-17 23:49 UTC (permalink / raw) To: Randy Dunlap; +Cc: penberg, cl, linux-mm, linux-kernel On Thu, Jul 17, 2008 at 02:34:34PM -0700, Randy Dunlap wrote: > On Thu, 17 Jul 2008 03:46:45 +0300 Eduard - Gabriel Munteanu wrote: > > > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the > > +'single' runlevel (so that relay buffers don't fill up easily), and run > > +kmemtrace: > > +# '$' does not mean user, but root here. > > +$ mount -t debugfs none /debug > > Please mount at /sys/kernel/debug, i.e., the expected debugfs mount point. > Oh, I did not know that. Thanks, will change accordingly. > > +$ mount -t proc none /proc > > +$ cd path/to/kmemtrace-user/ > > +$ ./kmemtraced > > +Wait a bit, then stop it with CTRL+C. > > +$ cat /debug/kmemtrace/total_overruns # Check if we didn't overrun, should > > + # be zero. > > +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to > > + check its correctness] > > +$ ./kmemtrace-report > > + > > +Now you should have a nice and short summary of how the allocator performs. > > > Otherwise looks nice. Thanks. > > --- > ~Randy > Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA > http://linuxplumbersconf.org/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 0/4] kmemtrace RFC (resend 2) @ 2008-07-22 18:31 Eduard - Gabriel Munteanu 2008-07-22 18:31 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-22 18:31 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel, rdunlap, mpm Hi everyone, I hopefully fixed all your previous objections. I have also set up a git tree for anyone who'd like to try kmemtrace (gitweb URL): http://repo.or.cz/w/linux-2.6/kmemtrace.git Comment on the patchset and please try running kmemtrace if possible. Check the docs for information on how to get the userspace tool and set it up. Important: the kmemtrace-user repo went stable and I'll not alter the revision history anymore. BTW, don't be scared if you see many errors being reported by kmemtrace-report, this is a known issue (I could use some advice on this if you know what's going on). Changes since last submission: 1. fixed allocator tracing 2. wrote more documentation 3. reworked the ABI and documented it in Documentation/ABI; we don't include kernel headers in userspace anymore 4. added support for disabling kmemtrace at boot-time 5. added provisions for disabling kmemtrace at runtime 6. changed slab allocators to use __always_inline instead of plain inline, so that we're sure the return address is valid 7. removed some useless cast, as pointed out by Pekka Enberg Since the changes were quite extensive, I chose not to preserve any tags such as "Reviewed-by". I'm waiting for your input on this. Thanks, Eduard P.S.: Pekka, I followed your advice on adding a field containing the struct size (managed to make room for it without adding to the current struct size). This allows us to do crazy stuff in the future, like exporting the whole stack trace on every allocation. Not sure how useful this is right now, but let's keep the ABI extensible. Eduard - Gabriel Munteanu (4): kmemtrace: Core implementation. kmemtrace: SLAB hooks. kmemtrace: SLUB hooks. kmemtrace: SLOB hooks. Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++ Documentation/kernel-parameters.txt | 10 + Documentation/vm/kmemtrace.txt | 126 ++++++++++++++ MAINTAINERS | 6 + include/linux/kmemtrace.h | 110 ++++++++++++ include/linux/slab_def.h | 68 +++++++- include/linux/slob_def.h | 9 +- include/linux/slub_def.h | 53 ++++++- init/main.c | 2 + lib/Kconfig.debug | 28 +++ mm/Makefile | 2 +- mm/kmemtrace.c | 244 +++++++++++++++++++++++++++ mm/slab.c | 71 +++++++- mm/slob.c | 37 ++++- mm/slub.c | 66 +++++++- 15 files changed, 854 insertions(+), 36 deletions(-) create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace create mode 100644 Documentation/vm/kmemtrace.txt create mode 100644 include/linux/kmemtrace.h create mode 100644 mm/kmemtrace.c -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
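Regarding change (6) in the list above, a hedged sketch of why __always_inline matters for the slab wrappers: if the wrapper were a plain 'inline', the compiler could emit it out of line, and _RET_IP_ / __builtin_return_address(0) would then record the wrapper itself rather than the real caller. kmalloc_notrace() below is a hypothetical untraced helper used only to keep the example short, not something from the patches:

static __always_inline void *kmalloc_traced(size_t size, gfp_t flags)
{
	void *ret = kmalloc_notrace(size, flags);

	/* Because this wrapper is forced inline into its caller, _RET_IP_
	 * here resolves to the caller's return address, which is what we
	 * want recorded as the allocation call site. */
	kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
			     size, size, flags);

	return ret;
}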
* [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-22 18:31 [RFC PATCH 0/4] kmemtrace RFC (resend 2) Eduard - Gabriel Munteanu @ 2008-07-22 18:31 ` Eduard - Gabriel Munteanu 2008-07-22 21:28 ` Frank Ch. Eigler 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-22 18:31 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel, rdunlap, mpm kmemtrace provides tracing for slab allocator functions, such as kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed to the userspace application in order to analyse allocation hotspots, internal fragmentation and so on, making it possible to see how well an allocator performs, as well as debug and profile kernel code. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> --- Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++ Documentation/kernel-parameters.txt | 10 + Documentation/vm/kmemtrace.txt | 126 ++++++++++++++ MAINTAINERS | 6 + include/linux/kmemtrace.h | 110 ++++++++++++ init/main.c | 2 + lib/Kconfig.debug | 28 +++ mm/Makefile | 2 +- mm/kmemtrace.c | 244 +++++++++++++++++++++++++++ 9 files changed, 585 insertions(+), 1 deletions(-) create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace create mode 100644 Documentation/vm/kmemtrace.txt create mode 100644 include/linux/kmemtrace.h create mode 100644 mm/kmemtrace.c diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace new file mode 100644 index 0000000..466c2bb --- /dev/null +++ b/Documentation/ABI/testing/debugfs-kmemtrace @@ -0,0 +1,58 @@ +What: /sys/kernel/debug/kmemtrace/ +Date: July 2008 +Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> +Description: + +In kmemtrace-enabled kernels, the following files are created: + +/sys/kernel/debug/kmemtrace/ + cpu<n> (0400) Per-CPU tracing data, see below. (binary) + total_overruns (0400) Total number of bytes which were dropped from + cpu<n> files because of full buffer condition, + non-binary. (text) + abi_version (0400) Kernel's kmemtrace ABI version. (text) + +Each per-CPU file should be read according to the relay interface. That is, +the reader should set affinity to that specific CPU and, as currently done by +the userspace application (though there are other methods), use poll() with +an infinite timeout before every read(). Otherwise, erroneous data may be +read. The binary data has the following _core_ format: + Event id (1 byte) Unsigned integer, one of: + 0 - erroneous event, this is illegal/invalid data and must + not occur (KMEMTRACE_EVENT_NULL) + 1 - represents an allocation (KMEMTRACE_EVENT_ALLOC) + 2 - represents a freeing of previously allocated memory + (KMEMTRACE_EVENT_FREE) + Type id (1 byte) Unsigned integer, one of: + 0 - this is a kmalloc() / kfree() + 1 - this is a kmem_cache_alloc() / kmem_cache_free() + 2 - this is a __get_free_pages() et al. + Event size (2 bytes) Unsigned integer representing the + size of this event. Used to extend + kmemtrace. Discard the bytes you + don't know about. + Target CPU (4 bytes) Signed integer, valid for event id 1. + If equal to -1, target CPU is the same + as origin CPU, but the reverse might + not be true. + Caller address (8 bytes) Return address to the caller. + Pointer to mem (8 bytes) Pointer to allocated memory, must not + be NULL. + Requested bytes (8 bytes) Total number of requested bytes, + unsigned, must not be zero. 
+ Allocated bytes (8 bytes) Total number of actually allocated + bytes, unsigned, must not be lower + than requested bytes. + Requested flags (8 bytes) GFP flags supplied by the caller. + Timestamp (8 bytes) Signed integer representing timestamp. + +The data is made available in the same endianness the machine has. + +Other event ids and type ids may be defined and added. Other fields may be +added by increasing event size. Every modification to the ABI, including +new id definitions, are followed by bumping the ABI version by one. + + +Users: + kmemtrace-user - git://repo.or.cz/kmemtrace-user.git + diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b52f47d..446a257 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -49,6 +49,7 @@ parameter is applicable: ISAPNP ISA PnP code is enabled. ISDN Appropriate ISDN support is enabled. JOY Appropriate joystick support is enabled. + KMEMTRACE kmemtrace is enabled. LIBATA Libata driver is enabled LP Printer support is enabled. LOOP Loopback device support is enabled. @@ -941,6 +942,15 @@ and is between 256 and 4096 characters. It is defined in the file use the HighMem zone if it exists, and the Normal zone if it does not. + kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no } + Controls whether kmemtrace is enabled + at boot-time. + + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of + subbufs kmemtrace's relay channel has. Set this + higher than default (KMEMTRACE_N_SUBBUFS in code) if + you experience buffer overruns. + movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter is similar to kernelcore except it specifies the amount of memory used for migratable allocations. diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt new file mode 100644 index 0000000..75360b1 --- /dev/null +++ b/Documentation/vm/kmemtrace.txt @@ -0,0 +1,126 @@ + kmemtrace - Kernel Memory Tracer + + by Eduard - Gabriel Munteanu + <eduard.munteanu@linux360.ro> + +I. Introduction +=============== + +kmemtrace helps kernel developers figure out two things: +1) how different allocators (SLAB, SLUB etc.) perform +2) how kernel code allocates memory and how much + +To do this, we trace every allocation and export information to the userspace +through the relay interface. We export things such as the number of requested +bytes, the number of bytes actually allocated (i.e. including internal +fragmentation), whether this is a slab allocation or a plain kmalloc() and so +on. + +The actual analysis is performed by a userspace tool (see section III for +details on where to get it from). It logs the data exported by the kernel, +processes it and (as of writing this) can provide the following information: +- the total amount of memory allocated and fragmentation per call-site +- the amount of memory allocated and fragmentation per allocation +- total memory allocated and fragmentation in the collected dataset +- number of cross-CPU allocation and frees (makes sense in NUMA environments) + +Moreover, it can potentially find inconsistent and erroneous behavior in +kernel code, such as using slab free functions on kmalloc'ed memory or +allocating less memory than requested (but not truly failed allocations). + +kmemtrace also makes provisions for tracing on some arch and analysing the +data on another. + +II. Design and goals +==================== + +kmemtrace was designed to handle rather large amounts of data. 
Thus, it uses +the relay interface to export whatever is logged to userspace, which then +stores it. Analysis and reporting is done asynchronously, that is, after the +data is collected and stored. By design, it allows one to log and analyse +on different machines and different arches. + +As of writing this, the ABI is not considered stable, though it might not +change much. However, no guarantees are made about compatibility yet. When +deemed stable, the ABI should still allow easy extension while maintaining +backward compatibility. This is described further in Documentation/ABI. + +Summary of design goals: + - allow logging and analysis to be done across different machines + - be fast and anticipate usage in high-load environments (*) + - be reasonably extensible + - make it possible for GNU/Linux distributions to have kmemtrace + included in their repositories + +(*) - one of the reasons Pekka Enberg's original userspace data analysis + tool's code was rewritten from Perl to C (although this is more than a + simple conversion) + + +III. Quick usage guide +====================== + +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable +CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED). + +2) Get the userspace tool and build it: +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository +$ cd kmemtrace-user/ +$ ./autogen.sh +$ ./configure +$ make + +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the +'single' runlevel (so that relay buffers don't fill up easily), and run +kmemtrace: +# '$' does not mean user, but root here. +$ mount -t debugfs none /sys/kernel/debug +$ mount -t proc none /proc +$ cd path/to/kmemtrace-user/ +$ ./kmemtraced +Wait a bit, then stop it with CTRL+C. +$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't + # overrun, should + # be zero. +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to + check its correctness] +$ ./kmemtrace-report + +Now you should have a nice and short summary of how the allocator performs. + +IV. FAQ and known issues +======================== + +Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix +this? Should I worry? +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how +large the number is. You can fix it by supplying a higher +'kmemtrace.subbufs=N' kernel parameter. +--- + +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? +A: This is a bug and should be reported. It can occur for a variety of +reasons: + - possible bugs in relay code + - possible misuse of relay by kmemtrace + - timestamps being collected unorderly +Or you may fix it yourself and send us a patch. +--- + +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? +A: This is a known issue and I'm working on it. These might be true errors +in kernel code, which may have inconsistent behavior (e.g. allocating memory +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed +out this behavior may work with SLAB, but may fail with other allocators. + +It may also be due to lack of tracing in some unusual allocator functions. + +We don't want bug reports regarding this issue yet. +--- + +V. 
See also +=========== + +Documentation/kernel-parameters.txt +Documentation/ABI/testing/debugfs-kmemtrace + diff --git a/MAINTAINERS b/MAINTAINERS index 56a2f67..e967bc2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2425,6 +2425,12 @@ M: jason.wessel@windriver.com L: kgdb-bugreport@lists.sourceforge.net S: Maintained +KMEMTRACE +P: Eduard - Gabriel Munteanu +M: eduard.munteanu@linux360.ro +L: linux-kernel@vger.kernel.org +S: Maintained + KPROBES P: Ananth N Mavinakayanahalli M: ananth@in.ibm.com diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h new file mode 100644 index 0000000..48746ee --- /dev/null +++ b/include/linux/kmemtrace.h @@ -0,0 +1,110 @@ +/* + * Copyright (C) 2008 Eduard - Gabriel Munteanu + * + * This file is released under GPL version 2. + */ + +#ifndef _LINUX_KMEMTRACE_H +#define _LINUX_KMEMTRACE_H + +#ifdef __KERNEL__ + +#include <linux/types.h> +#include <linux/marker.h> + +/* ABI definition starts here. */ + +#define KMEMTRACE_ABI_VERSION 1 + +enum kmemtrace_event_id { + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ + KMEMTRACE_EVENT_ALLOC, + KMEMTRACE_EVENT_FREE, +}; + +enum kmemtrace_type_id { + KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() / kfree(). */ + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ +}; + +struct kmemtrace_event { + u8 event_id; /* Allocate or free? */ + u8 type_id; /* Kind of allocation/free. */ + u16 event_size; /* Size of event */ + s32 node; /* Target CPU. */ + u64 call_site; /* Caller address. */ + u64 ptr; /* Pointer to allocation. */ + u64 bytes_req; /* Number of bytes requested. */ + u64 bytes_alloc; /* Number of bytes allocated. */ + u64 gfp_flags; /* Requested flags. */ + s64 timestamp; /* When the operation occured in ns. */ +} __attribute__ ((__packed__)); + +/* End of ABI definition. 
*/ + +#ifdef CONFIG_KMEMTRACE + +extern void kmemtrace_init(void); + +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + gfp_t gfp_flags, + int node) +{ + trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu " + "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d", + type_id, call_site, (unsigned long) ptr, + bytes_req, bytes_alloc, (unsigned long) gfp_flags, node); +} + +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr) +{ + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", + type_id, call_site, (unsigned long) ptr); +} + +#else /* CONFIG_KMEMTRACE */ + +static inline void kmemtrace_init(void) +{ +} + +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + gfp_t gfp_flags, + int node) +{ +} + +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr) +{ +} + +#endif /* CONFIG_KMEMTRACE */ + +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + gfp_t gfp_flags) +{ + kmemtrace_mark_alloc_node(type_id, call_site, ptr, + bytes_req, bytes_alloc, gfp_flags, -1); +} + +#endif /* __KERNEL__ */ + +#endif /* _LINUX_KMEMTRACE_H */ + diff --git a/init/main.c b/init/main.c index 057f364..c00659c 100644 --- a/init/main.c +++ b/init/main.c @@ -66,6 +66,7 @@ #include <asm/setup.h> #include <asm/sections.h> #include <asm/cacheflush.h> +#include <linux/kmemtrace.h> #ifdef CONFIG_X86_LOCAL_APIC #include <asm/smp.h> @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void) enable_debug_pagealloc(); cpu_hotplug_init(); kmem_cache_init(); + kmemtrace_init(); debug_objects_mem_init(); idr_init_cache(); setup_per_cpu_pageset(); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index d2099f4..0ade2ae 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -674,6 +674,34 @@ config FIREWIRE_OHCI_REMOTE_DMA If unsure, say N. +config KMEMTRACE + bool "Kernel memory tracer (kmemtrace)" + depends on RELAY && DEBUG_FS && MARKERS + help + kmemtrace provides tracing for slab allocator functions, such as + kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected + data is then fed to the userspace application in order to analyse + allocation hotspots, internal fragmentation and so on, making it + possible to see how well an allocator performs, as well as debug + and profile kernel code. + + This requires an userspace application to use. See + Documentation/vm/kmemtrace.txt for more information. + + Saying Y will make the kernel somewhat larger and slower. However, + if you disable kmemtrace at run-time or boot-time, the performance + impact is minimal (depending on the arch the kernel is built for). + + If unsure, say N. + +config KMEMTRACE_DEFAULT_ENABLED + bool "Enabled by default at boot" + depends on KMEMTRACE + help + Say Y here to enable kmemtrace at boot-time by default. Whatever + the choice, the behavior can be overridden by a kernel parameter, + as described in documentation. 
+ source "samples/Kconfig" source "lib/Kconfig.kgdb" diff --git a/mm/Makefile b/mm/Makefile index 18c143b..d88a3bc 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o - +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c new file mode 100644 index 0000000..4b33ace --- /dev/null +++ b/mm/kmemtrace.c @@ -0,0 +1,244 @@ +/* + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu + * + * This file is released under GPL version 2. + */ + +#include <linux/string.h> +#include <linux/debugfs.h> +#include <linux/relay.h> +#include <linux/module.h> +#include <linux/marker.h> +#include <linux/gfp.h> +#include <linux/kmemtrace.h> + +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) +#define KMEMTRACE_N_SUBBUFS 20 + +static struct rchan *kmemtrace_chan; +static u32 kmemtrace_buf_overruns; + +static unsigned int kmemtrace_n_subbufs; +#ifdef CONFIG_KMEMTRACE_DEFAULT_ENABLED +static unsigned int kmemtrace_enabled = 1; +#else +static unsigned int kmemtrace_enabled = 0; +#endif + +static u32 kmemtrace_abi_version __read_mostly = KMEMTRACE_ABI_VERSION; + +static inline void kmemtrace_log_event(struct kmemtrace_event *event) +{ + relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event)); +} + +static void kmemtrace_probe_alloc(void *probe_data, void *call_data, + const char *format, va_list *args) +{ + unsigned long flags; + struct kmemtrace_event ev; + + /* + * Don't convert this to use structure initializers, + * C99 does not guarantee the rvalues evaluation order. + */ + ev.event_id = KMEMTRACE_EVENT_ALLOC; + ev.type_id = va_arg(*args, int); + ev.event_size = sizeof(struct kmemtrace_event); + ev.call_site = va_arg(*args, unsigned long); + ev.ptr = va_arg(*args, unsigned long); + /* Don't trace ignored allocations. */ + if (!ev.ptr) + return; + ev.bytes_req = va_arg(*args, unsigned long); + ev.bytes_alloc = va_arg(*args, unsigned long); + /* ev.timestamp set below, to preserve event ordering. */ + ev.gfp_flags = va_arg(*args, unsigned long); + ev.node = va_arg(*args, int); + + /* We disable IRQs for timestamps to match event ordering. */ + local_irq_save(flags); + ev.timestamp = ktime_to_ns(ktime_get()); + kmemtrace_log_event(&ev); + local_irq_restore(flags); +} + +static void kmemtrace_probe_free(void *probe_data, void *call_data, + const char *format, va_list *args) +{ + unsigned long flags; + struct kmemtrace_event ev; + + /* + * Don't convert this to use structure initializers, + * C99 does not guarantee the rvalues evaluation order. + */ + ev.event_id = KMEMTRACE_EVENT_FREE; + ev.type_id = va_arg(*args, int); + ev.event_size = sizeof(struct kmemtrace_event); + ev.call_site = va_arg(*args, unsigned long); + ev.ptr = va_arg(*args, unsigned long); + /* Don't trace ignored allocations. */ + if (!ev.ptr) + return; + /* ev.timestamp set below, to preserve event ordering. */ + + /* We disable IRQs for timestamps to match event ordering. 
*/ + local_irq_save(flags); + ev.timestamp = ktime_to_ns(ktime_get()); + kmemtrace_log_event(&ev); + local_irq_restore(flags); +} + +static struct dentry * +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, + int mode, struct rchan_buf *buf, int *is_global) +{ + return debugfs_create_file(filename, mode, parent, buf, + &relay_file_operations); +} + +static int kmemtrace_remove_buf_file(struct dentry *dentry) +{ + debugfs_remove(dentry); + + return 0; +} + +static int kmemtrace_count_overruns(struct rchan_buf *buf, + void *subbuf, void *prev_subbuf, + size_t prev_padding) +{ + if (relay_buf_full(buf)) { + /* + * We know it's not SMP-safe, but neither + * debugfs_create_u32() is. + */ + kmemtrace_buf_overruns++; + return 0; + } + + return 1; +} + +static struct rchan_callbacks relay_callbacks = { + .create_buf_file = kmemtrace_create_buf_file, + .remove_buf_file = kmemtrace_remove_buf_file, + .subbuf_start = kmemtrace_count_overruns, +}; + +static struct dentry *kmemtrace_dir; +static struct dentry *kmemtrace_overruns_dentry; +static struct dentry *kmemtrace_abi_version_dentry; + +static void kmemtrace_cleanup(void) +{ + marker_probe_unregister("kmemtrace_alloc", kmemtrace_probe_alloc, NULL); + marker_probe_unregister("kmemtrace_free", kmemtrace_probe_free, NULL); + + if (kmemtrace_abi_version_dentry) + debugfs_remove(kmemtrace_abi_version_dentry); + if (kmemtrace_overruns_dentry) + debugfs_remove(kmemtrace_overruns_dentry); + + relay_close(kmemtrace_chan); + kmemtrace_chan = NULL; + + if (kmemtrace_dir) + debugfs_remove(kmemtrace_dir); +} + +static int __init kmemtrace_setup_late(void) +{ + if (!kmemtrace_chan) + goto failed; + + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); + if (!kmemtrace_dir) + goto cleanup; + + kmemtrace_abi_version_dentry = + debugfs_create_u32("abi_version", S_IRUSR, + kmemtrace_dir, &kmemtrace_abi_version); + kmemtrace_overruns_dentry = + debugfs_create_u32("total_overruns", S_IRUSR, + kmemtrace_dir, &kmemtrace_buf_overruns); + if (!kmemtrace_overruns_dentry || !kmemtrace_abi_version_dentry) + goto cleanup; + + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) + goto cleanup; + + printk(KERN_INFO "kmemtrace: fully up.\n"); + + return 0; + +cleanup: + kmemtrace_cleanup(); +failed: + return 1; +} +late_initcall(kmemtrace_setup_late); + +static int __init kmemtrace_set_boot_enabled(char *str) +{ + if (!str) + return -EINVAL; + + if (!strcmp(str, "yes")) + kmemtrace_enabled = 1; + else if (!strcmp(str, "no")) + kmemtrace_enabled = 0; + else + return -EINVAL; + + return 0; +} +early_param("kmemtrace.enable", kmemtrace_set_boot_enabled); + +static int __init kmemtrace_set_subbufs(char *str) +{ + get_option(&str, &kmemtrace_n_subbufs); + return 0; +} +early_param("kmemtrace.subbufs", kmemtrace_set_subbufs); + +void kmemtrace_init(void) +{ + int err; + + if (!kmemtrace_enabled) + return; + + if (!kmemtrace_n_subbufs) + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; + + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, + kmemtrace_n_subbufs, &relay_callbacks, + NULL); + if (!kmemtrace_chan) { + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); + return; + } + + err = marker_probe_register("kmemtrace_alloc", "type_id %d " + "call_site %lu ptr %lu " + "bytes_req %lu bytes_alloc %lu " + "gfp_flags %lu node %d", + kmemtrace_probe_alloc, NULL); + if (err) + goto probe_fail; + err = marker_probe_register("kmemtrace_free", "type_id %d " + "call_site %lu ptr %lu", + kmemtrace_probe_free, NULL); + if (err) + goto 
probe_fail; + + printk(KERN_INFO "kmemtrace: early init successful.\n"); + return; + +probe_fail: + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); + kmemtrace_cleanup(); +} + -- 1.5.6.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
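The SLAB, SLUB and SLOB hook patches later in this series are the intended callers of the inline helpers defined above. As a rough illustration only (a hypothetical sketch, not code from those patches; example_raw_alloc(), example_object_size() and example_raw_free() are made-up placeholders), an allocator-side hook would look roughly like this:

static __always_inline void *example_kmalloc(size_t size, gfp_t flags)
{
	void *ret = example_raw_alloc(size, flags);	/* placeholder */

	/*
	 * Record requested vs. actually allocated size, the GFP flags and
	 * the caller's address. The hook must be inlined into the allocator
	 * entry point so the recorded call_site is the allocator's caller.
	 */
	kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
			     (unsigned long) __builtin_return_address(0),
			     ret, size, example_object_size(ret), flags);
	return ret;
}

static __always_inline void example_kfree(const void *ptr)
{
	kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC,
			    (unsigned long) __builtin_return_address(0), ptr);
	example_raw_free(ptr);	/* placeholder */
}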
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-22 18:31 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu @ 2008-07-22 21:28 ` Frank Ch. Eigler 2008-07-23 0:50 ` Eduard - Gabriel Munteanu 0 siblings, 1 reply; 30+ messages in thread From: Frank Ch. Eigler @ 2008-07-22 21:28 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: penberg, cl, linux-mm, linux-kernel, rdunlap, mpm Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> writes: > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > to the userspace application in order to analyse allocation hotspots, > internal fragmentation and so on, making it possible to see how well an > allocator performs, as well as debug and profile kernel code. > [...] It may make sense to mention in addition that this version of kmemtrace uses markers as the low-level hook mechanism, and this makes the data generated directly accessible to other tracing tools such as systemtap. Thank you! - FChE -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-22 21:28 ` Frank Ch. Eigler @ 2008-07-23 0:50 ` Eduard - Gabriel Munteanu 2008-07-23 0:55 ` Eduard - Gabriel Munteanu 2008-07-23 1:26 ` Frank Ch. Eigler 0 siblings, 2 replies; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-23 0:50 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: penberg, cl, linux-mm, linux-kernel, rdunlap, mpm On Tue, Jul 22, 2008 at 05:28:16PM -0400, Frank Ch. Eigler wrote: > > Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> writes: > > > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > > to the userspace application in order to analyse allocation hotspots, > > internal fragmentation and so on, making it possible to see how well an > > allocator performs, as well as debug and profile kernel code. > > [...] > > It may make sense to mention in addition that this version of > kmemtrace uses markers as the low-level hook mechanism, and this makes > the data generated directly accessible to other tracing tools such as > systemtap. Thank you! > > > - FChE Sounds like a good idea, but I'd like to get rid of markers and use Mathieu Desnoyers' tracepoints instead. I'm just waiting for tracepoints to get closer to inclusion in mainline/-mm. It would be great if tracepoints completely replaced markers, so SystemTap would use those instead. However, if tracepoints are not ready when kmemtrace is to be merged, I'll take your advice and mention markers and SystemTap. Thanks, Eduard -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-23 0:50 ` Eduard - Gabriel Munteanu @ 2008-07-23 0:55 ` Eduard - Gabriel Munteanu 2008-07-23 1:26 ` Frank Ch. Eigler 1 sibling, 0 replies; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-23 0:55 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: penberg, cl, linux-mm, linux-kernel, rdunlap, mpm On Wed, Jul 23, 2008 at 03:50:02AM +0300, Eduard - Gabriel Munteanu wrote: > On Tue, Jul 22, 2008 at 05:28:16PM -0400, Frank Ch. Eigler wrote: > > > > Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> writes: > > > > > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > > > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > > > to the userspace application in order to analyse allocation hotspots, > > > internal fragmentation and so on, making it possible to see how well an > > > allocator performs, as well as debug and profile kernel code. > > > [...] > > > > It may make sense to mention in addition that this version of > > kmemtrace uses markers as the low-level hook mechanism, and this makes > > the data generated directly accessible to other tracing tools such as > > systemtap. Thank you! > > > > > > - FChE > > Sounds like a good idea, but I'd like to get rid of markers and use > Mathieu Desnoyers' tracepoints instead. I'm just waiting for tracepoints > to get closer to inclusion in mainline/-mm. > > It would be great if tracepoints completely replaced markers, so SystemTap > would use those instead. > > However, if tracepoints are not ready when kmemtrace is to be merged, > I'll take your advice and mention markers and SystemTap. > > > Thanks, > Eduard > (fixed Matt's Cc.) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-23 0:50 ` Eduard - Gabriel Munteanu 2008-07-23 0:55 ` Eduard - Gabriel Munteanu @ 2008-07-23 1:26 ` Frank Ch. Eigler 1 sibling, 0 replies; 30+ messages in thread From: Frank Ch. Eigler @ 2008-07-23 1:26 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: penberg, cl, linux-mm, linux-kernel, rdunlap, mpm Hi - On Wed, Jul 23, 2008 at 03:50:02AM +0300, Eduard - Gabriel Munteanu wrote: > [...] Sounds like a good idea, but I'd like to get rid of markers > and use Mathieu Desnoyers' tracepoints instead. I'm just waiting for > tracepoints to get closer to inclusion in mainline/-mm. OK. > It would be great if tracepoints completely replaced markers, so > SystemTap would use those instead. Raw tracepoints are problematic as they require a per-tracepoint C function signature to be synthesized by the tool (or hard-coded in the tool or elsewhere). We haven't worked out how best to do this. OTOH, markers don't require such hard-coding, so are simpler for a general tool to interface to. > However, if tracepoints are not ready when kmemtrace is to be merged, > I'll take your advice and mention markers and SystemTap. Thanks either way - I'm glad you found an existing tracing mechanism usable and didn't choose/need to invent your own. - FChE -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
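The distinction drawn above can be seen in the probe shape kmemtrace already uses: a marker probe always has the same generic C signature and pulls its arguments out of a va_list according to the marker's format string, so no per-event C glue has to be generated, whereas a tracepoint probe would need a function whose parameter list matches that particular tracepoint. A stripped-down sketch of the marker side, modelled on kmemtrace_probe_free() from this patch (hypothetical example_probe(), kernel includes assumed as in mm/kmemtrace.c):

/* One generic signature fits every marker. */
static void example_probe(void *probe_data, void *call_data,
			  const char *format, va_list *args)
{
	/* Arguments are extracted positionally, as described by 'format'. */
	int type_id = va_arg(*args, int);
	unsigned long call_site = va_arg(*args, unsigned long);
	unsigned long ptr = va_arg(*args, unsigned long);

	printk(KERN_DEBUG "free: type %d site %lx ptr %lx\n",
	       type_id, call_site, ptr);
}

/* Registration needs only the marker name and its format string:
 *	marker_probe_register("kmemtrace_free",
 *			      "type_id %d call_site %lu ptr %lu",
 *			      example_probe, NULL);
 */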
* [RFC PATCH 0/4] kmemtrace RFC (resend 2, fixed wrong Cc) @ 2008-07-22 18:36 Eduard - Gabriel Munteanu 2008-07-22 18:36 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-22 18:36 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel, rdunlap, mpm Hi everyone, I hopefully fixed all your previous objections. I have also set up a git tree for anyone who'd like to try kmemtrace (gitweb URL): http://repo.or.cz/w/linux-2.6/kmemtrace.git Comment on the patchset and please try running kmemtrace if possible. Check the docs for information on how to get the userspace tool and set it up. Important: the kmemtrace-user repo went stable and I'll not alter the revision history anymore. BTW, don't be scared if you see many errors being reported by kmemtrace-report, this is a known issue (I could use some advice on this if you know what's going on). Changes since last submission: 1. fixed allocator tracing 2. wrote more documentation 3. reworked the ABI and documented it in Documentation/ABI; we don't include kernel headers in userspace anymore 4. added support for disabling kmemtrace at boot-time 5. added provisions for disabling kmemtrace at runtime 6. changed slab allocators to use __always_inline instead of plain inline, so that we're sure the return address is valid 7. removed some useless cast, as pointed out by Pekka Enberg Since the changes were quite extensive, I chose not to preserve any tags such as "Reviewed-by". I'm waiting for your input on this. Thanks, Eduard P.S.: Pekka, I followed your advice on adding a field containing the struct size (managed to make room for it without adding to the current struct size). This allows us to do crazy stuff in the future, like exporting the whole stack trace on every allocation. Not sure how useful this is right now, but let's keep the ABI extensible. Eduard - Gabriel Munteanu (4): kmemtrace: Core implementation. kmemtrace: SLAB hooks. kmemtrace: SLUB hooks. kmemtrace: SLOB hooks. Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++ Documentation/kernel-parameters.txt | 10 + Documentation/vm/kmemtrace.txt | 126 ++++++++++++++ MAINTAINERS | 6 + include/linux/kmemtrace.h | 110 ++++++++++++ include/linux/slab_def.h | 68 +++++++- include/linux/slob_def.h | 9 +- include/linux/slub_def.h | 53 ++++++- init/main.c | 2 + lib/Kconfig.debug | 28 +++ mm/Makefile | 2 +- mm/kmemtrace.c | 244 +++++++++++++++++++++++++++ mm/slab.c | 71 +++++++- mm/slob.c | 37 ++++- mm/slub.c | 66 +++++++- 15 files changed, 854 insertions(+), 36 deletions(-) create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace create mode 100644 Documentation/vm/kmemtrace.txt create mode 100644 include/linux/kmemtrace.h create mode 100644 mm/kmemtrace.c -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
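The event_size field mentioned in the P.S. is what keeps the record format forward-extensible: a reader consumes the core fields it knows about and then advances by event_size, discarding any trailing bytes a newer ABI may have appended. A hypothetical userspace sketch of that walk follows (the struct mirrors the field sizes given in the ABI document; it is not taken from kmemtrace-user):

#include <stdint.h>
#include <stddef.h>

struct kmemtrace_user_event {
	uint8_t  event_id;	/* alloc / free */
	uint8_t  type_id;	/* kmalloc / cache / pages */
	uint16_t event_size;	/* total size of this record */
	int32_t  node;		/* target CPU, -1 if same as origin */
	uint64_t call_site;
	uint64_t ptr;
	uint64_t bytes_req;
	uint64_t bytes_alloc;
	uint64_t gfp_flags;
	int64_t  timestamp;
} __attribute__ ((__packed__));

/* Walk a captured per-CPU buffer; returns the number of events seen. */
static size_t walk_events(const char *buf, size_t len)
{
	size_t off = 0, count = 0;

	while (off + sizeof(struct kmemtrace_user_event) <= len) {
		const struct kmemtrace_user_event *ev =
			(const struct kmemtrace_user_event *) (buf + off);

		if (ev->event_size < sizeof(*ev) ||
		    off + ev->event_size > len)
			break;		/* truncated or corrupt record */

		/* ... dispatch on ev->event_id / ev->type_id here ... */

		off += ev->event_size;	/* skips unknown trailing bytes */
		count++;
	}
	return count;
}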
* [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-22 18:36 [RFC PATCH 0/4] kmemtrace RFC (resend 2, fixed wrong Cc) Eduard - Gabriel Munteanu @ 2008-07-22 18:36 ` Eduard - Gabriel Munteanu 2008-07-28 9:24 ` Pekka Enberg 0 siblings, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-22 18:36 UTC (permalink / raw) To: penberg; +Cc: cl, linux-mm, linux-kernel, rdunlap, mpm kmemtrace provides tracing for slab allocator functions, such as kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed to the userspace application in order to analyse allocation hotspots, internal fragmentation and so on, making it possible to see how well an allocator performs, as well as debug and profile kernel code. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> --- Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++ Documentation/kernel-parameters.txt | 10 + Documentation/vm/kmemtrace.txt | 126 ++++++++++++++ MAINTAINERS | 6 + include/linux/kmemtrace.h | 110 ++++++++++++ init/main.c | 2 + lib/Kconfig.debug | 28 +++ mm/Makefile | 2 +- mm/kmemtrace.c | 244 +++++++++++++++++++++++++++ 9 files changed, 585 insertions(+), 1 deletions(-) create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace create mode 100644 Documentation/vm/kmemtrace.txt create mode 100644 include/linux/kmemtrace.h create mode 100644 mm/kmemtrace.c diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace new file mode 100644 index 0000000..466c2bb --- /dev/null +++ b/Documentation/ABI/testing/debugfs-kmemtrace @@ -0,0 +1,58 @@ +What: /sys/kernel/debug/kmemtrace/ +Date: July 2008 +Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> +Description: + +In kmemtrace-enabled kernels, the following files are created: + +/sys/kernel/debug/kmemtrace/ + cpu<n> (0400) Per-CPU tracing data, see below. (binary) + total_overruns (0400) Total number of bytes which were dropped from + cpu<n> files because of full buffer condition, + non-binary. (text) + abi_version (0400) Kernel's kmemtrace ABI version. (text) + +Each per-CPU file should be read according to the relay interface. That is, +the reader should set affinity to that specific CPU and, as currently done by +the userspace application (though there are other methods), use poll() with +an infinite timeout before every read(). Otherwise, erroneous data may be +read. The binary data has the following _core_ format: + Event id (1 byte) Unsigned integer, one of: + 0 - erroneous event, this is illegal/invalid data and must + not occur (KMEMTRACE_EVENT_NULL) + 1 - represents an allocation (KMEMTRACE_EVENT_ALLOC) + 2 - represents a freeing of previously allocated memory + (KMEMTRACE_EVENT_FREE) + Type id (1 byte) Unsigned integer, one of: + 0 - this is a kmalloc() / kfree() + 1 - this is a kmem_cache_alloc() / kmem_cache_free() + 2 - this is a __get_free_pages() et al. + Event size (2 bytes) Unsigned integer representing the + size of this event. Used to extend + kmemtrace. Discard the bytes you + don't know about. + Target CPU (4 bytes) Signed integer, valid for event id 1. + If equal to -1, target CPU is the same + as origin CPU, but the reverse might + not be true. + Caller address (8 bytes) Return address to the caller. + Pointer to mem (8 bytes) Pointer to allocated memory, must not + be NULL. + Requested bytes (8 bytes) Total number of requested bytes, + unsigned, must not be zero. 
+ Allocated bytes (8 bytes) Total number of actually allocated + bytes, unsigned, must not be lower + than requested bytes. + Requested flags (8 bytes) GFP flags supplied by the caller. + Timestamp (8 bytes) Signed integer representing timestamp. + +The data is made available in the same endianness the machine has. + +Other event ids and type ids may be defined and added. Other fields may be +added by increasing event size. Every modification to the ABI, including +new id definitions, are followed by bumping the ABI version by one. + + +Users: + kmemtrace-user - git://repo.or.cz/kmemtrace-user.git + diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b52f47d..446a257 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -49,6 +49,7 @@ parameter is applicable: ISAPNP ISA PnP code is enabled. ISDN Appropriate ISDN support is enabled. JOY Appropriate joystick support is enabled. + KMEMTRACE kmemtrace is enabled. LIBATA Libata driver is enabled LP Printer support is enabled. LOOP Loopback device support is enabled. @@ -941,6 +942,15 @@ and is between 256 and 4096 characters. It is defined in the file use the HighMem zone if it exists, and the Normal zone if it does not. + kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no } + Controls whether kmemtrace is enabled + at boot-time. + + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of + subbufs kmemtrace's relay channel has. Set this + higher than default (KMEMTRACE_N_SUBBUFS in code) if + you experience buffer overruns. + movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter is similar to kernelcore except it specifies the amount of memory used for migratable allocations. diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt new file mode 100644 index 0000000..75360b1 --- /dev/null +++ b/Documentation/vm/kmemtrace.txt @@ -0,0 +1,126 @@ + kmemtrace - Kernel Memory Tracer + + by Eduard - Gabriel Munteanu + <eduard.munteanu@linux360.ro> + +I. Introduction +=============== + +kmemtrace helps kernel developers figure out two things: +1) how different allocators (SLAB, SLUB etc.) perform +2) how kernel code allocates memory and how much + +To do this, we trace every allocation and export information to the userspace +through the relay interface. We export things such as the number of requested +bytes, the number of bytes actually allocated (i.e. including internal +fragmentation), whether this is a slab allocation or a plain kmalloc() and so +on. + +The actual analysis is performed by a userspace tool (see section III for +details on where to get it from). It logs the data exported by the kernel, +processes it and (as of writing this) can provide the following information: +- the total amount of memory allocated and fragmentation per call-site +- the amount of memory allocated and fragmentation per allocation +- total memory allocated and fragmentation in the collected dataset +- number of cross-CPU allocation and frees (makes sense in NUMA environments) + +Moreover, it can potentially find inconsistent and erroneous behavior in +kernel code, such as using slab free functions on kmalloc'ed memory or +allocating less memory than requested (but not truly failed allocations). + +kmemtrace also makes provisions for tracing on some arch and analysing the +data on another. + +II. Design and goals +==================== + +kmemtrace was designed to handle rather large amounts of data. 
Thus, it uses +the relay interface to export whatever is logged to userspace, which then +stores it. Analysis and reporting is done asynchronously, that is, after the +data is collected and stored. By design, it allows one to log and analyse +on different machines and different arches. + +As of writing this, the ABI is not considered stable, though it might not +change much. However, no guarantees are made about compatibility yet. When +deemed stable, the ABI should still allow easy extension while maintaining +backward compatibility. This is described further in Documentation/ABI. + +Summary of design goals: + - allow logging and analysis to be done across different machines + - be fast and anticipate usage in high-load environments (*) + - be reasonably extensible + - make it possible for GNU/Linux distributions to have kmemtrace + included in their repositories + +(*) - one of the reasons Pekka Enberg's original userspace data analysis + tool's code was rewritten from Perl to C (although this is more than a + simple conversion) + + +III. Quick usage guide +====================== + +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable +CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED). + +2) Get the userspace tool and build it: +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository +$ cd kmemtrace-user/ +$ ./autogen.sh +$ ./configure +$ make + +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the +'single' runlevel (so that relay buffers don't fill up easily), and run +kmemtrace: +# '$' does not mean user, but root here. +$ mount -t debugfs none /sys/kernel/debug +$ mount -t proc none /proc +$ cd path/to/kmemtrace-user/ +$ ./kmemtraced +Wait a bit, then stop it with CTRL+C. +$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't + # overrun, should + # be zero. +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to + check its correctness] +$ ./kmemtrace-report + +Now you should have a nice and short summary of how the allocator performs. + +IV. FAQ and known issues +======================== + +Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix +this? Should I worry? +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how +large the number is. You can fix it by supplying a higher +'kmemtrace.subbufs=N' kernel parameter. +--- + +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? +A: This is a bug and should be reported. It can occur for a variety of +reasons: + - possible bugs in relay code + - possible misuse of relay by kmemtrace + - timestamps being collected unorderly +Or you may fix it yourself and send us a patch. +--- + +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? +A: This is a known issue and I'm working on it. These might be true errors +in kernel code, which may have inconsistent behavior (e.g. allocating memory +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed +out this behavior may work with SLAB, but may fail with other allocators. + +It may also be due to lack of tracing in some unusual allocator functions. + +We don't want bug reports regarding this issue yet. +--- + +V. 
See also +=========== + +Documentation/kernel-parameters.txt +Documentation/ABI/testing/debugfs-kmemtrace + diff --git a/MAINTAINERS b/MAINTAINERS index 56a2f67..e967bc2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2425,6 +2425,12 @@ M: jason.wessel@windriver.com L: kgdb-bugreport@lists.sourceforge.net S: Maintained +KMEMTRACE +P: Eduard - Gabriel Munteanu +M: eduard.munteanu@linux360.ro +L: linux-kernel@vger.kernel.org +S: Maintained + KPROBES P: Ananth N Mavinakayanahalli M: ananth@in.ibm.com diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h new file mode 100644 index 0000000..48746ee --- /dev/null +++ b/include/linux/kmemtrace.h @@ -0,0 +1,110 @@ +/* + * Copyright (C) 2008 Eduard - Gabriel Munteanu + * + * This file is released under GPL version 2. + */ + +#ifndef _LINUX_KMEMTRACE_H +#define _LINUX_KMEMTRACE_H + +#ifdef __KERNEL__ + +#include <linux/types.h> +#include <linux/marker.h> + +/* ABI definition starts here. */ + +#define KMEMTRACE_ABI_VERSION 1 + +enum kmemtrace_event_id { + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ + KMEMTRACE_EVENT_ALLOC, + KMEMTRACE_EVENT_FREE, +}; + +enum kmemtrace_type_id { + KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() / kfree(). */ + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ +}; + +struct kmemtrace_event { + u8 event_id; /* Allocate or free? */ + u8 type_id; /* Kind of allocation/free. */ + u16 event_size; /* Size of event */ + s32 node; /* Target CPU. */ + u64 call_site; /* Caller address. */ + u64 ptr; /* Pointer to allocation. */ + u64 bytes_req; /* Number of bytes requested. */ + u64 bytes_alloc; /* Number of bytes allocated. */ + u64 gfp_flags; /* Requested flags. */ + s64 timestamp; /* When the operation occured in ns. */ +} __attribute__ ((__packed__)); + +/* End of ABI definition. 
*/ + +#ifdef CONFIG_KMEMTRACE + +extern void kmemtrace_init(void); + +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + gfp_t gfp_flags, + int node) +{ + trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu " + "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d", + type_id, call_site, (unsigned long) ptr, + bytes_req, bytes_alloc, (unsigned long) gfp_flags, node); +} + +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr) +{ + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", + type_id, call_site, (unsigned long) ptr); +} + +#else /* CONFIG_KMEMTRACE */ + +static inline void kmemtrace_init(void) +{ +} + +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + gfp_t gfp_flags, + int node) +{ +} + +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr) +{ +} + +#endif /* CONFIG_KMEMTRACE */ + +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id, + unsigned long call_site, + const void *ptr, + size_t bytes_req, + size_t bytes_alloc, + gfp_t gfp_flags) +{ + kmemtrace_mark_alloc_node(type_id, call_site, ptr, + bytes_req, bytes_alloc, gfp_flags, -1); +} + +#endif /* __KERNEL__ */ + +#endif /* _LINUX_KMEMTRACE_H */ + diff --git a/init/main.c b/init/main.c index 057f364..c00659c 100644 --- a/init/main.c +++ b/init/main.c @@ -66,6 +66,7 @@ #include <asm/setup.h> #include <asm/sections.h> #include <asm/cacheflush.h> +#include <linux/kmemtrace.h> #ifdef CONFIG_X86_LOCAL_APIC #include <asm/smp.h> @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void) enable_debug_pagealloc(); cpu_hotplug_init(); kmem_cache_init(); + kmemtrace_init(); debug_objects_mem_init(); idr_init_cache(); setup_per_cpu_pageset(); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index d2099f4..0ade2ae 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -674,6 +674,34 @@ config FIREWIRE_OHCI_REMOTE_DMA If unsure, say N. +config KMEMTRACE + bool "Kernel memory tracer (kmemtrace)" + depends on RELAY && DEBUG_FS && MARKERS + help + kmemtrace provides tracing for slab allocator functions, such as + kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected + data is then fed to the userspace application in order to analyse + allocation hotspots, internal fragmentation and so on, making it + possible to see how well an allocator performs, as well as debug + and profile kernel code. + + This requires an userspace application to use. See + Documentation/vm/kmemtrace.txt for more information. + + Saying Y will make the kernel somewhat larger and slower. However, + if you disable kmemtrace at run-time or boot-time, the performance + impact is minimal (depending on the arch the kernel is built for). + + If unsure, say N. + +config KMEMTRACE_DEFAULT_ENABLED + bool "Enabled by default at boot" + depends on KMEMTRACE + help + Say Y here to enable kmemtrace at boot-time by default. Whatever + the choice, the behavior can be overridden by a kernel parameter, + as described in documentation. 
+ source "samples/Kconfig" source "lib/Kconfig.kgdb" diff --git a/mm/Makefile b/mm/Makefile index 18c143b..d88a3bc 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o - +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c new file mode 100644 index 0000000..4b33ace --- /dev/null +++ b/mm/kmemtrace.c @@ -0,0 +1,244 @@ +/* + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu + * + * This file is released under GPL version 2. + */ + +#include <linux/string.h> +#include <linux/debugfs.h> +#include <linux/relay.h> +#include <linux/module.h> +#include <linux/marker.h> +#include <linux/gfp.h> +#include <linux/kmemtrace.h> + +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) +#define KMEMTRACE_N_SUBBUFS 20 + +static struct rchan *kmemtrace_chan; +static u32 kmemtrace_buf_overruns; + +static unsigned int kmemtrace_n_subbufs; +#ifdef CONFIG_KMEMTRACE_DEFAULT_ENABLED +static unsigned int kmemtrace_enabled = 1; +#else +static unsigned int kmemtrace_enabled = 0; +#endif + +static u32 kmemtrace_abi_version __read_mostly = KMEMTRACE_ABI_VERSION; + +static inline void kmemtrace_log_event(struct kmemtrace_event *event) +{ + relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event)); +} + +static void kmemtrace_probe_alloc(void *probe_data, void *call_data, + const char *format, va_list *args) +{ + unsigned long flags; + struct kmemtrace_event ev; + + /* + * Don't convert this to use structure initializers, + * C99 does not guarantee the rvalues evaluation order. + */ + ev.event_id = KMEMTRACE_EVENT_ALLOC; + ev.type_id = va_arg(*args, int); + ev.event_size = sizeof(struct kmemtrace_event); + ev.call_site = va_arg(*args, unsigned long); + ev.ptr = va_arg(*args, unsigned long); + /* Don't trace ignored allocations. */ + if (!ev.ptr) + return; + ev.bytes_req = va_arg(*args, unsigned long); + ev.bytes_alloc = va_arg(*args, unsigned long); + /* ev.timestamp set below, to preserve event ordering. */ + ev.gfp_flags = va_arg(*args, unsigned long); + ev.node = va_arg(*args, int); + + /* We disable IRQs for timestamps to match event ordering. */ + local_irq_save(flags); + ev.timestamp = ktime_to_ns(ktime_get()); + kmemtrace_log_event(&ev); + local_irq_restore(flags); +} + +static void kmemtrace_probe_free(void *probe_data, void *call_data, + const char *format, va_list *args) +{ + unsigned long flags; + struct kmemtrace_event ev; + + /* + * Don't convert this to use structure initializers, + * C99 does not guarantee the rvalues evaluation order. + */ + ev.event_id = KMEMTRACE_EVENT_FREE; + ev.type_id = va_arg(*args, int); + ev.event_size = sizeof(struct kmemtrace_event); + ev.call_site = va_arg(*args, unsigned long); + ev.ptr = va_arg(*args, unsigned long); + /* Don't trace ignored allocations. */ + if (!ev.ptr) + return; + /* ev.timestamp set below, to preserve event ordering. */ + + /* We disable IRQs for timestamps to match event ordering. 
*/ + local_irq_save(flags); + ev.timestamp = ktime_to_ns(ktime_get()); + kmemtrace_log_event(&ev); + local_irq_restore(flags); +} + +static struct dentry * +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, + int mode, struct rchan_buf *buf, int *is_global) +{ + return debugfs_create_file(filename, mode, parent, buf, + &relay_file_operations); +} + +static int kmemtrace_remove_buf_file(struct dentry *dentry) +{ + debugfs_remove(dentry); + + return 0; +} + +static int kmemtrace_count_overruns(struct rchan_buf *buf, + void *subbuf, void *prev_subbuf, + size_t prev_padding) +{ + if (relay_buf_full(buf)) { + /* + * We know it's not SMP-safe, but neither + * debugfs_create_u32() is. + */ + kmemtrace_buf_overruns++; + return 0; + } + + return 1; +} + +static struct rchan_callbacks relay_callbacks = { + .create_buf_file = kmemtrace_create_buf_file, + .remove_buf_file = kmemtrace_remove_buf_file, + .subbuf_start = kmemtrace_count_overruns, +}; + +static struct dentry *kmemtrace_dir; +static struct dentry *kmemtrace_overruns_dentry; +static struct dentry *kmemtrace_abi_version_dentry; + +static void kmemtrace_cleanup(void) +{ + marker_probe_unregister("kmemtrace_alloc", kmemtrace_probe_alloc, NULL); + marker_probe_unregister("kmemtrace_free", kmemtrace_probe_free, NULL); + + if (kmemtrace_abi_version_dentry) + debugfs_remove(kmemtrace_abi_version_dentry); + if (kmemtrace_overruns_dentry) + debugfs_remove(kmemtrace_overruns_dentry); + + relay_close(kmemtrace_chan); + kmemtrace_chan = NULL; + + if (kmemtrace_dir) + debugfs_remove(kmemtrace_dir); +} + +static int __init kmemtrace_setup_late(void) +{ + if (!kmemtrace_chan) + goto failed; + + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); + if (!kmemtrace_dir) + goto cleanup; + + kmemtrace_abi_version_dentry = + debugfs_create_u32("abi_version", S_IRUSR, + kmemtrace_dir, &kmemtrace_abi_version); + kmemtrace_overruns_dentry = + debugfs_create_u32("total_overruns", S_IRUSR, + kmemtrace_dir, &kmemtrace_buf_overruns); + if (!kmemtrace_overruns_dentry || !kmemtrace_abi_version_dentry) + goto cleanup; + + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) + goto cleanup; + + printk(KERN_INFO "kmemtrace: fully up.\n"); + + return 0; + +cleanup: + kmemtrace_cleanup(); +failed: + return 1; +} +late_initcall(kmemtrace_setup_late); + +static int __init kmemtrace_set_boot_enabled(char *str) +{ + if (!str) + return -EINVAL; + + if (!strcmp(str, "yes")) + kmemtrace_enabled = 1; + else if (!strcmp(str, "no")) + kmemtrace_enabled = 0; + else + return -EINVAL; + + return 0; +} +early_param("kmemtrace.enable", kmemtrace_set_boot_enabled); + +static int __init kmemtrace_set_subbufs(char *str) +{ + get_option(&str, &kmemtrace_n_subbufs); + return 0; +} +early_param("kmemtrace.subbufs", kmemtrace_set_subbufs); + +void kmemtrace_init(void) +{ + int err; + + if (!kmemtrace_enabled) + return; + + if (!kmemtrace_n_subbufs) + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; + + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, + kmemtrace_n_subbufs, &relay_callbacks, + NULL); + if (!kmemtrace_chan) { + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); + return; + } + + err = marker_probe_register("kmemtrace_alloc", "type_id %d " + "call_site %lu ptr %lu " + "bytes_req %lu bytes_alloc %lu " + "gfp_flags %lu node %d", + kmemtrace_probe_alloc, NULL); + if (err) + goto probe_fail; + err = marker_probe_register("kmemtrace_free", "type_id %d " + "call_site %lu ptr %lu", + kmemtrace_probe_free, NULL); + if (err) + goto 
probe_fail; + + printk(KERN_INFO "kmemtrace: early init successful.\n"); + return; + +probe_fail: + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); + kmemtrace_cleanup(); +} + -- 1.5.6.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
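For completeness, the per-CPU reading discipline described in the debugfs-kmemtrace ABI file above (pin to the CPU, poll() with an infinite timeout before each read()) would look roughly like the following userspace sketch; it is illustrative only, not actual kmemtraced code, and assumes the Linux-specific sched_setaffinity() call:

#define _GNU_SOURCE
#include <fcntl.h>
#include <poll.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Pin to 'cpu' and drain its relay file, e.g.
 * drain_cpu(0, "/sys/kernel/debug/kmemtrace/cpu0", out). */
static int drain_cpu(int cpu, const char *path, FILE *out)
{
	char buf[4096];
	cpu_set_t mask;
	struct pollfd pfd;
	ssize_t n;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) < 0)
		return -1;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	pfd.events = POLLIN;

	for (;;) {
		/* Infinite timeout before every read(), per the ABI doc. */
		if (poll(&pfd, 1, -1) < 0)
			break;
		n = read(pfd.fd, buf, sizeof(buf));
		if (n <= 0)
			break;
		fwrite(buf, 1, n, out);		/* e.g. append to cpuN.out */
	}
	close(pfd.fd);
	return 0;
}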
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-22 18:36 ` [RFC PATCH 1/4] kmemtrace: Core implementation Eduard - Gabriel Munteanu @ 2008-07-28 9:24 ` Pekka Enberg 2008-07-28 16:29 ` Mathieu Desnoyers 0 siblings, 1 reply; 30+ messages in thread From: Pekka Enberg @ 2008-07-28 9:24 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: cl, linux-mm, linux-kernel, rdunlap, mpm, mathieu.desnoyers Hi, [I'm cc'ing Mathieu if he wants to comment on this.] On Tue, 2008-07-22 at 21:36 +0300, Eduard - Gabriel Munteanu wrote: > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > to the userspace application in order to analyse allocation hotspots, > internal fragmentation and so on, making it possible to see how well an > allocator performs, as well as debug and profile kernel code. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > --- > Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++ > Documentation/kernel-parameters.txt | 10 + > Documentation/vm/kmemtrace.txt | 126 ++++++++++++++ > MAINTAINERS | 6 + > include/linux/kmemtrace.h | 110 ++++++++++++ > init/main.c | 2 + > lib/Kconfig.debug | 28 +++ > mm/Makefile | 2 +- > mm/kmemtrace.c | 244 +++++++++++++++++++++++++++ > 9 files changed, 585 insertions(+), 1 deletions(-) > create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace > create mode 100644 Documentation/vm/kmemtrace.txt > create mode 100644 include/linux/kmemtrace.h > create mode 100644 mm/kmemtrace.c > > diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace > new file mode 100644 > index 0000000..466c2bb > --- /dev/null > +++ b/Documentation/ABI/testing/debugfs-kmemtrace > @@ -0,0 +1,58 @@ > +What: /sys/kernel/debug/kmemtrace/ > +Date: July 2008 > +Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > +Description: > + > +In kmemtrace-enabled kernels, the following files are created: > + > +/sys/kernel/debug/kmemtrace/ > + cpu<n> (0400) Per-CPU tracing data, see below. (binary) > + total_overruns (0400) Total number of bytes which were dropped from > + cpu<n> files because of full buffer condition, > + non-binary. (text) > + abi_version (0400) Kernel's kmemtrace ABI version. (text) > + > +Each per-CPU file should be read according to the relay interface. That is, > +the reader should set affinity to that specific CPU and, as currently done by > +the userspace application (though there are other methods), use poll() with > +an infinite timeout before every read(). Otherwise, erroneous data may be > +read. The binary data has the following _core_ format: > + Event id (1 byte) Unsigned integer, one of: > + 0 - erroneous event, this is illegal/invalid data and must > + not occur (KMEMTRACE_EVENT_NULL) > + 1 - represents an allocation (KMEMTRACE_EVENT_ALLOC) > + 2 - represents a freeing of previously allocated memory > + (KMEMTRACE_EVENT_FREE) > + Type id (1 byte) Unsigned integer, one of: > + 0 - this is a kmalloc() / kfree() > + 1 - this is a kmem_cache_alloc() / kmem_cache_free() > + 2 - this is a __get_free_pages() et al. > + Event size (2 bytes) Unsigned integer representing the > + size of this event. Used to extend > + kmemtrace. Discard the bytes you > + don't know about. > + Target CPU (4 bytes) Signed integer, valid for event id 1. 
> + If equal to -1, target CPU is the same > + as origin CPU, but the reverse might > + not be true. > + Caller address (8 bytes) Return address to the caller. > + Pointer to mem (8 bytes) Pointer to allocated memory, must not > + be NULL. > + Requested bytes (8 bytes) Total number of requested bytes, > + unsigned, must not be zero. > + Allocated bytes (8 bytes) Total number of actually allocated > + bytes, unsigned, must not be lower > + than requested bytes. > + Requested flags (8 bytes) GFP flags supplied by the caller. > + Timestamp (8 bytes) Signed integer representing timestamp. > + > +The data is made available in the same endianness the machine has. > + > +Other event ids and type ids may be defined and added. Other fields may be > +added by increasing event size. Every modification to the ABI, including > +new id definitions, are followed by bumping the ABI version by one. > + > + > +Users: > + kmemtrace-user - git://repo.or.cz/kmemtrace-user.git > + > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > index b52f47d..446a257 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -49,6 +49,7 @@ parameter is applicable: > ISAPNP ISA PnP code is enabled. > ISDN Appropriate ISDN support is enabled. > JOY Appropriate joystick support is enabled. > + KMEMTRACE kmemtrace is enabled. > LIBATA Libata driver is enabled > LP Printer support is enabled. > LOOP Loopback device support is enabled. > @@ -941,6 +942,15 @@ and is between 256 and 4096 characters. It is defined in the file > use the HighMem zone if it exists, and the Normal > zone if it does not. > > + kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no } > + Controls whether kmemtrace is enabled > + at boot-time. > + > + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of > + subbufs kmemtrace's relay channel has. Set this > + higher than default (KMEMTRACE_N_SUBBUFS in code) if > + you experience buffer overruns. > + > movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter > is similar to kernelcore except it specifies the > amount of memory used for migratable allocations. > diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt > new file mode 100644 > index 0000000..75360b1 > --- /dev/null > +++ b/Documentation/vm/kmemtrace.txt > @@ -0,0 +1,126 @@ > + kmemtrace - Kernel Memory Tracer > + > + by Eduard - Gabriel Munteanu > + <eduard.munteanu@linux360.ro> > + > +I. Introduction > +=============== > + > +kmemtrace helps kernel developers figure out two things: > +1) how different allocators (SLAB, SLUB etc.) perform > +2) how kernel code allocates memory and how much > + > +To do this, we trace every allocation and export information to the userspace > +through the relay interface. We export things such as the number of requested > +bytes, the number of bytes actually allocated (i.e. including internal > +fragmentation), whether this is a slab allocation or a plain kmalloc() and so > +on. > + > +The actual analysis is performed by a userspace tool (see section III for > +details on where to get it from). 
It logs the data exported by the kernel, > +processes it and (as of writing this) can provide the following information: > +- the total amount of memory allocated and fragmentation per call-site > +- the amount of memory allocated and fragmentation per allocation > +- total memory allocated and fragmentation in the collected dataset > +- number of cross-CPU allocation and frees (makes sense in NUMA environments) > + > +Moreover, it can potentially find inconsistent and erroneous behavior in > +kernel code, such as using slab free functions on kmalloc'ed memory or > +allocating less memory than requested (but not truly failed allocations). > + > +kmemtrace also makes provisions for tracing on some arch and analysing the > +data on another. > + > +II. Design and goals > +==================== > + > +kmemtrace was designed to handle rather large amounts of data. Thus, it uses > +the relay interface to export whatever is logged to userspace, which then > +stores it. Analysis and reporting is done asynchronously, that is, after the > +data is collected and stored. By design, it allows one to log and analyse > +on different machines and different arches. > + > +As of writing this, the ABI is not considered stable, though it might not > +change much. However, no guarantees are made about compatibility yet. When > +deemed stable, the ABI should still allow easy extension while maintaining > +backward compatibility. This is described further in Documentation/ABI. > + > +Summary of design goals: > + - allow logging and analysis to be done across different machines > + - be fast and anticipate usage in high-load environments (*) > + - be reasonably extensible > + - make it possible for GNU/Linux distributions to have kmemtrace > + included in their repositories > + > +(*) - one of the reasons Pekka Enberg's original userspace data analysis > + tool's code was rewritten from Perl to C (although this is more than a > + simple conversion) > + > + > +III. Quick usage guide > +====================== > + > +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable > +CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED). > + > +2) Get the userspace tool and build it: > +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository > +$ cd kmemtrace-user/ > +$ ./autogen.sh > +$ ./configure > +$ make > + > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the > +'single' runlevel (so that relay buffers don't fill up easily), and run > +kmemtrace: > +# '$' does not mean user, but root here. > +$ mount -t debugfs none /sys/kernel/debug > +$ mount -t proc none /proc > +$ cd path/to/kmemtrace-user/ > +$ ./kmemtraced > +Wait a bit, then stop it with CTRL+C. > +$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't > + # overrun, should > + # be zero. > +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to > + check its correctness] > +$ ./kmemtrace-report > + > +Now you should have a nice and short summary of how the allocator performs. > + > +IV. FAQ and known issues > +======================== > + > +Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix > +this? Should I worry? > +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how > +large the number is. You can fix it by supplying a higher > +'kmemtrace.subbufs=N' kernel parameter. > +--- > + > +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? > +A: This is a bug and should be reported. 
It can occur for a variety of > +reasons: > + - possible bugs in relay code > + - possible misuse of relay by kmemtrace > + - timestamps being collected unorderly > +Or you may fix it yourself and send us a patch. > +--- > + > +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? > +A: This is a known issue and I'm working on it. These might be true errors > +in kernel code, which may have inconsistent behavior (e.g. allocating memory > +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed > +out this behavior may work with SLAB, but may fail with other allocators. > + > +It may also be due to lack of tracing in some unusual allocator functions. > + > +We don't want bug reports regarding this issue yet. > +--- > + > +V. See also > +=========== > + > +Documentation/kernel-parameters.txt > +Documentation/ABI/testing/debugfs-kmemtrace > + > diff --git a/MAINTAINERS b/MAINTAINERS > index 56a2f67..e967bc2 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -2425,6 +2425,12 @@ M: jason.wessel@windriver.com > L: kgdb-bugreport@lists.sourceforge.net > S: Maintained > > +KMEMTRACE > +P: Eduard - Gabriel Munteanu > +M: eduard.munteanu@linux360.ro > +L: linux-kernel@vger.kernel.org > +S: Maintained > + > KPROBES > P: Ananth N Mavinakayanahalli > M: ananth@in.ibm.com > diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h > new file mode 100644 > index 0000000..48746ee > --- /dev/null > +++ b/include/linux/kmemtrace.h > @@ -0,0 +1,110 @@ > +/* > + * Copyright (C) 2008 Eduard - Gabriel Munteanu > + * > + * This file is released under GPL version 2. > + */ > + > +#ifndef _LINUX_KMEMTRACE_H > +#define _LINUX_KMEMTRACE_H > + > +#ifdef __KERNEL__ > + > +#include <linux/types.h> > +#include <linux/marker.h> > + > +/* ABI definition starts here. */ > + > +#define KMEMTRACE_ABI_VERSION 1 > + > +enum kmemtrace_event_id { > + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ > + KMEMTRACE_EVENT_ALLOC, > + KMEMTRACE_EVENT_FREE, > +}; > + > +enum kmemtrace_type_id { > + KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() / kfree(). */ > + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ > + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ > +}; > + > +struct kmemtrace_event { > + u8 event_id; /* Allocate or free? */ > + u8 type_id; /* Kind of allocation/free. */ > + u16 event_size; /* Size of event */ > + s32 node; /* Target CPU. */ > + u64 call_site; /* Caller address. */ > + u64 ptr; /* Pointer to allocation. */ > + u64 bytes_req; /* Number of bytes requested. */ > + u64 bytes_alloc; /* Number of bytes allocated. */ > + u64 gfp_flags; /* Requested flags. */ > + s64 timestamp; /* When the operation occured in ns. */ > +} __attribute__ ((__packed__)); > + > +/* End of ABI definition. 
*/ > + > +#ifdef CONFIG_KMEMTRACE > + > +extern void kmemtrace_init(void); > + > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr, > + size_t bytes_req, > + size_t bytes_alloc, > + gfp_t gfp_flags, > + int node) > +{ > + trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu " > + "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d", > + type_id, call_site, (unsigned long) ptr, > + bytes_req, bytes_alloc, (unsigned long) gfp_flags, node); > +} > + > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr) > +{ > + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", > + type_id, call_site, (unsigned long) ptr); > +} > + > +#else /* CONFIG_KMEMTRACE */ > + > +static inline void kmemtrace_init(void) > +{ > +} > + > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr, > + size_t bytes_req, > + size_t bytes_alloc, > + gfp_t gfp_flags, > + int node) > +{ > +} > + > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr) > +{ > +} > + > +#endif /* CONFIG_KMEMTRACE */ > + > +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id, > + unsigned long call_site, > + const void *ptr, > + size_t bytes_req, > + size_t bytes_alloc, > + gfp_t gfp_flags) > +{ > + kmemtrace_mark_alloc_node(type_id, call_site, ptr, > + bytes_req, bytes_alloc, gfp_flags, -1); > +} > + > +#endif /* __KERNEL__ */ > + > +#endif /* _LINUX_KMEMTRACE_H */ > + > diff --git a/init/main.c b/init/main.c > index 057f364..c00659c 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -66,6 +66,7 @@ > #include <asm/setup.h> > #include <asm/sections.h> > #include <asm/cacheflush.h> > +#include <linux/kmemtrace.h> > > #ifdef CONFIG_X86_LOCAL_APIC > #include <asm/smp.h> > @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void) > enable_debug_pagealloc(); > cpu_hotplug_init(); > kmem_cache_init(); > + kmemtrace_init(); > debug_objects_mem_init(); > idr_init_cache(); > setup_per_cpu_pageset(); > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index d2099f4..0ade2ae 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -674,6 +674,34 @@ config FIREWIRE_OHCI_REMOTE_DMA > > If unsure, say N. > > +config KMEMTRACE > + bool "Kernel memory tracer (kmemtrace)" > + depends on RELAY && DEBUG_FS && MARKERS > + help > + kmemtrace provides tracing for slab allocator functions, such as > + kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected > + data is then fed to the userspace application in order to analyse > + allocation hotspots, internal fragmentation and so on, making it > + possible to see how well an allocator performs, as well as debug > + and profile kernel code. > + > + This requires an userspace application to use. See > + Documentation/vm/kmemtrace.txt for more information. > + > + Saying Y will make the kernel somewhat larger and slower. However, > + if you disable kmemtrace at run-time or boot-time, the performance > + impact is minimal (depending on the arch the kernel is built for). > + > + If unsure, say N. > + > +config KMEMTRACE_DEFAULT_ENABLED > + bool "Enabled by default at boot" > + depends on KMEMTRACE > + help > + Say Y here to enable kmemtrace at boot-time by default. Whatever > + the choice, the behavior can be overridden by a kernel parameter, > + as described in documentation. 
> + > source "samples/Kconfig" > > source "lib/Kconfig.kgdb" > diff --git a/mm/Makefile b/mm/Makefile > index 18c143b..d88a3bc 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o > obj-$(CONFIG_SMP) += allocpercpu.o > obj-$(CONFIG_QUICKLIST) += quicklist.o > obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o > - > +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o > diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c > new file mode 100644 > index 0000000..4b33ace > --- /dev/null > +++ b/mm/kmemtrace.c > @@ -0,0 +1,244 @@ > +/* > + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu > + * > + * This file is released under GPL version 2. > + */ > + > +#include <linux/string.h> > +#include <linux/debugfs.h> > +#include <linux/relay.h> > +#include <linux/module.h> > +#include <linux/marker.h> > +#include <linux/gfp.h> > +#include <linux/kmemtrace.h> > + > +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) > +#define KMEMTRACE_N_SUBBUFS 20 > + > +static struct rchan *kmemtrace_chan; > +static u32 kmemtrace_buf_overruns; > + > +static unsigned int kmemtrace_n_subbufs; > +#ifdef CONFIG_KMEMTRACE_DEFAULT_ENABLED > +static unsigned int kmemtrace_enabled = 1; > +#else > +static unsigned int kmemtrace_enabled = 0; > +#endif > + > +static u32 kmemtrace_abi_version __read_mostly = KMEMTRACE_ABI_VERSION; > + > +static inline void kmemtrace_log_event(struct kmemtrace_event *event) > +{ > + relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event)); > +} > + > +static void kmemtrace_probe_alloc(void *probe_data, void *call_data, > + const char *format, va_list *args) > +{ > + unsigned long flags; > + struct kmemtrace_event ev; > + > + /* > + * Don't convert this to use structure initializers, > + * C99 does not guarantee the rvalues evaluation order. > + */ > + ev.event_id = KMEMTRACE_EVENT_ALLOC; > + ev.type_id = va_arg(*args, int); > + ev.event_size = sizeof(struct kmemtrace_event); > + ev.call_site = va_arg(*args, unsigned long); > + ev.ptr = va_arg(*args, unsigned long); > + /* Don't trace ignored allocations. */ > + if (!ev.ptr) > + return; > + ev.bytes_req = va_arg(*args, unsigned long); > + ev.bytes_alloc = va_arg(*args, unsigned long); > + /* ev.timestamp set below, to preserve event ordering. */ > + ev.gfp_flags = va_arg(*args, unsigned long); > + ev.node = va_arg(*args, int); > + > + /* We disable IRQs for timestamps to match event ordering. */ > + local_irq_save(flags); > + ev.timestamp = ktime_to_ns(ktime_get()); > + kmemtrace_log_event(&ev); > + local_irq_restore(flags); > +} > + > +static void kmemtrace_probe_free(void *probe_data, void *call_data, > + const char *format, va_list *args) > +{ > + unsigned long flags; > + struct kmemtrace_event ev; > + > + /* > + * Don't convert this to use structure initializers, > + * C99 does not guarantee the rvalues evaluation order. > + */ > + ev.event_id = KMEMTRACE_EVENT_FREE; > + ev.type_id = va_arg(*args, int); > + ev.event_size = sizeof(struct kmemtrace_event); > + ev.call_site = va_arg(*args, unsigned long); > + ev.ptr = va_arg(*args, unsigned long); > + /* Don't trace ignored allocations. */ > + if (!ev.ptr) > + return; > + /* ev.timestamp set below, to preserve event ordering. */ > + > + /* We disable IRQs for timestamps to match event ordering. 
*/ > + local_irq_save(flags); > + ev.timestamp = ktime_to_ns(ktime_get()); > + kmemtrace_log_event(&ev); > + local_irq_restore(flags); > +} > + > +static struct dentry * > +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, > + int mode, struct rchan_buf *buf, int *is_global) > +{ > + return debugfs_create_file(filename, mode, parent, buf, > + &relay_file_operations); > +} > + > +static int kmemtrace_remove_buf_file(struct dentry *dentry) > +{ > + debugfs_remove(dentry); > + > + return 0; > +} > + > +static int kmemtrace_count_overruns(struct rchan_buf *buf, > + void *subbuf, void *prev_subbuf, > + size_t prev_padding) > +{ > + if (relay_buf_full(buf)) { > + /* > + * We know it's not SMP-safe, but neither > + * debugfs_create_u32() is. > + */ > + kmemtrace_buf_overruns++; > + return 0; > + } > + > + return 1; > +} > + > +static struct rchan_callbacks relay_callbacks = { > + .create_buf_file = kmemtrace_create_buf_file, > + .remove_buf_file = kmemtrace_remove_buf_file, > + .subbuf_start = kmemtrace_count_overruns, > +}; > + > +static struct dentry *kmemtrace_dir; > +static struct dentry *kmemtrace_overruns_dentry; > +static struct dentry *kmemtrace_abi_version_dentry; > + > +static void kmemtrace_cleanup(void) > +{ > + marker_probe_unregister("kmemtrace_alloc", kmemtrace_probe_alloc, NULL); > + marker_probe_unregister("kmemtrace_free", kmemtrace_probe_free, NULL); > + > + if (kmemtrace_abi_version_dentry) > + debugfs_remove(kmemtrace_abi_version_dentry); > + if (kmemtrace_overruns_dentry) > + debugfs_remove(kmemtrace_overruns_dentry); > + > + relay_close(kmemtrace_chan); > + kmemtrace_chan = NULL; > + > + if (kmemtrace_dir) > + debugfs_remove(kmemtrace_dir); > +} > + > +static int __init kmemtrace_setup_late(void) > +{ > + if (!kmemtrace_chan) > + goto failed; > + > + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); > + if (!kmemtrace_dir) > + goto cleanup; > + > + kmemtrace_abi_version_dentry = > + debugfs_create_u32("abi_version", S_IRUSR, > + kmemtrace_dir, &kmemtrace_abi_version); > + kmemtrace_overruns_dentry = > + debugfs_create_u32("total_overruns", S_IRUSR, > + kmemtrace_dir, &kmemtrace_buf_overruns); > + if (!kmemtrace_overruns_dentry || !kmemtrace_abi_version_dentry) > + goto cleanup; > + > + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) > + goto cleanup; > + > + printk(KERN_INFO "kmemtrace: fully up.\n"); > + > + return 0; > + > +cleanup: > + kmemtrace_cleanup(); > +failed: > + return 1; > +} > +late_initcall(kmemtrace_setup_late); > + > +static int __init kmemtrace_set_boot_enabled(char *str) > +{ > + if (!str) > + return -EINVAL; > + > + if (!strcmp(str, "yes")) > + kmemtrace_enabled = 1; > + else if (!strcmp(str, "no")) > + kmemtrace_enabled = 0; > + else > + return -EINVAL; > + > + return 0; > +} > +early_param("kmemtrace.enable", kmemtrace_set_boot_enabled); > + > +static int __init kmemtrace_set_subbufs(char *str) > +{ > + get_option(&str, &kmemtrace_n_subbufs); > + return 0; > +} > +early_param("kmemtrace.subbufs", kmemtrace_set_subbufs); > + > +void kmemtrace_init(void) > +{ > + int err; > + > + if (!kmemtrace_enabled) > + return; > + > + if (!kmemtrace_n_subbufs) > + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; > + > + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, > + kmemtrace_n_subbufs, &relay_callbacks, > + NULL); > + if (!kmemtrace_chan) { > + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); > + return; > + } > + > + err = marker_probe_register("kmemtrace_alloc", "type_id %d " > + "call_site 
%lu ptr %lu " > + "bytes_req %lu bytes_alloc %lu " > + "gfp_flags %lu node %d", > + kmemtrace_probe_alloc, NULL); > + if (err) > + goto probe_fail; > + err = marker_probe_register("kmemtrace_free", "type_id %d " > + "call_site %lu ptr %lu", > + kmemtrace_probe_free, NULL); > + if (err) > + goto probe_fail; > + > + printk(KERN_INFO "kmemtrace: early init successful.\n"); > + return; > + > +probe_fail: > + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); > + kmemtrace_cleanup(); > +} > + -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
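[For readers who want to poke at the raw per-CPU output without the kmemtrace-user tool, a minimal decoder for the packed event record defined above in include/linux/kmemtrace.h might look like the sketch below. It is only an illustration against ABI version 1: it assumes a native-endian dump already saved to a file, and the default file name "cpu0.out" is a guess based on the cpu[0-9]*.out files mentioned in the documentation, not something defined by the patch. Reading the live debugfs files directly would additionally need the CPU-affinity and poll() dance described in the ABI document.]

#include <stdio.h>
#include <stdint.h>

/* Mirrors struct kmemtrace_event from include/linux/kmemtrace.h (ABI v1). */
struct kmemtrace_event {
	uint8_t  event_id;	/* 1 = alloc, 2 = free */
	uint8_t  type_id;	/* 0 = kmalloc, 1 = kmem_cache, 2 = pages */
	uint16_t event_size;	/* lets readers skip unknown extensions */
	int32_t  node;		/* target CPU, -1 = same as origin */
	uint64_t call_site;
	uint64_t ptr;
	uint64_t bytes_req;
	uint64_t bytes_alloc;
	uint64_t gfp_flags;
	int64_t  timestamp;	/* nanoseconds */
} __attribute__ ((__packed__));

int main(int argc, char **argv)
{
	struct kmemtrace_event ev;
	FILE *f = fopen(argc > 1 ? argv[1] : "cpu0.out", "rb");

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fread(&ev, sizeof(ev), 1, f) == 1) {
		const char *kind = ev.event_id == 1 ? "alloc" :
				   ev.event_id == 2 ? "free"  : "null?";

		printf("%-5s type=%u site=0x%llx ptr=0x%llx req=%llu alloc=%llu\n",
		       kind, ev.type_id,
		       (unsigned long long)ev.call_site,
		       (unsigned long long)ev.ptr,
		       (unsigned long long)ev.bytes_req,
		       (unsigned long long)ev.bytes_alloc);
		/* Per the ABI text: "Discard the bytes you don't know about." */
		if (ev.event_size > sizeof(ev) &&
		    fseek(f, ev.event_size - sizeof(ev), SEEK_CUR))
			break;
	}
	fclose(f);
	return 0;
}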
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-28 9:24 ` Pekka Enberg @ 2008-07-28 16:29 ` Mathieu Desnoyers 2008-07-28 17:09 ` Matt Mackall 2008-07-28 17:35 ` Eduard - Gabriel Munteanu 0 siblings, 2 replies; 30+ messages in thread From: Mathieu Desnoyers @ 2008-07-28 16:29 UTC (permalink / raw) To: Pekka Enberg Cc: Eduard - Gabriel Munteanu, cl, linux-mm, linux-kernel, rdunlap, mpm, Steven Rostedt, Thomas Gleixner * Pekka Enberg (penberg@cs.helsinki.fi) wrote: > Hi, > > [I'm cc'ing Mathieu if he wants to comment on this.] > > On Tue, 2008-07-22 at 21:36 +0300, Eduard - Gabriel Munteanu wrote: > > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > > to the userspace application in order to analyse allocation hotspots, > > internal fragmentation and so on, making it possible to see how well an > > allocator performs, as well as debug and profile kernel code. > > Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> > > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > > --- > > Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++ > > Documentation/kernel-parameters.txt | 10 + > > Documentation/vm/kmemtrace.txt | 126 ++++++++++++++ > > MAINTAINERS | 6 + > > include/linux/kmemtrace.h | 110 ++++++++++++ > > init/main.c | 2 + > > lib/Kconfig.debug | 28 +++ > > mm/Makefile | 2 +- > > mm/kmemtrace.c | 244 +++++++++++++++++++++++++++ > > 9 files changed, 585 insertions(+), 1 deletions(-) > > create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace > > create mode 100644 Documentation/vm/kmemtrace.txt > > create mode 100644 include/linux/kmemtrace.h > > create mode 100644 mm/kmemtrace.c > > > > diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace > > new file mode 100644 > > index 0000000..466c2bb > > --- /dev/null > > +++ b/Documentation/ABI/testing/debugfs-kmemtrace Documentation should probably come in a separate patch. > > @@ -0,0 +1,58 @@ > > +What: /sys/kernel/debug/kmemtrace/ > > +Date: July 2008 > > +Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > > +Description: > > + > > +In kmemtrace-enabled kernels, the following files are created: > > + > > +/sys/kernel/debug/kmemtrace/ > > + cpu<n> (0400) Per-CPU tracing data, see below. (binary) > > + total_overruns (0400) Total number of bytes which were dropped from > > + cpu<n> files because of full buffer condition, > > + non-binary. (text) > > + abi_version (0400) Kernel's kmemtrace ABI version. (text) > > + > > +Each per-CPU file should be read according to the relay interface. That is, > > +the reader should set affinity to that specific CPU and, as currently done by > > +the userspace application (though there are other methods), use poll() with > > +an infinite timeout before every read(). Otherwise, erroneous data may be > > +read. The binary data has the following _core_ format: > > + Event id (1 byte) Unsigned integer, one of: > > + 0 - erroneous event, this is illegal/invalid data and must > > + not occur (KMEMTRACE_EVENT_NULL) Hmm ? why record an invalid event ?? I see it's not used in the code, is that actually used in some way because the memory is set to 0 ? 
> > + 1 - represents an allocation (KMEMTRACE_EVENT_ALLOC) > > + 2 - represents a freeing of previously allocated memory > > + (KMEMTRACE_EVENT_FREE) > > + Type id (1 byte) Unsigned integer, one of: > > + 0 - this is a kmalloc() / kfree() > > + 1 - this is a kmem_cache_alloc() / kmem_cache_free() > > + 2 - this is a __get_free_pages() et al. > > + Event size (2 bytes) Unsigned integer representing the > > + size of this event. Used to extend > > + kmemtrace. Discard the bytes you > > + don't know about. > > + Target CPU (4 bytes) Signed integer, valid for event id 1. > > + If equal to -1, target CPU is the same > > + as origin CPU, but the reverse might > > + not be true. If only valid for event ID 1 and only in NUMA case, please don't waste space in each event header and make that a event-specific field... ? > > + Caller address (8 bytes) Return address to the caller. Not true on 32 bits machines. You are wasting 4 bytes on those archs. > > + Pointer to mem (8 bytes) Pointer to allocated memory, must not > > + be NULL. Same here. > > + Requested bytes (8 bytes) Total number of requested bytes, > > + unsigned, must not be zero. Same here. > > + Allocated bytes (8 bytes) Total number of actually allocated > > + bytes, unsigned, must not be lower > > + than requested bytes. And here. > > + Requested flags (8 bytes) GFP flags supplied by the caller. 8 bytes for GFP flags ?? Whoah, that's a lot of one-hot bits ! :) I knew that some allocators were bloated, bit not that much. :) > > + Timestamp (8 bytes) Signed integer representing timestamp. > > + With a heartbeat, as lttng does, you can cut that to a 4 bytes field. > > +The data is made available in the same endianness the machine has. > > + Using a magic number in the trace header lets you deal with cross-endianness. Saving the type sizes in the trace header lets you deal with different int/long/pointer type sizes. > > +Other event ids and type ids may be defined and added. Other fields may be > > +added by increasing event size. Every modification to the ABI, including > > +new id definitions, are followed by bumping the ABI version by one. > > + I personally prefer a self-describing trace :) > > + > > +Users: > > + kmemtrace-user - git://repo.or.cz/kmemtrace-user.git > > + > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > > index b52f47d..446a257 100644 > > --- a/Documentation/kernel-parameters.txt > > +++ b/Documentation/kernel-parameters.txt > > @@ -49,6 +49,7 @@ parameter is applicable: > > ISAPNP ISA PnP code is enabled. > > ISDN Appropriate ISDN support is enabled. > > JOY Appropriate joystick support is enabled. > > + KMEMTRACE kmemtrace is enabled. > > LIBATA Libata driver is enabled > > LP Printer support is enabled. > > LOOP Loopback device support is enabled. > > @@ -941,6 +942,15 @@ and is between 256 and 4096 characters. It is defined in the file > > use the HighMem zone if it exists, and the Normal > > zone if it does not. > > > > + kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no } > > + Controls whether kmemtrace is enabled > > + at boot-time. > > + > > + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of > > + subbufs kmemtrace's relay channel has. Set this > > + higher than default (KMEMTRACE_N_SUBBUFS in code) if > > + you experience buffer overruns. > > + That kind of stuff would be nice to have in lttng. 
> > movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter > > is similar to kernelcore except it specifies the > > amount of memory used for migratable allocations. > > diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt > > new file mode 100644 > > index 0000000..75360b1 > > --- /dev/null > > +++ b/Documentation/vm/kmemtrace.txt > > @@ -0,0 +1,126 @@ > > + kmemtrace - Kernel Memory Tracer > > + > > + by Eduard - Gabriel Munteanu > > + <eduard.munteanu@linux360.ro> > > + > > +I. Introduction > > +=============== > > + > > +kmemtrace helps kernel developers figure out two things: > > +1) how different allocators (SLAB, SLUB etc.) perform > > +2) how kernel code allocates memory and how much > > + > > +To do this, we trace every allocation and export information to the userspace > > +through the relay interface. We export things such as the number of requested > > +bytes, the number of bytes actually allocated (i.e. including internal > > +fragmentation), whether this is a slab allocation or a plain kmalloc() and so > > +on. > > + > > +The actual analysis is performed by a userspace tool (see section III for > > +details on where to get it from). It logs the data exported by the kernel, > > +processes it and (as of writing this) can provide the following information: > > +- the total amount of memory allocated and fragmentation per call-site > > +- the amount of memory allocated and fragmentation per allocation > > +- total memory allocated and fragmentation in the collected dataset > > +- number of cross-CPU allocation and frees (makes sense in NUMA environments) > > + > > +Moreover, it can potentially find inconsistent and erroneous behavior in > > +kernel code, such as using slab free functions on kmalloc'ed memory or > > +allocating less memory than requested (but not truly failed allocations). > > + > > +kmemtrace also makes provisions for tracing on some arch and analysing the > > +data on another. > > + > > +II. Design and goals > > +==================== > > + > > +kmemtrace was designed to handle rather large amounts of data. Thus, it uses > > +the relay interface to export whatever is logged to userspace, which then > > +stores it. Analysis and reporting is done asynchronously, that is, after the > > +data is collected and stored. By design, it allows one to log and analyse > > +on different machines and different arches. > > + > > +As of writing this, the ABI is not considered stable, though it might not > > +change much. However, no guarantees are made about compatibility yet. When > > +deemed stable, the ABI should still allow easy extension while maintaining > > +backward compatibility. This is described further in Documentation/ABI. > > + > > +Summary of design goals: > > + - allow logging and analysis to be done across different machines Not currently true : cross-endianness/wastes space for 32 bits archs. > > + - be fast and anticipate usage in high-load environments (*) LTTng will be faster though : per-cpu atomic ops instead of interrupt disable makes the probe faster. > > + - be reasonably extensible Automatic description of markers and dynamic assignation of IDs to markers should provide a bit more flexibility here. > > + - make it possible for GNU/Linux distributions to have kmemtrace > > + included in their repositories > > + > > +(*) - one of the reasons Pekka Enberg's original userspace data analysis > > + tool's code was rewritten from Perl to C (although this is more than a > > + simple conversion) > > + > > + > > +III. 
Quick usage guide > > +====================== > > + > > +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable > > +CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED). > > + > > +2) Get the userspace tool and build it: > > +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository > > +$ cd kmemtrace-user/ > > +$ ./autogen.sh > > +$ ./configure > > +$ make > > + > > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the > > +'single' runlevel (so that relay buffers don't fill up easily), and run > > +kmemtrace: > > +# '$' does not mean user, but root here. Change the documentation to prefix a root command line by "#" instead of leaving this weird comment. > > +$ mount -t debugfs none /sys/kernel/debug > > +$ mount -t proc none /proc > > +$ cd path/to/kmemtrace-user/ > > +$ ./kmemtraced > > +Wait a bit, then stop it with CTRL+C. > > +$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't > > + # overrun, should > > + # be zero. > > +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to > > + check its correctness] > > +$ ./kmemtrace-report > > + > > +Now you should have a nice and short summary of how the allocator performs. > > + > > +IV. FAQ and known issues > > +======================== > > + > > +Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix > > +this? Should I worry? > > +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how > > +large the number is. You can fix it by supplying a higher > > +'kmemtrace.subbufs=N' kernel parameter. > > +--- > > + > > +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? > > +A: This is a bug and should be reported. It can occur for a variety of > > +reasons: > > + - possible bugs in relay code > > + - possible misuse of relay by kmemtrace > > + - timestamps being collected unorderly > > +Or you may fix it yourself and send us a patch. > > +--- > > + > > +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? > > +A: This is a known issue and I'm working on it. These might be true errors > > +in kernel code, which may have inconsistent behavior (e.g. allocating memory > > +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed > > +out this behavior may work with SLAB, but may fail with other allocators. > > + > > +It may also be due to lack of tracing in some unusual allocator functions. > > + > > +We don't want bug reports regarding this issue yet. What in the world can be causing that ? Shouldn't it be fixed ? It might be due to unexpected allocator behavior, non-instrumented alloc/free code or broken tracer.... > > +--- > > + > > +V. 
See also > > +=========== > > + > > +Documentation/kernel-parameters.txt > > +Documentation/ABI/testing/debugfs-kmemtrace > > + > > diff --git a/MAINTAINERS b/MAINTAINERS > > index 56a2f67..e967bc2 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -2425,6 +2425,12 @@ M: jason.wessel@windriver.com > > L: kgdb-bugreport@lists.sourceforge.net > > S: Maintained > > > > +KMEMTRACE > > +P: Eduard - Gabriel Munteanu > > +M: eduard.munteanu@linux360.ro > > +L: linux-kernel@vger.kernel.org > > +S: Maintained > > + > > KPROBES > > P: Ananth N Mavinakayanahalli > > M: ananth@in.ibm.com > > diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h > > new file mode 100644 > > index 0000000..48746ee > > --- /dev/null > > +++ b/include/linux/kmemtrace.h > > @@ -0,0 +1,110 @@ > > +/* > > + * Copyright (C) 2008 Eduard - Gabriel Munteanu > > + * > > + * This file is released under GPL version 2. > > + */ > > + > > +#ifndef _LINUX_KMEMTRACE_H > > +#define _LINUX_KMEMTRACE_H > > + > > +#ifdef __KERNEL__ > > + > > +#include <linux/types.h> > > +#include <linux/marker.h> > > + > > +/* ABI definition starts here. */ > > + > > +#define KMEMTRACE_ABI_VERSION 1 > > + > > +enum kmemtrace_event_id { > > + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ > > + KMEMTRACE_EVENT_ALLOC, > > + KMEMTRACE_EVENT_FREE, > > +}; > > + > > +enum kmemtrace_type_id { > > + KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() / kfree(). */ > > + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ > > + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ > > +}; > > + > > +struct kmemtrace_event { > > + u8 event_id; /* Allocate or free? */ > > + u8 type_id; /* Kind of allocation/free. */ > > + u16 event_size; /* Size of event */ > > + s32 node; /* Target CPU. */ > > + u64 call_site; /* Caller address. */ > > + u64 ptr; /* Pointer to allocation. */ > > + u64 bytes_req; /* Number of bytes requested. */ > > + u64 bytes_alloc; /* Number of bytes allocated. */ > > + u64 gfp_flags; /* Requested flags. */ > > + s64 timestamp; /* When the operation occured in ns. */ > > +} __attribute__ ((__packed__)); > > + See below for detail, but this event record is way too big and not adapted to 32 bits architectures. > > +/* End of ABI definition. */ > > + > > +#ifdef CONFIG_KMEMTRACE > > + > > +extern void kmemtrace_init(void); > > + > > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > > + unsigned long call_site, > > + const void *ptr, > > + size_t bytes_req, > > + size_t bytes_alloc, > > + gfp_t gfp_flags, > > + int node) > > +{ > > + trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu " > > + "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d", > > + type_id, call_site, (unsigned long) ptr, > > + bytes_req, bytes_alloc, (unsigned long) gfp_flags, node); > > +} > > + > > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > > + unsigned long call_site, > > + const void *ptr) > > +{ > > + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", > > + type_id, call_site, (unsigned long) ptr); > > +} This could be trivially turned into a tracepoint probe. 
> > + > > +#else /* CONFIG_KMEMTRACE */ > > + > > +static inline void kmemtrace_init(void) > > +{ > > +} > > + > > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > > + unsigned long call_site, > > + const void *ptr, > > + size_t bytes_req, > > + size_t bytes_alloc, > > + gfp_t gfp_flags, > > + int node) > > +{ > > +} > > + > > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > > + unsigned long call_site, > > + const void *ptr) > > +{ > > +} > > + > > +#endif /* CONFIG_KMEMTRACE */ > > + > > +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id, > > + unsigned long call_site, > > + const void *ptr, > > + size_t bytes_req, > > + size_t bytes_alloc, > > + gfp_t gfp_flags) > > +{ > > + kmemtrace_mark_alloc_node(type_id, call_site, ptr, > > + bytes_req, bytes_alloc, gfp_flags, -1); > > +} > > + > > +#endif /* __KERNEL__ */ > > + > > +#endif /* _LINUX_KMEMTRACE_H */ > > + > > diff --git a/init/main.c b/init/main.c > > index 057f364..c00659c 100644 > > --- a/init/main.c > > +++ b/init/main.c > > @@ -66,6 +66,7 @@ > > #include <asm/setup.h> > > #include <asm/sections.h> > > #include <asm/cacheflush.h> > > +#include <linux/kmemtrace.h> > > > > #ifdef CONFIG_X86_LOCAL_APIC > > #include <asm/smp.h> > > @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void) > > enable_debug_pagealloc(); > > cpu_hotplug_init(); > > kmem_cache_init(); > > + kmemtrace_init(); > > debug_objects_mem_init(); > > idr_init_cache(); > > setup_per_cpu_pageset(); > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > index d2099f4..0ade2ae 100644 > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -674,6 +674,34 @@ config FIREWIRE_OHCI_REMOTE_DMA > > > > If unsure, say N. > > > > +config KMEMTRACE > > + bool "Kernel memory tracer (kmemtrace)" > > + depends on RELAY && DEBUG_FS && MARKERS > > + help > > + kmemtrace provides tracing for slab allocator functions, such as > > + kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected > > + data is then fed to the userspace application in order to analyse > > + allocation hotspots, internal fragmentation and so on, making it > > + possible to see how well an allocator performs, as well as debug > > + and profile kernel code. > > + > > + This requires an userspace application to use. See > > + Documentation/vm/kmemtrace.txt for more information. > > + > > + Saying Y will make the kernel somewhat larger and slower. However, > > + if you disable kmemtrace at run-time or boot-time, the performance > > + impact is minimal (depending on the arch the kernel is built for). > > + > > + If unsure, say N. > > + > > +config KMEMTRACE_DEFAULT_ENABLED > > + bool "Enabled by default at boot" > > + depends on KMEMTRACE > > + help > > + Say Y here to enable kmemtrace at boot-time by default. Whatever > > + the choice, the behavior can be overridden by a kernel parameter, > > + as described in documentation. 
> > + > > source "samples/Kconfig" > > > > source "lib/Kconfig.kgdb" > > diff --git a/mm/Makefile b/mm/Makefile > > index 18c143b..d88a3bc 100644 > > --- a/mm/Makefile > > +++ b/mm/Makefile > > @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o > > obj-$(CONFIG_SMP) += allocpercpu.o > > obj-$(CONFIG_QUICKLIST) += quicklist.o > > obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o > > - > > +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o > > diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c > > new file mode 100644 > > index 0000000..4b33ace > > --- /dev/null > > +++ b/mm/kmemtrace.c > > @@ -0,0 +1,244 @@ > > +/* > > + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu > > + * > > + * This file is released under GPL version 2. > > + */ > > + > > +#include <linux/string.h> > > +#include <linux/debugfs.h> > > +#include <linux/relay.h> > > +#include <linux/module.h> > > +#include <linux/marker.h> > > +#include <linux/gfp.h> > > +#include <linux/kmemtrace.h> > > + > > +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) > > +#define KMEMTRACE_N_SUBBUFS 20 > > + Isn't this overridable by a command line param ? Shouldn't it be called "DEFAULT_KMEMTRACE_*" then ? > > +static struct rchan *kmemtrace_chan; > > +static u32 kmemtrace_buf_overruns; > > + > > +static unsigned int kmemtrace_n_subbufs; > > +#ifdef CONFIG_KMEMTRACE_DEFAULT_ENABLED > > +static unsigned int kmemtrace_enabled = 1; > > +#else > > +static unsigned int kmemtrace_enabled = 0; > > +#endif Hrm, I'd leave that as a kernel command line option, not config option. If you ever want to _aways_ have it on, then change your lilo/grub file. > > + > > +static u32 kmemtrace_abi_version __read_mostly = KMEMTRACE_ABI_VERSION; > > + > > +static inline void kmemtrace_log_event(struct kmemtrace_event *event) > > +{ > > + relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event)); > > +} > > + > > +static void kmemtrace_probe_alloc(void *probe_data, void *call_data, > > + const char *format, va_list *args) > > +{ > > + unsigned long flags; > > + struct kmemtrace_event ev; > > + > > + /* > > + * Don't convert this to use structure initializers, > > + * C99 does not guarantee the rvalues evaluation order. > > + */ > > + ev.event_id = KMEMTRACE_EVENT_ALLOC; > > + ev.type_id = va_arg(*args, int); > > + ev.event_size = sizeof(struct kmemtrace_event); > > + ev.call_site = va_arg(*args, unsigned long); > > + ev.ptr = va_arg(*args, unsigned long); Argh, and you do a supplementary copy here. You could simply alias the buffers and write directly to them after reserving the correct amount of space. > > + /* Don't trace ignored allocations. */ > > + if (!ev.ptr) > > + return; > > + ev.bytes_req = va_arg(*args, unsigned long); > > + ev.bytes_alloc = va_arg(*args, unsigned long); > > + /* ev.timestamp set below, to preserve event ordering. */ > > + ev.gfp_flags = va_arg(*args, unsigned long); > > + ev.node = va_arg(*args, int); > > + > > + /* We disable IRQs for timestamps to match event ordering. */ > > + local_irq_save(flags); > > + ev.timestamp = ktime_to_ns(ktime_get()); ktime_get is monotonic, but with potentially coarse granularity. I see that you use ktime_to_ns here, which gives you a resolution of 1 timer tick in the case where the TSCs are not synchronized. While it should be "good enough" for the scheduler, I doubt it's enough for a tracer. It also takes the xtime seqlock, which adds a potentially big delay to the tracing code (if you read the clock while the writer lock is taken). 
Also, when NTP modifies the clock, although it stays monotonic, the rate at which it increments can dramatically change. I doubt you want to use that as a reference for performance analysis. > > + kmemtrace_log_event(&ev); > > + local_irq_restore(flags); > > +} > > + > > +static void kmemtrace_probe_free(void *probe_data, void *call_data, > > + const char *format, va_list *args) > > +{ > > + unsigned long flags; > > + struct kmemtrace_event ev; > > + > > + /* > > + * Don't convert this to use structure initializers, > > + * C99 does not guarantee the rvalues evaluation order. > > + */ > > + ev.event_id = KMEMTRACE_EVENT_FREE; > > + ev.type_id = va_arg(*args, int); > > + ev.event_size = sizeof(struct kmemtrace_event); > > + ev.call_site = va_arg(*args, unsigned long); > > + ev.ptr = va_arg(*args, unsigned long); > > + /* Don't trace ignored allocations. */ > > + if (!ev.ptr) > > + return; > > + /* ev.timestamp set below, to preserve event ordering. */ > > + > > + /* We disable IRQs for timestamps to match event ordering. */ > > + local_irq_save(flags); > > + ev.timestamp = ktime_to_ns(ktime_get()); > > + kmemtrace_log_event(&ev); > > + local_irq_restore(flags); > > +} > > + > > +static struct dentry * > > +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, > > + int mode, struct rchan_buf *buf, int *is_global) > > +{ > > + return debugfs_create_file(filename, mode, parent, buf, > > + &relay_file_operations); > > +} > > + > > +static int kmemtrace_remove_buf_file(struct dentry *dentry) > > +{ > > + debugfs_remove(dentry); > > + > > + return 0; > > +} > > + > > +static int kmemtrace_count_overruns(struct rchan_buf *buf, > > + void *subbuf, void *prev_subbuf, > > + size_t prev_padding) > > +{ > > + if (relay_buf_full(buf)) { > > + /* > > + * We know it's not SMP-safe, but neither > > + * debugfs_create_u32() is. 
> > + */ > > + kmemtrace_buf_overruns++; > > + return 0; > > + } > > + > > + return 1; > > +} > > + > > +static struct rchan_callbacks relay_callbacks = { > > + .create_buf_file = kmemtrace_create_buf_file, > > + .remove_buf_file = kmemtrace_remove_buf_file, > > + .subbuf_start = kmemtrace_count_overruns, > > +}; > > + > > +static struct dentry *kmemtrace_dir; > > +static struct dentry *kmemtrace_overruns_dentry; > > +static struct dentry *kmemtrace_abi_version_dentry; > > + > > +static void kmemtrace_cleanup(void) > > +{ > > + marker_probe_unregister("kmemtrace_alloc", kmemtrace_probe_alloc, NULL); > > + marker_probe_unregister("kmemtrace_free", kmemtrace_probe_free, NULL); > > + > > + if (kmemtrace_abi_version_dentry) > > + debugfs_remove(kmemtrace_abi_version_dentry); > > + if (kmemtrace_overruns_dentry) > > + debugfs_remove(kmemtrace_overruns_dentry); > > + > > + relay_close(kmemtrace_chan); > > + kmemtrace_chan = NULL; > > + > > + if (kmemtrace_dir) > > + debugfs_remove(kmemtrace_dir); > > +} > > + > > +static int __init kmemtrace_setup_late(void) > > +{ > > + if (!kmemtrace_chan) > > + goto failed; > > + > > + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); > > + if (!kmemtrace_dir) > > + goto cleanup; > > + > > + kmemtrace_abi_version_dentry = > > + debugfs_create_u32("abi_version", S_IRUSR, > > + kmemtrace_dir, &kmemtrace_abi_version); > > + kmemtrace_overruns_dentry = > > + debugfs_create_u32("total_overruns", S_IRUSR, > > + kmemtrace_dir, &kmemtrace_buf_overruns); > > + if (!kmemtrace_overruns_dentry || !kmemtrace_abi_version_dentry) > > + goto cleanup; > > + > > + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) > > + goto cleanup; > > + > > + printk(KERN_INFO "kmemtrace: fully up.\n"); > > + > > + return 0; > > + > > +cleanup: > > + kmemtrace_cleanup(); > > +failed: > > + return 1; > > +} > > +late_initcall(kmemtrace_setup_late); > > + > > +static int __init kmemtrace_set_boot_enabled(char *str) > > +{ > > + if (!str) > > + return -EINVAL; > > + > > + if (!strcmp(str, "yes")) I think the standard is to use =0, =1 here, not =yes, =no ? 
Mathieu > > + kmemtrace_enabled = 1; > > + else if (!strcmp(str, "no")) > > + kmemtrace_enabled = 0; > > + else > > + return -EINVAL; > > + > > + return 0; > > +} > > +early_param("kmemtrace.enable", kmemtrace_set_boot_enabled); > > + > > +static int __init kmemtrace_set_subbufs(char *str) > > +{ > > + get_option(&str, &kmemtrace_n_subbufs); > > + return 0; > > +} > > +early_param("kmemtrace.subbufs", kmemtrace_set_subbufs); > > + > > +void kmemtrace_init(void) > > +{ > > + int err; > > + > > + if (!kmemtrace_enabled) > > + return; > > + > > + if (!kmemtrace_n_subbufs) > > + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; > > + > > + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, > > + kmemtrace_n_subbufs, &relay_callbacks, > > + NULL); > > + if (!kmemtrace_chan) { > > + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); > > + return; > > + } > > + > > + err = marker_probe_register("kmemtrace_alloc", "type_id %d " > > + "call_site %lu ptr %lu " > > + "bytes_req %lu bytes_alloc %lu " > > + "gfp_flags %lu node %d", > > + kmemtrace_probe_alloc, NULL); > > + if (err) > > + goto probe_fail; > > + err = marker_probe_register("kmemtrace_free", "type_id %d " > > + "call_site %lu ptr %lu", > > + kmemtrace_probe_free, NULL); > > + if (err) > > + goto probe_fail; > > + > > + printk(KERN_INFO "kmemtrace: early init successful.\n"); > > + return; > > + > > +probe_fail: > > + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); > > + kmemtrace_cleanup(); > > +} > > + > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
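[To make the cross-machine analysis point concrete: the kind of self-describing trace header Mathieu hints at (magic number for endianness detection, recorded type sizes so a 32-bit trace can be analysed on 64-bit) can be quite small. The sketch below is purely illustrative; the magic value and field names are invented here and nothing like this exists in the posted patch.]

#include <stdint.h>

#define KMEMTRACE_MAGIC	0x6b6d7472	/* "kmtr"; a foreign-endian reader sees
					 * it byte-swapped, which is the
					 * endianness probe */

/* Hypothetical header written once at the start of each per-CPU file. */
struct kmemtrace_file_header {
	uint32_t magic;
	uint32_t abi_version;
	uint8_t  sizeof_long;	/* 4 on 32-bit kernels, 8 on 64-bit */
	uint8_t  sizeof_ptr;
	uint16_t event_size;	/* fixed size of each event record */
};

[With the word and pointer sizes recorded up front, call_site, ptr and the byte counts in the event record could shrink to the native word size on 32-bit machines, which addresses the wasted bytes pointed out above.]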
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-28 16:29 ` Mathieu Desnoyers @ 2008-07-28 17:09 ` Matt Mackall 2008-07-28 17:35 ` Eduard - Gabriel Munteanu 1 sibling, 0 replies; 30+ messages in thread From: Matt Mackall @ 2008-07-28 17:09 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Pekka Enberg, Eduard - Gabriel Munteanu, cl, linux-mm, linux-kernel, rdunlap, Steven Rostedt, Thomas Gleixner, Michael Kerrisk On Mon, 2008-07-28 at 12:29 -0400, Mathieu Desnoyers wrote: > * Pekka Enberg (penberg@cs.helsinki.fi) wrote: > > Hi, > > > > [I'm cc'ing Mathieu if he wants to comment on this.] > > > > On Tue, 2008-07-22 at 21:36 +0300, Eduard - Gabriel Munteanu wrote: > > > kmemtrace provides tracing for slab allocator functions, such as kmalloc, > > > kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed > > > to the userspace application in order to analyse allocation hotspots, > > > internal fragmentation and so on, making it possible to see how well an > > > allocator performs, as well as debug and profile kernel code. > > > > Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> > > > > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > > > --- > > > Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++ > > > Documentation/kernel-parameters.txt | 10 + > > > Documentation/vm/kmemtrace.txt | 126 ++++++++++++++ > > > MAINTAINERS | 6 + > > > include/linux/kmemtrace.h | 110 ++++++++++++ > > > init/main.c | 2 + > > > lib/Kconfig.debug | 28 +++ > > > mm/Makefile | 2 +- > > > mm/kmemtrace.c | 244 +++++++++++++++++++++++++++ > > > 9 files changed, 585 insertions(+), 1 deletions(-) > > > create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace > > > create mode 100644 Documentation/vm/kmemtrace.txt > > > create mode 100644 include/linux/kmemtrace.h > > > create mode 100644 mm/kmemtrace.c > > > > > > diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace > > > new file mode 100644 > > > index 0000000..466c2bb > > > --- /dev/null > > > +++ b/Documentation/ABI/testing/debugfs-kmemtrace > > Documentation should probably come in a separate patch. We've also a new Documentation/ maintainer these days (in addition to Randy) who should get cc:ed: Michael Kerrisk <michael.kerrisk@googlemail.com> > > > @@ -0,0 +1,58 @@ > > > +What: /sys/kernel/debug/kmemtrace/ > > > +Date: July 2008 > > > +Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> > > > +Description: > > > + > > > +In kmemtrace-enabled kernels, the following files are created: > > > + > > > +/sys/kernel/debug/kmemtrace/ > > > + cpu<n> (0400) Per-CPU tracing data, see below. (binary) > > > + total_overruns (0400) Total number of bytes which were dropped from > > > + cpu<n> files because of full buffer condition, > > > + non-binary. (text) > > > + abi_version (0400) Kernel's kmemtrace ABI version. (text) > > > + > > > +Each per-CPU file should be read according to the relay interface. That is, > > > +the reader should set affinity to that specific CPU and, as currently done by > > > +the userspace application (though there are other methods), use poll() with > > > +an infinite timeout before every read(). Otherwise, erroneous data may be > > > +read. The binary data has the following _core_ format: > > > + Event id (1 byte) Unsigned integer, one of: > > > + 0 - erroneous event, this is illegal/invalid data and must > > > + not occur (KMEMTRACE_EVENT_NULL) > > Hmm ? why record an invalid event ?? 
I see it's not used in the code, is > that actually used in some way because the memory is set to 0 ? > > > > > + 1 - represents an allocation (KMEMTRACE_EVENT_ALLOC) > > > + 2 - represents a freeing of previously allocated memory > > > + (KMEMTRACE_EVENT_FREE) > > > + Type id (1 byte) Unsigned integer, one of: > > > + 0 - this is a kmalloc() / kfree() > > > + 1 - this is a kmem_cache_alloc() / kmem_cache_free() > > > + 2 - this is a __get_free_pages() et al. > > > + Event size (2 bytes) Unsigned integer representing the > > > + size of this event. Used to extend > > > + kmemtrace. Discard the bytes you > > > + don't know about. > > > + Target CPU (4 bytes) Signed integer, valid for event id 1. > > > + If equal to -1, target CPU is the same > > > + as origin CPU, but the reverse might > > > + not be true. > > If only valid for event ID 1 and only in NUMA case, please don't waste > space in each event header and make that a event-specific field... ? > > > > + Caller address (8 bytes) Return address to the caller. > > Not true on 32 bits machines. You are wasting 4 bytes on those archs. > > > > + Pointer to mem (8 bytes) Pointer to allocated memory, must not > > > + be NULL. > > Same here. > > > > + Requested bytes (8 bytes) Total number of requested bytes, > > > + unsigned, must not be zero. > > Same here. > > > > + Allocated bytes (8 bytes) Total number of actually allocated > > > + bytes, unsigned, must not be lower > > > + than requested bytes. > > And here. > > > > + Requested flags (8 bytes) GFP flags supplied by the caller. > > 8 bytes for GFP flags ?? Whoah, that's a lot of one-hot bits ! :) I knew > that some allocators were bloated, bit not that much. :) > > > > + Timestamp (8 bytes) Signed integer representing timestamp. > > > + > > With a heartbeat, as lttng does, you can cut that to a 4 bytes field. > > > > +The data is made available in the same endianness the machine has. > > > + > > Using a magic number in the trace header lets you deal with > cross-endianness. > > Saving the type sizes in the trace header lets you deal with different > int/long/pointer type sizes. > > > > +Other event ids and type ids may be defined and added. Other fields may be > > > +added by increasing event size. Every modification to the ABI, including > > > +new id definitions, are followed by bumping the ABI version by one. > > > + > > I personally prefer a self-describing trace :) > > > > + > > > +Users: > > > + kmemtrace-user - git://repo.or.cz/kmemtrace-user.git > > > + > > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > > > index b52f47d..446a257 100644 > > > --- a/Documentation/kernel-parameters.txt > > > +++ b/Documentation/kernel-parameters.txt > > > @@ -49,6 +49,7 @@ parameter is applicable: > > > ISAPNP ISA PnP code is enabled. > > > ISDN Appropriate ISDN support is enabled. > > > JOY Appropriate joystick support is enabled. > > > + KMEMTRACE kmemtrace is enabled. > > > LIBATA Libata driver is enabled > > > LP Printer support is enabled. > > > LOOP Loopback device support is enabled. > > > @@ -941,6 +942,15 @@ and is between 256 and 4096 characters. It is defined in the file > > > use the HighMem zone if it exists, and the Normal > > > zone if it does not. > > > > > > + kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no } > > > + Controls whether kmemtrace is enabled > > > + at boot-time. > > > + > > > + kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of > > > + subbufs kmemtrace's relay channel has. 
Set this > > > + higher than default (KMEMTRACE_N_SUBBUFS in code) if > > > + you experience buffer overruns. > > > + > > That kind of stuff would be nice to have in lttng. > > > > movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter > > > is similar to kernelcore except it specifies the > > > amount of memory used for migratable allocations. > > > diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt > > > new file mode 100644 > > > index 0000000..75360b1 > > > --- /dev/null > > > +++ b/Documentation/vm/kmemtrace.txt > > > @@ -0,0 +1,126 @@ > > > + kmemtrace - Kernel Memory Tracer > > > + > > > + by Eduard - Gabriel Munteanu > > > + <eduard.munteanu@linux360.ro> > > > + > > > +I. Introduction > > > +=============== > > > + > > > +kmemtrace helps kernel developers figure out two things: > > > +1) how different allocators (SLAB, SLUB etc.) perform > > > +2) how kernel code allocates memory and how much > > > + > > > +To do this, we trace every allocation and export information to the userspace > > > +through the relay interface. We export things such as the number of requested > > > +bytes, the number of bytes actually allocated (i.e. including internal > > > +fragmentation), whether this is a slab allocation or a plain kmalloc() and so > > > +on. > > > + > > > +The actual analysis is performed by a userspace tool (see section III for > > > +details on where to get it from). It logs the data exported by the kernel, > > > +processes it and (as of writing this) can provide the following information: > > > +- the total amount of memory allocated and fragmentation per call-site > > > +- the amount of memory allocated and fragmentation per allocation > > > +- total memory allocated and fragmentation in the collected dataset > > > +- number of cross-CPU allocation and frees (makes sense in NUMA environments) > > > + > > > +Moreover, it can potentially find inconsistent and erroneous behavior in > > > +kernel code, such as using slab free functions on kmalloc'ed memory or > > > +allocating less memory than requested (but not truly failed allocations). > > > + > > > +kmemtrace also makes provisions for tracing on some arch and analysing the > > > +data on another. > > > + > > > +II. Design and goals > > > +==================== > > > + > > > +kmemtrace was designed to handle rather large amounts of data. Thus, it uses > > > +the relay interface to export whatever is logged to userspace, which then > > > +stores it. Analysis and reporting is done asynchronously, that is, after the > > > +data is collected and stored. By design, it allows one to log and analyse > > > +on different machines and different arches. > > > + > > > +As of writing this, the ABI is not considered stable, though it might not > > > +change much. However, no guarantees are made about compatibility yet. When > > > +deemed stable, the ABI should still allow easy extension while maintaining > > > +backward compatibility. This is described further in Documentation/ABI. > > > + > > > +Summary of design goals: > > > + - allow logging and analysis to be done across different machines > > Not currently true : cross-endianness/wastes space for 32 bits archs. > > > > + - be fast and anticipate usage in high-load environments (*) > > LTTng will be faster though : per-cpu atomic ops instead of interrupt > disable makes the probe faster. > > > > + - be reasonably extensible > > Automatic description of markers and dynamic assignation of IDs to > markers should provide a bit more flexibility here. 
> > > > > + - make it possible for GNU/Linux distributions to have kmemtrace > > > + included in their repositories > > > + > > > +(*) - one of the reasons Pekka Enberg's original userspace data analysis > > > + tool's code was rewritten from Perl to C (although this is more than a > > > + simple conversion) > > > + > > > + > > > +III. Quick usage guide > > > +====================== > > > + > > > +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable > > > +CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED). > > > + > > > +2) Get the userspace tool and build it: > > > +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository > > > +$ cd kmemtrace-user/ > > > +$ ./autogen.sh > > > +$ ./configure > > > +$ make > > > + > > > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the > > > +'single' runlevel (so that relay buffers don't fill up easily), and run > > > +kmemtrace: > > > +# '$' does not mean user, but root here. > > Change the documentation to prefix a root command line by "#" instead of > leaving this weird comment. > > > > +$ mount -t debugfs none /sys/kernel/debug > > > +$ mount -t proc none /proc > > > +$ cd path/to/kmemtrace-user/ > > > +$ ./kmemtraced > > > +Wait a bit, then stop it with CTRL+C. > > > +$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't > > > + # overrun, should > > > + # be zero. > > > +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to > > > + check its correctness] > > > +$ ./kmemtrace-report > > > + > > > +Now you should have a nice and short summary of how the allocator performs. > > > + > > > +IV. FAQ and known issues > > > +======================== > > > + > > > +Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix > > > +this? Should I worry? > > > +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how > > > +large the number is. You can fix it by supplying a higher > > > +'kmemtrace.subbufs=N' kernel parameter. > > > +--- > > > + > > > +Q: kmemtrace_check reports errors, how do I fix this? Should I worry? > > > +A: This is a bug and should be reported. It can occur for a variety of > > > +reasons: > > > + - possible bugs in relay code > > > + - possible misuse of relay by kmemtrace > > > + - timestamps being collected unorderly > > > +Or you may fix it yourself and send us a patch. > > > +--- > > > + > > > +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? > > > +A: This is a known issue and I'm working on it. These might be true errors > > > +in kernel code, which may have inconsistent behavior (e.g. allocating memory > > > +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed > > > +out this behavior may work with SLAB, but may fail with other allocators. > > > + > > > +It may also be due to lack of tracing in some unusual allocator functions. > > > + > > > +We don't want bug reports regarding this issue yet. > > What in the world can be causing that ? Shouldn't it be fixed ? It might > be due to unexpected allocator behavior, non-instrumented alloc/free > code or broken tracer.... > > > > > +--- > > > + > > > +V. 
See also > > > +=========== > > > + > > > +Documentation/kernel-parameters.txt > > > +Documentation/ABI/testing/debugfs-kmemtrace > > > + > > > diff --git a/MAINTAINERS b/MAINTAINERS > > > index 56a2f67..e967bc2 100644 > > > --- a/MAINTAINERS > > > +++ b/MAINTAINERS > > > @@ -2425,6 +2425,12 @@ M: jason.wessel@windriver.com > > > L: kgdb-bugreport@lists.sourceforge.net > > > S: Maintained > > > > > > +KMEMTRACE > > > +P: Eduard - Gabriel Munteanu > > > +M: eduard.munteanu@linux360.ro > > > +L: linux-kernel@vger.kernel.org > > > +S: Maintained > > > + > > > KPROBES > > > P: Ananth N Mavinakayanahalli > > > M: ananth@in.ibm.com > > > diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h > > > new file mode 100644 > > > index 0000000..48746ee > > > --- /dev/null > > > +++ b/include/linux/kmemtrace.h > > > @@ -0,0 +1,110 @@ > > > +/* > > > + * Copyright (C) 2008 Eduard - Gabriel Munteanu > > > + * > > > + * This file is released under GPL version 2. > > > + */ > > > + > > > +#ifndef _LINUX_KMEMTRACE_H > > > +#define _LINUX_KMEMTRACE_H > > > + > > > +#ifdef __KERNEL__ > > > + > > > +#include <linux/types.h> > > > +#include <linux/marker.h> > > > + > > > +/* ABI definition starts here. */ > > > + > > > +#define KMEMTRACE_ABI_VERSION 1 > > > + > > > +enum kmemtrace_event_id { > > > + KMEMTRACE_EVENT_NULL = 0, /* Erroneous event. */ > > > + KMEMTRACE_EVENT_ALLOC, > > > + KMEMTRACE_EVENT_FREE, > > > +}; > > > + > > > +enum kmemtrace_type_id { > > > + KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() / kfree(). */ > > > + KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */ > > > + KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */ > > > +}; > > > + > > > +struct kmemtrace_event { > > > + u8 event_id; /* Allocate or free? */ > > > + u8 type_id; /* Kind of allocation/free. */ > > > + u16 event_size; /* Size of event */ > > > + s32 node; /* Target CPU. */ > > > + u64 call_site; /* Caller address. */ > > > + u64 ptr; /* Pointer to allocation. */ > > > + u64 bytes_req; /* Number of bytes requested. */ > > > + u64 bytes_alloc; /* Number of bytes allocated. */ > > > + u64 gfp_flags; /* Requested flags. */ > > > + s64 timestamp; /* When the operation occured in ns. */ > > > +} __attribute__ ((__packed__)); > > > + > > See below for detail, but this event record is way too big and not > adapted to 32 bits architectures. > > > > +/* End of ABI definition. */ > > > + > > > +#ifdef CONFIG_KMEMTRACE > > > + > > > +extern void kmemtrace_init(void); > > > + > > > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > > > + unsigned long call_site, > > > + const void *ptr, > > > + size_t bytes_req, > > > + size_t bytes_alloc, > > > + gfp_t gfp_flags, > > > + int node) > > > +{ > > > + trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu " > > > + "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d", > > > + type_id, call_site, (unsigned long) ptr, > > > + bytes_req, bytes_alloc, (unsigned long) gfp_flags, node); > > > +} > > > + > > > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > > > + unsigned long call_site, > > > + const void *ptr) > > > +{ > > > + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", > > > + type_id, call_site, (unsigned long) ptr); > > > +} > > This could be trivially turned into a tracepoint probe. 
> > > > + > > > +#else /* CONFIG_KMEMTRACE */ > > > + > > > +static inline void kmemtrace_init(void) > > > +{ > > > +} > > > + > > > +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id, > > > + unsigned long call_site, > > > + const void *ptr, > > > + size_t bytes_req, > > > + size_t bytes_alloc, > > > + gfp_t gfp_flags, > > > + int node) > > > +{ > > > +} > > > + > > > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > > > + unsigned long call_site, > > > + const void *ptr) > > > +{ > > > +} > > > + > > > +#endif /* CONFIG_KMEMTRACE */ > > > + > > > +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id, > > > + unsigned long call_site, > > > + const void *ptr, > > > + size_t bytes_req, > > > + size_t bytes_alloc, > > > + gfp_t gfp_flags) > > > +{ > > > + kmemtrace_mark_alloc_node(type_id, call_site, ptr, > > > + bytes_req, bytes_alloc, gfp_flags, -1); > > > +} > > > + > > > +#endif /* __KERNEL__ */ > > > + > > > +#endif /* _LINUX_KMEMTRACE_H */ > > > + > > > diff --git a/init/main.c b/init/main.c > > > index 057f364..c00659c 100644 > > > --- a/init/main.c > > > +++ b/init/main.c > > > @@ -66,6 +66,7 @@ > > > #include <asm/setup.h> > > > #include <asm/sections.h> > > > #include <asm/cacheflush.h> > > > +#include <linux/kmemtrace.h> > > > > > > #ifdef CONFIG_X86_LOCAL_APIC > > > #include <asm/smp.h> > > > @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void) > > > enable_debug_pagealloc(); > > > cpu_hotplug_init(); > > > kmem_cache_init(); > > > + kmemtrace_init(); > > > debug_objects_mem_init(); > > > idr_init_cache(); > > > setup_per_cpu_pageset(); > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > > index d2099f4..0ade2ae 100644 > > > --- a/lib/Kconfig.debug > > > +++ b/lib/Kconfig.debug > > > @@ -674,6 +674,34 @@ config FIREWIRE_OHCI_REMOTE_DMA > > > > > > If unsure, say N. > > > > > > +config KMEMTRACE > > > + bool "Kernel memory tracer (kmemtrace)" > > > + depends on RELAY && DEBUG_FS && MARKERS > > > + help > > > + kmemtrace provides tracing for slab allocator functions, such as > > > + kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected > > > + data is then fed to the userspace application in order to analyse > > > + allocation hotspots, internal fragmentation and so on, making it > > > + possible to see how well an allocator performs, as well as debug > > > + and profile kernel code. > > > + > > > + This requires an userspace application to use. See > > > + Documentation/vm/kmemtrace.txt for more information. > > > + > > > + Saying Y will make the kernel somewhat larger and slower. However, > > > + if you disable kmemtrace at run-time or boot-time, the performance > > > + impact is minimal (depending on the arch the kernel is built for). > > > + > > > + If unsure, say N. > > > + > > > +config KMEMTRACE_DEFAULT_ENABLED > > > + bool "Enabled by default at boot" > > > + depends on KMEMTRACE > > > + help > > > + Say Y here to enable kmemtrace at boot-time by default. Whatever > > > + the choice, the behavior can be overridden by a kernel parameter, > > > + as described in documentation. 
> > > + > > > source "samples/Kconfig" > > > > > > source "lib/Kconfig.kgdb" > > > diff --git a/mm/Makefile b/mm/Makefile > > > index 18c143b..d88a3bc 100644 > > > --- a/mm/Makefile > > > +++ b/mm/Makefile > > > @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o > > > obj-$(CONFIG_SMP) += allocpercpu.o > > > obj-$(CONFIG_QUICKLIST) += quicklist.o > > > obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o > > > - > > > +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o > > > diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c > > > new file mode 100644 > > > index 0000000..4b33ace > > > --- /dev/null > > > +++ b/mm/kmemtrace.c > > > @@ -0,0 +1,244 @@ > > > +/* > > > + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu > > > + * > > > + * This file is released under GPL version 2. > > > + */ > > > + > > > +#include <linux/string.h> > > > +#include <linux/debugfs.h> > > > +#include <linux/relay.h> > > > +#include <linux/module.h> > > > +#include <linux/marker.h> > > > +#include <linux/gfp.h> > > > +#include <linux/kmemtrace.h> > > > + > > > +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) > > > +#define KMEMTRACE_N_SUBBUFS 20 > > > + > > Isn't this overridable by a command line param ? Shouldn't it be called > "DEFAULT_KMEMTRACE_*" then ? > > > > +static struct rchan *kmemtrace_chan; > > > +static u32 kmemtrace_buf_overruns; > > > + > > > +static unsigned int kmemtrace_n_subbufs; > > > +#ifdef CONFIG_KMEMTRACE_DEFAULT_ENABLED > > > +static unsigned int kmemtrace_enabled = 1; > > > +#else > > > +static unsigned int kmemtrace_enabled = 0; > > > +#endif > > Hrm, I'd leave that as a kernel command line option, not config option. > If you ever want to _aways_ have it on, then change your lilo/grub file. > > > > + > > > +static u32 kmemtrace_abi_version __read_mostly = KMEMTRACE_ABI_VERSION; > > > + > > > +static inline void kmemtrace_log_event(struct kmemtrace_event *event) > > > +{ > > > + relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event)); > > > +} > > > + > > > +static void kmemtrace_probe_alloc(void *probe_data, void *call_data, > > > + const char *format, va_list *args) > > > +{ > > > + unsigned long flags; > > > + struct kmemtrace_event ev; > > > + > > > + /* > > > + * Don't convert this to use structure initializers, > > > + * C99 does not guarantee the rvalues evaluation order. > > > + */ > > > + ev.event_id = KMEMTRACE_EVENT_ALLOC; > > > + ev.type_id = va_arg(*args, int); > > > + ev.event_size = sizeof(struct kmemtrace_event); > > > + ev.call_site = va_arg(*args, unsigned long); > > > + ev.ptr = va_arg(*args, unsigned long); > > Argh, and you do a supplementary copy here. You could simply alias the > buffers and write directly to them after reserving the correct amount of > space. > > > > + /* Don't trace ignored allocations. */ > > > + if (!ev.ptr) > > > + return; > > > + ev.bytes_req = va_arg(*args, unsigned long); > > > + ev.bytes_alloc = va_arg(*args, unsigned long); > > > + /* ev.timestamp set below, to preserve event ordering. */ > > > + ev.gfp_flags = va_arg(*args, unsigned long); > > > + ev.node = va_arg(*args, int); > > > + > > > + /* We disable IRQs for timestamps to match event ordering. */ > > > + local_irq_save(flags); > > > + ev.timestamp = ktime_to_ns(ktime_get()); > > ktime_get is monotonic, but with potentially coarse granularity. I see > that you use ktime_to_ns here, which gives you a resolution of 1 timer > tick in the case where the TSCs are not synchronized. 
While it should be > "good enough" for the scheduler, I doubt it's enough for a tracer. > > It also takes the xtime seqlock, which adds a potentially big delay to > the tracing code (if you read the clock while the writer lock is taken). > > Also, when NTP modifies the clock, although it stays monotonic, the rate > at which it increments can dramatically change. I doubt you want to use > that as a reference for performance analysis. > > > > > + kmemtrace_log_event(&ev); > > > + local_irq_restore(flags); > > > +} > > > + > > > +static void kmemtrace_probe_free(void *probe_data, void *call_data, > > > + const char *format, va_list *args) > > > +{ > > > + unsigned long flags; > > > + struct kmemtrace_event ev; > > > + > > > + /* > > > + * Don't convert this to use structure initializers, > > > + * C99 does not guarantee the rvalues evaluation order. > > > + */ > > > + ev.event_id = KMEMTRACE_EVENT_FREE; > > > + ev.type_id = va_arg(*args, int); > > > + ev.event_size = sizeof(struct kmemtrace_event); > > > + ev.call_site = va_arg(*args, unsigned long); > > > + ev.ptr = va_arg(*args, unsigned long); > > > + /* Don't trace ignored allocations. */ > > > + if (!ev.ptr) > > > + return; > > > + /* ev.timestamp set below, to preserve event ordering. */ > > > + > > > + /* We disable IRQs for timestamps to match event ordering. */ > > > + local_irq_save(flags); > > > + ev.timestamp = ktime_to_ns(ktime_get()); > > > + kmemtrace_log_event(&ev); > > > + local_irq_restore(flags); > > > +} > > > + > > > +static struct dentry * > > > +kmemtrace_create_buf_file(const char *filename, struct dentry *parent, > > > + int mode, struct rchan_buf *buf, int *is_global) > > > +{ > > > + return debugfs_create_file(filename, mode, parent, buf, > > > + &relay_file_operations); > > > +} > > > + > > > +static int kmemtrace_remove_buf_file(struct dentry *dentry) > > > +{ > > > + debugfs_remove(dentry); > > > + > > > + return 0; > > > +} > > > + > > > +static int kmemtrace_count_overruns(struct rchan_buf *buf, > > > + void *subbuf, void *prev_subbuf, > > > + size_t prev_padding) > > > +{ > > > + if (relay_buf_full(buf)) { > > > + /* > > > + * We know it's not SMP-safe, but neither > > > + * debugfs_create_u32() is. 
> > > + */ > > > + kmemtrace_buf_overruns++; > > > + return 0; > > > + } > > > + > > > + return 1; > > > +} > > > + > > > +static struct rchan_callbacks relay_callbacks = { > > > + .create_buf_file = kmemtrace_create_buf_file, > > > + .remove_buf_file = kmemtrace_remove_buf_file, > > > + .subbuf_start = kmemtrace_count_overruns, > > > +}; > > > + > > > +static struct dentry *kmemtrace_dir; > > > +static struct dentry *kmemtrace_overruns_dentry; > > > +static struct dentry *kmemtrace_abi_version_dentry; > > > + > > > +static void kmemtrace_cleanup(void) > > > +{ > > > + marker_probe_unregister("kmemtrace_alloc", kmemtrace_probe_alloc, NULL); > > > + marker_probe_unregister("kmemtrace_free", kmemtrace_probe_free, NULL); > > > + > > > + if (kmemtrace_abi_version_dentry) > > > + debugfs_remove(kmemtrace_abi_version_dentry); > > > + if (kmemtrace_overruns_dentry) > > > + debugfs_remove(kmemtrace_overruns_dentry); > > > + > > > + relay_close(kmemtrace_chan); > > > + kmemtrace_chan = NULL; > > > + > > > + if (kmemtrace_dir) > > > + debugfs_remove(kmemtrace_dir); > > > +} > > > + > > > +static int __init kmemtrace_setup_late(void) > > > +{ > > > + if (!kmemtrace_chan) > > > + goto failed; > > > + > > > + kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL); > > > + if (!kmemtrace_dir) > > > + goto cleanup; > > > + > > > + kmemtrace_abi_version_dentry = > > > + debugfs_create_u32("abi_version", S_IRUSR, > > > + kmemtrace_dir, &kmemtrace_abi_version); > > > + kmemtrace_overruns_dentry = > > > + debugfs_create_u32("total_overruns", S_IRUSR, > > > + kmemtrace_dir, &kmemtrace_buf_overruns); > > > + if (!kmemtrace_overruns_dentry || !kmemtrace_abi_version_dentry) > > > + goto cleanup; > > > + > > > + if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir)) > > > + goto cleanup; > > > + > > > + printk(KERN_INFO "kmemtrace: fully up.\n"); > > > + > > > + return 0; > > > + > > > +cleanup: > > > + kmemtrace_cleanup(); > > > +failed: > > > + return 1; > > > +} > > > +late_initcall(kmemtrace_setup_late); > > > + > > > +static int __init kmemtrace_set_boot_enabled(char *str) > > > +{ > > > + if (!str) > > > + return -EINVAL; > > > + > > > + if (!strcmp(str, "yes")) > > I think the standard is to use =0, =1 here, not =yes, =no ? 
> > Mathieu > > > > + kmemtrace_enabled = 1; > > > + else if (!strcmp(str, "no")) > > > + kmemtrace_enabled = 0; > > > + else > > > + return -EINVAL; > > > + > > > + return 0; > > > +} > > > +early_param("kmemtrace.enable", kmemtrace_set_boot_enabled); > > > + > > > +static int __init kmemtrace_set_subbufs(char *str) > > > +{ > > > + get_option(&str, &kmemtrace_n_subbufs); > > > + return 0; > > > +} > > > +early_param("kmemtrace.subbufs", kmemtrace_set_subbufs); > > > + > > > +void kmemtrace_init(void) > > > +{ > > > + int err; > > > + > > > + if (!kmemtrace_enabled) > > > + return; > > > + > > > + if (!kmemtrace_n_subbufs) > > > + kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS; > > > + > > > + kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE, > > > + kmemtrace_n_subbufs, &relay_callbacks, > > > + NULL); > > > + if (!kmemtrace_chan) { > > > + printk(KERN_INFO "kmemtrace: could not open relay channel\n"); > > > + return; > > > + } > > > + > > > + err = marker_probe_register("kmemtrace_alloc", "type_id %d " > > > + "call_site %lu ptr %lu " > > > + "bytes_req %lu bytes_alloc %lu " > > > + "gfp_flags %lu node %d", > > > + kmemtrace_probe_alloc, NULL); > > > + if (err) > > > + goto probe_fail; > > > + err = marker_probe_register("kmemtrace_free", "type_id %d " > > > + "call_site %lu ptr %lu", > > > + kmemtrace_probe_free, NULL); > > > + if (err) > > > + goto probe_fail; > > > + > > > + printk(KERN_INFO "kmemtrace: early init successful.\n"); > > > + return; > > > + > > > +probe_fail: > > > + printk(KERN_INFO "kmemtrace: could not register marker probes!\n"); > > > + kmemtrace_cleanup(); > > > +} > > > + > > > > -- > Mathieu Desnoyers > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
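As a side note on the =0/=1 suggestion above: the change to the posted parser would be small. The sketch below is only an illustration of that form, assuming the kmemtrace_enabled flag and the early_param() hook stay exactly as in the patch; it is not part of the submitted series.

static int __init kmemtrace_set_boot_enabled(char *str)
{
        if (!str)
                return -EINVAL;

        /* Accept the conventional 0/1 values instead of yes/no. */
        if (!strcmp(str, "1"))
                kmemtrace_enabled = 1;
        else if (!strcmp(str, "0"))
                kmemtrace_enabled = 0;
        else
                return -EINVAL;

        return 0;
}
early_param("kmemtrace.enable", kmemtrace_set_boot_enabled);

Booting with kmemtrace.enable=0 would then override CONFIG_KMEMTRACE_DEFAULT_ENABLED, and kmemtrace.enable=1 would turn tracing on when the default is off.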
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-28 16:29 ` Mathieu Desnoyers 2008-07-28 17:09 ` Matt Mackall @ 2008-07-28 17:35 ` Eduard - Gabriel Munteanu 2008-07-29 8:25 ` Pekka Enberg 1 sibling, 1 reply; 30+ messages in thread From: Eduard - Gabriel Munteanu @ 2008-07-28 17:35 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Pekka Enberg, cl, linux-mm, linux-kernel, rdunlap, mpm, Steven Rostedt, Thomas Gleixner [-- Attachment #1: Type: text/plain, Size: 9416 bytes --] On Mon, Jul 28, 2008 at 12:29:17PM -0400, Mathieu Desnoyers wrote: > Hmm ? why record an invalid event ?? I see it's not used in the code, is > that actually used in some way because the memory is set to 0 ? The relay interface is really inconsistent and produces erroneous output if it's not used in a specific way. It's nice to be able to catch these errors if they occur (e.g. in case we have a regression). > > > + Target CPU (4 bytes) Signed integer, valid for event id 1. > > > + If equal to -1, target CPU is the same > > > + as origin CPU, but the reverse might > > > + not be true. > > If only valid for event ID 1 and only in NUMA case, please don't waste > space in each event header and make that a event-specific field... ? Yes, this would probably be a better approach. > > > + Caller address (8 bytes) Return address to the caller. > > Not true on 32 bits machines. You are wasting 4 bytes on those archs. Pekka suggested we use types that have constant size on every arch. I could change this. > 8 bytes for GFP flags ?? Whoah, that's a lot of one-hot bits ! :) I knew > that some allocators were bloated, bit not that much. :) This could change too, but if the number of GFP flags is too close to 32, I'd rather keep the ABI stable, providing for a larger number of GFP flags. > > > + Timestamp (8 bytes) Signed integer representing timestamp. > > > + > > With a heartbeat, as lttng does, you can cut that to a 4 bytes field. Hmm, I'll look at lttng's code and see what exactly you are talking about. For now, I'm not sure how 32-bit timestamps perform. > > > +The data is made available in the same endianness the machine has. > > > + > > Using a magic number in the trace header lets you deal with > cross-endianness. Why? I mean I can do this in the userspace app when I record the data. > Saving the type sizes in the trace header lets you deal with different > int/long/pointer type sizes. > > > > +Other event ids and type ids may be defined and added. Other fields may be > > > +added by increasing event size. Every modification to the ABI, including > > > +new id definitions, are followed by bumping the ABI version by one. > > > + > > I personally prefer a self-describing trace :) ASCII/text? :) I'm not sure what you meant, but non-binary traces would result in huge amounts of data. > Not currently true : cross-endianness/wastes space for 32 bits archs. Sure, cross-endianness is not even currently implemented in the userspace app. > > > + - be fast and anticipate usage in high-load environments (*) > > LTTng will be faster though : per-cpu atomic ops instead of interrupt > disable makes the probe faster. I'm not sure how one could record a timestamp orderly into relay buffers using only atomic ops and no locks. > > > + - be reasonably extensible > > Automatic description of markers and dynamic assignation of IDs to > markers should provide a bit more flexibility here. Dynamic assignation makes it hard to preserve ABI compatibility with the userspace app. 
And Pekka suggested it's important to preserve it in order to allow distros to include kmemtrace. > > > + - make it possible for GNU/Linux distributions to have kmemtrace > > > + included in their repositories > > > + > > > +(*) - one of the reasons Pekka Enberg's original userspace data analysis > > > + tool's code was rewritten from Perl to C (although this is more than a > > > + simple conversion) > > > + > > > + > > > +III. Quick usage guide > > > +====================== > > > + > > > +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable > > > +CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED). > > > + > > > +2) Get the userspace tool and build it: > > > +$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository > > > +$ cd kmemtrace-user/ > > > +$ ./autogen.sh > > > +$ ./configure > > > +$ make > > > + > > > +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the > > > +'single' runlevel (so that relay buffers don't fill up easily), and run > > > +kmemtrace: > > > +# '$' does not mean user, but root here. > > Change the documentation to prefix a root command line by "#" instead of > leaving this weird comment. Yes, it's probably better. I just wanted to avoid a user taking that as a comment. > What in the world can be causing that ? Shouldn't it be fixed ? It might > be due to unexpected allocator behavior, non-instrumented alloc/free > code or broken tracer.... Of course it will be fixed. But this FAQ entry also serves as a warning that future allocator patches could introduce untraced functions. > > > +struct kmemtrace_event { > > > + u8 event_id; /* Allocate or free? */ > > > + u8 type_id; /* Kind of allocation/free. */ > > > + u16 event_size; /* Size of event */ > > > + s32 node; /* Target CPU. */ > > > + u64 call_site; /* Caller address. */ > > > + u64 ptr; /* Pointer to allocation. */ > > > + u64 bytes_req; /* Number of bytes requested. */ > > > + u64 bytes_alloc; /* Number of bytes allocated. */ > > > + u64 gfp_flags; /* Requested flags. */ > > > + s64 timestamp; /* When the operation occured in ns. */ > > > +} __attribute__ ((__packed__)); > > > + > > See below for detail, but this event record is way too big and not > adapted to 32 bits architectures. Pekka, what do you think? > > > +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id, > > > + unsigned long call_site, > > > + const void *ptr) > > > +{ > > > + trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu", > > > + type_id, call_site, (unsigned long) ptr); > > > +} > > This could be trivially turned into a tracepoint probe. Okay, will rebase my patches on a tracepoints-enabled branch. How close are they to mainline? > > > + > > > +#define KMEMTRACE_SUBBUF_SIZE (8192 * sizeof(struct kmemtrace_event)) > > > +#define KMEMTRACE_N_SUBBUFS 20 > > > + > > Isn't this overridable by a command line param ? Shouldn't it be called > "DEFAULT_KMEMTRACE_*" then ? I wanted to avoid using too long macro names. But I can change this. > > > +static struct rchan *kmemtrace_chan; > > > +static u32 kmemtrace_buf_overruns; > > > + > > > +static unsigned int kmemtrace_n_subbufs; > > > +#ifdef CONFIG_KMEMTRACE_DEFAULT_ENABLED > > > +static unsigned int kmemtrace_enabled = 1; > > > +#else > > > +static unsigned int kmemtrace_enabled = 0; > > > +#endif > > Hrm, I'd leave that as a kernel command line option, not config option. > If you ever want to _aways_ have it on, then change your lilo/grub file. Not quite true. 
I saw a few kernel subsystems that provide compile-time options for those arches where supplying command-line options is hard/impossible. > > > + * Don't convert this to use structure initializers, > > > + * C99 does not guarantee the rvalues evaluation order. > > > + */ > > > + ev.event_id = KMEMTRACE_EVENT_ALLOC; > > > + ev.type_id = va_arg(*args, int); > > > + ev.event_size = sizeof(struct kmemtrace_event); > > > + ev.call_site = va_arg(*args, unsigned long); > > > + ev.ptr = va_arg(*args, unsigned long); > > Argh, and you do a supplementary copy here. You could simply alias the > buffers and write directly to them after reserving the correct amount of > space. Oh, good point. I could use relay_reserve() here. > > > + /* Don't trace ignored allocations. */ > > > + if (!ev.ptr) > > > + return; > > > + ev.bytes_req = va_arg(*args, unsigned long); > > > + ev.bytes_alloc = va_arg(*args, unsigned long); > > > + /* ev.timestamp set below, to preserve event ordering. */ > > > + ev.gfp_flags = va_arg(*args, unsigned long); > > > + ev.node = va_arg(*args, int); > > > + > > > + /* We disable IRQs for timestamps to match event ordering. */ > > > + local_irq_save(flags); > > > + ev.timestamp = ktime_to_ns(ktime_get()); > > ktime_get is monotonic, but with potentially coarse granularity. I see > that you use ktime_to_ns here, which gives you a resolution of 1 timer > tick in the case where the TSCs are not synchronized. While it should be > "good enough" for the scheduler, I doubt it's enough for a tracer. > > It also takes the xtime seqlock, which adds a potentially big delay to > the tracing code (if you read the clock while the writer lock is taken). > > Also, when NTP modifies the clock, although it stays monotonic, the rate > at which it increments can dramatically change. I doubt you want to use > that as a reference for performance analysis. What would you suggest instead? Please keep in mind timer resolution is not that critical, we're not benchmarking the allocators cycle-wise; instead we're merely looking at allocation lifetimes, fragmentation, patterns etc.. Timestamps are most important for reordering events in userspace. > > > +static int __init kmemtrace_set_boot_enabled(char *str) > > > +{ > > > + if (!str) > > > + return -EINVAL; > > > + > > > + if (!strcmp(str, "yes")) > > I think the standard is to use =0, =1 here, not =yes, =no ? Okay. > Mathieu > > -- > Mathieu Desnoyers > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 Thanks for your comments. Cheers, Eduard [-- Attachment #2: Type: application/pgp-signature, Size: 827 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
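On the relay_reserve() point acknowledged above, a rough sketch of a copy-free alloc probe follows. It keeps the field order, the IRQ handling and the ktime_get() timestamp of the posted code; it is an untested illustration rather than the eventual implementation, and the way ignored (NULL) allocations are skipped would still need a decision, since the slot is reserved before the pointer is known.

static void kmemtrace_probe_alloc(void *probe_data, void *call_data,
                                  const char *format, va_list *args)
{
        unsigned long flags;
        struct kmemtrace_event *ev;

        /* IRQs off so the timestamp matches the ordering in the buffer. */
        local_irq_save(flags);

        ev = relay_reserve(kmemtrace_chan, sizeof(*ev));
        if (!ev)
                goto out;       /* buffer full, counted by the subbuf_start callback */

        ev->event_id = KMEMTRACE_EVENT_ALLOC;
        ev->type_id = va_arg(*args, int);
        ev->event_size = sizeof(struct kmemtrace_event);
        ev->call_site = va_arg(*args, unsigned long);
        ev->ptr = va_arg(*args, unsigned long);
        /*
         * The posted code returns early for !ptr before writing anything;
         * here the slot already exists, so such events would have to be
         * emitted as an explicitly invalid/padding record instead.
         */
        ev->bytes_req = va_arg(*args, unsigned long);
        ev->bytes_alloc = va_arg(*args, unsigned long);
        ev->gfp_flags = va_arg(*args, unsigned long);
        ev->node = va_arg(*args, int);
        ev->timestamp = ktime_to_ns(ktime_get());
out:
        local_irq_restore(flags);
}

relay_reserve() itself does no locking, so the surrounding local_irq_save() also serves as the per-CPU synchronization that relay expects from its callers.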
* Re: [RFC PATCH 1/4] kmemtrace: Core implementation. 2008-07-28 17:35 ` Eduard - Gabriel Munteanu @ 2008-07-29 8:25 ` Pekka Enberg 0 siblings, 0 replies; 30+ messages in thread From: Pekka Enberg @ 2008-07-29 8:25 UTC (permalink / raw) To: Eduard - Gabriel Munteanu Cc: Mathieu Desnoyers, cl, linux-mm, linux-kernel, rdunlap, mpm, Steven Rostedt, Thomas Gleixner Hi Eduard-Gabriel, On Mon, 2008-07-28 at 20:35 +0300, Eduard - Gabriel Munteanu wrote: > > > > +struct kmemtrace_event { > > > > + u8 event_id; /* Allocate or free? */ > > > > + u8 type_id; /* Kind of allocation/free. */ > > > > + u16 event_size; /* Size of event */ > > > > + s32 node; /* Target CPU. */ > > > > + u64 call_site; /* Caller address. */ > > > > + u64 ptr; /* Pointer to allocation. */ > > > > + u64 bytes_req; /* Number of bytes requested. */ > > > > + u64 bytes_alloc; /* Number of bytes allocated. */ > > > > + u64 gfp_flags; /* Requested flags. */ > > > > + s64 timestamp; /* When the operation occured in ns. */ > > > > +} __attribute__ ((__packed__)); > > > > See below for detail, but this event record is way too big and not > > adapted to 32 bits architectures. > > Pekka, what do you think? Mathieu does have a good point about optimizing the memory use of an individual event, so I'm okay with that. But we really don't want to force people to analyze the dump on the same architecture where we captured it. So as long as that is taken care of, I'm happy. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 30+ messages in thread
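To make the size/portability trade-off discussed here concrete, one purely illustrative direction (not something proposed in the thread) is a fixed-width common header plus an allocation-only payload, so 32-bit and 64-bit traces stay byte-for-byte identical while free events stop carrying unused fields. All names and widths below are assumptions for the sake of the example.

/* Common header, written for every event. */
struct kmemtrace_event {
        u8      event_id;       /* KMEMTRACE_EVENT_ALLOC or KMEMTRACE_EVENT_FREE */
        u8      type_id;        /* kind of allocation/free */
        u16     event_size;     /* header plus payload, in bytes */
        u32     pad;            /* keeps the 64-bit fields aligned */
        u64     call_site;      /* fixed width even when built on 32-bit */
        u64     ptr;
        u64     timestamp;      /* ns */
} __attribute__ ((__packed__));

/* Payload appended only when event_id == KMEMTRACE_EVENT_ALLOC. */
struct kmemtrace_alloc_payload {
        u32     bytes_req;
        u32     bytes_alloc;
        u32     gfp_flags;      /* 32 bits covers the current GFP mask */
        s32     node;           /* -1 means same as the origin CPU */
} __attribute__ ((__packed__));

A free event would then take 32 bytes and an allocation 48, versus 56 for every event in the posted layout, and event_size still lets userspace skip payloads it does not understand.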