From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 5 Jul 2022 10:34:15 +0800
From: Feng Tang <feng.tang@intel.com>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Christoph Lameter, Andrew Morton, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Vlastimil Babka, Roman Gushchin, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, dave.hansen@intel.com, Robin Murphy,
	John Garry
Subject: Re: [PATCH v1] mm/slub: enable debugging memory wasting of kmalloc
Message-ID: <20220705023415.GE62281@shbuild999.sh.intel.com>
References: <20220701135954.45045-1-feng.tang@intel.com>
	<20220701150451.GA62281@shbuild999.sh.intel.com>
	<20220704055600.GD62281@shbuild999.sh.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Jul 04, 2022 at 10:05:29AM +0000, Hyeonggon Yoo wrote:
> On Mon, Jul 04, 2022 at 01:56:00PM +0800, Feng Tang wrote:
> > On Sun, Jul 03, 2022 at 02:17:37PM +0000, Hyeonggon Yoo wrote:
> > > On Fri, Jul 01, 2022 at 11:04:51PM +0800, Feng Tang wrote:
> > > > Hi Christoph,
> > > >
> > > > On Fri, Jul 01, 2022 at 04:37:00PM +0200, Christoph Lameter wrote:
> > > > > On Fri, 1 Jul 2022, Feng Tang wrote:
> > > > >
> > > > > >  static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> > > > > > -			unsigned long addr, struct kmem_cache_cpu *c)
> > > > > > +			unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
> > > > > >  {
> > > > >
> > > > > It would be good to avoid expanding the basic slab handling functions for
> > > > > kmalloc. Can we restrict the mods to the kmalloc related functions?
> > > >
> > > > Yes, this is the part that concerned me. I tried but haven't figured
> > > > out a way.
> > > >
> > > > I started implementing it several months ago, and was stuck hacking
> > > > several kmalloc APIs in a crude way, e.g. calling dump_stack() when
> > > > there is a waste of over 1/4 of the object_size of the
> > > > kmalloc_caches[][].
> > > >
> > > > Then I found one central API which has all the needed info (object_size &
> > > > orig_size) from which we can yell about the waste:
> > > >
> > > >   static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
> > > > 		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
> > > >
> > > > which I thought could still be hacky, as the existing 'alloc_traces'
> > > > interface, which already has the count/call-stack info, can't be reused.
> > > > The current solution leverages it at the cost of adding an 'orig_size'
> > > > parameter, but I don't know how to pass the 'waste' info through, as
> > > > track/location is at the lowest level.
> > >
> > > If the added cost of the orig_size parameter for the non-debugging case
> > > is the concern, what about doing this in a userspace script that makes
> > > use of the kmalloc tracepoints?
> > >
> > >   kmalloc: call_site=tty_buffer_alloc+0x43/0x90 ptr=00000000b78761e1
> > >   bytes_req=1056 bytes_alloc=2048 gfp_flags=GFP_ATOMIC|__GFP_NOWARN
> > >   accounted=false
> > >
> > > Calculating the sum of (bytes_alloc - bytes_req) for each call_site
> > > may be an alternative solution.
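
(Just to make that alternative concrete: the following is a rough,
untested userspace sketch, not part of any patch here, that only sums
per-call_site waste from the bytes_req/bytes_alloc fields shown above.
It does not account for frees, which is part of what is discussed
below.)

/*
 * Rough, untested sketch: aggregate kmalloc waste per call site from
 * the kmalloc tracepoint output, e.g.
 *
 *   cat /sys/kernel/debug/tracing/trace_pipe | ./kmalloc_waste
 *
 * Field names (call_site=, bytes_req=, bytes_alloc=) are taken from
 * the sample trace line above.  Frees are not accounted for.
 */
#include <stdio.h>
#include <string.h>

#define MAX_SITES	1024

struct site {
	char name[128];
	unsigned long long waste;
	unsigned long count;
};

static struct site sites[MAX_SITES];
static int nr_sites;

static struct site *find_site(const char *name)
{
	int i;

	for (i = 0; i < nr_sites; i++)
		if (!strcmp(sites[i].name, name))
			return &sites[i];
	if (nr_sites >= MAX_SITES)
		return NULL;
	strncpy(sites[nr_sites].name, name, sizeof(sites[0].name) - 1);
	return &sites[nr_sites++];
}

int main(void)
{
	char line[1024], name[128];
	unsigned long req, alloc;
	int i;

	while (fgets(line, sizeof(line), stdin)) {
		char *p = strstr(line, "call_site=");

		if (!p || sscanf(p, "call_site=%127s", name) != 1)
			continue;
		p = strstr(line, "bytes_req=");
		if (!p || sscanf(p, "bytes_req=%lu", &req) != 1)
			continue;
		p = strstr(line, "bytes_alloc=");
		if (!p || sscanf(p, "bytes_alloc=%lu", &alloc) != 1)
			continue;

		if (alloc > req) {
			struct site *s = find_site(name);

			if (s) {
				s->waste += alloc - req;
				s->count++;
			}
		}
	}

	for (i = 0; i < nr_sites; i++)
		printf("%-48s waste=%llu (%lu allocs)\n",
		       sites[i].name, sites[i].waste, sites[i].count);

	return 0;
}
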
> >
> > Yes, this is doable, but it will hit some of the problems I met
> > before: one is that there are currently 2 alloc paths (kmalloc and
> > kmalloc_node); also we need to consider frees to calculate the real
> > waste, and the free tracepoint doesn't have size info (yes, we could
> > match the pointer with the alloc path, but then the user script would
> > need to be more complex). That's why I love the current 'alloc_traces'
> > interface, which has the count (solving the free-counting problem) and
> > the full call stack info.
>
> Understood.
>
> > And for the extra parameter cost issue, I rethought it, and we can
> > leverage slab_alloc_node() to solve it; the patch is much simpler now,
> > without adding a new parameter:
> >
> > ---
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b1281b8654bd3..ce4568dbb0f2d 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -271,6 +271,7 @@ struct track {
> >  #endif
> >  	int cpu;		/* Was running on cpu */
> >  	int pid;		/* Pid context */
> > +	unsigned long waste;	/* memory waste for a kmalloc-ed object */
> >  	unsigned long when;	/* When did the operation occur */
> >  };
> >
> > @@ -3240,6 +3241,16 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
> >  	init = slab_want_init_on_alloc(gfpflags, s);
> >
> >  out:
> > +
> > +#ifdef CONFIG_SLUB_DEBUG
> > +	if (object && s->object_size != orig_size) {
> > +		struct track *track;
> > +
> > +		track = get_track(s, object, TRACK_ALLOC);
> > +		track->waste = s->object_size - orig_size;
> > +	}
> > +#endif
> > +
>
> This scares me. It does not check whether the cache has the
> SLAB_STORE_USER flag.

Yes, I missed that.

> Also, CONFIG_SLUB_DEBUG is enabled by default, which means this still
> goes against not affecting the non-debugging case.

Yes, logically this debug stuff can be put together in a low-level
function.
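
For illustration only, an untested variant of the hunk above that
reuses the existing kmem_cache_debug_flags() helper would at least
restrict it to caches that actually store track data:

#ifdef CONFIG_SLUB_DEBUG
	/*
	 * Only record the waste when the cache has SLAB_STORE_USER set,
	 * i.e. when the track data written above actually exists.
	 */
	if (kmem_cache_debug_flags(s, SLAB_STORE_USER) &&
	    object && s->object_size != orig_size) {
		struct track *track = get_track(s, object, TRACK_ALLOC);

		track->waste = s->object_size - orig_size;
	}
#endif
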
> I like v1 more than the modified version.

I see, thanks.

- Feng

> Thanks,
> Hyeonggon
>
> >  	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
> >
> >  	return object;
> > @@ -5092,6 +5103,7 @@ struct location {
> >  	depot_stack_handle_t handle;
> >  	unsigned long count;
> >  	unsigned long addr;
> > +	unsigned long waste;
> >  	long long sum_time;
> >  	long min_time;
> >  	long max_time;
> > @@ -5142,7 +5154,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
> >  {
> >  	long start, end, pos;
> >  	struct location *l;
> > -	unsigned long caddr, chandle;
> > +	unsigned long caddr, chandle, cwaste;
> >  	unsigned long age = jiffies - track->when;
> >  	depot_stack_handle_t handle = 0;
> >
> > @@ -5162,11 +5174,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
> >  		if (pos == end)
> >  			break;
> >
> > -		caddr = t->loc[pos].addr;
> > -		chandle = t->loc[pos].handle;
> > -		if ((track->addr == caddr) && (handle == chandle)) {
> > +		l = &t->loc[pos];
> > +		caddr = l->addr;
> > +		chandle = l->handle;
> > +		cwaste = l->waste;
> > +		if ((track->addr == caddr) && (handle == chandle) &&
> > +		    (track->waste == cwaste)) {
> >
> > -			l = &t->loc[pos];
> >  			l->count++;
> >  			if (track->when) {
> >  				l->sum_time += age;
> > @@ -5191,6 +5205,9 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
> >  			end = pos;
> >  		else if (track->addr == caddr && handle < chandle)
> >  			end = pos;
> > +		else if (track->addr == caddr && handle == chandle &&
> > +			 track->waste < cwaste)
> > +			end = pos;
> >  		else
> >  			start = pos;
> >  	}
> > @@ -5214,6 +5231,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
> >  		l->min_pid = track->pid;
> >  		l->max_pid = track->pid;
> >  		l->handle = handle;
> > +		l->waste = track->waste;
> >  		cpumask_clear(to_cpumask(l->cpus));
> >  		cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
> >  		nodes_clear(l->nodes);
> > @@ -6102,6 +6120,10 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
> >  		else
> >  			seq_puts(seq, "");
> >
> > +		if (l->waste)
> > +			seq_printf(seq, " waste=%lu/%lu",
> > +				   l->count * l->waste, l->waste);
> > +
> >  		if (l->sum_time != l->min_time) {
> >  			seq_printf(seq, " age=%ld/%llu/%ld",
> >  				   l->min_time, div_u64(l->sum_time, l->count),
> >
> > Thanks,
> > Feng
> >
> > > Thanks,
> > > Hyeonggon
> > >
> > > > Thanks,
> > > > Feng
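
(For completeness: with the hunks above, an alloc_traces entry in
/sys/kernel/debug/slab/kmalloc-2k/ for the tty_buffer_alloc example
earlier would roughly gain a field of the form

	waste=<count * 992>/992

i.e. total waste / per-object waste, since kmalloc-2k's object_size is
2048 and the request was 1056 bytes; the rest of the line stays whatever
slab_debugfs_show() already prints.)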