From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 22 Dec 2022 06:52:06 -0500
From: Brian Foster <bfoster@redhat.com>
To: Roman Gushchin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt,
	Johannes Weiner, Michal Hocko, Muchun Song, Andrew Morton,
	Waiman Long, Sven Luther
Subject: Re: [PATCH RFC] ipc/mqueue: introduce msg cache
References: <20221220184813.1908318-1-roman.gushchin@linux.dev>
In-Reply-To: <20221220184813.1908318-1-roman.gushchin@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Tue, Dec 20, 2022 at 10:48:13AM -0800, Roman Gushchin wrote:
> Sven Luther reported a regression in the POSIX message queues
> performance caused by switching to the per-object tracking of
> slab objects introduced by the patch series ending with the
> commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches
> for all allocations").
> 
> To mitigate the regression, cache allocated mqueue messages on a small
> percpu cache instead of releasing and re-allocating them every time.
> 
> This change brings the performance measured by a benchmark kindly
> provided by Sven [1] almost back to the original (or even better)
> numbers. Measurement results are also provided by Sven.
> 
> +------------+---------------------------------+--------------------------------+
> | kernel     |    mqueue_rcv (ns)    variation |   mqueue_send (ns)   variation |
> | version    | min avg   max      min     avg  | min avg   max     min     avg  |
> +------------+---------------------------------+--------------------------------+
> | 4.18.45    | 351 382 17533     base    base  | 383 410 13178    base    base  |
> | 5.8-good   | 380 392  7156   -7,63%  -2,55%  | 376 384  6225   1,86%   6,77%  |
> | 5.8-bad    | 524 530  5310  -33,02% -27,92%  | 512 519  8775 -25,20% -21,00%  |
> | 5.10       | 520 533  4078  -32,20% -28,33%  | 518 534  8108 -26,06% -23,22%  |
> | 5.15       | 431 444  8440  -18,56% -13,96%  | 425 437  6170  -9,88%  -6,18%  |
> | 6.0.3      | 474 614  3881  -25,95% -37,79%  | 482 693   931 -20,54% -40,84%  |
> | 6.1-rc8    | 496 509  8804  -29,23% -24,95%  | 493 512  5748 -22,31% -19,92%  |
> | 6.1-rc8+p  | 392 397  5479  -10,46%  -3,78%  | 364 369 10776   5,22%  11,11%  |
> +------------+---------------------------------+--------------------------------+
> 
> The last line reflects the result with this patch ("6.1-rc8+p").
> 
> [1]: https://lore.kernel.org/linux-mm/Y46lqCToUa%2FBgt%2Fc@P9FQF9L96D/T/
> 
> Reported-by: Sven Luther
> Tested-by: Sven Luther
> Signed-off-by: Roman Gushchin
> ---
>  include/linux/memcontrol.h |  12 +++++
>  ipc/mqueue.c               |  20 ++++++--
>  ipc/msg.c                  |  12 ++---
>  ipc/msgutil.c              | 101 +++++++++++++++++++++++++++++++++----
>  ipc/util.h                 |   8 ++-
>  mm/memcontrol.c            |  62 +++++++++++++++++++++++
>  6 files changed, 194 insertions(+), 21 deletions(-)
...
> diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> index d0a0e877cadd..8667708fc00a 100644
> --- a/ipc/msgutil.c
> +++ b/ipc/msgutil.c
...
> @@ -39,16 +40,76 @@ struct msg_msgseg {
...
> +static struct msg_msg *alloc_msg(size_t len, struct msg_cache *cache)
>  {
>  	struct msg_msg *msg;
>  	struct msg_msgseg **pseg;
>  	size_t alen;
>  
> +	if (cache) {
> +		struct pcpu_msg_cache *pc;
> +
> +		msg = NULL;
> +		pc = get_cpu_ptr(cache->pcpu_cache);
> +		if (pc->msg && pc->len == len) {
> +			struct obj_cgroup *objcg;
> +
> +			rcu_read_lock();
> +			objcg = obj_cgroup_from_current();
> +			if (objcg == pc->objcg) {
> +				msg = pc->msg;
> +				pc->msg = NULL;
> +				obj_cgroup_put(pc->objcg);
> +			}
> +			rcu_read_unlock();
> +		}
> +		put_cpu_ptr(cache->pcpu_cache);
> +		if (msg)
> +			return msg;
> +	}
> +
>  	alen = min(len, DATALEN_MSG);
>  	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL_ACCOUNT);
>  	if (msg == NULL)
> @@ -77,18 +138,19 @@ static struct msg_msg *alloc_msg(size_t len)
>  		return msg;
>  
>  out_err:
> -	free_msg(msg);
> +	free_msg(msg, NULL);
>  	return NULL;
>  }
>  
> -struct msg_msg *load_msg(const void __user *src, size_t len)
> +struct msg_msg *load_msg(const void __user *src, size_t len,
> +			 struct msg_cache *cache)
>  {
>  	struct msg_msg *msg;
>  	struct msg_msgseg *seg;
>  	int err = -EFAULT;
>  	size_t alen;
>  
> -	msg = alloc_msg(len);
> +	msg = alloc_msg(len, cache);
>  	if (msg == NULL)
>  		return ERR_PTR(-ENOMEM);
>  
> @@ -104,14 +166,16 @@ struct msg_msg *load_msg(const void __user *src, size_t len)
>  		goto out_err;
>  	}
>  
> -	err = security_msg_msg_alloc(msg);
> -	if (err)
> -		goto out_err;
> +	if (!msg->security) {
> +		err = security_msg_msg_alloc(msg);
> +		if (err)
> +			goto out_err;
> +	}
>  
>  	return msg;
>  
>  out_err:
> -	free_msg(msg);
> +	free_msg(msg, NULL);
>  	return ERR_PTR(err);
>  }
>  #ifdef CONFIG_CHECKPOINT_RESTORE
> @@ -166,10 +230,29 @@ int store_msg(void __user *dest, struct msg_msg *msg, size_t len)
>  	return 0;
>  }
>  
> -void free_msg(struct msg_msg *msg)
> +void free_msg(struct msg_msg *msg, struct msg_cache *cache)
>  {
>  	struct msg_msgseg *seg;
>  
> +	if (cache) {
> +		struct pcpu_msg_cache *pc;
> +		bool cached = false;
> +
> +		pc = get_cpu_ptr(cache->pcpu_cache);
> +		if (!pc->msg) {
> +			pc->objcg = get_obj_cgroup_from_slab_obj(msg);
> +			pc->len = msg->m_ts;
> +			pc->msg = msg;
> +
> +			if (pc->objcg)
> +				cached = true;
> +		}

Hi Roman,

It seems that this is kind of tailored to the ideal conditions implemented
by the test case: i.e., a single, fixed-size message being passed back and
forth on a single cpu. Does that actually represent the production
workload?

I'm a little curious if/how this might work for workloads that involve
more variable-sized messages, deeper queue depths (i.e., sending more than
one message before attempting a recv), and more tasks across different
cpus. For example, it looks like if an "uncommonly" sized message ended up
cached on a cpu, this would always result in subsequent misses, because
the alloc side requires an exact size match and the free side never
replaces a cached msg. Hm?
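To make that concrete, here's a toy model of the single-slot cache policy
(plain Python, invented names; it deliberately ignores the objcg ownership
check, preemption, and everything else about the actual kernel code):

```python
# Toy model of the patch's single-slot per-cpu message cache: an exact
# size match on the alloc side, fill-only-when-empty on the free side.

class PcpuMsgCache:
    def __init__(self):
        self.msg = None      # at most one cached message per cpu
        self.len = 0         # size of the cached message
        self.hits = 0
        self.misses = 0

    def alloc(self, length):
        # Mirrors alloc_msg(): a hit requires pc->len == len exactly.
        if self.msg is not None and self.len == length:
            self.hits += 1
            msg, self.msg = self.msg, None
            return msg
        self.misses += 1
        return ("msg", length)   # stands in for a fresh kmalloc()

    def free(self, msg):
        # Mirrors free_msg(): the slot is filled only when empty;
        # a cached message is never replaced/evicted.
        if self.msg is None:
            self.msg, self.len = msg, msg[1]
        # otherwise the message is actually freed

cache = PcpuMsgCache()

# An "uncommonly" sized 64-byte message is sent once and gets cached...
cache.free(cache.alloc(64))

# ...after which the common 128-byte traffic misses on every cycle:
# alloc() can't use the 64-byte message, and free() never evicts it.
for _ in range(4):
    cache.free(cache.alloc(128))

print(cache.hits, cache.misses)  # -> 0 5
```

Under the benchmark's single fixed size every cycle after the first is a
hit; any size mix pins the slot to whichever size was freed first.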
Brian

> +		put_cpu_ptr(cache->pcpu_cache);
> +
> +		if (cached)
> +			return;
> +	}
> +
>  	security_msg_msg_free(msg);
>  
>  	seg = msg->next;
> diff --git a/ipc/util.h b/ipc/util.h
> index b2906e366539..a2da266386aa 100644
> --- a/ipc/util.h
> +++ b/ipc/util.h
> @@ -196,8 +196,12 @@ static inline void ipc_update_pid(struct pid **pos, struct pid *pid)
>  int ipc_parse_version(int *cmd);
>  #endif
>  
> -extern void free_msg(struct msg_msg *msg);
> -extern struct msg_msg *load_msg(const void __user *src, size_t len);
> +struct msg_cache;
> +extern int init_msg_cache(struct msg_cache *cache);
> +extern void free_msg_cache(struct msg_cache *cache);
> +extern void free_msg(struct msg_msg *msg, struct msg_cache *cache);
> +extern struct msg_msg *load_msg(const void __user *src, size_t len,
> +				struct msg_cache *cache);
>  extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst);
>  extern int store_msg(void __user *dest, struct msg_msg *msg, size_t len);
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a1a35c12635e..28528b4da0fb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3004,6 +3004,28 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
>  	return objcg;
>  }
>  
> +__always_inline struct obj_cgroup *obj_cgroup_from_current(void)
> +{
> +	struct obj_cgroup *objcg = NULL;
> +	struct mem_cgroup *memcg;
> +
> +	if (memcg_kmem_bypass())
> +		return NULL;
> +
> +	if (unlikely(active_memcg()))
> +		memcg = active_memcg();
> +	else
> +		memcg = mem_cgroup_from_task(current);
> +
> +	for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
> +		objcg = rcu_dereference(memcg->objcg);
> +		if (likely(objcg))
> +			return objcg;
> +	}
> +
> +	return NULL;
> +}
> +
>  __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>  {
>  	struct obj_cgroup *objcg = NULL;
> @@ -3046,6 +3068,46 @@ struct obj_cgroup *get_obj_cgroup_from_page(struct page *page)
>  	return objcg;
>  }
>  
> +/*
> + * The passed kernel object must be a slab object or a generic kernel
> + * page. Not suitable for objects allocated using vmalloc().
> + */
> +struct obj_cgroup *get_obj_cgroup_from_slab_obj(void *p)
> +{
> +	struct obj_cgroup *objcg = NULL;
> +	struct folio *folio;
> +
> +	if (mem_cgroup_disabled())
> +		return NULL;
> +
> +	folio = virt_to_folio(p);
> +	/*
> +	 * The object can be either a true slab object, which is accounted
> +	 * individually with objcg pointers stored in a separate objcg
> +	 * array, or it can be a generic folio with the MEMCG_DATA_KMEM
> +	 * flag set.
> +	 */
> +	if (folio_test_slab(folio)) {
> +		struct obj_cgroup **objcgs;
> +		struct slab *slab;
> +		unsigned int off;
> +
> +		slab = folio_slab(folio);
> +		objcgs = slab_objcgs(slab);
> +		if (!objcgs)
> +			return NULL;
> +
> +		off = obj_to_index(slab->slab_cache, slab, p);
> +		objcg = objcgs[off];
> +	} else {
> +		objcg = __folio_objcg(folio);
> +	}
> +
> +	if (objcg)
> +		obj_cgroup_get(objcg);
> +
> +	return objcg;
> +}
> +
>  static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
>  {
>  	mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
> -- 
> 2.39.0
> 