From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 22 Dec 2022 06:52:06 -0500
From: Brian Foster <bfoster@redhat.com>
To: Roman Gushchin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt,
	Johannes Weiner, Michal Hocko, Muchun Song, Andrew Morton,
	Waiman Long, Sven Luther
Subject: Re: [PATCH RFC] ipc/mqueue: introduce msg cache
References: <20221220184813.1908318-1-roman.gushchin@linux.dev>
In-Reply-To: <20221220184813.1908318-1-roman.gushchin@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Tue, Dec 20, 2022 at 10:48:13AM -0800, Roman Gushchin wrote:
> Sven Luther reported a regression in the POSIX message queues
> performance caused by switching to the per-object tracking of
> slab objects introduced by the patch series ending with the
> commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches
> for all allocations").
> 
> To mitigate the regression, cache allocated mqueue messages on a small
> percpu cache instead of releasing and re-allocating them every time.
> 
> This change brings the performance measured by a benchmark kindly
> provided by Sven [1] almost back to the original (or even better)
> numbers. Measurement results are also provided by Sven.
> 
> +------------+---------------------------------+--------------------------------+
> | kernel     |    mqueue_rcv (ns)    variation |   mqueue_send (ns)   variation |
> | version    | min avg   max      min     avg  | min avg   max     min     avg  |
> +------------+---------------------------------+--------------------------------+
> | 4.18.45    | 351 382 17533     base    base  | 383 410 13178    base    base  |
> | 5.8-good   | 380 392  7156   -7,63%  -2,55%  | 376 384  6225   1,86%   6,77%  |
> | 5.8-bad    | 524 530  5310  -33,02% -27,92%  | 512 519  8775 -25,20% -21,00%  |
> | 5.10       | 520 533  4078  -32,20% -28,33%  | 518 534  8108 -26,06% -23,22%  |
> | 5.15       | 431 444  8440  -18,56% -13,96%  | 425 437  6170  -9,88%  -6,18%  |
> | 6.0.3      | 474 614  3881  -25,95% -37,79%  | 482 693   931 -20,54% -40,84%  |
> | 6.1-rc8    | 496 509  8804  -29,23% -24,95%  | 493 512  5748 -22,31% -19,92%  |
> | 6.1-rc8+p  | 392 397  5479  -10,46%  -3,78%  | 364 369 10776   5,22%  11,11%  |
> +------------+---------------------------------+--------------------------------+
> 
> The last line reflects the result with this patch ("6.1-rc8+p").
> 
> [1]: https://lore.kernel.org/linux-mm/Y46lqCToUa%2FBgt%2Fc@P9FQF9L96D/T/
> 
> Reported-by: Sven Luther
> Tested-by: Sven Luther
> Signed-off-by: Roman Gushchin
> ---
>  include/linux/memcontrol.h |  12 +++++
>  ipc/mqueue.c               |  20 ++++++--
>  ipc/msg.c                  |  12 ++---
>  ipc/msgutil.c              | 101 +++++++++++++++++++++++++++++++++----
>  ipc/util.h                 |   8 ++-
>  mm/memcontrol.c            |  62 +++++++++++++++++++++++
>  6 files changed, 194 insertions(+), 21 deletions(-)
...
> diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> index d0a0e877cadd..8667708fc00a 100644
> --- a/ipc/msgutil.c
> +++ b/ipc/msgutil.c
...
> @@ -39,16 +40,76 @@ struct msg_msgseg {
...
> +static struct msg_msg *alloc_msg(size_t len, struct msg_cache *cache)
>  {
>  	struct msg_msg *msg;
>  	struct msg_msgseg **pseg;
>  	size_t alen;
>  
> +	if (cache) {
> +		struct pcpu_msg_cache *pc;
> +
> +		msg = NULL;
> +		pc = get_cpu_ptr(cache->pcpu_cache);
> +		if (pc->msg && pc->len == len) {
> +			struct obj_cgroup *objcg;
> +
> +			rcu_read_lock();
> +			objcg = obj_cgroup_from_current();
> +			if (objcg == pc->objcg) {
> +				msg = pc->msg;
> +				pc->msg = NULL;
> +				obj_cgroup_put(pc->objcg);
> +			}
> +			rcu_read_unlock();
> +		}
> +		put_cpu_ptr(cache->pcpu_cache);
> +		if (msg)
> +			return msg;
> +	}
> +
>  	alen = min(len, DATALEN_MSG);
>  	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL_ACCOUNT);
>  	if (msg == NULL)
> @@ -77,18 +138,19 @@ static struct msg_msg *alloc_msg(size_t len)
>  		return msg;
>  
>  out_err:
> -	free_msg(msg);
> +	free_msg(msg, NULL);
>  	return NULL;
>  }
>  
> -struct msg_msg *load_msg(const void __user *src, size_t len)
> +struct msg_msg *load_msg(const void __user *src, size_t len,
> +			 struct msg_cache *cache)
>  {
>  	struct msg_msg *msg;
>  	struct msg_msgseg *seg;
>  	int err = -EFAULT;
>  	size_t alen;
>  
> -	msg = alloc_msg(len);
> +	msg = alloc_msg(len, cache);
>  	if (msg == NULL)
>  		return ERR_PTR(-ENOMEM);
>  
> @@ -104,14 +166,16 @@ struct msg_msg *load_msg(const void __user *src, size_t len)
>  		goto out_err;
>  	}
>  
> -	err = security_msg_msg_alloc(msg);
> -	if (err)
> -		goto out_err;
> +	if (!msg->security) {
> +		err = security_msg_msg_alloc(msg);
> +		if (err)
> +			goto out_err;
> +	}
>  
>  	return msg;
>  
>  out_err:
> -	free_msg(msg);
> +	free_msg(msg, NULL);
>  	return ERR_PTR(err);
>  }
>  #ifdef CONFIG_CHECKPOINT_RESTORE
> @@ -166,10 +230,29 @@ int store_msg(void __user *dest, struct msg_msg *msg, size_t len)
>  	return 0;
>  }
>  
> -void free_msg(struct msg_msg *msg)
> +void free_msg(struct msg_msg *msg, struct msg_cache *cache)
>  {
>  	struct msg_msgseg *seg;
>  
> +	if (cache) {
> +		struct pcpu_msg_cache *pc;
> +		bool cached = false;
> +
> +		pc = get_cpu_ptr(cache->pcpu_cache);
> +		if (!pc->msg) {
> +			pc->objcg = get_obj_cgroup_from_slab_obj(msg);
> +			pc->len = msg->m_ts;
> +			pc->msg = msg;
> +
> +			if (pc->objcg)
> +				cached = true;
> +		}

Hi Roman,

It seems that this is kind of tailored to the ideal conditions implemented
by the test case: i.e., a single, fixed-size message being passed back and
forth on a single cpu. Does that actually represent the production
workload?

I'm a little curious if/how this might work for workloads that involve
more variable-sized messages, deeper queue depths (i.e., sending more than
one message before attempting a recv), and more tasks across different
cpus. For example, it looks like if an "uncommonly" sized message ended up
cached on a cpu, this would always result in subsequent misses, because
the alloc side requires an exact size match and the free side never
replaces a cached msg. Hm?
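To make that concrete, here's a toy model of the single-slot cache policy
(plain Python, invented names; it deliberately ignores the objcg ownership
check, preemption, and everything else about the actual kernel code):

```python
# Toy model of the patch's single-slot per-cpu message cache: an exact
# size match on the alloc side, fill-only-when-empty on the free side.

class PcpuMsgCache:
    def __init__(self):
        self.msg = None      # at most one cached message per cpu
        self.len = 0         # size of the cached message
        self.hits = 0
        self.misses = 0

    def alloc(self, length):
        # Mirrors alloc_msg(): a hit requires pc->len == len exactly.
        if self.msg is not None and self.len == length:
            self.hits += 1
            msg, self.msg = self.msg, None
            return msg
        self.misses += 1
        return ("msg", length)   # stands in for a fresh kmalloc()

    def free(self, msg):
        # Mirrors free_msg(): the slot is filled only when empty;
        # a cached message is never replaced/evicted.
        if self.msg is None:
            self.msg, self.len = msg, msg[1]
        # otherwise the message is actually freed

cache = PcpuMsgCache()

# An "uncommonly" sized 64-byte message is sent once and gets cached...
cache.free(cache.alloc(64))

# ...after which the common 128-byte traffic misses on every cycle:
# alloc() can't use the 64-byte message, and free() never evicts it.
for _ in range(4):
    cache.free(cache.alloc(128))

print(cache.hits, cache.misses)  # -> 0 5
```

Under the benchmark's single fixed size every cycle after the first is a
hit; any size mix pins the slot to whichever size was freed first.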
Brian

> +		put_cpu_ptr(cache->pcpu_cache);
> +
> +		if (cached)
> +			return;
> +	}
> +
>  	security_msg_msg_free(msg);
>  
>  	seg = msg->next;
> diff --git a/ipc/util.h b/ipc/util.h
> index b2906e366539..a2da266386aa 100644
> --- a/ipc/util.h
> +++ b/ipc/util.h
> @@ -196,8 +196,12 @@ static inline void ipc_update_pid(struct pid **pos, struct pid *pid)
>  int ipc_parse_version(int *cmd);
>  #endif
>  
> -extern void free_msg(struct msg_msg *msg);
> -extern struct msg_msg *load_msg(const void __user *src, size_t len);
> +struct msg_cache;
> +extern int init_msg_cache(struct msg_cache *cache);
> +extern void free_msg_cache(struct msg_cache *cache);
> +extern void free_msg(struct msg_msg *msg, struct msg_cache *cache);
> +extern struct msg_msg *load_msg(const void __user *src, size_t len,
> +				struct msg_cache *cache);
>  extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst);
>  extern int store_msg(void __user *dest, struct msg_msg *msg, size_t len);
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a1a35c12635e..28528b4da0fb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3004,6 +3004,28 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
>  	return objcg;
>  }
>  
> +__always_inline struct obj_cgroup *obj_cgroup_from_current(void)
> +{
> +	struct obj_cgroup *objcg = NULL;
> +	struct mem_cgroup *memcg;
> +
> +	if (memcg_kmem_bypass())
> +		return NULL;
> +
> +	if (unlikely(active_memcg()))
> +		memcg = active_memcg();
> +	else
> +		memcg = mem_cgroup_from_task(current);
> +
> +	for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
> +		objcg = rcu_dereference(memcg->objcg);
> +		if (likely(objcg))
> +			return objcg;
> +	}
> +
> +	return NULL;
> +}
> +
>  __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>  {
>  	struct obj_cgroup *objcg = NULL;
> @@ -3046,6 +3068,46 @@ struct obj_cgroup *get_obj_cgroup_from_page(struct page *page)
>  	return objcg;
>  }
>  
> +/*
> + * The passed kernel object must be a slab object or a generic kernel
> + * page. Not suitable for objects allocated using vmalloc().
> + */
> +struct obj_cgroup *get_obj_cgroup_from_slab_obj(void *p)
> +{
> +	struct obj_cgroup *objcg = NULL;
> +	struct folio *folio;
> +
> +	if (mem_cgroup_disabled())
> +		return NULL;
> +
> +	folio = virt_to_folio(p);
> +	/*
> +	 * The object can be either a true slab object, which is accounted
> +	 * individually with objcg pointers stored in a separate objcg
> +	 * array, or it can be a generic folio with the MEMCG_DATA_KMEM
> +	 * flag set.
> +	 */
> +	if (folio_test_slab(folio)) {
> +		struct obj_cgroup **objcgs;
> +		struct slab *slab;
> +		unsigned int off;
> +
> +		slab = folio_slab(folio);
> +		objcgs = slab_objcgs(slab);
> +		if (!objcgs)
> +			return NULL;
> +
> +		off = obj_to_index(slab->slab_cache, slab, p);
> +		objcg = objcgs[off];
> +	} else {
> +		objcg = __folio_objcg(folio);
> +	}
> +
> +	if (objcg)
> +		obj_cgroup_get(objcg);
> +
> +	return objcg;
> +}
> +
>  static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
>  {
>  	mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
> -- 
> 2.39.0
> 