From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 27 Aug 2024 10:23:18 -0700
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
 Vlastimil Babka, David Rientjes, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 Eric Dumazet, "David S. Miller", Jakub Kicinski, Paolo Abeni,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Meta kernel team,
 cgroups@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v1] memcg: add charging of already allocated slab objects
References: <20240826232908.4076417-1-shakeel.butt@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote:
> On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
> > At the moment, slab objects are charged to the memcg at allocation
> > time. However, there are cases where slab objects are allocated at a
> > time when the right target memcg to charge them to is not known. One
> > such case is network sockets for incoming connections, which are
> > allocated in softirq context.
> >
> > A couple hundred thousand connections are very normal on a large
> > loaded server, and almost all of the sockets underlying those
> > connections get allocated in softirq context and thus are not charged
> > to any memcg. However, later at accept() time we know the right
> > target memcg to charge. Let's add a new API to charge already
> > allocated objects, so we can have better accounting of the memory
> > usage.
> >
> > To measure the performance impact of this change, tcp_crr from the
> > neper [1] performance suite is used. Basically it is a network
> > ping-pong test with a new connection for each ping-pong.
> >
> > The server and the client are run inside a 3-level cgroup hierarchy
> > using the following commands:
> >
> > Server:
> >  $ tcp_crr -6
> >
> > Client:
> >  $ tcp_crr -6 -c -H ${server_ip}
> >
> > If the client and server run on different machines with a 50 Gbps
> > NIC, there is no visible impact from the change.
> >
> > For the same-machine experiment with v6.11-rc5 as base:
> >
> >          base (throughput)  with-patch
> > tcp_crr  14545 (+- 80)      14463 (+- 56)
> >
> > It seems like the performance impact is within the noise.
> >
> > Link: https://github.com/google/neper [1]
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> Hi Shakeel,
>
> I like the idea and the performance numbers look good. However, some
> comments on the implementation:
>

Thanks for taking a look.

> > ---
> >
> > Changes since the RFC:
> > - Added check for already charged slab objects.
> > - Added performance results from neper's tcp_crr
> >
> >  include/linux/slab.h            |  1 +
> >  mm/slub.c                       | 54 +++++++++++++++++++++++++++++++++
> >  net/ipv4/inet_connection_sock.c |  5 +--
> >  3 files changed, 58 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index eb2bf4629157..05cfab107c72 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> >  			   gfp_t gfpflags) __assume_slab_alignment __malloc;
> >  #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
> >
> > +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
> >  void kmem_cache_free(struct kmem_cache *s, void *objp);
> >
> >  kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
> > diff --git a/mm/slub.c b/mm/slub.c
> > index c9d8a2497fd6..580683597b5c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> >
> >  	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
> >  }
> > +
> > +static __fastpath_inline
> > +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
> > +{
> > +	if (likely(!memcg_kmem_online()))
> > +		return true;
>
> We do have this check in kmem_cache_charge(), why do we need to check it
> again?
>

I missed removing this one. I am going to rearrange the code a bit more
in these functions to avoid the build errors in non-MEMCG builds.
> > +
> > +	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> > +}
> > +
> >  #else /* CONFIG_MEMCG */
> >  static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
> >  					      struct list_lru *lru,
> > @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> >  					 void **p, int objects)
> >  {
> >  }
> > +
> > +static inline bool memcg_slab_post_charge(struct kmem_cache *s,
> > +					  void *p,
> > +					  gfp_t flags)
> > +{
> > +	return true;
> > +}
> >  #endif /* CONFIG_MEMCG */
> >
> >  /*
> > @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> >  }
> >  EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
> >
> > +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
> > +		      SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
> > +
> > +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
> > +{
> > +	struct slabobj_ext *slab_exts;
> > +	struct kmem_cache *s;
> > +	struct folio *folio;
> > +	struct slab *slab;
> > +	unsigned long off;
> > +
> > +	if (!memcg_kmem_online())
> > +		return true;
> > +
> > +	folio = virt_to_folio(objp);
> > +	if (unlikely(!folio_test_slab(folio)))
> > +		return false;
>
> Does it handle the case of a too-big-to-be-a-slab-object allocation?
> I think it's better to handle it properly. Also, why return false here?
>

Yes, I will fix the too-big-to-be-a-slab-object allocations. I presume I
should just follow the kfree() handling on !folio_test_slab(), i.e. that
the given object is a large or too-big-to-be-a-slab-object allocation.

> > +
> > +	slab = folio_slab(folio);
> > +	s = slab->slab_cache;
> > +
> > +	/* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> > +	if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
> > +		return true;
>
> And true here? It seems to be a bit inconsistent.

This will be consistent after the handling of the
too-big-to-be-a-slab-object case.

> Also, if we have this check here, it means your function won't handle
> kmallocs at all?
> Because !KMALLOC_NORMAL allocations won't get here.

The non-KMALLOC_NORMAL kmalloc caches should also have one of the
SLAB_CACHE_DMA, SLAB_ACCOUNT and SLAB_RECLAIM_ACCOUNT flags, so the
above check will only be true for KMALLOC_NORMAL caches.

> > +
> > +	/* Ignore already charged objects. */
> > +	slab_exts = slab_obj_exts(slab);
> > +	if (slab_exts) {
> > +		off = obj_to_index(s, slab, objp);
> > +		if (unlikely(slab_exts[off].objcg))
> > +			return true;
> > +	}
> > +
> > +	return memcg_slab_post_charge(s, objp, gfpflags);
> > +}
> > +EXPORT_SYMBOL(kmem_cache_charge);
> > +
> >  /**
> >   * kmem_cache_alloc_node - Allocate an object on the specified node
> >   * @s: The cache to allocate from.
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 64d07b842e73..3c13ca8c11fb 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
> >  	release_sock(sk);
> >  	if (newsk && mem_cgroup_sockets_enabled) {
> >  		int amt = 0;
> > +		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
> >
> >  		/* atomically get the memory usage, set and charge the
> >  		 * newsk->sk_memcg.
> > @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
> >  		}
> >
> >  		if (amt)
> > -			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> > -						GFP_KERNEL | __GFP_NOFAIL);
> > +			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> > +		kmem_cache_charge(newsk, gfp);
>
> Wait, so we assume that newsk->sk_memcg === current memcg? Or we're ok
> with them being different?

We set newsk->sk_memcg in the same function (see the call to
mem_cgroup_sk_alloc(newsk) a couple of lines above), so newsk->sk_memcg
will be equal to the current memcg.

Thanks a lot for the valuable feedback.

Shakeel