From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 5 Sep 2024 10:48:50 -0700
Subject: Re: [PATCH v4] memcg: add charging of already allocated slab objects
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton, Vlastimil Babka, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, David Rientjes, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Eric Dumazet, "David S. Miller", Jakub Kicinski, Paolo Abeni, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Meta kernel team, cgroups@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20240905173422.1565480-1-shakeel.butt@linux.dev>

On Thu, Sep 5, 2024 at 10:34 AM Shakeel Butt wrote:
>
> At the moment, slab objects are charged to the memcg at
> allocation time. However, there are cases where slab objects are
> allocated at a time when the right target memcg to charge them to is
> not known. One such case is the network sockets for incoming
> connections, which are allocated in softirq context.
>
> A couple hundred thousand connections are very normal on a large loaded
> server, and almost all of the sockets underlying those connections get
> allocated in softirq context and are thus not charged to any memcg.
> However, later at accept() time we know the right target memcg to
> charge. Let's add a new API to charge already allocated objects, so we
> can have better accounting of the memory usage.
>
> To measure the performance impact of this change, tcp_crr is used from
> the neper [1] performance suite. Basically it is a network ping pong
> test with a new connection for each ping pong.
>
> The server and the client are run inside a 3-level cgroup hierarchy
> using the following commands:
>
> Server:
> $ tcp_crr -6
>
> Client:
> $ tcp_crr -6 -c -H ${server_ip}
>
> If the client and server run on different machines with a 50 Gbps NIC,
> there is no visible impact of the change.
>
> For the same machine experiment with v6.11-rc5 as base:
>
>              base (throughput)    with-patch
> tcp_crr      14545 (+- 80)        14463 (+- 56)
>
> It seems like the performance impact is within the noise.
>
> Link: https://github.com/google/neper [1]
> Signed-off-by: Shakeel Butt
> Reviewed-by: Roman Gushchin

LGTM from an MM perspective with a few nits below. FWIW:

Reviewed-by: Yosry Ahmed

> ---
> v3: https://lore.kernel.org/all/20240829175339.2424521-1-shakeel.butt@linux.dev/
> Changes since v3:
> - Add kernel doc for kmem_cache_charge.
>
> v2: https://lore.kernel.org/all/20240827235228.1591842-1-shakeel.butt@linux.dev/
> Changes since v2:
> - Add handling of already charged large kmalloc objects.
> - Move the normal kmalloc cache check into a function.
>
> v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@linux.dev/
> Changes since v1:
> - Correctly handle large allocations which bypass slab
> - Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds
>
> RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@linux.dev/
> Changes since the RFC:
> - Added check for already charged slab objects.
> - Added performance results from neper's tcp_crr
>
>
>  include/linux/slab.h            | 20 ++++++++++++++
>  mm/slab.h                       |  7 +++++
>  mm/slub.c                       | 49 +++++++++++++++++++++++++++++++++
>  net/ipv4/inet_connection_sock.c |  5 ++--
>  4 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index eb2bf4629157..68789c79a530 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -547,6 +547,26 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>                            gfp_t gfpflags) __assume_slab_alignment __malloc;
>  #define kmem_cache_alloc_lru(...)      alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>
> +/**
> + * kmem_cache_charge - memcg charge an already allocated slab memory
> + * @objp: address of the slab object to memcg charge.
> + * @gfpflags: describe the allocation context
> + *
> + * kmem_cache_charge is the normal method to charge a slab object to the current
> + * memcg. The objp should be pointer returned by the slab allocator functions
> + * like kmalloc or kmem_cache_alloc. The memcg charge behavior can be controller
> + * through gfpflags parameter.

s/controller/controlled

> + *
> + * There are several cases where it will return true regardless. More
> + * specifically:
> + *
> + * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
> + * 2. Already charged slab objects.
> + * 3. For slab objects from KMALLOC_NORMAL caches.
> + *
> + * Return: true if charge was successful otherwise false.
> + */
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
>  void kmem_cache_free(struct kmem_cache *s, void *objp);
>
>  kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
> diff --git a/mm/slab.h b/mm/slab.h
> index dcdb56b8e7f5..9f907e930609 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -443,6 +443,13 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
>         return (s->flags & SLAB_KMALLOC);
>  }
>
> +static inline bool is_kmalloc_normal(struct kmem_cache *s)
> +{
> +       if (!is_kmalloc_cache(s))
> +               return false;
> +       return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
> +}
> +
>  /* Legal flag mask for kmem_cache_create(), for various configurations */
>  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>                          SLAB_CACHE_DMA32 | SLAB_PANIC | \
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..3f2a89f7a23a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2185,6 +2185,41 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>
>         __memcg_slab_free_hook(s, slab, p, objects, obj_exts);
>  }
> +
> +static __fastpath_inline
> +bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +       struct slabobj_ext *slab_exts;
> +       struct kmem_cache *s;
> +       struct folio *folio;
> +       struct slab *slab;
> +       unsigned long off;
> +
> +       folio = virt_to_folio(p);
> +       if (!folio_test_slab(folio)) {
> +               return folio_memcg_kmem(folio) ||

If the folio is charged user memory, we will still double charge here, but that would be a bug. We can put a warning in this case, or use folio_memcg() instead to avoid double charges in that case as well.

> +                       (__memcg_kmem_charge_page(folio_page(folio, 0), flags,
> +                                                 folio_order(folio)) == 0);
> +       }
> +
> +       slab = folio_slab(folio);
> +       s = slab->slab_cache;
> +
> +       /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */

Is it possible to point to the commit that has the explanation here? The one you pointed me to before?
Otherwise it's not really obvious where the circular dependency comes from (at least to me).

> +       if (is_kmalloc_normal(s))
> +               return true;
> +
> +       /* Ignore already charged objects. */
> +       slab_exts = slab_obj_exts(slab);
> +       if (slab_exts) {
> +               off = obj_to_index(s, slab, p);
> +               if (unlikely(slab_exts[off].objcg))
> +                       return true;
> +       }
> +
> +       return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> +}
> +
>  #else /* CONFIG_MEMCG */
>  static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
>                                               struct list_lru *lru,
> @@ -2198,6 +2233,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>                                         void **p, int objects)
>  {
>  }
> +
> +static inline bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +       return true;
> +}
>  #endif /* CONFIG_MEMCG */
>
>  /*
> @@ -4062,6 +4102,15 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
> +{
> +       if (!memcg_kmem_online())
> +               return true;
> +
> +       return memcg_slab_post_charge(objp, gfpflags);
> +}
> +EXPORT_SYMBOL(kmem_cache_charge);
> +
>  /**
>   * kmem_cache_alloc_node - Allocate an object on the specified node
>   * @s: The cache to allocate from.
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 64d07b842e73..3c13ca8c11fb 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>         release_sock(sk);
>         if (newsk && mem_cgroup_sockets_enabled) {
>                 int amt = 0;
> +               gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>
>                 /* atomically get the memory usage, set and charge the
>                  * newsk->sk_memcg.
> @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>                 }
>
>                 if (amt)
> -                       mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> -                                               GFP_KERNEL | __GFP_NOFAIL);
> +                       mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> +               kmem_cache_charge(newsk, gfp);
>
>                 release_sock(newsk);
>         }
> --
> 2.43.5
>
>
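
For readers following along, the allocate-now/charge-later pattern the patch enables can be sketched roughly as below. This is kernel-style C that only compiles in-tree; everything except kmem_cache_alloc() and the new kmem_cache_charge() (the cache, struct, and function names) is hypothetical and just for illustration:

```c
/* Hypothetical subsystem using the new API; my_conn_cachep, struct
 * my_conn, conn_alloc() and conn_accept() are made-up names. */

/* Softirq context: the right memcg is not known yet (charging would hit
 * whatever task the softirq interrupted), so allocate without charging.
 * Note the cache is created without SLAB_ACCOUNT for this to matter. */
static struct my_conn *conn_alloc(void)
{
	return kmem_cache_alloc(my_conn_cachep, GFP_ATOMIC);
}

/* Later, in process context (e.g. at accept() time): "current" is now
 * the right task, so retroactively charge the object to its memcg.
 * Per the kernel doc above, this returns true if the charge succeeded,
 * and also for already-charged objects and KMALLOC_NORMAL caches. */
static int conn_accept(struct my_conn *conn)
{
	if (!kmem_cache_charge(conn, GFP_KERNEL))
		return -ENOMEM;
	return 0;
}
```

The inet_csk_accept() hunk follows the same shape, except it passes __GFP_NOFAIL and ignores the return value, since failing the accept there is not an option.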