From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5d48fe26-1e18-4941-99d4-8c03e83f5e76@redhat.com>
Date: Tue, 10 Sep 2024 10:26:36 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH v4] memcg: add charging of already allocated slab objects
To: Shakeel Butt, Andrew Morton, Vlastimil Babka
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
 David Rientjes, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Eric Dumazet,
 "David S. Miller", Jakub Kicinski, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Meta kernel team, cgroups@vger.kernel.org,
 netdev@vger.kernel.org
References: <20240905173422.1565480-1-shakeel.butt@linux.dev>
In-Reply-To: <20240905173422.1565480-1-shakeel.butt@linux.dev>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 9/5/24 19:34, Shakeel Butt wrote:
> At the moment, slab objects are charged to the memcg at allocation
> time. However, there are cases where slab objects are allocated at a
> time when the right target memcg to charge them to is not yet known.
> One such case is the network sockets for incoming connections, which
> are allocated in softirq context.
>
> A couple hundred thousand connections are very normal on a large,
> loaded server, and almost all of the sockets underlying those
> connections are allocated in softirq context and thus not charged to
> any memcg. However, later at accept() time we know the right target
> memcg to charge. Let's add a new API to charge already allocated
> objects, so we can have better accounting of the memory usage.
>
> To measure the performance impact of this change, tcp_crr from the
> neper [1] performance suite is used. Basically it is a network ping
> pong test with a new connection for each ping pong.
>
> The server and the client are run inside a 3-level cgroup hierarchy
> using the following commands:
>
> Server:
>  $ tcp_crr -6
>
> Client:
>  $ tcp_crr -6 -c -H ${server_ip}
>
> If the client and server run on different machines with a 50 Gbps NIC,
> there is no visible impact of the change.
>
> For the same-machine experiment, with v6.11-rc5 as base:
>
>              base (throughput)     with-patch
> tcp_crr      14545 (+- 80)         14463 (+- 56)
>
> It seems like the performance impact is within the noise.
>
> Link: https://github.com/google/neper [1]
> Signed-off-by: Shakeel Butt
> Reviewed-by: Roman Gushchin
> ---
> v3: https://lore.kernel.org/all/20240829175339.2424521-1-shakeel.butt@linux.dev/
> Changes since v3:
> - Add kernel doc for kmem_cache_charge.
>
> v2: https://lore.kernel.org/all/20240827235228.1591842-1-shakeel.butt@linux.dev/
> Changes since v2:
> - Add handling of already charged large kmalloc objects.
> - Move the normal kmalloc cache check into a function.
>
> v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@linux.dev/
> Changes since v1:
> - Correctly handle large allocations which bypass slab.
> - Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds.
>
> RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@linux.dev/
> Changes since the RFC:
> - Added check for already charged slab objects.
> - Added performance results from neper's tcp_crr.
>
>  include/linux/slab.h            | 20 ++++++++++++++
>  mm/slab.h                       |  7 +++++
>  mm/slub.c                       | 49 +++++++++++++++++++++++++++++++++
>  net/ipv4/inet_connection_sock.c |  5 ++--
>  4 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index eb2bf4629157..68789c79a530 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -547,6 +547,26 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>  			   gfp_t gfpflags) __assume_slab_alignment __malloc;
>  #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>
> +/**
> + * kmem_cache_charge - memcg charge an already allocated slab object
> + * @objp: address of the slab object to memcg charge.
> + * @gfpflags: describes the allocation context
> + *
> + * kmem_cache_charge is the normal method to charge a slab object to the
> + * current memcg. The objp should be a pointer returned by the slab allocator
> + * functions like kmalloc or kmem_cache_alloc. The memcg charge behavior can
> + * be controlled through the gfpflags parameter.
> + *
> + * There are several cases where it will return true regardless. More
> + * specifically:
> + *
> + * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
> + * 2. Already charged slab objects.
> + * 3. For slab objects from KMALLOC_NORMAL caches.
> + *
> + * Return: true if charge was successful, otherwise false.
> + */
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
>  void kmem_cache_free(struct kmem_cache *s, void *objp);
>
>  kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
> diff --git a/mm/slab.h b/mm/slab.h
> index dcdb56b8e7f5..9f907e930609 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -443,6 +443,13 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
>  	return (s->flags & SLAB_KMALLOC);
>  }
>
> +static inline bool is_kmalloc_normal(struct kmem_cache *s)
> +{
> +	if (!is_kmalloc_cache(s))
> +		return false;
> +	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
> +}
> +
>  /* Legal flag mask for kmem_cache_create(), for various configurations */
>  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>  			 SLAB_CACHE_DMA32 | SLAB_PANIC | \
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..3f2a89f7a23a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2185,6 +2185,41 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>
>  	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
>  }
> +
> +static __fastpath_inline
> +bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +	struct slabobj_ext *slab_exts;
> +	struct kmem_cache *s;
> +	struct folio *folio;
> +	struct slab *slab;
> +	unsigned long off;
> +
> +	folio = virt_to_folio(p);
> +	if (!folio_test_slab(folio)) {
> +		return folio_memcg_kmem(folio) ||
> +			(__memcg_kmem_charge_page(folio_page(folio, 0), flags,
> +						  folio_order(folio)) == 0);
> +	}
> +
> +	slab = folio_slab(folio);
> +	s = slab->slab_cache;
> +
> +	/* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> +	if (is_kmalloc_normal(s))
> +		return true;
> +
> +	/* Ignore already charged objects. */
> +	slab_exts = slab_obj_exts(slab);
> +	if (slab_exts) {
> +		off = obj_to_index(s, slab, p);
> +		if (unlikely(slab_exts[off].objcg))
> +			return true;
> +	}
> +
> +	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> +}
> +
>  #else /* CONFIG_MEMCG */
>  static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
>  					      struct list_lru *lru,
> @@ -2198,6 +2233,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>  					 void **p, int objects)
>  {
>  }
> +
> +static inline bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +	return true;
> +}
>  #endif /* CONFIG_MEMCG */
>
>  /*
> @@ -4062,6 +4102,15 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
> +{
> +	if (!memcg_kmem_online())
> +		return true;
> +
> +	return memcg_slab_post_charge(objp, gfpflags);
> +}
> +EXPORT_SYMBOL(kmem_cache_charge);
> +
>  /**
>   * kmem_cache_alloc_node - Allocate an object on the specified node
>   * @s: The cache to allocate from.
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 64d07b842e73..3c13ca8c11fb 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>  	release_sock(sk);
>  	if (newsk && mem_cgroup_sockets_enabled) {
>  		int amt = 0;
> +		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>
>  		/* atomically get the memory usage, set and charge the
>  		 * newsk->sk_memcg.
> @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>  		}
>
>  		if (amt)
> -			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> -						GFP_KERNEL | __GFP_NOFAIL);
> +			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> +		kmem_cache_charge(newsk, gfp);
>
>  		release_sock(newsk);
>  	}

The networking bits look sane to me, with a very minor nit about the reverse
xmas tree ordering of the variable declarations above.

Acked-by: Paolo Abeni