linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Muchun Song <muchun.song@linux.dev>,
	Vlastimil Babka <vbabka@suse.cz>,
	 David Rientjes <rientjes@google.com>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	 Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	 Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	Meta kernel team <kernel-team@meta.com>,
	 cgroups@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v2] memcg: add charging of already allocated slab objects
Date: Tue, 27 Aug 2024 17:34:24 -0700	[thread overview]
Message-ID: <CAJD7tkawaUoTBQLW1tUfFc06uBacjJH7d6iUFE+fzM5+jgOBig@mail.gmail.com> (raw)
In-Reply-To: <20240827235228.1591842-1-shakeel.butt@linux.dev>

On Tue, Aug 27, 2024 at 4:52 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> At the moment, the slab objects are charged to the memcg at the
> allocation time. However there are cases where slab objects are
> allocated at the time where the right target memcg to charge it to is
> not known. One such case is the network sockets for the incoming
> connection which are allocated in the softirq context.
>
> Couple hundred thousand connections are very normal on large loaded
> server and almost all of those sockets underlying those connections get
> allocated in the softirq context and thus not charged to any memcg.
> However later at the accept() time we know the right target memcg to
> charge. Let's add new API to charge already allocated objects, so we can
> have better accounting of the memory usage.
>
> To measure the performance impact of this change, tcp_crr is used from
> the neper [1] performance suite. Basically it is a network ping pong
> test with new connection for each ping pong.
>
> The server and the client are run inside 3 level of cgroup hierarchy
> using the following commands:
>
> Server:
>  $ tcp_crr -6
>
> Client:
>  $ tcp_crr -6 -c -H ${server_ip}
>
> If the client and server run on different machines with 50 GBPS NIC,
> there is no visible impact of the change.
>
> For the same machine experiment with v6.11-rc5 as base.
>
>           base (throughput)     with-patch
> tcp_crr   14545 (+- 80)         14463 (+- 56)
>
> It seems like the performance impact is within the noise.
>
> Link: https://github.com/google/neper [1]
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
> v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@linux.dev/
> Changes since v1:
> - Correctly handle large allocations which bypass slab
> - Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds
>
> RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@linux.dev/
> Changes since the RFC:
> - Added check for already charged slab objects.
> - Added performance results from neper's tcp_crr
>
>  include/linux/slab.h            |  1 +
>  mm/slub.c                       | 51 +++++++++++++++++++++++++++++++++
>  net/ipv4/inet_connection_sock.c |  5 ++--
>  3 files changed, 55 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index eb2bf4629157..05cfab107c72 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>                             gfp_t gfpflags) __assume_slab_alignment __malloc;
>  #define kmem_cache_alloc_lru(...)      alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
>  void kmem_cache_free(struct kmem_cache *s, void *objp);
>
>  kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..8265ea5f25be 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2185,6 +2185,43 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>
>         __memcg_slab_free_hook(s, slab, p, objects, obj_exts);
>  }
> +
> +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
> +                     SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
> +
> +static __fastpath_inline
> +bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +       struct slabobj_ext *slab_exts;
> +       struct kmem_cache *s;
> +       struct folio *folio;
> +       struct slab *slab;
> +       unsigned long off;
> +
> +       folio = virt_to_folio(p);
> +       if (!folio_test_slab(folio)) {
> +               return __memcg_kmem_charge_page(folio_page(folio, 0), flags,
> +                                               folio_order(folio)) == 0;

Will this charge the folio again if it was already charged? It seems
like we avoid this for already charged slab objects below but not
here.

> +       }
> +
> +       slab = folio_slab(folio);
> +       s = slab->slab_cache;
> +
> +       /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> +       if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
> +               return true;

Would it be clearer to check if the slab cache is one of
kmalloc_caches[KMALLOC_NORMAL]? This should be doable by comparing the
address of the slab cache with the addresses of
kmalloc_cache[KMALLOC_NORMAL] (perhaps in a helper). I need to refer
to your reply to Roman to understand why this works.

> +
> +       /* Ignore already charged objects. */
> +       slab_exts = slab_obj_exts(slab);
> +       if (slab_exts) {
> +               off = obj_to_index(s, slab, p);
> +               if (unlikely(slab_exts[off].objcg))
> +                       return true;
> +       }
> +
> +       return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> +}
> +
>  #else /* CONFIG_MEMCG */
>  static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
>                                               struct list_lru *lru,
> @@ -2198,6 +2235,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>                                         void **p, int objects)
>  {
>  }
> +
> +static inline bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +       return true;
> +}
>  #endif /* CONFIG_MEMCG */
>
>  /*
> @@ -4062,6 +4104,15 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
> +{
> +       if (!memcg_kmem_online())
> +               return true;
> +
> +       return memcg_slab_post_charge(objp, gfpflags);
> +}
> +EXPORT_SYMBOL(kmem_cache_charge);
> +
>  /**
>   * kmem_cache_alloc_node - Allocate an object on the specified node
>   * @s: The cache to allocate from.
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 64d07b842e73..3c13ca8c11fb 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>         release_sock(sk);
>         if (newsk && mem_cgroup_sockets_enabled) {
>                 int amt = 0;
> +               gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>
>                 /* atomically get the memory usage, set and charge the
>                  * newsk->sk_memcg.
> @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>                 }
>
>                 if (amt)
> -                       mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> -                                               GFP_KERNEL | __GFP_NOFAIL);
> +                       mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> +               kmem_cache_charge(newsk, gfp);
>
>                 release_sock(newsk);
>         }
> --
> 2.43.5
>
>


  reply	other threads:[~2024-08-28  0:35 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-27 23:52 Shakeel Butt
2024-08-28  0:34 ` Yosry Ahmed [this message]
2024-08-28 19:14   ` Shakeel Butt
2024-08-28 19:42     ` Yosry Ahmed
2024-08-28 20:16       ` Shakeel Butt
2024-08-28 22:10         ` Yosry Ahmed
2024-08-28 23:25 ` Yosry Ahmed
2024-08-29  0:20   ` Shakeel Butt
2024-08-29  0:49     ` Yosry Ahmed
2024-08-29  8:42       ` Vlastimil Babka
2024-08-29 15:50         ` Shakeel Butt
2024-08-29 18:28         ` Yosry Ahmed
2024-08-29  9:42 ` Vlastimil Babka
2024-08-29 16:10   ` Shakeel Butt
2024-08-29 16:20     ` Roman Gushchin
2024-08-29 17:39     ` Vlastimil Babka
2024-08-30 20:34 ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkawaUoTBQLW1tUfFc06uBacjJH7d6iUFE+fzM5+jgOBig@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox