Re: [PATCH] mm: memcg: optimize parent iteration in memcg_rstat_updated()

From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org,
	kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] mm: memcg: optimize parent iteration in memcg_rstat_updated()
Date: Wed, 24 Jan 2024 12:53:52 -0800
Message-ID: <CAJD7tkYR8-Xo566=KwxiZDJVOqG0NvYCo0jwg59Loqd22CwCuA@mail.gmail.com>
In-Reply-To: <CALvZod5+S5RLt5t=+ZvrRgOkAhNvC9mJo1SE7r6Ms1LRodV3RQ@mail.gmail.com>

On Wed, Jan 24, 2024 at 9:38 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Jan 24, 2024 at 2:00 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > In memcg_rstat_updated(), we iterate the memcg being updated and its
> > parents to update memcg->vmstats_percpu->stats_updates in the fast path
> > (i.e. no atomic updates). According to my math, this is 3 memory loads
> > (and potentially 3 cache misses) per memcg:
> > - Load the address of memcg->vmstats_percpu.
> > - Load vmstats_percpu->stats_updates (based on some percpu calculation).
> > - Load the address of the parent memcg.
> >
> > Avoid most of the cache misses by caching a pointer from each struct
> > memcg_vmstats_percpu to its parent on the corresponding CPU. In this
> > case, for the first memcg we have 2 memory loads (same as above):
> > - Load the address of memcg->vmstats_percpu.
> > - Load vmstats_percpu->stats_updates (based on some percpu calculation).
> >
> > Then for each additional memcg, we need a single load to get the
> > parent's stats_updates directly. This reduces the number of loads from
> > O(3N) to O(2+N) -- where N is the number of memcgs we need to iterate.

This is actually O(1+N), not O(2+N): every memcg needs one load, and
the first one needs one extra load.
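
To make the counting concrete, here is a rough userspace sketch of the
two walks. It is not the patch itself: struct pcpu_stats, build_chain(),
walk_via_memcg() and walk_via_cached_parent() are hypothetical names, a
plain pointer stands in for the percpu calculation, and the returned
"loads" value is only a tally of the loads described above, not a
measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct memcg_vmstats_percpu on one CPU. */
struct pcpu_stats {
	struct pcpu_stats *parent;	/* cached pointer to the parent's stats */
	unsigned int stats_updates;	/* assumed to share a cacheline with parent */
};

/* Hypothetical stand-in for struct mem_cgroup. */
struct memcg {
	struct memcg *parent;
	struct pcpu_stats *vmstats_percpu;	/* models the percpu base pointer */
};

/* Link n entries into a chain: index 0 is the leaf, index n - 1 the root. */
static void build_chain(struct memcg *cg, struct pcpu_stats *st, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		int is_root = (i + 1 == n);

		st[i].parent = is_root ? NULL : &st[i + 1];
		st[i].stats_updates = 0;
		cg[i].parent = is_root ? NULL : &cg[i + 1];
		cg[i].vmstats_percpu = &st[i];
	}
}

/* Old walk: three dependent loads per level, 3N in total. */
static unsigned int walk_via_memcg(struct memcg *memcg, unsigned int val)
{
	unsigned int loads = 0;

	for (; memcg; memcg = memcg->parent) {			/* load: parent memcg */
		struct pcpu_stats *statc = memcg->vmstats_percpu; /* load: percpu address */

		statc->stats_updates += val;			/* load: stats_updates */
		loads += 3;
	}
	return loads;
}

/* New walk: one extra load for the first memcg, then one per level, 1+N. */
static unsigned int walk_via_cached_parent(struct memcg *memcg, unsigned int val)
{
	struct pcpu_stats *statc = memcg->vmstats_percpu;	/* the one extra load */
	unsigned int loads = 1;

	for (; statc; statc = statc->parent) {
		/* stats_updates and parent are counted as a single load */
		statc->stats_updates += val;
		loads += 1;
	}
	return loads;
}
```

For a chain of 3 memcgs this tallies 9 loads for the old walk and 4 for
the new one, which is where 3N vs 1+N comes from.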

> >
> > Additionally, stash a pointer to memcg->vmstats in each struct
> > memcg_vmstats_percpu such that we can access the atomic counter that all
> > CPUs fold into, memcg->vmstats->stats_updates.
> > memcg_should_flush_stats() is changed to memcg_vmstats_needs_flush() to
> > accept a struct memcg_vmstats pointer accordingly.
> >
> > In struct memcg_vmstats_percpu, make sure both pointers together with
> > stats_updates live on the same cacheline. Finally, update
> > mem_cgroup_alloc() to take in a parent pointer and initialize the new
> > cache pointers on each CPU. The percpu loop in mem_cgroup_alloc() may
> > look concerning, but there are multiple similar loops in the cgroup
> > creation path (e.g. cgroup_rstat_init()), most of which are hidden
> > within alloc_percpu().
> >
> > According to Oliver's testing [1], this fixes multiple 30-38%
> > regressions in vm-scalability, will-it-scale-tlb_flush2, and
> > will-it-scale-fallocate1. This comes at a cost of 2 more pointers per
> > CPU (<2KB on a machine with 128 CPUs).
> >
> > [1] https://lore.kernel.org/lkml/ZbDJsfsZt2ITyo61@xsang-OptiPlex-9020/
> >
> > Fixes: 8d59d2214c23 ("mm: memcg: make stats flushing threshold per-memcg")
> > Tested-by: kernel test robot <oliver.sang@intel.com>
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
>
> Nice work.
>
> Acked-by: Shakeel Butt <shakeelb@google.com>

Thanks!
