From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20220619150456.GB34471@xsang-OptiPlex-9020> <20220622172857.37db0d29@kernel.org>
 <CADvbK_csvmkKe46hT9792=+Qcjor2EvkkAnr--CJK3NGX-N9BQ@mail.gmail.com>
 <CADvbK_eQUmb942vC+bG+NRzM1ki1LiCydEDR1AezZ35Jvsdfnw@mail.gmail.com>
 <20220623185730.25b88096@kernel.org> <CANn89iLidqjiiV8vxr7KnUg0JvfoS9+TRGg=8ANZ8NBRjeQxsQ@mail.gmail.com>
 <20220624051351.GA72171@shbuild999.sh.intel.com>
In-Reply-To: <20220624051351.GA72171@shbuild999.sh.intel.com>
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 24 Jun 2022 07:45:00 +0200
Message-ID: <CANn89iLwwN7hRsJD_skbcRNY9sBtPh1fhULKco5wosx_i4x6gg@mail.gmail.com>
Subject: Re: [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression
To: Feng Tang <feng.tang@intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>, Xin Long <lucien.xin@gmail.com>, 
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>, kernel test robot <oliver.sang@intel.com>, 
	Shakeel Butt <shakeelb@google.com>, Soheil Hassas Yeganeh <soheil@google.com>, 
	LKML <linux-kernel@vger.kernel.org>, 
	Linux Memory Management List <linux-mm@kvack.org>, network dev <netdev@vger.kernel.org>, linux-s390@vger.kernel.org, 
	MPTCP Upstream <mptcp@lists.linux.dev>, 
	"linux-sctp @ vger . kernel . org" <linux-sctp@vger.kernel.org>, lkp@lists.01.org, 
	kbuild test robot <lkp@intel.com>, Huang Ying <ying.huang@intel.com>, zhengjun.xing@linux.intel.com, 
	fengwei.yin@intel.com, Ying Xu <yinxu@redhat.com>
Content-Type: text/plain; charset="UTF-8"
List-ID: <linux-mm.kvack.org>

On Fri, Jun 24, 2022 at 7:14 AM Feng Tang <feng.tang@intel.com> wrote:
>
> Hi Eric,
>
> On Fri, Jun 24, 2022 at 06:13:51AM +0200, Eric Dumazet wrote:
> > On Fri, Jun 24, 2022 at 3:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Thu, 23 Jun 2022 18:50:07 -0400 Xin Long wrote:
> > > > From the perf data, we can see that __sk_mem_reduce_allocated() is
> > > > the one using the most CPU, more than before, and mem_cgroup APIs
> > > > are also called in this function. It means the mem cgroup must be
> > > > enabled in the test env, which may explain why I couldn't reproduce it.
> > > >
> > > > Commit 4890b686f4 ("net: keep sk->sk_forward_alloc as small as
> > > > possible") uses sk_mem_reclaim() (checking reclaimable >= PAGE_SIZE)
> > > > to reclaim the memory, which calls __sk_mem_reduce_allocated()
> > > > *more frequently* than before (checking reclaimable >=
> > > > SK_RECLAIM_THRESHOLD). It might be cheap when
> > > > mem_cgroup_sockets_enabled is false, but I'm not sure if it's still
> > > > cheap when mem_cgroup_sockets_enabled is true.
> > > >
> > > > I think SCTP netperf could trigger this, as the CPU is the bottleneck
> > > > for SCTP netperf testing, which is more sensitive to the extra
> > > > function calls than TCP.
> > > >
> > > > Can we re-run this testing without mem cgroup enabled?
> > >
> > > FWIW I defer to Eric, thanks a lot for double checking the report
> > > and digging in!
> >
> > I did tests with TCP + memcg and noticed a very small additional cost
> > in memcg functions, because of suboptimal layout:
> >
> > Extract of an internal Google bug, update from June 9th:
> >
> > --------------------------------
> > I have noticed minor false sharing when fetching
> > (struct mem_cgroup)->css.parent, at offset 0xc0, because it shares the
> > cache line containing struct mem_cgroup.memory, at offset 0xd0.
> >
> > Ideally, memcg->socket_pressure and memcg->parent should sit in a read
> > mostly cache line.
> > -----------------------
> >
> > But nothing that could explain a "-69.4% regression"
>
> We can double check that.
>
> > memcg has a very similar strategy of per-cpu reserves, with
> > MEMCG_CHARGE_BATCH being 32 pages per cpu.
>
> We have proposed a patch to increase the batch number for stats
> updates, which was not accepted as it hurts the accuracy and
> the data is used by many tools.
>
> > It is not clear why SCTP with 10K writes would overflow this reserve constantly.
> >
> > Presumably memcg experts will have to rework structure alignments to
> > make sure they can cope better with more charge/uncharge operations,
> > because we are not going back to gigantic per-socket reserves; this
> > simply does not scale.
>
> Yes, the memcg statistics and charge/uncharge updates are very sensitive
> to the data alignment layout, and can easily trigger performance
> changes, as we've seen quite a few similar cases in the past several
> years.
>
> One pattern we've seen is that, even if a memcg stats update or charge
> function only takes about 2%~3% of the CPU cycles in perf-profile data,
> once it is affected, the performance change can be amplified to
> 60% or more.
>

Reorganizing "struct mem_cgroup" to put "struct page_counter memory"
in a separate cache line would be beneficial.

There is plenty of low-hanging fruit, assuming nobody will use __randomize_layout on it ;)
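
To make the layout point concrete, here is a minimal userspace sketch. It is
not the real struct mem_cgroup: the demo_* types, their fields, and the 64-byte
cache line size are assumptions for illustration only. It shows a frequently
written counter sharing a line with read-mostly fields, and the same counter
pushed onto its own line with an alignment specifier (the kernel would more
likely use an annotation such as ____cacheline_aligned_in_smp).

```c
#include <stddef.h>

#define DEMO_CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for a hot, frequently written counter. */
struct demo_counter {
	long usage;
	long max;
};

/* Read-mostly fields and the hot counter end up in the same cache line. */
struct demo_shared {
	void *parent;                  /* read-mostly */
	unsigned long socket_pressure; /* read-mostly */
	struct demo_counter memory;    /* written on every charge/uncharge */
};

/* Same fields, but the hot counter starts a cache line of its own. */
struct demo_split {
	void *parent;
	unsigned long socket_pressure;
	_Alignas(DEMO_CACHE_LINE) struct demo_counter memory;
};

/* Cache line index of a member, relative to a line-aligned object. */
#define DEMO_LINE_OF(type, member) (offsetof(type, member) / DEMO_CACHE_LINE)

/* Compile-time checks of the two layouts. */
_Static_assert(DEMO_LINE_OF(struct demo_shared, parent) ==
	       DEMO_LINE_OF(struct demo_shared, memory),
	       "shared layout: parent and memory share a line");
_Static_assert(DEMO_LINE_OF(struct demo_split, parent) !=
	       DEMO_LINE_OF(struct demo_split, memory),
	       "split layout: memory sits on its own line");
```

In the split layout, readers of parent or socket_pressure no longer contend
with writers of the counter, at the cost of some padding in the structure.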

Also, some fields are written even when their value does not change,
which needlessly dirties the cache line:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe64100f4be5b029c7161b3a6077a74..53d9c1e581e78303ef73942e2b34338567987b74 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7037,10 +7037,12 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
                struct page_counter *fail;

                if (page_counter_try_charge(&memcg->tcpmem, nr_pages, &fail)) {
-                       memcg->tcpmem_pressure = 0;
+                       if (READ_ONCE(memcg->tcpmem_pressure))
+                               WRITE_ONCE(memcg->tcpmem_pressure, 0);
                        return true;
                }
-               memcg->tcpmem_pressure = 1;
+               if (!READ_ONCE(memcg->tcpmem_pressure))
+                       WRITE_ONCE(memcg->tcpmem_pressure, 1);
                if (gfp_mask & __GFP_NOFAIL) {
                        page_counter_charge(&memcg->tcpmem, nr_pages);
                        return true;
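
The check-before-write pattern in the hunk above can be sketched in userspace
as follows. This is an illustration only: struct demo_flag, its stores counter,
and both helper functions are hypothetical, and relaxed C11 atomics are used as
a rough stand-in for READ_ONCE()/WRITE_ONCE(), which they only approximate.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a flag such as tcpmem_pressure. */
struct demo_flag {
	atomic_int value;
	unsigned long stores; /* instrumentation only: counts real writes */
};

/* Store v only if the flag currently holds a different value, so an
 * unchanged flag leaves its cache line untouched by this caller. */
static void demo_set_if_changed(struct demo_flag *f, int v)
{
	if (atomic_load_explicit(&f->value, memory_order_relaxed) != v) {
		atomic_store_explicit(&f->value, v, memory_order_relaxed);
		f->stores++;
	}
}

/* Simulate many successful charges (flag stays 0), then one failure. */
static unsigned long demo_run(int iterations)
{
	struct demo_flag f;
	int i;

	atomic_init(&f.value, 0);
	f.stores = 0;

	for (i = 0; i < iterations; i++)
		demo_set_if_changed(&f, 0); /* value unchanged: no store */

	demo_set_if_changed(&f, 1); /* value changes: one store */
	demo_set_if_changed(&f, 1); /* value unchanged again: no store */

	return f.stores;
}
```

With an unconditional assignment every call would write the flag; here only
the single transition from 0 to 1 does, so the common path is read-only.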