From: Shakeel Butt
Date: Wed, 17 May 2023 10:04:42 -0700
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
To: Eric Dumazet, Andrew Morton
Cc: Oliver Sang, Zhang Cathy, Yin Fengwei, Feng Tang, Linux MM, Cgroups,
 Paolo Abeni, davem@davemloft.net, kuba@kernel.org, Brandeburg Jesse,
 Srinivas Suresh, Chen Tim C, You Lizhen, eric.dumazet@gmail.com,
 netdev@vger.kernel.org, philip.li@intel.com, yujie.liu@intel.com
+Andrew

On Wed, May 17, 2023 at 9:33 AM Eric Dumazet wrote:
>
> On Wed, May 17, 2023 at 6:24 PM Shakeel Butt wrote:
> >
> > On Tue, May 16, 2023 at 01:46:55PM +0800, Oliver Sang wrote:
> > > hi Shakeel,
> > >
> > > On Mon, May 15, 2023 at 12:50:31PM -0700, Shakeel Butt wrote:
> > > > +Feng, Yin and Oliver
> > > >
> > > > > > Thanks a lot Cathy for testing. Do you see any performance
> > > > > > improvement for the memcached benchmark with the patch?
> > > > >
> > > > > Yep, absolutely :-) RPS (with/without patch) = +1.74
> > > >
> > > > Thanks a lot Cathy.
> > > >
> > > > Feng/Yin/Oliver, can you please test the patch at [1] with the other
> > > > workloads used by the test robot? Basically I wanted to know if it
> > > > has any positive or negative impact on other perf benchmarks.
> > >
> > > is it possible for you to resend the patch with a Signed-off-by?
> > > without it, the test robot will regard the patch as informal, so it
> > > cannot feed into the auto test process.
> > > and could you tell us the base of this patch? it will help us apply it
> > > correctly.
> > >
> > > on the other hand, due to resource constraints, we normally cannot
> > > support this type of on-demand test of a single patch, patch set, or
> > > branch. instead, we try to merge them into so-called hourly-kernels,
> > > then distribute tests and auto-bisects to various platforms.
> > > after we apply your patch and merge it into the hourly-kernels
> > > successfully, if it really causes some performance changes, the test
> > > robot could spot this patch as the 'fbc' and we will send a report to
> > > you. this could happen within several weeks after applying.
> > > but due to the complexity of the whole process (and limited resources,
> > > e.g. we cannot run all tests on all platforms), we cannot guarantee
> > > capturing all possible performance impacts of this patch, and it is
> > > hard for us to provide a big picture of the general performance impact
> > > of this patch. this may not be exactly what you want. is that ok for
> > > you?
> > >
> >
> > Yes, that is fine and thanks for the help. The patch is below:
> >
> >
> > From 93b3b4c5f356a5090551519522cfd5740ae7e774 Mon Sep 17 00:00:00 2001
> > From: Shakeel Butt
> > Date: Tue, 16 May 2023 20:30:26 +0000
> > Subject: [PATCH] memcg: skip stock refill in irq context
> >
> > The Linux kernel processes incoming packets in softirq on a given CPU,
> > and those packets may belong to different jobs. This is very common on
> > large systems running multiple workloads. With memcg enabled, network
> > memory for such packets is charged to the corresponding memcgs of the
> > jobs.
> >
> > Memcg charging can be a costly operation, so the memcg code implements
> > a per-cpu charge caching optimization to reduce that cost. More
> > specifically, the kernel charges the given memcg for more memory than
> > requested and keeps the remaining charge in a local per-cpu cache. The
> > insight behind this heuristic is that there will be more charge
> > requests for that memcg in the near future. This optimization works
> > well when a specific job runs on a CPU for a long time and the majority
> > of the charging requests happen in process context. However, the
> > kernel's incoming packet processing does not work well with this
> > optimization.
> >
> > Recently Cathy Zhang has shown [1] that memcg charge flushing within
> > the memcg charge path can become a performance bottleneck for the
> > memcg charging of network traffic.
> >
> > Perf profile:
> >
> >     8.98%  mc-worker  [kernel.vmlinux]  [k] page_counter_cancel
> >            |
> >            --8.97%--page_counter_cancel
> >                      |
> >                      --8.97%--page_counter_uncharge
> >                                drain_stock
> >                                __refill_stock
> >                                refill_stock
> >                                |
> >                                --8.91%--try_charge_memcg
> >                                          mem_cgroup_charge_skmem
> >                                          |
> >                                          --8.91%--__sk_mem_raise_allocated
> >                                                    __sk_mem_schedule
> >                                                    |
> >                                                    |--5.41%--tcp_try_rmem_schedule
> >                                                    |         tcp_data_queue
> >                                                    |         tcp_rcv_established
> >                                                    |         tcp_v4_do_rcv
> >                                                    |         tcp_v4_rcv
> >
> > The simplest way to solve this issue is to not refill the memcg charge
> > stock in irq context. Since networking is the main source of memcg
> > charging in irq context, other users will not be impacted. In
> > addition, this will preserve the memcg charge cache of the application
> > running on that CPU.
> >
> > There are also potential side effects. What if all the packets belong
> > to the same application and memcg? More specifically, users can use
> > Receive Flow Steering (RFS) to make sure the kernel processes the
> > packets of an application on the CPU where the application is running.
> > This change may cause the kernel to do slowpath memcg charging more
> > often in irq context.
>
> Could we have per-memcg per-cpu caches, instead of one set of per-cpu
> caches needing to be drained every time a cpu deals with 'another
> memcg'?
>

The hierarchical nature of memcg makes that a bit complicated. We have
something similar for memcg stats, the rstat infra, where the stats are
kept per-memcg per-cpu and get accumulated hierarchically every 2
seconds. This works fine for stats, but for limits there would be a
need for some additional restrictions.

Also, some time ago Andrew asked me to explore replacing the atomic
counter in page_counter with a percpu_counter. The intuition is that
most of the time the usage is not hitting the limit, so we can use
__percpu_counter_compare() for enforcement.

Let me spend some time exploring a per-memcg per-cpu cache and whether
a percpu_counter would be better. For now, this patch is more like an
RFC.
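Just to make the percpu_counter idea concrete, below is a rough sketch
of what the enforcement side could look like. This is only an
illustration, not the existing page_counter code: struct pc_counter and
pc_try_charge() are made-up names.

#include <linux/percpu_counter.h>

/* Illustrative stand-in for a percpu_counter based page_counter. */
struct pc_counter {
	struct percpu_counter usage;	/* usage spread across per-cpu counters */
	s64 limit;			/* hard limit, in pages */
};

static bool pc_try_charge(struct pc_counter *c, s64 nr_pages)
{
	/*
	 * Fast path: while usage is far from the limit, the comparison
	 * only reads the cheap approximate count.
	 * __percpu_counter_compare() falls back to a precise
	 * percpu_counter_sum() only when the approximate value is
	 * within batch * num_online_cpus() of the rhs.
	 */
	if (__percpu_counter_compare(&c->usage, c->limit - nr_pages,
				     percpu_counter_batch) > 0)
		return false;	/* charging nr_pages would exceed the limit */

	/*
	 * The check and the add are not atomic, so concurrent chargers
	 * can overshoot the limit by a small amount; the limit becomes
	 * approximate rather than hard.
	 */
	percpu_counter_add_batch(&c->usage, nr_pages, percpu_counter_batch);
	return true;
}

Uncharging would just be percpu_counter_add_batch() with -nr_pages. The
overshoot/precision trade-off in the second comment is exactly the kind
of additional restriction I mentioned above for limits.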