Date: Fri, 7 Mar 2025 19:41:59 +0000
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Muchun Song, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, netdev@vger.kernel.org, linux-mm@kvack.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    Meta kernel team
Subject: Re: [RFC PATCH] memcg: net: improve charging of incoming network traffic
In-Reply-To: <20250307055936.3988572-1-shakeel.butt@linux.dev>
References: <20250307055936.3988572-1-shakeel.butt@linux.dev>

On Thu, Mar 06, 2025 at 09:59:36PM -0800, Shakeel Butt wrote:
> Memory cgroup accounting is expensive and to reduce the cost, the
> kernel maintains a per-cpu charge cache for a single memcg. So, if a
> charge request comes for a different memcg, the kernel will flush the
> old memcg's charge cache, charge the newer memcg a fixed amount (64
> pages), subtract the charge request amount, and store the remainder
> in the per-cpu charge cache for the newer memcg.
>
> This mechanism is based on the assumption that the kernel, for
> locality, keeps a process on a CPU for a long period of time, and
> most of the charge requests from that process will be served by that
> CPU's local charge cache.
>
> However, this assumption breaks down for incoming network traffic on
> a multi-tenant machine. We are in the process of running multiple
> workloads on a single machine, and if such workloads are network
> heavy, we are seeing very high network memory accounting cost. We
> have observed multiple CPUs spending almost 100% of their time in
> net_rx_action, and almost all of that time is spent in memcg
> accounting of the network traffic.
>
> More precisely, net_rx_action is serving packets from multiple
> workloads and is observing/serving a mix of packets of these
> workloads. The memcg switch of the per-cpu cache is very expensive,
> and we are observing a lot of memcg switches on the machine. Almost
> all the time is being spent on charging the new memcg and flushing
> the older memcg's cache. So we definitely need a per-cpu cache that
> supports multiple memcgs for this scenario.
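
(For anyone skimming: the single-slot cache described above is roughly
the shape below. This is a simplified stand-in for the
consume_stock()/refill_stock() logic in mm/memcontrol.c -- locking,
irq handling, and draining are omitted, and the types are toy
versions, not the kernel's.)

/* stock_toy.c: toy model of the per-cpu charge cache ("stock"). */
#include <stdbool.h>
#include <stddef.h>

#define CHARGE_BATCH 64 /* pages pre-charged when the cache is refilled */

struct mem_cgroup { int id; }; /* stand-in; the real struct is large */

struct memcg_stock {
	struct mem_cgroup *cached; /* the ONE memcg this CPU caches */
	unsigned int nr_pages;     /* pre-charged pages still available */
};

/* Fast path: only succeeds if the request is for the cached memcg. */
static bool consume_stock(struct memcg_stock *stock,
			  struct mem_cgroup *memcg, unsigned int nr_pages)
{
	if (stock->cached == memcg && stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		return true;
	}
	/*
	 * Miss: the caller must flush the old memcg's cache and charge
	 * the new memcg CHARGE_BATCH pages -- the expensive "memcg
	 * switch" that interleaved multi-tenant traffic keeps hitting.
	 */
	return false;
}

int main(void)
{
	struct memcg_stock stock = { NULL, 0 };
	struct mem_cgroup a = { 1 }, b = { 2 };

	consume_stock(&stock, &a, 1); /* miss: cache is empty */
	stock.cached = &a;            /* ...flush + refill for a... */
	stock.nr_pages = CHARGE_BATCH - 1;
	consume_stock(&stock, &a, 1); /* hit: same memcg */
	consume_stock(&stock, &b, 1); /* miss: memcg switch needed */
	return 0;
}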
We've internally faced a different situation on machines with a large
number of CPUs, where the mod_memcg_state(MEMCG_SOCK) call in
mem_cgroup_[un]charge_skmem() causes latency due to high contention on
the atomic update in memcg_rstat_updated(). In this case, networking
performs a lot of charge/uncharge operations, but because we count the
absolute magnitude of the updates in memcg_rstat_updated(), we reach
the flush threshold quickly (see the toy model below). In practice, a
lot of these updates cancel each other out, so the net change in the
stats may not be that large. However, not using the absolute value of
the updates could cause updates of unrelated stats with opposite
polarity to cancel out, potentially delaying stat flushes.

I wonder if we can leverage the batching introduced here to fix this
problem as well. For example, if the charging in
mem_cgroup_[un]charge_skmem() is satisfied from this cache, can we
avoid mod_memcg_state() and only update the stats once at the end of
the batch?

IIUC the current implementation only covers the RX path, so it will
reduce the number of calls to mod_memcg_state(), but it won't prevent
charge/uncharge operations from raising the update counter
unnecessarily. I wonder if the scope of the batching could be widened
so that both TX and RX use the same cache, and charge/uncharge
operations cancel out completely in terms of stat updates (a sketch of
what I mean is below). WDYT?
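
To make the threshold problem concrete, here's a toy userspace model
of the accumulation in memcg_rstat_updated(); the constant and the
names are illustrative, not the kernel's actual values:

/* rstat_toy.c: why charge/uncharge pairs still trip the threshold. */
#include <stdio.h>
#include <stdlib.h>

#define FLUSH_THRESHOLD 1024 /* illustrative; not the kernel constant */

static long stats_updates; /* per-CPU in the kernel; global in this toy */

static void rstat_updated(int val)
{
	stats_updates += abs(val); /* magnitude, not the signed sum */
	if (stats_updates >= FLUSH_THRESHOLD) {
		printf("flush signalled (accumulated %ld)\n", stats_updates);
		stats_updates = 0;
	}
}

int main(void)
{
	/*
	 * 512 charge/uncharge pairs: the net stat change is zero, yet
	 * the threshold is crossed repeatedly and flushes are
	 * signalled for work that cancelled itself out.
	 */
	for (int i = 0; i < 512; i++) {
		rstat_updated(+4);  /* mem_cgroup_charge_skmem() */
		rstat_updated(-4);  /* mem_cgroup_uncharge_skmem() */
	}
	return 0;
}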
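
And the batching could look something like the shape below. To be
clear, memcg_sock_batch, batch_charge_skmem(), etc. are made-up names
purely for illustration, with a printf stub standing in for the real
mod_memcg_state():

/* batch_toy.c: one net stat update per batch instead of per packet. */
#include <stdio.h>

struct mem_cgroup { int id; };       /* stand-in */
enum memcg_stat_item { MEMCG_SOCK }; /* stand-in for the kernel enum */

/* Stub for the kernel's mod_memcg_state(). */
static void mod_memcg_state(struct mem_cgroup *memcg,
			    enum memcg_stat_item idx, long val)
{
	(void)memcg; (void)idx;
	printf("mod_memcg_state(MEMCG_SOCK, %+ld)\n", val);
}

struct memcg_sock_batch {
	struct mem_cgroup *memcg;
	long delta; /* net MEMCG_SOCK change while the batch is open */
};

static void batch_charge_skmem(struct memcg_sock_batch *b, long nr_pages)
{
	b->delta += nr_pages; /* no mod_memcg_state() here */
}

static void batch_uncharge_skmem(struct memcg_sock_batch *b, long nr_pages)
{
	b->delta -= nr_pages; /* may fully cancel an earlier charge */
}

static void batch_end(struct memcg_sock_batch *b)
{
	/*
	 * One stat update for the whole batch. If TX and RX share the
	 * cache, traffic that fully cancels contributes delta == 0 and
	 * never advances the rstat flush threshold.
	 */
	if (b->delta)
		mod_memcg_state(b->memcg, MEMCG_SOCK, b->delta);
	b->delta = 0;
}

int main(void)
{
	struct mem_cgroup memcg = { 1 };
	struct memcg_sock_batch b = { &memcg, 0 };

	batch_charge_skmem(&b, 16);   /* RX */
	batch_uncharge_skmem(&b, 16); /* TX frees the same amount */
	batch_charge_skmem(&b, 4);    /* RX */
	batch_end(&b);                /* single update, net +4 */
	return 0;
}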