From: Daniel Sedlak
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Neal Cardwell, Kuniyuki Iwashima,
	David Ahern, Andrew Morton, Shakeel Butt, Yosry Ahmed,
	linux-mm@kvack.org, netdev@vger.kernel.org
Cc: Daniel Sedlak, Matyas Hurtik
Subject: [PATCH v2 net-next 1/2] tcp: account for memory pressure signaled by cgroup
Date: Mon, 14 Jul 2025 16:36:12 +0200
Message-ID: <20250714143613.42184-2-daniel.sedlak@cdn77.com>
In-Reply-To: <20250714143613.42184-1-daniel.sedlak@cdn77.com>
References: <20250714143613.42184-1-daniel.sedlak@cdn77.com>

This patch is the result of long debugging sessions that started as
"networking is slow": TCP throughput suddenly dropped from tens of Gbps
to a few Mbps, and we could not see anything in the kernel log or in the
netstat counters.

Currently, we have two memory pressure counters for TCP sockets [1],
which we manipulate only when the memory pressure is signaled through
the proto struct [2]. However, memory pressure can also be signaled
through the cgroup memory subsystem, and that case is not reflected in
the netstat counters.

As a result, when the cgroup memory subsystem signals that it is under
pressure, we silently reduce the advertised TCP window with
tcp_adjust_rcv_ssthresh() to 4*advmss, which causes a significant
throughput reduction.

So this patch adds a new counter that accounts for memory pressure
signaled by the memory cgroup, which makes this condition much easier
to spot.

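As a rough illustration of why a 4*advmss window turns into a few Mbps,
the arithmetic can be sketched as below. This is only a back-of-the-envelope
bound (at most one window in flight per round trip), not kernel code. The
helper names are hypothetical, and the advmss and RTT values mentioned
afterwards are assumptions, not measurements from this report.

```c
#include <assert.h>

/* Window cap described in the text above: four times the advertised MSS. */
static unsigned int pressure_window_bytes(unsigned int advmss)
{
	return 4u * advmss;
}

/*
 * Upper bound on the throughput of one connection, in bits per second,
 * when at most window_bytes can be in flight per round trip.
 */
static double window_limited_bps(unsigned int window_bytes, double rtt_seconds)
{
	assert(rtt_seconds > 0.0);
	return (double)window_bytes * 8.0 / rtt_seconds;
}
```

Assuming advmss = 1460 bytes and an RTT of 10 ms, pressure_window_bytes()
gives 5840 bytes and window_limited_bps() gives about 4.67 Mbit/s per
connection, regardless of the link speed.
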
Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/uapi/linux/snmp.h#L231-L232 [1]
Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/net/sock.h#L1300-L1301 [2]
Co-developed-by: Matyas Hurtik
Signed-off-by: Matyas Hurtik
Signed-off-by: Daniel Sedlak
---
 Documentation/networking/net_cachelines/snmp.rst |  1 +
 include/net/tcp.h                                | 14 ++++++++------
 include/uapi/linux/snmp.h                        |  1 +
 net/ipv4/proc.c                                  |  1 +
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/net_cachelines/snmp.rst b/Documentation/networking/net_cachelines/snmp.rst
index bd44b3eebbef..ed17ff84e39c 100644
--- a/Documentation/networking/net_cachelines/snmp.rst
+++ b/Documentation/networking/net_cachelines/snmp.rst
@@ -76,6 +76,7 @@ unsigned_long LINUX_MIB_TCPABORTONLINGER
 unsigned_long LINUX_MIB_TCPABORTFAILED
 unsigned_long LINUX_MIB_TCPMEMORYPRESSURES
 unsigned_long LINUX_MIB_TCPMEMORYPRESSURESCHRONO
+unsigned_long LINUX_MIB_TCPCGROUPSOCKETPRESSURE
 unsigned_long LINUX_MIB_TCPSACKDISCARD
 unsigned_long LINUX_MIB_TCPDSACKIGNOREDOLD
 unsigned_long LINUX_MIB_TCPDSACKIGNOREDNOUNDO
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 761c4a0ad386..aae3efe24282 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -267,6 +267,11 @@ extern long sysctl_tcp_mem[3];
 #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
 #define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
 
+#define TCP_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.tcp_statistics, field)
+#define __TCP_INC_STATS(net, field)	__SNMP_INC_STATS((net)->mib.tcp_statistics, field)
+#define TCP_DEC_STATS(net, field)	SNMP_DEC_STATS((net)->mib.tcp_statistics, field)
+#define TCP_ADD_STATS(net, field, val)	SNMP_ADD_STATS((net)->mib.tcp_statistics, field, val)
+
 extern atomic_long_t tcp_memory_allocated;
 DECLARE_PER_CPU(int, tcp_memory_per_cpu_fw_alloc);
 
@@ -277,8 +282,10 @@ extern unsigned long tcp_memory_pressure;
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
 {
 	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
-	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
+	    mem_cgroup_under_socket_pressure(sk->sk_memcg)) {
+		TCP_INC_STATS(sock_net(sk), LINUX_MIB_TCPCGROUPSOCKETPRESSURE);
 		return true;
+	}
 
 	return READ_ONCE(tcp_memory_pressure);
 }
@@ -316,11 +323,6 @@ bool tcp_check_oom(const struct sock *sk, int shift);
 
 extern struct proto tcp_prot;
 
-#define TCP_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.tcp_statistics, field)
-#define __TCP_INC_STATS(net, field)	__SNMP_INC_STATS((net)->mib.tcp_statistics, field)
-#define TCP_DEC_STATS(net, field)	SNMP_DEC_STATS((net)->mib.tcp_statistics, field)
-#define TCP_ADD_STATS(net, field, val)	SNMP_ADD_STATS((net)->mib.tcp_statistics, field, val)
-
 void tcp_tsq_work_init(void);
 
 int tcp_v4_err(struct sk_buff *skb, u32);
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 1d234d7e1892..9e8d1a5e56a9 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -231,6 +231,7 @@ enum
 	LINUX_MIB_TCPABORTFAILED,		/* TCPAbortFailed */
 	LINUX_MIB_TCPMEMORYPRESSURES,		/* TCPMemoryPressures */
 	LINUX_MIB_TCPMEMORYPRESSURESCHRONO,	/* TCPMemoryPressuresChrono */
+	LINUX_MIB_TCPCGROUPSOCKETPRESSURE,	/* TCPCgroupSocketPressure */
 	LINUX_MIB_TCPSACKDISCARD,		/* TCPSACKDiscard */
 	LINUX_MIB_TCPDSACKIGNOREDOLD,		/* TCPSACKIgnoredOld */
 	LINUX_MIB_TCPDSACKIGNOREDNOUNDO,	/* TCPSACKIgnoredNoUndo */
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index ea2f01584379..0bcec9a51fb0 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -235,6 +235,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPAbortFailed", LINUX_MIB_TCPABORTFAILED),
 	SNMP_MIB_ITEM("TCPMemoryPressures", LINUX_MIB_TCPMEMORYPRESSURES),
 	SNMP_MIB_ITEM("TCPMemoryPressuresChrono", LINUX_MIB_TCPMEMORYPRESSURESCHRONO),
+	SNMP_MIB_ITEM("TCPCgroupSocketPressure", LINUX_MIB_TCPCGROUPSOCKETPRESSURE),
 	SNMP_MIB_ITEM("TCPSACKDiscard", LINUX_MIB_TCPSACKDISCARD),
 	SNMP_MIB_ITEM("TCPDSACKIgnoredOld", LINUX_MIB_TCPDSACKIGNOREDOLD),
 	SNMP_MIB_ITEM("TCPDSACKIgnoredNoUndo", LINUX_MIB_TCPDSACKIGNOREDNOUNDO),
-- 
2.39.5
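
Since the patch registers the counter in snmp4_net_list[], it should show up
as TCPCgroupSocketPressure on the TcpExt lines of /proc/net/netstat, where
each group is printed as a line of names followed by a line of values. A
minimal userspace sketch for picking one counter out of such a line pair
could look like this. The helper name netstat_lookup() is hypothetical and
not part of the patch, and the caller is assumed to have already read the
two "TcpExt:" lines (for example with fgets()).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in a names/values line pair taken from
 * /proc/net/netstat. Both lines are walked token by token in parallel.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * field is not present in the names line.
 */
static int netstat_lookup(const char *names, const char *values,
			  const char *field, unsigned long long *out)
{
	size_t field_len = strlen(field);

	assert(out != NULL);
	for (;;) {
		size_t nlen, vlen;

		/* Skip separators, then measure the next token on each line. */
		names += strspn(names, " \n");
		values += strspn(values, " \n");
		nlen = strcspn(names, " \n");
		vlen = strcspn(values, " \n");
		if (nlen == 0 || vlen == 0)
			return -1;

		/* Require an exact match, so a prefix of a longer name fails. */
		if (nlen == field_len && strncmp(names, field, nlen) == 0) {
			*out = strtoull(values, NULL, 10);
			return 0;
		}
		names += nlen;
		values += vlen;
	}
}
```

A monitoring script would call this for "TCPCgroupSocketPressure" next to
the existing "TCPMemoryPressures" and compare successive samples. On kernels
without this patch the lookup simply returns -1.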