Date: Fri, 1 Sep 2023 15:59:32 +0200
From: Simon Horman
To: Abel Wu
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Morton, Shakeel Butt, Roman Gushchin, Michal Hocko,
	Johannes Weiner, Yosry Ahmed, Yu Zhao, "Matthew Wilcox (Oracle)",
	Kefeng Wang, Yafang Shao, Kuniyuki Iwashima, Martin KaFai Lau,
	Breno Leitao, Alexander Mikhalitsyn, David Howells, Jason Xing,
	open list, "open list:NETWORKING [GENERAL]",
	"open list:MEMORY MANAGEMENT"
Subject: Re: [RFC PATCH net-next 3/3] sock: Throttle pressure-aware sockets under pressure
Message-ID: <20230901135932.GH140739@kernel.org>
References: <20230901062141.51972-1-wuyun.abel@bytedance.com>
	<20230901062141.51972-4-wuyun.abel@bytedance.com>
In-Reply-To: <20230901062141.51972-4-wuyun.abel@bytedance.com>

On Fri, Sep 01, 2023 at 02:21:28PM +0800, Abel Wu wrote:
> A socket is pressure-aware when its protocol has pressure defined, that
> is sk_has_memory_pressure(sk) != NULL, e.g. TCP. These protocols might
> want to limit the usage of socket memory depending on both the state of
> global & memcg pressure through sk_under_memory_pressure(sk).
>
> While for allocation, memcg pressure will be simply ignored when usage
> is under global limit (sysctl_mem[0]). This behavior has different impacts
> on different cgroup modes. In cgroupv2 socket and other purposes share a
> same memory limit, thus allowing sockmem to burst under memcg reclaiming
> pressure could lead to longer stall, sometimes even OOM. While cgroupv1
> has no such worries.
>
> As a cloud service provider, we encountered a problem in our production
> environment during the transition from cgroup v1 to v2 (partly due to the
> heavy taxes of accounting socket memory in v1). Say one workload behaves
> fine in cgroupv1 with memcg limit configured to 10GB memory and another
> 1GB tcpmem, but will suck (or even be OOM-killed) in v2 with 11GB memory
> due to burst memory usage on socket, since there is no specific limit for
> socket memory in cgroupv2 and relies largely on workloads doing traffic
> control themselves.
>
> It's rational for the workloads to build some traffic control to better
> utilize the resources they bought, but from kernel's point of view it's
> also reasonable to suppress the allocation of socket memory once there is
> a shortage of free memory, given that performance degradation is better
> than failure.
>
> As per the above, this patch aims to be more conservative on allocation
> for the pressure-aware sockets under global and/or memcg pressure. While
> OTOH throttling on incoming traffic could hurt latency badly possibly
> due to SACKed segs get dropped from the OFO queue. See a related commit
> 720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").
> This patch preserves this decision by throttling RX allocation only at
> critical pressure level when it hardly makes sense to continue receive
> data.
>
> No functional change intended for pressure-unaware protocols.
>
> Signed-off-by: Abel Wu

...

> @@ -3087,8 +3100,20 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  	if (sk_has_memory_pressure(sk)) {
>  		u64 alloc;
>
> -		if (!sk_under_memory_pressure(sk))
> +		/* Be more conservative if the socket's memcg (or its
> +		 * parents) is under reclaim pressure, try to possibly
> +		 * avoid further memstall.
> +		 */
> +		if (under_memcg_pressure)
> +			goto suppress_allocation;
> +
> +		if (!sk_under_global_memory_pressure(sk))
>  			return 1;
> +
> +		/* Trying to be fair among all the sockets of same
> +		 * protocal under global memory pressure, by allowing

nit: checkpatch.pl --codespell says, protocal -> protocol

> +		 * the ones that under average usage to raise.
> +		 */
>  		alloc = sk_sockets_allocated_read_positive(sk);
>  		if (sk_prot_mem_limits(sk, 2) > alloc *
>  		    sk_mem_pages(sk->sk_wmem_queued +
> --
> 2.37.3
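
For readers following the quoted hunk, here is a minimal user-space sketch
of the decision order it appears to introduce. It relies only on what the
hunk shows: memcg pressure is checked first and suppresses the allocation,
then the absence of global pressure allows it, and everything else falls
through to the existing fairness check. struct pressure_state, enum verdict
and raise_verdict() are hypothetical names made up for illustration, not
kernel APIs. How under_memcg_pressure is computed and the fairness check
itself are not part of the quote, so neither is modeled here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the quoted hunk consults. */
struct pressure_state {
	bool memcg_pressure;	/* models under_memcg_pressure */
	bool global_pressure;	/* models sk_under_global_memory_pressure(sk) */
};

/* Hypothetical outcomes; the kernel code uses return/goto instead. */
enum verdict {
	VERDICT_ALLOW,		/* models "return 1" */
	VERDICT_SUPPRESS,	/* models "goto suppress_allocation" */
	VERDICT_FAIRNESS,	/* falls through to the fairness check (not modeled) */
};

/* Mirrors only the order of the checks visible in the quoted hunk. */
static enum verdict raise_verdict(const struct pressure_state *st)
{
	/* memcg (or parent) under reclaim pressure: be conservative */
	if (st->memcg_pressure)
		return VERDICT_SUPPRESS;

	/* no global pressure either: the allocation may proceed */
	if (!st->global_pressure)
		return VERDICT_ALLOW;

	/* global pressure only: left to the per-protocol fairness check */
	return VERDICT_FAIRNESS;
}
```

In this sketch the memcg check comes first, so a socket whose memcg is
under pressure is suppressed regardless of the global state.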