Date: Wed, 26 Jul 2023 17:19:00 -0700
From: Roman Gushchin
To: Abel Wu
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
    Andrew Morton, David Ahern, Yosry Ahmed, "Matthew Wilcox (Oracle)",
    Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
    Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao, David Howells,
    Jason Xing, Xin Long, Michal Hocko, Alexei Starovoitov, open list,
    "open list:NETWORKING [GENERAL]",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
Subject: Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
References: <20230711124157.97169-1-wuyun.abel@bytedance.com>
    <58e75f44-16e3-a40a-4c8a-0f61bbf393f9@bytedance.com>
    <29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com>
In-Reply-To: <29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com>

On Wed, Jul 26, 2023 at 04:44:24PM +0800, Abel Wu wrote:
> On 7/26/23 10:56 AM, Roman Gushchin wrote:
> > On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
> > > Hi Roman, thanks for taking time to have a look!
> > > >
> > > > > When in legacy mode aka. cgroupv1, the socket memory is charged
> > > > > into a separate counter memcg->tcpmem rather than ->memory, so
> > > > > the reclaim pressure of the memcg has nothing to do with socket's
> > > > > pressure at all.
> > > >
> > > > But we still might set memcg->socket_pressure and propagate the pressure,
> > > > right?
> > >
> > > Yes, but the pressure that comes from memcg->socket_pressure does not
> > > mean pressure in socket memory in cgroupv1, which might lead to premature
> > > reclamation or throttling on socket memory allocation. As the following
> > > example shows:
> > >
> > >                 ->memory        ->tcpmem
> > >     limit       10G             10G
> > >     usage       9G              4G
> > >     pressure    true            false
> >
> > Yes, now it makes sense to me. Thank you for the explanation.
>
> Cheers!
>
> >
> > Then I'd organize the patchset in the following way:
> > 1) cgroup v1-only fix to not throttle tcpmem based on the vmpressure
> > 2) a formal code refactoring
>
> OK, I will try to re-organize it in the next version.

Thank you!

>
> > > >
> > > > Overall I think it's a good idea to clean these things up and thank you
> > > > for working on this. But I wonder if we can make the next step and leave only
> > > > one mechanism for both cgroup v1 and v2 instead of having this weird setup
> > > > where memcg->socket_pressure is set differently from different paths on cgroup
> > > > v1 and v2.
> > >
> > > There is some difficulty in unifying the mechanism for both cgroup
> > > designs. Throttling socket memory allocation when memcg is under
> > > pressure only makes sense when socket memory and other usages are
> > > sharing the same limit, which is not true for cgroupv1. Thoughts?
> >
> > I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
> > as it is, except when it creates an unnecessary complexity in the code.
>
> Are you suggesting that the 2nd patch can be ignored and keep
> ->tcpmem_pressure as it is? Or keep the 2nd patch and add some
> explanation around as you suggested in last reply?

I suggest splitting this into a pure code refactoring (which is not expected
to bring any functional changes) and an actual change of behavior on cgroup v1.

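To make the cgroup v1 case from the table above concrete, here is a toy
model of where the socket pressure signal would come from after such a fix.
It is only a sketch: Counter, Memcg and socket_under_pressure() are
hypothetical names, this is Python rather than kernel code, and hierarchy
propagation is ignored.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1 << 30


@dataclass
class Counter:
    """Toy stand-in for one memcg counter; not a kernel structure."""
    limit: int
    usage: int
    pressure: bool  # whether this counter is currently under pressure


@dataclass
class Memcg:
    """Hypothetical model of a memcg as discussed in this thread."""
    legacy: bool                      # True for cgroup v1, False for cgroup v2
    memory: Counter                   # models ->memory
    tcpmem: Optional[Counter] = None  # models ->tcpmem (cgroup v1 only)


def socket_under_pressure(memcg: Memcg) -> bool:
    """Decide whether socket memory allocation should be throttled."""
    if memcg.legacy:
        # cgroup v1: sockets are charged to ->tcpmem, so pressure on
        # ->memory says nothing about socket memory.
        return memcg.tcpmem is not None and memcg.tcpmem.pressure
    # cgroup v2: sockets share the ->memory limit with everything else.
    return memcg.memory.pressure


# The example from the table: ->memory is under pressure, ->tcpmem is not.
v1 = Memcg(True, Counter(10 * GIB, 9 * GIB, True), Counter(10 * GIB, 4 * GIB, False))
v2 = Memcg(False, Counter(10 * GIB, 9 * GIB, True))
print(socket_under_pressure(v1), socket_under_pressure(v2))  # False True
```

With the numbers from the table, the v1 group is not throttled even though
->memory is under pressure, while a v2 group in the same state is.
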
Re the refactoring: I see a lot of value in adding comments and making the
code more readable; I don't see that much value in merging the two variables.
But if it comes organically with the code simplification - nice.

>
> >
> > I'm curious, was your work driven by some real-world problem or a desire to clean
> > up the code? Both are valid reasons of course.
>
> We (a cloud service provider) are migrating users to cgroupv2,
> but encountered some problems among which the socket memory
> really puts us in a difficult situation. There is no specific
> threshold for socket memory in cgroupv2 and it relies largely on
> workloads doing traffic control themselves.
>
> Say one workload behaves fine in cgroupv1 with 10G of ->memory
> and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2
> with 11G of ->memory due to burst memory usage on socket.
>
> It's rational for the workloads to build some traffic control
> to better utilize the resources they bought, but from kernel's
> point of view it's also reasonable to suppress the allocation
> of socket memory once there is a shortage of free memory, given
> that performance degradation is better than failure.

Yeah, I can see it. But I'm not sure a single policy can fit all cases; this
may be too workload-specific. E.g. some workloads might prefer to have a
portion of the pagecache reclaimed. What do you think?

>
> Currently the mechanism of net-memcg's pressure doesn't work as
> we expected, please check the discussion in [1]. Besides this,
> we are also working on mitigating the priority inversion issue
> introduced by the net protocols' global shared thresholds [2],
> which has something to do with the net-memcg's pressure. This
> patchset and maybe some others are byproducts of the above work.
>
> [1] https://lore.kernel.org/netdev/20230602081135.75424-1-wuyun.abel@bytedance.com/
> [2] https://lore.kernel.org/netdev/20230609082712.34889-1-wuyun.abel@bytedance.com/

Thanks for the clarification!
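
For reference, the migration problem described above comes down to simple
arithmetic. Below is a rough Python sketch of it, not kernel code: the
function names and the 3G burst size are assumptions made up for
illustration, and reclaim is ignored entirely.

```python
GIB = 1 << 30


def v1_split(memory_limit: int, tcpmem_limit: int, socket_demand: int):
    """cgroup v1: sockets are capped by ->tcpmem and never eat into ->memory.

    Returns (room left for non-socket memory, socket memory actually used).
    """
    socket_usage = min(socket_demand, tcpmem_limit)
    return memory_limit, socket_usage


def v2_split(memory_limit: int, socket_demand: int):
    """cgroup v2: sockets and everything else share the single ->memory limit.

    Returns (room left for non-socket memory, socket memory actually used).
    """
    socket_usage = min(socket_demand, memory_limit)
    return memory_limit - socket_usage, socket_usage


burst = 3 * GIB  # hypothetical socket burst, not a number from this thread

# v1 with 10G ->memory and 1G ->tcpmem: the burst is clamped to 1G.
print([x // GIB for x in v1_split(10 * GIB, 1 * GIB, burst)])  # [10, 1]

# v2 with 11G ->memory: the same burst takes 3G away from everything else.
print([x // GIB for x in v2_split(11 * GIB, burst)])           # [8, 3]
```

In the v1 setup the rest of the workload always keeps its 10G, while in the
v2 setup the same burst leaves it with 8G, which is where the degradation
(or OOM) comes from.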