linux-mm.kvack.org archive mirror
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Abel Wu <wuyun.abel@bytedance.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Ahern <dsahern@kernel.org>,
	Yosry Ahmed <yosryahmed@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Yu Zhao <yuzhao@google.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Alexander Mikhalitsyn <alexander@mihalicyn.com>,
	Breno Leitao <leitao@debian.org>,
	David Howells <dhowells@redhat.com>,
	Jason Xing <kernelxing@tencent.com>,
	Xin Long <lucien.xin@gmail.com>, Michal Hocko <mhocko@suse.com>,
	Alexei Starovoitov <ast@kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>,
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
	<cgroups@vger.kernel.org>,
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
	<linux-mm@kvack.org>
Subject: Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
Date: Wed, 26 Jul 2023 17:19:00 -0700
Message-ID: <ZMG39B6B41yLAu9r@P9FQF9L96D>
In-Reply-To: <29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com>

On Wed, Jul 26, 2023 at 04:44:24PM +0800, Abel Wu wrote:
> On 7/26/23 10:56 AM, Roman Gushchin wrote:
> > On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
> > > Hi Roman, thanks for taking time to have a look!
> > > > 
> > > > > When in legacy mode aka. cgroupv1, the socket memory is charged
> > > > > into a separate counter memcg->tcpmem rather than ->memory, so
> > > > > the reclaim pressure of the memcg has nothing to do with socket's
> > > > > pressure at all.
> > > > 
> > > > But we still might set memcg->socket_pressure and propagate the pressure,
> > > > right?
> > > 
> > > Yes, but in cgroupv1 the pressure that comes from memcg->socket_pressure
> > > does not mean pressure on socket memory, so it might lead to premature
> > > reclamation or throttling of socket memory allocation, as the following
> > > example shows:
> > > 
> > > 			->memory	->tcpmem
> > > 	limit		10G		10G
> > > 	usage		9G		4G
> > > 	pressure	true		false
> > 
> > Yes, now it makes sense to me. Thank you for the explanation.
> 
> Cheers!
> 
> > 
> > Then I'd organize the patchset in the following way:
> > 1) cgroup v1-only fix to not throttle tcpmem based on the vmpressure
> > 2) a formal code refactoring
> 
> OK, I will try to reorganize it in the next version.

Thank you!
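
To make the cgroup v1 case from the table above concrete, here is a minimal
userspace sketch. It is not kernel code: struct memcg_model and both helper
names are hypothetical, and the walk over parent cgroups is left out. It only
contrasts an indicator that lets vmpressure leak into the v1 socket decision
with one that is scoped per cgroup version, which is roughly what item 1)
asks for.

```c
#include <stdbool.h>

/* Hypothetical model of the state discussed in the thread;
 * this is NOT the kernel's struct mem_cgroup. */
struct memcg_model {
	bool on_v2;           /* true: cgroup v2, false: cgroup v1 (legacy) */
	bool vmpressure;      /* reclaim pressure on ->memory */
	bool tcpmem_pressure; /* pressure on the separate v1 ->tcpmem counter */
};

/* Mixed indicator: on v1, reclaim pressure on ->memory still
 * reports socket pressure even when ->tcpmem is fine. */
static bool sockmem_pressure_mixed(const struct memcg_model *m)
{
	if (!m->on_v2 && m->tcpmem_pressure)
		return true;
	return m->vmpressure;
}

/* Scoped indicator: v1 looks only at ->tcpmem, v2 only at ->memory. */
static bool sockmem_pressure_scoped(const struct memcg_model *m)
{
	if (m->on_v2)
		return m->vmpressure;
	return m->tcpmem_pressure;
}
```

With the values from the table (v1, ->memory under pressure, ->tcpmem not),
the mixed helper reports pressure while the scoped one does not.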
> 
> > > > 
> > > > Overall I think it's a good idea to clean these things up and thank you
> > > > for working on this. But I wonder if we can make the next step and leave only
> > > > one mechanism for both cgroup v1 and v2 instead of having this weird setup
> > > > where memcg->socket_pressure is set differently from different paths on cgroup
> > > > v1 and v2.
> > > 
> > > There is some difficulty in unifying the mechanism for both cgroup
> > > designs. Throttling socket memory allocation when memcg is under
> > > pressure only makes sense when socket memory and other usages are
> > > sharing the same limit, which is not true for cgroupv1. Thoughts?
> > 
> > I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
> > as it is, except when it creates an unnecessary complexity in the code.
> 
> Are you suggesting that the 2nd patch be dropped, keeping
> ->tcpmem_pressure as it is? Or that I keep the 2nd patch and add some
> explanation around it, as you suggested in your last reply?

I suggest splitting the code refactoring (which is not expected to bring any
functional changes) from the actual change of behavior on cgroup v1.
Re the refactoring: I see a lot of value in adding comments and making the
code more readable; I don't see that much value in merging two variables.
But if it comes organically with the code simplification - nice.

> 
> > 
> > I'm curious, was your work driven by some real-world problem or a desire to clean
> > up the code? Both are valid reasons of course.
> 
> We (a cloud service provider) are migrating users to cgroupv2,
> but have encountered some problems, among which socket memory
> really puts us in a difficult situation. There is no specific
> threshold for socket memory in cgroupv2, so it relies largely on
> workloads doing traffic control themselves.
> 
> Say one workload behaves fine in cgroupv1 with 10G of ->memory
> and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2
> with 11G of ->memory due to burst memory usage on socket.
> 
> It's rational for the workloads to build some traffic control
> to better utilize the resources they bought, but from kernel's
> point of view it's also reasonable to suppress the allocation
> of socket memory once there is a shortage of free memory, given
> that performance degradation is better than failure.

Yeah, I can see it. But I don't know whether it's too workload-specific
for a one-policy-fits-all approach.
E.g. some workloads might prefer to have a portion of their pagecache
reclaimed instead.
What do you think?
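
For reference, the 10G + 1G versus 11G example quoted above can be modelled
with a toy counter. This is a userspace sketch under loud assumptions: struct
counter and the helpers are hypothetical, the usage numbers are mine, and
reclaim, forced charges and hierarchy are all ignored. It only shows why a
socket burst that v1 would stop at ->tcpmem can consume the headroom of
everything else on v2.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical stand-in for a page counter; assumes usage <= limit. */
struct counter {
	uint64_t usage;
	uint64_t limit;
};

/* Charge only if the request fits under the limit. */
static bool try_charge(struct counter *c, uint64_t bytes)
{
	if (bytes > c->limit - c->usage)
		return false;
	c->usage += bytes;
	return true;
}

/* v1 model: socket buffers are charged to ->tcpmem, never to ->memory. */
static bool v1_charge_sock(struct counter *memory, struct counter *tcpmem,
			   uint64_t bytes)
{
	(void)memory; /* untouched by socket charges in this model */
	return try_charge(tcpmem, bytes);
}

/* v2 model: socket buffers share the single ->memory limit. */
static bool v2_charge_sock(struct counter *memory, uint64_t bytes)
{
	return try_charge(memory, bytes);
}
```

In this model a 2G socket burst fails on v1 with a 1G ->tcpmem limit and
leaves ->memory alone, while on v2 with an 11G limit and 9G already in use
the same burst succeeds and leaves no room for any other allocation.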

> 
> Currently the net-memcg pressure mechanism doesn't work as
> we expected; please check the discussion in [1]. Besides this,
> we are also working on mitigating the priority inversion issue
> introduced by the net protocols' global shared thresholds [2],
> which has something to do with the net-memcg's pressure. This
> patchset and maybe some others are byproducts of the above work.
> 
> [1] https://lore.kernel.org/netdev/20230602081135.75424-1-wuyun.abel@bytedance.com/
> [2] https://lore.kernel.org/netdev/20230609082712.34889-1-wuyun.abel@bytedance.com/

Thanks for the clarification!

