From: Yosry Ahmed <yosryahmed@google.com>
To: "程垲涛 Chengkaitao Cheng" <chengkaitao@didiglobal.com>
Cc: "tj@kernel.org" <tj@kernel.org>,
"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"corbet@lwn.net" <corbet@lwn.net>,
"mhocko@kernel.org" <mhocko@kernel.org>,
"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
"shakeelb@google.com" <shakeelb@google.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"brauner@kernel.org" <brauner@kernel.org>,
"muchun.song@linux.dev" <muchun.song@linux.dev>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
"ebiederm@xmission.com" <ebiederm@xmission.com>,
"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
"chengzhihao1@huawei.com" <chengzhihao1@huawei.com>,
"pilgrimtao@gmail.com" <pilgrimtao@gmail.com>,
"haolee.swjtu@gmail.com" <haolee.swjtu@gmail.com>,
"yuzhao@google.com" <yuzhao@google.com>,
"willy@infradead.org" <willy@infradead.org>,
"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
"vbabka@suse.cz" <vbabka@suse.cz>,
"surenb@google.com" <surenb@google.com>,
"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
"mcgrof@kernel.org" <mcgrof@kernel.org>,
"feng.tang@intel.com" <feng.tang@intel.com>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v4 0/2] memcontrol: support cgroup level OOM protection
Date: Tue, 23 May 2023 15:02:55 -0700
Message-ID: <CAJD7tkZwCreOS_XxDM_9mOTBo=Gatr12r1xtc64B_e5+HJhRqg@mail.gmail.com>
In-Reply-To: <B55000F8-BD65-432F-8430-F58054611474@didiglobal.com>
On Sat, May 20, 2023 at 2:52 AM 程垲涛 Chengkaitao Cheng
<chengkaitao@didiglobal.com> wrote:
>
> At 2023-05-20 06:04:26, "Yosry Ahmed" <yosryahmed@google.com> wrote:
> >On Wed, May 17, 2023 at 10:12 PM 程垲涛 Chengkaitao Cheng
> ><chengkaitao@didiglobal.com> wrote:
> >>
> >> At 2023-05-18 04:42:12, "Yosry Ahmed" <yosryahmed@google.com> wrote:
> >> >On Wed, May 17, 2023 at 3:01 AM 程垲涛 Chengkaitao Cheng
> >> ><chengkaitao@didiglobal.com> wrote:
> >> >>
> >> >> At 2023-05-17 16:09:50, "Yosry Ahmed" <yosryahmed@google.com> wrote:
> >> >> >On Wed, May 17, 2023 at 1:01 AM 程垲涛 Chengkaitao Cheng
> >> >> ><chengkaitao@didiglobal.com> wrote:
> >> >> >>
> >> >>
> >> >> Killing processes in order of memory usage cannot effectively protect
> >> >> important processes. Killing processes in a user-defined priority order
> >> >> will result in a large number of OOM events while still failing to
> >> >> release enough memory. I have been searching for a balance between
> >> >> the two methods, so that their shortcomings are not too obvious.
> >> >> The biggest advantage of memcg is its tree topology, and I also hope
> >> >> to make good use of it.
> >> >
> >> >For us, killing processes in a user-defined priority order works well.
> >> >
> >> >It seems like to tune memory.oom.protect you use oom_kill_inherit to
> >> >observe how many times this memcg has been killed due to a limit in an
> >> >ancestor. Wouldn't it be more straightforward to specify the priority
> >> >of protections among memcgs?
> >> >
> >> >For example, if you observe multiple memcgs being OOM killed due to
> >> >hitting an ancestor limit, you will need to decide which of them to
> >> >increase memory.oom.protect for more, based on their importance.
> >> >Otherwise, if you increase all of them, then there is no point if all
> >> >the memory is protected, right?
> >>
> >> If all memory in a memcg is protected, its meaning is similar to that of the
> >> highest priority memcg in your approach, which is either killed last or
> >> never killed.
> >
> >Makes sense. I believe it gets a bit trickier when you want to
> >describe relative ordering between memcgs using memory.oom.protect.
>
> Actually, my original intention was not to use memory.oom.protect to
> achieve relative ordering between memcgs; that was just a feature that
> happened to be achievable. My initial idea was to protect a certain
> proportion of the memory in a memcg from being killed, so that
> physical memory can be planned reasonably. Both the physical
> machine manager and the container manager can add some unimportant
> loads beyond the oom.protect limit, greatly improving the memory
> oversell rate. In the worst case, the physical machine can
> always provide all the memory limited by memory.oom.protect for the memcg.
>
> On the other hand, I also want to achieve relative ordering of internal
> processes in memcg, not just a unified ordering of all memcgs on
> physical machines.
For us, strict priority-based selection is essential. We have
different tiers of jobs of different importance, and a higher
priority job should not be killed before a lower priority job if
possible, no matter how much memory either of them is
using. Protecting memcgs solely based on their usage can be useful in
some scenarios, but not in a system where you have different tiers of
jobs running with strict priority ordering.
>
> >> >In this case, wouldn't it be easier to just tell the OOM killer the
> >> >relative priority among the memcgs?
> >> >
> >> >>
> >> >> >If this approach works for you (or any other audience), that's great,
> >> >> >I can share more details and perhaps we can reach something that we
> >> >> >can both use :)
> >> >>
> >> >> If you have a good idea, please share more details or show some code.
> >> >> I would greatly appreciate it.
> >> >
> >> >The code we have needs to be rebased onto a different version and
> >> >cleaned up before it can be shared, but essentially it is as
> >> >described.
> >> >
> >> >(a) All processes and memcgs start with a default score.
> >> >(b) Userspace can specify scores for memcgs and processes. A higher
> >> >score means higher priority (i.e., lower scores get killed first).
> >> >(c) The OOM killer essentially looks for the memcg with the lowest
> >> >score to kill, then within this memcg, it looks for the process with
> >> >the lowest score. Ties are broken based on usage, so essentially if
> >> >all processes/memcgs have the default score, we fall back to the
> >> >current OOM behavior.
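
To make (a)-(c) above concrete, here is a minimal userspace sketch in
Python. It is only a model of the selection order as described, not the
actual kernel code (which has not been shared). The names Process, Memcg,
select_victim and the DEFAULT_SCORE value are hypothetical, and the
tie-break is assumed to prefer the higher usage.

```python
from dataclasses import dataclass, field

DEFAULT_SCORE = 0  # hypothetical default; the thread does not give a value


@dataclass
class Process:
    pid: int
    usage: int                     # memory usage, arbitrary units
    score: int = DEFAULT_SCORE     # higher score = higher priority


@dataclass
class Memcg:
    name: str
    processes: list = field(default_factory=list)
    score: int = DEFAULT_SCORE     # higher score = higher priority

    @property
    def usage(self):
        # Total usage of the memcg is the sum over its processes.
        return sum(p.usage for p in self.processes)


def select_victim(memcgs):
    """Return (memcg, process) to kill, or None if nothing is killable."""
    candidates = [m for m in memcgs if m.processes]
    if not candidates:
        return None
    # Lowest score first; among equal scores, highest usage first.
    memcg = min(candidates, key=lambda m: (m.score, -m.usage))
    proc = min(memcg.processes, key=lambda p: (p.score, -p.usage))
    return memcg, proc
```

With every score left at the default, the key reduces to usage alone, so
the model picks the largest memcg and then its largest process, which is
the fallback described in (c).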
> >>
> >> If memory is severely oversold, all processes of the lowest priority
> >> memcg may be killed before processes in any other memcg are selected.
> >> If there are 1000 processes with almost zero memory usage in
> >> the lowest priority memcg, 1000 useless kill events may occur.
> >> To avoid this situation, even for the lowest priority memcg,
> >> I would leave it a very small oom.protect quota.
> >
> >I checked internally, and this is indeed something that we see from
> >time to time. We try to avoid that with userspace OOM killing, but
> >it's not 100% effective.
> >
> >>
> >> If faced with two memcgs with the same total memory usage and
> >> priority, memcg A has more processes but less memory usage per
> >> single process, and memcg B has fewer processes but more
> >> memory usage per single process, then when OOM occurs, the
> >> processes in memcg B may continue to be killed until all processes
> >> in memcg B are killed, which is unfair to memcg B because memcg A
> >> also occupies a large amount of memory.
> >
> >I believe in this case we will kill one process in memcg B, then the
> >usage of memcg A will become higher, so we will pick a process from
> >memcg A next.
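
The alternation described above can be checked with a small simulation.
This is a sketch under stated assumptions: both memcgs have the same
priority, each OOM event kills exactly one process, the victim memcg is
re-selected by total usage before every kill, and an exact tie goes to
whichever memcg is listed first. The helper names pick and simulate are
hypothetical.

```python
def pick(memcgs):
    """Return the name of the non-empty memcg with the highest total usage.

    memcgs maps a memcg name to a list of per-process usages. On a tie,
    max() keeps the first name in dict order.
    """
    return max((n for n in memcgs if memcgs[n]), key=lambda n: sum(memcgs[n]))


def simulate(memcgs, kills):
    """Kill the largest process of the selected memcg, `kills` times."""
    state = {name: sorted(usages) for name, usages in memcgs.items()}
    order = []
    for _ in range(kills):
        name = pick(state)
        state[name].pop()          # sorted ascending, so pop() is the largest
        order.append(name)
    return order


# memcg B: 2 processes of 40 units; memcg A: 8 processes of 10 units.
# Both start at 80 units in total.
print(simulate({"B": [40, 40], "A": [10] * 8}, 6))
# prints ['B', 'A', 'A', 'A', 'A', 'B']
```

In this model the second process in memcg B is killed only after memcg A
has dropped to the same total usage, so the kills do not all land on
memcg B.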
>
> Suppose there is only one process in memcg A, and its memory usage is
> higher than that of any process in memcg B, but the total memory usage of
> memcg A is lower than that of memcg B. In this case, if the OOM killer
> still chooses the process in memcg A, it may be unfair to memcg A.
>
> >> Does your approach have these issues? Killing processes in a
> >> user-defined priority is indeed easier and can work well in most cases,
> >> but I have been trying to solve the cases that it cannot cover.
> >
> >The first issue does apply to our approach. Let me dig up more info
> >from our internal teams and get back to you with more details.
>
> --
> Thanks for your comment!
> chengkaitao
>
>