From: Michal Hocko <mhocko@suse.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: "程垲涛 Chengkaitao Cheng" <chengkaitao@didiglobal.com>,
	"tj@kernel.org" <tj@kernel.org>,
	"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
	"shakeelb@google.com" <shakeelb@google.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"muchun.song@linux.dev" <muchun.song@linux.dev>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
	"ebiederm@xmission.com" <ebiederm@xmission.com>,
	"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"chengzhihao1@huawei.com" <chengzhihao1@huawei.com>,
	"pilgrimtao@gmail.com" <pilgrimtao@gmail.com>,
	"haolee.swjtu@gmail.com" <haolee.swjtu@gmail.com>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"surenb@google.com" <surenb@google.com>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"sujiaxun@uniontech.com" <sujiaxun@uniontech.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"David Rientjes" <rientjes@google.com>
Subject: Re: [PATCH v3 0/2] memcontrol: support cgroup level OOM protection
Date: Tue, 13 Jun 2023 14:06:44 +0200
Message-ID: <ZIhb1EwvrdKXpEMb@dhcp22.suse.cz>
In-Reply-To: <CAJD7tkawYZAWKYgttgtPjscnZTARj+QaGZLGiMiSadwC3oCELQ@mail.gmail.com>

On Tue 13-06-23 01:36:51, Yosry Ahmed wrote:
> +David Rientjes
> 
> On Tue, Jun 13, 2023 at 1:27 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Sun 04-06-23 01:25:42, Yosry Ahmed wrote:
> > [...]
> > > There has been a parallel discussion in the cover letter thread of v4
> > > [1]. To summarize, at Google, we have been using OOM scores to
> > > describe different job priorities in a more explicit way -- regardless
> > > of memory usage. It is strictly priority-based OOM killing. Ties are
> > > broken based on memory usage.
> > >
> > > We understand that something like memory.oom.protect has an advantage
> > > in the sense that you can skip killing a process if you know that it
> > > won't free enough memory anyway, but for an environment where multiple
> > > jobs of different priorities are running, we find it crucial to be
> > > able to define strict ordering. Some jobs are simply more important
> > > than others, regardless of their memory usage.
> >
> > I do remember that discussion. I am not a great fan of simple
> > priority-based interfaces TBH. It sounds like an easy interface, but it
> > hits complications as soon as you try to define a proper/sensible
> > hierarchical semantic. I can see how they might work on leaf memcgs with
> > statically assigned priorities, but that sounds like a very narrow
> > usecase IMHO.
> 
> Do you mind elaborating on the problem with the hierarchical semantics?

Well, let me be more specific. If you have a simple hierarchical numeric
enforcement (assume that a higher priority makes a memcg more likely to
be chosen and that the effective priority is max(self, max(parents))),
then the semantic itself is straightforward.
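
To illustrate that semantic, here is a minimal userspace sketch (the
struct and function names are made up for illustration; this is not an
existing kernel interface):

	/* Hypothetical memcg node with a statically assigned priority. */
	struct memcg_node {
		int oom_priority;		/* higher == more likely to be chosen */
		struct memcg_node *parent;	/* no parent for the root */
	};

	/*
	 * Effective priority: the maximum over the memcg itself and all of
	 * its ancestors, so a highly killable ancestor propagates down to
	 * its whole subtree.
	 */
	static int effective_priority(const struct memcg_node *node)
	{
		int prio = node->oom_priority;

		for (node = node->parent; node; node = node->parent)
			if (node->oom_priority > prio)
				prio = node->oom_priority;

		return prio;
	}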

I am not really sure about the practical manageability though. I have a
hard time imagining priority assignment for something like a shared
workload with a more complex hierarchy. For example:
	    root
	/    |    \
cont_A    cont_B  cont_C

each container running its workload with its own hierarchy structures
that might be rather dynamic over their lifetime. In order to have
predictable OOM behavior you need to watch and reassign priorities all
the time, no?

> The way it works with our internal implementation is (imo) sensible
> and straightforward from a hierarchy POV. Starting at the OOM memcg
> (which can be root), we recursively compare the OOM scores of the
> children memcgs and pick the one with the lowest score, until we
> arrive at a leaf memcg.

This approach places a strong requirement on the memcg hierarchy
organization. Siblings have to be directly comparable, because you cut
off many potential sub-trees this way (e.g. is it easy to tell whether
you want to rule out all of the system or user slices?).

I can imagine usecases where this could work reasonably well, e.g. a set
of workers of different priorities, all of them running under a shared
memcg parent. But more involved hierarchies seem more complex, because
you always have to keep in mind how the hierarchy is organized to get to
your desired victim.
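
For reference, a rough userspace sketch of the selection you describe
(descend into the lowest-score child at each level until a leaf is
reached) could look like the following; the types and the helper are
hypothetical, not your actual implementation:

	/* Hypothetical memcg with a per-cgroup OOM score. */
	struct scored_memcg {
		int oom_score;			/* lower score == killed earlier */
		int nr_children;
		struct scored_memcg **children;	/* nr_children == 0 for a leaf */
	};

	static struct scored_memcg *pick_victim_memcg(struct scored_memcg *memcg)
	{
		struct scored_memcg *cur = memcg;

		while (cur->nr_children > 0) {
			struct scored_memcg *lowest = cur->children[0];
			int i;

			for (i = 1; i < cur->nr_children; i++)
				if (cur->children[i]->oom_score < lowest->oom_score)
					lowest = cur->children[i];
			/*
			 * In the scheme described above, ties would be broken
			 * by memory usage; omitted here for brevity.
			 */
			cur = lowest;
		}

		return cur;	/* leaf memcg to pick the victim task from */
	}

Note that every descent step only compares direct siblings, which is
where the organizational requirement above comes from.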

-- 
Michal Hocko
SUSE Labs


Thread overview: 18+ messages
2023-05-06 11:49 chengkaitao
2023-05-06 11:49 ` [PATCH v3 1/2] mm: memcontrol: protect the memory in cgroup from being oom killed chengkaitao
2023-05-06 11:49 ` [PATCH v3 2/2] memcg: add oom_kill_inherit event indicator chengkaitao
2023-05-07 10:11 ` [PATCH v3 0/2] memcontrol: support cgroup level OOM protection Michal Hocko
2023-05-08  9:08   ` 程垲涛 Chengkaitao Cheng
2023-05-08 14:18     ` Michal Hocko
2023-05-09  6:50       ` 程垲涛 Chengkaitao Cheng
2023-05-22 13:03         ` Michal Hocko
2023-05-25  7:35           ` 程垲涛 Chengkaitao Cheng
2023-05-29 14:02             ` Michal Hocko
     [not found]               ` <C5E5137F-8754-40CC-9F0C-0EB3D8AC1EC2@didiglobal.com>
2023-06-13  8:16                 ` Michal Hocko
     [not found]       ` <CAJD7tkaw_7vYACsyzAtY9L0ZVC0B=XJEWgG=Ad_dOtL_pBDDvQ@mail.gmail.com>
2023-06-13  8:27         ` Michal Hocko
2023-06-13  8:36           ` Yosry Ahmed
2023-06-13 12:06             ` Michal Hocko [this message]
2023-06-13 20:24               ` Yosry Ahmed
2023-06-15 10:39                 ` Michal Hocko
2023-06-16  1:44                   ` Yosry Ahmed
2023-06-13  8:40           ` tj
