Date: Thu, 15 Jun 2023 12:39:29 +0200
From: Michal Hocko
To: Yosry Ahmed
Cc: 程垲涛 Chengkaitao Cheng, "tj@kernel.org", "lizefan.x@bytedance.com",
    "hannes@cmpxchg.org", "corbet@lwn.net", "roman.gushchin@linux.dev",
    "shakeelb@google.com", "akpm@linux-foundation.org", "brauner@kernel.org",
    "muchun.song@linux.dev", "viro@zeniv.linux.org.uk",
    "zhengqi.arch@bytedance.com", "ebiederm@xmission.com",
    "Liam.Howlett@oracle.com", "chengzhihao1@huawei.com",
    "pilgrimtao@gmail.com", "haolee.swjtu@gmail.com", "yuzhao@google.com",
    "willy@infradead.org", "vasily.averin@linux.dev", "vbabka@suse.cz",
    "surenb@google.com", "sfr@canb.auug.org.au", "mcgrof@kernel.org",
    "sujiaxun@uniontech.com", "feng.tang@intel.com",
    "cgroups@vger.kernel.org", "linux-doc@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "linux-fsdevel@vger.kernel.org",
    "linux-mm@kvack.org", David Rientjes
Subject: Re: [PATCH v3 0/2] memcontrol: support cgroup level OOM protection
References: <66F9BB37-3BE1-4B0F-8DE1-97085AF4BED2@didiglobal.com>

On Tue 13-06-23 13:24:24, Yosry Ahmed wrote:
> On Tue, Jun 13, 2023 at 5:06 AM Michal Hocko wrote:
> >
> > On Tue 13-06-23 01:36:51, Yosry Ahmed wrote:
> > > +David Rientjes
> > >
> > > On Tue, Jun 13, 2023 at 1:27 AM Michal Hocko wrote:
> > > >
> > > > On Sun 04-06-23 01:25:42, Yosry Ahmed wrote:
> > > > [...]
> > > > > There has been a parallel discussion in the cover letter thread of v4
> > > > > [1].
> > > > > To summarize, at Google, we have been using OOM scores to
> > > > > describe different job priorities in a more explicit way -- regardless
> > > > > of memory usage. It is strictly priority-based OOM killing. Ties are
> > > > > broken based on memory usage.
> > > > >
> > > > > We understand that something like memory.oom.protect has an advantage
> > > > > in the sense that you can skip killing a process if you know that it
> > > > > won't free enough memory anyway, but for an environment where multiple
> > > > > jobs of different priorities are running, we find it crucial to be
> > > > > able to define strict ordering. Some jobs are simply more important
> > > > > than others, regardless of their memory usage.
> > > >
> > > > I do remember that discussion. I am not a great fan of simple priority
> > > > based interfaces TBH. It sounds like an easy interface but it hits
> > > > complications as soon as you try to define a proper/sensible
> > > > hierarchical semantic. I can see how they might work on leaf memcgs with
> > > > statically assigned priorities but that sounds like a very narrow
> > > > usecase IMHO.
> > >
> > > Do you mind elaborating the problem with the hierarchical semantics?
> >
> > Well, let me be more specific. If you have a simple hierarchical numeric
> > enforcement (assume higher priority more likely to be chosen and the
> > effective priority to be max(self, max(parents))) then the semantic
> > itself is straightforward.
> >
> > I am not really sure about the practical manageability though. I have
> > a hard time imagining priority assignment on something like a shared
> > workload with a more complex hierarchy. For example:
> >                 root
> >         /        |        \
> >     cont_A    cont_B    cont_C
> >
> > each container running its workload with its own hierarchy structures
> > that might be rather dynamic during the lifetime. In order to have a
> > predictable OOM behavior you need to watch and reassign priorities all
> > the time, no?
>
> In our case we don't really manage the entire hierarchy in a
> centralized fashion. Each container gets a score based on its
> relative priority, and each container is free to set scores within its
> subcontainers if needed. Isn't this what the hierarchy is all about?
> Each parent only cares about its direct children. On the system level,
> we care about the priority ordering of containers. Ordering within
> containers can be deferred to containers.

This really depends on the workload. This might work for your setup
but, as I've said above, many workloads would struggle with
re-prioritizing: as soon as a new workload is started, OOM priorities
would need to be reorganized. The setup is just too static to be
generally useful IMHO.

You can avoid that by essentially giving the mid-layers no priority and
relying only on leaf memcgs, which would make this more flexible. That
is even more complicated with the top-down approach.

That being said, I can see workloads which could benefit from a
priority based policy (essentially user space controlled OOM
pre-selection). But there are many other policies like that which would
be use-case specific and not generic enough, so I do not think this is
worth a generic interface; it would rather fall into BPF or similar
based policies.
-- 
Michal Hocko
SUSE Labs
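
For readers following the thread, the hierarchical rule quoted above (effective priority = max(self, max(parents)), higher value more likely to be chosen, ties broken by memory usage) can be sketched as a toy user-space model. This is only an illustration under stated assumptions: the `Memcg` class, its `priority` and `usage` fields, and the `effective_priority()` and `pick_victim()` helpers are hypothetical names invented here. None of this is kernel code or an interface proposed in the patch series.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Memcg:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    priority: int = 0   # assumption: higher value = more likely OOM victim
    usage: int = 0      # assumption: memory used by this node, arbitrary units
    parent: Optional["Memcg"] = None
    children: List["Memcg"] = field(default_factory=list)

    def add(self, child: "Memcg") -> "Memcg":
        """Attach child below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def effective_priority(cg: Memcg) -> int:
    """max(self, max(parents)): walk up to the root, keep the largest value."""
    prio = cg.priority
    node = cg.parent
    while node is not None:
        prio = max(prio, node.priority)
        node = node.parent
    return prio


def leaves(cg: Memcg) -> Iterator[Memcg]:
    """Yield every leaf below (or equal to) cg."""
    if not cg.children:
        yield cg
    for child in cg.children:
        yield from leaves(child)


def pick_victim(root: Memcg) -> Memcg:
    """Highest effective priority wins; ties go to the larger memory user."""
    return max(leaves(root), key=lambda cg: (effective_priority(cg), cg.usage))


# Toy hierarchy loosely following the root / cont_A / cont_B picture.
root = Memcg("root")
cont_a = root.add(Memcg("cont_A", priority=10))
cont_b = root.add(Memcg("cont_B", priority=5))
a_job1 = cont_a.add(Memcg("A/job1", priority=1, usage=100))
a_job2 = cont_a.add(Memcg("A/job2", priority=3, usage=300))
b_job1 = cont_b.add(Memcg("B/job1", priority=20, usage=50))

print(pick_victim(root).name)
```

In this toy model, a leaf inside cont_B that sets a high value for itself ends up preferred over everything in cont_A, even though cont_A carries the higher container-level value. That is one consequence of the max rule as modeled here; any change inside one container can change the global ordering, so the values have to be re-checked whenever the hierarchy changes.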