From: Yosry Ahmed
Date: Thu, 15 Jun 2023 18:44:46 -0700
Subject: Re: [PATCH v3 0/2] memcontrol: support cgroup level OOM protection
To: Michal Hocko
Cc: 程垲涛 Chengkaitao Cheng, "tj@kernel.org", "lizefan.x@bytedance.com",
    "hannes@cmpxchg.org", "corbet@lwn.net", "roman.gushchin@linux.dev",
    "shakeelb@google.com", "akpm@linux-foundation.org", "brauner@kernel.org",
    "muchun.song@linux.dev", "viro@zeniv.linux.org.uk",
    "zhengqi.arch@bytedance.com", "ebiederm@xmission.com",
    "Liam.Howlett@oracle.com", "chengzhihao1@huawei.com",
    "pilgrimtao@gmail.com", "haolee.swjtu@gmail.com", "yuzhao@google.com",
    "willy@infradead.org", "vasily.averin@linux.dev", "vbabka@suse.cz",
    "surenb@google.com", "sfr@canb.auug.org.au", "mcgrof@kernel.org",
    "sujiaxun@uniontech.com", "feng.tang@intel.com",
    "cgroups@vger.kernel.org", "linux-doc@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "linux-fsdevel@vger.kernel.org",
    "linux-mm@kvack.org", David Rientjes
References: <66F9BB37-3BE1-4B0F-8DE1-97085AF4BED2@didiglobal.com>

On Thu, Jun 15, 2023 at 3:39 AM Michal Hocko wrote:
>
> On Tue 13-06-23 13:24:24, Yosry Ahmed wrote:
> > On Tue, Jun 13, 2023 at 5:06 AM Michal Hocko wrote:
> > >
> > > On Tue 13-06-23 01:36:51, Yosry Ahmed wrote:
> > > > +David Rientjes
> > > >
> > > > On Tue, Jun 13, 2023 at 1:27 AM Michal Hocko wrote:
> > > > >
> > > > > On Sun 04-06-23 01:25:42, Yosry Ahmed wrote:
> > > > > [...]
> > > > > > There has been a parallel discussion in the cover letter thread of v4
> > > > > > [1]. To summarize, at Google, we have been using OOM scores to
> > > > > > describe different job priorities in a more explicit way -- regardless
> > > > > > of memory usage. It is strictly priority-based OOM killing. Ties are
> > > > > > broken based on memory usage.
> > > > > >
> > > > > > We understand that something like memory.oom.protect has an advantage
> > > > > > in the sense that you can skip killing a process if you know that it
> > > > > > won't free enough memory anyway, but for an environment where multiple
> > > > > > jobs of different priorities are running, we find it crucial to be
> > > > > > able to define strict ordering. Some jobs are simply more important
> > > > > > than others, regardless of their memory usage.
> > > > >
> > > > > I do remember that discussion. I am not a great fan of simple priority
> > > > > based interfaces TBH. It sounds as an easy interface but it hits
> > > > > complications as soon as you try to define a proper/sensible
> > > > > hierarchical semantic. I can see how they might work on leaf memcgs with
> > > > > statically assigned priorities but that sounds like a very narrow
> > > > > usecase IMHO.
> > > >
> > > > Do you mind elaborating the problem with the hierarchical semantics?
> > >
> > > Well, let me be more specific. If you have a simple hierarchical numeric
> > > enforcement (assume higher priority more likely to be chosen and the
> > > effective priority to be max(self, max(parents))) then the semantic
> > > itself is straightforward.
> > >
> > > I am not really sure about the practical manageability though. I have
> > > hard time to imagine priority assignment on something like a shared
> > > workload with a more complex hierarchy.
> > > For example:
> > >                   root
> > >                 /  |  \
> > >          cont_A cont_B cont_C
> > >
> > > each container running its workload with own hierarchy structures that
> > > might be rather dynamic during the lifetime. In order to have a
> > > predictable OOM behavior you need to watch and reassign priorities all
> > > the time, no?
> >
> > In our case we don't really manage the entire hierarchy in a
> > centralized fashion. Each container gets a score based on their
> > relative priority, and each container is free to set scores within its
> > subcontainers if needed. Isn't this what the hierarchy is all about?
> > Each parent only cares about its direct children. On the system level,
> > we care about the priority ordering of containers. Ordering within
> > containers can be deferred to containers.
>
> This really depends on the workload. This might be working for your
> setup but as I've said above, many workloads would be struggling with
> re-prioritizing as soon as a new workload is started and oom priorities
> would need to be reorganized as a result. The setup is just too static
> to be generally useful IMHO.
> You can avoid that by essentially making mid-layers no priority and only
> rely on leaf memcgs when this would become more flexible. This is
> something even more complicated with the top-down approach.

I agree that other setups may find it more difficult if one entity
needs to manage the entire tree, although if the score range is large
enough, I don't really think it's that static. When a new workload is
started you decide what its priority is compared to the existing
workloads and set its score accordingly. We use a range of scores from 0
to 10,000 (and it can easily be larger), so it's easy to assign new
scores without reorganizing the existing scores.

>
> That being said, I can see workloads which could benefit from a
> priority (essentially user space controlled oom pre-selection) based
> policy.
> But there are many other policies like that that would be
> usecase specific and not generic enough so I do not think this is worth
> a generic interface and would fall into BPF or alike based policies.

That's reasonable. I can't speak for other folks. Perhaps no single
policy will be generic enough, and we should focus on enabling
customized policy. Perhaps other userspace OOM agents can benefit from
this as well.

>
> --
> Michal Hocko
> SUSE Labs
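
For illustration, the selection semantics discussed in this thread can be
sketched in a few lines of Python. This is only a sketch under stated
assumptions, not kernel code and not any proposed interface: the Memcg class,
its score and usage fields, and the helper names are all hypothetical. It
assumes, as in the quoted example, that a higher score makes a cgroup more
likely to be chosen and that the effective score is max(self, max(parents)).
The thread says ties are broken by memory usage; picking the larger user
first is an assumption made here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Memcg:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    score: int = 0          # hypothetical OOM score, e.g. in the range 0..10000
    usage: int = 0          # memory usage in bytes (set on leaves in this sketch)
    parent: Optional["Memcg"] = None
    children: List["Memcg"] = field(default_factory=list)

    def add(self, child: "Memcg") -> "Memcg":
        """Attach child below this cgroup and return it."""
        child.parent = self
        self.children.append(child)
        return child


def effective_score(cg: Memcg) -> int:
    """Effective score as max(self, max(parents)), walking up to the root."""
    best = cg.score
    node = cg.parent
    while node is not None:
        best = max(best, node.score)
        node = node.parent
    return best


def leaves(cg: Memcg) -> List[Memcg]:
    """Collect all leaf cgroups below (and including) cg."""
    if not cg.children:
        return [cg]
    out: List[Memcg] = []
    for child in cg.children:
        out.extend(leaves(child))
    return out


def pick_victim(root: Memcg) -> Memcg:
    """Strict ordering by effective score; ties go to the larger memory user
    (the tie-break direction is an assumption of this sketch)."""
    return max(leaves(root), key=lambda cg: (effective_score(cg), cg.usage))


# The three-container hierarchy from the quoted example, with made-up numbers.
root = Memcg("root")
a = root.add(Memcg("cont_A", score=100))
b = root.add(Memcg("cont_B", score=500))
c = root.add(Memcg("cont_C", score=500))
a1 = a.add(Memcg("A/job1", score=50, usage=1 << 30))
b1 = b.add(Memcg("B/job1", score=10, usage=2 << 30))
c1 = c.add(Memcg("C/job1", score=0, usage=4 << 30))

victim = pick_victim(root)
```

With these made-up numbers, the jobs under cont_B and cont_C both inherit an
effective score of 500 from their parents, so the tie is decided by memory
usage. Leaving gaps between the container scores (100 and 500 here) is what
allows a new container to be slotted in without renumbering the others.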