From: Yosry Ahmed
Date: Tue, 13 Jun 2023 13:24:24 -0700
Subject: Re: [PATCH v3 0/2] memcontrol: support cgroup level OOM protection
To: Michal Hocko
Cc: 程垲涛 Chengkaitao Cheng, tj@kernel.org, lizefan.x@bytedance.com,
    hannes@cmpxchg.org, corbet@lwn.net, roman.gushchin@linux.dev,
    shakeelb@google.com, akpm@linux-foundation.org, brauner@kernel.org,
    muchun.song@linux.dev, viro@zeniv.linux.org.uk,
    zhengqi.arch@bytedance.com, ebiederm@xmission.com,
    Liam.Howlett@oracle.com, chengzhihao1@huawei.com, pilgrimtao@gmail.com,
    haolee.swjtu@gmail.com, yuzhao@google.com, willy@infradead.org,
    vasily.averin@linux.dev, vbabka@suse.cz, surenb@google.com,
    sfr@canb.auug.org.au, mcgrof@kernel.org, sujiaxun@uniontech.com,
    feng.tang@intel.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, David Rientjes
References: <66F9BB37-3BE1-4B0F-8DE1-97085AF4BED2@didiglobal.com>

On Tue, Jun 13, 2023 at 5:06 AM Michal Hocko wrote:
>
> On Tue 13-06-23 01:36:51, Yosry Ahmed wrote:
> > +David Rientjes
> >
> > On Tue, Jun 13, 2023 at 1:27 AM Michal Hocko wrote:
> > >
> > > On Sun 04-06-23 01:25:42, Yosry Ahmed wrote:
> > > [...]
> > > > There has been a parallel discussion in the cover letter thread of v4
> > > > [1].
> > > > To summarize, at Google, we have been using OOM scores to
> > > > describe different job priorities in a more explicit way -- regardless
> > > > of memory usage. It is strictly priority-based OOM killing. Ties are
> > > > broken based on memory usage.
> > > >
> > > > We understand that something like memory.oom.protect has an advantage
> > > > in the sense that you can skip killing a process if you know that it
> > > > won't free enough memory anyway, but for an environment where multiple
> > > > jobs of different priorities are running, we find it crucial to be
> > > > able to define strict ordering. Some jobs are simply more important
> > > > than others, regardless of their memory usage.
> > >
> > > I do remember that discussion. I am not a great fan of simple priority
> > > based interfaces TBH. It sounds as an easy interface but it hits
> > > complications as soon as you try to define a proper/sensible
> > > hierarchical semantic. I can see how they might work on leaf memcgs with
> > > statically assigned priorities but that sounds like a very narrow
> > > usecase IMHO.
> >
> > Do you mind elaborating the problem with the hierarchical semantics?
>
> Well, let me be more specific. If you have a simple hierarchical numeric
> enforcement (assume higher priority more likely to be chosen and the
> effective priority to be max(self, max(parents))) then the semantic
> itself is straightforward.
>
> I am not really sure about the practical manageability though. I have
> hard time to imagine priority assignment on something like a shared
> workload with a more complex hierarchy. For example:
>                     root
>                 /    |    \
>          cont_A   cont_B   cont_C
>
> each container running its workload with own hierarchy structures that
> might be rather dynamic during the lifetime. In order to have a
> predictable OOM behavior you need to watch and reassign priorities all
> the time, no?

In our case we don't really manage the entire hierarchy in a
centralized fashion.
Each container gets a score based on its relative
priority, and each container is free to set scores within its
subcontainers if needed. Isn't this what the hierarchy is all about?
Each parent only cares about its direct children. On the system level,
we care about the priority ordering of containers. Ordering within
containers can be deferred to containers.

>
> > The way it works with our internal implementation is (imo) sensible
> > and straightforward from a hierarchy POV. Starting at the OOM memcg
> > (which can be root), we recursively compare the OOM scores of the
> > children memcgs and pick the one with the lowest score, until we
> > arrive at a leaf memcg.
>
> This approach has a strong requirement on the memcg hierarchy
> organization. Siblings have to be directly comparable because you cut
> off many potential sub-trees this way (e.g. is it easy to tell
> whether you want to rule out all system or user slices?).
>
> I can imagine usecases where this could work reasonably well e.g. a set
> of workers of a different priority all of them running under a shared
> memcg parent. But more involved hierarchies seem more complex
> because you always keep in mind how the hierarchy is organized to get to
> your desired victim.

I guess the main point is what I mentioned above: you don't need to
manage the entire tree, containers can manage their subtrees. The most
important thing is to provide the kernel with priority ordering among
containers, and optionally priority ordering within a container
(disregarding other containers).

>
> --
> Michal Hocko
> SUSE Labs
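
For illustration only, the selection walk quoted above ("Starting at the
OOM memcg ... until we arrive at a leaf memcg") could be sketched roughly
as follows. This is not the internal implementation and not kernel code.
The Memcg class, its field names, the example scores, and the tie-break
direction (on equal scores the larger memory user is picked) are all
assumptions made for the example; the thread only says that the lowest
score is picked and that ties are broken based on memory usage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    oom_score: int = 0   # per the thread: the lowest score is picked
    usage: int = 0       # memory charged to this subtree, in bytes
    children: List["Memcg"] = field(default_factory=list)


def select_victim(oom_memcg: Memcg) -> Memcg:
    """Descend from the OOM memcg to a leaf, one level at a time."""
    node = oom_memcg
    while node.children:
        # Only direct siblings are compared. Lowest score wins; on a tie
        # the larger memory user is taken (assumed tie-break direction).
        node = min(node.children, key=lambda c: (c.oom_score, -c.usage))
    return node


# Example hierarchy mirroring the root / cont_A / cont_B / cont_C diagram.
# All names below the containers and all numbers are made up.
root = Memcg("root", children=[
    Memcg("cont_A", oom_score=10, usage=4 << 30, children=[
        Memcg("A/web", oom_score=5, usage=1 << 30),
        Memcg("A/batch", oom_score=1, usage=3 << 30),
    ]),
    Memcg("cont_B", oom_score=3, usage=2 << 30, children=[
        Memcg("B/worker1", oom_score=7, usage=512 << 20),
        Memcg("B/worker2", oom_score=7, usage=1536 << 20),
    ]),
    Memcg("cont_C", oom_score=8, usage=6 << 30),
])

victim = select_victim(root)
print(victim.name)
```

In this made-up tree the walk picks cont_B at the top level because it
has the lowest container score, then settles on B/worker2 through the
usage tie-break. A/batch carries the lowest score in the whole tree but
is never considered, since its parent cont_A lost the comparison one
level up. That shows both points made in the thread: each level only
orders its direct children, and whole subtrees are cut off as a result.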