From: Yosry Ahmed <yosryahmed@google.com>
Date: Fri, 19 May 2023 15:04:26 -0700
Subject: Re: [PATCH v4 0/2] memcontrol: support cgroup level OOM protection
To: 程垲涛 Chengkaitao Cheng
Cc: tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org, corbet@lwn.net,
	mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
	akpm@linux-foundation.org, brauner@kernel.org, muchun.song@linux.dev,
	viro@zeniv.linux.org.uk, zhengqi.arch@bytedance.com, ebiederm@xmission.com,
	Liam.Howlett@oracle.com, chengzhihao1@huawei.com, pilgrimtao@gmail.com,
	haolee.swjtu@gmail.com, yuzhao@google.com, willy@infradead.org,
	vasily.averin@linux.dev, vbabka@suse.cz, surenb@google.com,
	sfr@canb.auug.org.au, mcgrof@kernel.org, feng.tang@intel.com,
	cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, David Rientjes
On Wed, May 17, 2023 at 10:12 PM 程垲涛 Chengkaitao Cheng wrote:
>
> At 2023-05-18 04:42:12, "Yosry Ahmed" wrote:
> >On Wed, May 17, 2023 at 3:01 AM 程垲涛 Chengkaitao Cheng
> > wrote:
> >>
> >> At 2023-05-17 16:09:50, "Yosry Ahmed" wrote:
> >> >On Wed, May 17, 2023 at 1:01 AM 程垲涛 Chengkaitao Cheng
> >> > wrote:
> >> >>
> >> >> At 2023-05-17 14:59:06, "Yosry Ahmed" wrote:
> >> >> >+David Rientjes
> >> >> >
> >> >> >On Tue, May 16, 2023 at 8:20 PM chengkaitao wrote:
> >> >> >>
> >> >> Thank you for providing a new application scenario. You have described a
> >> >> new per-memcg approach, but a simple introduction cannot explain the
> >> >> details of your approach clearly. If you could compare and analyze my
> >> >> patches for possible defects, or if your new approach has advantages
> >> >> that my patches do not have, I would greatly appreciate it.
> >> >
> >> >Sorry if I was not clear, I am not implying in any way that the
> >> >approach I am describing is better than your patches. I am guilty of
> >> >not conducting the proper analysis you are requesting.
> >>
> >> There is no perfect approach in the world, and I also seek your advice with
> >> a learning attitude. You don't need to say sorry, I should say thank you.
> >>
> >> >I just saw the thread and thought it might be interesting to you or
> >> >others to know the approach that we have been using for years in our
> >> >production. I guess the target is the same: be able to tell the OOM
> >> >killer which memcgs/processes are more important to protect. The
> >> >fundamental difference is that instead of tuning this based on the
> >> >memory usage of the memcg (your approach), we essentially give the OOM
> >> >killer the ordering in which we want memcgs/processes to be OOM
> >> >killed. This maps to job priorities, essentially.
> >>
> >> Killing processes in order of memory usage cannot effectively protect
> >> important processes. Killing processes in a user-defined priority order
> >> will result in a large number of OOM events and still not being able to
> >> release enough memory. I have been searching for a balance between
> >> the two methods, so that their shortcomings are not too obvious.
> >> The biggest advantage of memcg is its tree topology, and I also hope
> >> to make good use of it.
> >
> >For us, killing processes in a user-defined priority order works well.
> >
> >It seems that, to tune memory.oom.protect, you use oom_kill_inherit to
> >observe how many times this memcg has been killed due to a limit in an
> >ancestor. Wouldn't it be more straightforward to specify the priority
> >of protections among memcgs?
> >
> >For example, if you observe multiple memcgs being OOM killed due to
> >hitting an ancestor limit, you will need to decide which of them to
> >increase memory.oom.protect for more, based on their importance.
> >Otherwise, if you increase all of them, then there is no point if all
> >the memory is protected, right?
>
> If all memory in a memcg is protected, its meaning is similar to that of
> the highest-priority memcg in your approach: it is either killed last or
> never killed.

Makes sense. I believe it gets a bit trickier when you want to describe
relative ordering between memcgs using memory.oom.protect.

> >In this case, wouldn't it be easier to just tell the OOM killer the
> >relative priority among the memcgs?
> >
> >>
> >> >If this approach works for you (or any other audience), that's great,
> >> >I can share more details and perhaps we can reach something that we
> >> >can both use :)
> >>
> >> If you have a good idea, please share more details or show some code.
> >> I would greatly appreciate it.
> >
> >The code we have needs to be rebased onto a different version and
> >cleaned up before it can be shared, but essentially it is as
> >described:
> >
> >(a) All processes and memcgs start with a default score.
> >(b) Userspace can specify scores for memcgs and processes. A higher
> >score means higher priority (i.e. a lower score gets killed first).
> >(c) The OOM killer essentially looks for the memcg with the lowest
> >score to kill, then among this memcg, it looks for the process with
> >the lowest score.
> >Ties are broken based on usage, so essentially if
> >all processes/memcgs have the default score, we fall back to the
> >current OOM behavior.
>
> If memory overcommit is severe, all processes of the lowest-priority
> memcg may be killed before processes from other memcgs are even
> considered. If there are 1000 processes with almost zero memory usage
> in the lowest-priority memcg, 1000 ineffective kill events may occur.
> To avoid this situation, even for the lowest-priority memcg, I would
> leave it a very small oom.protect quota.

I checked internally, and this is indeed something that we see from
time to time. We try to avoid that with userspace OOM killing, but it's
not 100% effective.

> If faced with two memcgs with the same total memory usage and the same
> priority, where memcg A has more processes but less memory usage per
> process, and memcg B has fewer processes but more memory usage per
> process, then when OOM occurs, the processes in memcg B may keep being
> killed until all of them are gone, which is unfair to memcg B because
> memcg A also occupies a large amount of memory.

I believe in this case we will kill one process in memcg B, then the
usage of memcg A will become higher, so we will pick a process from
memcg A next.

> Does your approach have these issues? Killing processes in a
> user-defined priority order is indeed easier and can work well in most
> cases, but I have been trying to solve the cases that it cannot cover.

The first issue does come up with our approach as well. Let me dig up
more info from our internal teams and get back to you with more
details.

> --
> Thanks for your comment!
> chengkaitao
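[Editor's note: the score-based selection described in this thread (default
scores, lowest-score memcg first, then lowest-score process, ties broken by
usage) can be sketched in a few lines. This is an illustrative user-space
model only, not the kernel code nor the implementation discussed above; all
names and numbers are made up. It also demonstrates the "unfair to memcg B"
scenario: with usage-based tie-breaking, one kill in B shifts the next kill
to A.]

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    pid: int
    score: int            # user-set priority; lower = killed first
    usage: int            # memory usage (arbitrary units)

@dataclass
class Memcg:
    name: str
    score: int
    processes: list = field(default_factory=list)

    @property
    def usage(self):
        # memcg usage is the sum of its processes' usage in this toy model
        return sum(p.usage for p in self.processes)

DEFAULT_SCORE = 100       # (a) every memcg/process starts with a default score

def pick_victim(memcgs):
    # (c) lowest-score memcg first, ties broken by larger usage; with
    # all-default scores this degenerates to the usual usage-based OOM pick.
    cg = min(memcgs, key=lambda c: (c.score, -c.usage))
    proc = min(cg.processes, key=lambda p: (p.score, -p.usage))
    return cg, proc

# Equal scores and near-equal totals, but B's processes are individually larger.
a = Memcg("A", DEFAULT_SCORE, [Process(i, DEFAULT_SCORE, 100) for i in range(10)])
b = Memcg("B", DEFAULT_SCORE, [Process(100, DEFAULT_SCORE, 600),
                               Process(101, DEFAULT_SCORE, 500)])

cg, proc = pick_victim([a, b])    # B is picked first (usage 1100 vs 1000)
cg.processes.remove(proc)         # "kill" pid 100
cg2, _ = pick_victim([a, b])      # now A is larger (1000 vs 500), so A is next
```

Because the tie-break re-evaluates usage after every kill, the pressure
alternates between equally scored memcgs rather than draining one of them.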