From: Yosry Ahmed
Date: Wed, 17 May 2023 13:42:12 -0700
Subject: Re: [PATCH v4 0/2] memcontrol: support cgroup level OOM protection
To: 程垲涛 Chengkaitao Cheng
Cc: tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org, corbet@lwn.net, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, akpm@linux-foundation.org, brauner@kernel.org, muchun.song@linux.dev, viro@zeniv.linux.org.uk, zhengqi.arch@bytedance.com, ebiederm@xmission.com, Liam.Howlett@oracle.com, chengzhihao1@huawei.com, pilgrimtao@gmail.com, haolee.swjtu@gmail.com, yuzhao@google.com, willy@infradead.org, vasily.averin@linux.dev, vbabka@suse.cz, surenb@google.com, sfr@canb.auug.org.au, mcgrof@kernel.org, feng.tang@intel.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, David Rientjes
In-Reply-To: <6AB7FF12-F855-4D5B-9F75-9F7D64823144@didiglobal.com>

On Wed, May 17, 2023 at 3:01 AM 程垲涛 Chengkaitao Cheng wrote:
>
> At 2023-05-17 16:09:50, "Yosry Ahmed" wrote:
> >On Wed, May 17, 2023 at 1:01 AM 程垲涛 Chengkaitao Cheng wrote:
> >>
> >> At 2023-05-17 14:59:06, "Yosry Ahmed" wrote:
> >> >+David Rientjes
> >> >
> >> >On Tue, May 16, 2023 at 8:20 PM chengkaitao wrote:
> >> >>
> >> >> Establish a new OOM score algorithm that supports a cgroup-level OOM
> >> >> protection mechanism. When a global/memcg OOM event occurs, we treat
> >> >> all processes in the cgroup as a whole, and the OOM killer selects
> >> >> the process to kill based on the protection quota of the cgroup.
> >> >>
> >> >
> >> >Perhaps this is only slightly relevant, but at Google we do have a
> >> >different per-memcg approach to protect from OOM kills, or, more
> >> >specifically, to tell the kernel how we would like the OOM killer to
> >> >behave.
> >> >
> >> >We define an interface called memory.oom_score_badness, and we also
> >> >allow it to be specified per-process through a procfs interface,
> >> >similar to oom_score_adj.
> >> >
> >> >These scores essentially tell the OOM killer the order in which we
> >> >prefer memcgs to be OOM'd, and the order in which we want processes in
> >> >the memcg to be OOM'd. By default, all processes and memcgs start with
> >> >the same score. Ties are broken based on the rss of the process or the
> >> >usage of the memcg (prefer to kill the process/memcg that will free
> >> >more memory) -- similar to the current OOM killer.
> >>
> >> Thank you for providing a new application scenario. You have described a
> >> new per-memcg approach, but a brief introduction cannot explain its
> >> details clearly. If you could compare it with my patches and analyze
> >> possible defects, or point out advantages your approach has that my
> >> patches lack, I would greatly appreciate it.
> >
> >Sorry if I was not clear, I am not implying in any way that the
> >approach I am describing is better than your patches. I am guilty of
> >not conducting the proper analysis you are requesting.
>
> There is no perfect approach in the world, and I seek your advice with
> a learning attitude. You don't need to say sorry; I should say thank you.
>
> >I just saw the thread and thought it might be interesting to you or
> >others to know the approach that we have been using for years in our
> >production. I guess the goal is the same: being able to tell the OOM
> >killer which memcgs/processes are more important to protect. The
> >fundamental difference is that instead of tuning this based on the
> >memory usage of the memcg (your approach), we essentially give the OOM
> >killer the ordering in which we want memcgs/processes to be OOM
> >killed. This essentially maps to job priorities.
>
> Killing processes in order of memory usage cannot effectively protect
> important processes, and killing processes in a user-defined priority
> order will result in a large number of OOM events while still failing
> to release enough memory. I have been searching for a balance between
> the two methods so that their shortcomings are not too pronounced.
> The biggest advantage of memcg is its tree topology, and I also hope
> to make good use of it.

For us, killing processes in a user-defined priority order works well.

It seems that, to tune memory.oom.protect, you use oom_kill_inherit to
observe how many times a memcg has been OOM killed due to a limit in an
ancestor. Wouldn't it be more straightforward to specify the priority
of protection among memcgs?

For example, if you observe multiple memcgs being OOM killed because
they hit an ancestor's limit, you will need to decide which of them
should get the larger increase in memory.oom.protect, based on their
importance. Otherwise, if you increase it for all of them, there is no
point: if all the memory is protected, the protection no longer
discriminates, right? In this case, wouldn't it be easier to just tell
the OOM killer the relative priority among the memcgs?

>
> >If this approach works for you (or any other audience), that's great,
> >I can share more details and perhaps we can reach something that we
> >can both use :)
>
> If you have a good idea, please share more details or show some code.
> I would greatly appreciate it.

The code we have needs to be rebased onto a different version and
cleaned up before it can be shared, but essentially it works as
described:

(a) All processes and memcgs start with a default score.
(b) Userspace can specify scores for memcgs and processes. A higher
    score means higher priority (i.e., a lower score gets killed
    first).
(c) The OOM killer looks for the memcg with the lowest score to kill,
    then, within that memcg, for the process with the lowest score.
    Ties are broken based on usage, so if all processes/memcgs have
    the default score, we fall back to the current OOM behavior.

(A rough sketch of this selection pass follows at the end of this
message, after the quoted text.)

>
> >>
> >> >This has been brought up before in other discussions without much
> >> >interest [1], but just thought it may be relevant here.
> >> >
> >> >[1] https://lore.kernel.org/lkml/CAHS8izN3ej1mqUpnNQ8c-1Bx5EeO7q5NOkh0qrY_4PLqc8rkHA@mail.gmail.com/#t

> --
> Thanks for your comment!
> chengkaitao
>
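To make steps (a)-(c) concrete, below is a minimal user-space sketch of
the selection pass. To be clear, this is not the actual kernel code
described above (which has not been shared); the struct layout, the
default-score constant, and the flat candidate array are illustrative
assumptions, and a real implementation would walk the memcg hierarchy
and then repeat the same comparison over the processes inside the
chosen memcg.

#include <stdio.h>
#include <stddef.h>

#define OOM_SCORE_DEFAULT 100	/* (a) illustrative default, not a real kernel constant */

/* Simplified stand-in for a memcg (or a process inside the chosen memcg). */
struct candidate {
	const char *name;
	int score;		/* (b) set by userspace; higher means more protected */
	unsigned long usage;	/* bytes in use, used only to break score ties */
};

/*
 * (c) Pick the victim: the lowest score loses; among equal scores,
 * prefer the candidate with the highest usage, since killing it frees
 * the most memory. With every candidate left at OOM_SCORE_DEFAULT, the
 * choice degenerates to the usage-based one the current OOM killer
 * makes.
 */
static const struct candidate *pick_victim(const struct candidate *c, size_t n)
{
	const struct candidate *victim = &c[0];

	for (size_t i = 1; i < n; i++) {
		if (c[i].score < victim->score ||
		    (c[i].score == victim->score && c[i].usage > victim->usage))
			victim = &c[i];
	}
	return victim;
}

int main(void)
{
	const struct candidate memcgs[] = {
		{ "batch",       50,                1UL << 30 },
		{ "serving",     OOM_SCORE_DEFAULT, 4UL << 30 },
		{ "critical-db", 900,               8UL << 30 },
	};

	/* Prints "victim: batch": the lowest score loses despite its small usage. */
	printf("victim: %s\n", pick_victim(memcgs, 3)->name);
	return 0;
}

Per (c), the same pick_victim() comparison would then run a second time
over the processes inside the chosen memcg to select the individual
task to kill.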