Date: Fri, 9 Dec 2022 09:25:37 +0100
From: Michal Hocko
To: 程垲涛 Chengkaitao Cheng
Cc: chengkaitao, tj@kernel.org, lizefan.x@bytedance.com,
	hannes@cmpxchg.org, corbet@lwn.net, roman.gushchin@linux.dev,
	shakeelb@google.com, akpm@linux-foundation.org,
	songmuchun@bytedance.com, viro@zeniv.linux.org.uk,
	zhengqi.arch@bytedance.com, ebiederm@xmission.com,
	Liam.Howlett@oracle.com, chengzhihao1@huawei.com,
	haolee.swjtu@gmail.com, yuzhao@google.com, willy@infradead.org,
	vasily.averin@linux.dev, vbabka@suse.cz, surenb@google.com,
	sfr@canb.auug.org.au, mcgrof@kernel.org, sujiaxun@uniontech.com,
	feng.tang@intel.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: memcontrol: protect the memory in cgroup from being oom killed
References: <114DF8F0-3E68-4F2B-8E35-0943EC2F51AE@didiglobal.com>
In-Reply-To: <114DF8F0-3E68-4F2B-8E35-0943EC2F51AE@didiglobal.com>

On Fri 09-12-22 05:07:15, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-08 22:23:56, "Michal Hocko" wrote:
[...]
> >oom killer is a memory reclaim of the last resort. So yes, there is some
> >difference but fundamentally it is about releasing some memory. And long
> >term we have learned that the more clever it tries to be the more likely
> >corner cases can happen. It is simply impossible to know the best
> >candidate so this is a just a best effort. We try to aim for
> >predictability at least.
>
> Is the current oom_score strategy predictable? I don't think so. The score_adj
> has broken the predictability of oom_score (it is no longer simply killing the
> process that uses the most mems).

oom_score as reported to userspace already takes oom_score_adj into
account, which means that you can compare processes and get a reasonable
guess at what the current oom_victim would be. There is a certain fuzz
level because this is not atomic, and there is no clear candidate when
multiple processes have an equal score. So yes, it is not 100%
predictable. memory.reclaim as you propose doesn't change that though.

Is oom_score_adj a good interface? No, not really. If I could go back in
time I would nack it, but here we are. We have an interface that promises
quite a lot but essentially supports only two usecases
(OOM_SCORE_ADJ_MIN, OOM_SCORE_ADJ_MAX) reliably. Everything in between is
clumsy at best because a real userspace oom policy would have to
re-evaluate the whole oom domain (be it global or memcg oom) as memory
consumption evolves over time. I am really worried that your
memory.oom.protection is heading down a very similar path, because
protection really needs to consider other memcgs to balance properly.

[...]

> > But I am really open
> >to be convinced otherwise and this is in fact what I have been asking
> >for since the beginning. I would love to see some examples on the
> >reasonable configuration for a practical usecase.
>
> Here is a simple example. In a docker container, users can divide all processes
> into two categories (important and normal), and put them in different cgroups.
> One cgroup's oom.protect is set to "max", the other is set to "0". In this way,
> important processes in the container can be protected.

That is effectively oom_score_adj = OOM_SCORE_ADJ_MIN - 1 for all
processes in the important group. I would argue you can achieve a very
similar result by having the process launcher set oom_score_adj and
letting all processes in that important container inherit it. You do not
need any memcg tunable for that.

I am really much more interested in examples where the protection needs
to be fine tuned.
-- 
Michal Hocko
SUSE Labs