Date: Fri, 8 Jul 2022 10:54:33 +0200
From: Michal Hocko
To: Gang Li
Cc: akpm@linux-foundation.org, surenb@google.com, hca@linux.ibm.com,
	gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com,
	svens@linux.ibm.com, viro@zeniv.linux.org.uk, ebiederm@xmission.com,
	keescook@chromium.org, rostedt@goodmis.org, mingo@redhat.com,
	peterz@infradead.org, acme@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	namhyung@kernel.org, david@redhat.com, imbrenda@linux.ibm.com,
	adobriyan@gmail.com, yang.yang29@zte.com.cn, brauner@kernel.org,
	stephen.s.brennan@oracle.com, zhengqi.arch@bytedance.com,
	haolee.swjtu@gmail.com, xu.xin16@zte.com.cn, Liam.Howlett@oracle.com,
	ohoono.kwon@samsung.com, peterx@redhat.com, arnd@arndb.de,
	shy828301@gmail.com, alex.sierra@amd.com,
	xianting.tian@linux.alibaba.com, willy@infradead.org,
	ccross@google.com, vbabka@suse.cz, sujiaxun@uniontech.com,
	sfr@canb.auug.org.au, vasily.averin@linux.dev, mgorman@suse.de,
	vvghjk1234@gmail.com, tglx@linutronix.de, luto@kernel.org,
	bigeasy@linutronix.de, fenghua.yu@intel.com,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v2 0/5] mm, oom: Introduce per numa node oom for
	CONSTRAINT_{MEMORY_POLICY,CPUSET}
In-Reply-To: <20220708082129.80115-1-ligang.bdlg@bytedance.com>

On Fri 08-07-22 16:21:24, Gang Li wrote:
> TLDR
> ----
> If a mempolicy or cpuset is in effect, out_of_memory() will select victim
> on specific node to kill. So that kernel can avoid accidental killing on
> NUMA system.

We have discussed this in your previous posting and an alternative
proposal was to use cpusets to partition NUMA aware workloads and
enhance the oom killer to be cpuset aware instead which should be a much
easier solution.
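
To make the cpuset aware idea a bit more concrete, here is a rough
userspace sketch. It is not kernel code and not a proposed
implementation. The Task fields, select_victim() and the scores are all
made up for illustration; the only assumption is that each task carries
a mask of the nodes it may allocate from and some precomputed badness
score. A task is only eligible as a victim if its allowed nodes
intersect the nodes the failing allocation is constrained to.

```python
# Userspace illustration only: not kernel code, every name here is hypothetical.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Task:
    comm: str          # process name
    mems_allowed: int  # bitmask of NUMA nodes the task may allocate from
    badness: int       # stand-in for an oom_badness-style score


def select_victim(tasks: Iterable[Task], oom_nodes: int) -> Optional[Task]:
    """Return the highest-badness task whose allowed nodes intersect oom_nodes.

    Returns None when no task is eligible.
    """
    eligible = [t for t in tasks if t.mems_allowed & oom_nodes]
    return max(eligible, key=lambda t: t.badness, default=None)


NODE0, NODE1 = 1 << 0, 1 << 1

# Two tasks, each confined to a single node by its (hypothetical) cpuset.
TASKS = [
    Task("a.out", NODE0, 900),  # large, but can only use node 0
    Task("c.out", NODE1, 300),  # smaller, but can only use node 1
]

if __name__ == "__main__":
    # No constraint: the globally largest task is picked.
    print(select_victim(TASKS, NODE0 | NODE1).comm)
    # Allocation constrained to node 1: only c.out is eligible.
    print(select_victim(TASKS, NODE1).comm)
```

With the workloads partitioned like this, an OOM constrained to node 1
never considers a.out at all, so no per-node accounting is needed to
avoid killing it.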

> Problem
> -------
> Before this patch series, oom will only kill the process with the highest
> memory usage by selecting process with the highest oom_badness on the
> entire system.
>
> This works fine on UMA system, but may have some accidental killing on NUMA
> system.
>
> As shown below, if process c.out is bind to Node1 and keep allocating pages
> from Node1, a.out will be killed first. But killing a.out didn't free any
> mem on Node1, so c.out will be killed then.
>
> A lot of AMD machines have 8 numa nodes. In these systems, there is a
> greater chance of triggering this problem.

Please be more specific about existing use cases which suffer from the
current OOM handling limitations.
-- 
Michal Hocko
SUSE Labs