Message-ID: <41ae31a7-6998-be88-858c-744e31a76b2a@bytedance.com>
Date: Tue, 12 Jul 2022 19:12:18 +0800
Subject: Re: [PATCH v2 0/5] mm, oom: Introduce per numa node oom for CONSTRAINT_{MEMORY_POLICY,CPUSET}
To: Michal Hocko, Gang Li
Cc: akpm@linux-foundation.org, surenb@google.com, hca@linux.ibm.com,
    gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com,
    svens@linux.ibm.com, viro@zeniv.linux.org.uk, ebiederm@xmission.com,
    keescook@chromium.org, rostedt@goodmis.org, mingo@redhat.com,
    peterz@infradead.org, acme@kernel.org, mark.rutland@arm.com,
    alexander.shishkin@linux.intel.com, jolsa@kernel.org,
    namhyung@kernel.org, david@redhat.com, imbrenda@linux.ibm.com,
    adobriyan@gmail.com, yang.yang29@zte.com.cn, brauner@kernel.org,
    stephen.s.brennan@oracle.com, zhengqi.arch@bytedance.com,
    haolee.swjtu@gmail.com, xu.xin16@zte.com.cn, Liam.Howlett@oracle.com,
    ohoono.kwon@samsung.com, peterx@redhat.com, arnd@arndb.de,
    shy828301@gmail.com, alex.sierra@amd.com,
    xianting.tian@linux.alibaba.com, willy@infradead.org,
    ccross@google.com, vbabka@suse.cz, sujiaxun@uniontech.com,
    sfr@canb.auug.org.au, vasily.averin@linux.dev, mgorman@suse.de,
    vvghjk1234@gmail.com, tglx@linutronix.de, luto@kernel.org,
    bigeasy@linutronix.de, fenghua.yu@intel.com,
    linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-perf-users@vger.kernel.org, hezhongkun.hzk@bytedance.com
References: <20220708082129.80115-1-ligang.bdlg@bytedance.com>
From: Abel Wu
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi Michal,

On 7/8/22 4:54 PM, Michal Hocko Wrote:
> On Fri 08-07-22 16:21:24, Gang Li wrote:
>> TLDR
>> ----
>> If a mempolicy or cpuset is in effect, out_of_memory() will select victim
>> on specific node to kill. So that kernel can avoid accidental killing on
>> NUMA system.
>
> We have discussed this in your previous posting and an alternative
> proposal was to use cpusets to partition NUMA aware workloads and
> enhance the oom killer to be cpuset aware instead which should be a much
> easier solution.
>
>> Problem
>> -------
>> Before this patch series, oom will only kill the process with the highest
>> memory usage by selecting process with the highest oom_badness on the
>> entire system.
>>
>> This works fine on UMA system, but may have some accidental killing on NUMA
>> system.
>>
>> As shown below, if process c.out is bind to Node1 and keep allocating pages
>> from Node1, a.out will be killed first. But killing a.out did't free any
>> mem on Node1, so c.out will be killed then.
>>
>> A lot of AMD machines have 8 numa nodes. In these systems, there is a
>> greater chance of triggering this problem.
>
> Please be more specific about existing usecases which suffer from the
> current OOM handling limitations.

I was just going through the mailing list and happened to see this.
We have another use case involving per-NUMA-node memory usage.

Say we have several important latency-critical (LC) services sitting in
different NUMA nodes without intersection. The memory needs of these LC
services vary, so the free memory on each node also differs. Then we
launch several background containers without cpuset constraints to
consume the leftover resources. The problem is that there doesn't seem
to be a memory policy that can balance usage between the nodes, which
can leave memory-heavy LC services under high memory pressure and
failing to meet their SLOs.

It would be much appreciated if you could shed some light on this!

Thanks & BR,
Abel