Date: Wed, 4 Nov 2020 16:40:21 +0800
From: Feng Tang
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Matthew Wilcox, Mel Gorman,
	dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node
Message-ID: <20201104084021.GB15700@shbuild999.sh.intel.com>
References: <1604470210-124827-1-git-send-email-feng.tang@intel.com>
 <20201104071308.GN21990@dhcp22.suse.cz>
 <20201104073826.GA15700@shbuild999.sh.intel.com>
 <20201104075819.GA10052@dhcp22.suse.cz>
In-Reply-To: <20201104075819.GA10052@dhcp22.suse.cz>
User-Agent: Mutt/1.5.24 (2015-08-30)

On Wed, Nov 04, 2020 at 08:58:19AM +0100, Michal Hocko wrote:
> On Wed 04-11-20 15:38:26, Feng Tang wrote:
> [...]
> > > Could you be more specific about the usecase here? Why do you need a
> > > binding to a pure movable node?
> >
> > One common configuration for a platform is a small amount of DRAM plus a
> > huge amount of PMEM (which is slower but cheaper), and my guess is that
> > the intent is to steer the bulk of userspace allocations
> > (GFP_HIGHUSER_MOVABLE) to the PMEM node, and to use DRAM as little as
> > possible.
>
> While this is possible, it is a tricky configuration. It is essentially
> get us back to 32b and highmem... :)

Another possible case is a similar binding on a memory-hotpluggable
platform, which has one unpluggable node while the other nodes are
configured movable-only so that they can be hot-removed when needed.

> As I've said in reply to your second patch. I think we can make the oom
> killer behavior more sensible in this misconfigured cases but I do not
> think we want break the cpuset isolation for such a configuration.

Do you mean we should skip the killing and just let the allocation fail?
We checked the OOM killer code first: when the OOM happens, both the DRAM
node and the unmovable node have lots of free memory, so killing a process
will not improve the situation.

(The following is copied from your comments on 2/2)

> This allows to spill memory allocations over to any other node which
> has Normal (or other lower) zones and as such it breaks cpuset isolation.
> As I've pointed out in the reply to your cover letter it seems that
> this is more of a misconfiguration than a bug.

For this use case (running docker containers), the spilling is already
happening. I traced the container's memory allocation requests: many of
them are movable and already fall back to the normal node naturally with
the current code; only a few get blocked, as many __alloc_pages_nodemask()
calls are made with a NULL nodemask parameter.
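For reference, the kind of configuration being discussed can be set up
roughly as below. This is only a sketch: the node numbers, the workload
name, and the assumption that node 1 is the movable-only PMEM/hotplug node
are illustrative, not taken from the thread.

```shell
# Boot the kernel with the movable_node parameter so that hot-added
# memory (e.g. the PMEM node) is onlined into ZONE_MOVABLE only:
#     ... movable_node

# Bind a workload's memory to the movable-only node (node 1 here,
# illustrative). Its unmovable kernel allocations cannot be satisfied
# from ZONE_MOVABLE, which is where the OOM behavior discussed above
# shows up:
numactl --membind=1 --cpunodebind=0 ./workload
```

The same memory binding can of course also be expressed via a cpuset
(`cpuset.mems`), which is how container runtimes typically apply it.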
And I made this RFC patch inspired by this code in __alloc_pages_may_oom():

	if (gfp_mask & __GFP_NOFAIL)
		page = __alloc_pages_cpuset_fallback(gfp_mask, order,
					ALLOC_NO_WATERMARKS, ac);

Thanks,
Feng

> --
> Michal Hocko
> SUSE Labs