From: "Huang, Ying"
To: Michal Hocko
Cc: Feng Tang, Aneesh Kumar K V, Andrew Morton, Johannes Weiner,
	Tejun Heo, Zefan Li, Waiman Long, "linux-mm@kvack.org",
	"cgroups@vger.kernel.org", "linux-kernel@vger.kernel.org",
	"Hansen, Dave", "Chen, Tim C", "Yin, Fengwei"
Subject: Re: [PATCH] mm/vmscan: respect cpuset policy during page demotion
References: <20221026074343.6517-1-feng.tang@intel.com>
Date: Thu, 27 Oct 2022 14:47:22 +0800
In-Reply-To: (Michal Hocko's message of "Wed, 26 Oct 2022 17:59:19 +0200")
Message-ID: <87wn8lkbk5.fsf@yhuang6-desk2.ccr.corp.intel.com>

Michal Hocko writes:

> On Wed 26-10-22 20:20:01, Feng Tang wrote:
>> On Wed, Oct 26, 2022 at 05:19:50PM +0800, Michal Hocko wrote:
>> > On Wed 26-10-22 16:00:13, Feng Tang wrote:
>> > > On Wed, Oct 26, 2022 at 03:49:48PM +0800, Aneesh Kumar K V wrote:
>> > > > On 10/26/22 1:13 PM, Feng Tang wrote:
>> > > > > In page reclaim path, memory could be demoted from faster memory tier
>> > > > > to slower memory tier. Currently, there is no check about cpuset's
>> > > > > memory policy, that even if the target demotion node is not allowed
>> > > > > by cpuset, the demotion will still happen, which breaks the cpuset
>> > > > > semantics.
>> > > > >
>> > > > > So add cpuset policy check in the demotion path and skip demotion
>> > > > > if the demotion targets are not allowed by cpuset.
>> > > > >
>> > > >
>> > > > What about the vma policy or the task memory policy? Shouldn't we respect
>> > > > those memory policy restrictions while demoting the page?
>> > >
>> > > Good question! We have some basic patches to consider memory policy
>> > > in demotion path too, which are still under test, and will be posted
>> > > soon. And the basic idea is similar to this patch.
>> >
>> > For that you need to consult each vma and its owning task(s) and that
>> > to me sounds like something to be done in folio_check_references.
>> > Relying on memcg to get a cpuset cgroup is really ugly and not really
>> > 100% correct. Memory controller might be disabled and then you do not
>> > have your association anymore.
>>
>> You are right, for cpuset case, the solution depends on 'CONFIG_MEMCG=y',
>> and the bright side is most distributions have it on.
>
> CONFIG_MEMCG=y is not sufficient. You would need to enable memcg
> controller during the runtime as well.
>
>> > This all can get quite expensive so the primary question is, does the
>> > existing behavior generate any real issues or is this more of a
>> > correctness exercise? I mean it certainly is not great to demote to an
>> > incompatible numa node but are there any reasonable configurations when
>> > the demotion target node is explicitly excluded from memory
>> > policy/cpuset?
>>
>> We haven't got customer report on this, but there are quite some customers
>> use cpuset to bind some specific memory nodes to a docker (You've helped
>> us solve an OOM issue in such cases), so I think it's practical to respect
>> the cpuset semantics as much as we can.
>
> Yes, it is definitely better to respect cpusets and all local memory
> policies. There is no dispute there. The thing is whether this is really
> worth it. How often would cpusets (or policies in general) go actively
> against demotion nodes (i.e. exclude those nodes from their allowed node
> mask)?
>
> I can imagine workloads which wouldn't like to get their memory demoted
> for some reason but wouldn't it be more practical to tell that
> explicitly (e.g. via prctl) rather than configuring cpusets/memory
> policies explicitly?

If my understanding is correct, prctl() configures the process or
thread. How can we get the process/thread configuration at demotion time?

Best Regards,
Huang, Ying