From: "Huang, Ying"
To: Feng Tang
Cc: Andrew Morton, Johannes Weiner, "Hocko, Michal", Tejun Heo,
	Zefan Li, Waiman Long, aneesh.kumar@linux.ibm.com,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, "Hansen, Dave", "Chen, Tim C",
	"Yin, Fengwei"
Subject: Re: [PATCH] mm/vmscan: respect cpuset policy during page demotion
References: <20221026074343.6517-1-feng.tang@intel.com>
	<878rl1luh1.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 27 Oct 2022 14:05:19 +0800
In-Reply-To: (Feng Tang's message of "Thu, 27 Oct 2022 13:49:25 +0800")
Message-ID: <871qqtls2o.fsf@yhuang6-desk2.ccr.corp.intel.com>
Feng Tang writes:

> On Thu, Oct 27, 2022 at 01:13:30PM +0800, Huang, Ying wrote:
>> Feng Tang writes:
>>
>> > In the page reclaim path, memory can be demoted from a faster memory
>> > tier to a slower memory tier.  Currently there is no check of the
>> > cpuset's memory policy: even if the target demotion node is not
>> > allowed by the cpuset, the demotion still happens, which breaks the
>> > cpuset semantics.
>> >
>> > So add a cpuset policy check in the demotion path and skip demotion
>> > if the demotion targets are not allowed by the cpuset.
>> >
>> > Signed-off-by: Feng Tang
>> > ---
> [...]
>> > index 18f6497994ec..c205d98283bc 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1537,9 +1537,21 @@ static struct page *alloc_demote_page(struct page *page, unsigned long private)
>> >  {
>> >  	struct page *target_page;
>> >  	nodemask_t *allowed_mask;
>> > -	struct migration_target_control *mtc;
>> > +	struct migration_target_control *mtc = (void *)private;
>> >
>> > -	mtc = (struct migration_target_control *)private;
>>
>> I think we should avoid the (void *) conversion here.
>
> OK, will change back.
>
>> > +#if IS_ENABLED(CONFIG_MEMCG) && IS_ENABLED(CONFIG_CPUSETS)
>> > +	struct mem_cgroup *memcg;
>> > +	nodemask_t cpuset_nmask;
>> > +
>> > +	memcg = page_memcg(page);
>> > +	cpuset_get_allowed_mem_nodes(memcg->css.cgroup, &cpuset_nmask);
>> > +
>> > +	if (!node_isset(mtc->nid, cpuset_nmask)) {
>> > +		if (mtc->nmask)
>> > +			nodes_and(*mtc->nmask, *mtc->nmask, cpuset_nmask);
>> > +		return alloc_migration_target(page, (unsigned long)mtc);
>> > +	}
>>
>> If node_isset(mtc->nid, cpuset_nmask) == true, we should keep the
>> original 2-step allocation and apply nodes_and() on the node mask.
>
> Good catch!  Yes, the nodes_and() call should be taken out of this
> check and done before calling node_isset().
>
>> > +#endif
>> >
>> >  	allowed_mask = mtc->nmask;
>> >  	/*
>> > @@ -1649,6 +1661,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> >  		enum folio_references references = FOLIOREF_RECLAIM;
>> >  		bool dirty, writeback;
>> >  		unsigned int nr_pages;
>> > +		bool skip_this_demotion = false;
>> >
>> >  		cond_resched();
>> >
>> > @@ -1658,6 +1671,22 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> >  		if (!folio_trylock(folio))
>> >  			goto keep;
>> >
>> > +#if IS_ENABLED(CONFIG_MEMCG) && IS_ENABLED(CONFIG_CPUSETS)
>> > +		if (do_demote_pass) {
>> > +			struct mem_cgroup *memcg;
>> > +			nodemask_t nmask, nmask1;
>> > +
>> > +			node_get_allowed_targets(pgdat, &nmask);
>>
>> pgdat will not change in the loop, so we can move this out of the
>> loop?
>
> Yes
>
>> > +			memcg = folio_memcg(folio);
>> > +			if (memcg)
>> > +				cpuset_get_allowed_mem_nodes(memcg->css.cgroup,
>> > +							     &nmask1);
>> > +
>> > +			if (!nodes_intersects(nmask, nmask1))
>> > +				skip_this_demotion = true;
>> > +		}
>>
>> If nodes_intersects() == true, we will call
>> cpuset_get_allowed_mem_nodes() twice.  Better to pass the intersecting
>> mask to demote_folio_list()?
>
> The pages in the loop may come from different memory control groups,
> and the cpusets' nodemasks could be different; I don't know how to save
> this per-page info to be used later in demote_folio_list().

Yes.  You are right.  We cannot do that.

Best Regards,
Huang, Ying

>
>> > +#endif
>> > +
>> >  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>> >
>> >  		nr_pages = folio_nr_pages(folio);
>> > @@ -1799,7 +1828,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> >  		 * Before reclaiming the folio, try to relocate
>> >  		 * its contents to another node.
>> >  		 */
>> > -		if (do_demote_pass &&
>> > +		if (do_demote_pass && !skip_this_demotion &&
>> >  		    (thp_migration_supported() || !folio_test_large(folio))) {
>> >  			list_add(&folio->lru, &demote_folios);
>> >  			folio_unlock(folio);
>>
>> Best Regards,
>> Huang, Ying