From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <9f6e60cc8be3cbde4871458c612c5c31d2a9e056.camel@intel.com>
Subject: Re: [RFC PATCH v4 7/7] mm/demotion: Demote pages according to
 allocation fallback order
From: Ying Huang <ying.huang@intel.com>
To: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>, linux-mm@kvack.org, 
	akpm@linux-foundation.org
Cc: Greg Thelen <gthelen@google.com>, Yang Shi <shy828301@gmail.com>, 
 Davidlohr Bueso <dave@stgolabs.net>, Tim C Chen <tim.c.chen@intel.com>,
 Brice Goglin <brice.goglin@gmail.com>,  Michal Hocko <mhocko@kernel.org>,
 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Hesham Almatary
 <hesham.almatary@huawei.com>, Dave Hansen <dave.hansen@intel.com>, Jonathan
 Cameron <Jonathan.Cameron@huawei.com>, Alistair Popple
 <apopple@nvidia.com>, Dan Williams <dan.j.williams@intel.com>, Feng Tang
 <feng.tang@intel.com>, Jagdish Gediya <jvgediya@linux.ibm.com>, Baolin Wang
 <baolin.wang@linux.alibaba.com>, David Rientjes <rientjes@google.com>
Date: Mon, 06 Jun 2022 08:43:44 +0800
In-Reply-To: <046c373a-f30b-091d-47a1-e28bfb7e9394@linux.ibm.com>
References: 
	<CAAPL-u-dFp7PwPH6DfbYdnY8xaGsHz3tRQ0CPGVkiqURvdN8=A@mail.gmail.com>
	 <20220527122528.129445-1-aneesh.kumar@linux.ibm.com>
	 <20220527122528.129445-8-aneesh.kumar@linux.ibm.com>
	 <b102d5773bffd6391283773044f756e810c1f044.camel@intel.com>
	 <046c373a-f30b-091d-47a1-e28bfb7e9394@linux.ibm.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.38.3-1 
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, 2022-06-03 at 20:39 +0530, Aneesh Kumar K V wrote:
> On 6/2/22 1:05 PM, Ying Huang wrote:
> > On Fri, 2022-05-27 at 17:55 +0530, Aneesh Kumar K.V wrote:
> > > From: Jagdish Gediya <jvgediya@linux.ibm.com>
> > > 
> > > Currently, a higher tier node can only be demoted to selected
> > > nodes on the next lower tier as defined by the demotion path,
> > > not any other node from any lower tier.  This strict, hard-coded
> > > demotion order does not work in all use cases (e.g. some use cases
> > > may want to allow cross-socket demotion to another node in the same
> > > demotion tier as a fallback when the preferred demotion node is out
> > > of space). This demotion order is also inconsistent with the page
> > > allocation fallback order when all the nodes in a higher tier are
> > > out of space: The page allocation can fall back to any node from any
> > > lower tier, whereas the demotion order doesn't allow that currently.
> > > 
> > > This patch adds support for getting the mask of all allowed demotion
> > > targets for a node. demote_page_list() is also modified to use this
> > > allowed node mask by filling it into the migration_target_control
> > > structure before passing it to migrate_pages().
> > 
> 
> ...
> 
> > >    * Take pages on @demote_list and attempt to demote them to
> > >    * another node.  Pages which are not demoted are left on
> > > @@ -1481,6 +1464,19 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
> > >   {
> > >   	int target_nid = next_demotion_node(pgdat->node_id);
> > >   	unsigned int nr_succeeded;
> > > +	nodemask_t allowed_mask;
> > > +
> > > +	struct migration_target_control mtc = {
> > > +		/*
> > > +		 * Allocate from 'node', or fail quickly and quietly.
> > > +		 * When this happens, 'page' will likely just be discarded
> > > +		 * instead of migrated.
> > > +		 */
> > > +		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > > +			__GFP_NOMEMALLOC | GFP_NOWAIT,
> > > +		.nid = target_nid,
> > > +		.nmask = &allowed_mask
> > > +	};
> > 
> > IMHO, we should first try to allocate from the preferred node (which
> > will kick kswapd of the preferred node if necessary).  If that fails,
> > we fall back to all allowed nodes.
> > 
> > As we discussed in the following thread,
> > 
> > https://lore.kernel.org/lkml/69f2d063a15f8c4afb4688af7b7890f32af55391.camel@intel.com/
> > 
> > That is, something like below,
> > 
> > static struct page *alloc_demote_page(struct page *page, unsigned long node)
> > {
> > 	struct page *target_page;
> > 	/* Must be filled with the allowed demotion targets before use. */
> > 	nodemask_t allowed_mask;
> > 	struct migration_target_control mtc = {
> > 		/*
> > 		 * Allocate from 'node', or fail quickly and quietly.
> > 		 * When this happens, 'page' will likely just be discarded
> > 		 * instead of migrated.
> > 		 */
> > 		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
> > 			    __GFP_THISNODE  | __GFP_NOWARN |
> > 			    __GFP_NOMEMALLOC | GFP_NOWAIT,
> > 		.nid = node
> > 	};
> > 
> > 	/* First try the preferred node only. */
> > 	target_page = alloc_migration_target(page, (unsigned long)&mtc);
> > 	if (target_page)
> > 		return target_page;
> > 
> > 	/* Then fall back to any node in the allowed mask. */
> > 	mtc.gfp_mask &= ~__GFP_THISNODE;
> > 	mtc.nmask = &allowed_mask;
> > 
> > 	return alloc_migration_target(page, (unsigned long)&mtc);
> > }
> 
> I skipped doing this in v5 because I was not sure this is really what we 
> want.

I think it is, and it is also the original behavior.  We should keep the
original behavior as much as possible and only make changes where
necessary.

> I guess we can do this as part of the change that is going to 
> introduce the usage of memory policy for the allocation?

As with the memory allocation policy, the default policy should be
local-preferred.  We shouldn't force users to set an explicit memory
policy for that.

And the added code isn't complex.
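
For illustration only, the preferred-then-fallback order can be modelled
in plain userspace C.  This is a toy sketch, not kernel code: struct
toy_nodes, toy_alloc_on_node() and toy_alloc_demote_target() are
hypothetical names, a 32-bit mask stands in for nodemask_t, and the
fallback simply walks the allowed nodes in ascending id order, which is
a simplification of the real allocator's fallback order.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 8
#define NO_NODE   (-1)

/* Hypothetical stand-in for per-node free memory; not a kernel structure. */
struct toy_nodes {
	unsigned int free_pages[MAX_NODES];
};

/* Take one page from 'nid' if it has any free; return 1 on success. */
static int toy_alloc_on_node(struct toy_nodes *n, int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	if (n->free_pages[nid] == 0)
		return 0;
	n->free_pages[nid]--;
	return 1;
}

/*
 * Two-step choice of a demotion target:
 *   1. try only the preferred node (models the __GFP_THISNODE attempt);
 *   2. if that fails, try every node set in 'allowed_mask'
 *      (assumption: ascending node id order).
 * Returns the node id the page came from, or NO_NODE if all are full.
 */
static int toy_alloc_demote_target(struct toy_nodes *n, int preferred,
				   uint32_t allowed_mask)
{
	int nid;

	if (toy_alloc_on_node(n, preferred))
		return preferred;

	for (nid = 0; nid < MAX_NODES; nid++) {
		if (!(allowed_mask & (UINT32_C(1) << nid)))
			continue;
		if (toy_alloc_on_node(n, nid))
			return nid;
	}
	return NO_NODE;
}
```

With nodes 2 and 3 allowed and node 2 preferred, pages go to node 2
until it is full, then to node 3, and the call only fails once both are
exhausted.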

Best Regards,
Huang, Ying