Subject: Re: [PATCH v5 4/9] mm/demotion: Build demotion targets based on explicit memory tiers
From: Tim Chen
To: "Aneesh Kumar K.V", linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Wei Xu, Huang Ying, Greg Thelen, Yang Shi, Davidlohr Bueso,
	Tim C Chen, Brice Goglin, Michal Hocko, Linux Kernel Mailing List,
	Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
	Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes
Date: Tue, 07 Jun 2022 15:51:57 -0700
In-Reply-To: <20220603134237.131362-5-aneesh.kumar@linux.ibm.com>
References: <20220603134237.131362-1-aneesh.kumar@linux.ibm.com>
	<20220603134237.131362-5-aneesh.kumar@linux.ibm.com>

On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
>
> +int next_demotion_node(int node)
> +{
> +	struct demotion_nodes *nd;
> +	int target, nnodes, i;
> +
> +	if (!node_demotion)
> +		return NUMA_NO_NODE;
> +
> +	nd = &node_demotion[node];
> +
> +	/*
> +	 * node_demotion[] is updated without excluding this
> +	 * function from running.
> +	 *
> +	 * Make sure to use RCU over entire code blocks if
> +	 * node_demotion[] reads need to be consistent.
> +	 */
> +	rcu_read_lock();
> +
> +	nnodes = nodes_weight(nd->preferred);
> +	if (!nnodes)
> +		return NUMA_NO_NODE;
> +
> +	/*
> +	 * If there are multiple target nodes, just select one
> +	 * target node randomly.
> +	 *
> +	 * In addition, we can also use round-robin to select
> +	 * target node, but we should introduce another variable
> +	 * for node_demotion[] to record last selected target node,
> +	 * that may cause cache ping-pong due to the changing of
> +	 * last target node. Or introducing per-cpu data to avoid
> +	 * caching issue, which seems more complicated. So selecting
> +	 * target node randomly seems better until now.
> +	 */
> +	nnodes = get_random_int() % nnodes;
> +	target = first_node(nd->preferred);
> +	for (i = 0; i < nnodes; i++)
> +		target = next_node(target, nd->preferred);

We can simplify the above 4 lines.

	target = node_random(&nd->preferred);

There's still a loop overhead though :(

> +
> +	rcu_read_unlock();
> +
> +	return target;
> +}
> +
>
> + */
> +static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
> +						 unsigned long action, void *_arg)
> +{
> +	struct memory_notify *arg = _arg;
> +
> +	/*
> +	 * Only update the node migration order when a node is
> +	 * changing status, like online->offline.
> +	 */
> +	if (arg->status_change_nid < 0)
> +		return notifier_from_errno(0);
> +
> +	switch (action) {
> +	case MEM_OFFLINE:
> +		/*
> +		 * In case we are moving out of N_MEMORY. Keep the node
> +		 * in the memory tier so that when we bring memory online,
> +		 * they appear in the right memory tier. We still need
> +		 * to rebuild the demotion order.
> +		 */
> +		mutex_lock(&memory_tier_lock);
> +		establish_migration_targets();
> +		mutex_unlock(&memory_tier_lock);
> +		break;
> +	case MEM_ONLINE:
> +		/*
> +		 * We ignore the error here, if the node already have the tier
> +		 * registered, we will continue to use that for the new memory
> +		 * we are adding here.
> +		 */
> +		node_set_memory_tier(arg->status_change_nid, DEFAULT_MEMORY_TIER);

Should establish_migration_targets() be run here? Otherwise what are the
demotion targets for this newly onlined node?

> +		break;
> +	}
> +
> +	return notifier_from_errno(0);
> +}
> +

Tim
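
For reference, the first comment can be illustrated outside the kernel. The
following is only a minimal user-space sketch, assuming a 64-bit integer as a
stand-in for nodemask_t. The names NO_NODE, mask_weight(), mask_next_from(),
pick_by_walk() and pick_nth() are hypothetical and exist only for this
illustration; none of them are kernel APIs. pick_by_walk() mirrors the
open-coded first_node()/next_node() walk in the quoted patch, and pick_nth()
plays the role of a single helper call. Both still scan the mask bit by bit,
which is the remaining loop overhead mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define NO_NODE (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

/* Count the set bits in the mask (stand-in for nodes_weight()). */
static int mask_weight(uint64_t mask)
{
	int w = 0;

	for (; mask; mask &= mask - 1)
		w++;
	return w;
}

/* Lowest set bit at or above 'start', or 64 if there is none. */
static int mask_next_from(uint64_t mask, int start)
{
	for (int bit = start; bit < 64; bit++)
		if (mask & (UINT64_C(1) << bit))
			return bit;
	return 64;
}

/* Open-coded walk: take the first set bit, then step forward 'ord' times. */
static int pick_by_walk(uint64_t mask, unsigned int ord)
{
	int target;

	if (!mask)
		return NO_NODE;
	assert(ord < (unsigned int)mask_weight(mask));

	target = mask_next_from(mask, 0);
	for (unsigned int i = 0; i < ord; i++)
		target = mask_next_from(mask, target + 1);
	return target;
}

/* Single helper: drop the 'ord' lowest set bits, return the lowest one left. */
static int pick_nth(uint64_t mask, unsigned int ord)
{
	if (!mask)
		return NO_NODE;
	assert(ord < (unsigned int)mask_weight(mask));

	while (ord--)
		mask &= mask - 1;	/* clear the lowest set bit */
	return mask_next_from(mask, 0);
}
```

With a mask of 0x2C (bits 2, 3 and 5 set), both functions return 2, 3 and 5
for ord values 0, 1 and 2, and both return NO_NODE for an empty mask. In the
sketch the empty-mask case is handled inside the helper, so a caller would not
need a separate weight check before calling it.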