From: "Huang, Ying" <ying.huang@intel.com>
To: Oscar Salvador
Cc: Andrew Morton, Dave Hansen, Abhishek Goel, Baolin Wang,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: Only re-generate demotion targets when a numa node changes its N_CPU state
Date: Mon, 14 Mar 2022 11:09:44 +0800
Message-ID: <878rtd6xhj.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: (Oscar Salvador's message of "Fri, 11 Mar 2022 10:17:55 +0100")
References: <20220310120749.23077-1-osalvador@suse.de> <87a6dxaxil.fsf@yhuang6-desk2.ccr.corp.intel.com>

Oscar Salvador writes:

> On Fri, Mar 11, 2022 at 01:06:26PM +0800, Huang, Ying wrote:
>> Oscar Salvador writes:
>> > -static int __init migrate_on_reclaim_init(void)
>> > -{
>> > -	int ret;
>> > -
>> >  	node_demotion = kmalloc_array(nr_node_ids,
>> >  				      sizeof(struct demotion_nodes),
>> >  				      GFP_KERNEL);
>> >  	WARN_ON(!node_demotion);
>> >
>> > -	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
>> > -					NULL, migration_offline_cpu);
>> >  	/*
>> > -	 * In the unlikely case that this fails, the automatic
>> > -	 * migration targets may become suboptimal for nodes
>> > -	 * where N_CPU changes. With such a small impact in a
>> > -	 * rare case, do not bother trying to do anything special.
>> > +	 * At this point, all numa nodes with memory/CPUs have their state
>> > +	 * properly set, so we can build the demotion order now.
>> >  	 */
>> > -	WARN_ON(ret < 0);
>> > -	ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
>> > -				migration_online_cpu, NULL);
>> > -	WARN_ON(ret < 0);
>> > -
>> > +	set_migration_target_nodes();
>>
>> If my understanding is correct, we should enclose
>> set_migration_target_nodes() here with cpus_read_lock(), and add a
>> comment before set_migration_target_nodes() explaining this.  I don't
>> know whether the locking order is right.
>
> Oh, I see that cpuhp_setup_state() holds the cpu-hotplug lock while
> calling in, so yeah, we might want to hold it in there.
>
> The thing is, not long ago we found out that we could have ACPI events
> like memory-hotplug operations at boot stage [1], so I guess it is
> safe to assume we could also have cpu-hotplug operations at that stage
> as well, and so we want to hold cpus_read_lock() just to be on the
> safe side.
>
> But, unless I am missing something, that does not apply to
> set_migration_target_nodes() being called from a callback, as the
> callback (somewhere up the chain) already holds that lock, e.g.
> _cpu_up() takes cpus_write_lock(), and the same for the down
> operation.
>
> So, to sum it up, we only need the cpus_read_lock() in
> migrate_on_reclaim_init().

Yes, that is what I wanted to say.  Sorry for being confusing.
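To make the conclusion above concrete, a sketch of what
migrate_on_reclaim_init() would look like with the lock taken (assembled
from the hunks quoted above, not the final committed patch; kernel code,
so it is not compilable on its own):

```c
static int __init migrate_on_reclaim_init(void)
{
	node_demotion = kmalloc_array(nr_node_ids,
				      sizeof(struct demotion_nodes),
				      GFP_KERNEL);
	WARN_ON(!node_demotion);

	/*
	 * set_migration_target_nodes() reads the N_CPU node state,
	 * which cpu-hotplug can change concurrently even at boot, so
	 * hold cpus_read_lock() around the direct call.  The hotplug
	 * callbacks do not need this: their callers already hold the
	 * cpu-hotplug lock (e.g. _cpu_up() takes cpus_write_lock()).
	 */
	cpus_read_lock();
	set_migration_target_nodes();
	cpus_read_unlock();

	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
	return 0;
}
```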
>> >  	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
>>
>> And we should register the notifier before calling
>> set_migration_target_nodes()?
>
> I cannot make up my mind here.
> The primary reason I placed the call before registering the notifier
> is that the original code called set_migration_target_nodes() before
> doing so:
>
> <--
> ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
>                         migration_online_cpu, NULL);
> WARN_ON(ret < 0);
>
> hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
> -->
>
> I thought about following the same line.  Why do you think it should
> be called afterwards?
>
> I am not really sure whether it has a different impact depending on
> the order.
> Note that memory-hotplug ACPI events can happen at boot time, so by
> the time we register the memory-hotplug notifier, we can have some
> hotplug memory coming in, and so we call set_migration_target_nodes().
>
> But that is fine, and I cannot see a difference shuffling the order
> of them.
> Do you see a problem in there?

Per my understanding, the following race condition may be possible in
theory:

CPU1                                 CPU2
----                                 ----
set_migration_target_nodes()
                                     <-- a new node is hotplugged, and missed
hotplug_memory_notifier()

During boot, this may be impossible in practice.  But I still think it
is good to make the order correct in general, and it is not hard to do
that.

Best Regards,
Huang, Ying

> [1] https://patchwork.kernel.org/project/linux-mm/patch/20200915094143.79181-3-ldufour@linux.ibm.com/