From: Donet Tom <donettom@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>, Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>, Zi Yan <ziy@nvidia.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
rafael@kernel.org, Danilo Krummrich <dakr@kernel.org>,
Ritesh Harjani <ritesh.list@gmail.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alison Schofield <alison.schofield@intel.com>,
Yury Norov <yury.norov@gmail.com>,
Dave Jiang <dave.jiang@intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/3] driver/base: Optimize memory block registration to reduce boot time
Date: Mon, 5 May 2025 22:10:47 +0530
Message-ID: <24a9480b-f494-4e55-82b8-e5443c694c9e@linux.ibm.com>
In-Reply-To: <a8f66175-dd1b-427c-9980-3a376a3c2fb0@redhat.com>

On 5/5/25 6:32 PM, David Hildenbrand wrote:
> On 05.05.25 14:51, Donet Tom wrote:
>>
>> On 5/5/25 4:06 PM, David Hildenbrand wrote:
>>> On 05.05.25 11:36, Oscar Salvador wrote:
>>>> On Mon, May 05, 2025 at 10:12:48AM +0200, David Hildenbrand wrote:
>>>>> Assume you hotplug the second CPU. The node is already
>>>>> registered/online, so
>>>>> who does the register_cpu_under_node() call?
>>>>>
>>>>> It's register_cpu() I guess? But no idea in which order that is
>>>>> called with
>>>>> node onlining.
>>>>>
>>>>> The code has to be cleaned up such that onlining a node does not
>>>>> traverse
>>>>> any cpus / memory.
>>>>>
>>>>> Whoever adds a CPU / memory *after onlining the node* must
>>>>> register the
>>>>> device manually under the *now online* node.
>>>>
>>>> So, I think this is the sequence of events:
>>>>
>>>> - hotplug cpu:
>>>> acpi_processor_hotadd_init
>>>> register_cpu
>>>> register_cpu_under_node
>>>>
>>>> online_store
>>>> device_online()->dev_bus_online()
>>>> cpu_subsys->online()
>>>> cpu_subsys_online
>>>> cpu_device_up
>>>> cpu_up
>>>> try_online_node <- brings node online
>>>> ...
>>>> register_one_node <- registers cpu under node
>>>> _cpu_up
>>>
>>> My thinking was, whether we can simply move the
>>> register_cpu_under_node() after the try_online_node(). See below
>>> regarding early.
>>>
>>> And then, remove the !node_online check from register_cpu_under_node().
>>>
>>> But it's all complicated, because for memory, we link a memory block
>>> to the node (+set the node online) when it gets added, not when it
>>> gets onlined.
>>>
>>> For CPUs, we seem to be creating the link + set the node online when
>>> the CPU gets onlined.
>>>
>>>>
>>>> The first time we hotplug a cpu to the node, note that
>>>> register_cpu()->register_cpu_under_node() will bail out as node is
>>>> still
>>>> offline, so only cpu's sysfs will be created but they will not be
>>>> linked
>>>> to the node.
>>>> Later, online_store()->...->cpu_subsys_online()->..->cpu_up() will
>>>> take care of 1) onlining the node and 2) register the cpu to the node
>>>> (so, link the sysfs).
>>>
>>>
>>> And only if it actually gets onlined I assume.
>>>
>>>>
>>>> The second time we hotplug a cpu,
>>>> register_cpu()->register_cpu_under_node() will do its job as the
>>>> node is
>>>> already onlined.
>>>> And we will not be calling register_one_node() from
>>>> __try_online_node()
>>>> because of the same reason.
>>>>
>>>> The thing that bothers me is having register_cpu_under_node() spread
>>>> around.
>>>
>>> Right.
>>>
>>>> I think that ideally, we should only be calling
>>>> register_cpu_under_node()
>>>> from register_cpu(), but we have this kinda of (sort of weird?)
>>>> relation
>>>> that even if we hotplug the cpu, but we do not online it, the numa
>>>> node
>>>> will remain offline, and so we cannot do the linking part (cpu <->
>>>> node),
>>>> so we could not really only have register_cpu_under_node() in
>>>> register_cpu(), which is the hot-add part, but we also need it in the
>>>> cpu_up()->try_online_node() which is the online part.
>>>
>>> Maybe one could handle CPUs similar to how we handle it with memory:
>>> node gets onlined + link created as soon as we add the CPU, not when
>>> we online it.
>>>
>>> But likely there is a reason why we do it like that today ...
>>>
>>>>
>>>> And we cannot also remove the register_cpu_under_node() from
>>>> register_cpu() because it is used in other paths (e.g: at boot time ).
>>>
>>> Ah, so in that case we don't call cpu_up ... hm.
>>>
>>> Of course, we can always detect the context (early vs. hotplug).
>>> Maybe, we should split the early vs. hotplug case up much earlier.
>>>
>>> register_cpu_early() / register_cpu_hotplug() ... maybe
>>
>> Hi David and Oscar,
>>
>> I was thinking that __try_online_node(nid, true) being called from
>> try_online_node() might cause issues with this patch. From the
>> discussion above, what I understand is:
>>
>> When try_online_node() is called, there are no memory resources
>> available for the node, so register_memory_blocks_under_node()
>> has no effect. Therefore, our patch should work in all cases.
>
> Right, it's simply unnecessary to perform the lookup ... and confusing.
>
>>
>> Do you think we need to make any changes to this patch?
>
> Probably not to this patch (if it's all working as expected), but we
> should certainly look into cleaning that all up.

Thanks. I'll also check the cleanup part.
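
For reference, the ordering discussed above can be illustrated with a
small user-space toy model. This is only a sketch and not kernel code:
the function names mirror the ones mentioned in the thread, but the
structs (toy_node, toy_cpu), the signatures and the function bodies are
hypothetical and assume a single node whose onlining links every CPU
that is already present.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 4

/* Toy state, not kernel structures: one NUMA node and a few CPUs. */
struct toy_node {
	bool online;
	bool cpu_linked[MAX_CPUS];	/* models the cpu <-> node sysfs link */
};

struct toy_cpu {
	int id;
	bool present;	/* hot-added, the CPU's own sysfs entry exists */
	bool online;
};

/* Models the !node_online bail-out: link only if the node is online. */
static bool register_cpu_under_node(struct toy_node *node,
				    const struct toy_cpu *cpu)
{
	assert(cpu->id >= 0 && cpu->id < MAX_CPUS);
	if (!node->online)
		return false;
	node->cpu_linked[cpu->id] = true;
	return true;
}

/* Hot-add path: create the CPU device, then try to link it. */
static bool register_cpu(struct toy_node *node, struct toy_cpu *cpu)
{
	cpu->present = true;
	return register_cpu_under_node(node, cpu);
}

/*
 * Online path helper: bring the node online if needed and link the
 * CPUs that are already present (assumed role of register_one_node()).
 * Returns true only if this call onlined the node.
 */
static bool try_online_node(struct toy_node *node, struct toy_cpu *cpus,
			    int ncpus)
{
	if (node->online)
		return false;
	node->online = true;
	for (int i = 0; i < ncpus; i++)
		if (cpus[i].present)
			register_cpu_under_node(node, &cpus[i]);
	return true;
}

/* Online path: the node is onlined before the CPU itself comes up. */
static void cpu_up(struct toy_node *node, struct toy_cpu *cpus, int ncpus,
		   int id)
{
	try_online_node(node, cpus, ncpus);
	cpus[id].online = true;
}
```

In this model, the first register_cpu() on an offline node returns
false and leaves the CPU unlinked, the following cpu_up() onlines the
node and creates the link, and a second register_cpu() on the now
online node links the CPU directly.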