From: Gregory Price <gourry@gourry.net>
To: Honggyu Kim <honggyu.kim@sk.com>
Cc: kernel_team@skhynix.com, Joshua Hahn <joshua.hahnjy@gmail.com>,
harry.yoo@oracle.com, ying.huang@linux.alibaba.com,
gregkh@linuxfoundation.org, rakie.kim@sk.com,
akpm@linux-foundation.org, rafael@kernel.org, lenb@kernel.org,
dan.j.williams@intel.com, Jonathan.Cameron@huawei.com,
dave.jiang@intel.com, horen.chuang@linux.dev, hannes@cmpxchg.org,
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-mm@kvack.org, kernel-team@meta.com, yunjeong.mun@sk.com
Subject: Re: [PATCH 2/2 v6] mm/mempolicy: Don't create weight sysfs for memoryless nodes
Date: Tue, 4 Mar 2025 11:16:16 -0500
Message-ID: <Z8cnUA9WqsscbUtm@gourry-fedora-PF4VCD3F>
In-Reply-To: <95541985-8d40-4ded-a83e-46203c441640@sk.com>
On Tue, Mar 04, 2025 at 10:03:22PM +0900, Honggyu Kim wrote:
> Hi Gregory,
>
> > This patch may have been a bit overzealous of us, I forgot to ask
> > whether N_MEMORY is set for nodes created but not onlined at boot. So
> > this is a good observation.
>
> I didn't want to make more noise but we found many issues again after
> getting a new machine and started using it with multiple CXL memory.
>

I spent yesterday looking into how nodes are created and marked N_MEMORY,
and I now think that this patch is just not correct.

N_MEMORY for a given nid is toggled:
  1) during mm_init, if any page is associated with that node (DRAM)
  2) during memory hotplug, when a memory block is onlined/offlined (CXL)

This means a CXL node which is deferred to the driver will come up as
memoryless at boot (mm_init) but has N_MEMORY toggled on when the first
hotplug memory block is onlined. However, its access_coordinate data is
reported during CXL driver probe - well prior to memory hotplug.

This means we must expose a node entry for every possible node, always,
because we can't predict what nodes will have hotplug memory.
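
To make the ordering concrete, here is a toy userspace model in Python
(not kernel code). The Node class and the mm_init / cxl_probe /
online_memory_block helpers are hypothetical stand-ins named after the
steps above, and the bandwidth value is made up. It only shows that a
filter on N_MEMORY at boot misses a node that already has performance
data and gains memory later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    nid: int
    n_memory: bool = False            # stands in for the N_MEMORY node state
    bandwidth: Optional[int] = None   # stands in for access_coordinate data


def mm_init(nodes, populated_nids):
    """Boot: a node gets N_MEMORY only if pages are already associated with it."""
    for node in nodes:
        node.n_memory = node.nid in populated_nids


def cxl_probe(node, bandwidth):
    """Driver probe: performance data is reported, N_MEMORY is untouched."""
    node.bandwidth = bandwidth


def online_memory_block(node):
    """Hotplug: onlining the first memory block sets N_MEMORY."""
    node.n_memory = True


def nodes_with_memory(nodes):
    """The set of nids an N_MEMORY-based filter would see right now."""
    return [n.nid for n in nodes if n.n_memory]


def timeline():
    nodes = [Node(0), Node(1)]           # node 0: DRAM, node 1: CXL (assumed)
    mm_init(nodes, populated_nids={0})
    at_boot = nodes_with_memory(nodes)   # what a boot-time filter would keep
    cxl_probe(nodes[1], bandwidth=64)    # data arrives before any memory
    after_probe = (nodes[1].bandwidth, nodes[1].n_memory)
    online_memory_block(nodes[1])
    after_hotplug = nodes_with_memory(nodes)
    return at_boot, after_probe, after_hotplug
```

In this model, timeline() sees only node 0 at boot, sees node 1 with
data but no memory after probe, and sees both nodes only after hotplug.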

We COULD try to react to hotplug memory blocks, but the added
complexity just doesn't seem worth the hassle - the hotplug callback has
timing restrictions (callback must occur AFTER N_MEMORY is toggled).

It seems better to include all nodes with reported data in the reduction.

This has two downsides:
  1) stale data may be used if hotplug occurs and the new device does
     not have CDAT/HMAT/access_coordinate data.
  2) any device without CDAT/HMAT/access_coordinate data will not be
     included in the reduction by default.

I think we can work around #2 by detecting this (on reduction, if data
is missing but N_MEMORY is set, fire a warning). We can't do much about
#1 unless we field physical device hot-un-plug callbacks - and that
seems like a bit much.
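
A rough sketch of the warning check for #2, again as a toy Python model
rather than kernel code. reduce_weights and its GCD-based scaling are
assumptions for illustration only; the reduction in the actual series
may work differently. The point is just the shape of the check: every
node with reported data takes part, and a node that has N_MEMORY but no
data triggers a warning instead of being silently skipped.

```python
import warnings
from functools import reduce
from math import gcd


def reduce_weights(bandwidth, n_memory):
    """Toy reduction.

    bandwidth: dict of nid -> reported bandwidth, or None if nothing reported
    n_memory:  set of nids that currently have N_MEMORY set
    """
    # Downside #2: memory is present but no performance data was reported.
    for nid in sorted(n_memory):
        if bandwidth.get(nid) is None:
            warnings.warn(f"node {nid} has memory but no performance data")

    # Include every node with reported data, whether or not it has memory yet.
    reported = {nid: bw for nid, bw in bandwidth.items() if bw}
    if not reported:
        return {}

    # Assumed scaling: divide by the common divisor to get small weights.
    divisor = reduce(gcd, reported.values())
    return {nid: bw // divisor for nid, bw in reported.items()}
```

For example, reduce_weights({0: 300, 1: 100, 2: None}, {0, 1, 2})
warns about node 2 and returns weights for nodes 0 and 1 only.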
~Gregory