From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gourry@gourry.net>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
dave.jiang@intel.com, Jonathan.Cameron@huawei.com,
horenchuang@bytedance.com, linux-kernel@vger.kernel.org,
linux-acpi@vger.kernel.org, dan.j.williams@intel.com,
lenb@kernel.org, Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Subject: Re: [PATCH] acpi/hmat,mm/memtier: always register hmat adist calculation callback
Date: Tue, 30 Jul 2024 09:12:55 +0800
Message-ID: <877cd3u1go.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <ZqelvPwM2MIG26wY@PC2K9PVX.TheFacebook.com> (Gregory Price's message of "Mon, 29 Jul 2024 10:22:52 -0400")

Gregory Price <gourry@gourry.net> writes:

> On Mon, Jul 29, 2024 at 09:02:33AM +0800, Huang, Ying wrote:
>> Gregory Price <gourry@gourry.net> writes:
>>
>> > In the event that hmat data is not available for the DRAM tier,
>> > or if it is invalid (bandwidth or latency is 0), we can still register
>> > a callback to calculate the abstract distance for non-cpu nodes
>> > and simply assign it a different tier manually.
>> >
>> > In the case where DRAM HMAT values are missing or not sane we
>> > manually assign adist=(MEMTIER_ADISTANCE_DRAM + MEMTIER_CHUNK_SIZE).
>> >
>> > If the HMAT data for the non-cpu tier is invalid (e.g. bw = 0), we
>> > cannot reasonably determine where to place the tier, so it will default
>> > to MEMTIER_ADISTANCE_DRAM (which is the existing behavior).
>>
>> Why do we need this? Do you have machines with broken HMAT table? Can
>> you ask the vendor to fix the HMAT table?
>>
>
> It's a little unclear from the ACPI specification whether HMAT is
> technically optional or not (given that the kernel handles missing HMAT
> gracefully, it certainly seems optional). In one scenario I have seen
> incorrect data, and in another scenario I have seen the HMAT omitted
> entirely. In another scenario I have seen the HMAT-SLLBI omitted while
> the CDAT is present.

IIUC, HMAT is optional. Is it possible for you to ask the system vendor
to fix the broken HMAT table?

> In all scenarios the result is the same: all nodes in the same tier.

I don't think so. In drivers/dax/kmem.c, we put memory devices
onlined by kmem.c in another tier by default.

> The HMAT is explicitly described as "A hint" in the ACPI spec.
>
> ACPI 5.2.28.1 HMAT Overview
>
> "The software is expected to use this information as a hint for
> optimization, or when the system has heterogeneous memory"
>
> If something is "a hint", then it should not be used prescriptively.
>
> Right now HMAT appears to be used prescriptively, this despite the fact
> that there was a clear intent to separate CPU-nodes and non-CPU-nodes in
> the memory-tier code. So this patch simply realizes this intent when the
> hints are not very reasonable.

If HMAT isn't available, it's hard to put memory devices into
appropriate memory tiers without other information. In commit
992bf77591cb ("mm/demotion: add support for explicit memory tiers"),
Aneesh pointed out that putting non-CPU-nodes in a lower tier doesn't
work for his system.

Even if we want to use other information to put memory devices into
memory tiers, we can register another adist calculation callback
instead of reusing the HMAT callback.

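
To illustrate the idea of a separate callback, here is a rough userspace
model, not kernel code. All names are hypothetical stand-ins, the
constant values are assumptions rather than figures from this thread,
and the HMAT-like scaling is a toy formula. It only shows the chaining
behavior: each callback may decline, the first one that answers stops
the walk, and a node that nobody answers for keeps the DRAM default.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only. Values below are assumptions for illustration. */
#define NOTIFY_OK              0x0001
#define NOTIFY_STOP            0x8001
#define MEMTIER_CHUNK_SIZE     (1 << 7)
#define MEMTIER_ADISTANCE_DRAM (4 * MEMTIER_CHUNK_SIZE + (MEMTIER_CHUNK_SIZE >> 1))
#define TOY_DRAM_LATENCY       100   /* hypothetical DRAM baseline */

/* Hypothetical per-node description. */
struct node_info {
	int has_cpu;
	unsigned int read_bw;    /* 0 means missing or invalid perf data */
	unsigned int read_lat;   /* 0 means missing or invalid perf data */
};

typedef int (*adist_cb)(const struct node_info *node, int *adist);

/* Stand-in for an HMAT-based callback: declines when perf data is unusable. */
static int hmat_like_adist(const struct node_info *node, int *adist)
{
	assert(node && adist);
	if (node->read_bw == 0 || node->read_lat == 0)
		return NOTIFY_OK;   /* no answer, let the next callback try */
	/* Toy scaling, not the kernel formula: higher latency, larger adist. */
	*adist = MEMTIER_ADISTANCE_DRAM * (int)node->read_lat / TOY_DRAM_LATENCY;
	return NOTIFY_STOP;
}

/* Separate fallback callback: answers only for nodes without CPUs. */
static int fallback_adist(const struct node_info *node, int *adist)
{
	assert(node && adist);
	if (node->has_cpu)
		return NOTIFY_OK;
	*adist = MEMTIER_ADISTANCE_DRAM + MEMTIER_CHUNK_SIZE;
	return NOTIFY_STOP;
}

/* Walk the callbacks in order; the first NOTIFY_STOP wins. */
static int calc_adist(const struct node_info *node,
		      const adist_cb *chain, size_t n)
{
	int adist = MEMTIER_ADISTANCE_DRAM;   /* default when nobody answers */

	for (size_t i = 0; i < n; i++)
		if (chain[i](node, &adist) == NOTIFY_STOP)
			break;
	return adist;
}
```

With both callbacks in the chain, a non-CPU node without usable perf
data lands one chunk away from DRAM, while a node with usable data is
still placed by the HMAT-like callback. With only the HMAT-like callback,
the same node stays at the DRAM default.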
--
Best Regards,
Huang, Ying