* Barlopass nvdimm as MemoryMode question
@ 2024-03-07 20:33 Jane Chu
2024-03-07 21:05 ` Dan Williams
0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-07 20:33 UTC
To: dan.j.williams, vishal.l.verma, nvdimm; +Cc: Linux-MM
Hi, Dan and Vishal,
What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
Thanks!
-jane
* Re: Barlopass nvdimm as MemoryMode question
2024-03-07 20:33 Barlopass nvdimm as MemoryMode question Jane Chu
@ 2024-03-07 21:05 ` Dan Williams
2024-03-08 0:30 ` Jane Chu
0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-03-07 21:05 UTC
To: Jane Chu, dan.j.williams, vishal.l.verma, nvdimm; +Cc: Linux-MM
Jane Chu wrote:
> Hi, Dan and Vishal,
>
> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
As always, the NUMA description is a property of the platform, not the
media type / DIMM. The ACPI HMAT describes the details of
memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
6.4.
* Re: Barlopass nvdimm as MemoryMode question
2024-03-07 21:05 ` Dan Williams
@ 2024-03-08 0:30 ` Jane Chu
2024-03-08 0:49 ` Dan Williams
0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08 0:30 UTC
To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins
Add Joao.
On 3/7/2024 1:05 PM, Dan Williams wrote:
> Jane Chu wrote:
>> Hi, Dan and Vishal,
>>
>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
> As always, the NUMA description is a property of the platform, not the
> media type / DIMM. The ACPI HMAT describes the details of
> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> 6.4.
Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
converted pmem/SysRAM region, w.r.t. pmem in MemoryMode, is there any
clue that the kernel exposes (or could expose) to userland about the extra
latency such that userland may treat these memory regions differently?
thanks,
-jane
* Re: Barlopass nvdimm as MemoryMode question
2024-03-08 0:30 ` Jane Chu
@ 2024-03-08 0:49 ` Dan Williams
2024-03-08 1:42 ` Jane Chu
0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-03-08 0:49 UTC
To: Jane Chu, Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins
Jane Chu wrote:
> Add Joao.
>
> On 3/7/2024 1:05 PM, Dan Williams wrote:
>
> > Jane Chu wrote:
> >> Hi, Dan and Vishal,
> >>
> >> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
> >> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
> > As always, the NUMA description is a property of the platform, not the
> > media type / DIMM. The ACPI HMAT describes the details of
> > memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> > 6.4.
>
> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
> converted pmem/SysRAM region,
...to be clear, dax_kmem is not creating a new NUMA node, it is just
potentially onlining a proximity domain that was fully described by ACPI
SRAT but offline.
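To make the distinction concrete, here is a rough sketch that compares the kernel's lists of possible and online NUMA nodes. It assumes the standard sysfs files /sys/devices/system/node/possible and /sys/devices/system/node/online. Treating "possible but not online" as "described by firmware but not yet onlined" is a simplification, and the helper names parse_id_list and offline_nodes are hypothetical.

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")  # standard location on Linux


def parse_id_list(text):
    """Parse a kernel list string such as "0-1,3" into a set of ints."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        first, _, last = part.partition("-")
        # A single id like "3" has no range end, so reuse the start.
        ids.update(range(int(first), int(last or first) + 1))
    return ids


def offline_nodes(node_root=NODE_ROOT):
    """Return node ids listed in 'possible' but absent from 'online'."""
    root = Path(node_root)
    possible = parse_id_list((root / "possible").read_text())
    online = parse_id_list((root / "online").read_text())
    return possible - online
```

Calling offline_nodes() before and after onlining a dax_kmem region should, under these assumptions, show a node id moving out of the returned set rather than a new id appearing in 'possible'.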
> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> could expose) to userland about the extra latency such that userland
> may treat these memory regions differently?
Userland should be able to interrogate the memory_side_cache/ property
in NUMA sysfs:
https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
Otherwise I believe SRAT and SLIT for that node only reflect the
performance of the DDR fronting the PMEM. So if you have a DDR node and
DDR+PMEM cache node, they may look the same from the ACPI SLIT
perspective, but the ACPI HMAT contains the details of the backing
memory. The Linux NUMA performance sysfs interface gets populated by
ACPI HMAT.
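A minimal sketch of how userland might interrogate that interface, assuming the layout documented in the numaperf guide linked above: /sys/devices/system/node/nodeN/memory_side_cache/indexM/ with size, line_size, indexing, and write_policy files holding decimal integers. The function name read_side_caches and its node_root parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")
# Attribute files described in the kernel's numaperf documentation.
CACHE_ATTRS = ("size", "line_size", "indexing", "write_policy")


def read_side_caches(node_root=NODE_ROOT):
    """Return {node: {indexM: {attr: int}}} for nodes that have a
    memory_side_cache directory; nodes without one are omitted."""
    result = {}
    for node_dir in sorted(Path(node_root).glob("node[0-9]*")):
        cache_dir = node_dir / "memory_side_cache"
        if not cache_dir.is_dir():
            continue  # no memory-side cache reported for this node
        levels = {}
        for index_dir in sorted(cache_dir.glob("index[0-9]*")):
            attrs = {}
            for name in CACHE_ATTRS:
                attr_file = index_dir / name
                if attr_file.is_file():
                    attrs[name] = int(attr_file.read_text().strip())
            levels[index_dir.name] = attrs
        result[node_dir.name] = levels
    return result
```

On a machine with no memory-side cache reported, read_side_caches() simply returns an empty dict.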
* Re: Barlopass nvdimm as MemoryMode question
2024-03-08 0:49 ` Dan Williams
@ 2024-03-08 1:42 ` Jane Chu
2024-03-08 1:57 ` Jane Chu
0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08 1:42 UTC
To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins, Jane Chu
On 3/7/2024 4:49 PM, Dan Williams wrote:
> Jane Chu wrote:
>> Add Joao.
>>
>> On 3/7/2024 1:05 PM, Dan Williams wrote:
>>
>>> Jane Chu wrote:
>>>> Hi, Dan and Vishal,
>>>>
>>>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
>>>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
>>> As always, the NUMA description is a property of the platform, not the
>>> media type / DIMM. The ACPI HMAT describes the details of
>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
>>> 6.4.
>> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
>> converted pmem/SysRAM region,
> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> potentially onlining a proximity domain that was fully described by ACPI
> SRAT but offline.
>
>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
>> could expose) to userland about the extra latency such that userland
>> may treat these memory regions differently?
> Userland should be able to interrogate the memory_side_cache/ property
> in NUMA sysfs:
>
> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
>
> Otherwise I believe SRAT and SLIT for that node only reflect the
> performance of the DDR fronting the PMEM. So if you have a DDR node and
> DDR+PMEM cache node, they may look the same from the ACPI SLIT
> perspective, but the ACPI HMAT contains the details of the backing
> memory. The Linux NUMA performance sysfs interface gets populated by
> ACPI HMAT.
Thanks Dan.
Please correct me if I'm mistaken: if I configure some Barlow Pass
nvdimms to MemoryMode and reboot, those regions of memory are
automatically two-level with DDR as the front cache, so hmat_init() is
expected to create the memory_side_cache/indexN interface, and if I see
multiple indexN layers, that would be a sign that pmem in MemoryMode is
present, right?
I've yet to grab hold of a system to confirm this, but apparently with
only DDR memory, memory_side_cache/ doesn't exist.
thanks!
-jane
* Re: Barlopass nvdimm as MemoryMode question
2024-03-08 1:42 ` Jane Chu
@ 2024-03-08 1:57 ` Jane Chu
2024-03-08 2:53 ` Dan Williams
0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08 1:57 UTC
To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins, Jane Chu
On 3/7/2024 5:42 PM, Jane Chu wrote:
> On 3/7/2024 4:49 PM, Dan Williams wrote:
>
>> Jane Chu wrote:
>>> Add Joao.
>>>
>>> On 3/7/2024 1:05 PM, Dan Williams wrote:
>>>
>>>> Jane Chu wrote:
>>>>> Hi, Dan and Vishal,
>>>>>
>>>>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
>>>>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
>>>> As always, the NUMA description is a property of the platform, not the
>>>> media type / DIMM. The ACPI HMAT describes the details of
>>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
>>>> 6.4.
>>> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
>>> converted pmem/SysRAM region,
>> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
>> potentially onlining a proximity domain that was fully described by ACPI
>> SRAT but offline.
>>
>>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
>>> could expose) to userland about the extra latency such that userland
>>> may treat these memory regions differently?
>> Userland should be able to interrogate the memory_side_cache/ property
>> in NUMA sysfs:
>>
>> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
>>
>> Otherwise I believe SRAT and SLIT for that node only reflect the
>> performance of the DDR fronting the PMEM. So if you have a DDR node and
>> DDR+PMEM cache node, they may look the same from the ACPI SLIT
>> perspective, but the ACPI HMAT contains the details of the backing
>> memory. The Linux NUMA performance sysfs interface gets populated by
>> ACPI HMAT.
>
> Thanks Dan.
>
> Please correct me if I'm mistaken: if I configure some Barlow Pass
> nvdimms to MemoryMode and reboot, those regions of memory are
> automatically two-level with DDR as the front cache, so hmat_init() is
> expected to create the memory_side_cache/indexN interface, and if I
> see multiple indexN layers, that would be a sign that pmem in
> MemoryMode is present, right?
>
> I've yet to grab hold of a system to confirm this, but apparently with
> only DDR memory, memory_side_cache/ doesn't exist.
On each CPU socket node, we have
| |-memory_side_cache
| | |-uevent
| | |-power
| | |-index1
| | | |-uevent
| | | |-power
| | | |-line_size
| | | |-write_policy
| | | |-size
| | | |-indexing
where 'indexing' = 0 means a direct-mapped cache? So is that a clue that
slower/far memory is behind the cache?
thanks!
-jane
>
> thanks!
>
> -jane
>
* Re: Barlopass nvdimm as MemoryMode question
2024-03-08 1:57 ` Jane Chu
@ 2024-03-08 2:53 ` Dan Williams
0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2024-03-08 2:53 UTC
To: Jane Chu, Dan Williams, vishal.l.verma, nvdimm
Cc: Linux-MM, Joao Martins, Jane Chu
Jane Chu wrote:
> On 3/7/2024 5:42 PM, Jane Chu wrote:
>
> > On 3/7/2024 4:49 PM, Dan Williams wrote:
> >
> >> Jane Chu wrote:
> >>> Add Joao.
> >>>
> >>> On 3/7/2024 1:05 PM, Dan Williams wrote:
> >>>
> >>>> Jane Chu wrote:
> >>>>> Hi, Dan and Vishal,
> >>>>>
> >>>>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
> >>>>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
> >>>> As always, the NUMA description is a property of the platform, not the
> >>>> media type / DIMM. The ACPI HMAT describes the details of
> >>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> >>>> 6.4.
> >>> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
> >>> converted pmem/SysRAM region,
> >> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> >> potentially onlining a proximity domain that was fully described by ACPI
> >> SRAT but offline.
> >>
> >>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> >>> could expose) to userland about the extra latency such that userland
> >>> may treat these memory regions differently?
> >> Userland should be able to interrogate the memory_side_cache/ property
> >> in NUMA sysfs:
> >>
> >> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
> >>
> >> Otherwise I believe SRAT and SLIT for that node only reflect the
> >> performance of the DDR fronting the PMEM. So if you have a DDR node and
> >> DDR+PMEM cache node, they may look the same from the ACPI SLIT
> >> perspective, but the ACPI HMAT contains the details of the backing
> >> memory. The Linux NUMA performance sysfs interface gets populated by
> >> ACPI HMAT.
> >
> > Thanks Dan.
> >
> > Please correct me if I'm mistaken: if I configure some Barlow Pass
> > nvdimms to MemoryMode and reboot, those regions of memory are
> > automatically two-level with DDR as the front cache, so hmat_init() is
> > expected to create the memory_side_cache/indexN interface, and if I
> > see multiple indexN layers, that would be a sign that pmem in
> > MemoryMode is present, right?
> >
> > I've yet to grab hold of a system to confirm this, but apparently with
> > only DDR memory, memory_side_cache/ doesn't exist.
>
> On each CPU socket node, we have
>
> | |-memory_side_cache
> | | |-uevent
> | | |-power
> | | |-index1
> | | | |-uevent
> | | | |-power
> | | | |-line_size
> | | | |-write_policy
> | | | |-size
> | | | |-indexing
>
> where 'indexing' = 0 means a direct-mapped cache? So is that a clue that
> slower/far memory is behind the cache?
Correct.
Note that the ACPI HMAT may also populate data about the performance of
the memory range on a cache miss (see ACPI 6.4 Table 5.129: System
Locality Latency and Bandwidth Information Structure), but the Linux
enabling does not export that information.
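For comparison, a sketch of reading the performance figures the kernel does export per node: the access-class attributes described in the numaperf guide (accessN/initiators/read_latency and the related files, documented there as nanoseconds for latency and MiB/s for bandwidth). Which level of the hierarchy these figures characterize on a given MemoryMode platform depends on what firmware reports in HMAT, and this sketch cannot determine that. The helper name read_access_perf and its parameters are hypothetical.

```python
from pathlib import Path

# Attribute files listed in the kernel's numaperf documentation.
PERF_ATTRS = (
    "read_bandwidth",
    "read_latency",
    "write_bandwidth",
    "write_latency",
)


def read_access_perf(node_dir, access_class=0):
    """Return {attr: int} for one node's accessN/initiators directory.

    'node_dir' is a path such as /sys/devices/system/node/node0.
    Missing files are skipped, so the result may be empty when the
    platform supplies no HMAT performance data.
    """
    initiators = Path(node_dir) / "access{}".format(access_class) / "initiators"
    perf = {}
    for name in PERF_ATTRS:
        attr_file = initiators / name
        if attr_file.is_file():
            perf[name] = int(attr_file.read_text().strip())
    return perf
```

Comparing the result for a plain DDR node against a DDR+PMEM cache node is one way to check whether the exported numbers differ at all on a given system.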