linux-mm.kvack.org archive mirror
* Barlopass nvdimm as MemoryMode question
@ 2024-03-07 20:33 Jane Chu
  2024-03-07 21:05 ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-07 20:33 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, nvdimm; +Cc: Linux-MM


Hi, Dan and Vishal,

What kind of NUMAness is visible to the kernel w.r.t. SysRAM region 
backed by Barlopass nvdimms configured in MemoryMode by ipmctl?

Thanks!

-jane



* Re: Barlopass nvdimm as MemoryMode question
  2024-03-07 20:33 Barlopass nvdimm as MemoryMode question Jane Chu
@ 2024-03-07 21:05 ` Dan Williams
  2024-03-08  0:30   ` Jane Chu
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-03-07 21:05 UTC (permalink / raw)
  To: Jane Chu, dan.j.williams, vishal.l.verma, nvdimm; +Cc: Linux-MM

Jane Chu wrote:
> Hi, Dan and Vishal,
> 
> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region 
> backed by Barlopass nvdimms configured in MemoryMode by ipmctl?

As always, the NUMA description is a property of the platform, not the
media type / DIMM. The ACPI HMAT describes the details of
memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
6.4.



* Re: Barlopass nvdimm as MemoryMode question
  2024-03-07 21:05 ` Dan Williams
@ 2024-03-08  0:30   ` Jane Chu
  2024-03-08  0:49     ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08  0:30 UTC (permalink / raw)
  To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins

Add Joao.

On 3/7/2024 1:05 PM, Dan Williams wrote:

> Jane Chu wrote:
>> Hi, Dan and Vishal,
>>
>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
>> backed by Barlopass nvdimms configured in MemoryMode by ipmctl?
> As always, the NUMA description is a property of the platform, not the
> media type / DIMM. The ACPI HMAT describes the details of
> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> 6.4.

Thanks!  So, compared to dax_kmem, which assigns a NUMA node to a newly
converted pmem/SysRAM region, w.r.t. pmem in MemoryMode, is there any
clue that the kernel exposes (or could expose) to userland about the extra
latency, such that userland may treat these memory regions differently?

thanks,

-jane




* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  0:30   ` Jane Chu
@ 2024-03-08  0:49     ` Dan Williams
  2024-03-08  1:42       ` Jane Chu
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-03-08  0:49 UTC (permalink / raw)
  To: Jane Chu, Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins

Jane Chu wrote:
> Add Joao.
> 
> On 3/7/2024 1:05 PM, Dan Williams wrote:
> 
> > Jane Chu wrote:
> >> Hi, Dan and Vishal,
> >>
> >> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
> >> backed by Barlopass nvdimms configured in MemoryMode by ipmctl?
> > As always, the NUMA description is a property of the platform, not the
> > media type / DIMM. The ACPI HMAT describes the details of
> > memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> > 6.4.
> 
> Thanks!  So, compared to dax_kmem, which assigns a NUMA node to a newly
> converted pmem/SysRAM region,

...to be clear, dax_kmem is not creating a new NUMA node, it is just
potentially onlining a proximity domain that was fully described by ACPI
SRAT but offline.

> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> could expose) to userland about the extra latency, such that userland
> may treat these memory regions differently?

Userland should be able to interrogate the memory_side_cache/ property
in NUMA sysfs:

https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache

Otherwise I believe SRAT and SLIT for that node only reflect the
performance of the DDR fronting the PMEM. So if you have a DDR node and
DDR+PMEM cache node, they may look the same from the ACPI SLIT
perspective, but the ACPI HMAT contains the details of the backing
memory. The Linux NUMA performance sysfs interface gets populated by
ACPI HMAT.
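A minimal sketch of that userland interrogation in Python (assuming the
numaperf sysfs layout documented at the link above; the sysfs_root
parameter is hypothetical, present only so the sketch can be pointed at a
test tree):

```python
import glob
import os

def memory_side_caches(sysfs_root="/sys/devices/system/node"):
    """Map node name -> {indexN: attribute dict} for every node that
    exposes a memory_side_cache/ directory.  Nodes without a
    memory-side cache (e.g. plain DDR nodes) simply do not appear."""
    caches = {}
    pattern = os.path.join(sysfs_root, "node*", "memory_side_cache", "index*")
    for index_dir in glob.glob(pattern):
        node = index_dir.split(os.sep)[-3]       # .../nodeX/memory_side_cache/indexN
        attrs = {}
        for attr in ("size", "line_size", "indexing", "write_policy"):
            path = os.path.join(index_dir, attr)
            if os.path.exists(path):
                with open(path) as f:
                    attrs[attr] = f.read().strip()
        caches.setdefault(node, {})[os.path.basename(index_dir)] = attrs
    return caches
```
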



* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  0:49     ` Dan Williams
@ 2024-03-08  1:42       ` Jane Chu
  2024-03-08  1:57         ` Jane Chu
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08  1:42 UTC (permalink / raw)
  To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins, Jane Chu

On 3/7/2024 4:49 PM, Dan Williams wrote:

> Jane Chu wrote:
>> Add Joao.
>>
>> On 3/7/2024 1:05 PM, Dan Williams wrote:
>>
>>> Jane Chu wrote:
>>>> Hi, Dan and Vishal,
>>>>
>>>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
>>>> backed by Barlopass nvdimms configured in MemoryMode by ipmctl?
>>> As always, the NUMA description is a property of the platform, not the
>>> media type / DIMM. The ACPI HMAT describes the details of
>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
>>> 6.4.
>> Thanks!  So, compared to dax_kmem, which assigns a NUMA node to a newly
>> converted pmem/SysRAM region,
> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> potentially onlining a proximity domain that was fully described by ACPI
> SRAT but offline.
>
>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
>> could expose) to userland about the extra latency, such that userland
>> may treat these memory regions differently?
> Userland should be able to interrogate the memory_side_cache/ property
> in NUMA sysfs:
>
> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
>
> Otherwise I believe SRAT and SLIT for that node only reflect the
> performance of the DDR fronting the PMEM. So if you have a DDR node and
> DDR+PMEM cache node, they may look the same from the ACPI SLIT
> perspective, but the ACPI HMAT contains the details of the backing
> memory. The Linux NUMA performance sysfs interface gets populated by
> ACPI HMAT.

Thanks Dan.

Please correct me if I'm mistaken: if I configure some barlowpass
nvdimms to MemoryMode and reboot, those regions of memory automatically
become two-level with DDR as the front cache, so hmat_init() is
expected to create the memory_side_cache/indexN interface, and if I see
multiple indexN layers, that would be a sign that pmem in MemoryMode is
present, right?

I've yet to grab hold of a system to confirm this, but apparently with 
only DDR memory, memory_side_cache/ doesn't exist.

thanks!

-jane




* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  1:42       ` Jane Chu
@ 2024-03-08  1:57         ` Jane Chu
  2024-03-08  2:53           ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08  1:57 UTC (permalink / raw)
  To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins, Jane Chu

On 3/7/2024 5:42 PM, Jane Chu wrote:

> On 3/7/2024 4:49 PM, Dan Williams wrote:
>
>> Jane Chu wrote:
>>> Add Joao.
>>>
>>> On 3/7/2024 1:05 PM, Dan Williams wrote:
>>>
>>>> Jane Chu wrote:
>>>>> Hi, Dan and Vishal,
>>>>>
>>>>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
>>>>> backed by Barlopass nvdimms configured in MemoryMode by ipmctl?
>>>> As always, the NUMA description is a property of the platform, not the
>>>> media type / DIMM. The ACPI HMAT describes the details of
>>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
>>>> 6.4.
>>> Thanks!  So, compared to dax_kmem, which assigns a NUMA node to a newly
>>> converted pmem/SysRAM region,
>> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
>> potentially onlining a proximity domain that was fully described by ACPI
>> SRAT but offline.
>>
>>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
>>> could expose) to userland about the extra latency, such that userland
>>> may treat these memory regions differently?
>> Userland should be able to interrogate the memory_side_cache/ property
>> in NUMA sysfs:
>>
>> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
>>
>> Otherwise I believe SRAT and SLIT for that node only reflect the
>> performance of the DDR fronting the PMEM. So if you have a DDR node and
>> DDR+PMEM cache node, they may look the same from the ACPI SLIT
>> perspective, but the ACPI HMAT contains the details of the backing
>> memory. The Linux NUMA performance sysfs interface gets populated by
>> ACPI HMAT.
>
> Thanks Dan.
>
> Please correct me if I'm mistaken: if I configure some barlowpass
> nvdimms to MemoryMode and reboot, those regions of memory automatically
> become two-level with DDR as the front cache, so hmat_init() is
> expected to create the memory_side_cache/indexN interface, and if I
> see multiple indexN layers, that would be a sign that pmem in
> MemoryMode is present, right?
>
> I've yet to grab hold of a system to confirm this, but apparently with 
> only DDR memory, memory_side_cache/ doesn't exist.

On each CPU socket node, we have

| |-memory_side_cache
| | |-uevent
| | |-power
| | |-index1
| | | |-uevent
| | | |-power
| | | |-line_size
| | | |-write_policy
| | | |-size
| | | |-indexing

where 'indexing' = 0 means a direct-mapped cache?  So is that a clue that
slower/far-memory is behind the cache?

thanks!

-jane

>
> thanks!
>
> -jane
>



* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  1:57         ` Jane Chu
@ 2024-03-08  2:53           ` Dan Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2024-03-08  2:53 UTC (permalink / raw)
  To: Jane Chu, Dan Williams, vishal.l.verma, nvdimm
  Cc: Linux-MM, Joao Martins, Jane Chu

Jane Chu wrote:
> On 3/7/2024 5:42 PM, Jane Chu wrote:
> 
> > On 3/7/2024 4:49 PM, Dan Williams wrote:
> >
> >> Jane Chu wrote:
> >>> Add Joao.
> >>>
> >>> On 3/7/2024 1:05 PM, Dan Williams wrote:
> >>>
> >>>> Jane Chu wrote:
> >>>>> Hi, Dan and Vishal,
> >>>>>
> >>>>> What kind of NUMAness is visible to the kernel w.r.t. SysRAM region
> >>>>> backed by Barlopass nvdimms configured in MemoryMode by ipmctl?
> >>>> As always, the NUMA description is a property of the platform, not the
> >>>> media type / DIMM. The ACPI HMAT describes the details of
> >>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> >>>> 6.4.
> >>> Thanks!  So, compared to dax_kmem, which assigns a NUMA node to a newly
> >>> converted pmem/SysRAM region,
> >> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> >> potentially onlining a proximity domain that was fully described by ACPI
> >> SRAT but offline.
> >>
> >>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> >>> could expose) to userland about the extra latency, such that userland
> >>> may treat these memory regions differently?
> >> Userland should be able to interrogate the memory_side_cache/ property
> >> in NUMA sysfs:
> >>
> >> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
> >>
> >> Otherwise I believe SRAT and SLIT for that node only reflect the
> >> performance of the DDR fronting the PMEM. So if you have a DDR node and
> >> DDR+PMEM cache node, they may look the same from the ACPI SLIT
> >> perspective, but the ACPI HMAT contains the details of the backing
> >> memory. The Linux NUMA performance sysfs interface gets populated by
> >> ACPI HMAT.
> >
> > Thanks Dan.
> >
> > Please correct me if I'm mistaken: if I configure some barlowpass
> > nvdimms to MemoryMode and reboot, those regions of memory automatically
> > become two-level with DDR as the front cache, so hmat_init() is
> > expected to create the memory_side_cache/indexN interface, and if I
> > see multiple indexN layers, that would be a sign that pmem in
> > MemoryMode is present, right?
> >
> > I've yet to grab hold of a system to confirm this, but apparently with 
> > only DDR memory, memory_side_cache/ doesn't exist.
> 
> On each CPU socket node, we have
> 
> | |-memory_side_cache
> | | |-uevent
> | | |-power
> | | |-index1
> | | | |-uevent
> | | | |-power
> | | | |-line_size
> | | | |-write_policy
> | | | |-size
> | | | |-indexing
> 
> where 'indexing' = 0 means a direct-mapped cache?  So is that a clue that
> slower/far-memory is behind the cache?

Correct.

Note that the ACPI HMAT may also populate data about the performance of
the memory range on a cache miss (see ACPI 6.4 Table 5.129: System
Locality Latency and Bandwidth Information Structure), but the Linux
enabling does not export that information.
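A hypothetical userland check along the lines confirmed above could be
sketched as follows (assumes the documented numaperf sysfs layout; the
function names and the sysfs_root parameter are illustrative only, the
latter so the sketch can be exercised against a fake tree):

```python
import os

def memory_side_cache_indexing(node, sysfs_root="/sys/devices/system/node"):
    """Return {indexN: indexing value} for a node's memory-side cache
    levels, or an empty dict when no memory_side_cache/ directory
    exists (e.g. a DDR-only system)."""
    cache_dir = os.path.join(sysfs_root, "node%d" % node, "memory_side_cache")
    result = {}
    if os.path.isdir(cache_dir):
        for entry in sorted(os.listdir(cache_dir)):
            attr = os.path.join(cache_dir, entry, "indexing")
            if entry.startswith("index") and os.path.isfile(attr):
                with open(attr) as f:
                    result[entry] = int(f.read())
    return result

def looks_like_far_memory(node, sysfs_root="/sys/devices/system/node"):
    # Any memory-side cache level at all implies firmware (via the ACPI
    # HMAT) described near memory fronting this node's capacity;
    # indexing == 0 additionally indicates a direct-mapped cache.
    return bool(memory_side_cache_indexing(node, sysfs_root))
```
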



end of thread, other threads:[~2024-03-08  2:53 UTC | newest]

Thread overview: 7+ messages
2024-03-07 20:33 Barlopass nvdimm as MemoryMode question Jane Chu
2024-03-07 21:05 ` Dan Williams
2024-03-08  0:30   ` Jane Chu
2024-03-08  0:49     ` Dan Williams
2024-03-08  1:42       ` Jane Chu
2024-03-08  1:57         ` Jane Chu
2024-03-08  2:53           ` Dan Williams
