* Barlopass nvdimm as MemoryMode question
@ 2024-03-07 20:33 Jane Chu
  2024-03-07 21:05 ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-07 20:33 UTC
To: dan.j.williams, vishal.l.verma, nvdimm; +Cc: Linux-MM

Hi, Dan and Vishal,

What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?

Thanks!
-jane
* Re: Barlopass nvdimm as MemoryMode question
  2024-03-07 20:33 Barlopass nvdimm as MemoryMode question Jane Chu
@ 2024-03-07 21:05 ` Dan Williams
  2024-03-08  0:30   ` Jane Chu
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-03-07 21:05 UTC
To: Jane Chu, dan.j.williams, vishal.l.verma, nvdimm; +Cc: Linux-MM

Jane Chu wrote:
> Hi, Dan and Vishal,
>
> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?

As always, the NUMA description is a property of the platform, not the
media type / DIMM. The ACPI HMAT describes the details of
memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
6.4.
* Re: Barlopass nvdimm as MemoryMode question
  2024-03-07 21:05 ` Dan Williams
@ 2024-03-08  0:30   ` Jane Chu
  2024-03-08  0:49     ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08  0:30 UTC
To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins

Add Joao.

On 3/7/2024 1:05 PM, Dan Williams wrote:
> Jane Chu wrote:
>> Hi, Dan and Vishal,
>>
>> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
> As always, the NUMA description is a property of the platform, not the
> media type / DIMM. The ACPI HMAT describes the details of
> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> 6.4.

Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
converted pmem/SysRAM region, w.r.t. pmem in MemoryMode, is there any
clue that the kernel exposes (or could expose) to userland about the
extra latency, such that userland may treat these memory regions
differently?

thanks,
-jane
* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  0:30   ` Jane Chu
@ 2024-03-08  0:49     ` Dan Williams
  2024-03-08  1:42       ` Jane Chu
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-03-08  0:49 UTC
To: Jane Chu, Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins

Jane Chu wrote:
> Add Joao.
>
> On 3/7/2024 1:05 PM, Dan Williams wrote:
>
> > Jane Chu wrote:
> >> Hi, Dan and Vishal,
> >>
> >> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
> >> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
> > As always, the NUMA description is a property of the platform, not the
> > media type / DIMM. The ACPI HMAT describes the details of
> > memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> > 6.4.
>
> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
> converted pmem/SysRAM region,

...to be clear, dax_kmem is not creating a new NUMA node, it is just
potentially onlining a proximity domain that was fully described by ACPI
SRAT but offline.

> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> could expose) to userland about the extra latency, such that userland
> may treat these memory regions differently?

Userland should be able to interrogate the memory_side_cache/ property
in NUMA sysfs:

https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache

Otherwise I believe SRAT and SLIT for that node only reflect the
performance of the DDR fronting the PMEM. So if you have a DDR node and
a DDR+PMEM cache node, they may look the same from the ACPI SLIT
perspective, but the ACPI HMAT contains the details of the backing
memory. The Linux NUMA performance sysfs interface gets populated by
ACPI HMAT.
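For readers following along, the suggestion to interrogate memory_side_cache/ from userland can be sketched roughly as below. This is a minimal sketch, not anything posted in the thread: the helper name read_memory_side_caches and its node_root parameter are made up for illustration, and it assumes the layout shown in the kernel's numaperf documentation (nodeN/memory_side_cache/indexM/ with the attributes size, line_size, indexing and write_policy, each holding a decimal integer).

```python
from pathlib import Path


def read_memory_side_caches(node_root="/sys/devices/system/node"):
    """Collect memory-side cache attributes per NUMA node.

    Hypothetical helper: returns {node_name: {index_name: {attr: int}}}
    and only includes nodes that have a memory_side_cache/ directory.
    """
    attrs = ("size", "line_size", "indexing", "write_policy")
    result = {}
    for node_dir in sorted(Path(node_root).glob("node[0-9]*")):
        cache_dir = node_dir / "memory_side_cache"
        if not cache_dir.is_dir():
            continue  # no memory-side cache was described for this node
        levels = {}
        for index_dir in sorted(cache_dir.glob("index[0-9]*")):
            values = {}
            for attr in attrs:
                attr_file = index_dir / attr
                if attr_file.is_file():
                    # Assumption: each attribute file holds one decimal integer.
                    values[attr] = int(attr_file.read_text().strip())
            levels[index_dir.name] = values
        result[node_dir.name] = levels
    return result


if __name__ == "__main__":
    for node, levels in read_memory_side_caches().items():
        print(node, levels)
```

On a DDR-only system the result would presumably be empty, since no node carries a memory_side_cache/ directory there. Taking the sysfs root as a parameter keeps the helper testable against a fake directory tree.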
* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  0:49     ` Dan Williams
@ 2024-03-08  1:42       ` Jane Chu
  2024-03-08  1:57         ` Jane Chu
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08  1:42 UTC
To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins, Jane Chu

On 3/7/2024 4:49 PM, Dan Williams wrote:
> Jane Chu wrote:
>> Add Joao.
>>
>> On 3/7/2024 1:05 PM, Dan Williams wrote:
>>
>>> Jane Chu wrote:
>>>> Hi, Dan and Vishal,
>>>>
>>>> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
>>>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
>>> As always, the NUMA description is a property of the platform, not the
>>> media type / DIMM. The ACPI HMAT describes the details of
>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
>>> 6.4.
>> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
>> converted pmem/SysRAM region,
> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> potentially onlining a proximity domain that was fully described by ACPI
> SRAT but offline.
>
>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
>> could expose) to userland about the extra latency, such that userland
>> may treat these memory regions differently?
> Userland should be able to interrogate the memory_side_cache/ property
> in NUMA sysfs:
>
> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
>
> Otherwise I believe SRAT and SLIT for that node only reflect the
> performance of the DDR fronting the PMEM. So if you have a DDR node and
> a DDR+PMEM cache node, they may look the same from the ACPI SLIT
> perspective, but the ACPI HMAT contains the details of the backing
> memory. The Linux NUMA performance sysfs interface gets populated by
> ACPI HMAT.

Thanks Dan.

Please correct me if I'm mistaken: if I configure some Barlow Pass
nvdimms to MemoryMode and reboot, those regions of memory are
automatically two-level with DDR as the front cache, so hmat_init() is
expected to create the memory_side_cache/indexN interface, and if I see
multiple indexN layers, that would be a sign that pmem in MemoryMode is
present, right?

I've yet to grab hold of a system to confirm this, but apparently with
only DDR memory, memory_side_cache/ doesn't exist.

thanks!
-jane
* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  1:42       ` Jane Chu
@ 2024-03-08  1:57         ` Jane Chu
  2024-03-08  2:53           ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jane Chu @ 2024-03-08  1:57 UTC
To: Dan Williams, vishal.l.verma, nvdimm; +Cc: Linux-MM, Joao Martins, Jane Chu

On 3/7/2024 5:42 PM, Jane Chu wrote:
> On 3/7/2024 4:49 PM, Dan Williams wrote:
>
>> Jane Chu wrote:
>>> Add Joao.
>>>
>>> On 3/7/2024 1:05 PM, Dan Williams wrote:
>>>
>>>> Jane Chu wrote:
>>>>> Hi, Dan and Vishal,
>>>>>
>>>>> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
>>>>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
>>>> As always, the NUMA description is a property of the platform, not the
>>>> media type / DIMM. The ACPI HMAT describes the details of
>>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
>>>> 6.4.
>>> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
>>> converted pmem/SysRAM region,
>> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
>> potentially onlining a proximity domain that was fully described by ACPI
>> SRAT but offline.
>>
>>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
>>> could expose) to userland about the extra latency, such that userland
>>> may treat these memory regions differently?
>> Userland should be able to interrogate the memory_side_cache/ property
>> in NUMA sysfs:
>>
>> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
>>
>> Otherwise I believe SRAT and SLIT for that node only reflect the
>> performance of the DDR fronting the PMEM. So if you have a DDR node and
>> a DDR+PMEM cache node, they may look the same from the ACPI SLIT
>> perspective, but the ACPI HMAT contains the details of the backing
>> memory. The Linux NUMA performance sysfs interface gets populated by
>> ACPI HMAT.
>
> Thanks Dan.
>
> Please correct me if I'm mistaken: if I configure some Barlow Pass
> nvdimms to MemoryMode and reboot, those regions of memory are
> automatically two-level with DDR as the front cache, so hmat_init() is
> expected to create the memory_side_cache/indexN interface, and if I
> see multiple indexN layers, that would be a sign that pmem in
> MemoryMode is present, right?
>
> I've yet to grab hold of a system to confirm this, but apparently with
> only DDR memory, memory_side_cache/ doesn't exist.

On each CPU socket node, we have

| |-memory_side_cache
| | |-uevent
| | |-power
| | |-index1
| | | |-uevent
| | | |-power
| | | |-line_size
| | | |-write_policy
| | | |-size
| | | |-indexing

where 'indexing' = 0 means a direct-mapped cache? So is that a clue that
slower/far memory is behind the cache?

thanks!
-jane

>
> thanks!
>
> -jane
>
* Re: Barlopass nvdimm as MemoryMode question
  2024-03-08  1:57         ` Jane Chu
@ 2024-03-08  2:53           ` Dan Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2024-03-08  2:53 UTC
To: Jane Chu, Dan Williams, vishal.l.verma, nvdimm
Cc: Linux-MM, Joao Martins, Jane Chu

Jane Chu wrote:
> On 3/7/2024 5:42 PM, Jane Chu wrote:
>
> > On 3/7/2024 4:49 PM, Dan Williams wrote:
> >
> >> Jane Chu wrote:
> >>> Add Joao.
> >>>
> >>> On 3/7/2024 1:05 PM, Dan Williams wrote:
> >>>
> >>>> Jane Chu wrote:
> >>>>> Hi, Dan and Vishal,
> >>>>>
> >>>>> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
> >>>>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
> >>>> As always, the NUMA description is a property of the platform, not the
> >>>> media type / DIMM. The ACPI HMAT describes the details of
> >>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> >>>> 6.4.
> >>> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
> >>> converted pmem/SysRAM region,
> >> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> >> potentially onlining a proximity domain that was fully described by ACPI
> >> SRAT but offline.
> >>
> >>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> >>> could expose) to userland about the extra latency, such that userland
> >>> may treat these memory regions differently?
> >> Userland should be able to interrogate the memory_side_cache/ property
> >> in NUMA sysfs:
> >>
> >> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
> >>
> >> Otherwise I believe SRAT and SLIT for that node only reflect the
> >> performance of the DDR fronting the PMEM. So if you have a DDR node and
> >> a DDR+PMEM cache node, they may look the same from the ACPI SLIT
> >> perspective, but the ACPI HMAT contains the details of the backing
> >> memory. The Linux NUMA performance sysfs interface gets populated by
> >> ACPI HMAT.
> >
> > Thanks Dan.
> >
> > Please correct me if I'm mistaken: if I configure some Barlow Pass
> > nvdimms to MemoryMode and reboot, those regions of memory are
> > automatically two-level with DDR as the front cache, so hmat_init() is
> > expected to create the memory_side_cache/indexN interface, and if I
> > see multiple indexN layers, that would be a sign that pmem in
> > MemoryMode is present, right?
> >
> > I've yet to grab hold of a system to confirm this, but apparently with
> > only DDR memory, memory_side_cache/ doesn't exist.
>
> On each CPU socket node, we have
>
> | |-memory_side_cache
> | | |-uevent
> | | |-power
> | | |-index1
> | | | |-uevent
> | | | |-power
> | | | |-line_size
> | | | |-write_policy
> | | | |-size
> | | | |-indexing
>
> where 'indexing' = 0 means a direct-mapped cache? So is that a clue that
> slower/far memory is behind the cache?

Correct. Note that the ACPI HMAT may also populate data about the
performance of the memory range on a cache miss (see ACPI 6.4 Table
5.129: System Locality Latency and Bandwidth Information Structure), but
the Linux enabling does not export that information.
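As an illustration of how the index1 attributes discussed above might be turned into something readable, here is a small sketch. The function name describe_cache_level is hypothetical. The decoding assumes the convention in the kernel's numaperf documentation, where indexing is 0 for a direct-mapped cache and write_policy is 0 for write-back, with non-zero values meaning another indexing scheme and write-through respectively. It cannot say what the backing memory is or how slow a cache miss is, since, as noted above, Linux does not export that part of the HMAT.

```python
def describe_cache_level(attrs):
    """Render one memory_side_cache/indexN attribute set as text.

    `attrs` is a dict of integers with the keys size, line_size,
    indexing and write_policy (hypothetical input shape for this sketch).
    """
    # Assumption from the numaperf documentation: 0 means direct-mapped,
    # any non-zero value means some other indexing scheme.
    if attrs["indexing"] == 0:
        indexing = "direct-mapped"
    else:
        indexing = "other indexing"
    # Assumption from the same document: 0 is write-back, non-zero write-through.
    policy = "write-back" if attrs["write_policy"] == 0 else "write-through"
    size_gib = attrs["size"] / 2**30
    return f"{size_gib:.1f} GiB, {attrs['line_size']}-byte lines, {indexing}, {policy}"


if __name__ == "__main__":
    # Made-up values for illustration, not measurements from a real system.
    example = {"size": 32 * 2**30, "line_size": 64, "indexing": 0, "write_policy": 0}
    print(describe_cache_level(example))
```

A node for which such a description can be produced at all has a memory-side cache in front of it, which is the hint that farther memory sits behind the DDR. The values themselves only describe the cache, not the memory it fronts.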