From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "david@redhat.com" <david@redhat.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"rafael@kernel.org" <rafael@kernel.org>,
"osalvador@suse.de" <osalvador@suse.de>,
"aneesh.kumar@linux.ibm.com" <aneesh.kumar@linux.ibm.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"lenb@kernel.org" <lenb@kernel.org>,
"Jiang, Dave" <dave.jiang@intel.com>
Cc: "Huang, Ying" <ying.huang@intel.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: Re: [PATCH 3/3] dax/kmem: Always enroll hotplugged memory for memmap_on_memory
Date: Thu, 13 Jul 2023 06:45:20 +0000
Message-ID: <5a8e9b1b6c8d6d9e5405ca35abb9be3ed09761c3.camel@intel.com>
In-Reply-To: <1df12885-9ae4-6aef-1a31-91ecd5a18d24@redhat.com>
On Tue, 2023-07-11 at 17:21 +0200, David Hildenbrand wrote:
> On 11.07.23 16:30, Aneesh Kumar K.V wrote:
> > David Hildenbrand <david@redhat.com> writes:
> > >
> > > Maybe the better alternative is to teach
> > > add_memory_resource()/try_remove_memory() to do that internally.
> > >
> > > In the add_memory_resource() case, it might be a loop around that
> > > memmap_on_memory + arch_add_memory code path (well, and the error path
> > > also needs adjustment):
> > >
> > > 	/*
> > > 	 * Self hosted memmap array
> > > 	 */
> > > 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
> > > 		if (!mhp_supports_memmap_on_memory(size)) {
> > > 			ret = -EINVAL;
> > > 			goto error;
> > > 		}
> > > 		mhp_altmap.free = PHYS_PFN(size);
> > > 		mhp_altmap.base_pfn = PHYS_PFN(start);
> > > 		params.altmap = &mhp_altmap;
> > > 	}
> > >
> > > 	/* call arch's memory hotadd */
> > > 	ret = arch_add_memory(nid, start, size, &params);
> > > 	if (ret < 0)
> > > 		goto error;
> > >
> > >
> > > Note that we want to handle that on a per memory-block basis, because we
> > > don't want the vmemmap of memory block #2 to end up on memory block #1.
> > > It all gets messy with memory onlining/offlining etc otherwise ...
> > >
> >
> > I tried to implement this inside add_memory_driver_managed() and also
> > within dax/kmem. IMHO doing the error handling inside dax/kmem is
> > better. Here is how it looks:
> >
> > 1. If any blocks got added before (mapped > 0) we loop through all successful request_mem_regions
> > 2. For each successful request_mem_regions if any blocks got added, we
> > keep the resource. If none got added, we will kfree the resource
> >
>
> Doing this unconditional splitting outside of
> add_memory_driver_managed() is undesirable for at least two reasons:
>
> 1) You end up always creating individual entries in the resource tree
> (/proc/iomem) even if MHP_MEMMAP_ON_MEMORY is not effective.
> 2) As we call arch_add_memory() in memory block granularity (e.g., 128
> MiB on x86), we might not make use of large PUDs (e.g., 1 GiB) in the
> identity mapping -- even if MHP_MEMMAP_ON_MEMORY is not effective.
>
> While you could sense for support and do the split based on that, it
> will be beneficial for other users (especially DIMMs) if we do that
> internally -- where we already know if MHP_MEMMAP_ON_MEMORY can be
> effective or not.
I'm taking a shot at implementing the splitting internally in
memory_hotplug.c. The caller (kmem) side does become trivial with this
approach, but there's a slight complication if I don't have the module
param override (patch 1 of this series).
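For reference, here is a minimal userspace sketch of the per-memory-block split being discussed, assuming a 128 MiB memory block size and 4 KiB pages. The names altmap_model and split_into_blocks are made up for illustration; this is a model of the idea, not the memory_hotplug.c implementation:

```c
#include <stdint.h>

#define PAGE_SHIFT_MODEL   12              /* assumed: 4 KiB pages */
#define BLOCK_BYTES_MODEL  (128ULL << 20)  /* assumed: 128 MiB memory blocks, as on x86 */

/* Simplified stand-in for the two struct vmem_altmap fields used in the thread. */
struct altmap_model {
	uint64_t base_pfn; /* first pfn of the block that hosts its own memmap */
	uint64_t free;     /* number of pages in that block */
};

/*
 * Hypothetical helper: split [start, start + size) into memory-block-sized
 * chunks and give each chunk its own altmap, so that the vmemmap of block
 * #2 never ends up on block #1.
 * Returns the number of blocks written to out[], or -1 if the range is
 * empty, not block-aligned, or does not fit in max entries.
 */
static int split_into_blocks(uint64_t start, uint64_t size,
			     struct altmap_model *out, int max)
{
	int n = 0;

	if (size == 0 || start % BLOCK_BYTES_MODEL || size % BLOCK_BYTES_MODEL)
		return -1;

	for (uint64_t off = 0; off < size; off += BLOCK_BYTES_MODEL) {
		if (n == max)
			return -1;
		out[n].base_pfn = (start + off) >> PAGE_SHIFT_MODEL;
		out[n].free = BLOCK_BYTES_MODEL >> PAGE_SHIFT_MODEL;
		n++;
	}
	return n;
}
```

In the real code each iteration would presumably wrap the altmap setup and the arch_add_memory() call quoted above, with the error path unwinding any blocks already added.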
The kmem diff now looks like:
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 898ca9505754..8be932f63f90 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -105,6 +105,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	data->mgid = rc;
 	for (i = 0; i < dev_dax->nr_range; i++) {
+		mhp_t mhp_flags = MHP_NID_IS_MGID | MHP_MEMMAP_ON_MEMORY |
+				  MHP_SPLIT_MEMBLOCKS;
 		struct resource *res;
 		struct range range;
@@ -141,7 +143,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 		 * this as RAM automatically.
 		 */
 		rc = add_memory_driver_managed(data->mgid, range.start,
-				range_len(&range), kmem_name, MHP_NID_IS_MGID);
+				range_len(&range), kmem_name, mhp_flags);
 		if (rc) {
 			dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
However, this begins to fail if the memmap_on_memory modparam is not
set, because add_memory_driver_managed() returns -EINVAL from the
mhp_supports_memmap_on_memory() check.
The way to work around this would probably include doing the
mhp_supports_memmap_on_memory() check in kmem, in a loop to check for
each memblock sized chunk, and that feels like a leak of the
implementation details into the caller.
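To make the concern concrete, here is a rough userspace model of that caller-side workaround. Everything in it is an assumption for illustration: the simplified support check (param enabled, size equal to one 128 MiB block, memmap size a multiple of a 2 MiB PMD, with 64-byte struct page and 4 KiB pages), the flag value, and the helper names. It is not the kernel's actual check or the kmem code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES_MODEL        4096ULL         /* assumed page size */
#define STRUCT_PAGE_BYTES_MODEL 64ULL           /* assumed sizeof(struct page) */
#define PMD_BYTES_MODEL         (2ULL << 20)    /* assumed PMD size */
#define MEMBLOCK_BYTES_MODEL    (128ULL << 20)  /* assumed memory block size */
#define FLAG_MEMMAP_ON_MEMORY   0x1u            /* stands in for MHP_MEMMAP_ON_MEMORY */

/* Simplified model of mhp_supports_memmap_on_memory(); conditions are assumptions. */
static bool supports_memmap_on_memory_model(bool param_enabled, uint64_t size)
{
	uint64_t vmemmap_bytes = (size / PAGE_BYTES_MODEL) * STRUCT_PAGE_BYTES_MODEL;

	return param_enabled && size == MEMBLOCK_BYTES_MODEL &&
	       vmemmap_bytes % PMD_BYTES_MODEL == 0;
}

/* Models the add path: asking for memmap_on_memory when unsupported is -EINVAL. */
static int add_memory_model(bool param_enabled, uint64_t size, unsigned int flags)
{
	if ((flags & FLAG_MEMMAP_ON_MEMORY) &&
	    !supports_memmap_on_memory_model(param_enabled, size))
		return -EINVAL;
	return 0;
}

/*
 * The workaround: the caller probes support for every memblock-sized chunk
 * and only then sets the flag. Note how block size and the support check
 * both have to be known here, outside of the hotplug core.
 */
static int add_range_per_block_model(bool param_enabled, uint64_t size,
				     unsigned int *blocks_with_altmap)
{
	*blocks_with_altmap = 0;
	if (size == 0 || size % MEMBLOCK_BYTES_MODEL)
		return -EINVAL;

	for (uint64_t off = 0; off < size; off += MEMBLOCK_BYTES_MODEL) {
		unsigned int flags = 0;
		int rc;

		if (supports_memmap_on_memory_model(param_enabled, MEMBLOCK_BYTES_MODEL))
			flags |= FLAG_MEMMAP_ON_MEMORY;
		rc = add_memory_model(param_enabled, MEMBLOCK_BYTES_MODEL, flags);
		if (rc)
			return rc;
		if (flags & FLAG_MEMMAP_ON_MEMORY)
			(*blocks_with_altmap)++;
	}
	return 0;
}
```

With the param disabled, a direct request with the flag set fails in this model, while the per-block loop succeeds by silently falling back to no altmap.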
Any suggestions on how to go about this?
>
> In general, we avoid placing important kernel data-structures on slow
> memory. That's one of the reasons why PMEM decided to mostly always use
> ZONE_MOVABLE such that exactly what this patch does would not happen. So
> I'm wondering if there would be demand for an additional toggle.
>
> Because even with memmap_on_memory enabled in general, you might not
> want to do that for dax/kmem.
>
> IMHO, this patch should be dropped from your ppc64 series, as it's an
> independent change that might be valuable for other architectures as well.
>
Sure thing, I can pick this back up and Aneesh can drop this from his set.