From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>,
Derek Kiernan <derek.kiernan@amd.com>,
Dragan Cvetic <dragan.cvetic@amd.com>,
Arnd Bergmann <arnd@arndb.de>, Wei Liu <wei.liu@kernel.org>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Madhavan Venkataraman <madvenka@linux.microsoft.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
"Mike Rapoport (IBM)" <rppt@kernel.org>,
James Gowans <jgowans@amazon.com>,
Anirudh Rayabharam <anrayabh@linux.microsoft.com>,
Jinank Jain <jinankjain@linux.microsoft.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] Introduce persistent memory pool
Date: Tue, 22 Aug 2023 18:36:10 -0700 [thread overview]
Message-ID: <20230823013610.GA25162@skinsburskii.> (raw)
In-Reply-To: <2023082506-enchanted-tripping-d1d5@gregkh>
On Fri, Aug 25, 2023 at 10:05:08AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote:
> > This patch addresses the need for a memory allocator dedicated to
> > persistent memory within the kernel. This allocator will preserve
> > kernel-specific states like DMA passthrough device states, IOMMU state, and
> > more across kexec.
>
> And CMA doesn't do this for you today?
>
Unfortunatelly no, as it doesn't preserve the bitmap (and thus the
actual pages state) across kexec.
But perhaps extending cma or building something else on top of it is the
right way forward here.
I'll look into it, thank you for the pointer.
> >acrros The proposed solution offers a foundational implementation for potential
> > custom solutions that might follow. Though the implementation is
> > intentionally kept concise and straightforward to foster discussion and
> > feedback, it's fully functional in its current state.
> >
> > The persistent memory pool consists of a simple page allocator that
> > operates on reserved physical memory regions. It employs bitmaps to
> > allocate and free pages, with the following characteristics:
> >
> > 1. Memory pool regions are specified using the command line option:
> >
> > pmpool=<base>,<size>
> >
> > Where <base> represents the physical memory base address and <size>
> > indicates the memory size.
> >
> > 2. While the pages allocation emulates the buddy allocator interface, it
> > isn't confined to the MAX_ORDER.
> >
> > 3. The memory pool initializes during a cold boot and is retained across
> > kexec.
> >
> > Potential applications include:
> >
> > 1. Allowing various in-kernel entities to allocate persistent pages from
> > a singular memory pool, eliminating the need for multiple region
> > reservations.
> >
> > 2. For in-kernel components that require the allocation address to be
> > available on kernel kexec, this address can be exposed to user space and
> > then passed via the command line.
> >
> > 3. Separate subsystems or drivers can reserve their region, sharing a
> > portion of it for their persistent memory pool. This might be a file
> > system, a key-value store, or other applications.
> >
> > Potential Enhancements for the Proposed Memory Pool:
> >
> > 1. Multiple Memory Regions Support: enhance the pool to accommodate and
> > manage multiple memory regions simultaneously.
> >
> > 2. In-Kernel Memory Allocations for Storage Memory Regions:
> > * Allow in-kernel memory allocations to serve as storage memory regions.
> > * Implement explicit reservation of these designated regions during kexec.
>
> As you have no in-kernel users of this, it's not something we can even
> consider at the moment for obvious reasons (neither would you want us
> to.)
>
This is fair.
I didn't have any anticipation for this code to get merged.
My hope is rather gathering any feedback I can get.
> Can you make this part of a patch series that actually adds a user,
> probably more than one, so that we can see if any of this even makes
> sense?
>
Will do.
>
> > drivers/misc/Kconfig | 7 +
> > drivers/misc/Makefile | 1
> > drivers/misc/pmpool.c | 270 ++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/pmpool.h | 20 ++++
> > 4 files changed, 298 insertions(+)
> > create mode 100644 drivers/misc/pmpool.c
> > create mode 100644 include/linux/pmpool.h
>
> misc is not for memory pools, as this is not a driver. please put this
> in the properly location instead of trying to hide it from the mm
> maintainers and subsystem :)
>
This is indeed a misplacement. I had the completely oposite intention as
I do know that this is a common area for multiple contributors. That's
actually why I carefully crafted the recepients list to make sure all
pkram-involved people are there.
I added the mm list to the CC.
> > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > index cadd4a820c03..c8ef5b37ee98 100644
> > --- a/drivers/misc/Kconfig
> > +++ b/drivers/misc/Kconfig
> > @@ -562,6 +562,13 @@ config TPS6594_PFSM
> > This driver can also be built as a module. If so, the module
> > will be called tps6594-pfsm.
> >
> > +config PMPOOL
> > + bool "Persistent memory pool support"
> > + help
> > + This option adds support for a persistent memory pool feature, which
> > + provides pages allocation and freeing from a set of persistent memory ranges,
> > + deposited to the memory pool.
>
> Why would this even be a user selectable option? Either the kernel
> needs this or it doesn't.
>
I don't see a default kernel need per se. Kernel drivers or file systems
may or may not use such a pool to store their state there. Therefore I
made it optional.
>
> > +
> > source "drivers/misc/c2port/Kconfig"
> > source "drivers/misc/eeprom/Kconfig"
> > source "drivers/misc/cb710/Kconfig"
> > diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> > index f2a4d1ff65d4..31dd6553057d 100644
> > --- a/drivers/misc/Makefile
> > +++ b/drivers/misc/Makefile
> > @@ -67,3 +67,4 @@ obj-$(CONFIG_TMR_MANAGER) += xilinx_tmr_manager.o
> > obj-$(CONFIG_TMR_INJECT) += xilinx_tmr_inject.o
> > obj-$(CONFIG_TPS6594_ESM) += tps6594-esm.o
> > obj-$(CONFIG_TPS6594_PFSM) += tps6594-pfsm.o
> > +obj-$(CONFIG_PMPOOL) += pmpool.o
> > diff --git a/drivers/misc/pmpool.c b/drivers/misc/pmpool.c
> > new file mode 100644
> > index 000000000000..e2c923b31b36
> > --- /dev/null
> > +++ b/drivers/misc/pmpool.c
> > @@ -0,0 +1,270 @@
> > +#include <linux/io.h>
>
> You forgot basic copyright/license stuff, did you use checkpatch.pl on
> your file?
>
Yeah, I used checkpatch.pl. And I deliberately omitted the license
stuff as I wanted the shorten the diff with intention to have it as an
RFC only.
I'll fix it next time.
> > +#include <linux/bitmap.h>
> > +#include <linux/memblock.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +
> > +#include <linux/pmpool.h>
> > +
> > +#define VERSION 1
>
> In kernel code does not need versions.
>
Could you elaborate on this? Should kernel version be used as a backward
compatitbility marker instead?
> > +#define MAX_REGIONS 14
>
> Why 14? Why not 24? Or something else?
>
That's the number of bitmaps for 8 MB memory ranges (default MAX_ORDER
allocations) fitting into the 4k page header after the pmempool header.
But yeah, it can be definitely something else.
> > +
> > +#define for_each_region(pmpool, r) \
> > + for (r = pmpool->hdr->regions; \
> > + r - pmpool->hdr->regions < pmpool->hdr->nr_regions; \
> > + r++)
> > +
> > +#define for_each_used_region(pmpool, r) \
> > + for_each_region((pmpool), (r)) \
> > + if (!(r)->size_in_pfns) { continue; } else
> > +
> > +struct pmpool_region {
> > + uint32_t base_pfn; /* 32 bits * 4k = up it 15TB of physical address space */
>
> Please use proper kernel types when writing kernel code (i.e. u32, u8,
> and so on.) uint*_t is for userspace code only.
>
Will do.
> > --- /dev/null
> > +++ b/include/linux/pmpool.h
> > @@ -0,0 +1,20 @@
> > +#ifndef _PMPOOL_H
> > +#define _PMPOOL_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/spinlock.h>
> > +
> > +void *alloc_pm_pages(unsigned int order);
> > +void free_pm_pages(void *addr, unsigned int order);
> > +
> > +struct pmpool {
> > + spinlock_t lock;
> > + struct pmpool_header *hdr;
> > +};
> > +
> > +int pmpool_init(struct pmpool *pmpool, phys_addr_t base, phys_addr_t size);
> > +
> > +void *alloc_pmpool_pages(struct pmpool *pmpool, unsigned int order);
> > +void free_pmpool_pages(struct pmpool *pmpool, void *addr, unsigned int order);
>
> Please use "noun_verb_*" for new apis so that we have a chance at
> understanding where the calls are living.
>
Will do.
> good luck!
>
Thank you, Greg.
Stanislav
> greg k-h
next parent reply other threads:[~2023-08-25 18:45 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <64e7cbf7.050a0220.114c7.b70dSMTPIN_ADDED_BROKEN@mx.google.com>
[not found] ` <2023082506-enchanted-tripping-d1d5@gregkh>
2023-08-23 1:36 ` Stanislav Kinsburskii [this message]
[not found] ` <c26ad989dcc6737dd295e980c78ef53740098810.camel@amazon.com>
2023-08-23 2:45 ` Stanislav Kinsburskii
2023-08-28 20:50 ` Alexander Graf
2023-08-29 22:07 ` Stanislav Kinsburskii
2023-08-30 7:20 ` Alexander Graf
2023-08-30 23:39 ` Arnd Bergmann
2023-08-31 2:24 ` Stanislav Kinsburskii
[not found] ` <64e8f6dd.050a0220.edb3c.c045SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 7:45 ` Greg Kroah-Hartman
2023-08-23 6:15 ` Stanislav Kinsburskii
[not found] ` <64ea25cd.650a0220.642cc.50e6SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 17:02 ` Greg Kroah-Hartman
2023-08-23 6:21 ` Stanislav Kinsburskii
[not found] ` <64ea3699.170a0220.13ee0.5c3aSMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 20:04 ` Greg Kroah-Hartman
2023-08-31 14:18 ` Paolo Bonzini
2023-08-31 2:37 ` Stanislav Kinsburskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230823013610.GA25162@skinsburskii. \
--to=skinsburskii@linux.microsoft.com \
--cc=akpm@linux-foundation.org \
--cc=anrayabh@linux.microsoft.com \
--cc=anthony.yznaga@oracle.com \
--cc=arnd@arndb.de \
--cc=derek.kiernan@amd.com \
--cc=dragan.cvetic@amd.com \
--cc=gregkh@linuxfoundation.org \
--cc=jgowans@amazon.com \
--cc=jinankjain@linux.microsoft.com \
--cc=kys@microsoft.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=madvenka@linux.microsoft.com \
--cc=rppt@kernel.org \
--cc=stanislav.kinsburskii@gmail.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox