* [PATCH 1/2] mm: vmalloc: Improve description of vmap node layer
@ 2024-01-24 18:09 Uladzislau Rezki (Sony)
2024-01-24 18:09 ` [PATCH 2/2] mm: vmalloc: Refactor vmalloc_dump_obj() function Uladzislau Rezki (Sony)
2024-01-30 18:44 ` [PATCH 1/2] mm: vmalloc: Improve description of vmap node layer Lorenzo Stoakes
From: Uladzislau Rezki (Sony) @ 2024-01-24 18:09 UTC (permalink / raw)
To: linux-mm, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
Matthew Wilcox, Dave Chinner, Uladzislau Rezki,
Oleksiy Avramchenko
This patch adds extra explanation of the recently added vmap
node layer, based on community feedback. No functional change.
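One of the new comments describes how find_vmap_area() may need to
scan neighbouring nodes when a VA spans more than one zone. Purely as
a rough, standalone illustration of that wrap-around scan (made-up
helper names and zone size, a user-space sketch rather than the kernel
code, which walks each node's busy tree under its lock instead of the
plain range check used here):

  #include <stdio.h>

  #define NR_NODES  4
  #define ZONE_SIZE 0x1000UL

  /* One fake "busy" range per node, standing in for the busy tree. */
  struct zone_node {
          unsigned long start, end;
  };

  static struct zone_node nodes[NR_NODES];

  static int addr_to_node_id(unsigned long addr)
  {
          /* Nodes own interleaved, fixed-size zones of the address space. */
          return (addr / ZONE_SIZE) % NR_NODES;
  }

  static int find_node_of(unsigned long addr)
  {
          int i, j;

          /* Start at the "home" node and wrap around until back at it. */
          i = j = addr_to_node_id(addr);
          do {
                  if (addr >= nodes[i].start && addr < nodes[i].end)
                          return i;       /* the VA is registered here */
          } while ((i = (i + 1) % NR_NODES) != j);

          return -1;                      /* not found in any node */
  }

  int main(void)
  {
          /* A VA starting in node 1's zone but spanning into node 2's. */
          nodes[1].start = 0x1800;
          nodes[1].end   = 0x2800;

          printf("0x2000 is owned by node %d\n", find_node_of(0x2000));
          return 0;
  }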
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
mm/vmalloc.c | 60 ++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 46 insertions(+), 14 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 257981e37936..b8be601b056d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -765,9 +765,10 @@ static struct rb_root free_vmap_area_root = RB_ROOT;
static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node);
/*
- * An effective vmap-node logic. Users make use of nodes instead
- * of a global heap. It allows to balance an access and mitigate
- * contention.
+ * This structure defines a single, unified model where a list and
+ * an rb-tree are part of one entity protected by the lock. Nodes are
+ * sorted in ascending order; the list is used for O(1) access to
+ * left/right neighbors as well as for sequential traversal.
*/
struct rb_list {
struct rb_root root;
@@ -775,16 +776,23 @@ struct rb_list {
spinlock_t lock;
};
+/*
+ * A fast size storage contains VAs up to 1M size. A pool consists
+ * of ready-to-go VAs of certain sizes which are linked together.
+ * An index in the pool-array corresponds to the number of pages + 1.
+ */
+#define MAX_VA_SIZE_PAGES 256
+
struct vmap_pool {
struct list_head head;
unsigned long len;
};
/*
- * A fast size storage contains VAs up to 1M size.
+ * An effective vmap-node logic. Users make use of nodes instead
+ * of a global heap. It allows to balance an access and mitigate
+ * contention.
*/
-#define MAX_VA_SIZE_PAGES 256
-
static struct vmap_node {
/* Simple size segregated storage. */
struct vmap_pool pool[MAX_VA_SIZE_PAGES];
@@ -803,6 +811,11 @@ static struct vmap_node {
unsigned long nr_purged;
} single;
+/*
+ * The initial setup consists of one single node, i.e. balancing
+ * is fully disabled. Later on, after vmap is initialized, these
+ * parameters are updated based on the system capacity.
+ */
static struct vmap_node *vmap_nodes = &single;
static __read_mostly unsigned int nr_vmap_nodes = 1;
static __read_mostly unsigned int vmap_zone_size = 1;
@@ -2048,7 +2061,12 @@ decay_va_pool_node(struct vmap_node *vn, bool full_decay)
}
}
- /* Attach the pool back if it has been partly decayed. */
+ /*
+ * Attach the pool back if it has been partly decayed.
+ * Please note, it is assumed that nobody (no other context)
+ * can populate the pool, therefore a simple list replace
+ * operation takes place here.
+ */
if (!full_decay && !list_empty(&tmp_list)) {
spin_lock(&vn->pool_lock);
list_replace_init(&tmp_list, &vn->pool[i].head);
@@ -2257,16 +2275,14 @@ struct vmap_area *find_vmap_area(unsigned long addr)
* An addr_to_node_id(addr) converts an address to a node index
* where a VA is located. If VA spans several zones and passed
* addr is not the same as va->va_start, what is not common, we
- * may need to scan an extra nodes. See an example:
+ * may need to scan extra nodes. See an example:
*
- * <--va-->
+ * <----va---->
* -|-----|-----|-----|-----|-
* 1 2 0 1
*
- * VA resides in node 1 whereas it spans 1 and 2. If passed
- * addr is within a second node we should do extra work. We
- * should mention that it is rare and is a corner case from
- * the other hand it has to be covered.
+ * VA resides in node 1 whereas it spans 1, 2 and 0. If the passed
+ * addr is within node 2 or node 0 we should do extra work.
*/
i = j = addr_to_node_id(addr);
do {
@@ -2289,6 +2305,9 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
struct vmap_area *va;
int i, j;
+ /*
+ * See the comment in find_vmap_area() about the loop.
+ */
i = j = addr_to_node_id(addr);
do {
vn = &vmap_nodes[i];
@@ -4882,7 +4901,20 @@ static void vmap_init_nodes(void)
int i, n;
#if BITS_PER_LONG == 64
- /* A high threshold of max nodes is fixed and bound to 128. */
+ /*
+ * The upper bound on the number of nodes is fixed at 128, thus
+ * the scale factor is 1 for systems where the number of cores
+ * is less than or equal to that threshold.
+ *
+ * A note about NUMA awareness: for bigger systems, for example
+ * multi-socket NUMA where we can end up with thousands of cores
+ * in total, a "sub-numa-clustering" should be added.
+ *
+ * In that case a NUMA domain is considered as a single entity
+ * with dedicated sub-nodes in it which describe one group or
+ * set of cores. Therefore per-domain purging is supposed to
+ * be added as well as per-domain balancing.
+ */
n = clamp_t(unsigned int, num_possible_cpus(), 1, 128);
if (n > 1) {
--
2.39.2
* [PATCH 2/2] mm: vmalloc: Refactor vmalloc_dump_obj() function
2024-01-24 18:09 [PATCH 1/2] mm: vmalloc: Improve description of vmap node layer Uladzislau Rezki (Sony)
@ 2024-01-24 18:09 ` Uladzislau Rezki (Sony)
2024-01-30 18:50 ` Lorenzo Stoakes
2024-01-30 18:44 ` [PATCH 1/2] mm: vmalloc: Improve description of vmap node layer Lorenzo Stoakes
From: Uladzislau Rezki (Sony) @ 2024-01-24 18:09 UTC (permalink / raw)
To: linux-mm, Andrew Morton
Cc: LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig,
Matthew Wilcox, Dave Chinner, Uladzislau Rezki,
Oleksiy Avramchenko
This patch simplifies the function in question by removing
the extra stack "objp" variable and returning to an early
exit approach if spin_trylock() fails or the VA is not found.
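The resulting control flow is the usual trylock-and-bail-out shape.
Purely for illustration, a minimal user-space analogue (pthreads,
made-up names, built with -pthread, not the kernel code) looks
roughly like:

  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static int registered_obj = 42;

  static bool dump_obj(int obj)
  {
          int snapshot;

          /* Early exit: a dump path should not wait for the lock. */
          if (pthread_mutex_trylock(&lock) != 0)
                  return false;

          /* Early exit: the object is not known. */
          if (obj != registered_obj) {
                  pthread_mutex_unlock(&lock);
                  return false;
          }

          /* Snapshot what is needed, drop the lock, then print. */
          snapshot = registered_obj;
          pthread_mutex_unlock(&lock);

          printf("object %d found\n", snapshot);
          return true;
  }

  int main(void)
  {
          dump_obj(42);   /* prints */
          dump_obj(7);    /* returns false, prints nothing */
          return 0;
  }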
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
mm/vmalloc.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b8be601b056d..449f45b0e474 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4696,34 +4696,35 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
#ifdef CONFIG_PRINTK
bool vmalloc_dump_obj(void *object)
{
- void *objp = (void *)PAGE_ALIGN((unsigned long)object);
const void *caller;
+ struct vm_struct *vm;
struct vmap_area *va;
struct vmap_node *vn;
unsigned long addr;
unsigned int nr_pages;
- bool success = false;
-
- vn = addr_to_node((unsigned long)objp);
- if (spin_trylock(&vn->busy.lock)) {
- va = __find_vmap_area((unsigned long)objp, &vn->busy.root);
+ addr = PAGE_ALIGN((unsigned long) object);
+ vn = addr_to_node(addr);
- if (va && va->vm) {
- addr = (unsigned long)va->vm->addr;
- caller = va->vm->caller;
- nr_pages = va->vm->nr_pages;
- success = true;
- }
+ if (!spin_trylock(&vn->busy.lock))
+ return false;
+ va = __find_vmap_area(addr, &vn->busy.root);
+ if (!va || !va->vm) {
spin_unlock(&vn->busy.lock);
+ return false;
}
- if (success)
- pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
- nr_pages, addr, caller);
+ vm = va->vm;
+ addr = (unsigned long) vm->addr;
+ caller = vm->caller;
+ nr_pages = vm->nr_pages;
+ spin_unlock(&vn->busy.lock);
- return success;
+ pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
+ nr_pages, addr, caller);
+
+ return true;
}
#endif
--
2.39.2
* Re: [PATCH 2/2] mm: vmalloc: Refactor vmalloc_dump_obj() function
2024-01-24 18:09 ` [PATCH 2/2] mm: vmalloc: Refactor vmalloc_dump_obj() function Uladzislau Rezki (Sony)
@ 2024-01-30 18:50 ` Lorenzo Stoakes
2024-01-31 9:49 ` Uladzislau Rezki
From: Lorenzo Stoakes @ 2024-01-30 18:50 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Christoph Hellwig,
Matthew Wilcox, Dave Chinner, Oleksiy Avramchenko
On Wed, Jan 24, 2024 at 07:09:20PM +0100, Uladzislau Rezki (Sony) wrote:
> This patch simplifies the function in question by removing
> the extra stack "objp" variable and returning to an early
> exit approach if spin_trylock() fails or the VA is not found.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> mm/vmalloc.c | 33 +++++++++++++++++----------------
> 1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index b8be601b056d..449f45b0e474 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4696,34 +4696,35 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> #ifdef CONFIG_PRINTK
> bool vmalloc_dump_obj(void *object)
> {
> - void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> const void *caller;
> + struct vm_struct *vm;
> struct vmap_area *va;
> struct vmap_node *vn;
> unsigned long addr;
> unsigned int nr_pages;
> - bool success = false;
> -
> - vn = addr_to_node((unsigned long)objp);
>
> - if (spin_trylock(&vn->busy.lock)) {
> - va = __find_vmap_area((unsigned long)objp, &vn->busy.root);
> + addr = PAGE_ALIGN((unsigned long) object);
> + vn = addr_to_node(addr);
>
> - if (va && va->vm) {
> - addr = (unsigned long)va->vm->addr;
> - caller = va->vm->caller;
> - nr_pages = va->vm->nr_pages;
> - success = true;
> - }
> + if (!spin_trylock(&vn->busy.lock))
> + return false;
>
> + va = __find_vmap_area(addr, &vn->busy.root);
> + if (!va || !va->vm) {
> spin_unlock(&vn->busy.lock);
> + return false;
> }
>
> - if (success)
> - pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> - nr_pages, addr, caller);
> + vm = va->vm;
> + addr = (unsigned long) vm->addr;
Hmm not so nice to reuse addr here for something different, might be nice
to have separate obj_addr and vm_addr or something. But it's not critical!
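Something along these lines, perhaps (untested, only to show the
naming; all the helpers are the ones already used by the patch):

  bool vmalloc_dump_obj(void *object)
  {
          const void *caller;
          struct vm_struct *vm;
          struct vmap_area *va;
          struct vmap_node *vn;
          unsigned long obj_addr, vm_addr;
          unsigned int nr_pages;

          obj_addr = PAGE_ALIGN((unsigned long) object);
          vn = addr_to_node(obj_addr);

          if (!spin_trylock(&vn->busy.lock))
                  return false;

          va = __find_vmap_area(obj_addr, &vn->busy.root);
          if (!va || !va->vm) {
                  spin_unlock(&vn->busy.lock);
                  return false;
          }

          vm = va->vm;
          vm_addr = (unsigned long) vm->addr;
          caller = vm->caller;
          nr_pages = vm->nr_pages;
          spin_unlock(&vn->busy.lock);

          pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
                  nr_pages, vm_addr, caller);

          return true;
  }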
> + caller = vm->caller;
> + nr_pages = vm->nr_pages;
> + spin_unlock(&vn->busy.lock);
>
> - return success;
> + pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> + nr_pages, addr, caller);
> +
> + return true;
> }
> #endif
>
> --
> 2.39.2
>
Other than the nit, which I don't insist on, this is a big improvement so,
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
* Re: [PATCH 2/2] mm: vmalloc: Refactor vmalloc_dump_obj() function
2024-01-30 18:50 ` Lorenzo Stoakes
@ 2024-01-31 9:49 ` Uladzislau Rezki
From: Uladzislau Rezki @ 2024-01-31 9:49 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Uladzislau Rezki (Sony),
linux-mm, Andrew Morton, LKML, Baoquan He, Christoph Hellwig,
Matthew Wilcox, Dave Chinner, Oleksiy Avramchenko
On Tue, Jan 30, 2024 at 06:50:48PM +0000, Lorenzo Stoakes wrote:
> On Wed, Jan 24, 2024 at 07:09:20PM +0100, Uladzislau Rezki (Sony) wrote:
> > This patch simplifies the function in question by removing
> > the extra stack "objp" variable and returning to an early
> > exit approach if spin_trylock() fails or the VA is not found.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> > mm/vmalloc.c | 33 +++++++++++++++++----------------
> > 1 file changed, 17 insertions(+), 16 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index b8be601b056d..449f45b0e474 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4696,34 +4696,35 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > #ifdef CONFIG_PRINTK
> > bool vmalloc_dump_obj(void *object)
> > {
> > - void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > const void *caller;
> > + struct vm_struct *vm;
> > struct vmap_area *va;
> > struct vmap_node *vn;
> > unsigned long addr;
> > unsigned int nr_pages;
> > - bool success = false;
> > -
> > - vn = addr_to_node((unsigned long)objp);
> >
> > - if (spin_trylock(&vn->busy.lock)) {
> > - va = __find_vmap_area((unsigned long)objp, &vn->busy.root);
> > + addr = PAGE_ALIGN((unsigned long) object);
> > + vn = addr_to_node(addr);
> >
> > - if (va && va->vm) {
> > - addr = (unsigned long)va->vm->addr;
> > - caller = va->vm->caller;
> > - nr_pages = va->vm->nr_pages;
> > - success = true;
> > - }
> > + if (!spin_trylock(&vn->busy.lock))
> > + return false;
> >
> > + va = __find_vmap_area(addr, &vn->busy.root);
> > + if (!va || !va->vm) {
> > spin_unlock(&vn->busy.lock);
> > + return false;
> > }
> >
> > - if (success)
> > - pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > - nr_pages, addr, caller);
> > + vm = va->vm;
> > + addr = (unsigned long) vm->addr;
>
> Hmm not so nice to reuse addr here for something different, might be nice
> to have separate obj_addr and vm_addr or something. But it's not critical!
>
> > + caller = vm->caller;
> > + nr_pages = vm->nr_pages;
> > + spin_unlock(&vn->busy.lock);
> >
> > - return success;
> > + pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > + nr_pages, addr, caller);
> > +
> > + return true;
> > }
> > #endif
> >
> > --
> > 2.39.2
> >
>
> Other than the nit, which I don't insist on, this is a big improvement so,
>
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
>
Thanks!
--
Uladzislau Rezki
* Re: [PATCH 1/2] mm: vmalloc: Improve description of vmap node layer
2024-01-24 18:09 [PATCH 1/2] mm: vmalloc: Improve description of vmap node layer Uladzislau Rezki (Sony)
2024-01-24 18:09 ` [PATCH 2/2] mm: vmalloc: Refactor vmalloc_dump_obj() function Uladzislau Rezki (Sony)
@ 2024-01-30 18:44 ` Lorenzo Stoakes
From: Lorenzo Stoakes @ 2024-01-30 18:44 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: linux-mm, Andrew Morton, LKML, Baoquan He, Christoph Hellwig,
Matthew Wilcox, Dave Chinner, Oleksiy Avramchenko
On Wed, Jan 24, 2024 at 07:09:19PM +0100, Uladzislau Rezki (Sony) wrote:
> This patch adds extra explanation of the recently added vmap
> node layer, based on community feedback. No functional change.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> mm/vmalloc.c | 60 ++++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 46 insertions(+), 14 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 257981e37936..b8be601b056d 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -765,9 +765,10 @@ static struct rb_root free_vmap_area_root = RB_ROOT;
> static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node);
>
> /*
> - * An effective vmap-node logic. Users make use of nodes instead
> - * of a global heap. It allows to balance an access and mitigate
> - * contention.
> + * This structure defines a single, unified model where a list and
> + * an rb-tree are part of one entity protected by the lock. Nodes are
> + * sorted in ascending order; the list is used for O(1) access to
> + * left/right neighbors as well as for sequential traversal.
> */
> struct rb_list {
> struct rb_root root;
> @@ -775,16 +776,23 @@ struct rb_list {
> spinlock_t lock;
> };
>
> +/*
> + * A fast size storage contains VAs up to 1M size. A pool consists
> + * of ready-to-go VAs of certain sizes which are linked together.
> + * An index in the pool-array corresponds to the number of pages + 1.
> + */
> +#define MAX_VA_SIZE_PAGES 256
> +
> struct vmap_pool {
> struct list_head head;
> unsigned long len;
> };
>
> /*
> - * A fast size storage contains VAs up to 1M size.
> + * An effective vmap-node logic. Users make use of nodes instead
> + * of a global heap. It allows to balance an access and mitigate
> + * contention.
> */
> -#define MAX_VA_SIZE_PAGES 256
> -
> static struct vmap_node {
> /* Simple size segregated storage. */
> struct vmap_pool pool[MAX_VA_SIZE_PAGES];
> @@ -803,6 +811,11 @@ static struct vmap_node {
> unsigned long nr_purged;
> } single;
>
> +/*
> + * The initial setup consists of one single node, i.e. balancing
> + * is fully disabled. Later on, after vmap is initialized, these
> + * parameters are updated based on the system capacity.
> + */
> static struct vmap_node *vmap_nodes = &single;
> static __read_mostly unsigned int nr_vmap_nodes = 1;
> static __read_mostly unsigned int vmap_zone_size = 1;
> @@ -2048,7 +2061,12 @@ decay_va_pool_node(struct vmap_node *vn, bool full_decay)
> }
> }
>
> - /* Attach the pool back if it has been partly decayed. */
> + /*
> + * Attach the pool back if it has been partly decayed.
> + * Please note, it is assumed that nobody (no other context)
> + * can populate the pool, therefore a simple list replace
> + * operation takes place here.
> + */
> if (!full_decay && !list_empty(&tmp_list)) {
> spin_lock(&vn->pool_lock);
> list_replace_init(&tmp_list, &vn->pool[i].head);
> @@ -2257,16 +2275,14 @@ struct vmap_area *find_vmap_area(unsigned long addr)
> * An addr_to_node_id(addr) converts an address to a node index
> * where a VA is located. If VA spans several zones and passed
> * addr is not the same as va->va_start, what is not common, we
> - * may need to scan an extra nodes. See an example:
> + * may need to scan extra nodes. See an example:
> *
> - * <--va-->
> + * <----va---->
> * -|-----|-----|-----|-----|-
> * 1 2 0 1
> *
> - * VA resides in node 1 whereas it spans 1 and 2. If passed
> - * addr is within a second node we should do extra work. We
> - * should mention that it is rare and is a corner case from
> - * the other hand it has to be covered.
> + * VA resides in node 1 whereas it spans 1, 2 and 0. If the passed
> + * addr is within node 2 or node 0 we should do extra work.
> */
> i = j = addr_to_node_id(addr);
> do {
> @@ -2289,6 +2305,9 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
> struct vmap_area *va;
> int i, j;
>
> + /*
> + * See the comment in find_vmap_area() about the loop.
> + */
> i = j = addr_to_node_id(addr);
> do {
> vn = &vmap_nodes[i];
> @@ -4882,7 +4901,20 @@ static void vmap_init_nodes(void)
> int i, n;
>
> #if BITS_PER_LONG == 64
> - /* A high threshold of max nodes is fixed and bound to 128. */
> + /*
> + * The upper bound on the number of nodes is fixed at 128, thus
> + * the scale factor is 1 for systems where the number of cores
> + * is less than or equal to that threshold.
> + *
> + * A note about NUMA awareness: for bigger systems, for example
> + * multi-socket NUMA where we can end up with thousands of cores
> + * in total, a "sub-numa-clustering" should be added.
> + *
> + * In that case a NUMA domain is considered as a single entity
> + * with dedicated sub-nodes in it which describe one group or
> + * set of cores. Therefore per-domain purging is supposed to
> + * be added as well as per-domain balancing.
> + */
> n = clamp_t(unsigned int, num_possible_cpus(), 1, 128);
>
> if (n > 1) {
> --
> 2.39.2
>
Looks good to me (sorry for delay, busy with many things in life :)! Feel free to add:
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>