* [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements
@ 2023-05-25 12:57 Thomas Gleixner
2023-05-25 12:57 ` [V2 patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
` (6 more replies)
0 siblings, 7 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-25 12:57 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
This is an update of version 1, which can be found here:
https://lore.kernel.org/mm/20230523135902.517032811@linutronix.de
Following up on the discussion about excessive TLB flushes
https://lore.kernel.org/all/87a5y5a6kj.ffs@tglx
this series addresses the following issues:
1) Prevent the stale TLB problem related to fully utilized vmap blocks
2) Avoid the double per CPU list walk in _vm_unmap_aliases()
3) Avoid flushing dirty space over and over
4) Add a lockless quickcheck in vb_alloc() and add missing
READ/WRITE_ONCE() annotations
5) Prevent overeager purging of usable vmap_blocks if
not under memory/address space pressure.
Changes vs. version 1:
- Coding style issues
- Remove unnecessary force_purge arguments
- Amend changelog of 1/6
- Pick up Reviewed-by tags where applicable
Thanks,
tglx
* [V2 patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks
2023-05-25 12:57 [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
@ 2023-05-25 12:57 ` Thomas Gleixner
2023-05-25 12:57 ` [V2 patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
` (5 subsequent siblings)
6 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-25 12:57 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
_vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
page are left in the system. This is required due to the lazy TLB flush
mechanism in vmalloc.
This is attempted by walking the per CPU free lists, but those do not
contain fully utilized vmap blocks, because such blocks are removed from
the free list once their free space becomes zero.
When the block is not fully unmapped, it is not on the purge list either.
So neither the per CPU list iteration nor the purge list walk finds the
block, and if the page was mapped via such a block and the TLB has not yet
been flushed, the guarantee of _vm_unmap_aliases() that there are no stale
TLBs after returning is broken:
x = vb_alloc() // Removes vmap_block from free list because vb->free became 0
vb_free(x) // Unmaps page and marks in dirty_min/max range
// Block has still mappings and is not put on purge list
// Page is reused
vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL
So instead of walking the per CPU free lists, walk the per CPU xarrays
which hold pointers to _all_ active blocks in the system including those
removed from the free lists.
Fixes: db64fe02258f ("mm: rewrite vmap layer")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
V2: Amended changelog so it's clear that the block is not on purge list either.
---
mm/vmalloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
for_each_possible_cpu(cpu) {
struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
struct vmap_block *vb;
+ unsigned long idx;
rcu_read_lock();
- list_for_each_entry_rcu(vb, &vbq->free, free_list) {
+ xa_for_each(&vbq->vmap_blocks, idx, vb) {
spin_lock(&vb->lock);
if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
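For illustration only: a tiny userspace model of the scenario above. All
names (struct block, TOTAL, the helpers) are made up and merely mimic the
vmap_block::free/dirty accounting; this is not kernel code.

#include <stdbool.h>
#include <stdio.h>

#define TOTAL 64			/* stand-in for VMAP_BBMAP_BITS */

struct block {
	unsigned int free;		/* unallocated space */
	unsigned int dirty;		/* freed, TLB flush still pending */
	bool on_free_list;		/* the old walk only sees these */
};

/* vb_alloc() exhausting the block: it drops off the free list */
static void alloc_all(struct block *b)
{
	b->free = 0;
	b->on_free_list = false;
}

/* vb_free(): space is unmapped and accounted as dirty */
static void free_some(struct block *b, unsigned int n)
{
	b->dirty += n;
}

/* Old behaviour: only blocks on the free list are inspected */
static bool old_walk_sees_dirty(const struct block *b)
{
	return b->on_free_list && b->dirty && b->dirty != TOTAL;
}

/* New behaviour: the xarray holds _all_ blocks, so all are inspected */
static bool new_walk_sees_dirty(const struct block *b)
{
	return b->dirty && b->dirty != TOTAL;
}

int main(void)
{
	struct block b = { .free = TOTAL, .on_free_list = true };

	alloc_all(&b);
	free_some(&b, 8);

	printf("free list walk finds the dirty space: %d\n",
	       old_walk_sees_dirty(&b));
	printf("xarray walk finds the dirty space:    %d\n",
	       new_walk_sees_dirty(&b));
	return 0;
}

The first walk reports 0 because the fully utilized block left the free
list; the xarray walk reports 1 and therefore flushes the stale range.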
* [V2 patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice
2023-05-25 12:57 [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
2023-05-25 12:57 ` [V2 patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
@ 2023-05-25 12:57 ` Thomas Gleixner
2023-05-26 7:55 ` Christoph Hellwig
2023-05-27 17:38 ` Lorenzo Stoakes
2023-05-25 12:57 ` [V2 patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
` (4 subsequent siblings)
6 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-25 12:57 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
_vm_unmap_aliases() walks the per CPU xarrays to find partially unmapped
blocks and then walks the per CPU free lists to purge fragmented blocks.
Arguably that's a waste of CPU cycles and cache lines, as the full xarray
walk already touches every block.
Avoid this double iteration:
- Split out the code to purge one block and the code to free the local
purge list into helper functions.
- Try to purge the fragmented blocks in the xarray walk before looking at
their dirty space.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fix coding style issues - Christoph
---
mm/vmalloc.c | 70 ++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 46 insertions(+), 24 deletions(-)
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2086,39 +2086,54 @@ static void free_vmap_block(struct vmap_
kfree_rcu(vb, rcu_head);
}
+static bool purge_fragmented_block(struct vmap_block *vb,
+ struct vmap_block_queue *vbq, struct list_head *purge_list)
+{
+ if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
+ vb->dirty == VMAP_BBMAP_BITS)
+ return false;
+
+ /* prevent further allocs after releasing lock */
+ vb->free = 0;
+ /* prevent purging it again */
+ vb->dirty = VMAP_BBMAP_BITS;
+ vb->dirty_min = 0;
+ vb->dirty_max = VMAP_BBMAP_BITS;
+ spin_lock(&vbq->lock);
+ list_del_rcu(&vb->free_list);
+ spin_unlock(&vbq->lock);
+ list_add_tail(&vb->purge, purge_list);
+ return true;
+}
+
+static void free_purged_blocks(struct list_head *purge_list)
+{
+ struct vmap_block *vb, *n_vb;
+
+ list_for_each_entry_safe(vb, n_vb, purge_list, purge) {
+ list_del(&vb->purge);
+ free_vmap_block(vb);
+ }
+}
+
static void purge_fragmented_blocks(int cpu)
{
LIST_HEAD(purge);
struct vmap_block *vb;
- struct vmap_block *n_vb;
struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
rcu_read_lock();
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-
- if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
+ if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
+ vb->dirty == VMAP_BBMAP_BITS)
continue;
spin_lock(&vb->lock);
- if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
- vb->free = 0; /* prevent further allocs after releasing lock */
- vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
- vb->dirty_min = 0;
- vb->dirty_max = VMAP_BBMAP_BITS;
- spin_lock(&vbq->lock);
- list_del_rcu(&vb->free_list);
- spin_unlock(&vbq->lock);
- spin_unlock(&vb->lock);
- list_add_tail(&vb->purge, &purge);
- } else
- spin_unlock(&vb->lock);
+ purge_fragmented_block(vb, vbq, &purge);
+ spin_unlock(&vb->lock);
}
rcu_read_unlock();
-
- list_for_each_entry_safe(vb, n_vb, &purge, purge) {
- list_del(&vb->purge);
- free_vmap_block(vb);
- }
+ free_purged_blocks(&purge);
}
static void purge_fragmented_blocks_allcpus(void)
@@ -2226,12 +2241,13 @@ static void vb_free(unsigned long addr,
static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
{
+ LIST_HEAD(purge_list);
int cpu;
if (unlikely(!vmap_initialized))
return;
- might_sleep();
+ mutex_lock(&vmap_purge_lock);
for_each_possible_cpu(cpu) {
struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
@@ -2241,7 +2257,14 @@ static void _vm_unmap_aliases(unsigned l
rcu_read_lock();
xa_for_each(&vbq->vmap_blocks, idx, vb) {
spin_lock(&vb->lock);
- if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
+
+ /*
+ * Try to purge a fragmented block first. If it's
+ * not purgeable, check whether there is dirty
+ * space to be flushed.
+ */
+ if (!purge_fragmented_block(vb, vbq, &purge_list) &&
+ vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
unsigned long s, e;
@@ -2257,9 +2280,8 @@ static void _vm_unmap_aliases(unsigned l
}
rcu_read_unlock();
}
+ free_purged_blocks(&purge_list);
- mutex_lock(&vmap_purge_lock);
- purge_fragmented_blocks_allcpus();
if (!__purge_vmap_area_lazy(start, end) && flush)
flush_tlb_kernel_range(start, end);
mutex_unlock(&vmap_purge_lock);
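For illustration only: the guard at the top of the new
purge_fragmented_block(), pulled out into a standalone sketch with sample
numbers. TOTAL stands in for VMAP_BBMAP_BITS and the values are made up.

#include <stdbool.h>
#include <stdio.h>

#define TOTAL 64			/* stand-in for VMAP_BBMAP_BITS */

/*
 * A block is a purge candidate iff free + dirty covers the whole block
 * (no live mappings left) and it is not already fully dirty, i.e. not
 * already queued for purging.
 */
static bool purge_candidate(unsigned int free, unsigned int dirty)
{
	if (free + dirty != TOTAL || dirty == TOTAL)
		return false;
	return true;
}

int main(void)
{
	printf("%d\n", purge_candidate(48, 16));	/* 1: fragmented, unused   */
	printf("%d\n", purge_candidate(40, 16));	/* 0: 8 units still mapped */
	printf("%d\n", purge_candidate(0, TOTAL));	/* 0: already fully dirty  */
	return 0;
}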
* [V2 patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over
2023-05-25 12:57 [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
2023-05-25 12:57 ` [V2 patch 1/6] mm/vmalloc: Prevent stale TLBs in fully utilized blocks Thomas Gleixner
2023-05-25 12:57 ` [V2 patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
@ 2023-05-25 12:57 ` Thomas Gleixner
2023-05-26 7:56 ` Christoph Hellwig
2023-05-27 18:19 ` Lorenzo Stoakes
2023-05-25 12:57 ` [V2 patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
` (3 subsequent siblings)
6 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-25 12:57 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
vmap blocks which have active mappings cannot be purged. Allocations which
have been freed are accounted for in vmap_block::dirty_min/max, so that
they can be detected in _vm_unmap_aliases() as potentially stale TLBs.
If there are several invocations of _vm_unmap_aliases() then each of them
will flush the dirty range. That's pointless and just increases the
probability of full TLB flushes.
Avoid that by resetting the flush range after accounting for it. That's
safe versus other invocations of _vm_unmap_aliases() because this is all
serialized with vmap_purge_lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
---
mm/vmalloc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2226,7 +2226,7 @@ static void vb_free(unsigned long addr,
spin_lock(&vb->lock);
- /* Expand dirty range */
+ /* Expand the not yet TLB flushed dirty range */
vb->dirty_min = min(vb->dirty_min, offset);
vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
@@ -2264,7 +2264,7 @@ static void _vm_unmap_aliases(unsigned l
* space to be flushed.
*/
if (!purge_fragmented_block(vb, vbq, &purge_list) &&
- vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
+ vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
unsigned long s, e;
@@ -2274,6 +2274,10 @@ static void _vm_unmap_aliases(unsigned l
start = min(s, start);
end = max(e, end);
+ /* Prevent that this is flushed again */
+ vb->dirty_min = VMAP_BBMAP_BITS;
+ vb->dirty_max = 0;
+
flush = 1;
}
spin_unlock(&vb->lock);
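For illustration only: the dirty_min/dirty_max bookkeeping as a standalone
sketch. Names and units are simplified stand-ins for the vmap_block fields;
the "empty" encoding (min > max) follows the reset added in the hunk above.

#include <stdio.h>

#define TOTAL 64			/* stand-in for VMAP_BBMAP_BITS */

struct dirty_range {
	unsigned long min;		/* empty when min > max */
	unsigned long max;
};

/* vb_free(): expand the not yet TLB flushed dirty range */
static void mark_dirty(struct dirty_range *r, unsigned long off, unsigned long len)
{
	if (off < r->min)
		r->min = off;
	if (off + len > r->max)
		r->max = off + len;
}

/* _vm_unmap_aliases(): account the range, then reset it to empty */
static void flush_and_reset(struct dirty_range *r)
{
	if (r->max)
		printf("flush [%lu, %lu)\n", r->min, r->max);
	else
		printf("nothing to flush\n");

	r->min = TOTAL;
	r->max = 0;
}

int main(void)
{
	struct dirty_range r = { .min = TOTAL, .max = 0 };

	mark_dirty(&r, 8, 4);		/* a freed 4 unit allocation */
	mark_dirty(&r, 32, 2);		/* and another one           */
	flush_and_reset(&r);		/* flush [8, 34)             */
	flush_and_reset(&r);		/* nothing to flush          */
	return 0;
}

Without the reset, the second invocation would flush [8, 34) again even
though nothing was freed in between.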
* [V2 patch 4/6] mm/vmalloc: Check free space in vmap_block lockless
2023-05-25 12:57 [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
` (2 preceding siblings ...)
2023-05-25 12:57 ` [V2 patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
@ 2023-05-25 12:57 ` Thomas Gleixner
2023-05-26 7:56 ` Christoph Hellwig
2023-05-27 18:27 ` Lorenzo Stoakes
2023-05-25 12:57 ` [V2 patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
` (2 subsequent siblings)
6 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-25 12:57 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
vb_alloc() unconditionally locks a vmap_block on the free list to check the
free space.
This can be done locklessly because vmap_block::free never increases; it is
only decreased on allocations.
Check the free space locklessly and, only if that succeeds, recheck under
the lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
mm/vmalloc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2168,6 +2168,9 @@ static void *vb_alloc(unsigned long size
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
unsigned long pages_off;
+ if (READ_ONCE(vb->free) < (1UL << order))
+ continue;
+
spin_lock(&vb->lock);
if (vb->free < (1UL << order)) {
spin_unlock(&vb->lock);
@@ -2176,7 +2179,7 @@ static void *vb_alloc(unsigned long size
pages_off = VMAP_BBMAP_BITS - vb->free;
vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
- vb->free -= 1UL << order;
+ WRITE_ONCE(vb->free, vb->free - (1UL << order));
bitmap_set(vb->used_map, pages_off, (1UL << order));
if (vb->free == 0) {
spin_lock(&vbq->lock);
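For illustration only: the "check locklessly, recheck under the lock"
pattern in a userspace sketch, with a pthread mutex and C11 relaxed atomics
standing in for the spinlock and READ_ONCE/WRITE_ONCE. The names are made
up; it only mirrors the shape of the vb_alloc() change.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct blk {
	pthread_mutex_t lock;
	_Atomic unsigned long free;	/* only ever decreases */
};

static bool try_take(struct blk *b, unsigned long need)
{
	/*
	 * Cheap lockless pre-check. Because free never increases, a
	 * failed check cannot turn into a success later, so skipping
	 * the block without taking the lock is always safe.
	 */
	if (atomic_load_explicit(&b->free, memory_order_relaxed) < need)
		return false;

	pthread_mutex_lock(&b->lock);
	/* Recheck: another thread may have allocated meanwhile. */
	if (b->free < need) {
		pthread_mutex_unlock(&b->lock);
		return false;
	}
	atomic_store_explicit(&b->free, b->free - need, memory_order_relaxed);
	pthread_mutex_unlock(&b->lock);
	return true;
}

int main(void)
{
	struct blk b = { PTHREAD_MUTEX_INITIALIZER, 4 };

	printf("%d\n", try_take(&b, 2));	/* 1: enough space left */
	printf("%d\n", try_take(&b, 4));	/* 0: only 2 units left */
	return 0;
}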
* [V2 patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations
2023-05-25 12:57 [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
` (3 preceding siblings ...)
2023-05-25 12:57 ` [V2 patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
@ 2023-05-25 12:57 ` Thomas Gleixner
2023-05-26 7:56 ` Christoph Hellwig
2023-05-27 19:06 ` Lorenzo Stoakes
2023-05-25 12:57 ` [V2 patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
2023-05-27 10:21 ` [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Baoquan He
6 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-25 12:57 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
purge_fragmented_blocks() accesses vmap_block::free and vmap_block::dirty
locklessly for a quick check.
Add the missing READ/WRITE_ONCE() annotations.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
mm/vmalloc.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2094,9 +2094,9 @@ static bool purge_fragmented_block(struc
return false;
/* prevent further allocs after releasing lock */
- vb->free = 0;
+ WRITE_ONCE(vb->free, 0);
/* prevent purging it again */
- vb->dirty = VMAP_BBMAP_BITS;
+ WRITE_ONCE(vb->dirty, VMAP_BBMAP_BITS);
vb->dirty_min = 0;
vb->dirty_max = VMAP_BBMAP_BITS;
spin_lock(&vbq->lock);
@@ -2124,8 +2124,11 @@ static void purge_fragmented_blocks(int
rcu_read_lock();
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
- if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
- vb->dirty == VMAP_BBMAP_BITS)
+ unsigned long free = READ_ONCE(vb->free);
+ unsigned long dirty = READ_ONCE(vb->dirty);
+
+ if (free + dirty != VMAP_BBMAP_BITS ||
+ dirty == VMAP_BBMAP_BITS)
continue;
spin_lock(&vb->lock);
@@ -2233,7 +2236,7 @@ static void vb_free(unsigned long addr,
vb->dirty_min = min(vb->dirty_min, offset);
vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
- vb->dirty += 1UL << order;
+ WRITE_ONCE(vb->dirty, vb->dirty + (1UL << order));
if (vb->dirty == VMAP_BBMAP_BITS) {
BUG_ON(vb->free);
spin_unlock(&vb->lock);
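For illustration only: a rough userspace model of what the annotations
provide. The macros below are simplified stand-ins (plain volatile
accesses) and do not capture everything the kernel's READ_ONCE() and
WRITE_ONCE() do; they merely show "one untorn access, no compiler
re-reads or caching".

#include <stdio.h>

#define rd_once(x)	(*(const volatile __typeof__(x) *)&(x))
#define wr_once(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

#define TOTAL 64UL			/* stand-in for VMAP_BBMAP_BITS */

static unsigned long dirty;		/* written under a lock,
					 * read locklessly elsewhere */

/* Locked writer side, as in vb_free() */
static void mark_dirty(unsigned long order)
{
	wr_once(dirty, dirty + (1UL << order));
}

/* Lockless reader side, as in purge_fragmented_blocks() */
static int fully_dirty(void)
{
	unsigned long d = rd_once(dirty);	/* one snapshot ... */

	return d == TOTAL;			/* ... reused consistently */
}

int main(void)
{
	mark_dirty(6);				/* 1UL << 6 == 64 */
	printf("%d\n", fully_dirty());		/* 1 */
	return 0;
}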
* [V2 patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily
2023-05-25 12:57 [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
` (4 preceding siblings ...)
2023-05-25 12:57 ` [V2 patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
@ 2023-05-25 12:57 ` Thomas Gleixner
2023-05-27 19:28 ` Lorenzo Stoakes
2023-05-27 10:21 ` [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Baoquan He
6 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-25 12:57 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
Purging fragmented blocks is done unconditionally in several contexts:
1) From drain_vmap_area_work(), when the number of vmap_areas queued for
lazy freeing reaches the threshold
2) Reclaiming vmalloc address space from pcpu_get_vm_areas()
3) _vm_unmap_aliases()
#1 There is no reason to zap fragmented vmap blocks unconditionally, simply
because reclaiming all lazy areas drains at least
32MB * fls(num_online_cpus())
per invocation, which is plenty.
#2 Reclaiming when running out of space or due to memory pressure makes a
lot of sense
#3 _vm_unmap_aliases() has to touch everything because the caller has no
clue which vmap_area used a particular page last, and the vmap_area lost
that information too.
Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the
vmap area first and then cares about the flush. That in turn requires
a full walk of _all_ vmap areas including the one which was just
added to the purge list.
But as this has to be flushed anyway, it is an opportunity to combine
outstanding TLB flushes and do the housekeeping of purging freed areas.
However, like #1, there is no good reason to zap usable vmap blocks
unconditionally.
Add a @force_purge argument to the newly split out block purge function
and, if it is not true, only purge fragmented blocks which have less than
1/4 of their capacity left.
Rename purge_vmap_area_lazy() to reclaim_and_purge_vmap_areas() to make it
clear what the function does.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Add the missing force_purge argument in _vm_unmap_aliases()
Remove force_purge argument from the reclaim path - Baoquan
---
mm/vmalloc.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -791,7 +791,7 @@ get_subtree_max_size(struct rb_node *nod
RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size)
-static void purge_vmap_area_lazy(void);
+static void reclaim_and_purge_vmap_areas(void);
static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
static void drain_vmap_area_work(struct work_struct *work);
static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
@@ -1649,7 +1649,7 @@ static struct vmap_area *alloc_vmap_area
overflow:
if (!purged) {
- purge_vmap_area_lazy();
+ reclaim_and_purge_vmap_areas();
purged = 1;
goto retry;
}
@@ -1785,9 +1785,10 @@ static bool __purge_vmap_area_lazy(unsig
}
/*
- * Kick off a purge of the outstanding lazy areas.
+ * Reclaim vmap areas by purging fragmented blocks and purge_vmap_area_list.
*/
-static void purge_vmap_area_lazy(void)
+static void reclaim_and_purge_vmap_areas(void)
+
{
mutex_lock(&vmap_purge_lock);
purge_fragmented_blocks_allcpus();
@@ -1908,6 +1909,12 @@ static struct vmap_area *find_unlink_vma
#define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE)
+/*
+ * Purge threshold to prevent overeager purging of fragmented blocks for
+ * regular operations: Purge if vb->free is less than 1/4 of the capacity.
+ */
+#define VMAP_PURGE_THRESHOLD (VMAP_BBMAP_BITS / 4)
+
#define VMAP_RAM 0x1 /* indicates vm_map_ram area*/
#define VMAP_BLOCK 0x2 /* mark out the vmap_block sub-type*/
#define VMAP_FLAGS_MASK 0x3
@@ -2087,12 +2094,17 @@ static void free_vmap_block(struct vmap_
}
static bool purge_fragmented_block(struct vmap_block *vb,
- struct vmap_block_queue *vbq, struct list_head *purge_list)
+ struct vmap_block_queue *vbq, struct list_head *purge_list,
+ bool force_purge)
{
if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
vb->dirty == VMAP_BBMAP_BITS)
return false;
+ /* Don't overeagerly purge usable blocks unless requested */
+ if (!force_purge && vb->free < VMAP_PURGE_THRESHOLD)
+ return false;
+
/* prevent further allocs after releasing lock */
WRITE_ONCE(vb->free, 0);
/* prevent purging it again */
@@ -2132,7 +2144,7 @@ static void purge_fragmented_blocks(int
continue;
spin_lock(&vb->lock);
- purge_fragmented_block(vb, vbq, &purge);
+ purge_fragmented_block(vb, vbq, &purge, true);
spin_unlock(&vb->lock);
}
rcu_read_unlock();
@@ -2269,7 +2281,7 @@ static void _vm_unmap_aliases(unsigned l
* not purgeable, check whether there is dirty
* space to be flushed.
*/
- if (!purge_fragmented_block(vb, vbq, &purge_list) &&
+ if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
unsigned long s, e;
@@ -4175,7 +4187,7 @@ struct vm_struct **pcpu_get_vm_areas(con
overflow:
spin_unlock(&free_vmap_area_lock);
if (!purged) {
- purge_vmap_area_lazy();
+ reclaim_and_purge_vmap_areas();
purged = true;
/* Before "retry", check if we recover. */
* Re: [V2 patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice
2023-05-25 12:57 ` [V2 patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
@ 2023-05-26 7:55 ` Christoph Hellwig
2023-05-27 17:38 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-26 7:55 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [V2 patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over
2023-05-25 12:57 ` [V2 patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
@ 2023-05-26 7:56 ` Christoph Hellwig
2023-05-27 18:19 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-26 7:56 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [V2 patch 4/6] mm/vmalloc: Check free space in vmap_block lockless
2023-05-25 12:57 ` [V2 patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
@ 2023-05-26 7:56 ` Christoph Hellwig
2023-05-27 18:27 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-26 7:56 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [V2 patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations
2023-05-25 12:57 ` [V2 patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
@ 2023-05-26 7:56 ` Christoph Hellwig
2023-05-27 19:06 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-26 7:56 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra, Baoquan He
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements
2023-05-25 12:57 [V2 patch 0/6] mm/vmalloc: Assorted fixes and improvements Thomas Gleixner
` (5 preceding siblings ...)
2023-05-25 12:57 ` [V2 patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
@ 2023-05-27 10:21 ` Baoquan He
6 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2023-05-27 10:21 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Lorenzo Stoakes, Peter Zijlstra
On 05/25/23 at 02:57pm, Thomas Gleixner wrote:
> This an update of version 1 which can be found here:
>
> https://lore.kernel.org/mm/20230523135902.517032811@linutronix.de
>
> Following up to the discussion about excessive TLB flushes
>
> https://lore.kernel.org/all/87a5y5a6kj.ffs@tglx
>
> this series addresses the following issues:
>
> 1) Prevent the stale TLB problem related to fully utilized vmap blocks
>
> 2) Avoid the double per CPU list walk in _vm_unmap_aliases()
>
> 3) Avoid flushing dirty space over and over
>
> 4) Add a lockless quickcheck in vb_alloc() and add missing
> READ/WRITE_ONCE() annotations
>
> 5) Prevent overeager purging of usable vmap_blocks if
> not under memory/address space pressure.
>
> Changes vs. version 1:
>
> - Coding style issues
> - Remove unneccesary force_purge arguments
> - Amend changelog of 1/6
> - Pick up Reviewed-by tags where applicable
The series looks good to me, thx.
Reviewed-by: Baoquan He <bhe@redhat.com>
* Re: [V2 patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice
2023-05-25 12:57 ` [V2 patch 2/6] mm/vmalloc: Avoid iterating over per CPU vmap blocks twice Thomas Gleixner
2023-05-26 7:55 ` Christoph Hellwig
@ 2023-05-27 17:38 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 17:38 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Peter Zijlstra, Baoquan He
On Thu, May 25, 2023 at 02:57:04PM +0200, Thomas Gleixner wrote:
> _vunmap_aliases() walks the per CPU xarrays to find partially unmapped
> blocks and then walks the per cpu free lists to purge fragmented blocks.
>
> Arguably that's waste of CPU cycles and cache lines as the full xarray walk
> already touches every block.
>
> Avoid this double iteration:
>
> - Split out the code to purge one block and the code to free the local
> purge list into helper functions.
>
> - Try to purge the fragmented blocks in the xarray walk before looking at
> their dirty space.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Fix coding style issues - Christoph
> ---
> mm/vmalloc.c | 70 ++++++++++++++++++++++++++++++++++++++---------------------
> 1 file changed, 46 insertions(+), 24 deletions(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2086,39 +2086,54 @@ static void free_vmap_block(struct vmap_
> kfree_rcu(vb, rcu_head);
> }
>
> +static bool purge_fragmented_block(struct vmap_block *vb,
> + struct vmap_block_queue *vbq, struct list_head *purge_list)
> +{
> + if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
> + vb->dirty == VMAP_BBMAP_BITS)
> + return false;
> +
> + /* prevent further allocs after releasing lock */
> + vb->free = 0;
> + /* prevent purging it again */
> + vb->dirty = VMAP_BBMAP_BITS;
> + vb->dirty_min = 0;
> + vb->dirty_max = VMAP_BBMAP_BITS;
> + spin_lock(&vbq->lock);
> + list_del_rcu(&vb->free_list);
> + spin_unlock(&vbq->lock);
> + list_add_tail(&vb->purge, purge_list);
> + return true;
> +}
> +
> +static void free_purged_blocks(struct list_head *purge_list)
> +{
> + struct vmap_block *vb, *n_vb;
> +
> + list_for_each_entry_safe(vb, n_vb, purge_list, purge) {
> + list_del(&vb->purge);
> + free_vmap_block(vb);
> + }
> +}
> +
> static void purge_fragmented_blocks(int cpu)
> {
> LIST_HEAD(purge);
> struct vmap_block *vb;
> - struct vmap_block *n_vb;
> struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>
> rcu_read_lock();
> list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> -
> - if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
> + if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
> + vb->dirty == VMAP_BBMAP_BITS)
> continue;
>
> spin_lock(&vb->lock);
> - if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
> - vb->free = 0; /* prevent further allocs after releasing lock */
> - vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
> - vb->dirty_min = 0;
> - vb->dirty_max = VMAP_BBMAP_BITS;
> - spin_lock(&vbq->lock);
> - list_del_rcu(&vb->free_list);
> - spin_unlock(&vbq->lock);
> - spin_unlock(&vb->lock);
> - list_add_tail(&vb->purge, &purge);
> - } else
> - spin_unlock(&vb->lock);
> + purge_fragmented_block(vb, vbq, &purge);
> + spin_unlock(&vb->lock);
> }
> rcu_read_unlock();
> -
> - list_for_each_entry_safe(vb, n_vb, &purge, purge) {
> - list_del(&vb->purge);
> - free_vmap_block(vb);
> - }
> + free_purged_blocks(&purge);
> }
>
> static void purge_fragmented_blocks_allcpus(void)
> @@ -2226,12 +2241,13 @@ static void vb_free(unsigned long addr,
>
> static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
> {
> + LIST_HEAD(purge_list);
> int cpu;
>
> if (unlikely(!vmap_initialized))
> return;
>
> - might_sleep();
> + mutex_lock(&vmap_purge_lock);
>
> for_each_possible_cpu(cpu) {
> struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
> @@ -2241,7 +2257,14 @@ static void _vm_unmap_aliases(unsigned l
> rcu_read_lock();
> xa_for_each(&vbq->vmap_blocks, idx, vb) {
> spin_lock(&vb->lock);
> - if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> +
> + /*
> + * Try to purge a fragmented block first. If it's
> + * not purgeable, check whether there is dirty
> + * space to be flushed.
> + */
> + if (!purge_fragmented_block(vb, vbq, &purge_list) &&
> + vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> unsigned long va_start = vb->va->va_start;
> unsigned long s, e;
>
> @@ -2257,9 +2280,8 @@ static void _vm_unmap_aliases(unsigned l
> }
> rcu_read_unlock();
> }
> + free_purged_blocks(&purge_list);
>
> - mutex_lock(&vmap_purge_lock);
> - purge_fragmented_blocks_allcpus();
Nice catch... and ugh that we were repeating work here.
> if (!__purge_vmap_area_lazy(start, end) && flush)
> flush_tlb_kernel_range(start, end);
> mutex_unlock(&vmap_purge_lock);
>
>
Overall nice cleanup, it feels like vmalloc has a lot of dirty, musty
corners. Glad you've broken out the feather duster :)
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
* Re: [V2 patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over
2023-05-25 12:57 ` [V2 patch 3/6] mm/vmalloc: Prevent flushing dirty space over and over Thomas Gleixner
2023-05-26 7:56 ` Christoph Hellwig
@ 2023-05-27 18:19 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 18:19 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Peter Zijlstra, Baoquan He
On Thu, May 25, 2023 at 02:57:05PM +0200, Thomas Gleixner wrote:
> vmap blocks which have active mappings cannot be purged. Allocations which
> have been freed are accounted for in vmap_block::dirty_min/max, so that
> they can be detected in _vm_unmap_aliases() as potentially stale TLBs.
>
> If there are several invocations of _vm_unmap_aliases() then each of them
> will flush the dirty range. That's pointless and just increases the
> probability of full TLB flushes.
>
> Avoid that by resetting the flush range after accounting for it. That's
> safe versus other invocations of _vm_unmap_aliases() because this is all
> serialized with vmap_purge_lock.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Baoquan He <bhe@redhat.com>
> ---
> mm/vmalloc.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2226,7 +2226,7 @@ static void vb_free(unsigned long addr,
>
> spin_lock(&vb->lock);
>
> - /* Expand dirty range */
> + /* Expand the not yet TLB flushed dirty range */
> vb->dirty_min = min(vb->dirty_min, offset);
> vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
>
> @@ -2264,7 +2264,7 @@ static void _vm_unmap_aliases(unsigned l
> * space to be flushed.
> */
> if (!purge_fragmented_block(vb, vbq, &purge_list) &&
> - vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> + vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
> unsigned long va_start = vb->va->va_start;
> unsigned long s, e;
>
> @@ -2274,6 +2274,10 @@ static void _vm_unmap_aliases(unsigned l
> start = min(s, start);
> end = max(e, end);
>
> + /* Prevent that this is flushed again */
Sorry to be absurdly nitty, and I know there's other instances of this in
vmalloc.c but it'd be nice to correct the grammar here -> 'prevent this
from being flushed again'. However, far from urgent!
> + vb->dirty_min = VMAP_BBMAP_BITS;
> + vb->dirty_max = 0;
> +
> flush = 1;
> }
> spin_unlock(&vb->lock);
>
>
This is a really good catch, feel free to add:-
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
* Re: [V2 patch 4/6] mm/vmalloc: Check free space in vmap_block lockless
2023-05-25 12:57 ` [V2 patch 4/6] mm/vmalloc: Check free space in vmap_block lockless Thomas Gleixner
2023-05-26 7:56 ` Christoph Hellwig
@ 2023-05-27 18:27 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 18:27 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Peter Zijlstra, Baoquan He
On Thu, May 25, 2023 at 02:57:07PM +0200, Thomas Gleixner wrote:
> vb_alloc() unconditionally locks a vmap_block on the free list to check the
> free space.
>
> This can be done locklessly because vmap_block::free never increases, it's
> only decreased on allocations.
>
> Check the free space lockless and only if that succeeds, recheck under the
> lock.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> mm/vmalloc.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2168,6 +2168,9 @@ static void *vb_alloc(unsigned long size
> list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> unsigned long pages_off;
>
> + if (READ_ONCE(vb->free) < (1UL << order))
> + continue;
> +
> spin_lock(&vb->lock);
> if (vb->free < (1UL << order)) {
> spin_unlock(&vb->lock);
> @@ -2176,7 +2179,7 @@ static void *vb_alloc(unsigned long size
>
> pages_off = VMAP_BBMAP_BITS - vb->free;
> vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
> - vb->free -= 1UL << order;
> + WRITE_ONCE(vb->free, vb->free - (1UL << order));
> bitmap_set(vb->used_map, pages_off, (1UL << order));
> if (vb->free == 0) {
> spin_lock(&vbq->lock);
>
>
Looks good to me,
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
* Re: [V2 patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations
2023-05-25 12:57 ` [V2 patch 5/6] mm/vmalloc: Add missing READ/WRITE_ONCE() annotations Thomas Gleixner
2023-05-26 7:56 ` Christoph Hellwig
@ 2023-05-27 19:06 ` Lorenzo Stoakes
1 sibling, 0 replies; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 19:06 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Peter Zijlstra, Baoquan He
On Thu, May 25, 2023 at 02:57:08PM +0200, Thomas Gleixner wrote:
> purge_fragmented_blocks() accesses vmap_block::free and vmap_block::dirty
> lockless for a quick check.
>
> Add the missing READ/WRITE_ONCE() annotations.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> mm/vmalloc.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2094,9 +2094,9 @@ static bool purge_fragmented_block(struc
> return false;
>
> /* prevent further allocs after releasing lock */
> - vb->free = 0;
> + WRITE_ONCE(vb->free, 0);
> /* prevent purging it again */
> - vb->dirty = VMAP_BBMAP_BITS;
> + WRITE_ONCE(vb->dirty, VMAP_BBMAP_BITS);
> vb->dirty_min = 0;
> vb->dirty_max = VMAP_BBMAP_BITS;
> spin_lock(&vbq->lock);
> @@ -2124,8 +2124,11 @@ static void purge_fragmented_blocks(int
>
> rcu_read_lock();
> list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> - if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
> - vb->dirty == VMAP_BBMAP_BITS)
> + unsigned long free = READ_ONCE(vb->free);
> + unsigned long dirty = READ_ONCE(vb->dirty);
> +
> + if (free + dirty != VMAP_BBMAP_BITS ||
> + dirty == VMAP_BBMAP_BITS)
> continue;
>
> spin_lock(&vb->lock);
> @@ -2233,7 +2236,7 @@ static void vb_free(unsigned long addr,
> vb->dirty_min = min(vb->dirty_min, offset);
> vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
>
> - vb->dirty += 1UL << order;
> + WRITE_ONCE(vb->dirty, vb->dirty + (1UL << order));
This is probably a me thing, but I'm a little confused as to why this is
necessary in a code path distinct from the purge stuff, as this will only
prevent the compiler from being 'creative' with ordering here which seems unlikely to be an issue? Or is it a case of belts + braces?
Also wouldn't we require a READ_ONCE() here and below also?
> if (vb->dirty == VMAP_BBMAP_BITS) {
> BUG_ON(vb->free);
> spin_unlock(&vb->lock);
>
>
* Re: [V2 patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily
2023-05-25 12:57 ` [V2 patch 6/6] mm/vmalloc: Dont purge usable blocks unnecessarily Thomas Gleixner
@ 2023-05-27 19:28 ` Lorenzo Stoakes
0 siblings, 0 replies; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-05-27 19:28 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-mm, Andrew Morton, Christoph Hellwig, Uladzislau Rezki,
Peter Zijlstra, Baoquan He
On Thu, May 25, 2023 at 02:57:09PM +0200, Thomas Gleixner wrote:
> Purging fragmented blocks is done unconditionally in several contexts:
>
> 1) From drain_vmap_area_work(), when the number of lazy to be freed
> vmap_areas reached the threshold
>
> 2) Reclaiming vmalloc address space from pcpu_get_vm_areas()
>
> 3) _vm_unmap_aliases()
>
> #1 There is no reason to zap fragmented vmap blocks unconditionally, simply
> because reclaiming all lazy areas drains at least
>
> 32MB * fls(num_online_cpus())
>
> per invocation which is plenty.
>
> #2 Reclaiming when running out of space or due to memory pressure makes a
> lot of sense
>
> #3 _unmap_aliases() requires to touch everything because the caller has no
> clue which vmap_area used a particular page last and the vmap_area lost
> that information too.
>
> Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the
> vmap area first and then cares about the flush. That in turn requires
> a full walk of _all_ vmap areas including the one which was just
> added to the purge list.
>
> But as this has to be flushed anyway this is an opportunity to combine
> outstanding TLB flushes and do the housekeeping of purging freed areas,
> but like #1 there is no real good reason to zap usable vmap blocks
> unconditionally.
>
> Add a @force_purge argument to the newly split out block purge function and
> if not true only purge fragmented blocks which have less than 1/4 of their
> capacity left.
>
> Rename purge_vmap_area_lazy() to reclaim_and_purge_vmap_areas() to make it
> clear what the function does.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Add the missing force_purge argument in _vm_unmap_aliases()
> Remove force_purge argument from the reclaim path - Baoquan
> ---
> mm/vmalloc.c | 28 ++++++++++++++++++++--------
> 1 file changed, 20 insertions(+), 8 deletions(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -791,7 +791,7 @@ get_subtree_max_size(struct rb_node *nod
> RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
> struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size)
>
> -static void purge_vmap_area_lazy(void);
> +static void reclaim_and_purge_vmap_areas(void);
> static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
> static void drain_vmap_area_work(struct work_struct *work);
> static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
> @@ -1649,7 +1649,7 @@ static struct vmap_area *alloc_vmap_area
>
> overflow:
> if (!purged) {
> - purge_vmap_area_lazy();
> + reclaim_and_purge_vmap_areas();
> purged = 1;
> goto retry;
> }
> @@ -1785,9 +1785,10 @@ static bool __purge_vmap_area_lazy(unsig
> }
>
> /*
> - * Kick off a purge of the outstanding lazy areas.
> + * Reclaim vmap areas by purging fragmented blocks and purge_vmap_area_list.
> */
This comment feels a little redundant as it simply describes the code
below. Perhaps worth highlighting that the purge is unconditional and why it
would be invoked?
Nitty, but I'm also not entirely sure about this use of 'reclaim', as it is
rather an overloaded term and you'd expect it to refer to the activities of
vmscan.c :) yes we're trying to 'reclaim' vmap areas in a sense but perhaps
better to either stick with original or something less overloaded (I would
suggest something but naming is hard... :)
> -static void purge_vmap_area_lazy(void)
> +static void reclaim_and_purge_vmap_areas(void)
> +
> {
> mutex_lock(&vmap_purge_lock);
> purge_fragmented_blocks_allcpus();
> @@ -1908,6 +1909,12 @@ static struct vmap_area *find_unlink_vma
>
> #define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE)
>
> +/*
> + * Purge threshold to prevent overeager purging of fragmented blocks for
> + * regular operations: Purge if vb->free is less than 1/4 of the capacity.
> + */
'Purge if vb->free is less than 1/4 of the capacity' appears to be contradicted
by the actual conditional in purge_fragmented_block() below (see comment below)
unless I'm missing something/simply confused :)
> +#define VMAP_PURGE_THRESHOLD (VMAP_BBMAP_BITS / 4)
> +
> #define VMAP_RAM 0x1 /* indicates vm_map_ram area*/
> #define VMAP_BLOCK 0x2 /* mark out the vmap_block sub-type*/
> #define VMAP_FLAGS_MASK 0x3
> @@ -2087,12 +2094,17 @@ static void free_vmap_block(struct vmap_
> }
>
> static bool purge_fragmented_block(struct vmap_block *vb,
> - struct vmap_block_queue *vbq, struct list_head *purge_list)
> + struct vmap_block_queue *vbq, struct list_head *purge_list,
> + bool force_purge)
> {
> if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
> vb->dirty == VMAP_BBMAP_BITS)
> return false;
>
> + /* Don't overeagerly purge usable blocks unless requested */
> + if (!force_purge && vb->free < VMAP_PURGE_THRESHOLD)
Little confused by this - if free capacity is less than the threshold, we
_don't_ purge? I thought we should purge in this instance? Should this be an >=?
> + return false;
> +
> /* prevent further allocs after releasing lock */
> WRITE_ONCE(vb->free, 0);
> /* prevent purging it again */
> @@ -2132,7 +2144,7 @@ static void purge_fragmented_blocks(int
> continue;
>
> spin_lock(&vb->lock);
> - purge_fragmented_block(vb, vbq, &purge);
> + purge_fragmented_block(vb, vbq, &purge, true);
> spin_unlock(&vb->lock);
> }
> rcu_read_unlock();
> @@ -2269,7 +2281,7 @@ static void _vm_unmap_aliases(unsigned l
> * not purgeable, check whether there is dirty
> * space to be flushed.
> */
> - if (!purge_fragmented_block(vb, vbq, &purge_list) &&
> + if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
> vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
> unsigned long va_start = vb->va->va_start;
> unsigned long s, e;
> @@ -4175,7 +4187,7 @@ struct vm_struct **pcpu_get_vm_areas(con
> overflow:
> spin_unlock(&free_vmap_area_lock);
> if (!purged) {
> - purge_vmap_area_lazy();
> + reclaim_and_purge_vmap_areas();
> purged = true;
>
> /* Before "retry", check if we recover. */
>
>