* [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
@ 2025-10-23 12:01 Vlastimil Babka
2025-10-23 12:56 ` Harry Yoo
2025-10-23 23:13 ` Alexei Starovoitov
0 siblings, 2 replies; 9+ messages in thread
From: Vlastimil Babka @ 2025-10-23 12:01 UTC (permalink / raw)
To: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
Harry Yoo, Alexei Starovoitov
Cc: linux-mm, linux-kernel, Vlastimil Babka

Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
that we discard the newly allocated slab if we can't spin and we fail to
trylock. As a result we don't perform inc_slabs_node() later in the
function. Instead we perform a deferred deactivate_slab(), which can
either put the unaccounted slab on the partial list, or discard it
immediately while performing dec_slabs_node(). Either way causes an
accounting imbalance.
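
Schematically, the unlucky path goes like this (a simplified sketch of
the imbalance, not the exact code):

	slab = new_slab(s, gfpflags, node);	/* not counted yet */

	if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
		slab->frozen = 1;
		/*
		 * The deferred work either puts the never-counted slab
		 * on the partial list, or does discard_slab() and thus
		 * dec_slabs_node() -- nr_slabs ends up off by one.
		 */
		defer_deactivate_slab(slab, NULL);
		return NULL;
	}
	...
	inc_slabs_node(s, nid, slab->objects);	/* skipped on the path above */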

Fix this by not marking the slab as frozen, and using free_slab()
instead of deactivate_slab() for non-frozen slabs in
free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
case. By not using discard_slab() we avoid dec_slabs_node().

Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- Fix the problem differently. Harry pointed out that we can't move
  inc_slabs_node() outside of list_lock protected regions as that would
  reintroduce issues fixed by commit c7323a5ad078
- Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
---
 mm/slub.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 23d8f54e9486..87a1d2f9de0d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
 
 	if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
 		/* Unlucky, discard newly allocated slab */
-		slab->frozen = 1;
 		defer_deactivate_slab(slab, NULL);
 		return NULL;
 	}
@@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
 		struct slab *slab = container_of(pos, struct slab, llnode);
 
 #ifdef CONFIG_SLUB_TINY
-		discard_slab(slab->slab_cache, slab);
+		free_slab(slab->slab_cache, slab);
 #else
-		deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
+		if (slab->frozen)
+			deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
+		else
+			free_slab(slab->slab_cache, slab);
 #endif
 	}
 }
---
base-commit: 6ed8bfd24ce1cb31742b09a3eb557cd008533eec
change-id: 20251022-fix-slab-accounting-f0abbda8a6ff

Best regards,
--
Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-23 12:01 [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab() Vlastimil Babka
@ 2025-10-23 12:56 ` Harry Yoo
  2025-10-23 23:13 ` Alexei Starovoitov
  1 sibling, 0 replies; 9+ messages in thread
From: Harry Yoo @ 2025-10-23 12:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, linux-mm, linux-kernel

On Thu, Oct 23, 2025 at 02:01:07PM +0200, Vlastimil Babka wrote:
> Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> that we discard the newly allocated slab if we can't spin and we fail to
> trylock. As a result we don't perform inc_slabs_node() later in the
> function. Instead we perform a deferred deactivate_slab(), which can
> either put the unaccounted slab on the partial list, or discard it
> immediately while performing dec_slabs_node(). Either way causes an
> accounting imbalance.
>
> Fix this by not marking the slab as frozen, and using free_slab()
> instead of deactivate_slab() for non-frozen slabs in
> free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> case. By not using discard_slab() we avoid dec_slabs_node().
>
> Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Changes in v2:
> - Fix the problem differently. Harry pointed out that we can't move
>   inc_slabs_node() outside of list_lock protected regions as that would
>   reintroduce issues fixed by commit c7323a5ad078
> - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> ---

Looks good to me,

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-23 12:01 [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab() Vlastimil Babka
  2025-10-23 12:56 ` Harry Yoo
@ 2025-10-23 23:13 ` Alexei Starovoitov
  2025-10-24  0:00   ` Harry Yoo
  1 sibling, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-10-23 23:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Harry Yoo, Alexei Starovoitov, linux-mm, LKML

On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> [...]
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 23d8f54e9486..87a1d2f9de0d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>
>         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
>                 /* Unlucky, discard newly allocated slab */
> -               slab->frozen = 1;
>                 defer_deactivate_slab(slab, NULL);
>                 return NULL;
>         }
> @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
>                 struct slab *slab = container_of(pos, struct slab, llnode);
>
> #ifdef CONFIG_SLUB_TINY
> -               discard_slab(slab->slab_cache, slab);
> +               free_slab(slab->slab_cache, slab);
> #else
> -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> +               if (slab->frozen)
> +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> +               else
> +                       free_slab(slab->slab_cache, slab);

A bit odd to use 'frozen' flag as such a signal.
I guess I'm worried that truly !frozen slab can come here
via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
And things will be much worse than just accounting.

Maybe add

	inc_slabs_node(s, nid, slab->objects);

right before

	defer_deactivate_slab(slab, NULL);
	return NULL;

I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
It's been 3 years since.

We have an empty slab here that is going to be freed soon.
It's effectively frozen, so inc_slabs_node() on it seems like a safe fix.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-23 23:13 ` Alexei Starovoitov
@ 2025-10-24  0:00   ` Harry Yoo
  2025-10-24  1:17     ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Harry Yoo @ 2025-10-24  0:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm, LKML

On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > [...]
> >
> > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> >                 struct slab *slab = container_of(pos, struct slab, llnode);
> >
> > #ifdef CONFIG_SLUB_TINY
> > -               discard_slab(slab->slab_cache, slab);
> > +               free_slab(slab->slab_cache, slab);
> > #else
> > -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > +               if (slab->frozen)
> > +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > +               else
> > +                       free_slab(slab->slab_cache, slab);
>
> A bit odd to use 'frozen' flag as such a signal.
> I guess I'm worried that truly !frozen slab can come here
> via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> And things will be much worse than just accounting.

But the cpu slab must have been frozen before it's attached to
c->slab?

> Maybe add
>
> 	inc_slabs_node(s, nid, slab->objects);
>
> right before
>
> 	defer_deactivate_slab(slab, NULL);
> 	return NULL;
>
> I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> It's been 3 years since.

When n->nr_slabs is inconsistent, validate_slab_node() might report an
error (false positive) when someone wrote '1' to
/sys/kernel/slab/<cache name>/validate

> We have an empty slab here that is going to be freed soon.
> It's effectively frozen, so inc_slabs_node() on it seems like a safe fix.

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  0:00   ` Harry Yoo
@ 2025-10-24  1:17     ` Alexei Starovoitov
  2025-10-24  2:03       ` Harry Yoo
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-10-24  1:17 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm, LKML

On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> > [...]
> >
> > A bit odd to use 'frozen' flag as such a signal.
> > I guess I'm worried that truly !frozen slab can come here
> > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> > And things will be much worse than just accounting.
>
> But the cpu slab must have been frozen before it's attached to
> c->slab?

Is it?
the path is

	c = slub_get_cpu_ptr(s->cpu_slab);
	if (unlikely(c->slab)) {
		struct slab *flush_slab = c->slab;

		defer_deactivate_slab(flush_slab, ...);

I don't see why it would be frozen.

> > Maybe add
> > 	inc_slabs_node(s, nid, slab->objects);
> > right before
> > 	defer_deactivate_slab(slab, NULL);
> > 	return NULL;
> >
> > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> > It's been 3 years since.
>
> When n->nr_slabs is inconsistent, validate_slab_node() might report an
> error (false positive) when someone wrote '1' to
> /sys/kernel/slab/<cache name>/validate

Ok. I see it now. It's the actual number of elements in the n->full
list that needs to match n->nr_slabs.

But then how is it not broken already?
I see that alloc_single_from_new_slab()
unconditionally does inc_slabs_node(), but
the slab itself is added either to the n->full or n->partial lists.
And validate_slab_node() should be complaining already.

Anyway, I'm not arguing. Just trying to understand.
If you think the fix is fine, then go ahead.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  1:17     ` Alexei Starovoitov
@ 2025-10-24  2:03       ` Harry Yoo
  2025-10-24  8:55         ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Harry Yoo @ 2025-10-24  2:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm, LKML

On Thu, Oct 23, 2025 at 06:17:19PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
> > [...]
> >
> > But the cpu slab must have been frozen before it's attached to
> > c->slab?
>
> Is it?
> the path is
>
> 	c = slub_get_cpu_ptr(s->cpu_slab);
> 	if (unlikely(c->slab)) {
> 		struct slab *flush_slab = c->slab;
>
> 		defer_deactivate_slab(flush_slab, ...);
>
> I don't see why it would be frozen.

Oh god. I was going to say the cpu slab is always frozen. It has been
true for a very long time, but it seems it's not true after commit
90b1e56641 ("mm/slub: directly load freelist from cpu partial slab in
the likely case").

So I think you're right that a non-frozen slab can go through
free_slab() in free_deferred_objects()...

But fixing this should be simple. Add something like
freeze_and_get_freelist() and call it when SLUB takes a slab from the
per-cpu partial slab list?

> > Maybe add
> > 	inc_slabs_node(s, nid, slab->objects);
> > right before
> > 	defer_deactivate_slab(slab, NULL);
> > 	return NULL;
> >
> > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> > It's been 3 years since.
>
> Ok. I see it now. It's the actual number of elements in the n->full
> list that needs to match n->nr_slabs.
>
> But then how is it not broken already?
> I see that alloc_single_from_new_slab()
> unconditionally does inc_slabs_node(), but

It increments n->nr_slabs. It doesn't matter which list it's going to be
added to, because it's the total number of slabs in that node.

> the slab itself is added either to the n->full or n->partial lists.

and then n->nr_partial is also incremented if it's added to n->partial.

> And validate_slab_node() should be complaining already.

The debug routine checks whether:
- the number of slabs in n->partial == n->nr_partial
- the number of slabs in n->full + n->partial == n->nr_slabs

under n->list_lock. So it's not broken?
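
I.e., roughly this (an abbreviated sketch of the check, not the exact
code):

	static int validate_slab_node(struct kmem_cache *s,
				      struct kmem_cache_node *n, ...)
	{
		unsigned long count = 0;
		struct slab *slab;

		spin_lock_irqsave(&n->list_lock, flags);

		/* count the partial list and compare against n->nr_partial */
		list_for_each_entry(slab, &n->partial, slab_list)
			count++;
		if (count != n->nr_partial)
			pr_err(...);	/* false positive if the counter lags */

		if (!(s->flags & SLAB_STORE_USER))
			goto out;	/* full list is only tracked for debug caches */

		/* partial + full together must match n->nr_slabs */
		list_for_each_entry(slab, &n->full, slab_list)
			count++;
		if (count != node_nr_slabs(n))
			pr_err(...);
	out:
		spin_unlock_irqrestore(&n->list_lock, flags);
		...
	}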

> Anyway, I'm not arguing. Just trying to understand.
> If you think the fix is fine, then go ahead.

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  2:03       ` Harry Yoo
@ 2025-10-24  8:55         ` Vlastimil Babka
  2025-10-24  9:36           ` Harry Yoo
  0 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2025-10-24  8:55 UTC (permalink / raw)
  To: Harry Yoo, Alexei Starovoitov
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, linux-mm, LKML

On 10/24/25 04:03, Harry Yoo wrote:
> On Thu, Oct 23, 2025 at 06:17:19PM -0700, Alexei Starovoitov wrote:
>> On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
>> > On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
>> > > [...]
>> > > A bit odd to use 'frozen' flag as such a signal.
>> > > I guess I'm worried that truly !frozen slab can come here
>> > > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
>> > > And things will be much worse than just accounting.
>> >
>> > But the cpu slab must have been frozen before it's attached to
>> > c->slab?

Note that deactivate_slab() contains VM_BUG_ON(!old.frozen);
we would have seen this triggered if we were passing unfrozen slabs to
(defer_)deactivate_slab(). I assume it's also why the "unlucky, discard"
code marks it frozen before calling defer_deactivate_slab() (and this patch
removes that).

>> Is it?
>> the path is
>>
>> 	c = slub_get_cpu_ptr(s->cpu_slab);
>> 	if (unlikely(c->slab)) {
>> 		struct slab *flush_slab = c->slab;
>>
>> 		defer_deactivate_slab(flush_slab, ...);
>>
>> I don't see why it would be frozen.

c->slab is always frozen, that's an invariant

> Oh god. I was going to say the cpu slab is always frozen. It has been
> true for a very long time, but it seems it's not true after commit
> 90b1e56641 ("mm/slub: directly load freelist from cpu partial slab in
> the likely case").

It's still true. That commit only removes VM_BUG_ON(!new.frozen); where
"new" is in fact the old state - when slab is on the cpu partial list it's
not yet frozen. get_freelist() then sets new.frozen = freelist != NULL;
and we know that freelist can't be NULL for a slab on the cpu partial list.
The commit even added VM_BUG_ON(!freelist); on the get_freelist() result
for this case.
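
For reference, get_freelist() does roughly this (abbreviated sketch;
details differ between kernel versions):

	static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
	{
		struct slab new;
		unsigned long counters;
		void *freelist;

		do {
			freelist = slab->freelist;
			counters = slab->counters;
			new.counters = counters;
			new.inuse = slab->objects;
			/* freezes the slab whenever a freelist remains */
			new.frozen = freelist != NULL;
		} while (!__slab_update_freelist(s, slab,
			freelist, counters,
			NULL, new.counters,
			"get_freelist"));

		return freelist;
	}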

So I think we're fine?

> So I think you're right that a non-frozen slab can go through
> free_slab() in free_deferred_objects()...
>
> But fixing this should be simple. Add something like
> freeze_and_get_freelist() and call it when SLUB takes a slab from the
> per-cpu partial slab list?
>
> [...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  8:55         ` Vlastimil Babka
@ 2025-10-24  9:36           ` Harry Yoo
  2025-10-24 18:08             ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Harry Yoo @ 2025-10-24  9:36 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Alexei Starovoitov, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm, LKML

On Fri, Oct 24, 2025 at 10:55:20AM +0200, Vlastimil Babka wrote:
> On 10/24/25 04:03, Harry Yoo wrote:
> > [...]
> > Oh god. I was going to say the cpu slab is always frozen. It has been
> > true for a very long time, but it seems it's not true after commit
> > 90b1e56641 ("mm/slub: directly load freelist from cpu partial slab in
> > the likely case").
>
> It's still true. That commit only removes VM_BUG_ON(!new.frozen); where
> "new" is in fact the old state - when slab is on the cpu partial list it's
> not yet frozen. get_freelist() then sets new.frozen = freelist != NULL;

(scratching cheek in embarrassment)

You're right. I was thinking that changing from calling freeze_slab() &
get_freelist() to calling just get_freelist() would make it non-frozen,
but actually get_freelist() freezes it! My mistake.

> and we know that freelist can't be NULL for a slab on the cpu partial list.
> The commit even added VM_BUG_ON(!freelist); on the get_freelist() result
> for this case.

Yes, as long as it's on the percpu partial slab list, it cannot be NULL.

> So I think we're fine?

Yes.

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  9:36           ` Harry Yoo
@ 2025-10-24 18:08             ` Alexei Starovoitov
  0 siblings, 0 replies; 9+ messages in thread
From: Alexei Starovoitov @ 2025-10-24 18:08 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm, LKML

On Fri, Oct 24, 2025 at 2:36 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> > Note that deactivate_slab() contains VM_BUG_ON(!old.frozen);
> > we would have seen this triggered if we were passing unfrozen slabs to
> > (defer_)deactivate_slab(). I assume it's also why the "unlucky, discard"
> > code marks it frozen before calling defer_deactivate_slab() (and this patch
> > removes that).

Yes, exactly, since that's what deactivate_slab() wants to see.

> > So I think we're fine?
>
> Yes.

All my concerns were answered.
I guess I understand it enough now to:

Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-10-24 18:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-23 12:01 [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab() Vlastimil Babka
2025-10-23 12:56 ` Harry Yoo
2025-10-23 23:13 ` Alexei Starovoitov
2025-10-24  0:00   ` Harry Yoo
2025-10-24  1:17     ` Alexei Starovoitov
2025-10-24  2:03       ` Harry Yoo
2025-10-24  8:55         ` Vlastimil Babka
2025-10-24  9:36           ` Harry Yoo
2025-10-24 18:08             ` Alexei Starovoitov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox