linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
@ 2025-10-23 12:01 Vlastimil Babka
  2025-10-23 12:56 ` Harry Yoo
  2025-10-23 23:13 ` Alexei Starovoitov
  0 siblings, 2 replies; 9+ messages in thread
From: Vlastimil Babka @ 2025-10-23 12:01 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Harry Yoo, Alexei Starovoitov
  Cc: linux-mm, linux-kernel, Vlastimil Babka

Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
that we discard the newly allocated slab if we can't spin and we fail to
trylock. As a result we don't perform inc_slabs_node() later in the
function. Instead we perform a deferred deactivate_slab() which can
either put the unacounted slab on partial list, or discard it
immediately while performing dec_slabs_node(). Either way will cause an
accounting imbalance.

Fix this by not marking the slab as frozen, and using free_slab()
instead of deactivate_slab() for non-frozen slabs in
free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
case. By not using discard_slab() we avoid dec_slabs_node().

Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- Fix the problem differently. Harry pointed out that we can't move
  inc_slabs_node() outside of list_lock protected regions as that would
  reintroduce issues fixed by commit c7323a5ad078
- Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
---
 mm/slub.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 23d8f54e9486..87a1d2f9de0d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
 
 	if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
 		/* Unlucky, discard newly allocated slab */
-		slab->frozen = 1;
 		defer_deactivate_slab(slab, NULL);
 		return NULL;
 	}
@@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
 		struct slab *slab = container_of(pos, struct slab, llnode);
 
 #ifdef CONFIG_SLUB_TINY
-		discard_slab(slab->slab_cache, slab);
+		free_slab(slab->slab_cache, slab);
 #else
-		deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
+		if (slab->frozen)
+			deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
+		else
+			free_slab(slab->slab_cache, slab);
 #endif
 	}
 }

---
base-commit: 6ed8bfd24ce1cb31742b09a3eb557cd008533eec
change-id: 20251022-fix-slab-accounting-f0abbda8a6ff

Best regards,
-- 
Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-23 12:01 [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab() Vlastimil Babka
@ 2025-10-23 12:56 ` Harry Yoo
  2025-10-23 23:13 ` Alexei Starovoitov
  1 sibling, 0 replies; 9+ messages in thread
From: Harry Yoo @ 2025-10-23 12:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, linux-mm, linux-kernel

On Thu, Oct 23, 2025 at 02:01:07PM +0200, Vlastimil Babka wrote:
> Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> that we discard the newly allocated slab if we can't spin and we fail to
> trylock. As a result we don't perform inc_slabs_node() later in the
> function. Instead we perform a deferred deactivate_slab() which can
> either put the unacounted slab on partial list, or discard it
> immediately while performing dec_slabs_node(). Either way will cause an
> accounting imbalance.
> 
> Fix this by not marking the slab as frozen, and using free_slab()
> instead of deactivate_slab() for non-frozen slabs in
> free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> case. By not using discard_slab() we avoid dec_slabs_node().
> 
> Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Changes in v2:
> - Fix the problem differently. Harry pointed out that we can't move
>   inc_slabs_node() outside of list_lock protected regions as that would
>   reintroduce issues fixed by commit c7323a5ad078
> - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> ---

Looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-23 12:01 [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab() Vlastimil Babka
  2025-10-23 12:56 ` Harry Yoo
@ 2025-10-23 23:13 ` Alexei Starovoitov
  2025-10-24  0:00   ` Harry Yoo
  1 sibling, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-10-23 23:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Harry Yoo, Alexei Starovoitov, linux-mm, LKML

On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> that we discard the newly allocated slab if we can't spin and we fail to
> trylock. As a result we don't perform inc_slabs_node() later in the
> function. Instead we perform a deferred deactivate_slab() which can
> either put the unacounted slab on partial list, or discard it
> immediately while performing dec_slabs_node(). Either way will cause an
> accounting imbalance.
>
> Fix this by not marking the slab as frozen, and using free_slab()
> instead of deactivate_slab() for non-frozen slabs in
> free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> case. By not using discard_slab() we avoid dec_slabs_node().
>
> Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Changes in v2:
> - Fix the problem differently. Harry pointed out that we can't move
>   inc_slabs_node() outside of list_lock protected regions as that would
>   reintroduce issues fixed by commit c7323a5ad078
> - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> ---
>  mm/slub.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 23d8f54e9486..87a1d2f9de0d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>
>         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
>                 /* Unlucky, discard newly allocated slab */
> -               slab->frozen = 1;
>                 defer_deactivate_slab(slab, NULL);
>                 return NULL;
>         }
> @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
>                 struct slab *slab = container_of(pos, struct slab, llnode);
>
>  #ifdef CONFIG_SLUB_TINY
> -               discard_slab(slab->slab_cache, slab);
> +               free_slab(slab->slab_cache, slab);
>  #else
> -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> +               if (slab->frozen)
> +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> +               else
> +                       free_slab(slab->slab_cache, slab);

A bit odd to use 'frozen' flag as such a signal.
I guess I'm worried that truly !frozen slab can come here
via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
And things will be much worse than just accounting.

Maybe add
  inc_slabs_node(s, nid, slab->objects);
right before
  defer_deactivate_slab(slab, NULL);
  return NULL;

I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
It's been 3 years since.
We have an empty slab here that is going to be freed soon.
It's effectively frozen, so inc_slabs_node() on it seems like a safe fix.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-23 23:13 ` Alexei Starovoitov
@ 2025-10-24  0:00   ` Harry Yoo
  2025-10-24  1:17     ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Harry Yoo @ 2025-10-24  0:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm,
	LKML

On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> > that we discard the newly allocated slab if we can't spin and we fail to
> > trylock. As a result we don't perform inc_slabs_node() later in the
> > function. Instead we perform a deferred deactivate_slab() which can
> > either put the unacounted slab on partial list, or discard it
> > immediately while performing dec_slabs_node(). Either way will cause an
> > accounting imbalance.
> >
> > Fix this by not marking the slab as frozen, and using free_slab()
> > instead of deactivate_slab() for non-frozen slabs in
> > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> > case. By not using discard_slab() we avoid dec_slabs_node().
> >
> > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > ---
> > Changes in v2:
> > - Fix the problem differently. Harry pointed out that we can't move
> >   inc_slabs_node() outside of list_lock protected regions as that would
> >   reintroduce issues fixed by commit c7323a5ad078
> > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> > ---
> >  mm/slub.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 23d8f54e9486..87a1d2f9de0d 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> >
> >         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
> >                 /* Unlucky, discard newly allocated slab */
> > -               slab->frozen = 1;
> >                 defer_deactivate_slab(slab, NULL);
> >                 return NULL;
> >         }
> > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> >                 struct slab *slab = container_of(pos, struct slab, llnode);
> >
> >  #ifdef CONFIG_SLUB_TINY
> > -               discard_slab(slab->slab_cache, slab);
> > +               free_slab(slab->slab_cache, slab);
> >  #else
> > -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > +               if (slab->frozen)
> > +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > +               else
> > +                       free_slab(slab->slab_cache, slab);
> 
> A bit odd to use 'frozen' flag as such a signal.
> I guess I'm worried that truly !frozen slab can come here
> via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> And things will be much worse than just accounting.

But the cpu slab must have been frozen before it's attached to
c->slab?

> Maybe add
>   inc_slabs_node(s, nid, slab->objects);
> right before
>   defer_deactivate_slab(slab, NULL);
>   return NULL;
> 
> I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> It's been 3 years since.

When n->nr_slabs is inconsistent, validate_slab_node() might report an
error (false positive) when someone wrote '1' to
/sys/kernel/slab/<cache name>/validate

> We have an empty slab here that is going to be freed soon.
> It's effectively frozen, so inc_slabs_node() on it seems like a safe fix.

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  0:00   ` Harry Yoo
@ 2025-10-24  1:17     ` Alexei Starovoitov
  2025-10-24  2:03       ` Harry Yoo
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-10-24  1:17 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm,
	LKML

On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >
> > > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> > > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> > > that we discard the newly allocated slab if we can't spin and we fail to
> > > trylock. As a result we don't perform inc_slabs_node() later in the
> > > function. Instead we perform a deferred deactivate_slab() which can
> > > either put the unacounted slab on partial list, or discard it
> > > immediately while performing dec_slabs_node(). Either way will cause an
> > > accounting imbalance.
> > >
> > > Fix this by not marking the slab as frozen, and using free_slab()
> > > instead of deactivate_slab() for non-frozen slabs in
> > > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> > > case. By not using discard_slab() we avoid dec_slabs_node().
> > >
> > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > > ---
> > > Changes in v2:
> > > - Fix the problem differently. Harry pointed out that we can't move
> > >   inc_slabs_node() outside of list_lock protected regions as that would
> > >   reintroduce issues fixed by commit c7323a5ad078
> > > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> > > ---
> > >  mm/slub.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 23d8f54e9486..87a1d2f9de0d 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > >
> > >         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
> > >                 /* Unlucky, discard newly allocated slab */
> > > -               slab->frozen = 1;
> > >                 defer_deactivate_slab(slab, NULL);
> > >                 return NULL;
> > >         }
> > > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> > >                 struct slab *slab = container_of(pos, struct slab, llnode);
> > >
> > >  #ifdef CONFIG_SLUB_TINY
> > > -               discard_slab(slab->slab_cache, slab);
> > > +               free_slab(slab->slab_cache, slab);
> > >  #else
> > > -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > +               if (slab->frozen)
> > > +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > +               else
> > > +                       free_slab(slab->slab_cache, slab);
> >
> > A bit odd to use 'frozen' flag as such a signal.
> > I guess I'm worried that truly !frozen slab can come here
> > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> > And things will be much worse than just accounting.
>
> But the cpu slab must have been frozen before it's attached to
> c->slab?

Is it?
the path is
c = slub_get_cpu_ptr(s->cpu_slab);
if (unlikely(c->slab)) {
   struct slab *flush_slab = c->slab;
   defer_deactivate_slab(flush_slab, ...);

I don't see why it would be frozen.

> > Maybe add
> >   inc_slabs_node(s, nid, slab->objects);
> > right before
> >   defer_deactivate_slab(slab, NULL);
> >   return NULL;
> >
> > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> > It's been 3 years since.
>
> When n->nr_slabs is inconsistent, validate_slab_node() might report an
> error (false positive) when someone wrote '1' to
> /sys/kernel/slab/<cache name>/validate

Ok. I see it now. It's the actual number of elements in n->full
list needs to match n->nr_slabs.

But then how it's not broken already?
I see that
alloc_single_from_new_slab()
unconditionally does inc_slabs_node(), but
slab itself is added either to n->full or n->partial lists.
And validate_slab_node() should be complaining already.

Anyway, I'm not arguing. Just trying to understand.
If you think the fix is fine, then go ahead.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  1:17     ` Alexei Starovoitov
@ 2025-10-24  2:03       ` Harry Yoo
  2025-10-24  8:55         ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Harry Yoo @ 2025-10-24  2:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm,
	LKML

On Thu, Oct 23, 2025 at 06:17:19PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> > > > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> > > > that we discard the newly allocated slab if we can't spin and we fail to
> > > > trylock. As a result we don't perform inc_slabs_node() later in the
> > > > function. Instead we perform a deferred deactivate_slab() which can
> > > > either put the unacounted slab on partial list, or discard it
> > > > immediately while performing dec_slabs_node(). Either way will cause an
> > > > accounting imbalance.
> > > >
> > > > Fix this by not marking the slab as frozen, and using free_slab()
> > > > instead of deactivate_slab() for non-frozen slabs in
> > > > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> > > > case. By not using discard_slab() we avoid dec_slabs_node().
> > > >
> > > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > > > ---
> > > > Changes in v2:
> > > > - Fix the problem differently. Harry pointed out that we can't move
> > > >   inc_slabs_node() outside of list_lock protected regions as that would
> > > >   reintroduce issues fixed by commit c7323a5ad078
> > > > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> > > > ---
> > > >  mm/slub.c | 8 +++++---
> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > index 23d8f54e9486..87a1d2f9de0d 100644
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > > >
> > > >         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
> > > >                 /* Unlucky, discard newly allocated slab */
> > > > -               slab->frozen = 1;
> > > >                 defer_deactivate_slab(slab, NULL);
> > > >                 return NULL;
> > > >         }
> > > > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> > > >                 struct slab *slab = container_of(pos, struct slab, llnode);
> > > >
> > > >  #ifdef CONFIG_SLUB_TINY
> > > > -               discard_slab(slab->slab_cache, slab);
> > > > +               free_slab(slab->slab_cache, slab);
> > > >  #else
> > > > -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > > +               if (slab->frozen)
> > > > +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > > > +               else
> > > > +                       free_slab(slab->slab_cache, slab);
> > >
> > > A bit odd to use 'frozen' flag as such a signal.
> > > I guess I'm worried that truly !frozen slab can come here
> > > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> > > And things will be much worse than just accounting.
> >
> > But the cpu slab must have been frozen before it's attached to
> > c->slab?
> 
> Is it?
> the path is
> c = slub_get_cpu_ptr(s->cpu_slab);
> if (unlikely(c->slab)) {
>    struct slab *flush_slab = c->slab;
>    defer_deactivate_slab(flush_slab, ...);
> 
> I don't see why it would be frozen.

Oh god. I was going to say the cpu slab is always frozen. It has been
true for very long time, but it seems it's not true after commit 90b1e56641
("mm/slub: directly load freelist from cpu partial slab in the likely case").

So I think you're right that a non-frozen slab can go through
free_slab() in free_deferred_objects()...

But fixing this should be simple. Add something like
freeze_and_get_freelist() and call it when SLUB take a slab from
per-cpu partial slab list?

> > > Maybe add
> > >   inc_slabs_node(s, nid, slab->objects);
> > > right before
> > >   defer_deactivate_slab(slab, NULL);
> > >   return NULL;
> > >
> > > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> > > It's been 3 years since.
> >
> > When n->nr_slabs is inconsistent, validate_slab_node() might report an
> > error (false positive) when someone wrote '1' to
> > /sys/kernel/slab/<cache name>/validate
> 
> Ok. I see it now. It's the actual number of elements in n->full
> list needs to match n->nr_slabs.
> 
> But then how it's not broken already?
> I see that
> alloc_single_from_new_slab()
> unconditionally does inc_slabs_node(), but

It increments n->nr_slabs. It doesn't matter which list it's going to be
added to, because it's total number of slabs in that node.

> slab itself is added either to n->full or n->partial lists.

and then n->nr_partial is also incremented if it's added to n->partial.

> And validate_slab_node() should be complaining already.

The debug routine checks if:
- the number of slabs in n->partial == n->nr_partial
- the number of slabs in n->full + n->partial == n->nr_slabs

under n->list_lock. So it's not broken?

> Anyway, I'm not arguing. Just trying to understand.
> If you think the fix is fine, then go ahead.

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  2:03       ` Harry Yoo
@ 2025-10-24  8:55         ` Vlastimil Babka
  2025-10-24  9:36           ` Harry Yoo
  0 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2025-10-24  8:55 UTC (permalink / raw)
  To: Harry Yoo, Alexei Starovoitov
  Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, linux-mm, LKML

On 10/24/25 04:03, Harry Yoo wrote:
> On Thu, Oct 23, 2025 at 06:17:19PM -0700, Alexei Starovoitov wrote:
>> On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
>> >
>> > On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
>> > > On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> > > >
>> > > > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
>> > > > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
>> > > > that we discard the newly allocated slab if we can't spin and we fail to
>> > > > trylock. As a result we don't perform inc_slabs_node() later in the
>> > > > function. Instead we perform a deferred deactivate_slab() which can
>> > > > either put the unacounted slab on partial list, or discard it
>> > > > immediately while performing dec_slabs_node(). Either way will cause an
>> > > > accounting imbalance.
>> > > >
>> > > > Fix this by not marking the slab as frozen, and using free_slab()
>> > > > instead of deactivate_slab() for non-frozen slabs in
>> > > > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
>> > > > case. By not using discard_slab() we avoid dec_slabs_node().
>> > > >
>> > > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
>> > > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> > > > ---
>> > > > Changes in v2:
>> > > > - Fix the problem differently. Harry pointed out that we can't move
>> > > >   inc_slabs_node() outside of list_lock protected regions as that would
>> > > >   reintroduce issues fixed by commit c7323a5ad078
>> > > > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
>> > > > ---
>> > > >  mm/slub.c | 8 +++++---
>> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
>> > > >
>> > > > diff --git a/mm/slub.c b/mm/slub.c
>> > > > index 23d8f54e9486..87a1d2f9de0d 100644
>> > > > --- a/mm/slub.c
>> > > > +++ b/mm/slub.c
>> > > > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>> > > >
>> > > >         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
>> > > >                 /* Unlucky, discard newly allocated slab */
>> > > > -               slab->frozen = 1;
>> > > >                 defer_deactivate_slab(slab, NULL);
>> > > >                 return NULL;
>> > > >         }
>> > > > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
>> > > >                 struct slab *slab = container_of(pos, struct slab, llnode);
>> > > >
>> > > >  #ifdef CONFIG_SLUB_TINY
>> > > > -               discard_slab(slab->slab_cache, slab);
>> > > > +               free_slab(slab->slab_cache, slab);
>> > > >  #else
>> > > > -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
>> > > > +               if (slab->frozen)
>> > > > +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
>> > > > +               else
>> > > > +                       free_slab(slab->slab_cache, slab);
>> > >
>> > > A bit odd to use 'frozen' flag as such a signal.
>> > > I guess I'm worried that truly !frozen slab can come here
>> > > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
>> > > And things will be much worse than just accounting.
>> >
>> > But the cpu slab must have been frozen before it's attached to
>> > c->slab?

Note that deactivate_slab() contains VM_BUG_ON(!old.frozen);
we would have seen this triggered if we were passing unfrozen slabs to
(defer_)deactivate_slab(). I assume it's also why the "unlucky, discard"
code marks it frozen before calling defer_deactivate_slab() (and this patch
removes that).

>> Is it?
>> the path is
>> c = slub_get_cpu_ptr(s->cpu_slab);
>> if (unlikely(c->slab)) {
>>    struct slab *flush_slab = c->slab;
>>    defer_deactivate_slab(flush_slab, ...);
>> 
>> I don't see why it would be frozen.

c->slab is always frozen, that's an invariant

> 
> Oh god. I was going to say the cpu slab is always frozen. It has been
> true for very long time, but it seems it's not true after commit 90b1e56641
> ("mm/slub: directly load freelist from cpu partial slab in the likely case").

It's still true. That commit only removes VM_BUG_ON(!new.frozen); where
"new" is in fact the old state - when slab is on cpu partial list it's not
yet frozen. get_freelist() then sets new.frozen = freelist != NULL;
and we know that freelist cant't be NULL for a slab on the cpu partial list.
The commit even added VM_BUG_ON(!freelist); on the get_freelist() result for
this case.

So I think we're fine?

> So I think you're right that a non-frozen slab can go through
> free_slab() in free_deferred_objects()...
> 
> But fixing this should be simple. Add something like
> freeze_and_get_freelist() and call it when SLUB take a slab from
> per-cpu partial slab list?
> 
>> > > Maybe add
>> > >   inc_slabs_node(s, nid, slab->objects);
>> > > right before
>> > >   defer_deactivate_slab(slab, NULL);
>> > >   return NULL;
>> > >
>> > > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
>> > > It's been 3 years since.
>> >
>> > When n->nr_slabs is inconsistent, validate_slab_node() might report an
>> > error (false positive) when someone wrote '1' to
>> > /sys/kernel/slab/<cache name>/validate
>> 
>> Ok. I see it now. It's the actual number of elements in n->full
>> list needs to match n->nr_slabs.
>> 
>> But then how it's not broken already?
>> I see that
>> alloc_single_from_new_slab()
>> unconditionally does inc_slabs_node(), but
> 
> It increments n->nr_slabs. It doesn't matter which list it's going to be
> added to, because it's total number of slabs in that node.
> 
>> slab itself is added either to n->full or n->partial lists.
> 
> and then n->nr_partial is also incremented if it's added to n->partial.
> 
>> And validate_slab_node() should be complaining already.
> 
> The debug routine checks if:
> - the number of slabs in n->partial == n->nr_partial
> - the number of slabs in n->full + n->partial == n->nr_slabs
> 
> under n->list_lock. So it's not broken?
> 
>> Anyway, I'm not arguing. Just trying to understand.
>> If you think the fix is fine, then go ahead.
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  8:55         ` Vlastimil Babka
@ 2025-10-24  9:36           ` Harry Yoo
  2025-10-24 18:08             ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Harry Yoo @ 2025-10-24  9:36 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Alexei Starovoitov, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm,
	LKML

On Fri, Oct 24, 2025 at 10:55:20AM +0200, Vlastimil Babka wrote:
> On 10/24/25 04:03, Harry Yoo wrote:
> > On Thu, Oct 23, 2025 at 06:17:19PM -0700, Alexei Starovoitov wrote:
> >> On Thu, Oct 23, 2025 at 5:00 PM Harry Yoo <harry.yoo@oracle.com> wrote:
> >> >
> >> > On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> >> > > On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >> > > >
> >> > > > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> >> > > > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> >> > > > that we discard the newly allocated slab if we can't spin and we fail to
> >> > > > trylock. As a result we don't perform inc_slabs_node() later in the
> >> > > > function. Instead we perform a deferred deactivate_slab() which can
> >> > > > either put the unacounted slab on partial list, or discard it
> >> > > > immediately while performing dec_slabs_node(). Either way will cause an
> >> > > > accounting imbalance.
> >> > > >
> >> > > > Fix this by not marking the slab as frozen, and using free_slab()
> >> > > > instead of deactivate_slab() for non-frozen slabs in
> >> > > > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> >> > > > case. By not using discard_slab() we avoid dec_slabs_node().
> >> > > >
> >> > > > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> >> > > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> > > > ---
> >> > > > Changes in v2:
> >> > > > - Fix the problem differently. Harry pointed out that we can't move
> >> > > >   inc_slabs_node() outside of list_lock protected regions as that would
> >> > > >   reintroduce issues fixed by commit c7323a5ad078
> >> > > > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> >> > > > ---
> >> > > >  mm/slub.c | 8 +++++---
> >> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> >> > > >
> >> > > > diff --git a/mm/slub.c b/mm/slub.c
> >> > > > index 23d8f54e9486..87a1d2f9de0d 100644
> >> > > > --- a/mm/slub.c
> >> > > > +++ b/mm/slub.c
> >> > > > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> >> > > >
> >> > > >         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
> >> > > >                 /* Unlucky, discard newly allocated slab */
> >> > > > -               slab->frozen = 1;
> >> > > >                 defer_deactivate_slab(slab, NULL);
> >> > > >                 return NULL;
> >> > > >         }
> >> > > > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> >> > > >                 struct slab *slab = container_of(pos, struct slab, llnode);
> >> > > >
> >> > > >  #ifdef CONFIG_SLUB_TINY
> >> > > > -               discard_slab(slab->slab_cache, slab);
> >> > > > +               free_slab(slab->slab_cache, slab);
> >> > > >  #else
> >> > > > -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> >> > > > +               if (slab->frozen)
> >> > > > +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> >> > > > +               else
> >> > > > +                       free_slab(slab->slab_cache, slab);
> >> > >
> >> > > A bit odd to use 'frozen' flag as such a signal.
> >> > > I guess I'm worried that truly !frozen slab can come here
> >> > > via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> >> > > And things will be much worse than just accounting.
> >> >
> >> > But the cpu slab must have been frozen before it's attached to
> >> > c->slab?
> 
> Note that deactivate_slab() contains VM_BUG_ON(!old.frozen);
> we would have seen this triggered if we were passing unfrozen slabs to
> (defer_)deactivate_slab(). I assume it's also why the "unlucky, discard"
> code marks it frozen before calling defer_deactivate_slab() (and this patch
> removes that).
> 
> >> Is it?
> >> the path is
> >> c = slub_get_cpu_ptr(s->cpu_slab);
> >> if (unlikely(c->slab)) {
> >>    struct slab *flush_slab = c->slab;
> >>    defer_deactivate_slab(flush_slab, ...);
> >> 
> >> I don't see why it would be frozen.
> 
> c->slab is always frozen, that's an invariant
> 
> > 
> > Oh god. I was going to say the cpu slab is always frozen. It has been
> > true for very long time, but it seems it's not true after commit 90b1e56641
> > ("mm/slub: directly load freelist from cpu partial slab in the likely case").
> 
> It's still true. That commit only removes VM_BUG_ON(!new.frozen); where
> "new" is in fact the old state - when slab is on cpu partial list it's not
> yet frozen. get_freelist() then sets new.frozen = freelist != NULL;

(scratching cheek in embarrassment) You're right.

I was thinking that changing from calling freeze_slab() & get_freelist()
to calling just get_freelist() would make it non-frozen, but actually
get_freelist() freezes it! My mistake.

> and we know that freelist cant't be NULL for a slab on the cpu partial list.
> The commit even added VM_BUG_ON(!freelist); on the get_freelist() result for
> this case.

Yes, as long as it's in percpu partial slab list, it cannot be NULL.

> So I think we're fine?

Yes.

> > So I think you're right that a non-frozen slab can go through
> > free_slab() in free_deferred_objects()...
> > 
> > But fixing this should be simple. Add something like
> > freeze_and_get_freelist() and call it when SLUB take a slab from
> > per-cpu partial slab list?
> > 
> >> > > Maybe add
> >> > >   inc_slabs_node(s, nid, slab->objects);
> >> > > right before
> >> > >   defer_deactivate_slab(slab, NULL);
> >> > >   return NULL;
> >> > >
> >> > > I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> >> > > It's been 3 years since.
> >> >
> >> > When n->nr_slabs is inconsistent, validate_slab_node() might report an
> >> > error (false positive) when someone wrote '1' to
> >> > /sys/kernel/slab/<cache name>/validate
> >> 
> >> Ok. I see it now. It's the actual number of elements in n->full
> >> list needs to match n->nr_slabs.
> >> 
> >> But then how it's not broken already?
> >> I see that
> >> alloc_single_from_new_slab()
> >> unconditionally does inc_slabs_node(), but
> > 
> > It increments n->nr_slabs. It doesn't matter which list it's going to be
> > added to, because it's total number of slabs in that node.
> > 
> >> slab itself is added either to n->full or n->partial lists.
> > 
> > and then n->nr_partial is also incremented if it's added to n->partial.
> > 
> >> And validate_slab_node() should be complaining already.
> > 
> > The debug routine checks if:
> > - the number of slabs in n->partial == n->nr_partial
> > - the number of slabs in n->full + n->partial == n->nr_slabs
> > 
> > under n->list_lock. So it's not broken?
> > 
> >> Anyway, I'm not arguing. Just trying to understand.
> >> If you think the fix is fine, then go ahead.
> > 
> 

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
  2025-10-24  9:36           ` Harry Yoo
@ 2025-10-24 18:08             ` Alexei Starovoitov
  0 siblings, 0 replies; 9+ messages in thread
From: Alexei Starovoitov @ 2025-10-24 18:08 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, linux-mm,
	LKML

On Fri, Oct 24, 2025 at 2:36 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> >
> > Note that deactivate_slab() contains VM_BUG_ON(!old.frozen);
> > we would have seen this triggered if we were passing unfrozen slabs to
> > (defer_)deactivate_slab(). I assume it's also why the "unlucky, discard"
> > code marks it frozen before calling defer_deactivate_slab() (and this patch
> > removes that).

Yes, exactly, since that's what deactivate_slab() wants to see.

> > So I think we're fine?
>
> Yes.

All my concerns were answered. I guess I understand it enough now to:
Acked-by: Alexei Starovoitov <ast@kernel.org>


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-10-24 18:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-23 12:01 [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab() Vlastimil Babka
2025-10-23 12:56 ` Harry Yoo
2025-10-23 23:13 ` Alexei Starovoitov
2025-10-24  0:00   ` Harry Yoo
2025-10-24  1:17     ` Alexei Starovoitov
2025-10-24  2:03       ` Harry Yoo
2025-10-24  8:55         ` Vlastimil Babka
2025-10-24  9:36           ` Harry Yoo
2025-10-24 18:08             ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox