linux-mm.kvack.org archive mirror
From: Harry Yoo <harry.yoo@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Hao Li <hao.li@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Ming Lei <ming.lei@redhat.com>
Subject: Re: [PATCH slab/for-next-fixes] mm/slab: allow sheaf refill if blocking is not allowed
Date: Wed, 4 Mar 2026 19:03:20 +0900
Message-ID: <aagDaOUvgMSipjXa@hyeyoo>
In-Reply-To: <a7494308-cec6-43c7-aa17-a438747b50c3@suse.cz>

On Wed, Mar 04, 2026 at 10:58:58AM +0100, Vlastimil Babka wrote:
> On 3/4/26 4:05 AM, Harry Yoo wrote:
> > On Mon, Mar 02, 2026 at 10:55:37AM +0100, Vlastimil Babka (SUSE) wrote:
> >> Ming Lei reported [1] a regression in the ublk null target benchmark due
> >> to sheaves. The profile shows that the alloc_from_pcs() fastpath fails
> >> and allocations fall back to ___slab_alloc(). It also shows the
> >> allocations happen through mempool_alloc().
> >>
> >> The strategy of mempool_alloc() is to call the underlying allocator
> >> (here slab) without __GFP_DIRECT_RECLAIM first. This does not play well
> >> with __pcs_replace_empty_main() checking for gfpflags_allow_blocking()
> >> to decide if it should refill an empty sheaf or fall back to the
> >> slowpath, so we end up falling back.
> >>
> >> We could change the mempool strategy but there might be other paths
> >> doing the same thing. So instead allow sheaf refill when blocking is not
> >> allowed, changing the condition to gfpflags_allow_spinning(). The
> >> original condition was unnecessarily restrictive.
> >>
> >> Note this doesn't fully resolve the regression [1], as another component
> >> of it is memoryless nodes, which is to be addressed separately.
> >>
> >> Reported-by: Ming Lei <ming.lei@redhat.com>
> >> Fixes: e47c897a2949 ("slab: add sheaves to most caches")
> >> Link: https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
> >> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> >> ---
> >>  mm/slub.c | 21 +++++++++------------
> >>  1 file changed, 9 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index b1e9f16ba435..17b200695e9b 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -4632,11 +4631,8 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
> >>  	if (!full)
> >>  		return NULL;
> >>  
> >> -	/*
> >> -	 * we can reach here only when gfpflags_allow_blocking
> >> -	 * so this must not be an irq
> >> -	 */
> >> -	local_lock(&s->cpu_sheaves->lock);
> >> +	if (!local_trylock(&s->cpu_sheaves->lock))
> >> +		goto barn_put;
> > 
> > My AI buddy says (don't worry, I filtered it):
> > | When local_trylock() fails above, the function jumps to barn_put and returns
> > | pcs without holding the lock. This appears to violate the function's contract
> > | documented in the comment at the beginning of __pcs_replace_empty_main():
> > | 
> > |     "If not successful, returns NULL and the local lock unlocked."
> > | 
> > | The caller in alloc_from_pcs() checks for NULL to detect failure:
> > | 
> > |     if (unlikely(pcs->main->size == 0)) {
> > |         pcs = __pcs_replace_empty_main(s, pcs, gfp);
> > |         if (unlikely(!pcs))
> > |             return NULL;
> > |     }
> > | 
> > | If the trylock fails and pcs (non-NULL) is returned, the caller proceeds
> > | without realizing the lock was never re-acquired. This leads to accessing
> > | pcs->main without the lock and later trying to unlock a lock that isn't held.
> > 
> > And the analysis sounds correct to me.
> > 
> > perhaps it should be:
> > 
> > if (!local_trylock(&s->cpu_sheaves->lock)) {
> > 	pcs = NULL;
> > 	goto barn_put;
> > }
> 
> Thanks a lot Harry. In fact I realized this mistake after initially
> sending the patch to Ming in a reply, and fixed it locally (same as you
> suggest).
> Or so I thought, because the fix apparently got lost.

That happens sometimes, yeah :)

> So I'll do that now in slab/for-next-fixes

Thanks.

> Or actually I think a more robust way is to set pcs = NULL after the
> unlock, unconditionally, so I'll do that.

Oh, that sounds better!
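
For illustration, here is a minimal userspace sketch of the contract being
discussed. It is not the slub code: every toy_* name, the capacity
constant and the simulate_relock_failure knob are made up for this example,
and the lock is a plain flag rather than a real local lock. It only shows
why clearing pcs right after the unlock means every exit taken without the
lock reports failure to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SHEAF_CAPACITY 8	/* arbitrary size for the sketch */

/* Hypothetical stand-in for the per-CPU sheaves; not the kernel struct. */
struct toy_pcs {
	bool locked;		/* models the local lock */
	int main_size;		/* objects left in the main sheaf */
};

struct toy_cache {
	struct toy_pcs cpu;
	bool simulate_relock_failure;	/* test knob: force the relock to fail */
};

static bool toy_trylock(struct toy_cache *s)
{
	if (s->cpu.locked)
		return false;
	s->cpu.locked = true;
	return true;
}

static void toy_unlock(struct toy_cache *s)
{
	s->cpu.locked = false;
}

/*
 * Called with the lock held. On success returns the sheaves with the lock
 * held; on failure returns NULL with the lock released.
 */
static struct toy_pcs *toy_replace_empty_main(struct toy_cache *s,
					      struct toy_pcs *pcs)
{
	toy_unlock(s);
	/* Cleared unconditionally: any later exit without the lock sees NULL. */
	pcs = NULL;

	if (s->simulate_relock_failure || !toy_trylock(s))
		goto out;

	pcs = &s->cpu;
	pcs->main_size = TOY_SHEAF_CAPACITY;	/* pretend a full sheaf was installed */
out:
	return pcs;
}

/* Returns 0 on a fastpath hit, -1 when the caller should take the slowpath. */
static int toy_alloc(struct toy_cache *s)
{
	struct toy_pcs *pcs;

	if (!toy_trylock(s))
		return -1;
	pcs = &s->cpu;

	if (pcs->main_size == 0) {
		pcs = toy_replace_empty_main(s, pcs);
		if (!pcs)
			return -1;	/* lock is not held here */
	}

	assert(pcs->locked);	/* would trip if non-NULL came back unlocked */
	pcs->main_size--;
	toy_unlock(s);
	return 0;
}
```

With simulate_relock_failure set, toy_alloc() returns -1 and leaves the
lock released. Without it, the call succeeds and the main sheaf is left
with TOY_SHEAF_CAPACITY - 1 objects.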

> >>  	pcs = this_cpu_ptr(s->cpu_sheaves);
> >>  
> >>  	/*
> >> @@ -4667,6 +4663,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
> >>  		return pcs;
> >>  	}
> >>  
> >> +barn_put:
> >>  	barn_put_full_sheaf(barn, full);
> >>  	stat(s, BARN_PUT);

-- 
Cheers,
Harry / Hyeonggon

