Hi,
OK, it seems that this did indeed fix the issue, and I do not see the leak now.
Thanks for fixing it :)
By the way, if you guys need a VPS with GPU resources, I can give you a free ride on SimplePod.ai for helping to fix that issue :)




--

tel. 790 202 300

Tytus Rogalewski

Dolina Krzemowa 6A

83-010 Jagatowo

NIP: 9570976234



On Thu, 13 Nov 2025 at 01:43, Harry Yoo <harry.yoo@oracle.com> wrote:
On Wed, Nov 12, 2025 at 10:46:45AM -0800, Darrick J. Wong wrote:
> On Tue, Nov 11, 2025 at 09:53:31PM +0900, Harry Yoo wrote:
> > The commit 989b09b73978 ("slab: skip percpu sheaves for remote object
> > freeing") introduced the remote_objects array in free_to_pcs_bulk() to
> > skip sheaves when objects from a remote node are freed.
> >
> > However, the array is flushed only when:
> >   1) the array becomes full (++remote_nr >= PCS_BATCH_MAX), or
> >   2) slab_free_hook() returns false and size becomes zero.
> >
> > When neither of the conditions is met, objects in the array are leaked.
> > This resulted in a memory leak [1], where 82 GiB of memory was allocated
> > for the maple_node cache.
> >
> > Flush the array after successfully freeing objects to sheaves
> > in the do_free: path.
> >
> > In the meantime, move the snippet if (!size) goto flush_remote; outside
> > the while loop for readability. Let's say all objects in the array are
> > from a remote node: then we acquire s->cpu_sheaves->lock and try to free
> > an object even when size is zero. This doesn't appear to be harmful,
> > but isn't really readable.
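
For readers following along, the leak pattern described above can be modelled in a few lines of userspace C. This is only a simplified sketch, not the kernel code: BATCH_MAX, struct model, flush_remote() and bulk_free() are hypothetical stand-ins for PCS_BATCH_MAX, the slab state, the flush_remote: label and free_to_pcs_bulk(), and the counters just track where each object ends up.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 4 /* hypothetical stand-in for PCS_BATCH_MAX */

/* Hypothetical bookkeeping: counts where freed objects ended up. */
struct model {
	size_t freed_remote; /* objects handed back by a remote flush */
	size_t freed_local;  /* objects freed to the (modelled) sheaves */
};

/* Models flushing the remote_objects array: release everything in it. */
static void flush_remote(struct model *m, size_t *remote_nr)
{
	m->freed_remote += *remote_nr;
	*remote_nr = 0;
}

/*
 * Simplified model of the bulk free path. Remote objects are collected
 * in a batch that is flushed when it becomes full. With flush_at_end set
 * to false, a partially filled batch is never flushed (the bug); with
 * true, the remainder is flushed after the local frees (the idea of the
 * fix). Returns the number of objects left unflushed, i.e. leaked.
 */
static size_t bulk_free(struct model *m, const bool *is_remote,
			size_t size, bool flush_at_end)
{
	size_t remote_nr = 0;
	size_t local = 0;

	for (size_t i = 0; i < size; i++) {
		if (is_remote[i]) {
			if (++remote_nr >= BATCH_MAX)
				flush_remote(m, &remote_nr);
		} else {
			local++;
		}
	}

	m->freed_local += local; /* models the do_free: path */

	if (flush_at_end && remote_nr)
		flush_remote(m, &remote_nr);

	return remote_nr;
}
```

In this model, freeing five objects of which three are remote leaves three objects behind when flush_at_end is false, because the batch never reaches BATCH_MAX. With flush_at_end set to true nothing is left behind.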
>
> I'll put this on my test fleet this evening.  Thank you for the quick
> fix! :)

Thanks for testing, Darrick!

--
Cheers,
Harry / Hyeonggon