linux-mm.kvack.org archive mirror
* Re: 2.5.39 kmem_cache bug
       [not found]   ` <3D972828.6010807@colorfullife.com>
@ 2002-09-30  0:20     ` Ed Tomlinson
  2002-09-30  5:55       ` Manfred Spraul
  0 siblings, 1 reply; 4+ messages in thread
From: Ed Tomlinson @ 2002-09-30  0:20 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-mm

On September 29, 2002 12:19 pm, Manfred Spraul wrote:
> Ed Tomlinson wrote:
> > On September 29, 2002 09:52 am, Manfred Spraul wrote:
> >>Ed Tomlinson wrote:
> >>>-	if (__kmem_cache_shrink(cachep)) {
> >>>+	/* remove any empty partial pages */
> >>>+	spin_lock_irq(&cachep->spinlock);
> >>>+	while (!cachep->growing) {
> >>>+		struct list_head *p;
> >>>+		slab_t *slabp;
> >>>+
> >>
> >>growing is guaranteed to be false - loop is not necessary.
> >
> > Sort of.  I guess that since the lock is not dropped, if we see !growing
> > it will stay that way as long as we hold the lock.  So we do need
> > to test growing, but only once.  Have I understood this correctly?
>
> No. Much simpler:
> There is no synchronization between kmem_cache_destroy and
> kmem_cache_{alloc,free}. The caller must do that.
>
> Both
> 	x = kmem_cache_create();
> 	kmem_cache_destroy(x);
> 	kmem_cache_alloc(x);
>
> and all variants where kmem_cache_alloc runs at the same time as
> kmem_cache_destroy [smp, or just sleeping in gfp] are illegal.

So if growing is set, something is seriously wrong...

> > We do seem to agree on most issues.  Let's work with this and hopefully we
> > can end up with a first-class slab implementation that works hand in
> > hand with the vm and helps the whole system perform effectively.
>
> Yes, let's work together. Implementing & debugging slab is simple [if it
> boots, then it's correct]; the design is difficult.
>
> The first problem is the per-cpu array draining. It's needed; too many
> objects can sit in the per-cpu arrays.
> Before 2.5.39, the per-cpu arrays can cause more list operations than no
> batching; this is something that must be avoided.
>
> Do you see an alternative to a timer/callback/hook? What's the simplest
> approach to ensure that the callback runs on all cpus? I know Red Hat has
> a scalable timer patch; that one would fix the timer to the cpu that
> called add_timer.

Maybe.  If we treat the per-cpu data as a special form of cache, we could
use the shrinker callbacks to track how much we have to trim.  When the value
exceeds a threshold (set when we set up the callback) we trim.  We could
do the test in the freeing path in slab.
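
A rough sketch of this scheme in plain C (illustrative only, not actual
slab.c code: the names percpu_cache, note_pressure and trim_if_needed are
made up, and "limit" merely stands in for the cc->limit used elsewhere in
the thread):

struct percpu_cache {
	unsigned int avail;      /* objects sitting in this cpu's array */
	unsigned int limit;      /* maximum objects kept per cpu */
	unsigned int to_trim;    /* pressure accumulated by the callback */
	unsigned int threshold;  /* fixed when the callback is registered */
};

/* shrinker-style callback: record how much this cache should give back */
static void note_pressure(struct percpu_cache *cc, unsigned int pressure)
{
	cc->to_trim += pressure;
}

/* called from the freeing path: act only once the threshold is exceeded */
static unsigned int trim_if_needed(struct percpu_cache *cc)
{
	unsigned int nr;

	if (cc->to_trim < cc->threshold)
		return 0;

	nr = cc->to_trim < cc->avail ? cc->to_trim : cc->avail;
	cc->avail -= nr;         /* these would go back to the shared lists */
	cc->to_trim = 0;
	return nr;
}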

> My proposal would be a 1 (or 2, or 5) second callback that frees
> 0.2*cc->limit if there were no allocations from the slab during the
> last interval.

Using the above logic would tie the trimming to vm scanning pressure,
which is probably a good idea.  

The patch adding shrinker callbacks was posted to linux-mm on Sunday and
to lkml on Thursday.

My schedule lets me read and answer a little mail in the mornings (7-8am 
EDT).  When I get home from work (5pm EDT) I usually have a few hours to 
code etc.

Ed


* Re: 2.5.39 kmem_cache bug
  2002-09-30  0:20     ` 2.5.39 kmem_cache bug Ed Tomlinson
@ 2002-09-30  5:55       ` Manfred Spraul
  2002-09-30 11:18         ` Ed Tomlinson
  0 siblings, 1 reply; 4+ messages in thread
From: Manfred Spraul @ 2002-09-30  5:55 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-mm

Ed Tomlinson wrote:
>
>>The first problem is the per-cpu array draining. It's needed; too many
>>objects can sit in the per-cpu arrays.
>>Before 2.5.39, the per-cpu arrays can cause more list operations than no
>>batching; this is something that must be avoided.
>>
>>Do you see an alternative to a timer/callback/hook? What's the simplest
>>approach to ensure that the callback runs on all cpus? I know Red Hat has
>>a scalable timer patch; that one would fix the timer to the cpu that
>>called add_timer.
> 
> 
> Maybe.  If we treat the per-cpu data as a special form of cache, we could
> use the shrinker callbacks to track how much we have to trim.  When the value
> exceeds a threshold (set when we set up the callback) we trim.  We could
> do the test in the freeing path in slab.
>
2 problems:
* What if a cache falls completely idle? If there is freeing activity on
the cache, then the cache is active, and thus there is no need to flush;
but an idle cache never enters the freeing path, so it would never be drained.
* I don't think it's a good idea to add logic into the path that's
executed for every kfree/kmem_cache_free. A timer might not be very
pretty, but is definitely more efficient.
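
A rough sketch of the timer approach in the same spirit (illustrative C,
not kernel code): each cpu re-arms its own periodic timer, which is what
makes the drain run on every cpu, and a tick only gives back a fifth of
the limit (the 0.2*cc->limit rule proposed earlier in the thread) when
nothing was allocated during the last interval:

struct cpu_array {
	unsigned int avail;                 /* objects in this cpu's array */
	unsigned int limit;                 /* the cc->limit of the mail */
	unsigned int allocs_this_interval;  /* bumped by the allocation path */
};

/*
 * Per-cpu periodic tick; in the kernel this would be a timer re-armed
 * on each cpu.
 */
static unsigned int drain_tick(struct cpu_array *cc)
{
	unsigned int nr = 0;

	if (cc->allocs_this_interval == 0 && cc->avail) {
		nr = cc->limit / 5;             /* the 0.2*cc->limit rule */
		if (nr > cc->avail)
			nr = cc->avail;
		cc->avail -= nr;                /* back to the node lists */
	}
	cc->allocs_this_interval = 0;           /* start the next interval */
	return nr;
}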

> The patch adding shrinker callbacks was posted to linux-mm on Sunday and
> to lkml on Thursday.
> 
I'll read them.
Is it guaranteed that the shrinker callbacks are called on all cpus, or 
could some cpu binding happen?

--
	Manfred


* Re: 2.5.39 kmem_cache bug
  2002-09-30  5:55       ` Manfred Spraul
@ 2002-09-30 11:18         ` Ed Tomlinson
  2002-09-30 16:33           ` Manfred Spraul
  0 siblings, 1 reply; 4+ messages in thread
From: Ed Tomlinson @ 2002-09-30 11:18 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-mm

On September 30, 2002 01:55 am, Manfred Spraul wrote:
> Ed Tomlinson wrote:
> >>The first problem is the per-cpu array draining. It's needed; too many
> >>objects can sit in the per-cpu arrays.
> >>Before 2.5.39, the per-cpu arrays can cause more list operations than no
> >>batching; this is something that must be avoided.
> >>
> >>Do you see an alternative to a timer/callback/hook? What's the simplest
> >>approach to ensure that the callback runs on all cpus? I know Red Hat has
> >>a scalable timer patch; that one would fix the timer to the cpu that
> >>called add_timer.
> >
> > Maybe.  If we treat the per-cpu data as a special form of cache, we could
> > use the shrinker callbacks to track how much we have to trim.  When the
> > value exceeds a threshold (set when we set up the callback) we trim.  We
> > could do the test in the freeing path in slab.
>
> 2 problems:
> * What if a cache falls completely idle? If there is freeing activity on
> the cache, then the cache is active, and thus there is no need to flush;
> but an idle cache never enters the freeing path, so it would never be drained.
> * I don't think it's a good idea to add logic into the path that's
> executed for every kfree/kmem_cache_free. A timer might not be very
> pretty, but is definitely more efficient.
> > The patch adding shrinker callbacks was posted to linux-mm on Sunday and
> > to lkml on Thursday.
>
> I'll read them.
> Is it guaranteed that the shrinker callbacks are called on all cpus, or
> could some cpu binding happen?

There is no guarantee.  The best we could use them for is to link the 'pressure'
on the percpu stuff to vm pressure.  From the above it does look like timers
are the way to go.

Ed


* Re: 2.5.39 kmem_cache bug
  2002-09-30 11:18         ` Ed Tomlinson
@ 2002-09-30 16:33           ` Manfred Spraul
  0 siblings, 0 replies; 4+ messages in thread
From: Manfred Spraul @ 2002-09-30 16:33 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-mm

What's the optimal number of free objects in the partial/free lists of 
an active cache?

I'd say a few times the batchcount; otherwise a cpu won't be able to 
perform a complete refill. [During refill, at most one grow happens - 
I've assumed that swallowing 30 pages with GFP_ATOMIC in an interrupt 
handler is not nice from the system perspective.]

What about this logic:
- if there were no recent allocations performed by a cpu, then return 
cc->limit/5 objects from the cpu array to the node lists.

- If a slab becomes a free slab, and there are more than 
3*cc->batchcount*NR_CPUS/NR_NODES objects in the partial or free lists, 
then return the slab immediately to the gfp.

- If no one accessed the free list recently, then a few slabs are 
returned to gfp. [<worst case number of free slabs that can exist>/5]

The constants could be updated by vm pressure callbacks.
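
The second rule boils down to a simple comparison.  A rough model in plain
C (NR_CPUS, NR_NODES and the struct here are placeholders for illustration,
not the real kernel constants or slab structures):

#define NR_CPUS  8   /* placeholder values */
#define NR_NODES 1

struct cache_state {
	unsigned int batchcount;
	unsigned int spare_objs;   /* objects on the partial + free lists */
};

/*
 * Rule 2: a slab that just became completely free is returned to the
 * page allocator only if the lists already hold more spare objects than
 * a few refills' worth for every cpu on the node.
 */
static int return_free_slab(const struct cache_state *c)
{
	return c->spare_objs > 3 * c->batchcount * NR_CPUS / NR_NODES;
}

Rules 1 and 3 are the periodic part; the factor 3 here and the /5 divisors
are the constants that the vm pressure callbacks could adjust.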

--
	Manfred


Thread overview: 4 messages
     [not found] <20020928201308.GA59189@compsoc.man.ac.uk>
     [not found] ` <200209291137.48483.tomlins@cam.org>
     [not found]   ` <3D972828.6010807@colorfullife.com>
2002-09-30  0:20     ` 2.5.39 kmem_cache bug Ed Tomlinson
2002-09-30  5:55       ` Manfred Spraul
2002-09-30 11:18         ` Ed Tomlinson
2002-09-30 16:33           ` Manfred Spraul
