Re: [PATCH] per-page SLAB freeing (only dcache for now)


From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>,
	linux-mm@kvack.org, akpm@osdl.org, dgc@sgi.com,
	dipankar@in.ibm.com, mbligh@mbligh.org
Subject: Re: [PATCH] per-page SLAB freeing (only dcache for now)
Date: Thu, 6 Oct 2005 13:01:15 -0300
Message-ID: <20051006160115.GA30677@logos.cnet>
In-Reply-To: <4342B623.3060007@colorfullife.com>

On Tue, Oct 04, 2005 at 07:04:35PM +0200, Manfred Spraul wrote:
> Marcelo Tosatti wrote:
> 
> >Hi Manfred,
> >
> >On Mon, Oct 03, 2005 at 10:37:26PM +0200, Manfred Spraul wrote:
> > 
> >
> >>Christoph Lameter wrote:
> >>
> >>>On Sat, 1 Oct 2005, Marcelo wrote:
> >>>
> >>>>I thought about having a mini-API for this such as "struct 
> >>>>slab_reclaim_ops" implemented by each reclaimable cache, invoked by a 
> >>>>generic SLAB function.
> >>>>
> >>Which functions would be needed?
> >>- lock_cache(): No more alive/dead changes
> >>- objp_is_alive()
> >>- objp_is_killable()
> >>- objp_kill() 
> >>   
> >>
> >
> >Yep something along that line. I'll come up with something more precise
> >tomorrow.
> >
> > 
> >
> >>I think it would be simpler if the caller must mark the objects as 
> >>alive/dead before/after calling kmem_cache_alloc/free: I don't think 
> >>it's a good idea to add special case code and branches to the normal 
> >>kmem_cache_alloc codepath. And especially: It would mean that 
> >>kmem_cache_alloc must perform a slab lookup  in each alloc call, this 
> >>could be slow.
> >>The slab users could store the alive status somewhere in the object. And 
> >>they could set the flag early, e.g. disable alive as soon as an object 
> >>is put on the rcu aging list.
> >>   
> >>
> >
> >The "i_am_alive" flag purpose at the moment is to avoid interpreting
> >uninitialized data (in the dentry cache, the reference counter is bogus
> >in such case). It was just a quick hack to watch it work, it seemed to
> >me it could be done within SLAB code.
> >
> >This information ("liveness" of objects) is managed inside the SLAB
> >generic code, and it seems to be available already through the
> >kmembufctl array which is part of the management data, right?
> >
> > 
> >
> Not really. The array is only updated when the free status reaches the 
> slab structure, which is quite late. 

That's fine: the usage information inside the array is only going to be used
to avoid interpreting uninitialized objects. It's safe to say
that unallocated objects will have their corresponding kmembufctl array
entry consistent (marked as freed) at all times, right?

Actual per-object live/dead information must reside inside the objp itself
as you suggest, with guaranteed synchronization.

For the dcache it's possible to use the DCACHE_UNHASHED flag (or some other
field which describes validity).
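
To make the two levels of checking concrete, here is a rough userspace
sketch. All names (toy_dentry, toy_slab, TOY_UNHASHED, in_use) are
hypothetical stand-ins, not the real slab or dcache layout. The slab
bookkeeping is consulted first so that never-allocated memory is not
interpreted, and only then is the object's own flag trusted. Since the
bookkeeping can lag behind on free, the per-object flag is what actually
decides validity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the real slab layout. */
#define TOY_UNHASHED 0x1u	/* stands in for DCACHE_UNHASHED */

struct toy_dentry {
	unsigned int flags;	/* owner-managed validity bits */
	int refcount;		/* garbage until the object is allocated */
};

struct toy_slab {
	size_t nr_objs;
	const bool *in_use;	/* stands in for the kmembufctl state */
	struct toy_dentry *objs;
};

/*
 * An object may be inspected only if the slab management data says it
 * is allocated; only then are its own fields meaningful.
 */
static bool toy_obj_is_alive(const struct toy_slab *slab, size_t idx)
{
	if (idx >= slab->nr_objs || !slab->in_use[idx])
		return false;	/* never read uninitialized memory */
	return !(slab->objs[idx].flags & TOY_UNHASHED);
}
```

An entry that is marked free is skipped without looking at its contents,
so a bogus reference counter in that slot is never read.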

> kmem_cache_free
> - puts the object into a per-cpu array. No locking at all, each cpu can 
> only read its own array.
> - when that array is full, then it's put into a global array (->shared).
> - when the global array is full, then the object is marked as free in 
> the slab structure.
> - when all objects from a slab are free, then the slab is placed on the 
> free slab list
> - when there is memory pressure, then the pages from the free slab list 
> are reclaimed.
> 
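
The staging described in that list can be modelled roughly as below.
This is only a toy: the limits, the toy_cache struct and the stage names
are made up for illustration, and it moves one object at a time, whereas
the real code works on batches of objects.

```c
#include <stddef.h>

#define CPU_ARRAY_LIMIT    2	/* illustrative sizes only */
#define SHARED_ARRAY_LIMIT 2

enum free_stage { IN_CPU_ARRAY, IN_SHARED_ARRAY, MARKED_FREE_IN_SLAB };

struct toy_cache {
	size_t cpu_avail;	/* objects parked in this CPU's array */
	size_t shared_avail;	/* objects parked in the global ->shared array */
	size_t slab_free;	/* objects marked free in the slab structure */
};

/* Returns how far down the free path this object travelled. */
static enum free_stage toy_cache_free(struct toy_cache *c)
{
	if (c->cpu_avail < CPU_ARRAY_LIMIT) {
		c->cpu_avail++;		/* fast path: no locking */
		return IN_CPU_ARRAY;
	}
	if (c->shared_avail < SHARED_ARRAY_LIMIT) {
		c->shared_avail++;
		return IN_SHARED_ARRAY;
	}
	c->slab_free++;			/* only now does the slab see the free */
	return MARKED_FREE_IN_SLAB;
}
```

The point for reclaim is the last branch: the slab structure learns about
a free only after both arrays are full, so its view of an object can be
stale for a while.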
> >Suppose there's no need for the cache specific functions to be aware of
> >liveness, i.e. it's SLAB-specific information.
> >
> > 
> >
> What about RCU? We have dying objects: Still alive, because someone 
> might have a pointer to it, but already on the rcu list and will be 
> released after the next quiescent state. slab can't know that.

Objects waiting for the next RCU quiescent state cannot have references
attached, and can't be reused either. When they reach the RCU list
they are already invalid (DCACHE_UNHASHED in dcache's case).

The only references they can have at this point are to their list_head
fields.

> >Another issue is synchronization between multiple threads in this 
> >level of the reclaim path. Can be dealt with PageLock: if the bit is set,
> >don't bother checking the page, someone else is already doing
> >so.
> >
> >You mention
> >
> > 
> >
> >>- lock_cache(): No more alive/dead changes
> >>   
> >>
> >
> >With the PageLock bit, you can instruct kmem_cache_alloc() to skip partial
> >but Locked pages (thus avoiding any object allocations within that page).
> >Hum, what about higher order SLABs?
> >
> > 
> >
> You have misunderstood my question: I was thinking about object 
> dead/alive changes.
> There are two questions: First figure out how many objects from a 
> certain slab are alive. Then, if it's below a threshold, try to free 
> them. With this approach, you need lock(), is_objp_alive(), release_objp().

I'm thinking this over and will send something soon.
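
In the meantime, a rough userspace sketch of the count-then-free scheme,
to check my understanding. Everything here is hypothetical: the
toy_reclaim_ops struct, the hook signatures, the unlock hook and the toy
objects are assumptions for illustration, not a proposed interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cache hooks, named after the ones discussed above. */
struct toy_reclaim_ops {
	void (*lock)(void *cache);
	void (*unlock)(void *cache);
	bool (*is_objp_alive)(void *objp);
	bool (*release_objp)(void *objp);	/* true if the object was freed */
};

/*
 * Returns the number of objects released, or 0 if the slab does not
 * hold fewer than 'threshold' live objects and is left alone.
 */
static size_t toy_try_reclaim_slab(const struct toy_reclaim_ops *ops,
				   void *cache, void **objs, size_t nr_objs,
				   size_t threshold)
{
	size_t alive = 0, released = 0, i;

	ops->lock(cache);	/* no alive/dead changes from here on */
	for (i = 0; i < nr_objs; i++)
		if (ops->is_objp_alive(objs[i]))
			alive++;

	if (alive < threshold)
		for (i = 0; i < nr_objs; i++)
			if (ops->is_objp_alive(objs[i]) &&
			    ops->release_objp(objs[i]))
				released++;
	ops->unlock(cache);
	return released;
}

/* Toy hooks for a cache whose objects carry their own state. */
struct toy_item {
	bool alive;
	bool pinned;	/* still referenced, so not killable */
};

static void toy_lock(void *cache) { (void)cache; }
static void toy_unlock(void *cache) { (void)cache; }
static bool toy_alive(void *objp) { return ((struct toy_item *)objp)->alive; }

static bool toy_release(void *objp)
{
	struct toy_item *it = objp;

	if (it->pinned)
		return false;
	it->alive = false;
	return true;
}

static const struct toy_reclaim_ops toy_ops = {
	toy_lock, toy_unlock, toy_alive, toy_release
};
```

Note that a pinned object stays alive even when the slab is below the
threshold, so a pass can release some objects and still fail to empty
the page.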

> >Well, kmem_cache_alloc() can be a little bit smarter at this point, since 
> >it's already a slow path, no? It's refill time, per-CPU cache is exhausted...
> >
> > 
> >
> Definitely. The fast path is only kmem_cache_alloc and kmem_cache_free. No 
> global cache line writes in these functions. They were down to 1 
> conditional branch and 2-3 cachelines, one of them read-only, the 
> other(s) are read/write, but per-cpu. I'm not sure how much changed with 
> the NUMA patches, but the non-numa case should try to remain simple. And 
> e.g. looking up the bufctl means an integer division. Just that 
> instruction could nearly double the runtime of kmem_cache_free().
> The shared_array part from cache_flusharray and cache_alloc_refill are 
> partially fast path: If we slow that down, then it will affect packet 
> routing. The rest is slow path.
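
For reference, the bufctl lookup cost mentioned above comes from turning
an object pointer into an index within its slab. A minimal sketch,
assuming a layout with a base address and a per-cache object size (the
struct and function names are hypothetical; s_mem and objsize are only
modelled on the slab fields):

```c
#include <stddef.h>

struct toy_slab_layout {
	char *s_mem;	/* address of the first object in the slab */
	size_t objsize;	/* per-cache object size, not a compile-time constant */
};

/* Index of objp within its slab; the division is the costly step. */
static size_t toy_obj_index(const struct toy_slab_layout *slab,
			    const void *objp)
{
	return (size_t)((const char *)objp - slab->s_mem) / slab->objsize;
}
```

Because objsize is only known at run time, the compiler cannot turn the
division into a shift or a multiply, which is why it should stay out of
the kmem_cache_free() fast path.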

OK fine, thanks for all your help up to now!

