From: Andrew Morton <akpm@linux-foundation.org>
To: clameter@sgi.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	suresh.b.siddha@intel.com
Subject: Re: [patch 12/26] SLUB: Slab defragmentation core
Date: Tue, 26 Jun 2007 01:18:31 -0700	[thread overview]
Message-ID: <20070626011831.181d7a6a.akpm@linux-foundation.org> (raw)
In-Reply-To: <20070618095916.297690463@sgi.com>

On Mon, 18 Jun 2007 02:58:50 -0700 clameter@sgi.com wrote:

> Slab defragmentation occurs either
> 
> 1. Unconditionally when kmem_cache_shrink() is called on a slab cache,
>    either directly by the kernel or via slabinfo triggering slab
>    shrinking. This form performs defragmentation on all nodes of a
>    NUMA system.
> 
> 2. Conditionally when kmem_cache_defrag(<percentage>, <node>) is called.
> 
>    The defragmentation is only performed if the fragmentation of the slab
>    is higher than the specified percentage. Fragmentation ratios are measured
>    by calculating the percentage of objects in use compared to the total
>    number of objects that the slab cache could hold.
> 
>    kmem_cache_defrag takes a node parameter. This can either be -1 if
>    defragmentation should be performed on all nodes, or a node number.
>    If a node number is specified then defragmentation is performed
>    only on that node.
> 
>    Slab defragmentation is a memory intensive operation that can be
>    sped up in a NUMA system if mostly node-local memory is accessed. That
>    is the case if we have just performed reclaim on a node.
> 
> For defragmentation SLUB first generates a sorted list of partial slabs.
> Sorting is performed according to the number of objects allocated.
> Thus the slabs with the least objects will be at the end.
> 
> We extract slabs off the tail of that list until we have either reached a
> minimum number of slabs or until we encounter a slab that has more than a
> quarter of its objects allocated. Then we attempt to remove the objects
> from each of the slabs taken.
> 
> In order for a slab cache to support defragmentation a couple of functions
> must be defined via kmem_cache_ops. These are:
> 
> void *get(struct kmem_cache *s, int nr, void **objects)
> 
> 	Must obtain a reference to the listed objects. SLUB guarantees that
> 	the objects are still allocated. However, other threads may be blocked
> 	in slab_free attempting to free objects in the slab. These may succeed
> 	as soon as get() returns to the slab allocator. The function must
> 	be able to detect the situation and void the attempts to handle such
> 	objects (for example, by voiding the corresponding entry in the objects
> 	array).
> 
> 	No slab operations may be performed in get_reference(). Interrupts

s/get_reference/get/, yes?

> 	are disabled. What can be done is very limited. The slab lock
> 	for the page with the object is taken. Any attempt to perform a slab
> 	operation may lead to a deadlock.
> 
> 	get() returns a private pointer that is passed to kick(). Should we
> 	be unable to obtain all references then that pointer may indicate
> 	to the kick() function that it should not attempt any object removal
> 	or move but simply drop the reference counts it obtained.
> 
> void kick(struct kmem_cache *, int nr, void **objects, void *get_result)
> 
> 	After SLUB has established references to the objects in a
> 	slab it will drop all locks and then use kick() to move objects out
> 	of the slab. The existence of the object is guaranteed by virtue of
> 	the earlier obtained references via get(). The callback may perform
> 	any slab operation since no locks are held at the time of call.
> 
> 	The callback should remove the object from the slab in some way. This
> 	may be accomplished by reclaiming the object and then running
> 	kmem_cache_free() or reallocating it and then running
> 	kmem_cache_free(). Reallocation is advantageous because the partial
> 	slabs were just sorted so that those with the most objects come
> 	first. Reallocation is likely to result in filling up a slab in
> 	addition to freeing up one slab so that it also can be removed from
> 	the partial list.
> 
> 	Kick() does not return a result. SLUB will check the number of
> 	remaining objects in the slab. If all objects were removed then
> 	we know that the operation was successful.
> 

Nice changelog ;)
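
For readers following the callback contract above, here is a minimal
sketch of a cache wiring up the two functions. This is illustrative
only: struct foo, its refcount field and foo_relocate() are
hypothetical stand-ins, not taken from the patch; the signatures are
the ones quoted in the changelog.

	struct foo {
		atomic_t refcount;
		/* ... payload ... */
	};

	static void *foo_get(struct kmem_cache *s, int nr, void **objects)
	{
		int i;

		for (i = 0; i < nr; i++) {
			struct foo *f = objects[i];

			/*
			 * Try to pin the object. If a concurrent free
			 * already dropped the last reference, void the
			 * entry so kick() skips it.
			 */
			if (!atomic_inc_not_zero(&f->refcount))
				objects[i] = NULL;
		}
		return NULL;	/* no private state needed for kick() */
	}

	static void foo_kick(struct kmem_cache *s, int nr, void **objects,
			     void *private)
	{
		int i;

		for (i = 0; i < nr; i++) {
			struct foo *f = objects[i];

			if (!f)
				continue;	/* voided by foo_get() */

			foo_relocate(f);	/* copy into a fresh allocation */

			/* Drop our pin; free the stale copy on last ref. */
			if (atomic_dec_and_test(&f->refcount))
				kmem_cache_free(s, f);
		}
	}

	static struct kmem_cache_ops foo_ops = {
		.get	= foo_get,
		.kick	= foo_kick,
	};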

> +static int __kmem_cache_vacate(struct kmem_cache *s,
> +		struct page *page, unsigned long flags, void *scratch)
> +{
> +	void **vector = scratch;
> +	void *p;
> +	void *addr = page_address(page);
> +	DECLARE_BITMAP(map, s->objects);

A variable-sized local.  We have a few of these in-kernel.

What's the worst-case here?  With 4k pages and 4-byte slab it's 128 bytes
of stack?  Seems acceptable.

(What's the smallest sized object slub will create?  4 bytes?)
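
For reference, the arithmetic behind that estimate, assuming a 64-bit
kernel (DECLARE_BITMAP() is the stock kernel macro):

	/* DECLARE_BITMAP(map, s->objects) expands to roughly: */
	unsigned long map[BITS_TO_LONGS(s->objects)];

	/*
	 * Worst case with 4KiB pages and 4-byte objects:
	 *   s->objects      = 4096 / 4  = 1024 bits
	 *   BITS_TO_LONGS() = 1024 / 64 = 16 longs
	 *   stack usage     = 16 * 8    = 128 bytes
	 */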



To hold off a concurrent free while defragging, the code relies upon
slab_lock() on the current page, yes?

But slab_lock() isn't taken for slabs whose objects are larger than PAGE_SIZE. 
How's that handled?



Overall: looks good.  It'd be nice to get a buffer_head shrinker in place,
see how that goes from a proof-of-concept POV.


How much testing has been done on this code, and of what form, and with
what results?
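
For readers joining the thread here, the two trigger paths described in
the changelog come down to the following calls (a sketch; the 30%
threshold is just an illustrative value):

	/* 1. Unconditional: shrink/defragment the cache on all nodes. */
	kmem_cache_shrink(s);

	/*
	 * 2. Conditional: only if the cache is more fragmented than the
	 *    given percentage, on all nodes (-1) or on one node.
	 */
	kmem_cache_defrag(30, -1);
	kmem_cache_defrag(30, numa_node_id());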

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
