From: Andrew Morton <akpm@linux-foundation.org>
To: clameter@sgi.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Pekka Enberg <penberg@cs.helsinki.fi>,
suresh.b.siddha@intel.com
Subject: Re: [patch 12/26] SLUB: Slab defragmentation core
Date: Tue, 26 Jun 2007 01:18:31 -0700
Message-ID: <20070626011831.181d7a6a.akpm@linux-foundation.org>
In-Reply-To: <20070618095916.297690463@sgi.com>
On Mon, 18 Jun 2007 02:58:50 -0700 clameter@sgi.com wrote:
> Slab defragmentation occurs either
>
> 1. Unconditionally, when kmem_cache_shrink() is called on a slab cache,
>    either by the kernel directly or by slabinfo triggering slab
>    shrinking. This form performs defragmentation on all nodes of a
>    NUMA system.
>
> 2. Conditionally when kmem_cache_defrag(<percentage>, <node>) is called.
>
> The defragmentation is only performed if the fragmentation of the slab
> is higher than the specified percentage. Fragmentation ratios are measured
> by calculating the percentage of objects in use compared to the total
> number of objects that the slab cache could hold.
>
> kmem_cache_defrag takes a node parameter. This can either be -1 if
> defragmentation should be performed on all nodes, or a node number.
> If a node number is specified then defragmentation is only performed
> on that node.
>
> Slab defragmentation is a memory intensive operation that can be
> sped up in a NUMA system if mostly node local memory is accessed. That
> is the case if we have just performed reclaim on a node.
>
> For defragmentation SLUB first generates a sorted list of partial slabs.
> Sorting is performed according to the number of objects allocated.
> Thus the slabs with the fewest objects will be at the end.
>
> We extract slabs off the tail of that list until we have either reached a
> minimum number of slabs or until we encounter a slab that has more than a
> quarter of its objects allocated. Then we attempt to remove the objects
> from each of the slabs taken.
>
> In order for a slabcache to support defragmentation a couple of functions
> must be defined via kmem_cache_ops. These are
>
> void *get(struct kmem_cache *s, int nr, void **objects)
>
> Must obtain a reference to the listed objects. SLUB guarantees that
> the objects are still allocated. However, other threads may be blocked
> in slab_free attempting to free objects in the slab. These may succeed
> as soon as get() returns to the slab allocator. The function must
> be able to detect this situation and abandon the attempt to handle such
> objects (for example by voiding the corresponding entry in the objects
> array).
>
> No slab operations may be performed in get_reference(). Interrupts
s/get_reference/get/, yes?
> are disabled. What can be done is very limited. The slab lock
> for the page with the object is taken. Any attempt to perform a slab
> operation may lead to a deadlock.
>
> get() returns a private pointer that is passed to kick(). Should we
> be unable to obtain all references then that pointer may indicate
> to the kick() function that it should not attempt any object removal
> or move but simply drop the references that were taken.
>
> void kick(struct kmem_cache *, int nr, void **objects, void *get_result)
>
> After SLUB has established references to the objects in a
> slab it will drop all locks and then use kick() to move objects out
> of the slab. The existence of the object is guaranteed by virtue of
> the earlier obtained references via get(). The callback may perform
> any slab operation since no locks are held at the time of call.
>
> The callback should remove the object from the slab in some way. This
> may be accomplished by reclaiming the object and then running
> kmem_cache_free(), or by reallocating it elsewhere and then running
> kmem_cache_free() on the original. Reallocation is advantageous
> because the partial list was just sorted to put the slabs with the
> most objects first. Reallocation is likely to fill up a slab in
> addition to freeing one, so that the filled slab can also be removed
> from the partial list.
>
> Kick() does not return a result. SLUB will check the number of
> remaining objects in the slab. If all objects were removed then
> we know that the operation was successful.
>
Nice changelog ;)
> +static int __kmem_cache_vacate(struct kmem_cache *s,
> + struct page *page, unsigned long flags, void *scratch)
> +{
> + void **vector = scratch;
> + void *p;
> + void *addr = page_address(page);
> + DECLARE_BITMAP(map, s->objects);
A variable-sized local. We have a few of these in-kernel.
What's the worst-case here? With 4k pages and 4-byte objects it's 128 bytes
of stack? Seems acceptable.
(What's the smallest sized object slub will create? 4 bytes?)
To hold off a concurrent free while defragging, the code relies upon
slab_lock() on the current page, yes?
But slab_lock() isn't taken for slabs whose objects are larger than PAGE_SIZE.
How's that handled?
Overall: looks good. It'd be nice to get a buffer_head shrinker in place,
see how that goes from a proof-of-concept POV.
How much testing has been done on this code, and of what form, and with
what results?