From: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org
Subject: Re: Support concurrent local and remote frees and allocs on a slab.
Date: Mon, 7 May 2007 14:50:30 -0700
Message-ID: <20070507145030.9b7f41bd.akpm@linux-foundation.org>
In-Reply-To: <Pine.LNX.4.64.0705042025520.29006@schroedinger.engr.sgi.com>
On Fri, 4 May 2007 20:28:41 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> About 5-10% performance gain on netperf.
>
> [Maybe put this patch at the end of the merge queue? Works fine here but
> this is a significant change that may impact stability]
>
> What we do is use the last free field in the page struct (the private
> field that was freed up through the compound page flag rework) to set up a
> separate per cpu freelist. From that one we can allocate without taking the
> slab lock because we check out the complete list of free objects when we
> first touch the slab and then mark the slab as completely allocated.
> If we have a cpu_freelist then we can also free to that list if we run on
> that processor without taking the slab lock.
>
> This allows even concurrent allocations and frees on the same slab using
> two mutually exclusive freelists. Allocs and frees from the processor
> owning the per cpu slab will bypass the slab lock using the cpu_freelist.
> Remote frees will use the slab lock to synchronize and use the freelist
> for marking items as free. So local allocs and frees may run concurrently
> with remote frees without synchronization.
>
> If the allocator is running out of its per cpu freelist then it will consult
> the per slab freelist (which requires the slab lock) and reload the
> cpu_freelist if there are objects that were remotely freed.
>
I must say that I'm getting increasingly foggy about what the slub data
structures are. That was my problem with slab, too: it's hard to get a
picture in one's head.

Is there some way in which we can communicate this better? It is quite
central to maintainability.
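
One way to picture the scheme described in the changelog is a small
user-space model. This is only a sketch under assumptions, not the patch's
code: struct slab_model, the model_* helpers and the atomic_flag spinlock
are all made up for illustration, and the real allocator also has to deal
with interrupts, debug options and per-node lists. What it tries to show is
the split: the owning cpu touches cpu_freelist without the lock, everyone
else goes through freelist under the lock, and the owner takes the lock
only to reload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space model of one slab. Field names echo the patch,
 * but the struct and the helpers are illustrative only. */
struct slab_model {
    atomic_flag lock;      /* stands in for the slab lock */
    void **freelist;       /* shared list; touched only with the lock held */
    void **cpu_freelist;   /* private to the owning cpu; no lock needed */
    size_t offset;         /* word index of the link inside a free object */
};

static void model_lock(struct slab_model *s)
{
    while (atomic_flag_test_and_set(&s->lock))
        ;   /* spin until the flag was previously clear */
}

static void model_unlock(struct slab_model *s)
{
    atomic_flag_clear(&s->lock);
}

/* Thread nobj objects of `words` pointers each onto the shared freelist. */
static void model_init(struct slab_model *s, void **mem, size_t nobj, size_t words)
{
    atomic_flag_clear(&s->lock);
    s->freelist = NULL;
    s->cpu_freelist = NULL;
    s->offset = 0;
    for (size_t i = nobj; i-- > 0; ) {
        void **object = mem + i * words;
        object[s->offset] = s->freelist;
        s->freelist = object;
    }
}

/* Owning cpu: allocate, taking the lock only to reload an empty cpu_freelist. */
static void *model_alloc_local(struct slab_model *s)
{
    if (!s->cpu_freelist) {
        model_lock(s);
        s->cpu_freelist = s->freelist;   /* check out every free object */
        s->freelist = NULL;
        model_unlock(s);
    }
    void **object = s->cpu_freelist;
    if (object)
        s->cpu_freelist = object[s->offset];
    return object;
}

/* Owning cpu: free onto the private list without the lock. */
static void model_free_local(struct slab_model *s, void *p)
{
    void **object = p;
    assert(object != NULL);
    object[s->offset] = s->cpu_freelist;
    s->cpu_freelist = object;
}

/* Any other cpu: free onto the shared list, under the lock. */
static void model_free_remote(struct slab_model *s, void *p)
{
    void **object = p;
    assert(object != NULL);
    model_lock(s);
    object[s->offset] = s->freelist;
    s->freelist = object;
    model_unlock(s);
}
```

In this model a remotely freed object stays invisible to the owning cpu
until its private list runs dry and it reloads from the shared one, which
is how I read the last paragraph of the changelog.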
>
> ---
> include/linux/mm_types.h | 5 ++-
> mm/slub.c | 67 ++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 59 insertions(+), 13 deletions(-)
>
> Index: slub/include/linux/mm_types.h
> ===================================================================
> --- slub.orig/include/linux/mm_types.h 2007-05-04 20:09:26.000000000 -0700
> +++ slub/include/linux/mm_types.h 2007-05-04 20:09:33.000000000 -0700
> @@ -50,9 +50,12 @@ struct page {
> spinlock_t ptl;
> #endif
> struct { /* SLUB uses */
> - struct page *first_page; /* Compound pages */
> + void **cpu_freelist; /* Per cpu freelist */
> struct kmem_cache *slab; /* Pointer to slab */
> };
> + struct {
> + struct page *first_page; /* Compound pages */
> + };
> };
This change implies that "first_page" is no longer a "SLUB use". Is that
true?

I'm a bit surprised that slub didn't already have a per-cpu freelist of
objects?

Each cache has this "cpu_slab" thing, which is not documented anywhere
afaict. What does it do, and how does this change enhance it?

(I'm not really asking for a reply-by-email, btw. This is more a "this is
what people will wonder when they read your code. Please ensure that the
answers are there for them" thing.)
> union {
> pgoff_t index; /* Our offset within mapping. */
> Index: slub/mm/slub.c
> ===================================================================
> --- slub.orig/mm/slub.c 2007-05-04 20:09:26.000000000 -0700
> +++ slub/mm/slub.c 2007-05-04 20:14:04.000000000 -0700
> @@ -81,10 +81,13 @@
> * PageActive The slab is used as a cpu cache. Allocations
> * may be performed from the slab. The slab is not
> * on any slab list and cannot be moved onto one.
> + * The cpu slab may have a cpu_freelist in order
> + * to optimize allocations and frees on a particular
> + * cpu.
> *
> * PageError Slab requires special handling due to debug
> * options set. This moves slab handling out of
> - * the fast path.
> + * the fast path and disables cpu_freelists.
> */
>
> /*
> @@ -857,6 +860,7 @@ static struct page *new_slab(struct kmem
> set_freepointer(s, last, NULL);
>
> page->freelist = start;
> + page->cpu_freelist = NULL;
> page->inuse = 0;
> out:
> if (flags & __GFP_WAIT)
> @@ -1121,6 +1125,23 @@ static void putback_slab(struct kmem_cac
> */
> static void deactivate_slab(struct kmem_cache *s, struct page *page, int cpu)
> {
> + /*
> + * Merge cpu freelist into freelist. Typically we get here
> + * because both freelists are empty. So this is unlikely
> + * to occur.
> + */
> + while (unlikely(page->cpu_freelist)) {
> + void **object;
> +
> + /* Retrieve object from cpu_freelist */
> + object = page->cpu_freelist;
> + page->cpu_freelist = page->cpu_freelist[page->offset];
> +
> + /* And put onto the regular freelist */
> + object[page->offset] = page->freelist;
> + page->freelist = object;
> + page->inuse--;
> + }
page.offset doesn't appear to be documented anywhere?

So what is pointed at by page->cpu_freelist? It appears to point at an
array of pointers to recently-used objects. But where does the storage for
that array come from? All a bit mysterious.
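
Going only by the loop in the hunk above, one plausible reading is that
there is no separate array: each free object seems to hold the pointer to
the next free object in one of its own words, with page->offset selecting
which word. The sketch below models that reading in user space. struct
page_model and push() are hypothetical, and treating offset as a count of
pointer-sized words is an assumption drawn from the indexing in the hunk,
not something the patch states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct page fields that the hunk touches. */
struct page_model {
    void **freelist;       /* regular per-slab freelist */
    void **cpu_freelist;   /* per-cpu freelist added by the patch */
    unsigned int offset;   /* assumed: index, in words, of the link field */
    unsigned int inuse;    /* objects currently counted as allocated */
};

/* Push a free object on a list by storing the old head inside the object. */
static void push(void ***head, void **object, unsigned int offset)
{
    object[offset] = *head;
    *head = object;
}

/* Mirrors the loop added to deactivate_slab(): drain cpu_freelist onto freelist. */
static void merge_cpu_freelist(struct page_model *page)
{
    while (page->cpu_freelist) {
        void **object = page->cpu_freelist;

        assert(page->inuse > 0);

        /* Unlink: the next free object is read out of the object itself. */
        page->cpu_freelist = object[page->offset];

        /* Relink onto the regular freelist and stop counting it as in use. */
        object[page->offset] = page->freelist;
        page->freelist = object;
        page->inuse--;
    }
}
```

If that reading is right, the storage question answers itself: the list
lives in the free objects, so it costs nothing beyond the head pointer in
struct page.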
btw, does this code, in slab_alloc()

	if (unlikely(node != -1 && page_to_nid(page) != node)) {

get appropriately optimised away on non-NUMA?