From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Christoph Lameter <clameter@sgi.com>
Cc: Matt Mackall <mpm@selenic.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Thomas Graf <tgraf@suug.ch>, David Miller <davem@davemloft.net>,
Andrew Morton <akpm@linux-foundation.org>,
Daniel Phillips <phillips@google.com>,
Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH 0/5] make slab gfp fair
Date: Wed, 16 May 2007 22:18:58 +0200 [thread overview]
Message-ID: <1179346738.2912.39.camel@lappy> (raw)
In-Reply-To: <Pine.LNX.4.64.0705161235490.10660@schroedinger.engr.sgi.com>
On Wed, 2007-05-16 at 12:53 -0700, Christoph Lameter wrote:
> On Wed, 16 May 2007, Peter Zijlstra wrote:
>
> > If this 4k cpu system ever gets to touch the new lock it is in way
> > deeper problems than a bouncing cache-line.
>
> So its no use on NUMA?
It is, its just that we're swapping very heavily at that point, a
bouncing cache-line will not significantly slow down the box compared to
waiting for block IO, will it?
> > Please look at it more carefully.
> >
> > We differentiate pages allocated at the level where GFP_ATOMIC starts to
> > fail. By not updating the percpu slabs those are retried every time,
> > except for ALLOC_NO_WATERMARKS allocations; those are served from the
> > ->reserve_slab.
> >
> > Once a regular slab allocation succeeds again, the ->reserve_slab is
> > cleaned up and never again looked at it until we're in distress again.
>
> A single slab? This may only give you a a single object in an extreme
> case. Are you sure that this solution is generic enough?
Well, single as in a single active; it gets spilled into the full list
and a new one is instanciated if more is needed.
> The problem here is that you may spinlock and take out the slab for one
> cpu but then (AFAICT) other cpus can still not get their high priority
> allocs satisfied. Some comments follow.
All cpus are redirected to ->reserve_slab when the regular allocations
start to fail.
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> > include/linux/slub_def.h | 2 +
> > mm/slub.c | 85 ++++++++++++++++++++++++++++++++++++++++++-----
> > 2 files changed, 78 insertions(+), 9 deletions(-)
> >
> > Index: linux-2.6-git/include/linux/slub_def.h
> > ===================================================================
> > --- linux-2.6-git.orig/include/linux/slub_def.h
> > +++ linux-2.6-git/include/linux/slub_def.h
> > @@ -46,6 +46,8 @@ struct kmem_cache {
> > struct list_head list; /* List of slab caches */
> > struct kobject kobj; /* For sysfs */
> >
> > + struct page *reserve_slab;
> > +
> > #ifdef CONFIG_NUMA
> > int defrag_ratio;
> > struct kmem_cache_node *node[MAX_NUMNODES];
> > Index: linux-2.6-git/mm/slub.c
> > ===================================================================
> > --- linux-2.6-git.orig/mm/slub.c
> > +++ linux-2.6-git/mm/slub.c
> > @@ -20,11 +20,13 @@
> > #include <linux/mempolicy.h>
> > #include <linux/ctype.h>
> > #include <linux/kallsyms.h>
> > +#include "internal.h"
> >
> > /*
> > * Lock order:
> > - * 1. slab_lock(page)
> > - * 2. slab->list_lock
> > + * 1. reserve_lock
> > + * 2. slab_lock(page)
> > + * 3. node->list_lock
> > *
> > * The slab_lock protects operations on the object of a particular
> > * slab and its metadata in the page struct. If the slab lock
> > @@ -259,6 +261,8 @@ static int sysfs_slab_alias(struct kmem_
> > static void sysfs_slab_remove(struct kmem_cache *s) {}
> > #endif
> >
> > +static DEFINE_SPINLOCK(reserve_lock);
> > +
> > /********************************************************************
> > * Core slab cache functions
> > *******************************************************************/
> > @@ -1007,7 +1011,7 @@ static void setup_object(struct kmem_cac
> > s->ctor(object, s, 0);
> > }
> >
> > -static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
> > +static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node, int *rank)
> > {
> > struct page *page;
> > struct kmem_cache_node *n;
> > @@ -1025,6 +1029,7 @@ static struct page *new_slab(struct kmem
> > if (!page)
> > goto out;
> >
> > + *rank = page->rank;
> > n = get_node(s, page_to_nid(page));
> > if (n)
> > atomic_long_inc(&n->nr_slabs);
> > @@ -1311,7 +1316,7 @@ static void unfreeze_slab(struct kmem_ca
> > /*
> > * Remove the cpu slab
> > */
> > -static void deactivate_slab(struct kmem_cache *s, struct page *page, int cpu)
> > +static void __deactivate_slab(struct kmem_cache *s, struct page *page)
> > {
> > /*
> > * Merge cpu freelist into freelist. Typically we get here
> > @@ -1330,10 +1335,15 @@ static void deactivate_slab(struct kmem_
> > page->freelist = object;
> > page->inuse--;
> > }
> > - s->cpu_slab[cpu] = NULL;
> > unfreeze_slab(s, page);
> > }
>
> So you want to spill back the lockless_freelist without deactivating the
> slab? Why are you using the lockless_freelist at all? If you do not use it
> then you can call unfreeze_slab. No need for this split.
Ah, quite right. I do indeed not use the lockless_freelist.
> > @@ -1395,6 +1405,7 @@ static void *__slab_alloc(struct kmem_ca
> > {
> > void **object;
> > int cpu = smp_processor_id();
> > + int rank = 0;
> >
> > if (!page)
> > goto new_slab;
> > @@ -1424,10 +1435,26 @@ new_slab:
> > if (page) {
> > s->cpu_slab[cpu] = page;
> > goto load_freelist;
> > - }
> > + } else if (unlikely(gfp_to_alloc_flags(gfpflags) & ALLOC_NO_WATERMARKS))
> > + goto try_reserve;
>
> Ok so we are trying to allocate a slab and do not get one thus ->
> try_reserve.
Right, so the cpu-slab is NULL, and we need a new slab.
> But this is only working if we are using the slab after
> explicitly flushing the cpuslabs. Otherwise the slab may be full and we
> get to alloc_slab.
/me fails to parse.
When we need a new_slab:
- we try the partial lists,
- we try the reserve (if ALLOC_NO_WATERMARKS)
otherwise alloc_slab
> >
> > - page = new_slab(s, gfpflags, node);
> > - if (page) {
>
> > +alloc_slab:
> > + page = new_slab(s, gfpflags, node, &rank);
> > + if (page && rank) {
>
> Huh? You mean !page?
No, no, we did get a page, and it was !ALLOC_NO_WATERMARK hard to get
it.
> > + if (unlikely(s->reserve_slab)) {
> > + struct page *reserve;
> > +
> > + spin_lock(&reserve_lock);
> > + reserve = s->reserve_slab;
> > + s->reserve_slab = NULL;
> > + spin_unlock(&reserve_lock);
> > +
> > + if (reserve) {
> > + slab_lock(reserve);
> > + __deactivate_slab(s, reserve);
> > + putback_slab(s, reserve);
>
> Remove the above two lines (they are wrong regardless) and simply make
> this the cpu slab.
It need not be the same node; the reserve_slab is node agnostic.
So here the free page watermarks are good again, and we can forget all
about the ->reserve_slab. We just push it on the free/partial lists and
forget about it.
But like you said above: unfreeze_slab() should be good, since I don't
use the lockless_freelist.
> > + }
> > + }
> > cpu = smp_processor_id();
> > if (s->cpu_slab[cpu]) {
> > /*
> > @@ -1455,6 +1482,18 @@ new_slab:
> > SetSlabFrozen(page);
> > s->cpu_slab[cpu] = page;
> > goto load_freelist;
> > + } else if (page) {
> > + spin_lock(&reserve_lock);
> > + if (s->reserve_slab) {
> > + discard_slab(s, page);
> > + page = s->reserve_slab;
> > + }
> > + slab_lock(page);
> > + SetPageActive(page);
> > + s->reserve_slab = page;
> > + spin_unlock(&reserve_lock);
> > +
> > + goto got_reserve;
So this is when we get a page and it was ALLOC_NO_WATERMARKS hard to get
it. Instead of updating the cpu_slab we leave that unset, so that
subsequent allocations will try to allocate a slab again thereby testing
the current free pages limit (and not gobble up the reserve).
> > }
> > return NULL;
> > debug:
> > @@ -1470,6 +1509,31 @@ debug:
> > page->freelist = object[page->offset];
> > slab_unlock(page);
> > return object;
> > +
> > +try_reserve:
> > + spin_lock(&reserve_lock);
> > + page = s->reserve_slab;
> > + if (!page) {
> > + spin_unlock(&reserve_lock);
> > + goto alloc_slab;
> > + }
> > +
> > + slab_lock(page);
> > + if (!page->freelist) {
> > + s->reserve_slab = NULL;
> > + spin_unlock(&reserve_lock);
> > + __deactivate_slab(s, page);
> replace with unfreeze slab.
>
> > + putback_slab(s, page);
>
> Putting back the slab twice.
__deactivete_slab() doesn't do putback_slab, and now I see that whole
function isn't there anymore. unfreeze_slab() it is.
> > + goto alloc_slab;
> > + }
> > + spin_unlock(&reserve_lock);
> > +
> > +got_reserve:
> > + object = page->freelist;
> > + page->inuse++;
> > + page->freelist = object[page->offset];
> > + slab_unlock(page);
> > + return object;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-05-16 20:18 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-14 13:19 Peter Zijlstra
2007-05-14 13:19 ` [PATCH 1/5] mm: page allocation rank Peter Zijlstra
2007-05-14 13:19 ` [PATCH 2/5] mm: slab allocation fairness Peter Zijlstra
2007-05-14 15:51 ` Christoph Lameter
2007-05-14 13:19 ` [PATCH 3/5] mm: slub " Peter Zijlstra
2007-05-14 15:49 ` Christoph Lameter
2007-05-14 16:14 ` Peter Zijlstra
2007-05-14 16:35 ` Christoph Lameter
2007-05-14 13:19 ` [PATCH 4/5] mm: slob " Peter Zijlstra
2007-05-14 13:19 ` [PATCH 5/5] mm: allow mempool to fall back to memalloc reserves Peter Zijlstra
2007-05-14 15:53 ` [PATCH 0/5] make slab gfp fair Christoph Lameter
2007-05-14 16:10 ` Peter Zijlstra
2007-05-14 16:37 ` Christoph Lameter
2007-05-14 16:12 ` Matt Mackall
2007-05-14 16:29 ` Christoph Lameter
2007-05-14 17:40 ` Peter Zijlstra
2007-05-14 17:57 ` Christoph Lameter
2007-05-14 19:28 ` Peter Zijlstra
2007-05-14 19:56 ` Christoph Lameter
2007-05-14 20:03 ` Peter Zijlstra
2007-05-14 20:06 ` Christoph Lameter
2007-05-14 20:12 ` Peter Zijlstra
2007-05-14 20:25 ` Christoph Lameter
2007-05-15 17:27 ` Peter Zijlstra
2007-05-15 22:02 ` Christoph Lameter
2007-05-16 6:59 ` Peter Zijlstra
2007-05-16 18:43 ` Christoph Lameter
2007-05-16 19:25 ` Peter Zijlstra
2007-05-16 19:53 ` Christoph Lameter
2007-05-16 20:18 ` Peter Zijlstra [this message]
2007-05-16 20:27 ` Christoph Lameter
2007-05-16 20:40 ` Peter Zijlstra
2007-05-16 20:44 ` Christoph Lameter
2007-05-16 20:54 ` Peter Zijlstra
2007-05-16 20:59 ` Christoph Lameter
2007-05-16 21:04 ` Peter Zijlstra
2007-05-16 21:13 ` Christoph Lameter
2007-05-16 21:20 ` Peter Zijlstra
2007-05-16 21:42 ` Christoph Lameter
2007-05-17 7:28 ` Peter Zijlstra
2007-05-17 17:30 ` Christoph Lameter
2007-05-17 17:53 ` Peter Zijlstra
2007-05-17 18:01 ` Christoph Lameter
2007-05-14 19:44 ` Andrew Morton
2007-05-14 20:01 ` Matt Mackall
2007-05-14 20:05 ` Peter Zijlstra
2007-05-17 3:02 ` Christoph Lameter
2007-05-17 7:08 ` Peter Zijlstra
2007-05-17 17:29 ` Christoph Lameter
2007-05-17 17:52 ` Peter Zijlstra
2007-05-17 17:59 ` Christoph Lameter
2007-05-17 17:53 ` Matt Mackall
2007-05-17 18:02 ` Christoph Lameter
2007-05-17 19:18 ` Peter Zijlstra
2007-05-17 19:24 ` Christoph Lameter
2007-05-17 21:26 ` Peter Zijlstra
2007-05-17 21:44 ` Paul Jackson
2007-05-17 22:27 ` Christoph Lameter
2007-05-18 9:54 ` Peter Zijlstra
2007-05-18 17:11 ` Paul Jackson
2007-05-18 17:11 ` Christoph Lameter
2007-05-20 8:39 ` Peter Zijlstra
2007-05-21 16:45 ` Christoph Lameter
2007-05-21 19:33 ` Peter Zijlstra
2007-05-21 19:43 ` Christoph Lameter
2007-05-21 20:08 ` Peter Zijlstra
2007-05-21 20:32 ` Christoph Lameter
2007-05-21 20:54 ` Peter Zijlstra
2007-05-21 21:04 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1179346738.2912.39.camel@lappy \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mpm@selenic.com \
--cc=penberg@cs.helsinki.fi \
--cc=phillips@google.com \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox