From: Mel Gorman <mel@csn.ul.ie>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
	Nick Piggin <npiggin@suse.de>,
	heiko.carstens@de.ibm.com, sachinp@in.ibm.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Tejun Heo <tj@kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH 2/4] slqb: Record what node is local to a kmem_cache_cpu
Date: Wed, 30 Sep 2009 23:05:42 +0100
Message-ID: <20090930220541.GA31530@csn.ul.ie>
In-Reply-To: <alpine.DEB.1.10.0909301053550.9450@gentwo.org>

On Wed, Sep 30, 2009 at 11:06:04AM -0400, Christoph Lameter wrote:
> On Wed, 30 Sep 2009, Mel Gorman wrote:
> 
> > Ok, so I spent today looking at this again. The problem is not with faulty
> > drain logic as such. As frees always place an object on a remote list
> > and the allocation side is often (but not always) allocating a new page,
> > a significant number of objects in the free list are the only object
> > in a page. SLQB drains based on the number of objects on the free list,
> > not the number of pages. With many of the pages having only one object,
> > the freelists are pinning a lot more memory than expected.  For example,
> > a watermark to drain of 512 could be pinning 2MB of pages.
> 
> No good. So we are allocating new pages from somewhere allocating a
> single object and putting them on the freelist where we do not find them
> again.

Yes

> This is bad caching behavior as well.
> 

Yes, I suppose it would be, as it's not using the hottest object. The
fact that it OOM storms is a bit more important than the poor caching
behaviour, but hey :/

> > The drain logic could be extended to track not only the number of objects on
> > the free list but also the number of pages but I really don't think that is
> > desirable behaviour. I'm somewhat running out of sensible ideas for dealing
> > with this but here is another go anyway that might be more palatable than
> > tracking what a "local" node is within the slab.
> 
> SLUB avoids that issue by having a "current" page for a processor. It
> allocates from the current page until it's exhausted. It can use fast path
> logic both for allocations and frees regardless of the page's origin. The
> node fallback is handled by the page allocator and that one is only
> involved when a new slab page is needed.
> 

This is essentially the "unqueued" nature of SLUB. Its objective is "I
have this page here which I'm going to use until I can't use it no more
and will depend on the page allocator to sort my stuff out". I have to
read up on SLUB more to see if it's compatible with SLQB or not though.
In particular, how does SLUB deal with frees from pages that are not
the "current" page? SLQB does not care what page the object belongs to
as long as it's node-local, as the object is just shoved onto a LIFO
for maximum hotness.
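
For my own benefit, my (possibly wrong) reading of the fast path you
describe is roughly the sketch below; the names are illustrative rather
than the exact slub.c ones:

	void *alloc_fastpath(struct kmem_cache *s, struct kmem_cache_cpu *c)
	{
		void *object = c->freelist;	/* freelist of the current page only */

		if (likely(object)) {
			/* fast path: pop from the current page, no NUMA checks at all */
			c->freelist = get_freepointer(s, object);
			return object;
		}

		/*
		 * Slow path: the current page is exhausted. A new slab page
		 * comes from the page allocator, which is where node fallback
		 * and policies get applied rather than in the slab layer.
		 * (slab_alloc_new_page is a stand-in name, not a real helper.)
		 */
		return slab_alloc_new_page(s, c);
	}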

> SLAB deals with it in fallback_alloc(). It scans the nodes in zonelist
> order for free objects of the kmem_cache and then picks up from the
> nearest node. Ugly but it works. SLQB would have to do something similar
> since it also has the per node object bins that SLAB has.
> 

In a real sense, this is what the patch ends up doing. When it fails to
get something locally but sees that the local node is memoryless, it
will check the remote node lists in zonelist order. I think that's
reasonable behaviour but I'm biased because I just want the damn machine
to boot again. What do you think? Pekka, Nick?
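
To make it concrete, the fallback in the patch is roughly the shape
below (heavily simplified sketch; try_remote_node_list is a stand-in
name, not what the patch actually calls it):

	static void *remote_list_fallback(struct kmem_cache *s, gfp_t gfpflags)
	{
		struct zonelist *zonelist = node_zonelist(numa_node_id(), gfpflags);
		struct zoneref *z;
		struct zone *zone;
		void *object;

		/*
		 * Only reached when the local node is memoryless and the
		 * local lists came up empty. Walk remote nodes in zonelist
		 * (i.e. distance) order.
		 */
		for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfpflags)) {
			object = try_remote_node_list(s, zone_to_nid(zone));
			if (object)
				return object;
		}

		/* Nothing on any remote list either, go to the page allocator */
		return NULL;
	}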

> The local node for a memoryless node may not exist at all since there may
> be multiple nodes at the same distance to the memoryless node. So at
> minimum you would have to manage a set of local nodes. If you have the set
> then you also would need to consider memory policies. During bootup you
> would have to simulate the interleave mode in effect. After bootup you
> would have to use the tasks policy.
> 

I think SLQB's treatment of memory policies needs to be handled as a
separate problem. It's less than perfect at the moment; more on that
below.

> This all points to major NUMA issues in SLQB. This is not arch specific.
> SLQB cannot handle memoryless nodes at this point.
> 
> > This patch alters the allocation path. If the allocation from local
> > lists fails and the local node is memoryless, an attempt will be made to
> > allocate from the remote lists before going to the page allocator.
> 
> Are the allocation attempts from the remote lists governed by memory
> policies?

They are to some extent. When selecting a node zonelist, it takes the
current memory policy into account but, at a glance, it does not appear
to obey a policy that restricts the available nodes.
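
If it was to obey a restrictive policy like MPOL_BIND, I would expect a
check along these lines somewhere in that zonelist walk (sketch only;
policy_allowed_nodes is a stand-in for whatever nodemask the task's
policy permits):

		if (!node_isset(zone_to_nid(zone), policy_allowed_nodes))
			continue;	/* node not permitted by the policy, skip it */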

> Otherwise you may create imbalances on neighboring nodes.
> 

I haven't thought about this aspect of things a whole lot, to be
honest. It's not the problem at hand.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

