linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@osdl.org>
To: Paul Jackson <pj@sgi.com>
Cc: clameter@sgi.com, linux-mm@kvack.org, rientjes@google.com, ak@suse.de
Subject: Re: [PATCH] GFP_THISNODE for the slab allocator
Date: Fri, 15 Sep 2006 21:48:22 -0700
Message-ID: <20060915214822.1c15c2cb.akpm@osdl.org>
In-Reply-To: <20060915203816.fd260a0b.pj@sgi.com>

On Fri, 15 Sep 2006 20:38:16 -0700
Paul Jackson <pj@sgi.com> wrote:

> [Adding Andi to cc list, since I mention him below. -pj]
> 
> Andrew wrote:
> > I'm thinking a) is easily solved by adding an array of the zones inside the
> > `struct cpuset', and change get_page_from_freelist() to only look at those
> > zones.
> > ...
> > err, if we cache the most-recently-allocated-from zone in the cpuset then
> > we don't need the array-of-zones, do we?  We'll only need to do a zone
> > waddle when switching from one zone to the next, which is super-rare.
> > 
> > That's much simpler.
> > ...
> > And locking becomes simpler too.  It's just a check of
> > cpuset_zone_allowed(current->cpuset->current_allocation_zone)
> 
> This will blow chunks performance wise, with the current cpuset locking
> scheme.
> 
> Just one current_allocation_zone would not be enough.  Each node that
> the cpuset allowed would require its own current_allocation_zone.  For
> example, on a big honkin NUMA box with 2 CPUs per Node, tasks running
> on CPU 32, Node 16, might be able to find free memory right on that
> Node 16.  But another task in the same cpuset running on CPU 112, Node
> 56 might have to scan past a dozen Nodes to Node 68 to find memory.
> 
> Accessing anything from a cpuset that depends on what nodes it allows
> requires taking the global mutex callback_mutex (in kernel/cpuset.c).
> We don't want to put a global mutex on the page alloc hot code path.
> 
> Anything we need to access frequently from a task's cpuset has to be
> cached in its task struct.
> 
> Three alternative possibilities:
> 
> 1)  Perhaps these most-recently-allocated-from zones shouldn't be
>     properties of the cpuset, nor even of the task, but of the zone structs.
> 
>     If each zone struct on the zonelist had an additional flag bit marking
>     the zones that had no free memory, then we could navigate the zonelist
>     pretty quickly.  One more bit per zone struct would be enough to track
>     a simple rescan mechanism, so that we could detect when a node that
>     had formerly run out of memory once again had free memory.
> 
>     One or two bits per zone struct would be way cheaper, so far as
>     data space requirements.
> 
>     Downside - it still hits each zone struct - suboptimal cache thrashing.
>     One less pointer chase than z->zone_pgdat->node_id, but still not
>     great.
> 
> 2)  It may be sufficient to locally optimize get_page_from_freelist()'s
>     calls to cpuset_zone_allowed() - basically open code cpuset_zone_allowed,
>     or at least refine its invocation.
> 
>     This might require a second nodemask in the task struct, for the typically
>     larger set of nodes that GFP_KERNEL allocations can use, more than just
>     the nodes that GFP_USER can use.  Such a second nodemask in the task struct
>     would enable me to avoid taking the global callback_mutex for some GFP_KERNEL 
>     allocations on tight memory systems.
> 
>     Downside #1 - still requires z->zone_pgdat->node_id.  Andrew suspects
>     that this is enough of a problem in itself.  The profile showed
>     cpuset_zone_allowed(), not get_page_from_freelist(), at the top of
>     the list.  Since the node id is evaluated in get_page_from_freelist(),
>     I figured the real problem was in the cpuset_zone_allowed() code.
>     Perhaps testing a simple hack approximation of this patch will tell
>     us - next week.
> 
>     Downside #2 - may require the above mentioned additional nodemask_t
>     in the task struct.
> 
> 3)  The custom zonelist option - which was part of my original cpuset
>     proposal, and which Andi K and I have gone back and forth on, with
>     each of us liking and disliking it, at different times.  See further
>     my latest writeup on this option:
> 
>       http://lkml.org/lkml/2005/11/5/252
>       Date	Sat, 5 Nov 2005 20:18:41 -0800
>       From	Paul Jackson <pj@sgi.com>
>       Subject	Re: [PATCH]: Clean up of __alloc_pages
> 
> My current plan - see if somehow I can code up and get tested (2),
> since a rough approximation to it would be trivial to code.  If that
> works, go with it, unless someone convinces me otherwise.  If (2) can't
> do the job, try (1), since that seems easier to code.  If that fails,
> or someone shoots that down, or Andi makes a good enough case for (3),
> give (3) a go - that's the hardest path, and risks the most collateral
> damage to the behaviour of the memory paging subsystem.
> 
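Option (1) in the quoted message can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct zone_model`, `pick_zone()` and `zone_model_free_page()` are hypothetical names, and the "simple rescan mechanism" is assumed here to be the simplest possible rule, clearing a zone's full bit whenever a page is freed back to it. What it shows is that a zonelist walk can skip exhausted zones with a single flag test per zone, without the z->zone_pgdat->node_id lookup and without touching any cpuset state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a zone; not the kernel's struct zone. */
struct zone_model {
	int node_id;
	unsigned long free_pages;
	bool full;	/* the proposed "no free memory" flag bit */
};

/*
 * Walk a zonelist, skipping zones already flagged full.  A zone found
 * empty gets its full bit set so later walks skip it with one bit test.
 * Returns the index of the zone a page was taken from, or -1.
 */
static int pick_zone(struct zone_model *zonelist, size_t nr)
{
	assert(zonelist != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		struct zone_model *z = &zonelist[i];

		if (z->full)
			continue;		/* cheap skip: no node id lookup */
		if (z->free_pages == 0) {
			z->full = true;		/* remember for later walks */
			continue;
		}
		z->free_pages--;		/* "allocate" one page */
		return (int)i;
	}
	return -1;
}

/*
 * Assumed rescan rule: freeing a page back to a zone clears its full
 * bit, so a node that regains memory becomes eligible again.
 */
static void zone_model_free_page(struct zone_model *z)
{
	z->free_pages++;
	z->full = false;
}
```

In this model an exhausted zone is probed once and then skipped until something is freed back to it. Each walk still reads every zone struct on the list, which is the cache cost noted under option (1).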

hm.

Why is it not sufficient to cache the most-recent zone*  in task_struct?
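Caching the most recent zone in the task can be sketched the same way, again as a hypothetical user-space model rather than kernel code. `struct task_model` stands in for task_struct and holds a task-local copy of the allowed-node mask (`mems_allowed`, here assumed to be a plain 64-bit bitmask, so at most 64 nodes) plus a pointer to the zone the task last allocated from; `alloc_page_cached()` is an illustrative name. The fast path revalidates the cached zone with one bit test on task-local data, so no global mutex is taken. Only when that zone is empty or no longer allowed does it fall back to a full zonelist walk and refresh the cache.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64	/* assumption: node ids fit in one 64-bit mask */

/* Hypothetical, simplified stand-in for a zone. */
struct zone_model {
	int node_id;
	unsigned long free_pages;
};

/* Hypothetical stand-in for the relevant parts of task_struct. */
struct task_model {
	unsigned long long mems_allowed;	/* task-local copy: bit n => node n allowed */
	struct zone_model *cached_zone;		/* zone this task last allocated from */
};

static int node_allowed(const struct task_model *t, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return (int)((t->mems_allowed >> node) & 1ULL);
}

/*
 * Fast path: reuse the cached zone if it still has free pages and its
 * node is still allowed.  Both checks read only the zone and the task,
 * so no global lock is needed.  Slow path: walk the zonelist and
 * refresh the cache.
 */
static struct zone_model *alloc_page_cached(struct task_model *t,
					    struct zone_model *zonelist,
					    size_t nr)
{
	struct zone_model *z = t->cached_zone;

	if (z && z->free_pages > 0 && node_allowed(t, z->node_id)) {
		z->free_pages--;
		return z;
	}

	for (size_t i = 0; i < nr; i++) {
		z = &zonelist[i];
		if (!node_allowed(t, z->node_id) || z->free_pages == 0)
			continue;
		z->free_pages--;
		t->cached_zone = z;	/* the rare zonelist walk */
		return z;
	}

	t->cached_zone = NULL;		/* nothing allowed has free memory */
	return NULL;
}
```

Because the cache is per task rather than per cpuset, tasks in the same cpuset running on different nodes each keep their own entry, which sidesteps the one-zone-per-cpuset objection. One gap in this sketch: a task that migrates to another node keeps using its old cached zone until that zone fills, so a real version would presumably also need to invalidate the cache on migration.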



Thread overview: 82+ messages
2006-09-13 23:50 Christoph Lameter
2006-09-15  5:00 ` Andrew Morton
2006-09-15  6:49   ` Paul Jackson
2006-09-15  7:23     ` Andrew Morton
2006-09-15  7:44       ` Paul Jackson
2006-09-15  8:06         ` Andrew Morton
2006-09-15 15:53           ` David Rientjes
2006-09-15 23:03           ` David Rientjes
2006-09-16  0:04             ` Paul Jackson
2006-09-16  1:36               ` Andrew Morton
2006-09-16  2:23                 ` Christoph Lameter
2006-09-16  4:34                   ` Andrew Morton
2006-09-16  3:28                 ` [PATCH] Add node to zone for the NUMA case Christoph Lameter
2006-09-16  3:40                   ` Paul Jackson
2006-09-16  3:45                 ` [PATCH] GFP_THISNODE for the slab allocator Paul Jackson
2006-09-16  2:47             ` Christoph Lameter
2006-09-17  3:45             ` David Rientjes
2006-09-17 11:17               ` Paul Jackson
2006-09-17 12:41                 ` Christoph Lameter
2006-09-17 13:03                   ` Paul Jackson
2006-09-17 20:36                     ` David Rientjes
2006-09-17 21:20                       ` Paul Jackson
2006-09-17 22:27                       ` Paul Jackson
2006-09-17 23:49                         ` David Rientjes
2006-09-18  2:20                           ` Paul Jackson
2006-09-18 16:34                             ` Paul Jackson
2006-09-18 17:49                               ` David Rientjes
2006-09-18 20:46                                 ` Paul Jackson
2006-09-19 20:52                               ` David Rientjes
2006-09-19 21:26                                 ` Christoph Lameter
2006-09-19 21:50                                   ` David Rientjes
2006-09-21 22:11                                 ` David Rientjes
2006-09-22 10:10                                   ` Nick Piggin
2006-09-22 16:26                                   ` Paul Jackson
2006-09-22 16:36                                     ` Christoph Lameter
2006-09-15  8:28       ` Andrew Morton
2006-09-16  3:38         ` Paul Jackson
2006-09-16  4:42           ` Andi Kleen
2006-09-16 11:38             ` Paul Jackson
2006-09-16  4:48           ` Andrew Morton [this message]
2006-09-16 11:30             ` Paul Jackson
2006-09-16 15:18               ` Andrew Morton
2006-09-17  9:28                 ` Paul Jackson
2006-09-17  9:51                   ` Nick Piggin
2006-09-17 11:15                     ` Paul Jackson
2006-09-17 12:44                       ` Nick Piggin
2006-09-17 13:19                         ` Paul Jackson
2006-09-17 13:52                           ` Nick Piggin
2006-09-17 21:19                             ` Paul Jackson
2006-09-18 12:44                             ` [PATCH] mm: exempt pcp alloc from watermarks Peter Zijlstra
2006-09-18 20:20                               ` Christoph Lameter
2006-09-18 20:43                                 ` Peter Zijlstra
2006-09-19 14:35                               ` Nick Piggin
2006-09-19 14:44                                 ` Christoph Lameter
2006-09-19 15:02                                   ` Nick Piggin
2006-09-19 14:51                                 ` Peter Zijlstra
2006-09-19 15:10                                   ` Nick Piggin
2006-09-19 15:05                                     ` Peter Zijlstra
2006-09-19 15:39                                       ` Christoph Lameter
2006-09-17 16:29                   ` [PATCH] GFP_THISNODE for the slab allocator Andrew Morton
2006-09-18  2:11                     ` Paul Jackson
2006-09-18  5:09                       ` Andrew Morton
2006-09-18  7:49                         ` Paul Jackson
2006-09-16 11:48       ` Paul Jackson
2006-09-16 15:38         ` Andrew Morton
2006-09-16 21:51           ` Paul Jackson
2006-09-16 23:10             ` Andrew Morton
2006-09-17  4:37               ` Christoph Lameter
2006-09-17  4:55                 ` Andrew Morton
2006-09-17 12:09                   ` Paul Jackson
2006-09-17 12:36                   ` Christoph Lameter
2006-09-17 13:06                     ` Paul Jackson
2006-09-19 19:17                 ` David Rientjes
2006-09-19 19:19                   ` David Rientjes
2006-09-19 19:31                   ` Christoph Lameter
2006-09-19 21:12                     ` David Rientjes
2006-09-19 21:28                       ` Christoph Lameter
2006-09-19 21:53                         ` Paul Jackson
2006-09-15 17:08   ` Christoph Lameter
2006-09-15 17:37   ` [PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA Christoph Lameter
2006-09-15 17:38   ` [PATCH] Disable GFP_THISNODE in the non-NUMA case Christoph Lameter
2006-09-15 17:42   ` [PATCH] GFP_THISNODE for the slab allocator V2 Christoph Lameter
