From: Andrew Morton <akpm@osdl.org>
To: Paul Jackson <pj@sgi.com>
Cc: clameter@sgi.com, linux-mm@kvack.org, rientjes@google.com, ak@suse.de
Subject: Re: [PATCH] GFP_THISNODE for the slab allocator
Date: Fri, 15 Sep 2006 21:48:22 -0700
Message-ID: <20060915214822.1c15c2cb.akpm@osdl.org>
In-Reply-To: <20060915203816.fd260a0b.pj@sgi.com>
On Fri, 15 Sep 2006 20:38:16 -0700
Paul Jackson <pj@sgi.com> wrote:
> [Adding Andi to cc list, since I mention him below. -pj]
>
> Andrew wrote:
> > I'm thinking a) is easily solved by adding an array of the zones inside the
> > `struct cpuset', and change get_page_from_freelist() to only look at those
> > zones.
> > ...
> > err, if we cache the most-recently-allocated-from zone in the cpuset then
> > we don't need the array-of-zones, do we? We'll only need to do a zone
> > waddle when switching from one zone to the next, which is super-rare.
> >
> > That's much simpler.
> > ...
> > And locking becomes simpler too. It's just a check of
> > cpuset_zone_allowed(current->cpuset->current_allocation_zone)
>
> This will blow chunks performance-wise, with the current cpuset locking
> scheme.
>
> Just one current_allocation_zone would not be enough. Each node that
> the cpuset allowed would require its own current_allocation_zone. For
> example, on a big honkin NUMA box with 2 CPUs per Node, tasks running
> on CPU 32, Node 16, might be able to find free memory right on that
> Node 16. But another task in the same cpuset running on CPU 112, Node
> 56 might have to scan past a dozen Nodes to Node 68 to find memory.
>
> Accessing anything from a cpuset that depends on what nodes it allows
> requires taking the global mutex callback_mutex (in kernel/cpuset.c).
> We don't want to put a global mutex on the page alloc hot code path.
>
> Anything we need to access frequently from a task's cpuset has to be
> cached in its task struct.
>
> Three alternative possibilities:
>
> 1) Perhaps these most-recently-allocated-from zones shouldn't be
> properties of the cpuset, nor even of the task, but of the zone structs.
>
> If each zone struct on the zonelist had an additional flag bit marking
> the zones that had no free memory, then we could navigate the zonelist
> pretty quickly. One more bit per zone struct would be enough to track
> a simple rescan mechanism, so that we could detect when a node that
> had formerly run out of memory once again had free memory.
>
> One or two bits per zone struct would be way cheaper, so far as
> data space requirements.
>
> Downside - it still hits each zone struct - suboptimal cache thrashing.
> One less pointer chase than z->zone_pgdat->node_id, but still not
> great.
>
> 2) It may be sufficient to locally optimize get_page_from_freelist()'s
> calls to cpuset_zone_allowed() - basically open code cpuset_zone_allowed,
> or at least refine its invocation.
>
> This might require a second nodemask in the task struct, for the typically
> larger set of nodes that GFP_KERNEL allocations can use, more than just
> the nodes that GFP_USER can use. Such a second nodemask in the task struct
> would enable me to avoid taking the global callback_mutex for some GFP_KERNEL
> allocations on tight memory systems.
>
> Downside #1 - still requires z->zone_pgdat->node_id. Andrew suspects
> that this is enough of a problem in itself. From the profile, which
> showed cpuset_zone_allowed(), not get_page_from_freelist(), at the
> top of the list, given that the node id is evaluated in the
> get_page_from_freelist() routine, I was figuring that the real
> problem was in the cpuset_zone_allowed() code. Perhaps some testing
> of a simple hack approximation to this patch will tell us - next week.
>
> Downside #2 - may require the above mentioned additional nodemask_t
> in the task struct.
>
> 3) The custom zonelist option - which was part of my original cpuset
> proposal, and which Andi K and I have gone back and forth on, with
> each of us liking and disliking it, at different times. See further
> my latest writeup on this option:
>
> http://lkml.org/lkml/2005/11/5/252
> Date Sat, 5 Nov 2005 20:18:41 -0800
> From Paul Jackson <pj@sgi.com>
> Subject Re: [PATCH]: Clean up of __alloc_pages
>
> My current plan - see if somehow I can code up and get tested (2),
> since a rough approximation to it would be trivial to code. If that
> works, go with it, unless someone convinces me otherwise. If (2) can't
> do the job, try (1), since that seems easier to code. If that fails,
> or someone shoots that down, or Andi makes a good enough case for (3),
> give (3) a go - that's the hardest path, and risks the most collateral
> damage to the behaviour of the memory paging subsystem.
>
hm.
Why is it not sufficient to cache the most-recent zone* in task_struct?
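
To make the question concrete, here is a rough userspace sketch of what
caching the most-recent zone per task might look like. It is a model under
stated assumptions, not kernel code: struct zone_model, struct task_model,
node_allowed(), scan_zonelist() and alloc_page_model() are all hypothetical
names, the allowed-nodes set is reduced to a plain bitmask, and no locking
or watermark logic is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct zone_model {
    int node_id;       /* node this zone belongs to */
    long free_pages;   /* pages still available in this zone */
};

struct task_model {
    unsigned long mems_allowed;    /* bitmask of node ids the task may use */
    struct zone_model *last_zone;  /* zone that satisfied the last allocation */
};

/* True if the task's bitmask permits allocating from this node. */
static bool node_allowed(const struct task_model *t, int node_id)
{
    assert(node_id >= 0 && node_id < (int)(8 * sizeof(t->mems_allowed)));
    return (t->mems_allowed >> node_id) & 1UL;
}

/* Slow path: walk the zonelist in order; *scanned counts zones examined. */
static struct zone_model *scan_zonelist(const struct task_model *t,
                                        struct zone_model **zonelist,
                                        size_t nr_zones, int *scanned)
{
    for (size_t i = 0; i < nr_zones; i++) {
        struct zone_model *z = zonelist[i];

        (*scanned)++;
        if (node_allowed(t, z->node_id) && z->free_pages > 0)
            return z;
    }
    return NULL;
}

/* Take one page, trying the task's cached zone before any zonelist walk. */
static struct zone_model *alloc_page_model(struct task_model *t,
                                           struct zone_model **zonelist,
                                           size_t nr_zones, int *scanned)
{
    struct zone_model *z = t->last_zone;

    /* Fast path: the cached zone is still allowed and still has pages. */
    if (z && node_allowed(t, z->node_id) && z->free_pages > 0) {
        z->free_pages--;
        return z;
    }

    /* Cache miss: fall back to the full walk and remember the result. */
    z = scan_zonelist(t, zonelist, nr_zones, scanned);
    if (z) {
        z->free_pages--;
        t->last_zone = z;
    }
    return z;
}
```

In this model the zonelist walk only happens when the cached zone stops
being usable, and nothing shared has to be locked on the fast path because
the cache lives in the task. The sketch also shows a cost: the task keeps
using the cached zone even after earlier zones on the list regain free
pages, so some rescan policy would presumably still be needed.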