From: Andrew Morton <akpm@osdl.org>
To: Paul Jackson <pj@sgi.com>
Cc: clameter@sgi.com, linux-mm@kvack.org, rientjes@google.com, ak@suse.de
Subject: Re: [PATCH] GFP_THISNODE for the slab allocator
Date: Sun, 17 Sep 2006 09:29:26 -0700 [thread overview]
Message-ID: <20060917092926.01dc0012.akpm@osdl.org> (raw)
In-Reply-To: <20060917022834.9d56468a.pj@sgi.com>
On Sun, 17 Sep 2006 02:28:34 -0700
Paul Jackson <pj@sgi.com> wrote:
> Andrew wrote:
> > Could cache a zone* and the cpu number. If the cpu number has changed
> > since last time, do another lookup.
>
> Hmmm ... getting closer. This doesn't work as stated, because
> consecutive requests to allocate a page could use different zonelists,
> perhaps from MPOL_BIND, while still coming from the same cpu number.
> The cached zone* would be in the wrong zonelist in that case.
>
> How about two struct zone pointers in the task struct?
>
> One caching the zonelist pointer passed into get_page_from_freelist(),
> and the other caching the pointer you've been suggesting all along,
> to the zone where we found free memory last time we looked.
>
> If that same task tries to allocate a page with a different zonelist
> then we fallback to a brute force lookup and reset the cached state.
>
> (Note to self) The cpuset_update_task_memory_state() routine will
> have to zap these two cached zone pointers. That's easy.
>
>
> Also, as you noticed earlier, we need a way to notice if a once full
> zone that we've been skipping over gets some more free memory.
Do we?
I don't believe that we need to do this in the nodes-for-containers
application. If the machine is UMA then all nodes are equal. If the
machine is physically NUMA then one hopes that the administrator will
construct the container cpusets in a way which minimises off-node traffic,
but I haven't thought about that aspect of it much.
Perhaps we do need to do this in the legacy (;)) cpuset application?
Perhaps what happens is that the "first" memory nodes in a cpuset are those
which are "closest" to that cpuset's CPUs??
IOW: in which operational scenarios and configurations would you view this
go-back-to-the-earlier-zone-if-some-memory-came-free-in-it approach
to be needed?
> One way to do that would be to add one more (a third) zone* to the
> task struct. This third zone* would point to the next zone to retry
> for free memory.
>
> Once each time we call get_page_from_freelist(), we'd retry one
> zone, to see if it had gained some free memory.
>
> If it still had no free memory, increment the retry pointer,
> wrapping when it got up to the zone* we were currently getting
> memory from.
>
> If we discovered some new free memory on the retried node, then
> start using that zone* instead of the one we were using.
>
>
> Now we're up to three zone* pointers in the task struct:
> base -- the base zonelist pointer passed to get_page_from_freelist()
> cur -- the current zone we're getting memory from
> retry -- the next zone to recheck for free memory
>
> If we make the cur and retry pointers be 32 bit indices, instead of
> pointers, this saves 64 bits in the task struct on 64 bit arch's.
>
> Calls to get_page_from_freelist() with GFP_HARDWALL -not- set, and
> those with ALLOC_CPUSET -not- set, must bypass this cached state.
>
> The micro-optimizations I had in mind to the cpuset_zone_allowed()
> call from get_page_from_freelist() are probably still worth doing,
> as that code path, from a linear search of the zonelist, is still
> necessary in various situations.
>
> How does this sound?
Getting there. I don't think we're particularly worried about
CONFIG_NUMA-only space in the task_struct, btw..
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-09-17 16:29 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-09-13 23:50 Christoph Lameter
2006-09-15 5:00 ` Andrew Morton
2006-09-15 6:49 ` Paul Jackson
2006-09-15 7:23 ` Andrew Morton
2006-09-15 7:44 ` Paul Jackson
2006-09-15 8:06 ` Andrew Morton
2006-09-15 15:53 ` David Rientjes
2006-09-15 23:03 ` David Rientjes
2006-09-16 0:04 ` Paul Jackson
2006-09-16 1:36 ` Andrew Morton
2006-09-16 2:23 ` Christoph Lameter
2006-09-16 4:34 ` Andrew Morton
2006-09-16 3:28 ` [PATCH] Add node to zone for the NUMA case Christoph Lameter
2006-09-16 3:40 ` Paul Jackson
2006-09-16 3:45 ` [PATCH] GFP_THISNODE for the slab allocator Paul Jackson
2006-09-16 2:47 ` Christoph Lameter
2006-09-17 3:45 ` David Rientjes
2006-09-17 11:17 ` Paul Jackson
2006-09-17 12:41 ` Christoph Lameter
2006-09-17 13:03 ` Paul Jackson
2006-09-17 20:36 ` David Rientjes
2006-09-17 21:20 ` Paul Jackson
2006-09-17 22:27 ` Paul Jackson
2006-09-17 23:49 ` David Rientjes
2006-09-18 2:20 ` Paul Jackson
2006-09-18 16:34 ` Paul Jackson
2006-09-18 17:49 ` David Rientjes
2006-09-18 20:46 ` Paul Jackson
2006-09-19 20:52 ` David Rientjes
2006-09-19 21:26 ` Christoph Lameter
2006-09-19 21:50 ` David Rientjes
2006-09-21 22:11 ` David Rientjes
2006-09-22 10:10 ` Nick Piggin
2006-09-22 16:26 ` Paul Jackson
2006-09-22 16:36 ` Christoph Lameter
2006-09-15 8:28 ` Andrew Morton
2006-09-16 3:38 ` Paul Jackson
2006-09-16 4:42 ` Andi Kleen
2006-09-16 11:38 ` Paul Jackson
2006-09-16 4:48 ` Andrew Morton
2006-09-16 11:30 ` Paul Jackson
2006-09-16 15:18 ` Andrew Morton
2006-09-17 9:28 ` Paul Jackson
2006-09-17 9:51 ` Nick Piggin
2006-09-17 11:15 ` Paul Jackson
2006-09-17 12:44 ` Nick Piggin
2006-09-17 13:19 ` Paul Jackson
2006-09-17 13:52 ` Nick Piggin
2006-09-17 21:19 ` Paul Jackson
2006-09-18 12:44 ` [PATCH] mm: exempt pcp alloc from watermarks Peter Zijlstra
2006-09-18 20:20 ` Christoph Lameter
2006-09-18 20:43 ` Peter Zijlstra
2006-09-19 14:35 ` Nick Piggin
2006-09-19 14:44 ` Christoph Lameter
2006-09-19 15:02 ` Nick Piggin
2006-09-19 14:51 ` Peter Zijlstra
2006-09-19 15:10 ` Nick Piggin
2006-09-19 15:05 ` Peter Zijlstra
2006-09-19 15:39 ` Christoph Lameter
2006-09-17 16:29 ` Andrew Morton [this message]
2006-09-18 2:11 ` [PATCH] GFP_THISNODE for the slab allocator Paul Jackson
2006-09-18 5:09 ` Andrew Morton
2006-09-18 7:49 ` Paul Jackson
2006-09-16 11:48 ` Paul Jackson
2006-09-16 15:38 ` Andrew Morton
2006-09-16 21:51 ` Paul Jackson
2006-09-16 23:10 ` Andrew Morton
2006-09-17 4:37 ` Christoph Lameter
2006-09-17 4:55 ` Andrew Morton
2006-09-17 12:09 ` Paul Jackson
2006-09-17 12:36 ` Christoph Lameter
2006-09-17 13:06 ` Paul Jackson
2006-09-19 19:17 ` David Rientjes
2006-09-19 19:19 ` David Rientjes
2006-09-19 19:31 ` Christoph Lameter
2006-09-19 21:12 ` David Rientjes
2006-09-19 21:28 ` Christoph Lameter
2006-09-19 21:53 ` Paul Jackson
2006-09-15 17:08 ` Christoph Lameter
2006-09-15 17:37 ` [PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA Christoph Lameter
2006-09-15 17:38 ` [PATCH] Disable GFP_THISNODE in the non-NUMA case Christoph Lameter
2006-09-15 17:42 ` [PATCH] GFP_THISNODE for the slab allocator V2 Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060917092926.01dc0012.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=ak@suse.de \
--cc=clameter@sgi.com \
--cc=linux-mm@kvack.org \
--cc=pj@sgi.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox