From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: akpm@linux-foundation.org, Lee.Schermerhorn@hp.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
rientjes@google.com, kamezawa.hiroyu@jp.fujitsu.com,
clameter@sgi.com
Subject: Re: [PATCH 6/6] Use one zonelist that is filtered by nodemask
Date: Mon, 8 Oct 2007 18:11:43 -0700 [thread overview]
Message-ID: <20071009011143.GC14670@us.ibm.com> (raw)
In-Reply-To: <20070928142526.16783.97067.sendpatchset@skynet.skynet.ie>
On 28.09.2007 [15:25:27 +0100], Mel Gorman wrote:
>
> Two zonelists exist so that GFP_THISNODE allocations will be guaranteed
> to use memory only from a node local to the CPU. As we can now filter the
> zonelist based on a nodemask, we filter the standard node zonelist for zones
> on the local node when GFP_THISNODE is specified.
>
> When GFP_THISNODE is used, a temporary nodemask is created with only the
> node local to the CPU set. This allows us to eliminate the second zonelist.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Acked-by: Christoph Lameter <clameter@sgi.com>
<snip>
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc8-mm2-030_filter_nodemask/include/linux/gfp.h linux-2.6.23-rc8-mm2-040_use_one_zonelist/include/linux/gfp.h
> --- linux-2.6.23-rc8-mm2-030_filter_nodemask/include/linux/gfp.h 2007-09-28 15:49:57.000000000 +0100
> +++ linux-2.6.23-rc8-mm2-040_use_one_zonelist/include/linux/gfp.h 2007-09-28 15:55:03.000000000 +0100
[Reordering the chunks to make my comments a little more logical]
<snip>
> -static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
> +static inline struct zonelist *node_zonelist(int nid)
> {
> - return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
> + return &NODE_DATA(nid)->node_zonelist;
> }
>
> #ifndef HAVE_ARCH_FREE_PAGE
> @@ -198,7 +186,7 @@ static inline struct page *alloc_pages_n
> if (nid < 0)
> nid = numa_node_id();
>
> - return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> + return __alloc_pages(gfp_mask, order, node_zonelist(nid));
> }
This is alloc_pages_node(), and converting the nid to a zonelist means
that lower levels (specifically __alloc_pages() here) are not aware of
nids, as far as I can tell. This isn't a change, I just want to make
sure I understand...
<snip>
> struct page * fastcall
> __alloc_pages(gfp_t gfp_mask, unsigned int order,
> struct zonelist *zonelist)
> {
> + /*
> + * Use a temporary nodemask for __GFP_THISNODE allocations. If the
> + * cost of allocating on the stack or the stack usage becomes
> + * noticable, allocate the nodemasks per node at boot or compile time
> + */
> + if (unlikely(gfp_mask & __GFP_THISNODE)) {
> + nodemask_t nodemask;
> +
> + return __alloc_pages_internal(gfp_mask, order,
> + zonelist, nodemask_thisnode(&nodemask));
> + }
> +
> return __alloc_pages_internal(gfp_mask, order, zonelist, NULL);
> }
<snip>
So alloc_pages_node() calls here and for THISNODE allocations, we go ask
nodemask_thisnode() for a nodemask...
> +static nodemask_t *nodemask_thisnode(nodemask_t *nodemask)
> +{
> + /* Build a nodemask for just this node */
> + int nid = numa_node_id();
> +
> + nodes_clear(*nodemask);
> + node_set(nid, *nodemask);
> +
> + return nodemask;
> +}
<snip>
And nodemask_thisnode() always gives us a nodemask with only the node
the current process is running on set, I think?
That seems really wrong -- and would explain what Lee was seeing while
using my patches for the hugetlb pool allocator to use THISNODE
allocations. All the allocations would end up coming from whatever node
the process happened to be running on. This obviously messes up hugetlb
accounting, as I rely on THISNODE requests returning NULL if they go
off-node.
I'm not sure how this would be fixed, as __alloc_pages() no longer has
the nid to set in the mask.
Am I wrong in my analysis?
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-10-09 1:12 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-09-28 14:23 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v8 Mel Gorman
2007-09-28 14:23 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-09-28 14:24 ` [PATCH 2/6] Introduce node_zonelist() for accessing the zonelist for a GFP mask Mel Gorman
2007-09-28 14:24 ` [PATCH 3/6] Use two zonelist that are filtered by " Mel Gorman
2007-09-28 14:24 ` [PATCH 4/6] Have zonelist contains structs with both a zone pointer and zone_idx Mel Gorman
2007-10-17 3:22 ` David Rientjes
2007-09-28 14:25 ` [PATCH 5/6] Filter based on a nodemask as well as a gfp_mask Mel Gorman
2007-09-28 15:37 ` Lee Schermerhorn
2007-09-28 18:28 ` Mel Gorman
2007-09-28 18:38 ` Paul Jackson
2007-09-28 21:03 ` Lee Schermerhorn
2007-09-28 14:25 ` [PATCH 6/6] Use one zonelist that is filtered by nodemask Mel Gorman
2007-10-09 1:11 ` Nishanth Aravamudan [this message]
2007-10-09 1:56 ` Christoph Lameter
2007-10-09 3:17 ` Nishanth Aravamudan
2007-10-09 15:40 ` Mel Gorman
2007-10-09 16:25 ` Nishanth Aravamudan
2007-10-09 18:47 ` Christoph Lameter
2007-10-09 18:12 ` Nishanth Aravamudan
2007-10-10 15:53 ` Lee Schermerhorn
2007-10-10 16:05 ` Nishanth Aravamudan
2007-10-10 16:09 ` Mel Gorman
-- strict thread matches above, loose matches on Subject: below --
2007-11-09 14:32 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v9 Mel Gorman
2007-11-09 14:34 ` [PATCH 6/6] Use one zonelist that is filtered by nodemask Mel Gorman
2007-11-09 15:45 ` Christoph Lameter
2007-11-09 16:14 ` Mel Gorman
2007-11-09 16:19 ` Christoph Lameter
2007-11-09 16:45 ` Nishanth Aravamudan
2007-11-09 17:18 ` Lee Schermerhorn
2007-11-09 17:26 ` Christoph Lameter
2007-11-09 18:16 ` Nishanth Aravamudan
2007-11-09 18:20 ` Nishanth Aravamudan
2007-11-09 18:22 ` Christoph Lameter
2007-11-11 14:16 ` Mel Gorman
2007-11-12 19:07 ` Christoph Lameter
2007-11-09 18:14 ` Nishanth Aravamudan
2007-11-20 14:19 ` Mel Gorman
2007-11-20 15:14 ` Lee Schermerhorn
2007-11-20 16:21 ` Mel Gorman
2007-11-20 20:19 ` Christoph Lameter
2007-11-20 20:18 ` Christoph Lameter
2007-11-20 21:26 ` Mel Gorman
2007-11-20 21:33 ` Andrew Morton
2007-11-20 21:38 ` Christoph Lameter
2007-09-13 17:52 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v7 Mel Gorman
2007-09-13 17:54 ` [PATCH 6/6] Use one zonelist that is filtered by nodemask Mel Gorman
2007-09-12 21:04 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v6 Mel Gorman
2007-09-12 21:06 ` [PATCH 6/6] Use one zonelist that is filtered by nodemask Mel Gorman
2007-09-11 21:30 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5 (resend) Mel Gorman
2007-09-11 21:32 ` [PATCH 6/6] Use one zonelist that is filtered by nodemask Mel Gorman
2007-09-11 15:19 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5 Mel Gorman
2007-09-11 15:21 ` [PATCH 6/6] Use one zonelist that is filtered by nodemask Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071009011143.GC14670@us.ibm.com \
--to=nacc@us.ibm.com \
--cc=Lee.Schermerhorn@hp.com \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox