* Add gfp flag __GFP_POLICY to control policies and cpusets redirection of allocations
From: Christoph Lameter @ 2006-03-22 21:44 UTC
To: akpm; +Cc: pj, ak, linux-mm
Note to Andrew: This patch replaces
cpuset-alloc_pages_node-overrides-cpuset-constraints.patch
cpuset-alloc_pages_node-overrides-cpuset-constraints-speedup.patch
Various subsystems manage their own locality (slab, block layer, device drivers)
and are rather sensitive to cpusets or memory policies interfering with their
operation. Also various kernel components rely on the ability to temporarily
allocate a page for a kernel thread. That page should be local to the process.
This patch introduces a flag __GFP_POLICY that can be specified to enable
cpusets and memory policy redirection to different nodes for alloc_pages()
and alloc_pages_node().
__GFP_POLICY is set by default for user space page allocations (GFP_USER and
GFP_HIGHUSER) but not for GFP_KERNEL which is used by device drivers and
other local uses of pages in the kernel.
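
As a rough user-space illustration of the defaults described above, the
sketch below rebuilds the composite masks and tests for the new bit. The
SK_ names are hypothetical stand-ins for the kernel's __GFP_ names. Only
the HARDWALL and POLICY values are quoted from the patch; the other bit
values are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* SK_ names mirror the kernel's __GFP_ names; the prefix is hypothetical
 * and only avoids reserved identifiers in user space. */

/* Bit values quoted in the patch. */
#define SK_GFP_HARDWALL 0x20000u
#define SK_GFP_POLICY   0x40000u

/* Assumed bit values, for illustration only. */
#define SK_GFP_HIGHMEM  0x02u
#define SK_GFP_WAIT     0x10u
#define SK_GFP_IO       0x40u
#define SK_GFP_FS       0x80u

/* Composite masks as they look once the patch is applied. */
#define SK_GFP_KERNEL   (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_USER     (SK_GFP_KERNEL | SK_GFP_HARDWALL | SK_GFP_POLICY)
#define SK_GFP_HIGHUSER (SK_GFP_USER | SK_GFP_HIGHMEM)

/* True if an allocation with these flags may be redirected by
 * cpusets or memory policies. */
static bool sk_obeys_policy(gfp_t flags)
{
	return (flags & SK_GFP_POLICY) != 0;
}

/* Self-check of the defaults described in the text. */
static void sk_check_defaults(void)
{
	assert(!sk_obeys_policy(SK_GFP_KERNEL));
	assert(sk_obeys_policy(SK_GFP_USER));
	assert(sk_obeys_policy(SK_GFP_HIGHUSER));
}
```

A caller that wants policy redirection for a kernel allocation would, in
this model, OR the policy bit into its flags explicitly.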
The slab allocator does its own application of memory policies and cpuset
constraints based on SLAB_MEM_SPREAD flags. The patch just ensures that the
page allocator does not apply additional policies after the slab allocator
has determined where memory should be allocated. This can happen, e.g., if
a cpuset is active and then some kernel component tries to allocate
a new slab or grow the size of the slab.
vmalloc() and vmalloc_node() are exempted from policies since these are
typically used by device drivers for large memory allocations that should
be controlled by the device driver itself.
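
The effect of clearing the flag in the slab and vmalloc paths can be
pictured with a toy model. Everything below is a hypothetical user-space
sketch: the allocator, the 32-node bitmask and the "redirect to the
lowest allowed node" rule are simplifications, not the kernel's real
zonelist logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned int gfp_t;

#define SK_GFP_POLICY 0x40000u   /* value of __GFP_POLICY in the patch */
#define SK_MAX_NODES  32         /* toy limit: one bit per node */

/*
 * Toy page allocator: returns the node the page would come from, or -1.
 * Redirecting to the lowest allowed node is an assumption of this model.
 */
static int sk_alloc_pages_node(int nodeid, gfp_t flags, uint32_t mems_allowed)
{
	if (nodeid < 0 || nodeid >= SK_MAX_NODES)
		return -1;

	bool node_ok = (mems_allowed >> nodeid) & 1u;

	/* Without the policy bit the requested node is always honored. */
	if (!(flags & SK_GFP_POLICY) || node_ok)
		return nodeid;

	for (int n = 0; n < SK_MAX_NODES; n++)
		if ((mems_allowed >> n) & 1u)
			return n;
	return -1;
}

/* Models kmem_getpages() and __vmalloc_area_node() after the patch:
 * the subsystem has already picked a node, so the bit is cleared. */
static int sk_subsystem_getpages(int nodeid, gfp_t flags, uint32_t mems_allowed)
{
	flags &= ~SK_GFP_POLICY;
	return sk_alloc_pages_node(nodeid, flags, mems_allowed);
}

static void sk_check_exemption(void)
{
	uint32_t cpuset_nodes = 0x3;  /* task confined to nodes 0 and 1 */

	/* A policy-obeying request for node 3 is redirected. */
	assert(sk_alloc_pages_node(3, SK_GFP_POLICY, cpuset_nodes) == 0);
	/* The same request through the subsystem path stays on node 3. */
	assert(sk_subsystem_getpages(3, SK_GFP_POLICY, cpuset_nodes) == 3);
}
```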
GFP_KERNEL does not have __GFP_POLICY set, meaning that page allocations
with GFP_KERNEL are no longer subject to cpusets and policy constraints.
I have looked through the kernel for page allocations with GFP_KERNEL and
the instances I have seen should not use policies.
(Note that slab allocations with GFP_KERNEL still perform as before).
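
The two checks this patch adds, in __cpuset_zone_allowed() and
alloc_pages_current(), can be modelled in user space roughly as follows.
This is only a sketch: the function names, the node bitmask and the
two-value policy enum are hypothetical, and the hardwall and ancestor
handling of the real cpuset code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned int gfp_t;

#define SK_GFP_POLICY 0x40000u   /* value of __GFP_POLICY in the patch */

enum sk_policy { SK_MPOL_DEFAULT, SK_MPOL_INTERLEAVE };

/* Models the early return in __cpuset_zone_allowed(): interrupt context
 * or a missing policy bit bypasses the cpuset check entirely. */
static bool sk_cpuset_zone_allowed(unsigned int node, gfp_t gfp_mask,
				   bool in_irq, uint32_t mems_allowed)
{
	if (in_irq || !(gfp_mask & SK_GFP_POLICY))
		return true;
	return node < 32 && ((mems_allowed >> node) & 1u);
}

/* Models policy selection in alloc_pages_current(): fall back to the
 * default policy unless the task has one, we are not in interrupt
 * context, and the caller asked for policy handling. */
static enum sk_policy sk_effective_policy(const enum sk_policy *task_pol,
					  gfp_t gfp, bool in_irq)
{
	if (!task_pol || in_irq || !(gfp & SK_GFP_POLICY))
		return SK_MPOL_DEFAULT;
	return *task_pol;
}

static void sk_check_kernel_alloc(void)
{
	enum sk_policy pol = SK_MPOL_INTERLEAVE;

	/* No policy bit: the cpuset and the task policy are both ignored. */
	assert(sk_cpuset_zone_allowed(5, 0u, false, 0x1u));
	assert(sk_effective_policy(&pol, 0u, false) == SK_MPOL_DEFAULT);
}
```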
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.16/include/linux/gfp.h
===================================================================
--- linux-2.6.16.orig/include/linux/gfp.h 2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16/include/linux/gfp.h 2006-03-22 13:25:02.000000000 -0800
@@ -47,6 +47,9 @@ struct vm_area_struct;
#define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
#define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
+#define __GFP_POLICY ((__force gfp_t)0x40000u) /* Allocation needs to obey memory policies
+ and cpuset constraints */
+
#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -55,16 +58,17 @@ struct vm_area_struct;
#define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
- __GFP_NOMEMALLOC|__GFP_HARDWALL)
+ __GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_POLICY)
/* GFP_ATOMIC means both !wait (__GFP_WAIT not set) and use emergency pool */
#define GFP_ATOMIC (__GFP_HIGH)
#define GFP_NOIO (__GFP_WAIT)
#define GFP_NOFS (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)
-#define GFP_USER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
+#define GFP_USER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
+ __GFP_POLICY)
#define GFP_HIGHUSER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
- __GFP_HIGHMEM)
+ __GFP_HIGHMEM | __GFP_POLICY)
/* Flag - indicates that the buffer will be suitable for DMA. Ignored on some
platforms, used as appropriate on others */
Index: linux-2.6.16/mm/slab.c
===================================================================
--- linux-2.6.16.orig/mm/slab.c 2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16/mm/slab.c 2006-03-22 13:25:02.000000000 -0800
@@ -1392,6 +1392,8 @@ static void *kmem_getpages(struct kmem_c
int i;
flags |= cachep->gfpflags;
+ flags &= ~__GFP_POLICY;
+
page = alloc_pages_node(nodeid, flags, cachep->gfporder);
if (!page)
return NULL;
Index: linux-2.6.16/mm/vmalloc.c
===================================================================
--- linux-2.6.16.orig/mm/vmalloc.c 2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16/mm/vmalloc.c 2006-03-22 13:25:02.000000000 -0800
@@ -411,6 +411,9 @@ void *__vmalloc_area_node(struct vm_stru
nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
array_size = (nr_pages * sizeof(struct page *));
+ /* Do not obey policy or cpuset constraints */
+ gfp_mask &= ~__GFP_POLICY;
+
area->nr_pages = nr_pages;
/* Please note that the recursion is strictly bounded. */
if (array_size > PAGE_SIZE)
Index: linux-2.6.16/kernel/cpuset.c
===================================================================
--- linux-2.6.16.orig/kernel/cpuset.c 2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16/kernel/cpuset.c 2006-03-22 13:25:02.000000000 -0800
@@ -2159,7 +2159,7 @@ int __cpuset_zone_allowed(struct zone *z
const struct cpuset *cs; /* current cpuset ancestors */
int allowed = 1; /* is allocation in zone z allowed? */
- if (in_interrupt())
+ if (in_interrupt() || !(gfp_mask & __GFP_POLICY))
return 1;
node = z->zone_pgdat->node_id;
if (node_isset(node, current->mems_allowed))
Index: linux-2.6.16/mm/mempolicy.c
===================================================================
--- linux-2.6.16.orig/mm/mempolicy.c 2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16/mm/mempolicy.c 2006-03-22 13:25:02.000000000 -0800
@@ -1292,7 +1292,7 @@ struct page *alloc_pages_current(gfp_t g
if ((gfp & __GFP_WAIT) && !in_interrupt())
cpuset_update_task_memory_state();
- if (!pol || in_interrupt())
+ if (!pol || in_interrupt() || !(gfp & __GFP_POLICY))
pol = &default_policy;
if (pol->policy == MPOL_INTERLEAVE)
return alloc_page_interleave(gfp, order, interleave_nodes(pol));
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Add gfp flag __GFP_POLICY to control policies and cpusets redirection of allocations
From: Paul Jackson @ 2006-03-25 1:44 UTC
To: akpm; +Cc: Christoph Lameter, ak, linux-mm, linux-kernel
Andrew,
I am NAK'ing this patch, aka:
add-gfp-flag-__gfp_policy-to-control-policies-and-cpusets-redirection.patch added to -mm tree
This patch does not always fix the problem that first motivated it of
failed memory migrations, and it changes the semantics of the
interaction of the kernel page allocators with the cpuset and mempolicy
memory policies in ways that, in my view, need more analysis first.
I intend to send a patch with a different solution around Monday,
three days from now, hopefully with Christoph's review and ACK.
Details ... for the curious:
We have two sets of problems here.
1) Invoking memory migration via the cpuset interface 'memory_migrate'
would fail (do nothing, without complaint or explanation) if
the task invoking the migration was not in the target cpuset of
the migration. This caused much confusion and befuddlement of
Christoph, myself and our test engineers.
The key problem was that we are trying to allocate the new pages
to receive the migration in the context of the task invoking the
migration. If that task's cpuset (or mbind mempolicy) doesn't allow
allocation on those nodes, the migration will move the target
task's pages to some nodes that are in the invoking task's cpuset instead.
This needs fixing sooner rather than later. The ordinary user
of memory migration will often find it broken until we fix this.
My next attempt to fix this will have the kernel migration code
temporarily, silently and automatically put the invoking task
in the necessary cpuset, so that the migration code can allocate
the new pages on the requested nodes. I hope to prepare this
patch this weekend, so Christoph can review it Monday, and we
can submit it then.
2) The GFP flags and the interaction with various kernel allocators
and the cpuset and mm/mempolicy memory policies have some strange
'(mis)features'. In the normal case, when there is enough memory
where asked for, they are ok.
Or, at least, no one has actually noticed the breakage, even
though much of it has been there for over a year.
The 2 patches that Christoph and I sent so far (the above
NAK'd patch and its predecessor) both addressed some of these
'(mis)features', with the side effect of fixing (most of the time,
not all cases) the failed migrations of problem (1) above.
But both patches were partial bandaids.
More thought will be required before we offer up solutions for
(2). If I get the chance this weekend, I will at least try to
write up an lkml post describing some of the '(mis)features' we
observed during our analysis of this area, under some such Subject
as "Misfeatures of the kernel allocators and memory policy."
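
To make problem (1) above concrete, here is a small user-space model of
an allocation made in the invoking task's context, and of the idea of
temporarily giving that task the target's allowed nodes. It is only a
sketch of the approach described in this mail, not the patch itself:
struct sk_task, both function names and the "lowest allowed node"
fallback are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of a task that matters here. */
struct sk_task {
	uint32_t mems_allowed;   /* one bit per allowed memory node */
};

/* Allocation in the task's own context: the wanted node is used only
 * if the task may allocate there, otherwise the lowest allowed node. */
static int sk_alloc_in_context(const struct sk_task *t, int wanted_node)
{
	if (wanted_node >= 0 && wanted_node < 32 &&
	    ((t->mems_allowed >> wanted_node) & 1u))
		return wanted_node;

	for (int n = 0; n < 32; n++)
		if ((t->mems_allowed >> n) & 1u)
			return n;
	return -1;
}

/* Sketch of the proposed fix: adopt the target's allowed nodes for the
 * duration of the allocation, then restore the invoker's own. */
static int sk_migrate_alloc(struct sk_task *invoker, uint32_t target_mems,
			    int wanted_node)
{
	uint32_t saved = invoker->mems_allowed;
	int node;

	invoker->mems_allowed = target_mems;
	node = sk_alloc_in_context(invoker, wanted_node);
	invoker->mems_allowed = saved;
	return node;
}

static void sk_check_migration(void)
{
	struct sk_task invoker = { .mems_allowed = 0x3 };  /* nodes 0-1 */

	/* Today: pages meant for node 4 land on the invoker's node 0. */
	assert(sk_alloc_in_context(&invoker, 4) == 0);
	/* With the temporary switch they land on node 4 as requested. */
	assert(sk_migrate_alloc(&invoker, 0x30u, 4) == 4);
}
```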
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: Add gfp flag __GFP_POLICY to control policies and cpusets redirection of allocations
From: Paul Jackson @ 2006-03-27 7:29 UTC
To: Paul Jackson; +Cc: akpm, clameter, ak, linux-mm, linux-kernel
(Executive, aka Andrew, summary: no action items here yet ...)
Christoph sent me some corrections offline to my previous post.
I (pj) had written:
> This patch does not always fix the problem that first motivated it of
> failed memory migrations,
I had misunderstood Christoph's patch. He never intended to fix the
cpuset induced failure of memory migration. He intended to restore
proper behavior of the slab allocator and other kernel subsystems.
Part of my confusion arose from the fact that he took the occasion of
his patch to ask Andrew to drop an earlier patch of ours that -had-
intended, in part, to fix this cpuset-migration interaction.
And part of my confusion was just plain old confusion on my part.
> If I get the chance this weekend, I will at least try to
> write up an lkml post describing some of the '(mis)features' we
> observed during our analysis of this area, under some such Subject
> as "Misfeatures of the kernel allocators and memory policy."
I won't get that far. I'm still working with Christoph offline to make
sense of this. Hopefully I won't drive him to drink first ;-).
I still hope to have a much improved, agreed to by Christoph, patch to
fix the cpuset-migration interaction, posted to lkml in a day or two.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401