From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Wed, 22 Mar 2006 13:44:45 -0800 (PST) From: Christoph Lameter Subject: Add gfp flag __GFP_POLICY to control policies and cpusets redirection of allocations Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: akpm@osdl.org Cc: pj@sgi.com, ak@suse.de, linux-mm@kvack.org List-ID: Note to Andrew: This patch replaces cpuset-alloc_pages_node-overrides-cpuset-constraints.patch cpuset-alloc_pages_node-overrides-cpuset-constraints-speedup.patch Various subsystems manage their own locality (slab, block layer, device drivers) and are rather sensitive to cpusets or memory policies interfering with their operation. Also various kernel components rely on the ability to temporarily allocate a page for a kernel thread. That page should be local to the process. This patch introduces a flag __GFP_POLICY that can be specified to enable cpusets and memory policy redirection to different nodes for alloc_pages() and alloc_pages_node(). __GFP_POLICY is set by default for user space page allocations (GFP_USER and GFP_HIGHUSER) but not for GFP_KERNEL which is used by device drivers and other local uses of pages in the kernel. The slab allocator does its own application of memory policies and cpuset constraints based on SLAB_MEM_SPREAD flags. The patch just insures that the page allocator does not apply additional policies after the slab allocator has determined where memory should be allocated. This can happen f.e. if a cpuset is active and then some kernel component tries to allocate a new slab or grow the size of the slab. vmalloc() and vmalloc_node() are exempted from policies since these are typically used by device drivers for large memory allocations that should be controlled by the device driver itself. GFP_KERNEL does not have __GFP_POLICY set. Meaning that page allocations with GFP_KERNEL are no longer subject to cpusets and policy constraints. I have looked through the kernel for page allocation with GFP_KERNEL and the instance I have seen should not use policies. (Note that slab allocations with GFP_KERNEL still perform as before). Signed-off-by: Christoph Lameter Index: linux-2.6.16/include/linux/gfp.h =================================================================== --- linux-2.6.16.orig/include/linux/gfp.h 2006-03-19 21:53:29.000000000 -0800 +++ linux-2.6.16/include/linux/gfp.h 2006-03-22 13:25:02.000000000 -0800 @@ -47,6 +47,9 @@ struct vm_area_struct; #define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */ #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */ #define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */ +#define __GFP_POLICY ((__force gfp_t)0x40000u) /* Allocation needs to obey memory policies + and cpuset constraints */ + #define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */ #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) @@ -55,16 +58,17 @@ struct vm_area_struct; #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \ __GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \ __GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \ - __GFP_NOMEMALLOC|__GFP_HARDWALL) + __GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_POLICY) /* GFP_ATOMIC means both !wait (__GFP_WAIT not set) and use emergency pool */ #define GFP_ATOMIC (__GFP_HIGH) #define GFP_NOIO (__GFP_WAIT) #define GFP_NOFS (__GFP_WAIT | __GFP_IO) #define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS) -#define GFP_USER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL) +#define GFP_USER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \ + __GFP_POLICY) #define GFP_HIGHUSER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \ - __GFP_HIGHMEM) + __GFP_HIGHMEM | __GFP_POLICY) /* Flag - indicates that the buffer will be suitable for DMA. Ignored on some platforms, used as appropriate on others */ Index: linux-2.6.16/mm/slab.c =================================================================== --- linux-2.6.16.orig/mm/slab.c 2006-03-19 21:53:29.000000000 -0800 +++ linux-2.6.16/mm/slab.c 2006-03-22 13:25:02.000000000 -0800 @@ -1392,6 +1392,8 @@ static void *kmem_getpages(struct kmem_c int i; flags |= cachep->gfpflags; + flags &= ~__GFP_POLICY; + page = alloc_pages_node(nodeid, flags, cachep->gfporder); if (!page) return NULL; Index: linux-2.6.16/mm/vmalloc.c =================================================================== --- linux-2.6.16.orig/mm/vmalloc.c 2006-03-19 21:53:29.000000000 -0800 +++ linux-2.6.16/mm/vmalloc.c 2006-03-22 13:25:02.000000000 -0800 @@ -411,6 +411,9 @@ void *__vmalloc_area_node(struct vm_stru nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); + /* Do not obey policy or cpuset constraints */ + gfp_mask &= ~__GFP_POLICY; + area->nr_pages = nr_pages; /* Please note that the recursion is strictly bounded. */ if (array_size > PAGE_SIZE) Index: linux-2.6.16/kernel/cpuset.c =================================================================== --- linux-2.6.16.orig/kernel/cpuset.c 2006-03-19 21:53:29.000000000 -0800 +++ linux-2.6.16/kernel/cpuset.c 2006-03-22 13:25:02.000000000 -0800 @@ -2159,7 +2159,7 @@ int __cpuset_zone_allowed(struct zone *z const struct cpuset *cs; /* current cpuset ancestors */ int allowed = 1; /* is allocation in zone z allowed? */ - if (in_interrupt()) + if (in_interrupt() || !(gfp_mask & __GFP_POLICY)) return 1; node = z->zone_pgdat->node_id; if (node_isset(node, current->mems_allowed)) Index: linux-2.6.16/mm/mempolicy.c =================================================================== --- linux-2.6.16.orig/mm/mempolicy.c 2006-03-19 21:53:29.000000000 -0800 +++ linux-2.6.16/mm/mempolicy.c 2006-03-22 13:25:02.000000000 -0800 @@ -1292,7 +1292,7 @@ struct page *alloc_pages_current(gfp_t g if ((gfp & __GFP_WAIT) && !in_interrupt()) cpuset_update_task_memory_state(); - if (!pol || in_interrupt()) + if (!pol || in_interrupt() || !(gfp & __GFP_POLICY)) pol = &default_policy; if (pol->policy == MPOL_INTERLEAVE) return alloc_page_interleave(gfp, order, interleave_nodes(pol)); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org