* mempolicies: fix policy_zone check
@ 2006-08-04 23:54 Christoph Lameter
2006-08-04 23:55 ` Apply type enum zone_type Christoph Lameter
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Christoph Lameter @ 2006-08-04 23:54 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Lee Schermerhorn, ak
There is a check in zonelist_policy that compares pieces of the bitmap
obtained from a gfp mask via GFP_ZONETYPES with a zone number in function
zonelist_policy().
The bitmap is an ORed mask of __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM.
The policy_zone is a zone number with the possible values of ZONE_DMA,
ZONE_DMA32, ZONE_HIGHMEM and ZONE_NORMAL. These are two different domains
of values.
For some reason seemed to work before the zone reduction patchset (It
definitely works on SGI boxes since we just have one zone and the check
cannot fail).
With the zone reduction patchset this check definitely fails on systems
with two zones if the system actually has memory in both zones.
This is because ZONE_NORMAL is selected using no __GFP flag at
all and thus gfp_zone(gfpmask) == 0. ZONE_DMA is selected when __GFP_DMA
is set. __GFP_DMA is 0x01. So gfp_zone(gfpmask) == 1.
policy_zone is set to ZONE_NORMAL (==1) if ZONE_NORMAL and ZONE_DMA are
populated.
For ZONE_NORMAL gfp_zone(<no _GFP_DMA>) yields 0 which is <
policy_zone(ZONE_NORMAL) and so policy is not applied to regular memory
allocations!
Instead gfp_zone(__GFP_DMA) == 1 which results in policy being applied
to DMA allocations!
What we realy want in that place is to establish the highest allowable
zone for a given gfp_mask. If the highest zone is higher or equal to the
policy_zone then memory policies need to be applied. We have such
a highest_zone() function in page_alloc.c.
So move the highest_zone() function from mm/page_alloc.c into
include/linux/gfp.h. On the way we simplify the function and use the new
zone_type that was also introduced with the zone reduction patchset plus we
also specify the right type for the gfp flags parameter.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Index: test/mm/mempolicy.c
===================================================================
--- test.orig/mm/mempolicy.c 2006-07-15 14:53:08.000000000 -0700
+++ test/mm/mempolicy.c 2006-08-04 12:31:17.000000000 -0700
@@ -1096,7 +1096,7 @@
case MPOL_BIND:
/* Lower zones don't get a policy applied */
/* Careful: current->mems_allowed might have moved */
- if (gfp_zone(gfp) >= policy_zone)
+ if (highest_zone(gfp) >= policy_zone)
if (cpuset_zonelist_valid_mems_allowed(policy->v.zonelist))
return policy->v.zonelist;
/*FALL THROUGH*/
Index: test/include/linux/gfp.h
===================================================================
--- test.orig/include/linux/gfp.h 2006-08-04 12:16:03.000000000 -0700
+++ test/include/linux/gfp.h 2006-08-04 12:31:14.000000000 -0700
@@ -85,6 +85,21 @@
return zone;
}
+static inline enum zone_type highest_zone(gfp_t flags)
+{
+ if (flags & __GFP_DMA)
+ return ZONE_DMA;
+#ifdef CONFIG_ZONE_DMA32
+ if (flags & __GFP_DMA32)
+ return ZONE_DMA32;
+#endif
+#ifdef CONFIG_HIGHMEM
+ if (flags & __GFP_HIGHMEM)
+ return ZONE_HIGHMEM;
+#endif
+ return ZONE_NORMAL;
+}
+
/*
* There is only one page-allocator function, and two main namespaces to
* it. The alloc_page*() variants return 'struct page *' and as such
Index: test/mm/page_alloc.c
===================================================================
--- test.orig/mm/page_alloc.c 2006-08-04 12:16:13.000000000 -0700
+++ test/mm/page_alloc.c 2006-08-04 12:55:21.000000000 -0700
@@ -1466,22 +1466,6 @@
return nr_zones;
}
-static inline int highest_zone(int zone_bits)
-{
- int res = ZONE_NORMAL;
-#ifdef CONFIG_HIGHMEM
- if (zone_bits & (__force int)__GFP_HIGHMEM)
- res = ZONE_HIGHMEM;
-#endif
-#ifdef CONFIG_ZONE_DMA32
- if (zone_bits & (__force int)__GFP_DMA32)
- res = ZONE_DMA32;
-#endif
- if (zone_bits & (__force int)__GFP_DMA)
- res = ZONE_DMA;
- return res;
-}
-
#ifdef CONFIG_NUMA
#define MAX_NODE_LOAD (num_online_nodes())
static int __meminitdata node_load[MAX_NUMNODES];
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Apply type enum zone_type
2006-08-04 23:54 mempolicies: fix policy_zone check Christoph Lameter
@ 2006-08-04 23:55 ` Christoph Lameter
2006-08-04 23:57 ` linearly index zone->node_zonelists[] Christoph Lameter
2006-08-05 1:38 ` Apply type enum zone_type Andi Kleen
2006-08-05 0:08 ` mempolicies: fix policy_zone check Andrew Morton
` (2 subsequent siblings)
3 siblings, 2 replies; 14+ messages in thread
From: Christoph Lameter @ 2006-08-04 23:55 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Lee Schermerhorn, ak
After we have done this we can now do some typing cleanup.
The memory policy layer keeps a policy_zone that specifies
the zone that gets memory policies applied. This variable
can now be of type enum zone_type.
The check_highest_zone function and the build_zonelists funnctionm must
then also take a enum zone_type parameter.
Plus there are a number of loops over zones that also should use
zone_type.
We run into some troubles at some points with functions that need a
zone_type variable to become -1. Fix that up.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.18-rc2-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.18-rc2-mm1.orig/mm/mempolicy.c 2006-08-04 16:07:11.000000000 -0700
+++ linux-2.6.18-rc2-mm1/mm/mempolicy.c 2006-08-04 16:07:12.000000000 -0700
@@ -105,7 +105,7 @@
/* Highest zone. An specific allocation for a zone below that is not
policied. */
-int policy_zone = ZONE_DMA;
+enum zone_type policy_zone = ZONE_DMA;
struct mempolicy default_policy = {
.refcnt = ATOMIC_INIT(1), /* never free it */
@@ -137,7 +137,8 @@
static struct zonelist *bind_zonelist(nodemask_t *nodes)
{
struct zonelist *zl;
- int num, max, nd, k;
+ int num, max, nd;
+ enum zone_type k;
max = 1 + MAX_NR_ZONES * nodes_weight(*nodes);
zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
Index: linux-2.6.18-rc2-mm1/include/linux/mempolicy.h
===================================================================
--- linux-2.6.18-rc2-mm1.orig/include/linux/mempolicy.h 2006-08-04 16:07:11.000000000 -0700
+++ linux-2.6.18-rc2-mm1/include/linux/mempolicy.h 2006-08-04 16:07:12.000000000 -0700
@@ -162,9 +162,9 @@
unsigned long addr);
extern unsigned slab_node(struct mempolicy *policy);
-extern int policy_zone;
+extern enum zone_type policy_zone;
-static inline void check_highest_zone(int k)
+static inline void check_highest_zone(enum zone_type k)
{
if (k > policy_zone)
policy_zone = k;
Index: linux-2.6.18-rc2-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.18-rc2-mm1.orig/mm/page_alloc.c 2006-08-04 16:07:11.000000000 -0700
+++ linux-2.6.18-rc2-mm1/mm/page_alloc.c 2006-08-04 16:07:12.000000000 -0700
@@ -652,7 +652,8 @@
*/
void drain_node_pages(int nodeid)
{
- int i, z;
+ int i;
+ enum zone_type z;
unsigned long flags;
for (z = 0; z < MAX_NR_ZONES; z++) {
@@ -1232,7 +1233,8 @@
#ifdef CONFIG_NUMA
unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
{
- unsigned int i, sum = 0;
+ unsigned int sum = 0;
+ enum zone_type i;
for (i = 0; i < MAX_NR_ZONES; i++)
sum += pgdat->node_zones[i].free_pages;
@@ -1290,7 +1292,7 @@
*/
unsigned long nr_free_inactive_pages_node(int nid)
{
- unsigned int i;
+ enum zone_type i;
unsigned long sum = 0;
struct zone *zones = NODE_DATA(nid)->node_zones;
@@ -1448,21 +1450,22 @@
* Add all populated zones of a node to the zonelist.
*/
static int __meminit build_zonelists_node(pg_data_t *pgdat,
- struct zonelist *zonelist, int nr_zones, int zone_type)
+ struct zonelist *zonelist, int nr_zones, enum zone_type zone_type)
{
struct zone *zone;
BUG_ON(zone_type >= MAX_NR_ZONES);
+ zone_type++;
do {
+ zone_type--;
zone = pgdat->node_zones + zone_type;
if (populated_zone(zone)) {
zonelist->zones[nr_zones++] = zone;
check_highest_zone(zone_type);
}
- zone_type--;
- } while (zone_type >= 0);
+ } while (zone_type);
return nr_zones;
}
@@ -1531,10 +1534,11 @@
static void __meminit build_zonelists(pg_data_t *pgdat)
{
- int i, j, k, node, local_node;
+ int i, j, node, local_node;
int prev_node, load;
struct zonelist *zonelist;
nodemask_t used_mask;
+ enum zone_type k;
/* initialize zonelists */
for (i = 0; i < GFP_ZONETYPES; i++) {
@@ -1718,7 +1722,7 @@
unsigned long *zones_size, unsigned long *zholes_size)
{
unsigned long realtotalpages, totalpages = 0;
- int i;
+ enum zone_type i;
for (i = 0; i < MAX_NR_ZONES; i++)
totalpages += zones_size[i];
@@ -2207,7 +2211,7 @@
{
struct pglist_data *pgdat;
unsigned long reserve_pages = 0;
- int i, j;
+ enum zone_type i, j;
for_each_online_pgdat(pgdat) {
for (i = 0; i < MAX_NR_ZONES; i++) {
@@ -2240,7 +2244,7 @@
static void setup_per_zone_lowmem_reserve(void)
{
struct pglist_data *pgdat;
- int j, idx;
+ enum zone_type j, idx;
for_each_online_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
@@ -2249,9 +2253,12 @@
zone->lowmem_reserve[j] = 0;
- for (idx = j-1; idx >= 0; idx--) {
+ idx = j;
+ while (idx) {
struct zone *lower_zone;
+ idx--;
+
if (sysctl_lowmem_reserve_ratio[idx] < 1)
sysctl_lowmem_reserve_ratio[idx] = 1;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* linearly index zone->node_zonelists[]
2006-08-04 23:55 ` Apply type enum zone_type Christoph Lameter
@ 2006-08-04 23:57 ` Christoph Lameter
2006-08-05 1:50 ` Andi Kleen
2006-08-08 12:20 ` Andy Whitcroft
2006-08-05 1:38 ` Apply type enum zone_type Andi Kleen
1 sibling, 2 replies; 14+ messages in thread
From: Christoph Lameter @ 2006-08-04 23:57 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Lee Schermerhorn, ak
I wonder why we need this bitmask indexing into zone->node_zonelists[]?
We always start with the highest zone and then include all lower zones
if we build zonelists.
Are there really cases where we need allocation from ZONE_DMA or
ZONE_HIGHMEM but not ZONE_NORMAL? It seems that the current implementation
of highest_zone() makes that already impossible.
If we go linear on the index then gfp_zone() == highest_zone() and a lot
of definitions fall by the wayside.
We can now revert back to the use of gfp_zone() in mempolicy.c ;-)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.18-rc2-mm1/include/linux/gfp.h
===================================================================
--- linux-2.6.18-rc2-mm1.orig/include/linux/gfp.h 2006-08-04 15:25:00.000000000 -0700
+++ linux-2.6.18-rc2-mm1/include/linux/gfp.h 2006-08-04 16:07:39.000000000 -0700
@@ -12,9 +12,6 @@
*
* Zone modifiers (see linux/mmzone.h - low three bits)
*
- * These may be masked by GFP_ZONEMASK to make allocations with this bit
- * set fall back to ZONE_NORMAL.
- *
* Do not put any conditional on these. If necessary modify the definitions
* without the underscores and use the consistently. The definitions here may
* be used in bit comparisons.
@@ -78,14 +75,7 @@
#define GFP_DMA32 __GFP_DMA32
-static inline int gfp_zone(gfp_t gfp)
-{
- int zone = GFP_ZONEMASK & (__force int) gfp;
- BUG_ON(zone >= GFP_ZONETYPES);
- return zone;
-}
-
-static inline enum zone_type highest_zone(gfp_t flags)
+static inline enum zone_type gfp_zone(gfp_t flags)
{
if (flags & __GFP_DMA)
return ZONE_DMA;
Index: linux-2.6.18-rc2-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.18-rc2-mm1.orig/mm/page_alloc.c 2006-08-04 16:07:12.000000000 -0700
+++ linux-2.6.18-rc2-mm1/mm/page_alloc.c 2006-08-04 16:07:39.000000000 -0700
@@ -1534,14 +1534,14 @@
static void __meminit build_zonelists(pg_data_t *pgdat)
{
- int i, j, node, local_node;
+ int j, node, local_node;
+ enum zone_type i;
int prev_node, load;
struct zonelist *zonelist;
nodemask_t used_mask;
- enum zone_type k;
/* initialize zonelists */
- for (i = 0; i < GFP_ZONETYPES; i++) {
+ for (i = 0; i < MAX_NR_ZONES; i++) {
zonelist = pgdat->node_zonelists + i;
zonelist->zones[0] = NULL;
}
@@ -1571,13 +1571,11 @@
node_load[node] += load;
prev_node = node;
load--;
- for (i = 0; i < GFP_ZONETYPES; i++) {
+ for (i = 0; i < MAX_NR_ZONES; i++) {
zonelist = pgdat->node_zonelists + i;
for (j = 0; zonelist->zones[j] != NULL; j++);
- k = highest_zone(i);
-
- j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
+ j = build_zonelists_node(NODE_DATA(node), zonelist, j, i);
zonelist->zones[j] = NULL;
}
}
@@ -1587,19 +1585,16 @@
static void __meminit build_zonelists(pg_data_t *pgdat)
{
- int i, node, local_node;
- enum zone_type k;
- enum zone_type j;
+ int node, local_node;
+ enum zone_type i,j;
local_node = pgdat->node_id;
- for (i = 0; i < GFP_ZONETYPES; i++) {
+ for (i = 0; i < MAX_NR_ZONES; i++) {
struct zonelist *zonelist;
zonelist = pgdat->node_zonelists + i;
- j = 0;
- k = highest_zone(i);
- j = build_zonelists_node(pgdat, zonelist, j, k);
+ j = build_zonelists_node(pgdat, zonelist, 0, i);
/*
* Now we build the zonelist so that it contains the zones
* of all the other nodes.
@@ -1611,12 +1606,12 @@
for (node = local_node + 1; node < MAX_NUMNODES; node++) {
if (!node_online(node))
continue;
- j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
+ j = build_zonelists_node(NODE_DATA(node), zonelist, j, i);
}
for (node = 0; node < local_node; node++) {
if (!node_online(node))
continue;
- j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
+ j = build_zonelists_node(NODE_DATA(node), zonelist, j, i);
}
zonelist->zones[j] = NULL;
Index: linux-2.6.18-rc2-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.18-rc2-mm1.orig/include/linux/mmzone.h 2006-08-04 15:08:57.000000000 -0700
+++ linux-2.6.18-rc2-mm1/include/linux/mmzone.h 2006-08-04 16:07:39.000000000 -0700
@@ -136,60 +136,18 @@
MAX_NR_ZONES
};
-
/*
* When a memory allocation must conform to specific limitations (such
* as being suitable for DMA) the caller will pass in hints to the
* allocator in the gfp_mask, in the zone modifier bits. These bits
* are used to select a priority ordered list of memory zones which
- * match the requested limits. GFP_ZONEMASK defines which bits within
- * the gfp_mask should be considered as zone modifiers. Each valid
- * combination of the zone modifier bits has a corresponding list
- * of zones (in node_zonelists). Thus for two zone modifiers there
- * will be a maximum of 4 (2 ** 2) zonelists, for 3 modifiers there will
- * be 8 (2 ** 3) zonelists. GFP_ZONETYPES defines the number of possible
- * combinations of zone modifiers in "zone modifier space".
- *
- * As an optimisation any zone modifier bits which are only valid when
- * no other zone modifier bits are set (loners) should be placed in
- * the highest order bits of this field. This allows us to reduce the
- * extent of the zonelists thus saving space. For example in the case
- * of three zone modifier bits, we could require up to eight zonelists.
- * If the left most zone modifier is a "loner" then the highest valid
- * zonelist would be four allowing us to allocate only five zonelists.
- * Use the first form for GFP_ZONETYPES when the left most bit is not
- * a "loner", otherwise use the second.
- *
- * NOTE! Make sure this matches the zones in <linux/gfp.h>
+ * match the requested limits. See gfp_zone() in include/linux/gfp.h
*/
-#ifdef CONFIG_ZONE_DMA32
-
-#ifdef CONFIG_HIGHMEM
-#define GFP_ZONETYPES ((GFP_ZONEMASK + 1) / 2 + 1) /* Loner */
-#define GFP_ZONEMASK 0x07
-#define ZONES_SHIFT 2 /* ceil(log2(MAX_NR_ZONES)) */
-#else
-#define GFP_ZONETYPES ((0x07 + 1) / 2 + 1) /* Loner */
-/* Mask __GFP_HIGHMEM */
-#define GFP_ZONEMASK 0x05
-#define ZONES_SHIFT 2
-#endif
-
+#if !defined(CONFIG_ZONE_DMA32) && !defined(CONFIG_HIGHMEM)
+#define ZONES_SHIFT 1
#else
-#ifdef CONFIG_HIGHMEM
-
-#define GFP_ZONEMASK 0x03
-#define ZONES_SHIFT 2
-#define GFP_ZONETYPES 3
-
-#else
-
-#define GFP_ZONEMASK 0x01
-#define ZONES_SHIFT 1
-#define GFP_ZONETYPES 2
-
-#endif
+#define ZONES_SHIFT 2
#endif
struct zone {
@@ -364,7 +322,7 @@
struct bootmem_data;
typedef struct pglist_data {
struct zone node_zones[MAX_NR_ZONES];
- struct zonelist node_zonelists[GFP_ZONETYPES];
+ struct zonelist node_zonelists[MAX_NR_ZONES];
int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP
struct page *node_mem_map;
Index: linux-2.6.18-rc2-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.18-rc2-mm1.orig/mm/mempolicy.c 2006-08-04 16:07:12.000000000 -0700
+++ linux-2.6.18-rc2-mm1/mm/mempolicy.c 2006-08-04 16:07:39.000000000 -0700
@@ -1097,7 +1097,7 @@
case MPOL_BIND:
/* Lower zones don't get a policy applied */
/* Careful: current->mems_allowed might have moved */
- if (highest_zone(gfp) >= policy_zone)
+ if (gfp_zone(gfp) >= policy_zone)
if (cpuset_zonelist_valid_mems_allowed(policy->v.zonelist))
return policy->v.zonelist;
/*FALL THROUGH*/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mempolicies: fix policy_zone check
2006-08-04 23:54 mempolicies: fix policy_zone check Christoph Lameter
2006-08-04 23:55 ` Apply type enum zone_type Christoph Lameter
@ 2006-08-05 0:08 ` Andrew Morton
2006-08-05 0:18 ` Christoph Lameter
2006-08-05 1:49 ` Andi Kleen
2006-08-08 12:00 ` Andy Whitcroft
3 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2006-08-05 0:08 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, Lee Schermerhorn, ak
Do these patches fix Lee's "Regression in 2.6.18-rc2-mm1: mbind() not binding"?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mempolicies: fix policy_zone check
2006-08-05 0:08 ` mempolicies: fix policy_zone check Andrew Morton
@ 2006-08-05 0:18 ` Christoph Lameter
2006-08-07 13:40 ` Lee Schermerhorn
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2006-08-05 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Lee Schermerhorn, ak
On Fri, 4 Aug 2006, Andrew Morton wrote:
> Do these patches fix Lee's "Regression in 2.6.18-rc2-mm1: mbind() not binding"?
Yes. It only not binds for a two zone NUMA configuration though as
explained in the patch.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Apply type enum zone_type
2006-08-04 23:55 ` Apply type enum zone_type Christoph Lameter
2006-08-04 23:57 ` linearly index zone->node_zonelists[] Christoph Lameter
@ 2006-08-05 1:38 ` Andi Kleen
2006-08-05 2:01 ` Christoph Lameter
1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2006-08-05 1:38 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm, Lee Schermerhorn
On Saturday 05 August 2006 01:55, Christoph Lameter wrote:
> After we have done this we can now do some typing cleanup.
>
> The memory policy layer keeps a policy_zone that specifies
> the zone that gets memory policies applied. This variable
> can now be of type enum zone_type.
>
> The check_highest_zone function and the build_zonelists funnctionm must
> then also take a enum zone_type parameter.
>
> Plus there are a number of loops over zones that also should use
> zone_type.
>
> We run into some troubles at some points with functions that need a
> zone_type variable to become -1. Fix that up.
enums are not really type checked, so it seems somewhat pointless
to do this change.
If you really wanted strict type checking you would need typedef struct
and accessors.
-Andi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mempolicies: fix policy_zone check
2006-08-04 23:54 mempolicies: fix policy_zone check Christoph Lameter
2006-08-04 23:55 ` Apply type enum zone_type Christoph Lameter
2006-08-05 0:08 ` mempolicies: fix policy_zone check Andrew Morton
@ 2006-08-05 1:49 ` Andi Kleen
2006-08-05 2:05 ` Christoph Lameter
2006-08-08 12:00 ` Andy Whitcroft
3 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2006-08-05 1:49 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm, Lee Schermerhorn
On Saturday 05 August 2006 01:54, Christoph Lameter wrote:
> So move the highest_zone() function from mm/page_alloc.c into
> include/linux/gfp.h. On the way we simplify the function and use the new
> zone_type that was also introduced with the zone reduction patchset plus we
> also specify the right type for the gfp flags parameter.
The function is a bit big to inline. Better keep it in page_alloc.c, but
make it global.
Other than that it looks ok.
-Andi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: linearly index zone->node_zonelists[]
2006-08-04 23:57 ` linearly index zone->node_zonelists[] Christoph Lameter
@ 2006-08-05 1:50 ` Andi Kleen
2006-08-08 12:20 ` Andy Whitcroft
1 sibling, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2006-08-05 1:50 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm, Lee Schermerhorn
On Saturday 05 August 2006 01:57, Christoph Lameter wrote:
> I wonder why we need this bitmask indexing into zone->node_zonelists[]?
Yes I always wondered that too.
It's probably a good idea to change this yes.
-Andi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Apply type enum zone_type
2006-08-05 1:38 ` Apply type enum zone_type Andi Kleen
@ 2006-08-05 2:01 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2006-08-05 2:01 UTC (permalink / raw)
To: Andi Kleen; +Cc: akpm, linux-mm, Lee Schermerhorn
On Sat, 5 Aug 2006, Andi Kleen wrote:
> > We run into some troubles at some points with functions that need a
> > zone_type variable to become -1. Fix that up.
>
> enums are not really type checked, so it seems somewhat pointless
> to do this change.
It helps to clarify what types of values can be expected from a variable.
We have that in various places.
> If you really wanted strict type checking you would need typedef struct
> and accessors.
Right. We definitely do not want that overhead.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mempolicies: fix policy_zone check
2006-08-05 1:49 ` Andi Kleen
@ 2006-08-05 2:05 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2006-08-05 2:05 UTC (permalink / raw)
To: Andi Kleen; +Cc: akpm, linux-mm, Lee Schermerhorn
On Sat, 5 Aug 2006, Andi Kleen wrote:
> On Saturday 05 August 2006 01:54, Christoph Lameter wrote:
>
> > So move the highest_zone() function from mm/page_alloc.c into
> > include/linux/gfp.h. On the way we simplify the function and use the new
> > zone_type that was also introduced with the zone reduction patchset plus we
> > also specify the right type for the gfp flags parameter.
>
> The function is a bit big to inline. Better keep it in page_alloc.c, but
> make it global.
Basically we have a maximum of 2 comparisons (no architecture
supports 4 zones) in the function with a simple constant return.
Most modern processors can do that kind of thing inline without jumps and
its just a few instructions (likely less than a function call). On most
platforms that only support DMA and NORMAL we only have a single
comparison.
Also having that function inline allows optimizations if the gfp flag is
partially or fully known. If the compiler sees
gfp_zone(__GFP_HIGHMEM | blablabla) then it can substitute ZONE_HIGHMEM .
gfp_zone(GFP_USER) can be determined at compile time etc.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mempolicies: fix policy_zone check
2006-08-05 0:18 ` Christoph Lameter
@ 2006-08-07 13:40 ` Lee Schermerhorn
0 siblings, 0 replies; 14+ messages in thread
From: Lee Schermerhorn @ 2006-08-07 13:40 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-mm, ak
On Fri, 2006-08-04 at 17:18 -0700, Christoph Lameter wrote:
> On Fri, 4 Aug 2006, Andrew Morton wrote:
>
> > Do these patches fix Lee's "Regression in 2.6.18-rc2-mm1: mbind() not binding"?
>
> Yes. It only not binds for a two zone NUMA configuration though as
> explained in the patch.
Ack. I tested an earlier version of the fix for Christoph on Friday on
our platform but had to leave, incommunicado for the weekend, before his
final patch went out.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mempolicies: fix policy_zone check
2006-08-04 23:54 mempolicies: fix policy_zone check Christoph Lameter
` (2 preceding siblings ...)
2006-08-05 1:49 ` Andi Kleen
@ 2006-08-08 12:00 ` Andy Whitcroft
3 siblings, 0 replies; 14+ messages in thread
From: Andy Whitcroft @ 2006-08-08 12:00 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm, Lee Schermerhorn, ak
Christoph Lameter wrote:
> There is a check in zonelist_policy that compares pieces of the bitmap
> obtained from a gfp mask via GFP_ZONETYPES with a zone number in function
> zonelist_policy().
>
> The bitmap is an ORed mask of __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM.
> The policy_zone is a zone number with the possible values of ZONE_DMA,
> ZONE_DMA32, ZONE_HIGHMEM and ZONE_NORMAL. These are two different domains
> of values.
>
> For some reason seemed to work before the zone reduction patchset (It
> definitely works on SGI boxes since we just have one zone and the check
> cannot fail).
>
> With the zone reduction patchset this check definitely fails on systems
> with two zones if the system actually has memory in both zones.
>
> This is because ZONE_NORMAL is selected using no __GFP flag at
> all and thus gfp_zone(gfpmask) == 0. ZONE_DMA is selected when __GFP_DMA
> is set. __GFP_DMA is 0x01. So gfp_zone(gfpmask) == 1.
>
> policy_zone is set to ZONE_NORMAL (==1) if ZONE_NORMAL and ZONE_DMA are
> populated.
>
> For ZONE_NORMAL gfp_zone(<no _GFP_DMA>) yields 0 which is <
> policy_zone(ZONE_NORMAL) and so policy is not applied to regular memory
> allocations!
>
> Instead gfp_zone(__GFP_DMA) == 1 which results in policy being applied
> to DMA allocations!
>
> What we realy want in that place is to establish the highest allowable
> zone for a given gfp_mask. If the highest zone is higher or equal to the
> policy_zone then memory policies need to be applied. We have such
> a highest_zone() function in page_alloc.c.
>
> So move the highest_zone() function from mm/page_alloc.c into
> include/linux/gfp.h. On the way we simplify the function and use the new
> zone_type that was also introduced with the zone reduction patchset plus we
> also specify the right type for the gfp flags parameter.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
>
> Index: test/mm/mempolicy.c
> ===================================================================
> --- test.orig/mm/mempolicy.c 2006-07-15 14:53:08.000000000 -0700
> +++ test/mm/mempolicy.c 2006-08-04 12:31:17.000000000 -0700
> @@ -1096,7 +1096,7 @@
> case MPOL_BIND:
> /* Lower zones don't get a policy applied */
> /* Careful: current->mems_allowed might have moved */
> - if (gfp_zone(gfp) >= policy_zone)
> + if (highest_zone(gfp) >= policy_zone)
> if (cpuset_zonelist_valid_mems_allowed(policy->v.zonelist))
> return policy->v.zonelist;
> /*FALL THROUGH*/
> Index: test/include/linux/gfp.h
> ===================================================================
> --- test.orig/include/linux/gfp.h 2006-08-04 12:16:03.000000000 -0700
> +++ test/include/linux/gfp.h 2006-08-04 12:31:14.000000000 -0700
> @@ -85,6 +85,21 @@
> return zone;
> }
>
> +static inline enum zone_type highest_zone(gfp_t flags)
> +{
> + if (flags & __GFP_DMA)
> + return ZONE_DMA;
> +#ifdef CONFIG_ZONE_DMA32
> + if (flags & __GFP_DMA32)
> + return ZONE_DMA32;
> +#endif
> +#ifdef CONFIG_HIGHMEM
> + if (flags & __GFP_HIGHMEM)
> + return ZONE_HIGHMEM;
> +#endif
> + return ZONE_NORMAL;
> +}
> +
The name of the function is very missleading. What it actually does is
convert a gfp mask into a zone number. It is currently not legal to
specify more than one zone specifier so we don't have multiple to select
the 'highest' from.
Either way even if we had more than one specified neither of these
routines would return the highest as the order of the checks is incorrect.
Perhaps now is the time to change its name to something more
appropriate: gfp_to_zone_num or something?
-apw
> /*
> * There is only one page-allocator function, and two main namespaces to
> * it. The alloc_page*() variants return 'struct page *' and as such
> Index: test/mm/page_alloc.c
> ===================================================================
> --- test.orig/mm/page_alloc.c 2006-08-04 12:16:13.000000000 -0700
> +++ test/mm/page_alloc.c 2006-08-04 12:55:21.000000000 -0700
> @@ -1466,22 +1466,6 @@
> return nr_zones;
> }
>
> -static inline int highest_zone(int zone_bits)
> -{
> - int res = ZONE_NORMAL;
> -#ifdef CONFIG_HIGHMEM
> - if (zone_bits & (__force int)__GFP_HIGHMEM)
> - res = ZONE_HIGHMEM;
> -#endif
> -#ifdef CONFIG_ZONE_DMA32
> - if (zone_bits & (__force int)__GFP_DMA32)
> - res = ZONE_DMA32;
> -#endif
> - if (zone_bits & (__force int)__GFP_DMA)
> - res = ZONE_DMA;
> - return res;
> -}
> -
> #ifdef CONFIG_NUMA
> #define MAX_NODE_LOAD (num_online_nodes())
> static int __meminitdata node_load[MAX_NUMNODES];
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: linearly index zone->node_zonelists[]
2006-08-04 23:57 ` linearly index zone->node_zonelists[] Christoph Lameter
2006-08-05 1:50 ` Andi Kleen
@ 2006-08-08 12:20 ` Andy Whitcroft
2006-08-08 15:50 ` Christoph Lameter
1 sibling, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2006-08-08 12:20 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm, Lee Schermerhorn, ak
Christoph Lameter wrote:
> I wonder why we need this bitmask indexing into zone->node_zonelists[]?
>
> We always start with the highest zone and then include all lower zones
> if we build zonelists.
>
> Are there really cases where we need allocation from ZONE_DMA or
> ZONE_HIGHMEM but not ZONE_NORMAL? It seems that the current implementation
> of highest_zone() makes that already impossible.
>
> If we go linear on the index then gfp_zone() == highest_zone() and a lot
> of definitions fall by the wayside.
>
> We can now revert back to the use of gfp_zone() in mempolicy.c ;-)
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
We have had patches to do this very change before and they were
rejected. I can't of course find them to get the reasoning, but this is
my memory.
The GFP_foo flags are modifiers specifying some property we require from
an allocation. Currently all modifiers are singletons, that is they are
all specified in isolation. However, the code base as it stands does
not enforce this. I could see use cases where we might want to specify
more than one flag. For example a GFP_NODE_LOCAL flags which could be
specified with any of the 'zone selectors'. This would naturally work
with the current implementation.
Making the change you suggest here codifies the singleton status of
these bits. We should be sure we are not going to use this feature
before its removed. I am not sure I am comfortable saying there are no
uses for it.
-apw
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: linearly index zone->node_zonelists[]
2006-08-08 12:20 ` Andy Whitcroft
@ 2006-08-08 15:50 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2006-08-08 15:50 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: akpm, linux-mm, Lee Schermerhorn, ak
On Tue, 8 Aug 2006, Andy Whitcroft wrote:
> The GFP_foo flags are modifiers specifying some property we require from an
> allocation. Currently all modifiers are singletons, that is they are all
> specified in isolation. However, the code base as it stands does not enforce
> this. I could see use cases where we might want to specify more than one
> flag. For example a GFP_NODE_LOCAL flags which could be specified with any of
> the 'zone selectors'. This would naturally work with the current
> implementation.
I have a patch here that implements such a thing its called
__GFP_THISNODE but it does not use any of the bit mask features that I
removed. __GFP_THISNODE simply checks if the allocation zone is local.
> Making the change you suggest here codifies the singleton status of these
> bits. We should be sure we are not going to use this feature before its
> removed. I am not sure I am comfortable saying there are no uses for it.
This certainly codifies the singleton status. However, I cannot imagine
any uses for it that would not also require other significant changes in
the page allocator. We can revisit that if someone comes up with a feature
that needs this.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2006-08-08 15:50 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-04 23:54 mempolicies: fix policy_zone check Christoph Lameter
2006-08-04 23:55 ` Apply type enum zone_type Christoph Lameter
2006-08-04 23:57 ` linearly index zone->node_zonelists[] Christoph Lameter
2006-08-05 1:50 ` Andi Kleen
2006-08-08 12:20 ` Andy Whitcroft
2006-08-08 15:50 ` Christoph Lameter
2006-08-05 1:38 ` Apply type enum zone_type Andi Kleen
2006-08-05 2:01 ` Christoph Lameter
2006-08-05 0:08 ` mempolicies: fix policy_zone check Andrew Morton
2006-08-05 0:18 ` Christoph Lameter
2006-08-07 13:40 ` Lee Schermerhorn
2006-08-05 1:49 ` Andi Kleen
2006-08-05 2:05 ` Christoph Lameter
2006-08-08 12:00 ` Andy Whitcroft
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox