* Suspect use of "first_zones_zonelist()"
@ 2008-04-22 15:17 Lee Schermerhorn
2008-04-22 16:15 ` Mel Gorman
0 siblings, 1 reply; 7+ messages in thread
From: Lee Schermerhorn @ 2008-04-22 15:17 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-mm, Eric Whitney
I was testing my "lazy migration" patches and noticed something
interesting about first_zones_zonelist(). I use this function to find
the target node for MPOL_BIND policy to determine if a page is
"misplaced" and should be migrated. In my testing, I found that I was
always "off by one". E.g., if my mempolicy nodemask contained only node
2, I'd migrate to node 3. If it contained node 3, I'd migrate to node 0
[on a 4-node platform], etc.
Following the usage in slab_node(), I was doing something like:
zr = first_zones_zonelist(node_zonelist(nid, ...), gfp_zone(...),
&pol->v.vnodes, &dummy);
newnid = zonelist_node_idx(zr);
Turns out that the return value is the NEXT zoneref in the zonelist
AFTER the one of interest--i.e., the first that satisfies any nodemask
constraint. I renamed 'dummy' to 'zone', ignore the return value and
use: newnid = zone->node. [I guess I could use zonelist_node_idx(zr
-1) as well.] This results in page migration to the expected node.
Anyway, after discovering this, I checked other usages of
first_zones_zonelist() outside of the iterator macros, and I THINK they
might be making the same mistake?
Here's a patch that "fixes" these. Do you agree? Or am I
misunderstanding this area [again!]?
Lee
PATCH fix off-by-one usage of first_zones_zonelist()
Against: 2.6.25-rc8-mm2
The return value of first_zones_zonelist() is actually the zoneref
AFTER the "requested zone"--i.e., the first zone in the zonelist
that satisfies any nodemask constraint. The "requested zone" is
returned via the @zone parameter. The returned zoneref is intended
to be passed to next_zones_zonelist() on subsequent iterations.
Fix up slab_node() and get_page_from_freelist() to use the requested
zone, rather than the next one in the list.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
mm/mempolicy.c | 9 ++++-----
mm/page_alloc.c | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
Index: linux-2.6.25-rc8-mm2/mm/mempolicy.c
===================================================================
--- linux-2.6.25-rc8-mm2.orig/mm/mempolicy.c 2008-04-22 10:06:29.000000000 -0400
+++ linux-2.6.25-rc8-mm2/mm/mempolicy.c 2008-04-22 10:11:22.000000000 -0400
@@ -1396,14 +1396,13 @@ unsigned slab_node(struct mempolicy *pol
* first node.
*/
struct zonelist *zonelist;
- struct zoneref *z;
- struct zone *dummy;
+ struct zone *zone;
enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
zonelist = &NODE_DATA(numa_node_id())->node_zonelists[0];
- z = first_zones_zonelist(zonelist, highest_zoneidx,
+ (void)first_zones_zonelist(zonelist, highest_zoneidx,
&policy->v.nodes,
- &dummy);
- return zonelist_node_idx(z);
+ &zone);
+ return zone->node;
}
default:
Index: linux-2.6.25-rc8-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.25-rc8-mm2.orig/mm/page_alloc.c 2008-04-22 10:00:58.000000000 -0400
+++ linux-2.6.25-rc8-mm2/mm/page_alloc.c 2008-04-22 10:16:32.000000000 -0400
@@ -1414,7 +1414,7 @@ get_page_from_freelist(gfp_t gfp_mask, n
z = first_zones_zonelist(zonelist, high_zoneidx, nodemask,
&preferred_zone);
- classzone_idx = zonelist_zone_idx(z);
+ classzone_idx = zonelist_zone_idx(z - 1); /* z is next zone in list */
zonelist_scan:
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Suspect use of "first_zones_zonelist()"
2008-04-22 15:17 Suspect use of "first_zones_zonelist()" Lee Schermerhorn
@ 2008-04-22 16:15 ` Mel Gorman
2008-04-22 17:10 ` Lee Schermerhorn
0 siblings, 1 reply; 7+ messages in thread
From: Mel Gorman @ 2008-04-22 16:15 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: Andrew Morton, linux-mm, Eric Whitney
On (22/04/08 11:17), Lee Schermerhorn didst pronounce:
> Mel:
>
> I was testing my "lazy migration" patches and noticed something
> interesting about first_zones_zonelist(). I use this function to find
> the target node for MPOL_BIND policy to determine if a page is
> "misplaced" and should be migrated. In my testing, I found that I was
> always "off by one". E.g., if my mempolicy nodemask contained only node
> 2, I'd migrate to node 3. If it contained node 3, I'd migrate to node 0
> [on a 4-node platform], etc.
>
> Following the usage in slab_node(), I was doing something like:
>
> zr = first_zones_zonelist(node_zonelist(nid, ...), gfp_zone(...),
> &pol->v.vnodes, &dummy);
> newnid = zonelist_node_idx(zr);
>
> Turns out that the return value is the NEXT zoneref in the zonelist
> AFTER the one of interest
Yes, the intention was that the cursor (zr) was meant to be pointing to
the next reference likely to be of interest. Bad usage of the cursor was
a pretty stupid mistake particularly as the cursor was implemented this
way intentionally.
/me beats self with clue-stick
> --i.e., the first that satisfies any nodemask
> constraint. I renamed 'dummy' to 'zone', ignore the return value and
> use: newnid = zone->node. [I guess I could use zonelist_node_idx(zr
> -1) as well.]
zr - 1 would be vunerable to the iterator implementation changing.
> This results in page migration to the expected node.
>
This use of zone instead of the zoneref cursor should be made throughout.
> Anyway, after discovering this, I checked other usages of
> first_zones_zonelist() outside of the iterator macros, and I THINK they
> might be making the same mistake?
>
Yes, you're right.
> Here's a patch that "fixes" these. Do you agree? Or am I
> misunderstanding this area [again!]?
>
No, I screwed up with the use of cursors and didn't get caught for it as
the effect would be very difficult to spot normally. I extended your patch
slightly below to catch the other callers. Can you take a read-through please?
> Lee
>
> PATCH fix off-by-one usage of first_zones_zonelist()
>
> Against: 2.6.25-rc8-mm2
>
> The return value of first_zones_zonelist() is actually the zoneref
> AFTER the "requested zone"--i.e., the first zone in the zonelist
> that satisfies any nodemask constraint. The "requested zone" is
> returned via the @zone parameter. The returned zoneref is intended
> to be passed to next_zones_zonelist() on subsequent iterations.
>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
fs/buffer.c | 9 ++++-----
mm/mempolicy.c | 9 ++++-----
mm/page_alloc.c | 4 ++--
3 files changed, 10 insertions(+), 12 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/fs/buffer.c linux-2.6.25-mm1-fix-first_zone_zonelist/fs/buffer.c
--- linux-2.6.25-mm1-clean/fs/buffer.c 2008-04-22 10:30:02.000000000 +0100
+++ linux-2.6.25-mm1-fix-first_zone_zonelist/fs/buffer.c 2008-04-22 16:53:31.000000000 +0100
@@ -368,18 +368,17 @@ void invalidate_bdev(struct block_device
*/
static void free_more_memory(void)
{
- struct zoneref *zrefs;
- struct zone *dummy;
+ struct zone *zone;
int nid;
wakeup_pdflush(1024);
yield();
for_each_online_node(nid) {
- zrefs = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
+ (void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
gfp_zone(GFP_NOFS), NULL,
- &dummy);
- if (zrefs->zone)
+ &zone);
+ if (zone)
try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
GFP_NOFS);
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/mempolicy.c linux-2.6.25-mm1-fix-first_zone_zonelist/mm/mempolicy.c
--- linux-2.6.25-mm1-clean/mm/mempolicy.c 2008-04-22 10:30:04.000000000 +0100
+++ linux-2.6.25-mm1-fix-first_zone_zonelist/mm/mempolicy.c 2008-04-22 16:54:38.000000000 +0100
@@ -1396,14 +1396,13 @@ unsigned slab_node(struct mempolicy *pol
* first node.
*/
struct zonelist *zonelist;
- struct zoneref *z;
- struct zone *dummy;
+ struct zone *zone;
enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
zonelist = &NODE_DATA(numa_node_id())->node_zonelists[0];
- z = first_zones_zonelist(zonelist, highest_zoneidx,
+ (void)first_zones_zonelist(zonelist, highest_zoneidx,
&policy->v.nodes,
- &dummy);
- return zonelist_node_idx(z);
+ &zone);
+ return zone->node;
}
default:
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/page_alloc.c linux-2.6.25-mm1-fix-first_zone_zonelist/mm/page_alloc.c
--- linux-2.6.25-mm1-clean/mm/page_alloc.c 2008-04-22 10:30:04.000000000 +0100
+++ linux-2.6.25-mm1-fix-first_zone_zonelist/mm/page_alloc.c 2008-04-22 16:58:19.000000000 +0100
@@ -1412,9 +1412,9 @@ get_page_from_freelist(gfp_t gfp_mask, n
int zlc_active = 0; /* set if using zonelist_cache */
int did_zlc_setup = 0; /* just call zlc_setup() one time */
- z = first_zones_zonelist(zonelist, high_zoneidx, nodemask,
+ (void)first_zones_zonelist(zonelist, high_zoneidx, nodemask,
&preferred_zone);
- classzone_idx = zonelist_zone_idx(z);
+ classzone_idx = zone_idx(preferred_zone);
zonelist_scan:
/*
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Suspect use of "first_zones_zonelist()"
2008-04-22 16:15 ` Mel Gorman
@ 2008-04-22 17:10 ` Lee Schermerhorn
2008-04-22 17:49 ` Mel Gorman
0 siblings, 1 reply; 7+ messages in thread
From: Lee Schermerhorn @ 2008-04-22 17:10 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-mm, Eric Whitney
On Tue, 2008-04-22 at 17:15 +0100, Mel Gorman wrote:
> On (22/04/08 11:17), Lee Schermerhorn didst pronounce:
> > Mel:
> >
> > I was testing my "lazy migration" patches and noticed something
> > interesting about first_zones_zonelist(). I use this function to find
> > the target node for MPOL_BIND policy to determine if a page is
> > "misplaced" and should be migrated. In my testing, I found that I was
> > always "off by one". E.g., if my mempolicy nodemask contained only node
> > 2, I'd migrate to node 3. If it contained node 3, I'd migrate to node 0
> > [on a 4-node platform], etc.
> >
> > Following the usage in slab_node(), I was doing something like:
> >
> > zr = first_zones_zonelist(node_zonelist(nid, ...), gfp_zone(...),
> > &pol->v.vnodes, &dummy);
> > newnid = zonelist_node_idx(zr);
> >
> > Turns out that the return value is the NEXT zoneref in the zonelist
> > AFTER the one of interest
>
> Yes, the intention was that the cursor (zr) was meant to be pointing to
> the next reference likely to be of interest. Bad usage of the cursor was
> a pretty stupid mistake particularly as the cursor was implemented this
> way intentionally.
>
> /me beats self with clue-stick
>
> > --i.e., the first that satisfies any nodemask
> > constraint. I renamed 'dummy' to 'zone', ignore the return value and
> > use: newnid = zone->node. [I guess I could use zonelist_node_idx(zr
> > -1) as well.]
>
> zr - 1 would be vunerable to the iterator implementation changing.
Ah, good point. Shouldn't peek under the covers like that.
>
> > This results in page migration to the expected node.
> >
>
> This use of zone instead of the zoneref cursor should be made throughout.
>
> > Anyway, after discovering this, I checked other usages of
> > first_zones_zonelist() outside of the iterator macros, and I THINK they
> > might be making the same mistake?
> >
>
> Yes, you're right.
>
> > Here's a patch that "fixes" these. Do you agree? Or am I
> > misunderstanding this area [again!]?
> >
>
> No, I screwed up with the use of cursors and didn't get caught for it as
> the effect would be very difficult to spot normally. I extended your patch
> slightly below to catch the other callers. Can you take a read-through please?
OK. Looks good. I see I missed one case.
A suggestion. How about enhancing the comment [maybe a kernel doc
block?] on first_zones_zonelist() to explain that it returns the zone
via the zone parameter and that the return value is a cursor for
iterators? Perhaps similarly for next_zones_zonelist() in mmzone.c?
Or would you like me to take a cut at this?
Lee
<patch snipped>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Suspect use of "first_zones_zonelist()"
2008-04-22 17:10 ` Lee Schermerhorn
@ 2008-04-22 17:49 ` Mel Gorman
2008-04-22 18:01 ` Lee Schermerhorn
0 siblings, 1 reply; 7+ messages in thread
From: Mel Gorman @ 2008-04-22 17:49 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: Andrew Morton, linux-mm, Eric Whitney
On (22/04/08 13:10), Lee Schermerhorn didst pronounce:
> > > <SNIP>
> > >
> > > --i.e., the first that satisfies any nodemask
> > > constraint. I renamed 'dummy' to 'zone', ignore the return value and
> > > use: newnid = zone->node. [I guess I could use zonelist_node_idx(zr
> > > -1) as well.]
> >
> > zr - 1 would be vunerable to the iterator implementation changing.
>
> Ah, good point. Shouldn't peek under the covers like that.
>
> >
> > > This results in page migration to the expected node.
> > >
> >
> > This use of zone instead of the zoneref cursor should be made throughout.
> >
> > > Anyway, after discovering this, I checked other usages of
> > > first_zones_zonelist() outside of the iterator macros, and I THINK they
> > > might be making the same mistake?
> > >
> >
> > Yes, you're right.
> >
> > > Here's a patch that "fixes" these. Do you agree? Or am I
> > > misunderstanding this area [again!]?
> > >
> >
> > No, I screwed up with the use of cursors and didn't get caught for it as
> > the effect would be very difficult to spot normally. I extended your patch
> > slightly below to catch the other callers. Can you take a read-through please?
>
> OK. Looks good. I see I missed one case.
>
> A suggestion. How about enhancing the comment [maybe a kernel doc
> block?] on first_zones_zonelist() to explain that it returns the zone
> via the zone parameter and that the return value is a cursor for
> iterators? Perhaps similarly for next_zones_zonelist() in mmzone.c?
>
> Or would you like me to take a cut at this?
>
That's a good suggestion. How about the following?
=======
Subject: [PATCH] fix off-by-one usage of first_zones_zonelist()
Against: 2.6.25-mm1
The return value of first_zones_zonelist() is actually the zoneref
AFTER the "requested zone"--i.e., the first zone in the zonelist
that satisfies any nodemask constraint. The "requested zone" is
returned via the @zone parameter. The returned zoneref is intended
to be passed to next_zones_zonelist() on subsequent iterations.
This patch fixes first_zones_zonelist() callers appropriately and documents
first_zones_zonelist() to help avoid a repeat of the same mistake.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
fs/buffer.c | 9 ++++-----
include/linux/mmzone.h | 25 ++++++++++++++++++++++++-
mm/mempolicy.c | 9 ++++-----
mm/page_alloc.c | 4 ++--
4 files changed, 34 insertions(+), 13 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/fs/buffer.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/fs/buffer.c
--- linux-2.6.25-mm1-clean/fs/buffer.c 2008-04-22 10:30:02.000000000 +0100
+++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/fs/buffer.c 2008-04-22 18:35:56.000000000 +0100
@@ -368,18 +368,17 @@ void invalidate_bdev(struct block_device
*/
static void free_more_memory(void)
{
- struct zoneref *zrefs;
- struct zone *dummy;
+ struct zone *zone;
int nid;
wakeup_pdflush(1024);
yield();
for_each_online_node(nid) {
- zrefs = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
+ (void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
gfp_zone(GFP_NOFS), NULL,
- &dummy);
- if (zrefs->zone)
+ &zone);
+ if (zone)
try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
GFP_NOFS);
}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/include/linux/mmzone.h linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/include/linux/mmzone.h
--- linux-2.6.25-mm1-clean/include/linux/mmzone.h 2008-04-22 10:30:03.000000000 +0100
+++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/include/linux/mmzone.h 2008-04-22 18:45:37.000000000 +0100
@@ -742,12 +742,36 @@ static inline int zonelist_node_idx(stru
#endif /* CONFIG_NUMA */
}
+/**
+ * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
+ * @z - The cursor used as a starting point for the search
+ * @highest_zoneidx - The zone index of the highest zone to return
+ * @nodes - An optional nodemask to filter the zonelist with
+ * @zone - The first suitable zone found is returned via this parameter
+ *
+ * This function returns the next zone at or below a given zone index that is
+ * within the allowed nodemask using a cursor as the starting point for the
+ * search. The zoneref returned is a cursor that is used as the next starting
+ * point for future calls to next_zones_zonelist().
+ */
struct zoneref *next_zones_zonelist(struct zoneref *z,
enum zone_type highest_zoneidx,
nodemask_t *nodes,
struct zone **zone);
-/* Returns the first zone at or below highest_zoneidx in a zonelist */
+/**
+ * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
+ * @zonelist - The zonelist to search for a suitable zone
+ * @highest_zoneidx - The zone index of the highest zone to return
+ * @nodes - An optional nodemask to filter the zonelist with
+ * @zone - The first suitable zone found is returned via this parameter
+ *
+ * This function returns the first zone at or below a given zone index that is
+ * within the allowed nodemask. The zoneref returned is a cursor that can be
+ * used to iterate the zonelist with next_zones_zonelist. The cursor should
+ * not be used by the caller as it does not match the value of the zone
+ * returned.
+ */
static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
enum zone_type highest_zoneidx,
nodemask_t *nodes,
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/mempolicy.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/mempolicy.c
--- linux-2.6.25-mm1-clean/mm/mempolicy.c 2008-04-22 10:30:04.000000000 +0100
+++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/mempolicy.c 2008-04-22 18:35:56.000000000 +0100
@@ -1396,14 +1396,13 @@ unsigned slab_node(struct mempolicy *pol
* first node.
*/
struct zonelist *zonelist;
- struct zoneref *z;
- struct zone *dummy;
+ struct zone *zone;
enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
zonelist = &NODE_DATA(numa_node_id())->node_zonelists[0];
- z = first_zones_zonelist(zonelist, highest_zoneidx,
+ (void)first_zones_zonelist(zonelist, highest_zoneidx,
&policy->v.nodes,
- &dummy);
- return zonelist_node_idx(z);
+ &zone);
+ return zone->node;
}
default:
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/page_alloc.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/page_alloc.c
--- linux-2.6.25-mm1-clean/mm/page_alloc.c 2008-04-22 10:30:04.000000000 +0100
+++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/page_alloc.c 2008-04-22 18:35:56.000000000 +0100
@@ -1412,9 +1412,9 @@ get_page_from_freelist(gfp_t gfp_mask, n
int zlc_active = 0; /* set if using zonelist_cache */
int did_zlc_setup = 0; /* just call zlc_setup() one time */
- z = first_zones_zonelist(zonelist, high_zoneidx, nodemask,
+ (void)first_zones_zonelist(zonelist, high_zoneidx, nodemask,
&preferred_zone);
- classzone_idx = zonelist_zone_idx(z);
+ classzone_idx = zone_idx(preferred_zone);
zonelist_scan:
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Suspect use of "first_zones_zonelist()"
2008-04-22 17:49 ` Mel Gorman
@ 2008-04-22 18:01 ` Lee Schermerhorn
2008-04-22 18:40 ` Lee Schermerhorn
0 siblings, 1 reply; 7+ messages in thread
From: Lee Schermerhorn @ 2008-04-22 18:01 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-mm, Eric Whitney
On Tue, 2008-04-22 at 18:49 +0100, Mel Gorman wrote:
> On (22/04/08 13:10), Lee Schermerhorn didst pronounce:
> > > > <SNIP>
> > > >
> > > > --i.e., the first that satisfies any nodemask
> > > > constraint. I renamed 'dummy' to 'zone', ignore the return value and
> > > > use: newnid = zone->node. [I guess I could use zonelist_node_idx(zr
> > > > -1) as well.]
> > >
> > > zr - 1 would be vunerable to the iterator implementation changing.
> >
> > Ah, good point. Shouldn't peek under the covers like that.
> >
> > >
> > > > This results in page migration to the expected node.
> > > >
> > >
> > > This use of zone instead of the zoneref cursor should be made throughout.
> > >
> > > > Anyway, after discovering this, I checked other usages of
> > > > first_zones_zonelist() outside of the iterator macros, and I THINK they
> > > > might be making the same mistake?
> > > >
> > >
> > > Yes, you're right.
> > >
> > > > Here's a patch that "fixes" these. Do you agree? Or am I
> > > > misunderstanding this area [again!]?
> > > >
> > >
> > > No, I screwed up with the use of cursors and didn't get caught for it as
> > > the effect would be very difficult to spot normally. I extended your patch
> > > slightly below to catch the other callers. Can you take a read-through please?
> >
> > OK. Looks good. I see I missed one case.
> >
> > A suggestion. How about enhancing the comment [maybe a kernel doc
> > block?] on first_zones_zonelist() to explain that it returns the zone
> > via the zone parameter and that the return value is a cursor for
> > iterators? Perhaps similarly for next_zones_zonelist() in mmzone.c?
> >
> > Or would you like me to take a cut at this?
> >
>
> That's a good suggestion. How about the following?
Very nice. Wish I'd had that to reference earlier :)
Lee
>
> =======
> Subject: [PATCH] fix off-by-one usage of first_zones_zonelist()
>
> Against: 2.6.25-mm1
>
> The return value of first_zones_zonelist() is actually the zoneref
> AFTER the "requested zone"--i.e., the first zone in the zonelist
> that satisfies any nodemask constraint. The "requested zone" is
> returned via the @zone parameter. The returned zoneref is intended
> to be passed to next_zones_zonelist() on subsequent iterations.
>
> This patch fixes first_zones_zonelist() callers appropriately and documents
> first_zones_zonelist() to help avoid a repeat of the same mistake.
>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
> fs/buffer.c | 9 ++++-----
> include/linux/mmzone.h | 25 ++++++++++++++++++++++++-
> mm/mempolicy.c | 9 ++++-----
> mm/page_alloc.c | 4 ++--
> 4 files changed, 34 insertions(+), 13 deletions(-)
>
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/fs/buffer.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/fs/buffer.c
> --- linux-2.6.25-mm1-clean/fs/buffer.c 2008-04-22 10:30:02.000000000 +0100
> +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/fs/buffer.c 2008-04-22 18:35:56.000000000 +0100
> @@ -368,18 +368,17 @@ void invalidate_bdev(struct block_device
> */
> static void free_more_memory(void)
> {
> - struct zoneref *zrefs;
> - struct zone *dummy;
> + struct zone *zone;
> int nid;
>
> wakeup_pdflush(1024);
> yield();
>
> for_each_online_node(nid) {
> - zrefs = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
> + (void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
> gfp_zone(GFP_NOFS), NULL,
> - &dummy);
> - if (zrefs->zone)
> + &zone);
> + if (zone)
> try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
> GFP_NOFS);
> }
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/include/linux/mmzone.h linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/include/linux/mmzone.h
> --- linux-2.6.25-mm1-clean/include/linux/mmzone.h 2008-04-22 10:30:03.000000000 +0100
> +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/include/linux/mmzone.h 2008-04-22 18:45:37.000000000 +0100
> @@ -742,12 +742,36 @@ static inline int zonelist_node_idx(stru
> #endif /* CONFIG_NUMA */
> }
>
> +/**
> + * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
> + * @z - The cursor used as a starting point for the search
> + * @highest_zoneidx - The zone index of the highest zone to return
> + * @nodes - An optional nodemask to filter the zonelist with
> + * @zone - The first suitable zone found is returned via this parameter
> + *
> + * This function returns the next zone at or below a given zone index that is
> + * within the allowed nodemask using a cursor as the starting point for the
> + * search. The zoneref returned is a cursor that is used as the next starting
> + * point for future calls to next_zones_zonelist().
> + */
> struct zoneref *next_zones_zonelist(struct zoneref *z,
> enum zone_type highest_zoneidx,
> nodemask_t *nodes,
> struct zone **zone);
>
> -/* Returns the first zone at or below highest_zoneidx in a zonelist */
> +/**
> + * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
> + * @zonelist - The zonelist to search for a suitable zone
> + * @highest_zoneidx - The zone index of the highest zone to return
> + * @nodes - An optional nodemask to filter the zonelist with
> + * @zone - The first suitable zone found is returned via this parameter
> + *
> + * This function returns the first zone at or below a given zone index that is
> + * within the allowed nodemask. The zoneref returned is a cursor that can be
> + * used to iterate the zonelist with next_zones_zonelist. The cursor should
> + * not be used by the caller as it does not match the value of the zone
> + * returned.
> + */
> static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
> enum zone_type highest_zoneidx,
> nodemask_t *nodes,
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/mempolicy.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/mempolicy.c
> --- linux-2.6.25-mm1-clean/mm/mempolicy.c 2008-04-22 10:30:04.000000000 +0100
> +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/mempolicy.c 2008-04-22 18:35:56.000000000 +0100
> @@ -1396,14 +1396,13 @@ unsigned slab_node(struct mempolicy *pol
> * first node.
> */
> struct zonelist *zonelist;
> - struct zoneref *z;
> - struct zone *dummy;
> + struct zone *zone;
> enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
> zonelist = &NODE_DATA(numa_node_id())->node_zonelists[0];
> - z = first_zones_zonelist(zonelist, highest_zoneidx,
> + (void)first_zones_zonelist(zonelist, highest_zoneidx,
> &policy->v.nodes,
> - &dummy);
> - return zonelist_node_idx(z);
> + &zone);
> + return zone->node;
> }
>
> default:
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/page_alloc.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/page_alloc.c
> --- linux-2.6.25-mm1-clean/mm/page_alloc.c 2008-04-22 10:30:04.000000000 +0100
> +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/page_alloc.c 2008-04-22 18:35:56.000000000 +0100
> @@ -1412,9 +1412,9 @@ get_page_from_freelist(gfp_t gfp_mask, n
> int zlc_active = 0; /* set if using zonelist_cache */
> int did_zlc_setup = 0; /* just call zlc_setup() one time */
>
> - z = first_zones_zonelist(zonelist, high_zoneidx, nodemask,
> + (void)first_zones_zonelist(zonelist, high_zoneidx, nodemask,
> &preferred_zone);
> - classzone_idx = zonelist_zone_idx(z);
> + classzone_idx = zone_idx(preferred_zone);
>
> zonelist_scan:
> /*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Suspect use of "first_zones_zonelist()"
2008-04-22 18:01 ` Lee Schermerhorn
@ 2008-04-22 18:40 ` Lee Schermerhorn
2008-04-23 6:02 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Lee Schermerhorn @ 2008-04-22 18:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mel Gorman, linux-mm, Eric Whitney
On Tue, 2008-04-22 at 14:01 -0400, Lee Schermerhorn wrote:
> On Tue, 2008-04-22 at 18:49 +0100, Mel Gorman wrote:
> > On (22/04/08 13:10), Lee Schermerhorn didst pronounce:
<snip>
Andrew: I should have mentioned that this patch is a fix to:
mm-filter-based-on-a-nodemask-as-well-as-a-gfp_mask-deporkify.patch
currently in the 25-mm1 tree.
Lee
> > >
> > > A suggestion. How about enhancing the comment [maybe a kernel doc
> > > block?] on first_zones_zonelist() to explain that it returns the zone
> > > via the zone parameter and that the return value is a cursor for
> > > iterators? Perhaps similarly for next_zones_zonelist() in mmzone.c?
> > >
> > > Or would you like me to take a cut at this?
> > >
> >
> > That's a good suggestion. How about the following?
>
> Very nice. Wish I'd had that to reference earlier :)
>
> Lee
> >
> > =======
> > Subject: [PATCH] fix off-by-one usage of first_zones_zonelist()
> >
> > Against: 2.6.25-mm1
> >
> > The return value of first_zones_zonelist() is actually the zoneref
> > AFTER the "requested zone"--i.e., the first zone in the zonelist
> > that satisfies any nodemask constraint. The "requested zone" is
> > returned via the @zone parameter. The returned zoneref is intended
> > to be passed to next_zones_zonelist() on subsequent iterations.
> >
> > This patch fixes first_zones_zonelist() callers appropriately and documents
> > first_zones_zonelist() to help avoid a repeat of the same mistake.
> >
> > Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> > fs/buffer.c | 9 ++++-----
> > include/linux/mmzone.h | 25 ++++++++++++++++++++++++-
> > mm/mempolicy.c | 9 ++++-----
> > mm/page_alloc.c | 4 ++--
> > 4 files changed, 34 insertions(+), 13 deletions(-)
> >
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/fs/buffer.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/fs/buffer.c
> > --- linux-2.6.25-mm1-clean/fs/buffer.c 2008-04-22 10:30:02.000000000 +0100
> > +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/fs/buffer.c 2008-04-22 18:35:56.000000000 +0100
> > @@ -368,18 +368,17 @@ void invalidate_bdev(struct block_device
> > */
> > static void free_more_memory(void)
> > {
> > - struct zoneref *zrefs;
> > - struct zone *dummy;
> > + struct zone *zone;
> > int nid;
> >
> > wakeup_pdflush(1024);
> > yield();
> >
> > for_each_online_node(nid) {
> > - zrefs = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
> > + (void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
> > gfp_zone(GFP_NOFS), NULL,
> > - &dummy);
> > - if (zrefs->zone)
> > + &zone);
> > + if (zone)
> > try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
> > GFP_NOFS);
> > }
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/include/linux/mmzone.h linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/include/linux/mmzone.h
> > --- linux-2.6.25-mm1-clean/include/linux/mmzone.h 2008-04-22 10:30:03.000000000 +0100
> > +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/include/linux/mmzone.h 2008-04-22 18:45:37.000000000 +0100
> > @@ -742,12 +742,36 @@ static inline int zonelist_node_idx(stru
> > #endif /* CONFIG_NUMA */
> > }
> >
> > +/**
> > + * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
> > + * @z - The cursor used as a starting point for the search
> > + * @highest_zoneidx - The zone index of the highest zone to return
> > + * @nodes - An optional nodemask to filter the zonelist with
> > + * @zone - The first suitable zone found is returned via this parameter
> > + *
> > + * This function returns the next zone at or below a given zone index that is
> > + * within the allowed nodemask using a cursor as the starting point for the
> > + * search. The zoneref returned is a cursor that is used as the next starting
> > + * point for future calls to next_zones_zonelist().
> > + */
> > struct zoneref *next_zones_zonelist(struct zoneref *z,
> > enum zone_type highest_zoneidx,
> > nodemask_t *nodes,
> > struct zone **zone);
> >
> > -/* Returns the first zone at or below highest_zoneidx in a zonelist */
> > +/**
> > + * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
> > + * @zonelist - The zonelist to search for a suitable zone
> > + * @highest_zoneidx - The zone index of the highest zone to return
> > + * @nodes - An optional nodemask to filter the zonelist with
> > + * @zone - The first suitable zone found is returned via this parameter
> > + *
> > + * This function returns the first zone at or below a given zone index that is
> > + * within the allowed nodemask. The zoneref returned is a cursor that can be
> > + * used to iterate the zonelist with next_zones_zonelist. The cursor should
> > + * not be used by the caller as it does not match the value of the zone
> > + * returned.
> > + */
> > static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
> > enum zone_type highest_zoneidx,
> > nodemask_t *nodes,
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/mempolicy.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/mempolicy.c
> > --- linux-2.6.25-mm1-clean/mm/mempolicy.c 2008-04-22 10:30:04.000000000 +0100
> > +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/mempolicy.c 2008-04-22 18:35:56.000000000 +0100
> > @@ -1396,14 +1396,13 @@ unsigned slab_node(struct mempolicy *pol
> > * first node.
> > */
> > struct zonelist *zonelist;
> > - struct zoneref *z;
> > - struct zone *dummy;
> > + struct zone *zone;
> > enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
> > zonelist = &NODE_DATA(numa_node_id())->node_zonelists[0];
> > - z = first_zones_zonelist(zonelist, highest_zoneidx,
> > + (void)first_zones_zonelist(zonelist, highest_zoneidx,
> > &policy->v.nodes,
> > - &dummy);
> > - return zonelist_node_idx(z);
> > + &zone);
> > + return zone->node;
> > }
> >
> > default:
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-clean/mm/page_alloc.c linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/page_alloc.c
> > --- linux-2.6.25-mm1-clean/mm/page_alloc.c 2008-04-22 10:30:04.000000000 +0100
> > +++ linux-2.6.25-mm1-fix-first_zone_zonelist-v1r2/mm/page_alloc.c 2008-04-22 18:35:56.000000000 +0100
> > @@ -1412,9 +1412,9 @@ get_page_from_freelist(gfp_t gfp_mask, n
> > int zlc_active = 0; /* set if using zonelist_cache */
> > int did_zlc_setup = 0; /* just call zlc_setup() one time */
> >
> > - z = first_zones_zonelist(zonelist, high_zoneidx, nodemask,
> > + (void)first_zones_zonelist(zonelist, high_zoneidx, nodemask,
> > &preferred_zone);
> > - classzone_idx = zonelist_zone_idx(z);
> > + classzone_idx = zone_idx(preferred_zone);
> >
> > zonelist_scan:
> > /*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Suspect use of "first_zones_zonelist()"
2008-04-22 18:40 ` Lee Schermerhorn
@ 2008-04-23 6:02 ` Andrew Morton
0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2008-04-23 6:02 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: mel, linux-mm, eric.whitney
> On Tue, 22 Apr 2008 14:40:39 -0400 Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> On Tue, 2008-04-22 at 14:01 -0400, Lee Schermerhorn wrote:
> > On Tue, 2008-04-22 at 18:49 +0100, Mel Gorman wrote:
> > > On (22/04/08 13:10), Lee Schermerhorn didst pronounce:
> <snip>
>
> Andrew: I should have mentioned that this patch is a fix to:
>
> mm-filter-based-on-a-nodemask-as-well-as-a-gfp_mask-deporkify.patch
>
> currently in the 25-mm1 tree.
Noted, thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-04-23 6:02 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-22 15:17 Suspect use of "first_zones_zonelist()" Lee Schermerhorn
2008-04-22 16:15 ` Mel Gorman
2008-04-22 17:10 ` Lee Schermerhorn
2008-04-22 17:49 ` Mel Gorman
2008-04-22 18:01 ` Lee Schermerhorn
2008-04-22 18:40 ` Lee Schermerhorn
2008-04-23 6:02 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox