* [PATCH] Mark the correct zone as full when scanning zonelists
@ 2008-09-11 21:25 Mel Gorman
2008-09-11 21:41 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2008-09-11 21:25 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu, apw
The for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning
zonelists to keep track of where in the zonelist it is. The zoneref that
is returned corresponds to the the next zone that is to be scanned, not
the current one as it originally thought of as an opaque list.
When the page allocator is scanning a zonelist, it marks zones that it
temporarily full zones to eliminate near-future scanning attempts. It uses
the zoneref for the marking and consequently the incorrect zone gets marked
full. This leads to a suitable zone being skipped in the mistaken belief
it is full. This patch corrects the problem by changing zoneref to be the
current zone being scanned instead of the next one.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/linux/mmzone.h | 12 ++++++------
mm/mmzone.c | 2 +-
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 443bc7c..428328a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -751,8 +751,9 @@ static inline int zonelist_node_idx(struct zoneref *zoneref)
*
* This function returns the next zone at or below a given zone index that is
* within the allowed nodemask using a cursor as the starting point for the
- * search. The zoneref returned is a cursor that is used as the next starting
- * point for future calls to next_zones_zonelist().
+ * search. The zoneref returned is a cursor that represents the current zone
+ * being examined. It should be advanced by one before calling
+ * next_zones_zonelist again.
*/
struct zoneref *next_zones_zonelist(struct zoneref *z,
enum zone_type highest_zoneidx,
@@ -768,9 +769,8 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
*
* This function returns the first zone at or below a given zone index that is
* within the allowed nodemask. The zoneref returned is a cursor that can be
- * used to iterate the zonelist with next_zones_zonelist. The cursor should
- * not be used by the caller as it does not match the value of the zone
- * returned.
+ * used to iterate the zonelist with next_zones_zonelist by advancing it by
+ * one before calling.
*/
static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
enum zone_type highest_zoneidx,
@@ -795,7 +795,7 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
#define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \
zone; \
- z = next_zones_zonelist(z, highidx, nodemask, &zone)) \
+ z = next_zones_zonelist(++z, highidx, nodemask, &zone)) \
/**
* for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 486ed59..16ce8b9 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -69,6 +69,6 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
(z->zone && !zref_in_nodemask(z, nodes)))
z++;
- *zone = zonelist_zone(z++);
+ *zone = zonelist_zone(z);
return z;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Mark the correct zone as full when scanning zonelists
2008-09-11 21:25 [PATCH] Mark the correct zone as full when scanning zonelists Mel Gorman
@ 2008-09-11 21:41 ` Andrew Morton
2008-09-12 1:10 ` KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Andrew Morton @ 2008-09-11 21:41 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu, apw
On Thu, 11 Sep 2008 22:25:51 +0100
Mel Gorman <mel@csn.ul.ie> wrote:
> The for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning
> zonelists to keep track of where in the zonelist it is. The zoneref that
> is returned corresponds to the the next zone that is to be scanned, not
> the current one as it originally thought of as an opaque list.
>
> When the page allocator is scanning a zonelist, it marks zones that it
> temporarily full zones to eliminate near-future scanning attempts.
That sentence needs help.
> It uses
> the zoneref for the marking and consequently the incorrect zone gets marked
> full. This leads to a suitable zone being skipped in the mistaken belief
> it is full. This patch corrects the problem by changing zoneref to be the
> current zone being scanned instead of the next one.
Applicable to 2.6.26 as well, yes?
Someone reported a bug a few weeks ago which I think this patch will fix,
yes? I don't remember who that was, nor do I recall the precise details
of what the userspace-visible (mis)behaviour was.
Are you able to fill in the gaps here? Put yourself in the position of
a poor little -stable maintainer scratching his head wondering ytf he
was sent this patch.
Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Mark the correct zone as full when scanning zonelists
2008-09-11 21:41 ` Andrew Morton
@ 2008-09-12 1:10 ` KAMEZAWA Hiroyuki
2008-09-12 20:37 ` Mel Gorman
2008-09-12 18:58 ` Mel Gorman
2008-09-15 23:20 ` Mel Gorman
2 siblings, 1 reply; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-12 1:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mel Gorman, linux-mm, linux-kernel, apw
On Thu, 11 Sep 2008 14:41:55 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 11 Sep 2008 22:25:51 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > The for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning
> > zonelists to keep track of where in the zonelist it is. The zoneref that
> > is returned corresponds to the the next zone that is to be scanned, not
> > the current one as it originally thought of as an opaque list.
> >
> > When the page allocator is scanning a zonelist, it marks zones that it
> > temporarily full zones to eliminate near-future scanning attempts.
>
> That sentence needs help.
>
Hmm, should we rename
next_zone_zonelist() => get_appropriate_zone_from_list()
or some better name ?
> > It uses
> > the zoneref for the marking and consequently the incorrect zone gets marked
> > full. This leads to a suitable zone being skipped in the mistaken belief
> > it is full. This patch corrects the problem by changing zoneref to be the
> > current zone being scanned instead of the next one.
>
> Applicable to 2.6.26 as well, yes?
>
Maybe yes. But it's better to show where this patch really fixes.
Is this a fix for misunderstanding usage of zoneref in
mm/page_alloc.c::get_page_from_freelist() ?
== here ?==
if (NUMA_BUILD)
zlc_mark_zone_full(zonelist, z);
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Mark the correct zone as full when scanning zonelists
2008-09-11 21:41 ` Andrew Morton
2008-09-12 1:10 ` KAMEZAWA Hiroyuki
@ 2008-09-12 18:58 ` Mel Gorman
2008-09-15 23:20 ` Mel Gorman
2 siblings, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2008-09-12 18:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu, apw
On (11/09/08 14:41), Andrew Morton didst pronounce:
> On Thu, 11 Sep 2008 22:25:51 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > The for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning
> > zonelists to keep track of where in the zonelist it is. The zoneref that
> > is returned corresponds to the the next zone that is to be scanned, not
> > the current one as it originally thought of as an opaque list.
> >
> > When the page allocator is scanning a zonelist, it marks zones that it
> > temporarily full zones to eliminate near-future scanning attempts.
>
> That sentence needs help.
>
I've posted a revised leader below.
> > It uses
> > the zoneref for the marking and consequently the incorrect zone gets marked
> > full. This leads to a suitable zone being skipped in the mistaken belief
> > it is full. This patch corrects the problem by changing zoneref to be the
> > current zone being scanned instead of the next one.
>
> Applicable to 2.6.26 as well, yes?
>
Yes. I was going to get it right for mainline first before posting to
stable.
>
> Someone reported a bug a few weeks ago which I think this patch will fix,
> yes? I don't remember who that was, nor do I recall the precise details
> of what the userspace-visible (mis)behaviour was.
>
> Are you able to fill in the gaps here? Put yourself in the position of
> a poor little -stable maintainer scratching his head wondering ytf he
> was sent this patch.
>
I'm not aware of this bug but I'll go digging for it and see what I
find. Thanks
=== Begin revised changelog ===
The iterator for_each_zone_zonelist() uses a struct zoneref *z cursor when
scanning zonelists to keep track of where in the zonelist it is. The zoneref
that is returned corresponds to the the next zone that is to be scanned,
not the current one. It was intended to be treated as an opaque list.
When the page allocator is scanning a zonelist, it marks elements in the
zonelist corresponding to zones that are temporarily full. As the zonelist
is being updated, it uses the cursor here;
if (NUMA_BUILD)
zlc_mark_zone_full(zonelist, z);
This is intended to prevent rescanning in the near future but the zoneref
cursor does not correspond to the zone that has been found to be full. This is
an easy misunderstanding to make so this patch corrects the problem by changing
zoneref cursor to be the current zone being scanned instead of the next one.
This issue affects 2.6.26.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Mark the correct zone as full when scanning zonelists
2008-09-12 1:10 ` KAMEZAWA Hiroyuki
@ 2008-09-12 20:37 ` Mel Gorman
0 siblings, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2008-09-12 20:37 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Andrew Morton, linux-mm, linux-kernel, apw
On (12/09/08 10:10), KAMEZAWA Hiroyuki didst pronounce:
> On Thu, 11 Sep 2008 14:41:55 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Thu, 11 Sep 2008 22:25:51 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> >
> > > The for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning
> > > zonelists to keep track of where in the zonelist it is. The zoneref that
> > > is returned corresponds to the the next zone that is to be scanned, not
> > > the current one as it originally thought of as an opaque list.
> > >
> > > When the page allocator is scanning a zonelist, it marks zones that it
> > > temporarily full zones to eliminate near-future scanning attempts.
> >
> > That sentence needs help.
> >
> Hmm, should we rename
>
> next_zone_zonelist() => get_appropriate_zone_from_list()
>
> or some better name ?
>
Maybe as a separate patch, but to be honest the name still makes sense
to me but I'm biased.
> > > It uses
> > > the zoneref for the marking and consequently the incorrect zone gets marked
> > > full. This leads to a suitable zone being skipped in the mistaken belief
> > > it is full. This patch corrects the problem by changing zoneref to be the
> > > current zone being scanned instead of the next one.
> >
> > Applicable to 2.6.26 as well, yes?
> >
> Maybe yes. But it's better to show where this patch really fixes.
> Is this a fix for misunderstanding usage of zoneref in
> mm/page_alloc.c::get_page_from_freelist() ?
>
> == here ?==
> if (NUMA_BUILD)
> zlc_mark_zone_full(zonelist, z);
>
I spelled this out a bit better hopefully in the updated changelog.
Thanks
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Mark the correct zone as full when scanning zonelists
2008-09-11 21:41 ` Andrew Morton
2008-09-12 1:10 ` KAMEZAWA Hiroyuki
2008-09-12 18:58 ` Mel Gorman
@ 2008-09-15 23:20 ` Mel Gorman
2 siblings, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2008-09-15 23:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu, apw
On (11/09/08 14:41), Andrew Morton didst pronounce:
> On Thu, 11 Sep 2008 22:25:51 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > The for_each_zone_zonelist() uses a struct zoneref *z cursor when scanning
> > zonelists to keep track of where in the zonelist it is. The zoneref that
> > is returned corresponds to the the next zone that is to be scanned, not
> > the current one as it originally thought of as an opaque list.
> >
> > When the page allocator is scanning a zonelist, it marks zones that it
> > temporarily full zones to eliminate near-future scanning attempts.
>
> That sentence needs help.
>
> > It uses
> > the zoneref for the marking and consequently the incorrect zone gets marked
> > full. This leads to a suitable zone being skipped in the mistaken belief
> > it is full. This patch corrects the problem by changing zoneref to be the
> > current zone being scanned instead of the next one.
>
> Applicable to 2.6.26 as well, yes?
>
>
> Someone reported a bug a few weeks ago which I think this patch will fix,
> yes? I don't remember who that was, nor do I recall the precise details
> of what the userspace-visible (mis)behaviour was.
>
> Are you able to fill in the gaps here?
I searched through the archives and couldn't find a bug report that this
patch may be the fix to. However, I understand that the initial leader
could have been a lot better.
> Put yourself in the position of
> a poor little -stable maintainer scratching his head wondering ytf he
> was sent this patch.
>
> Thanks.
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-09-15 23:20 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-11 21:25 [PATCH] Mark the correct zone as full when scanning zonelists Mel Gorman
2008-09-11 21:41 ` Andrew Morton
2008-09-12 1:10 ` KAMEZAWA Hiroyuki
2008-09-12 20:37 ` Mel Gorman
2008-09-12 18:58 ` Mel Gorman
2008-09-15 23:20 ` Mel Gorman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox