* bind_zonelist() - are we definitely sizing this correctly?
@ 2007-07-26 14:17 Mel Gorman
2007-07-26 14:25 ` Mel Gorman
0 siblings, 1 reply; 3+ messages in thread
From: Mel Gorman @ 2007-07-26 14:17 UTC (permalink / raw)
To: Lee Schermerhorn, ak, Christoph Lameter, apw, kamezawa.hiroyu
Cc: linux-mm, linux-kernel
I was looking closer at bind_zonelist() and it has the following snippet
struct zonelist *zl;
int num, max, nd;
enum zone_type k;
max = 1 + MAX_NR_ZONES * nodes_weight(*nodes);
max++; /* space for zlcache_ptr (see mmzone.h) */
zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
if (!zl)
return ERR_PTR(-ENOMEM);
That set off alarm bells because we are allocating based on the size of a
zone, not the size of the zonelist.
This is the definition of struct zonelist
struct zonelist {
struct zonelist_cache *zlcache_ptr; // NULL or &zlcache
struct zone *zones[MAX_ZONES_PER_ZONELIST + 1]; // NULL delimited
#ifdef CONFIG_NUMA
struct zonelist_cache zlcache; // optional ...
#endif
};
Important thing to note here is that zlcache is after *zones and it is
not a pointer. zlcache in turn is defined as
struct zonelist_cache {
unsigned short z_to_n[MAX_ZONES_PER_ZONELIST]; /* zone->nid */
DECLARE_BITMAP(fullzones, MAX_ZONES_PER_ZONELIST); /* zone full? */
unsigned long last_full_zap; /* when last zap'd (jiffies) */
};
This is on NUMA only and it's a big structure.
The intention of bind_zonelist() appears to be that we only allocate enough
memory to hold all the zones in the active nodes. This was fine in 2.6.19
but now with zlcache after *zones[], I think we are in danger of allocating
too little memory and any reading of zlcache may be reading randomness when
MPOL_BIND is in use because it will be using the full offset within the
structure whether the memory is allocated or not.
At the risk of sounding stupid, what obvious thing am I missing that makes
this work?
If I'm right and this is broken and we still want to allocate as little memory
as possible, zlcache has to move before zones and the call to kmalloc needs
to take the size of zlcache into account.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: bind_zonelist() - are we definitely sizing this correctly?
2007-07-26 14:17 bind_zonelist() - are we definitely sizing this correctly? Mel Gorman
@ 2007-07-26 14:25 ` Mel Gorman
2007-07-26 17:58 ` Christoph Lameter
0 siblings, 1 reply; 3+ messages in thread
From: Mel Gorman @ 2007-07-26 14:25 UTC (permalink / raw)
To: Lee Schermerhorn, ak, Christoph Lameter, apw, kamezawa.hiroyu
Cc: linux-mm, linux-kernel
On (26/07/07 15:17), Mel Gorman didst pronounce:
> I was looking closer at bind_zonelist() and it has the following snippet
>
> struct zonelist *zl;
> int num, max, nd;
> enum zone_type k;
>
> max = 1 + MAX_NR_ZONES * nodes_weight(*nodes);
> max++; /* space for zlcache_ptr (see mmzone.h) */
> zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
> if (!zl)
> return ERR_PTR(-ENOMEM);
>
> That set off alarm bells because we are allocating based on the size of a
> zone, not the size of the zonelist.
>
Never mind me, I'm a tool as it's now semi-obvious. When statically defined,
zlcache_ptr is pointing to something useful as it's setup at boottime. When
dynamically allocated in bind_zonelist, the zlcache_ptr is set to NULL so
it never gets used by zlc_setup().
This could have done with a comment.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: bind_zonelist() - are we definitely sizing this correctly?
2007-07-26 14:25 ` Mel Gorman
@ 2007-07-26 17:58 ` Christoph Lameter
0 siblings, 0 replies; 3+ messages in thread
From: Christoph Lameter @ 2007-07-26 17:58 UTC (permalink / raw)
To: Mel Gorman
Cc: Lee Schermerhorn, ak, apw, kamezawa.hiroyu, linux-mm, linux-kernel
On Thu, 26 Jul 2007, Mel Gorman wrote:
> This could have done with a comment.
See the exhaustive description by Paul include/linux/mmzone.h
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-07-26 17:58 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-26 14:17 bind_zonelist() - are we definitely sizing this correctly? Mel Gorman
2007-07-26 14:25 ` Mel Gorman
2007-07-26 17:58 ` Christoph Lameter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox