From: mel@skynet.ie (Mel Gorman)
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	ak@suse.de, Christoph Lameter <clameter@sgi.com>,
	apw@shadowen.org, kamezawa.hiroyu@jp.fujitsu.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: bind_zonelist() - are we definitely sizing this correctly?
Date: Thu, 26 Jul 2007 15:17:57 +0100
Message-ID: <20070726141756.GB18825@skynet.ie>

I was looking more closely at bind_zonelist() and it has the following snippet:

        struct zonelist *zl;
        int num, max, nd;
        enum zone_type k;

        max = 1 + MAX_NR_ZONES * nodes_weight(*nodes);
        max++;                  /* space for zlcache_ptr (see mmzone.h) */
        zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
        if (!zl)
                return ERR_PTR(-ENOMEM);

That set off alarm bells because we are allocating based on the size of a
zone pointer and the number of nodes in the mask, not the size of struct
zonelist.

This is the definition of struct zonelist

struct zonelist {
        struct zonelist_cache *zlcache_ptr;                  // NULL or &zlcache
        struct zone *zones[MAX_ZONES_PER_ZONELIST + 1];      // NULL delimited
#ifdef CONFIG_NUMA
        struct zonelist_cache zlcache;                       // optional ...
#endif
};

The important thing to note here is that zlcache comes after zones[] and is
not a pointer. zlcache in turn is defined as

struct zonelist_cache {
        unsigned short z_to_n[MAX_ZONES_PER_ZONELIST];          /* zone->nid */
        DECLARE_BITMAP(fullzones, MAX_ZONES_PER_ZONELIST);      /* zone full? */
        unsigned long last_full_zap;            /* when last zap'd (jiffies) */
};

zlcache only exists on NUMA and it is a big structure, sized for
MAX_ZONES_PER_ZONELIST entries.

The intention of bind_zonelist() appears to be that we only allocate enough
memory to hold all the zones in the active nodes. This was fine in 2.6.19,
but now that zlcache sits after zones[], I think we are in danger of
allocating too little memory. Anything that uses zlcache applies the full
offset within the structure whether the memory is allocated or not, so any
read of zlcache when MPOL_BIND is in use may be reading randomness.
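
To make the concern concrete, here is a minimal userspace sketch of the
arithmetic. It is not kernel code: the values of MAX_NR_ZONES and
MAX_NUMNODES, the definition of MAX_ZONES_PER_ZONELIST as their product, and
the helper names bind_zonelist_alloc_bytes() and zlcache_fits() are all
assumptions made for illustration. It compares the number of bytes requested
by the kmalloc() call quoted earlier with where zlcache lives in the NUMA
layout of struct zonelist.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed values for illustration only; the real ones are config-dependent. */
#define MAX_NR_ZONES           4
#define MAX_NUMNODES           64
#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)

/* Userspace stand-in for the sizing done by the kernel's DECLARE_BITMAP(). */
#define BITS_PER_LONG_MODEL (CHAR_BIT * sizeof(unsigned long))
#define BITMAP_LONGS(bits) \
	(((bits) + BITS_PER_LONG_MODEL - 1) / BITS_PER_LONG_MODEL)

struct zone;	/* opaque here; only pointers to it are stored */

struct zonelist_cache {
	unsigned short z_to_n[MAX_ZONES_PER_ZONELIST];
	unsigned long fullzones[BITMAP_LONGS(MAX_ZONES_PER_ZONELIST)];
	unsigned long last_full_zap;
};

/* Layout as quoted in the mail, with CONFIG_NUMA assumed to be set. */
struct zonelist {
	struct zonelist_cache *zlcache_ptr;
	struct zone *zones[MAX_ZONES_PER_ZONELIST + 1];
	struct zonelist_cache zlcache;
};

/* Bytes requested by the quoted kmalloc() for a mask with nr_nodes bits set. */
static size_t bind_zonelist_alloc_bytes(unsigned int nr_nodes)
{
	size_t max;

	assert(nr_nodes <= MAX_NUMNODES);
	max = 1 + (size_t)MAX_NR_ZONES * nr_nodes;
	max++;	/* space for zlcache_ptr */
	return sizeof(struct zone *) * max;
}

/* Nonzero only if the embedded zlcache lies entirely inside the allocation. */
static int zlcache_fits(unsigned int nr_nodes)
{
	size_t end = offsetof(struct zonelist, zlcache) +
		     sizeof(struct zonelist_cache);

	return bind_zonelist_alloc_bytes(nr_nodes) >= end;
}
```

With these assumed constants on a typical 64-bit build, a two-node mask
requests 80 bytes while zlcache starts at offset 2064. Even a mask with every
node set only reaches the start of zlcache, so zlcache_fits() is never true
in this model.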

At the risk of sounding stupid, what obvious thing am I missing that makes
this work?

If I'm right and this is broken, and we still want to allocate as little
memory as possible, then zlcache has to move before zones[] and the call to
kmalloc() needs to take the size of zlcache into account.
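
A rough userspace sketch of what that could look like follows. It is only an
illustration of the idea, not a patch: the struct name zonelist_reordered,
the helper reordered_alloc_bytes() and the constant values are hypothetical.
The point is that with zlcache ahead of zones[], a truncated allocation can
cover zlcache in full and still hold only as many zone pointers as the node
mask needs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed values for illustration only; the real ones are config-dependent. */
#define MAX_NR_ZONES           4
#define MAX_NUMNODES           64
#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)

/* Userspace stand-in for the sizing done by the kernel's DECLARE_BITMAP(). */
#define BITS_PER_LONG_MODEL (CHAR_BIT * sizeof(unsigned long))
#define BITMAP_LONGS(bits) \
	(((bits) + BITS_PER_LONG_MODEL - 1) / BITS_PER_LONG_MODEL)

struct zone;	/* opaque here; only pointers to it are stored */

struct zonelist_cache {
	unsigned short z_to_n[MAX_ZONES_PER_ZONELIST];
	unsigned long fullzones[BITMAP_LONGS(MAX_ZONES_PER_ZONELIST)];
	unsigned long last_full_zap;
};

/* Hypothetical reordering: zlcache moved ahead of the variable-use array. */
struct zonelist_reordered {
	struct zonelist_cache *zlcache_ptr;
	struct zonelist_cache zlcache;
	struct zone *zones[MAX_ZONES_PER_ZONELIST + 1];	/* NULL delimited */
};

/*
 * Bytes to request for a mask with nr_nodes bits set: everything up to
 * zones[], plus one slot per possible zone and one for the NULL terminator.
 */
static size_t reordered_alloc_bytes(unsigned int nr_nodes)
{
	size_t nr_slots;

	assert(nr_nodes <= MAX_NUMNODES);
	nr_slots = (size_t)MAX_NR_ZONES * nr_nodes + 1;
	return offsetof(struct zonelist_reordered, zones) +
	       nr_slots * sizeof(struct zone *);
}
```

In this layout the fixed part of the allocation grows by
sizeof(struct zonelist_cache) for every bound zonelist, which is the cost of
keeping zlcache readable while still trimming zones[].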

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

