From: Jack Steiner <steiner@sgi.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: linux-mm <linux-mm@kvack.org>, clameter@sgi.com
Subject: Re: Excessive memory trapped in pageset lists
Date: Thu, 7 Apr 2005 21:34:36 -0500
Message-ID: <20050408023436.GA1927@sgi.com>
In-Reply-To: <1112923481.21749.88.camel@localhost>

On Thu, Apr 07, 2005 at 06:24:41PM -0700, Dave Hansen wrote:
> On Thu, 2005-04-07 at 16:11 -0500, Jack Steiner wrote:
> >    28 pages/node/cpu * 512 cpus * 256 nodes * 16384 bytes/page = 60GB  (Yikes!!!)
> ...
> > I have a couple of ideas for fixing this but it looks like Christoph is
> > actively making changes in this area. Christoph do you want to address
> > this issue or should I wait for your patch to stabilize?
> 
> What about only keeping the page lists populated for cpus which can
> locally allocate from the zone?
> 
> 	cpu_to_node(cpu) == page_nid(pfn_to_page(zone->zone_start_pfn)) 

Exactly. That is at the top of my list. What I haven't decided is whether to:

	- leave the list_heads for offnode pages in the per_cpu_pages
	  struct. The offnode lists would simply be unused, and the amount
	  of wasted space is small - probably zero because of the cacheline
	  alignment of the per_cpu_pageset. This is the simplest solution
	  but is not clean because of the unused fields - unless some
	  architectures want to control whether offnode pages
	  are kept in the lists (???).

	  	OR

	- remove the list_heads from the per_cpu_pageset and make it
	  a standalone array in the zone struct. Array size would be
	  MAX_CPUS_PER_NODE. I don't recall any notion of MAX_CPUS_PER_NODE
	  or a relative cpu number on a node (have I overlooked this?). 
	  This solution is cleaner in the long run but may involve more 
	  infrastructure than I wanted to get into at this point.

	  	OR

	- same as above but with a SINGLE list_head per zone. The list
	  would be used by all cpus on the node. This avoids the page coloring
	  issues I ran into earlier (see prev posting). Obviously, this requires
	  a lock, but only on-node cpus would normally take it.
	  Another advantage of this scheme is that an offnode shaker could
	  acquire the lock & drain the list if memory became low. A rough
	  sketch of what I mean is below.
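
For the third option, a toy userspace model is roughly the shape of what
I have in mind. To be clear, the struct, the field names, and the pthread
locking are stand-ins for illustration only - this is not kernel code and
not an actual patch:

/*
 * Toy model of a single per-zone cached-page list shared by all
 * on-node cpus. Names and locking primitives are invented for
 * illustration; in the kernel this would use list_head/spinlock_t.
 */
#include <pthread.h>
#include <stddef.h>

struct toy_page {
	struct toy_page *next;
};

struct toy_zone_pagelist {
	pthread_mutex_t lock;	/* normally taken only by on-node cpus */
	struct toy_page *head;	/* pages cached for the whole node */
	int count;
};

/* Fast path: pop one cached page, or return NULL to fall back to the buddy. */
static struct toy_page *toy_alloc_cached(struct toy_zone_pagelist *pl)
{
	struct toy_page *page = NULL;

	pthread_mutex_lock(&pl->lock);
	if (pl->head) {
		page = pl->head;
		pl->head = page->next;
		pl->count--;
	}
	pthread_mutex_unlock(&pl->lock);
	return page;
}

/* Offnode "shaker": take the lock and hand everything back when memory is low. */
static int toy_drain_cached(struct toy_zone_pagelist *pl,
			    void (*free_back)(struct toy_page *))
{
	int drained = 0;

	pthread_mutex_lock(&pl->lock);
	while (pl->head) {
		struct toy_page *page = pl->head;

		pl->head = page->next;
		pl->count--;
		free_back(page);
		drained++;
	}
	pthread_mutex_unlock(&pl->lock);
	return drained;
}

The point is only that the alloc fast path pops a single page under the
lock, while the shaker can take the same lock from offnode and give
everything back to the allocator.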

I haven't fully thought through these ideas. Maybe other alternatives would
be even better.... Suggestions????


> 
> There certainly aren't a lot of cases where frequent, persistent
> single-page allocations are occurring off-node, unless a node is empty.

Hmmmm. True, but one of our popular configurations consists of memory-only nodes.
I know of one site that has 240 memory-only nodes & 16 nodes with
both cpus & memory. For this configuration, most memory is offnode
to EVERY cpu. (But I still don't want to cache offnode pages.)


> If you go to an off-node 'struct zone', you're probably bouncing so many
> cachelines that you don't get any benefit from per-cpu-pages anyway.

Agree, although on SGI systems we set a global policy to round-robin
all file pages across all nodes. However, I'm not suggesting we cache
offnode pages in the per_cpu_pageset. That gets us back to where we
started - too much memory in percpu page lists. Also, creating a file
page already bounces a lot of cachelines around.

> 
> Maybe there could be a per-cpu-pages miss rate that's required to occur
> before the lists are even populated.  That would probably account better
> for cases where nodes are disproportionately populated with memory.
> This, along with the occasional flushing of the pages back into the
> general allocator if the miss rate isn't satisfied should give some good
> self-tuning behavior.

Makes sense.
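
If it helps, here is a toy sketch of how I read the miss-rate idea. The
thresholds and field names are invented purely for illustration - nothing
here comes from an existing patch:

/*
 * Toy sketch: don't start caching pages for a cpu/zone pair until
 * allocations miss often enough, and flush the cache back to the
 * allocator if the hit rate drops. Thresholds are arbitrary.
 */
#include <stdbool.h>

struct toy_pcp_stats {
	unsigned long misses;	/* allocations that had to hit the zone lock */
	unsigned long hits;	/* allocations satisfied from the cached list */
	bool populated;		/* are we currently caching pages? */
};

#define TOY_POPULATE_MISSES	64	/* start caching after this many misses */
#define TOY_MIN_HIT_PCT		50	/* flush if the hit rate drops below this */

/* Called on each allocation; returns true if the cache should be (re)filled. */
static bool toy_should_populate(struct toy_pcp_stats *s, bool was_hit)
{
	if (was_hit)
		s->hits++;
	else
		s->misses++;

	if (!s->populated && s->misses >= TOY_POPULATE_MISSES) {
		s->populated = true;
		s->hits = s->misses = 0;
	}
	return s->populated;
}

/* Called occasionally; returns true if the cached pages should be drained. */
static bool toy_should_flush(struct toy_pcp_stats *s)
{
	unsigned long total = s->hits + s->misses;

	if (!s->populated || total == 0)
		return false;

	if (s->hits * 100 < total * TOY_MIN_HIT_PCT) {
		s->populated = false;
		s->hits = s->misses = 0;
		return true;	/* caller drains pages back to the buddy */
	}
	s->hits = s->misses = 0;
	return false;
}

Whether the flush check runs from a timer, from the shaker mentioned
above, or from the allocation slow path itself is obviously still an
open question.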

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



