Re: [PATCH] VM: add vm.free_node_memory sysctl

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Ray Bryant" <raybry@mpdtxmail.amd.com>
To: Andi Kleen <ak@suse.de>
Cc: Martin Hicks <mort@sgi.com>, Ingo Molnar <mingo@elte.hu>,
	Linux MM <linux-mm@kvack.org>, Andrew Morton <akpm@osdl.org>,
	torvalds@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] VM: add vm.free_node_memory sysctl
Date: Fri, 5 Aug 2005 12:45:58 -0500	[thread overview]
Message-ID: <200508051245.59528.raybry@mpdtxmail.amd.com> (raw)
In-Reply-To: <20050803200808.GE8266@wotan.suse.de>

On Wednesday 03 August 2005 15:08, Andi Kleen wrote:

> >
> > Hmmm.... What happens if there are already mapped pages (e. g. mapped in
> > the sense that pages are mapped into an address space) on the node and
> > you want to allocate some more, but can't because the node is full of
> > clean page cache pages?   Then one would have to set the memhog argument
> > to the right thing to
>

> If you have a bind policy in the memory grabbing program then the standard
> try_to_free_pages should DTRT. That is because we generated a custom zone
> list only containing nodes in that zone and the zone reclaim only looks
> into those.
>

It may depend on what your definition of DTRT is here.  :-)

As I understand things, if we have a node that has some mapped memory 
allocated, and if one starts up a numactl -bind node memhog nodesize-slop so 
as to clear some clean page cache pages from that node, then unless the 
"slop" is sized in proportion to the amount of mapped memory used on the 
node, then the existing mapped memory will get swapped out in order to 
satisfy the new request.  In addition, clean page-cache pages will get 
discarded.  I think what Martin and I would prefer to see is an interface 
that allows one to just get rid of the clean page cache (or at least enough 
of it) so that additional mapped page allocations will occur locally to the 
node without causing swapping.

AFAIK, the number of mapped pages on the node is not exported to user space 
(by, for example, /sys).   So there is no good way to size the "slop" to 
allow for an existing allocation.  If there was, then using a bound memory 
hog would likely be a reasonable replacement for Martin's syscall to release 
all free page cache, at least for small to medium sized sized systems.

> With prefered or other policies it's different though, in that cases
> t_t_f_p will also look into other nodes because the policy is not binding.
>
> That said it might be probably possible to even make non bind policies more
> aggressive at freeing in the current node before looking into other nodes.
> I think the zone balancing has been mostly tuned on non NUMA systems, so
> some improvements might be possible here.
>
> Most people don't use BIND and changing the default policies like this
> might give NUMA systems a better "out of the box" experience.  However this
> memory balance is very subtle code and easy to break, so this would need
> some care.
>

Of course!

> I don't think sysctls or new syscalls are the way to go here though.
>

The reason we ended up with a sysctl/syscall (to control the aggressiveness 
with which __alloc_pages will try to free page cache before spilling) is that 
deciding whether or not  to spend the effort to free up page cache pages on 
the local node before  spilling is a workload dependent optimization.   For 
an HPC application it is  typically worth the effort to try to free local 
node page cache before spilling off node because the program will run 
sufficiently long to make the improvement due to getting local storage 
dominates the extra cost of doing the page allocation.   For file server 
workloads, for example, it is typically important to minimize the time to do 
the page allocation; if it turns out to be on a remote node it really doesn't 
matter that much.   So it seems to me that we need some way for the 
application to tell the system which approach it prefers based on the type of 
workload it is -- hence the sysctl or syscall approach.

> -Andi

-- 
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                 512-507-7807 (c)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2005-08-05 17:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20050801113913.GA7000@elte.hu>
     [not found] ` <20050801102903.378da54f.akpm@osdl.org>
     [not found]   ` <20050801195426.GA17548@elte.hu>
     [not found]     ` <20050802171050.GG26803@localhost>
     [not found]       ` <20050802210746.GA26494@elte.hu>
2005-08-03 13:56         ` Martin Hicks
2005-08-03 14:15           ` Andi Kleen
2005-08-03 14:24             ` Martin Hicks
2005-08-03 14:38               ` Andi Kleen
2005-08-03 14:56                 ` Martin Hicks
2005-08-03 19:59                 ` Ray Bryant
2005-08-03 20:08                   ` Andi Kleen
2005-08-05 17:45                     ` Ray Bryant [this message]
2005-08-05 21:48                       ` Andi Kleen
2005-08-15 16:05                         ` Martin Hicks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200508051245.59528.raybry@mpdtxmail.amd.com \
    --to=raybry@mpdtxmail.amd.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=mort@sgi.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox