From: Andi Kleen <ak@suse.de>
To: Ray Bryant <raybry@mpdtxmail.amd.com>
Cc: Andi Kleen <ak@suse.de>, Martin Hicks <mort@sgi.com>,
Ingo Molnar <mingo@elte.hu>, Linux MM <linux-mm@kvack.org>,
Andrew Morton <akpm@osdl.org>,
torvalds@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] VM: add vm.free_node_memory sysctl
Date: Fri, 5 Aug 2005 23:48:58 +0200
Message-ID: <20050805214857.GD8266@wotan.suse.de>
In-Reply-To: <200508051245.59528.raybry@mpdtxmail.amd.com>
On Fri, Aug 05, 2005 at 12:45:58PM -0500, Ray Bryant wrote:
> > try_to_free_pages should DTRT. That is because we generated a custom zone
> > list only containing nodes in that zone and the zone reclaim only looks
> > into those.
> >
>
> It may depend on what your definition of DTRT is here. :-)
>
> As I understand things, if we have a node that has some mapped memory
> allocated, and if one starts up a numactl -bind node memhog nodesize-slop so
> as to clear some clean page cache pages from that node, then unless the
> "slop" is sized in proportion to the amount of mapped memory used on the
> node, then the existing mapped memory will get swapped out in order to
> satisfy the new request. In addition, clean page-cache pages will get

The VM should eat clean pages first, but yes, at some point it will
swap if you ask for enough. It has to, because it doesn't know how
to migrate pages to other nodes.

> discarded. I think what Martin and I would prefer to see is an interface
> that allows one to just get rid of the clean page cache (or at least enough
> of it) so that additional mapped page allocations will occur locally to the
> node without causing swapping.

That seems like a very narrow special case. But have you actually
checked that memhog really doesn't work this way?

>
> AFAIK, the number of mapped pages on the node is not exported to user space
> (by, for example, /sys). So there is no good way to size the "slop" to
> allow for an existing allocation. If there was, then using a bound memory
> hog would likely be a reasonable replacement for Martin's syscall to release
> all free page cache, at least for small to medium sized systems.

I guess the per-node mapped page count could be exported without too
much trouble.

> The reason we ended up with a sysctl/syscall (to control the aggressiveness
> with which __alloc_pages will try to free page cache before spilling) is that
> deciding whether or not to spend the effort to free up page cache pages on
> the local node before spilling is a workload dependent optimization. For
> an HPC application it is typically worth the effort to try to free local
> node page cache before spilling off node because the program will run
> sufficiently long to make the improvement due to getting local storage
> dominate the extra cost of doing the page allocation. For file server
> workloads, for example, it is typically important to minimize the time to do
> the page allocation; if it turns out to be on a remote node it really doesn't
> matter that much. So it seems to me that we need some way for the
> application to tell the system which approach it prefers based on the type of
> workload it is -- hence the sysctl or syscall approach.

Ideally it should just work transparently. Maybe NUMA allocation
should be a bit more aggressive about cleaning local pages before
falling back to other nodes. The problem is that this can slow down
the allocation fast path.
-Andi