From: Christoph Lameter <clameter@sgi.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org
Subject: Re: Page allocator: Single Zone optimizations
Date: Sat, 28 Oct 2006 18:29:07 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610281805280.14100@schroedinger.engr.sgi.com>
In-Reply-To: <20061028180402.7c3e6ad8.akpm@osdl.org>
On Sat, 28 Oct 2006, Andrew Morton wrote:
> > We (and I personally with the prezeroing patches) have been down
> > this road several times and did not like what we saw.
>
> Details?
The most important issues that come to my mind right now (this has
been discussed frequently in various contexts, so I may be missing
some things) are:
1. The per-zone caches (pageset structures) are duplicated. This
reduces cache hit rates and duplicates a lot of information in the
page allocator.
2. Additional load balancing is needed across multiple zones.
3. The NUMA layer can only support memory policies for a single zone.
4. The slab allocator caches may have to be duplicated for the new
zone as well.
5. More bits are needed in the page flags to encode the zone.
6. Zones have to be sized at boot time, which increases the risk of
running out of memory in one zone while another still has free pages,
possibly requiring more complex load balancing.
7. Having more zones increases fragmentation, since the different zones
have separate free lists.
> > For that we would have to have a distinction of removable memory which
> > won't be necessary if we use the existing mappings to move the physical
> > location while keeping the virtual addresses.
>
> You're proposing that all kernel memory be virtually mapped?
>
> I've never seen such a proposal nor any implementation.
It has been that way on ia64 for years, and x86_64 also maps all of
kernel memory virtually. x86_64 currently uses huge page entries for the
kernel mapping (arch/x86_64/mm/init.c); ia64 has a special TLB entry
generator in arch/ia64/kernel/ivt.S. I assume that other arches do
something similar. I have hacked the ia64 TLB entry generator to support
variable kernel page sizes (see my memmap patches, posted a while back on
linux-ia64).
> Or maybe you're referring to something else. Please let's stop playing
> question-and-answer. Please provide sufficient information so that people
> can understand what you're saying.
In the case of x86_64, it is possible to drain pages from an area and then
switch the leftover pages from a huge mapping to page-sized mappings by
creating the lower-level pte pages. These pages can then be moved
individually, provided we can stop kernel accesses while switching the
ptes (do we need an IPI to bring all processors into a quiescent state
for this?).
AFAIK Virtual Iron (at last year's OLS) simply used a virtual mapping for
node unplug. They drained all the memory via swap and then created a husk
containing the remaining pages, relocated to nodes still in use (I think
they called it a zombie node, which continued to exist as long as pages
remained or until the node was brought up again).
> Again. On the whole, that was a pretty useless email. Please give us
> something we can use.
Well, review the discussions we had regarding Mel Gorman's
defragmentation approaches. We discussed this in detail at the VM summit
and decided not to create additional zones but to separate the free lists
instead. You and Linus seemed to agree with this. I am a bit surprised...
Is this a Google effect?
Moreover, this discussion is only remotely connected to the issue at
hand. We all agree that ZONE_DMA is bad and that we want an alternative
scheme. Why not continue making it possible to leave ZONE_DMA-dependent
code out of the kernel build?
The single-zone patches would increase VM performance. That would in turn
make it more difficult to merge approaches that require multiple zones,
since the resulting performance drop would be more significant.