From: Magnus Damm <magnus.damm@gmail.com>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: Christoph Lameter <clameter@sgi.com>,
akpm@osdl.org, Mike Kravetz <kravetz@us.ibm.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 0/4] Swap migration V3: Overview
Date: Wed, 26 Oct 2005 16:04:18 +0900
Message-ID: <aec7e5c30510260004p5a3b07a9v28ae67b2982f1945@mail.gmail.com>
In-Reply-To: <20051025143741.GA6604@logos.cnet>
Hi again Marcelo,
On 10/25/05, Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
> On Tue, Oct 25, 2005 at 08:37:52PM +0900, Magnus Damm wrote:
> > DMA: each page has been scanned ~37 times
> > Normal: each page has been scanned ~15 times
> > HighMem: each page has been scanned ~18 times
> >
> > So if your user-space page happens to be allocated from the DMA zone,
> > it looks like it is more likely to be paged out sooner than if it was
> > allocated from another zone. And this is on a half-year-old P4 system.
>
> Well the higher relative pressure on a specific zone is a fact you have
> to live with.
Yes, and even if the DMA zone were removed, we would still have the
same issue with highmem vs. lowmem.
> Even with a global LRU you're going to suffer from the same issue once
> you've got different relative pressure on different zones.
Yep, a per-node LRU will not even out the pressure. But my main
concern is the side effect of the pressure difference rather than the
pressure difference itself.
The side effect is that the "wrong" pages may be paged out in a
per-zone LRU compared to a per-node LRU. This may or may not be a big
deal for performance.
> That's the reason for the mechanisms which attempt to avoid allocating
> from the lower precious zones (lowmem_reserve and the allocation
> fallback logic).
Exactly. But will this logic always work well? With some memory
configurations the normal zone might be smaller than the DMA zone, and
the same applies to highmem vs. the normal zone. I'm not sure, but
doesn't the size of a zone somehow relate to the memory pressure on it?
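(Just to make sure we mean the same thing by it: the way I understand
the reserve logic, it is roughly the idea in the simplified sketch
below. The names are made up and this is not the actual kernel code,
but the point is that an allocation which prefers a higher zone has to
leave extra free pages behind before it may fall back into a lower zone.)

/* Simplified illustration of the lowmem_reserve idea -- invented
 * names, not the real allocator code. */
#define FAKE_NR_ZONES	3	/* DMA, Normal, HighMem in this sketch */

struct fake_zone {
	unsigned long free_pages;
	unsigned long pages_min;	/* normal low watermark */
	/* extra reserve, indexed by the zone the caller preferred */
	unsigned long lowmem_reserve[FAKE_NR_ZONES];
};

static int fake_zone_can_allocate(struct fake_zone *z,
				  int preferred_zone_idx,
				  unsigned long nr_pages)
{
	/* the higher the preferred zone, the more this zone keeps back */
	unsigned long min = z->pages_min +
			    z->lowmem_reserve[preferred_zone_idx];

	return z->free_pages >= min + nr_pages;
}

So a highmem allocation falling back into DMA has to clear a higher
bar than a genuine DMA allocation, which is what is supposed to keep
the small zones from being eaten by allocations that could have lived
elsewhere.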
> > > > There are probably not that many drivers using the DMA zone on a
> > > > modern PC, so instead of bringing performance penalty on the entire
> > > > system I think it would be nicer to punish the evil hardware instead.
> > >
> > > Agreed - the 16MB DMA zone is silly. Would love to see it go away...
> >
> > But is the DMA zone itself evil, or just that we have one LRU per zone...?
>
> I agree that per-zone LRU complicates global page aging (you simply don't have
> global aging).
>
> But how to deal with restricted allocation requirements otherwise?
> Scanning several GB's worth of pages looking for pages in a specific
> small range can't be very promising.
I'm not sure exactly how much of the buddy allocator design the kernel
currently relies on, but I suspect that 99.9% of all allocations are
0-order, so it probably makes sense to optimize for that case.
Maybe it is possible to scrap the zones and instead use the following
(a rough sketch of the data structures and the scan follows further down):
0-order pages without restrictions (common case):
Free pages in the node are chained together and kept either on one
list (64-bit systems, or 32-bit systems without highmem) or on two
lists, one for lowmem and one for highmem. Maybe per-CPU lists should
be used on top of this too.
Other pages (>0-order, special requirements):
Each node has a bitmap in which every page belonging to the node is
represented by one bit. Each bit describes the per-page status: a
value of 0 means that the page is used/reserved, and a 1 means that
the page is either free or allocated in a way that still allows its
data to be migrated or paged out.
So a page marked as 1 may be on the 0-order free list, in use on some
LRU, or maybe even part of a migratable SLAB.
The functions in linux/bitmap.h or asm/bitops.h are then used to scan
the bitmap for contiguous pages within a certain range. This allows us
to fulfill all sorts of funky requirements, such as alignment or
"within N address bits".
The allocator should of course prefer free pages over "used but
migratable", but if no free pages exist to fulfill the requirement,
page migration is used to empty the contiguous range.
The drawback of the idea above is of course the overhead (both memory
and CPU) introduced by the bitmap. But such an allocator may be more
successful for N-order allocations than the buddy allocator, since the
pages don't have to be aligned. It will probably be even more
successful if page migration is used too.
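(For what it's worth, the memory side of that overhead should be small:
at one bit per 4 KB page the bitmap costs roughly 32 KB per GB of
memory, i.e. about 0.003%. The CPU cost of scanning is probably the
real concern.)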
And then you have a per-node LRU on top of the above. =)
> Hope to be useful comments.
Yes, very useful. Many thanks!
/ magnus