From: Mel Gorman <mel@csn.ul.ie>
To: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>, Andrew Morton <akpm@osdl.org>,
Nick Piggin <nickpiggin@yahoo.com.au>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: Page allocator: Single Zone optimizations
Date: Fri, 3 Nov 2006 19:06:09 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0611031825420.25219@skynet.skynet.ie>
In-Reply-To: <Pine.LNX.4.64.0611030952530.14741@schroedinger.engr.sgi.com>
On Fri, 3 Nov 2006, Christoph Lameter wrote:
> On Fri, 3 Nov 2006, Mel Gorman wrote:
>
>> I know, this sort of thing would have to be written into page migration before
>> defrag for high-order allocations was developed. Even then, defrag needs to
>> sit on top of something like anti-frag to get the clustering of movable pages.
>
> Hmmm... The disk defraggers are capable of defragmenting around pinned
> blocks and this seems to be a similar situation.
Not similar enough. Disk defragmentation aims at having files as
contiguous as possible on the filesystem. If they are not contiguous, it
doesn't matter to functionality but performance degrades slightly.
For the allocation of hugepages, the physical pages must be contiguous and
they must be aligned. If there is one unmovable or unreclaimable page in
there, that block is unusable for a hugepage. We can defragment around it
all right, but the resulting block is still not usable. It's not the same
as disk defragmentation.
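To make that concrete, here is roughly the test a MAX_ORDER-aligned block
has to pass before it is any use as a hugepage. This is an illustrative
sketch only, not code from any patch; it ignores zone boundaries, memory
holes and locking, and the free-page test is a crude approximation:

    static int block_suitable_for_hugepage(unsigned long start_pfn)
    {
            unsigned long pfn;

            for (pfn = start_pfn; pfn < start_pfn + MAX_ORDER_NR_PAGES; pfn++) {
                    struct page *page = pfn_to_page(pfn);

                    /* A free page costs us nothing, skip it (crude test) */
                    if (page_count(page) == 0)
                            continue;

                    /* LRU pages can at least in principle be reclaimed or
                     * migrated out of the block */
                    if (PageLRU(page))
                            continue;

                    /* Anything else (slab, page tables, module data, ...)
                     * pins the whole block */
                    return 0;
            }
            return 1;
    }

One pinned page anywhere in that loop and the whole aligned block is lost,
no matter how tidy the rest of it is, which is the property disk
defragmentation never has to care about.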
Defragmentation on its own is not enough. The clustering based on
reclaimability/movability is still required and that is what anti-frag
provides.
> This only works if the number of
> unmovable objects is small compared to the movable objects; otherwise we
> may need this sorting. For other reasons discussed before (memory unplug,
> node unplug) I think it would be necessary to have this separation
> between movable and unmovable pages.
>
If there is just one unmovable page in every MAX_ORDER_NR_PAGES block in the
system, you can defragment as much as you like and hugepage allocations will
still fail. The same applies to hot unplug.
> I can add a migrate_page_table_page() function? The migrate_pages()
> function is only capable of migrating user space pages since it relies on
> being able to take pages off the LRU. At some point we need to
> distinguish the type of page and call the appropriate migration function
> for the various page types.
>
If such a function existed, then page table pages could be placed beside
"reclaimable" pages and the block could be migrated. However, the
clustering would still be needed, be it based on reclaimability or
movability (which in many cases is the same thing).
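For what it's worth, a very rough sketch of what such a helper would have to
do. This is not code from any patch; the helpers marked hypothetical do not
exist, and locking and error handling are waved away:

    int migrate_page_table_page(struct page *newpage, struct page *oldpage)
    {
            /* The hard part: there is no rmap for page table pages today,
             * so finding which mm the old page serves needs new
             * infrastructure. */
            struct mm_struct *mm = page_table_owner(oldpage);  /* hypothetical */

            spin_lock(&mm->page_table_lock);

            /* Copy the PTE entries across to the new page */
            copy_page(page_address(newpage), page_address(oldpage));

            /* Repoint the parent pmd entry at the new page, then flush */
            repoint_pmd_entry(mm, oldpage, newpage);           /* hypothetical */
            flush_tlb_mm(mm);

            spin_unlock(&mm->page_table_lock);

            __free_page(oldpage);
            return 0;
    }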
> int migrate_page_table_page(struct page *new, struct page *old);
> ?
>
>> Reclaimable - These are kernel allocations for caches that are
>> reclaimable or allocations that are known to be very short-lived.
>> These allocations are marked __GFP_RECLAIMABLE
>
> For now this would include reclaimable slabs?
It could, but currently I don't. At the moment, only network buffers, inode
caches, buffer heads and dentries are marked like this.
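As an illustration of what that marking looks like at a call site (the flag
name follows the description above; the allocation itself is made up for the
example):

    static void *alloc_reclaimable_buffer(void)
    {
            struct page *page;

            /* Tell the allocator to group this page with other reclaimable
             * allocations so the surrounding MAX_ORDER block stays
             * recoverable later. */
            page = alloc_pages(GFP_KERNEL | __GFP_RECLAIMABLE, 0);
            if (!page)
                    return NULL;

            return page_address(page);
    }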
> They are reclaimable with a
> huge effort and there may be pinned objects that we cannot move. Isn't this
> more another case of unmovable?
Probably, they would currently be treated as unmovable.
> Or can we tolerate the objects that cannot
> be moved and classify this as movable (with the understanding that we may
> have to do expensive slab reclaim (up to dropping all reclaimable slabs)
> in order to get there).
>
There is nothing stopping such a marking taking place, but I wouldn't do it
if I thought that reclaiming or moving them was that expensive.
>> Non-Movable - These are pages that are allocated by the kernel that
>> are not trivially reclaimed. For example, the memory allocated for a
>> loaded module would be in this category. By default, allocations are
>> considered to be of this type
>> These are allocations that are not marked otherwise
>
> Ok.
>
> Note that memory for a loaded module is allocated via vmalloc, mapped via
> a page table (init_mm) and thus memory is remappable. We will likely be
> able to move those.
>
It's not just a case of updating init_mm. You would also need to tear down
the vmalloc area in the page tables of every currently running process in
the system in case they had faulted within that module. That would be pretty
entertaining.
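The reason is the lazy syncing of vmalloc mappings. A simplified sketch of
the idea behind the i386 vmalloc_fault() path shows where those per-process
copies come from (illustrative only, not patch code):

    static void sync_vmalloc_entry(unsigned long address)
    {
            /* On the first touch of a vmalloc address, the process's own
             * page tables lack the entry; the fault path copies it from
             * init_mm. After that the process holds its own copy, which is
             * what would have to be hunted down and updated if the module
             * pages behind the mapping were ever moved. */
            pgd_t *pgd   = pgd_offset(current->mm, address);
            pgd_t *pgd_k = pgd_offset_k(address);   /* init_mm's entry */

            if (!pgd_present(*pgd))
                    set_pgd(pgd, *pgd_k);
    }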
>> So, right now, page tables would not be marked __GFP_MOVABLE, but they would
>> be later when defrag was developed. Would that be any better?
>
> Isn't this still doing reclaim instead of defragmentation?
Not necessarily reclaim. Currently we reclaim. Under memory pressure, we
may still reclaim. However, if there was enough free memory (due to
min_free_kbytes being set to a higher value, for example), then we could
migrate instead of reclaiming to satisfy a high-order allocation. The page
migration infrastructure is already there, so it's clearly possible.
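A minimal sketch of that decision, with made-up helper names purely to show
the shape of it (not code from any patch):

    static int fill_high_order_block(struct zone *zone, unsigned long block_pfn,
                                     int order)
    {
            /* If free memory is comfortably above the watermarks, empty an
             * almost-free block by migrating its movable pages elsewhere;
             * otherwise fall back to reclaim as we do today. */
            if (zone->free_pages > zone->pages_high + (1UL << order))
                    return migrate_movable_pages_out_of(zone, block_pfn); /* hypothetical */

            return reclaim_pages_around(zone, block_pfn);                 /* hypothetical */
    }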
Once again, I am not averse to writing such a defragmentation mechanism, but
I see anti-frag, as it currently stands, as a prerequisite for a
defragmentation mechanism having a decent success rate.
> Maybe it
> will work but I am not sure about the performance impact. We
> would have to read pages back in from swap or disk?
>
> The problem that we have is that one cannot allocate higher order pages since
> memory is fragmented. Maybe what would initially be sufficient is that a
> failing allocation of a higher order page leads to defrag occurring until
> pages of sufficient size have been created and then the allocation can
> be satisfied.
>
Defragmentation on its own would be insufficient for hugepage allocations
because of the unmovable pages dotted around the system. We know this because
if you reclaim everything possible in the system, you are still unlikely
to be able to grow the hugepage pool. If reclaiming everything doesn't
give you huge pages, shuffling the same pages around the system won't
improve the situation either.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab