linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Joel Schopp <jschopp@austin.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: lhms <lhms-devel@lists.sourceforge.net>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org, Mel Gorman <mel@csn.ul.ie>,
	Mike Kravetz <kravetz@us.ibm.com>,
	jschopp@austin.ibm.com
Subject: [PATCH 0/9] fragmentation avoidance
Date: Mon, 26 Sep 2005 15:01:02 -0500	[thread overview]
Message-ID: <4338537E.8070603@austin.ibm.com> (raw)

The buddy system provides an efficient algorithm for managing a set of pages
within each zone. Despite the proven effectiveness of the algorithm in its
current form as used in the kernel, it is not possible to aggregate a subset
of pages within a zone according to specific allocation types. As a result,
two physically contiguous page frames (or sets of page frames) may satisfy
allocation requests that are drastically different. For example, one page
frame may contain data that is only temporarily used by an application while
the other is in use for a kernel device driver.  This can result in heavy
system fragmentation.

This series of patches is designed to reduce fragmentation in the standard
buddy allocator without impairing the performance of the allocator. High
fragmentation in the standard binary buddy allocator means that high-order
allocations can rarely be serviced. These patches work by dividing allocations
into three different types of allocations;

UserReclaimable - These are userspace pages that are easily reclaimable. Right
	now, all allocations of GFP_USER, GFP_HIGHUSER and disk buffers are
	in this category. These pages are trivially reclaimed by writing
	the page out to swap or syncing with backing storage

KernelReclaimable - These are pages allocated by the kernel that are easily
	reclaimed. This is stuff like inode caches, dcache, buffer_heads etc.
	These type of pages potentially could be reclaimed by dumping the
	caches and reaping the slabs

KernelNonReclaimable - These are pages that are allocated by the kernel that
	are not trivially reclaimed. For example, the memory allocated for a
	loaded module would be in this category. By default, allocations are
	considered to be of this type

Instead of having one global MAX_ORDER-sized array of free lists, there
are four, one for each type of allocation and another 12.5% reserve for
fallbacks. Finally, there is a list of pages of size 2^MAX_ORDER which is
a global pool of the largest pages the kernel deals with.

Once a 2^MAX_ORDER block of pages it split for a type of allocation, it is
added to the free-lists for that type, in effect reserving it. Hence, over
time, pages of the different types can be clustered together. This means that
if we wanted 2^MAX_ORDER number of pages, we could linearly scan a block of
pages allocated for UserReclaimable and page each of them out.

Fallback is used when there are no 2^MAX_ORDER pages available and there
are no free pages of the desired type. The fallback lists were chosen in a
way that keeps the most easily reclaimable pages together.

These patches originally were discussed as "Avoiding external fragmentation
with a placement policy" as authored by Mel Gorman and went through about 13
revisions on lkml and linux-mm.  Then with Mel's permission I have been
reworking these patches for easier mergability, readability, maintainability,
etc.  Several revisions have been posted on lhms-devel, as the Linux memory
hotplug community will be a major beneficiary of these patches.  All of the
various revisions have been tested on various platforms and shown to perform
well.  I believe the patches are now ready for inclusion in -mm, and after
wider testing inclusion in the mainline kernel.

The patch set consists of 9 patches that can be merged in 4 separate blocks,
with the only dependency being that the lower numbered patches are merged
first.  All are against 2.6.13.
Patch 1 defines the allocation flags and adds them to the allocator calls.
Patch 2 defines some new structures and the macros used to access them.
Patch 3-8 implement the fully functional fragmentation avoidance.
Patch 9 is trivial but useful for memory hotplug remove.
---
Patch 10 -- not ready for merging -- extends fragmentation avoidance to the
percpu allocator.  This patch works on 2.6.13-rc1 but only with NUMA off on
2.6.13; I am having a great deal of trouble tracking down why, help would be
appreciated.  I include the patch for review and test purposes as I plan to
submit it for merging after resolving the NUMA issues.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

             reply	other threads:[~2005-09-26 20:01 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-26 20:01 Joel Schopp [this message]
2005-09-26 20:03 ` [PATCH 1/9] add defrag flags Joel Schopp
2005-09-27  0:16   ` Kyle Moffett
2005-09-27  0:24     ` Dave Hansen
2005-09-27  0:43       ` Kyle Moffett
2005-09-27  5:44       ` Paul Jackson
2005-09-27 13:34         ` Mel Gorman
2005-09-27 16:26           ` [Lhms-devel] " Paul Jackson
2005-09-27 18:38         ` Joel Schopp
2005-09-27 19:30           ` Paul Jackson
2005-09-27 21:00             ` [Lhms-devel] " Joel Schopp
2005-09-27 21:23               ` Paul Jackson
2005-09-27 22:03                 ` Joel Schopp
2005-09-27 22:45                   ` Paul Jackson
2005-09-26 20:05 ` [PATCH 2/9] declare defrag structs Joel Schopp
2005-09-26 20:06 ` [PATCH 3/9] initialize defrag Joel Schopp
2005-09-26 20:09 ` [PATCH 4/9] defrag helper functions Joel Schopp
2005-09-26 22:29   ` Alex Bligh - linux-kernel
2005-09-27 16:08     ` Joel Schopp
2005-09-26 20:11 ` [PATCH 5/9] propagate defrag alloc types Joel Schopp
2005-09-26 20:13 ` [PATCH 6/9] fragmentation avoidance core Joel Schopp
2005-09-26 20:14 ` [PATCH 7/9] try harder on large allocations Joel Schopp
2005-09-27  7:21   ` Coywolf Qi Hunt
2005-09-27 16:17     ` Joel Schopp
2005-09-26 20:16 ` [PATCH 8/9] defrag fallback Joel Schopp
2005-09-26 20:17 ` [PATCH 9/9] free memory is user reclaimable Joel Schopp
2005-09-26 20:19 ` [PATCH 10/9] percpu splitout Joel Schopp
2005-09-26 21:49 ` [Lhms-devel] [PATCH 0/9] fragmentation avoidance Joel Schopp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4338537E.8070603@austin.ibm.com \
    --to=jschopp@austin.ibm.com \
    --cc=akpm@osdl.org \
    --cc=kravetz@us.ibm.com \
    --cc=lhms-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox