[PATCH 0/13] Reduce external fragmentation by grouping pages by mobility v30

From: Mel Gorman <mel@csn.ul.ie>
To: akpm@linux-foundation.org
Cc: Mel Gorman <mel@csn.ul.ie>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 0/13] Reduce external fragmentation by grouping pages by mobility v30
Date: Mon, 10 Sep 2007 12:20:11 +0100 (IST)
Message-ID: <20070910112011.3097.8438.sendpatchset@skynet.skynet.ie>

Hi Andrew,

Here is a restacked version of the grouping pages by mobility patches
based on the patches currently in your tree. It should be a drop-in
replacement for what is in 2.6.23-rc4-mm1 and is what I propose for merging
to mainline. The change from what you have already is that the redundant
patches are removed. For example, the patches that made grouping pages by
mobility configurable and later removed that ability do not exist in this
set. Similarly, the patches for grouping high-order atomic allocations together
do not exist. Also note that the first patch related to IA-64 in this set
appears unrelated, but it is required by later patches and having the change
at the start makes the patchset more comprehensible in terms of dependencies.
This rebasing work is largely the work of Andy Whitcroft. Thanks Andy.

The patches replaced in -mm are as follows:

add-a-bitmap-that-is-used-to-track-flags-affecting-a-block-of-pages.patch
split-the-free-lists-for-movable-and-unmovable-allocations.patch
choose-pages-from-the-per-cpu-list-based-on-migration-type.patch
add-a-configure-option-to-group-pages-by-mobility.patch
drain-per-cpu-lists-when-high-order-allocations-fail.patch
move-free-pages-between-lists-on-steal.patch
group-short-lived-and-reclaimable-kernel-allocations.patch
group-high-order-atomic-allocations.patch
do-not-group-pages-by-mobility-type-on-low-memory-systems.patch
bias-the-placement-of-kernel-pages-at-lower-pfns.patch
be-more-agressive-about-stealing-when-migrate_reclaimable-allocations-fallback.patch
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2.patch
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2-fix.patch
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2-fix-fix.patch
bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch
remove-page_group_by_mobility.patch
dont-group-high-order-atomic-allocations.patch
fix-calculation-in-move_freepages_block-for-counting-pages.patch
do-not-depend-on-max_order-when-grouping-pages-by-mobility.patch
print-out-statistics-in-relation-to-fragmentation-avoidance-to-proc-pagetypeinfo.patch

Note that the patch
breakout-page_order-to-internalh-to-avoid-special-knowledge-of-the-buddy-allocator.patch
is not in the list and remains in -mm as part of page-owner tracking. In
the series file, the breakout patch is placed after this new patchset.

To recap:

The objective of this patchset is to keep the system in a state where actions
such as page reclaim or memory compaction will reduce external fragmentation
in the system. It works by grouping pages of similar mobility together in
PAGEBLOCK_NR_PAGES areas. The types of mobility are

UNMOVABLE - Pages that cannot be trivially reclaimed or moved
MOVABLE - Pages that can be moved using the page migration mechanism
RECLAIMABLE - Pages that the kernel can often directly reclaim such as
	those used for inode caches
RESERVE - The areas where min_free_kbytes-related pages should be stored
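
As an illustration only (this is not code from the patches), the choice of
mobility type for an allocation can be sketched in C as below. The flag bits
and the sketch_ names are hypothetical stand-ins for the real __GFP_* flags;
the MIGRATE_ enum values are assumed for the example.

```c
#include <assert.h>

/* Mobility types as listed above; the numeric values are illustrative. */
enum migrate_type {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_TYPES
};

/* Hypothetical flag bits standing in for the real __GFP_* values. */
#define SKETCH_GFP_RECLAIMABLE 0x1u
#define SKETCH_GFP_MOVABLE     0x2u

/* Map allocation flags to the mobility type whose free lists are tried first. */
static enum migrate_type sketch_flags_to_type(unsigned int flags)
{
	/* In this sketch a caller may not claim both properties at once. */
	assert((flags & (SKETCH_GFP_MOVABLE | SKETCH_GFP_RECLAIMABLE)) !=
	       (SKETCH_GFP_MOVABLE | SKETCH_GFP_RECLAIMABLE));

	if (flags & SKETCH_GFP_MOVABLE)
		return MIGRATE_MOVABLE;
	if (flags & SKETCH_GFP_RECLAIMABLE)
		return MIGRATE_RECLAIMABLE;

	/* Anything not flagged otherwise is treated as unmovable. */
	return MIGRATE_UNMOVABLE;
}
```

In this sketch RESERVE is never selected from the flags; it describes where
the min_free_kbytes pages are kept rather than a kind of caller.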

Instead of having one MAX_ORDER-sized array of free lists in struct free_area,
there is one for each type of mobility.  Once a 2^pageblock_order (typically
the size of the system large page) area of pages is split for a type of
allocation, the remaining unused portion is placed on the free-lists for
that type prioritising its use for compatible mobility allocations.  Hence,
over time, pages of the different types can be clustered together.
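
A minimal C sketch of the per-type free lists, assuming a simplified model
that only counts free blocks instead of linking struct pages. The sketch_
names and the value of SKETCH_MAX_ORDER are hypothetical and are not taken
from the patches.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER 11	/* assumed value, for illustration only */

enum { UNMOVABLE, RECLAIMABLE, MOVABLE, RESERVE, NR_TYPES };

/* One free-block count per mobility type, rather than one per order. */
struct sketch_free_area {
	unsigned long nr_free[NR_TYPES];
};

static struct sketch_free_area sketch_area[SKETCH_MAX_ORDER];

/*
 * Take one free block of order `high` from the lists of `type` and split it
 * down to order `low`. Each unused half goes back on the free lists of the
 * same type, so later compatible allocations are served from the same area.
 */
static void sketch_expand(int low, int high, int type)
{
	assert(low >= 0 && low <= high && high < SKETCH_MAX_ORDER);
	assert(type >= 0 && type < NR_TYPES);
	assert(sketch_area[high].nr_free[type] > 0);

	sketch_area[high].nr_free[type]--;
	while (high > low) {
		high--;
		/* The unused buddy of this split stays with the same type. */
		sketch_area[high].nr_free[type]++;
	}
}
```

For example, splitting a single order-9 MOVABLE block for an order-0 request
leaves one free MOVABLE block at each of orders 0 to 8 and nothing on the
lists of the other types.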

When the preferred freelists are exhausted, the largest possible block is
taken from an alternative list. Again, the unused portion is placed on the
free lists of the preferred allocation-type.
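
The fallback behaviour can be sketched as follows, again using a counting
model rather than real page lists. The fallback order in the table is an
assumption made for the example and the sketch_ names are hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER 11	/* assumed value, for illustration only */

enum { UNMOVABLE, RECLAIMABLE, MOVABLE, NR_TYPES };

static unsigned long sketch_nr_free[SKETCH_MAX_ORDER][NR_TYPES];

/* Assumed order in which the other types are tried on fallback. */
static const int sketch_fallbacks[NR_TYPES][NR_TYPES - 1] = {
	[UNMOVABLE]   = { RECLAIMABLE, MOVABLE },
	[RECLAIMABLE] = { UNMOVABLE, MOVABLE },
	[MOVABLE]     = { RECLAIMABLE, UNMOVABLE },
};

/* Put the unused halves of a split block on the lists of `type`. */
static void sketch_return_remainder(int from, int to, int type)
{
	while (from > to) {
		from--;
		sketch_nr_free[from][type]++;
	}
}

/* Returns the order of the block that was split, or -1 if none was free. */
static int sketch_alloc(int order, int type)
{
	int o, i;

	assert(order >= 0 && order < SKETCH_MAX_ORDER);
	assert(type >= 0 && type < NR_TYPES);

	/* Preferred lists first: the smallest block that fits. */
	for (o = order; o < SKETCH_MAX_ORDER; o++) {
		if (sketch_nr_free[o][type]) {
			sketch_nr_free[o][type]--;
			sketch_return_remainder(o, order, type);
			return o;
		}
	}

	/* Fallback: the largest block available on an alternative list. */
	for (o = SKETCH_MAX_ORDER - 1; o >= order; o--) {
		for (i = 0; i < NR_TYPES - 1; i++) {
			int other = sketch_fallbacks[type][i];

			if (sketch_nr_free[o][other]) {
				sketch_nr_free[o][other]--;
				/* The remainder moves to the preferred type. */
				sketch_return_remainder(o, order, type);
				return o;
			}
		}
	}
	return -1;
}
```

Taking the largest block on fallback means the next allocations of the same
type are served from the remainder of that block instead of breaking into
further blocks owned by another type.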

This grouping clearly requires additional work in the page allocator.
kernbench shows effectively no performance difference, varying between -0.2%
and +1% on a variety of test machines.  Success rates for huge page allocation
are dramatically increased.  For example, on a ppc64 machine, the vanilla
kernel was only able to allocate 1% of memory as hugepages and this was
due to a single hugepage reserved as min_free_kbytes.  With these patches
applied, 40% was allocatable as hugepages.

These patches work in conjunction with the ZONE_MOVABLE patches that were
merged for 2.6.23-rc1, particularly the allocations that have already been
flagged as __GFP_MOVABLE.

Changelog Since V29
o Remove redundant patches
o Keep min_free_pages contiguous as much as possible
o Aggressively group RECLAIMABLE pages together
o Bug fixes that were applied during the time in -mm

Changelog Since V28
o Group high-order atomic allocations together
o It is no longer required to set min_free_kbytes to 10% of memory. A value
  of 16384 in most cases will be sufficient
o Now applied with zone-based anti-fragmentation
o Fix incorrect VM_BUG_ON within buffered_rmqueue()
o Reorder the stack so later patches do not back out work from earlier patches
o Fix bug where journal pages were being treated as movable
o Bias placement of non-movable pages to lower PFNs
o More aggressive clustering of reclaimable pages in reaction to workloads
  like updatedb that flood the size of inode caches

Changelog Since V27

o Renamed anti-fragmentation to Page Clustering. Anti-fragmentation was giving
  the mistaken impression that it was the 100% solution for high order
  allocations. Instead, it greatly increases the chances high-order
  allocations will succeed and lays the foundation for defragmentation and
  memory hot-remove to work properly
o Redefine page groupings based on ability to migrate or reclaim instead of
  basing on reclaimability alone
o Get rid of spurious inits
o Per-cpu lists are no longer split up per-type. Instead the per-cpu list is
  searched for a page of the appropriate type
o Added more explanation commentary
o Fix up bug in pageblock code where bitmap was used before being initialised

Changelog Since V26
o Fix double init of lists in setup_pageset

Changelog Since V25
o Fix loop order of for_each_rclmtype_order so that order of loop matches args
o gfpflags_to_rclmtype uses gfp_t instead of unsigned long
o Rename get_pageblock_type() to get_page_rclmtype()
o Fix alignment problem in move_freepages()
o Add mechanism for assigning flags to blocks of pages instead of page->flags
o On fallback, do not examine the preferred list of free pages a second time

Following this email are 13 patches that implement the page grouping feature.
These apply to mainline but can also act as a drop-in replacement for the
patches that are in -mm.

The first patch changes how IA-64 parses the hugepagesz parameter so that
it occurs before memory initialisation. The second patch adds a bitmap that
stores flags per PAGEBLOCK_NR_PAGES block in the system. The third patch
is a fix to the pageblock flags patch that is kept as a separate patch
because it was developed by Bob Picco.
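
A rough C sketch of a per-block flags bitmap, assuming 3 bits per block and
1024 pages per block. Both values and all sketch_ names are hypothetical; the
layout used by the actual patch may differ.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGEBLOCK_ORDER 10	/* assumed: 1024 pages per block */
#define SKETCH_BITS_PER_BLOCK  3	/* assumed: enough for a mobility type */
#define SKETCH_NR_BLOCKS       64
#define SKETCH_BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NR_BITS         (SKETCH_NR_BLOCKS * SKETCH_BITS_PER_BLOCK)

/* Flags live in one bitmap per zone-like area instead of in page->flags. */
static unsigned long sketch_bitmap[(SKETCH_NR_BITS + SKETCH_BITS_PER_LONG - 1) /
				   SKETCH_BITS_PER_LONG];

/* Index of the first bit that belongs to the block containing pfn. */
static unsigned long sketch_first_bit(unsigned long pfn)
{
	unsigned long block = pfn >> SKETCH_PAGEBLOCK_ORDER;

	assert(block < SKETCH_NR_BLOCKS);
	return block * SKETCH_BITS_PER_BLOCK;
}

/* Store `flags` for the whole block that contains pfn. */
static void sketch_set_block_flags(unsigned long pfn, unsigned int flags)
{
	unsigned long first = sketch_first_bit(pfn);
	unsigned int i;

	assert(flags < (1u << SKETCH_BITS_PER_BLOCK));
	for (i = 0; i < SKETCH_BITS_PER_BLOCK; i++) {
		unsigned long bit = first + i;
		unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_LONG);

		if (flags & (1u << i))
			sketch_bitmap[bit / SKETCH_BITS_PER_LONG] |= mask;
		else
			sketch_bitmap[bit / SKETCH_BITS_PER_LONG] &= ~mask;
	}
}

/* Read back the flags of the block that contains pfn. */
static unsigned int sketch_get_block_flags(unsigned long pfn)
{
	unsigned long first = sketch_first_bit(pfn);
	unsigned int i, flags = 0;

	for (i = 0; i < SKETCH_BITS_PER_BLOCK; i++) {
		unsigned long bit = first + i;

		if (sketch_bitmap[bit / SKETCH_BITS_PER_LONG] &
		    (1UL << (bit % SKETCH_BITS_PER_LONG)))
			flags |= 1u << i;
	}
	return flags;
}
```

Every page in a block shares the same flags, so looking up any PFN inside the
block returns the value that was stored for it.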

The fourth patch splits the free lists between movable and all other
allocations.  Following that is a patch that deals with per-cpu pages so that
the free-lists are not contaminated by pages of the wrong mobility type.
Next is a patch to group temporary and reclaimable pages together in the
same areas and the last functionality patch drains the per-cpu lists when
a high-order allocation fails.
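
The per-cpu handling can be pictured with the following C sketch, which
searches a single list for a page of the wanted type. The list layout and the
sketch_ names are hypothetical simplifications, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a page on a per-cpu list. */
struct sketch_page {
	int type;			/* mobility type the page was freed as */
	struct sketch_page *next;
};

/*
 * Remove and return the first page of the requested type from the list.
 * Returns NULL when no such page is cached, in which case a caller would
 * refill from the main free lists of that type.
 */
static struct sketch_page *sketch_pcp_take(struct sketch_page **head, int type)
{
	struct sketch_page **link;

	assert(head != NULL);
	for (link = head; *link != NULL; link = &(*link)->next) {
		if ((*link)->type == type) {
			struct sketch_page *page = *link;

			*link = page->next;	/* unlink from the list */
			page->next = NULL;
			return page;
		}
	}
	return NULL;
}
```

Pages of other types that are skipped stay on the list in their original
order, so they remain available to later requests of their own type.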

The remaining patches in the set deal with controlling the situations that
can lead to external fragmentation later. They include biasing the location of
unmovable pages to the lower PFNs and being more aggressive about clustering
reclaimable pages together rather than letting them get scattered throughout
the address space, as would happen during such activities as updatedb.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


Thread overview: 23+ messages
2007-09-10 11:20 Mel Gorman [this message]
2007-09-10 11:20 ` [PATCH 1/13] ia64: parse kernel parameter hugepagesz= in early boot Mel Gorman
2007-09-10 11:20 ` [PATCH 2/13] Add a bitmap that is used to track flags affecting a block of pages Mel Gorman
2007-09-10 11:21 ` [PATCH 3/13] Fix corruption of memmap on ia64-sparsemem when mem_section is not a power of 2 Mel Gorman
2007-09-10 11:21 ` [PATCH 4/13] Split the free lists for movable and unmovable allocations Mel Gorman
2007-09-10 11:21 ` [PATCH 5/13] Choose pages from the per cpu list-based on migration type Mel Gorman
2009-07-13 19:16   ` [PATCH 5/13] " Andrew Morton
2009-07-14  9:14     ` Mel Gorman
2007-09-10 11:22 ` [PATCH 6/13] Group short-lived and reclaimable kernel allocations Mel Gorman
2007-09-10 19:44   ` Paul Jackson
2007-09-10 21:15     ` Mel Gorman
2007-09-10 11:22 ` [PATCH 7/13] Drain per-cpu lists when high-order allocations fail Mel Gorman
2007-09-10 15:05   ` [PATCH 7/13] " Nick Piggin
2007-09-11  9:34     ` Mel Gorman
2007-09-10 11:22 ` [PATCH 8/13] Move free pages between lists on steal Mel Gorman
2007-09-10 11:23 ` [PATCH 9/13] Do not group pages by mobility type on low memory systems Mel Gorman
2007-09-10 11:23 ` [PATCH 10/13] Bias the location of pages freed for min_free_kbytes in the same pageblock_nr_pages areas Mel Gorman
2007-09-10 11:23 ` [PATCH 11/13] Bias the placement of kernel pages at lower pfns Mel Gorman
2007-09-10 11:24 ` [PATCH 12/13] Be more agressive about stealing when MIGRATE_RECLAIMABLE allocations fallback Mel Gorman
2007-09-10 11:24 ` [PATCH 13/13] Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo Mel Gorman
2007-09-14  1:01 ` [PATCH 0/13] Reduce external fragmentation by grouping pages by mobility v30 Andrew Morton
2007-09-14 14:33   ` Mel Gorman
2007-09-16 10:34     ` Andrew Morton
