[PATCH 0/4] Reducing fragmentation using lists (sub-zones) v22

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] Reducing fragmentation using lists (sub-zones) v22
@ 2006-01-20 11:54 Mel Gorman
  2006-01-20 11:54 ` [PATCH 1/4] Add __GFP_EASYRCLM flag and update callers Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Mel Gorman @ 2006-01-20 11:54 UTC (permalink / raw)
  To: linux-mm; +Cc: jschopp, Mel Gorman, linux-kernel, kamezawa.hiroyu, lhms-devel

This is a rebase of the list-based anti-fragmentation approach to act as
a comparison to the zone-based approach. To recall, this version does *not*
depend on a usemap bitfield and instead uses a page flag to track where a page
should be freed to. Fundamentally, the list-based approach works by having
two free lists for each order - one for kernel pages and the other for user
pages. The per-cpu allocator is also split between kernel and user pages.

In total, it adds 146 lines of code, can be fully disabled via a compile-time
option without any performance penalty and when enabled, it was able allocate
about 70% of physical memory in 4MiB contiguous chunks after a heavy set of
benchmarks completed.

This gives best-effort for low fragmentation in all zones. Fully guaranteed
reclaim for hotplug and high order allocation in a zone would require a
zone. Full benchmarks are below the Changelog, but here is the summary

			clean		anti-frag
Kernel extract		14		14		(no difference)
Kernel build		741		741		(no difference)
aim9-page_test		131762.75	134311.90	+2549.15  (1.93%)
aim9-brk_test		586206.90	597167.14	+10960.24 (1.87%)
HugePages under load	60		86		+26     (+43%)
HugePages after tests	141		242		+101	(+71%)

The majorify of the aim9 results measured are within 1% of the standard
allocator. Kernel compiles vary sometimes by about 1-2 seconds or about
0.1%. The HugeTLB pages under load was 7 simulatanous kernel compiles,
each -j1.

When anti-frag is disabled via the config option, it behaves and performs just
like the standard allocator with no performance impact I can measure. When it
is enabled, there are definite variances but they are small and on loads like
kernel builds, it makes less than 3 seconds of a difference over 12 minutes.

Diffstat for full set of patches
 fs/buffer.c                |    3 
 fs/compat.c                |    2 
 fs/exec.c                  |    2 
 fs/inode.c                 |    2 
 include/asm-i386/page.h    |    3 
 include/linux/gfp.h        |    3 
 include/linux/highmem.h    |    3 
 include/linux/mmzone.h     |   28 ++++++-
 include/linux/page-flags.h |    7 +
 init/Kconfig               |   12 +++
 mm/memory.c                |    6 +
 mm/page_alloc.c            |  180 ++++++++++++++++++++++++++++++++++-----------
 mm/shmem.c                 |    4 -
 mm/swap_state.c            |    3 
 14 files changed, 202 insertions(+), 56 deletions(-)

Changelog since v21
o Update to 2.6.16-rc1-mm1

Changelog since v20
o Update to 2.6.15-rc1-mm2
o Remove usemap to reduce cache footprint, also does not hurt hotplug add
o Extend free_area not use new free_area lists, reduces footprint and is faster
o Leverage existing support for per-cpu page drain to avoid free but pinned
  pages
o Small number of micro-optimisations

Changelog since complex anti-fragmentation v19
o Updated to 2.6.15-rc1-mm2
o Removed the fallback area and balancing code
o Only differentiate between kernel and easy reclaimable allocations
  - Removes almost all the code that deals with usemaps
  - Made a number of simplifications based on two allocation lists
  - Fallback code is drastically simpler
o Do not change behaviour for high-order allocations
o Drop stats patch - unnecessary complications

Changelog since v18
o Resync against 2.6.14-rc5-mm1
o 004_markfree dropped
o Documentation note added on the behavior of free_area.nr_free

Changelog since v17
o Update to 2.6.14-rc4-mm1
o Remove explicit casts where implicit casts were in place
o Change __GFP_USER to __GFP_EASYRCLM, RCLM_USER to RCLM_EASY and PCPU_USER to
  PCPU_EASY
o Print a warning and return NULL if both RCLM flags are set in the GFP flags
o Reduce size of fallback_allocs
o Change magic number 64 to FREE_AREA_USEMAP_SIZE
o CodingStyle regressions cleanup
o Move sparsemen setup_usemap() out of header
o Changed fallback_balance to a mechanism that depended on zone->present_pages
  to avoid hotplug problems later
o Many superfluous parenthesis removed

Changlog since v16
o Variables using bit operations now are unsigned long. Note that when used
  as indices, they are integers and cast to unsigned long when necessary.
  This is because aim9 shows regressions when used as unsigned longs 
  throughout (~10% slowdown)
o 004_showfree added to provide more debugging information
o 008_stats dropped. Even with CONFIG_ALLOCSTATS disabled, it is causing 
  severe performance regressions. No explanation as to why
o for_each_rclmtype_order moved to header
o More coding style cleanups

Changelog since V14 (V15 not released)
o Update against 2.6.14-rc3
o Resync with Joel's work. All suggestions made on fix-ups to his last
  set of patches should also be in here. e.g. __GFP_USER is still __GFP_USER
  but is better commented.
o Large amount of CodingStyle, readability cleanups and corrections pointed
  out by Dave Hansen.
o Fix CONFIG_NUMA error that corrupted per-cpu lists
o Patches broken out to have one-feature-per-patch rather than
  more-code-per-patch
o Fix fallback bug where pages for RCLM_NORCLM end up on random other
  free lists.

Changelog since V13
o Patches are now broken out
o Added per-cpu draining of userrclm pages
o Brought the patch more in line with memory hotplug work
o Fine-grained use of the __GFP_USER and __GFP_KERNRCLM flags
o Many coding-style corrections
o Many whitespace-damage corrections

Changelog since V12
o Minor whitespace damage fixed as pointed by Joel Schopp

Changelog since V11
o Mainly a redefiff against 2.6.12-rc5
o Use #defines for indexing into pcpu lists
o Fix rounding error in the size of usemap

Changelog since V10
o All allocation types now use per-cpu caches like the standard allocator
o Removed all the additional buddy allocator statistic code
o Elimated three zone fields that can be lived without
o Simplified some loops
o Removed many unnecessary calculations

Changelog since V9
o Tightened what pools are used for fallbacks, less likely to fragment
o Many micro-optimisations to have the same performance as the standard 
  allocator. Modified allocator now faster than standard allocator using
  gcc 3.3.5
o Add counter for splits/coalescing

Changelog since V8
o rmqueue_bulk() allocates pages in large blocks and breaks it up into the
  requested size. Reduces the number of calls to __rmqueue()
o Beancounters are now a configurable option under "Kernel Hacking"
o Broke out some code into inline functions to be more Hotplug-friendly
o Increased the size of reserve for fallbacks from 10% to 12.5%. 

Changelog since V7
o Updated to 2.6.11-rc4
o Lots of cleanups, mainly related to beancounters
o Fixed up a miscalculation in the bitmap size as pointed out by Mike Kravetz
  (thanks Mike)
o Introduced a 10% reserve for fallbacks. Drastically reduces the number of
  kernnorclm allocations that go to the wrong places
o Don't trigger OOM when large allocations are involved

Changelog since V6
o Updated to 2.6.11-rc2
o Minor change to allow prezeroing to be a cleaner looking patch

Changelog since V5
o Fixed up gcc-2.95 errors
o Fixed up whitespace damage

Changelog since V4
o No changes. Applies cleanly against 2.6.11-rc1 and 2.6.11-rc1-bk6. Applies
  with offsets to 2.6.11-rc1-mm1

Changelog since V3
o inlined get_pageblock_type() and set_pageblock_type()
o set_pageblock_type() now takes a zone parameter to avoid a call to page_zone()
o When taking from the global pool, do not scan all the low-order lists

Changelog since V2
o Do not to interfere with the "min" decay
o Update the __GFP_BITS_SHIFT properly. Old value broke fsync and probably
  anything to do with asynchronous IO

Changelog since V1
o Update patch to 2.6.11-rc1
o Cleaned up bug where memory was wasted on a large bitmap
o Remove code that needed the binary buddy bitmaps
o Update flags to avoid colliding with __GFP_ZERO changes
o Extended fallback_count bean counters to show the fallback count for each
  allocation type
o In-code documentation

Version 1
o Initial release against 2.6.9

This patch is designed to reduce fragmentation in the standard buddy allocator
without impairing the performance of the allocator. High fragmentation in the
standard binary buddy allocator means that high-order allocations can rarely be
serviced which impacts HugeTLB allocations and the hot-removal of memory. This
patch works by dividing allocations into two different types of allocations;

EasyReclaimable - These are userspace pages that are easily reclaimable. This
	flag is set when it is known that the pages will be trivially reclaimed
	by writing the page out to swap or syncing with backing storage

KernelNonReclaimable - These are pages that are allocated by the kernel that
	are not trivially reclaimed. For example, the memory allocated for a
	loaded module would be in this category. By default, allocations are
	considered to be of this type

Instead of having one MAX_ORDER-sized array of free lists in struct free_area,
there are two, one for each type of allocation. Once a 2^MAX_ORDER block of
pages is split for a type of allocation, it is added to the free-lists for
that type, in effect reserving it. Hence, over time, pages of the different
types can be clustered together. When a page is allocated, the page-flags
are updated with a value indicating it's type of page so that it is placed
on the correct list on free.

When the preferred freelists are expired, the largest possible block is taken
from the alternative list. Buddies that are split from that large block are
placed on the preferred allocation-type freelists to mitigate fragmentation.

Three benchmark results are included all based on a 2.6.16-rc1-mm1 kernel
compiled with gcc 3.4. These benchmarks were run in the order you see them
*without* rebooting. This means that when the final highorder stress test,
the system is already running with any fragmentation introduced by other
benchmarks.

The first test called bench-kbuild.sh times a kernel build. Time is in seconds

                               2.6.16-rc1-mm1-clean  2.6.16-rc1-mm1-mbuddy-v22
Time taken to extract kernel:                    14                         15
Time taken to build kernel:                     741                        741

The second is the output of portions of AIM9 for the vanilla
allocator and the list-based anti-fragmentation one;

(Tests run with bench-aim9.sh from VMRegress 0.20)
                 2.6.16-rc1-mm1-clean  2.6.16-rc1-mm1-mbuddy-v22
 1 creat-clo                 12273.11                   12239.80     -33.31 -0.27% File Creations and Closes/second
 2 page_test                131762.75                  134311.90    2549.15 1.93% System Allocations & Pages/second
 3 brk_test                 586206.90                  597167.14   10960.24 1.87% System Memory Allocations/second
 4 jmp_test                4375520.75                 4373004.50   -2516.25 -0.06% Non-local gotos/second
 5 signal_test               79436.76                   77307.56   -2129.20 -2.68% Signal Traps/second
 6 exec_test                    62.90                      62.93       0.03 0.05% Program Loads/second
 7 fork_test                  1211.92                    1218.13       6.21 0.51% Task Creations/second
 8 link_test                  4332.30                    4324.56      -7.74 -0.18% Link/Unlink Pairs/second

The last test is to show that the allocator can satisfy more high-order
allocations, especially under load, than the standard allocator. The test
performs the following;

1. Start updatedb running in the background
2. Load kernel modules that tries to allocate high-order blocks on demand
3. Clean a kernel tree
4. Make 6 copies of the tree. As each copy finishes, a compile starts at -j1
5. Start compiling the primary tree
6. Sleep 1 minute while the 7 trees are being compiled
7. Use the kernel module to attempt 160 times to allocate a 2^10 block of pages
    - note, it only attempts 275 times, no matter how often it succeeds
    - An allocation is attempted every 1/10th of a second
    - Performance will get badly shot as it forces considerable amounts of
      pageout
8. At rest, dd a file from /dev/zero that is the size of physical memory, cat
   it and delete it again to flush as much of buffer cache as possible. Then
   try and allocate as many pages as possible

The result of the allocations under load;

HighAlloc Under Load Test Results Pass 1
                           2.6.16-rc1-mm1-clean  2.6.16-rc1-mm1-mbuddy-v22 
Order                                        10                         10 
Allocation type                         HighMem                    HighMem 
Attempted allocations                       275                        275 
Success allocs                               60                         86 
Failed allocs                               215                        189 
DMA zone allocs                               1                          1 
Normal zone allocs                            5                          0 
HighMem zone allocs                          54                         85 
EasyRclm zone allocs                          0                          0 
% Success                                    21                         31 

HighAlloc Under Load Test Results Pass 2
                           2.6.16-rc1-mm1-clean  2.6.16-rc1-mm1-mbuddy-v22 
Order                                        10                         10 
Allocation type                         HighMem                    HighMem 
Attempted allocations                       275                        275 
Success allocs                              101                        103 
Failed allocs                               174                        172 
DMA zone allocs                               1                          1 
Normal zone allocs                            5                          0 
HighMem zone allocs                          95                        102 
EasyRclm zone allocs                          0                          0 
% Success                                    36                         37 

HighAlloc Test Results while Rested
                           2.6.16-rc1-mm1-clean  2.6.16-rc1-mm1-mbuddy-v22 
Order                                        10                         10 
Allocation type                         HighMem                    HighMem 
Attempted allocations                       275                        275 
Success allocs                              141                        242 
Failed allocs                               134                         33 
DMA zone allocs                               1                          1 
Normal zone allocs                           16                         83 
HighMem zone allocs                         124                        158 
EasyRclm zone allocs                          0                          0 
% Success                                    51                         88 

Under load, it appears there is not a large differences in the percentage of
success but list-based anti-fragmentation was able to allocate 31 more HugeTLB
pages which is a significant improvement. The "Pass 2" results look similar
until you take into account that the -clean kernel is configured with PTEs
allocated from low memory. When CONFIG_HIGHPTE is set, it's success rate is
lower. Higher success rates with list-based depend heavily on the decisions
made by LRU. 

A second indicator of how well fragmentation is addressed is testing again
after the load decreases.  After the tests completed, the standard allocator
was able to allocate 141 order-10 pages and list-based anti-fragmentation
allocated 242, a massive improvement.

The results show that the modified allocator has comparable speed, but
is far less fragmented and in a better position to satisfy high-order
allocations. Unlike zone-based, it requires no configuration at boot time
to achieve the results.
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] Add __GFP_EASYRCLM flag and update callers
  2006-01-20 11:54 [PATCH 0/4] Reducing fragmentation using lists (sub-zones) v22 Mel Gorman
@ 2006-01-20 11:54 ` Mel Gorman
  2006-01-20 11:54 ` [PATCH 2/4] Split the free lists into kernel and user parts Mel Gorman
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2006-01-20 11:54 UTC (permalink / raw)
  To: linux-mm; +Cc: jschopp, Mel Gorman, linux-kernel, kamezawa.hiroyu, lhms-devel

This patch adds a flag __GFP_EASYRCLM.  Allocations using the __GFP_EASYRCLM
flag are expected to be easily reclaimed by syncing with backing storage (be
it a file or swap) or cleaning the buffers and discarding.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/fs/buffer.c linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/buffer.c
--- linux-2.6.16-rc1-mm1-clean/fs/buffer.c	2006-01-19 11:21:58.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/buffer.c	2006-01-19 21:49:49.000000000 +0000
@@ -1115,7 +1115,8 @@ grow_dev_page(struct block_device *bdev,
 	struct page *page;
 	struct buffer_head *bh;
 
-	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, index,
+				   GFP_NOFS|__GFP_EASYRCLM);
 	if (!page)
 		return NULL;
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/fs/compat.c linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/compat.c
--- linux-2.6.16-rc1-mm1-clean/fs/compat.c	2006-01-19 11:21:58.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/compat.c	2006-01-19 21:49:49.000000000 +0000
@@ -1397,7 +1397,7 @@ static int compat_copy_strings(int argc,
 			page = bprm->page[i];
 			new = 0;
 			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
+				page = alloc_page(GFP_HIGHUSER|__GFP_EASYRCLM);
 				bprm->page[i] = page;
 				if (!page) {
 					ret = -ENOMEM;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/fs/exec.c linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/exec.c
--- linux-2.6.16-rc1-mm1-clean/fs/exec.c	2006-01-19 11:21:58.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/exec.c	2006-01-19 21:49:49.000000000 +0000
@@ -238,7 +238,7 @@ static int copy_strings(int argc, char _
 			page = bprm->page[i];
 			new = 0;
 			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
+				page = alloc_page(GFP_HIGHUSER|__GFP_EASYRCLM);
 				bprm->page[i] = page;
 				if (!page) {
 					ret = -ENOMEM;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/fs/inode.c linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/inode.c
--- linux-2.6.16-rc1-mm1-clean/fs/inode.c	2006-01-19 11:21:58.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/fs/inode.c	2006-01-19 21:49:49.000000000 +0000
@@ -147,7 +147,7 @@ static struct inode *alloc_inode(struct 
 		mapping->a_ops = &empty_aops;
  		mapping->host = inode;
 		mapping->flags = 0;
-		mapping_set_gfp_mask(mapping, GFP_HIGHUSER);
+		mapping_set_gfp_mask(mapping, GFP_HIGHUSER|__GFP_EASYRCLM);
 		mapping->assoc_mapping = NULL;
 		mapping->backing_dev_info = &default_backing_dev_info;
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/include/asm-i386/page.h linux-2.6.16-rc1-mm1-001_antifrag_flags/include/asm-i386/page.h
--- linux-2.6.16-rc1-mm1-clean/include/asm-i386/page.h	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/include/asm-i386/page.h	2006-01-19 21:49:49.000000000 +0000
@@ -36,7 +36,8 @@
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define alloc_zeroed_user_highpage(vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | __GFP_EASYRCLM, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 /*
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/include/linux/gfp.h linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/gfp.h
--- linux-2.6.16-rc1-mm1-clean/include/linux/gfp.h	2006-01-17 07:44:47.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/gfp.h	2006-01-19 21:49:49.000000000 +0000
@@ -47,6 +47,7 @@ struct vm_area_struct;
 #define __GFP_ZERO	((__force gfp_t)0x8000u)/* Return zeroed page on success */
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
+#define __GFP_EASYRCLM   ((__force gfp_t)0x40000u) /* Easily reclaimed page */
 
 #define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -55,7 +56,7 @@ struct vm_area_struct;
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-			__GFP_NOMEMALLOC|__GFP_HARDWALL)
+			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_EASYRCLM)
 
 /* GFP_ATOMIC means both !wait (__GFP_WAIT not set) and use emergency pool */
 #define GFP_ATOMIC	(__GFP_HIGH)
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/include/linux/highmem.h linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/highmem.h
--- linux-2.6.16-rc1-mm1-clean/include/linux/highmem.h	2006-01-17 07:44:47.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/highmem.h	2006-01-19 21:49:49.000000000 +0000
@@ -47,7 +47,8 @@ static inline void clear_user_highpage(s
 static inline struct page *
 alloc_zeroed_user_highpage(struct vm_area_struct *vma, unsigned long vaddr)
 {
-	struct page *page = alloc_page_vma(GFP_HIGHUSER, vma, vaddr);
+	struct page *page = alloc_page_vma(GFP_HIGHUSER|__GFP_EASYRCLM,
+							vma, vaddr);
 
 	if (page)
 		clear_user_highpage(page, vaddr);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/mm/memory.c linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/memory.c
--- linux-2.6.16-rc1-mm1-clean/mm/memory.c	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/memory.c	2006-01-19 21:50:24.000000000 +0000
@@ -1472,7 +1472,8 @@ gotten:
 		if (!new_page)
 			goto oom;
 	} else {
-		new_page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+		new_page = alloc_page_vma(GFP_HIGHUSER|__GFP_EASYRCLM,
+							vma, address);
 		if (!new_page)
 			goto oom;
 		cow_user_page(new_page, old_page, address);
@@ -2071,7 +2072,8 @@ retry:
 
 		if (unlikely(anon_vma_prepare(vma)))
 			goto oom;
-		page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+		page = alloc_page_vma(GFP_HIGHUSER|__GFP_EASYRCLM,
+						vma, address);
 		if (!page)
 			goto oom;
 		copy_user_highpage(page, new_page, address);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/mm/shmem.c linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/shmem.c
--- linux-2.6.16-rc1-mm1-clean/mm/shmem.c	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/shmem.c	2006-01-19 21:49:49.000000000 +0000
@@ -921,7 +921,7 @@ shmem_alloc_page(gfp_t gfp, struct shmem
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx);
 	pvma.vm_pgoff = idx;
 	pvma.vm_end = PAGE_SIZE;
-	page = alloc_page_vma(gfp | __GFP_ZERO, &pvma, 0);
+	page = alloc_page_vma(gfp | __GFP_ZERO | __GFP_EASYRCLM, &pvma, 0);
 	mpol_free(pvma.vm_policy);
 	return page;
 }
@@ -936,7 +936,7 @@ shmem_swapin(struct shmem_inode_info *in
 static inline struct page *
 shmem_alloc_page(gfp_t gfp,struct shmem_inode_info *info, unsigned long idx)
 {
-	return alloc_page(gfp | __GFP_ZERO);
+	return alloc_page(gfp | __GFP_ZERO | __GFP_EASYRCLM);
 }
 #endif
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-clean/mm/swap_state.c linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/swap_state.c
--- linux-2.6.16-rc1-mm1-clean/mm/swap_state.c	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/swap_state.c	2006-01-19 21:49:49.000000000 +0000
@@ -334,7 +334,8 @@ struct page *read_swap_cache_async(swp_e
 		 * Get a new page to read into from swap.
 		 */
 		if (!new_page) {
-			new_page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+			new_page = alloc_page_vma(GFP_HIGHUSER|__GFP_EASYRCLM,
+							vma, addr);
 			if (!new_page)
 				break;		/* Out of memory */
 		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/4] Split the free lists into kernel and user parts
  2006-01-20 11:54 [PATCH 0/4] Reducing fragmentation using lists (sub-zones) v22 Mel Gorman
  2006-01-20 11:54 ` [PATCH 1/4] Add __GFP_EASYRCLM flag and update callers Mel Gorman
@ 2006-01-20 11:54 ` Mel Gorman
  2006-01-22 13:31   ` Marcelo Tosatti
  2006-02-05  8:57   ` Coywolf Qi Hunt
  2006-01-20 11:55 ` [PATCH 3/4] Split the per-cpu " Mel Gorman
  2006-01-20 11:55 ` [PATCH 4/4] Add a configure option for anti-fragmentation Mel Gorman
  3 siblings, 2 replies; 12+ messages in thread
From: Mel Gorman @ 2006-01-20 11:54 UTC (permalink / raw)
  To: linux-mm; +Cc: jschopp, Mel Gorman, linux-kernel, kamezawa.hiroyu, lhms-devel

This patch adds the core of the anti-fragmentation strategy. It works by
grouping related allocation types together. The idea is that large groups of
pages that may be reclaimed are placed near each other. The zone->free_area
list is broken into RCLM_TYPES number of lists.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h
--- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h	2006-01-19 21:51:05.000000000 +0000
@@ -22,8 +22,16 @@
 #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
 #endif
 
+#define RCLM_NORCLM 0
+#define RCLM_EASY   1
+#define RCLM_TYPES  2
+
+#define for_each_rclmtype_order(type, order) \
+	for (order = 0; order < MAX_ORDER; order++) \
+		for (type = 0; type < RCLM_TYPES; type++)
+
 struct free_area {
-	struct list_head	free_list;
+	struct list_head	free_list[RCLM_TYPES];
 	unsigned long		nr_free;
 };
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h
--- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h	2006-01-19 21:51:05.000000000 +0000
@@ -76,6 +76,7 @@
 #define PG_reclaim		17	/* To be reclaimed asap */
 #define PG_nosave_free		18	/* Free, should not be written */
 #define PG_uncached		19	/* Page has been mapped as uncached */
+#define PG_easyrclm		20	/* Page is in an easy reclaim block */
 
 /*
  * Global page accounting.  One instance per CPU.  Only unsigned longs are
@@ -345,6 +346,12 @@ extern void __mod_page_state_offset(unsi
 #define SetPageUncached(page)	set_bit(PG_uncached, &(page)->flags)
 #define ClearPageUncached(page)	clear_bit(PG_uncached, &(page)->flags)
 
+#define PageEasyRclm(page)	test_bit(PG_easyrclm, &(page)->flags)
+#define SetPageEasyRclm(page)	set_bit(PG_easyrclm, &(page)->flags)
+#define ClearPageEasyRclm(page)	clear_bit(PG_easyrclm, &(page)->flags)
+#define __SetPageEasyRclm(page)	__set_bit(PG_easyrclm, &(page)->flags)
+#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
+
 struct page;	/* forward declaration */
 
 int test_clear_page_dirty(struct page *page);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/page_alloc.c linux-2.6.16-rc1-mm1-002_fragcore/mm/page_alloc.c
--- linux-2.6.16-rc1-mm1-001_antifrag_flags/mm/page_alloc.c	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-002_fragcore/mm/page_alloc.c	2006-01-19 22:12:09.000000000 +0000
@@ -72,6 +72,16 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
 
 EXPORT_SYMBOL(totalram_pages);
 
+static inline int get_pageblock_type(struct page *page)
+{
+	return (PageEasyRclm(page) != 0);
+}
+
+static inline int gfpflags_to_alloctype(unsigned long gfp_flags)
+{
+	return ((gfp_flags & __GFP_EASYRCLM) != 0);
+}
+
 /*
  * Used by page_zone() to look up the address of the struct zone whose
  * id is encoded in the upper bits of page->flags
@@ -328,11 +338,13 @@ static inline void __free_one_page(struc
 {
 	unsigned long page_idx;
 	int order_size = 1 << order;
+	int alloctype = get_pageblock_type(page);
 
 	if (unlikely(PageCompound(page)))
 		destroy_compound_page(page, order);
 
 	page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
+	__SetPageEasyRclm(page);
 
 	BUG_ON(page_idx & (order_size - 1));
 	BUG_ON(bad_range(zone, page));
@@ -340,7 +352,6 @@ static inline void __free_one_page(struc
 	zone->free_pages += order_size;
 	while (order < MAX_ORDER-1) {
 		unsigned long combined_idx;
-		struct free_area *area;
 		struct page *buddy;
 
 		buddy = __page_find_buddy(page, page_idx, order);
@@ -348,8 +359,7 @@ static inline void __free_one_page(struc
 			break;		/* Move the buddy up one level. */
 
 		list_del(&buddy->lru);
-		area = zone->free_area + order;
-		area->nr_free--;
+		zone->free_area[order].nr_free--;
 		rmv_page_order(buddy);
 		combined_idx = __find_combined_index(page_idx, order);
 		page = page + (combined_idx - page_idx);
@@ -357,7 +367,7 @@ static inline void __free_one_page(struc
 		order++;
 	}
 	set_page_order(page, order);
-	list_add(&page->lru, &zone->free_area[order].free_list);
+	list_add(&page->lru, &zone->free_area[order].free_list[alloctype]);
 	zone->free_area[order].nr_free++;
 }
 
@@ -500,7 +510,8 @@ void fastcall __init __free_pages_bootme
  * -- wli
  */
 static inline void expand(struct zone *zone, struct page *page,
- 	int low, int high, struct free_area *area)
+ 	int low, int high, struct free_area *area,
+	int alloctype)
 {
 	unsigned long size = 1 << high;
 
@@ -509,7 +520,7 @@ static inline void expand(struct zone *z
 		high--;
 		size >>= 1;
 		BUG_ON(bad_range(zone, &page[size]));
-		list_add(&page[size].lru, &area->free_list);
+		list_add(&page[size].lru, &area->free_list[alloctype]);
 		area->nr_free++;
 		set_page_order(&page[size], high);
 	}
@@ -552,31 +563,77 @@ static int prep_new_page(struct page *pa
 	return 0;
 }
 
+/* Remove an element from the buddy allocator from the fallback list */
+ static struct page *__rmqueue_fallback(struct zone *zone, int order,
+							int alloctype)
+{
+	struct free_area * area;
+	int current_order;
+	struct page *page;
+
+	/* Find the largest possible block of pages in the other list */
+	alloctype = !alloctype;
+	for (current_order = MAX_ORDER-1; current_order >= order;
+						--current_order) {
+		area = &(zone->free_area[current_order]);
+ 		if (list_empty(&area->free_list[alloctype]))
+ 			continue;
+
+		page = list_entry(area->free_list[alloctype].next,
+					struct page, lru);
+		area->nr_free--;
+
+		/*
+		 * If breaking a large block of pages, place the buddies
+		 * on the preferred allocation list
+		 */
+		if (unlikely(current_order >= MAX_ORDER / 2))
+			alloctype = !alloctype;
+
+		list_del(&page->lru);
+		rmv_page_order(page);
+		zone->free_pages -= 1UL << order;
+		expand(zone, page, order, current_order, area, alloctype);
+		return page;
+	}
+
+	return NULL;
+}
+
 /* 
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
  */
-static struct page *__rmqueue(struct zone *zone, unsigned int order)
+static struct page *__rmqueue(struct zone *zone, unsigned int order,
+		int alloctype)
 {
 	struct free_area * area;
 	unsigned int current_order;
 	struct page *page;
 
+	/* Find a page of the appropriate size in the preferred list */
 	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
-		area = zone->free_area + current_order;
-		if (list_empty(&area->free_list))
+		area = &(zone->free_area[current_order]);
+		if (list_empty(&area->free_list[alloctype]))
 			continue;
 
-		page = list_entry(area->free_list.next, struct page, lru);
+		page = list_entry(area->free_list[alloctype].next,
+					struct page, lru);
 		list_del(&page->lru);
 		rmv_page_order(page);
 		area->nr_free--;
 		zone->free_pages -= 1UL << order;
-		expand(zone, page, order, current_order, area);
-		return page;
+		expand(zone, page, order, current_order, area, alloctype);
+		goto got_page;
 	}
 
-	return NULL;
+	page = __rmqueue_fallback(zone, order, alloctype);
+
+got_page:
+	if (unlikely(alloctype == RCLM_NORCLM) && page)
+		__ClearPageEasyRclm(page);
+
+	return page;
 }
 
 /* 
@@ -585,13 +642,14 @@ static struct page *__rmqueue(struct zon
  * Returns the number of new pages which were placed at *list.
  */
 static int rmqueue_bulk(struct zone *zone, unsigned int order, 
-			unsigned long count, struct list_head *list)
+			unsigned long count, struct list_head *list,
+			int alloctype)
 {
 	int i;
 	
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
-		struct page *page = __rmqueue(zone, order);
+		struct page *page = __rmqueue(zone, order, alloctype);
 		if (unlikely(page == NULL))
 			break;
 		list_add_tail(&page->lru, list);
@@ -658,7 +716,7 @@ static void __drain_pages(unsigned int c
 void mark_free_pages(struct zone *zone)
 {
 	unsigned long zone_pfn, flags;
-	int order;
+	int order, t;
 	struct list_head *curr;
 
 	if (!zone->spanned_pages)
@@ -669,13 +727,15 @@ void mark_free_pages(struct zone *zone)
 		ClearPageNosaveFree(pfn_to_page(zone_pfn + zone->zone_start_pfn));
 
 	for (order = MAX_ORDER - 1; order >= 0; --order)
-		list_for_each(curr, &zone->free_area[order].free_list) {
+	for_each_rclmtype_order(t, order) {
+		list_for_each(curr, &zone->free_area[order].free_list[t]) {
 			unsigned long start_pfn, i;
 
 			start_pfn = page_to_pfn(list_entry(curr, struct page, lru));
 
 			for (i=0; i < (1<<order); i++)
 				SetPageNosaveFree(pfn_to_page(start_pfn+i));
+		}
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -775,6 +835,7 @@ static struct page *buffered_rmqueue(str
 	unsigned long flags;
 	struct page *page;
 	int cold = !!(gfp_flags & __GFP_COLD);
+	int alloctype = gfpflags_to_alloctype(gfp_flags);
 	int cpu;
 
 again:
@@ -786,7 +847,8 @@ again:
 		local_irq_save(flags);
 		if (!pcp->count) {
 			pcp->count += rmqueue_bulk(zone, 0,
-						pcp->batch, &pcp->list);
+						pcp->batch, &pcp->list,
+						alloctype);
 			if (unlikely(!pcp->count))
 				goto failed;
 		}
@@ -795,7 +857,7 @@ again:
 		pcp->count--;
 	} else {
 		spin_lock_irqsave(&zone->lock, flags);
-		page = __rmqueue(zone, order);
+		page = __rmqueue(zone, order, alloctype);
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
@@ -1852,7 +1914,8 @@ void zone_init_free_lists(struct pglist_
 {
 	int order;
 	for (order = 0; order < MAX_ORDER ; order++) {
-		INIT_LIST_HEAD(&zone->free_area[order].free_list);
+		INIT_LIST_HEAD(&zone->free_area[order].free_list[RCLM_NORCLM]);
+		INIT_LIST_HEAD(&zone->free_area[order].free_list[RCLM_EASY]);
 		zone->free_area[order].nr_free = 0;
 	}
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] Split the free lists into kernel and user parts
  2006-01-20 11:54 ` [PATCH 2/4] Split the free lists into kernel and user parts Mel Gorman
@ 2006-01-22 13:31   ` Marcelo Tosatti
  2006-01-23  9:39     ` Mel Gorman
  2006-02-05  8:57   ` Coywolf Qi Hunt
  1 sibling, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2006-01-22 13:31 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, jschopp, linux-kernel, kamezawa.hiroyu, lhms-devel

Hi Mel,

On Fri, Jan 20, 2006 at 11:54:55AM +0000, Mel Gorman wrote:
> 
> This patch adds the core of the anti-fragmentation strategy. It works by
> grouping related allocation types together. The idea is that large groups of
> pages that may be reclaimed are placed near each other. The zone->free_area
> list is broken into RCLM_TYPES number of lists.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
> diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h
> --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h	2006-01-19 11:21:59.000000000 +0000
> +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h	2006-01-19 21:51:05.000000000 +0000
> @@ -22,8 +22,16 @@
>  #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
>  #endif
>  
> +#define RCLM_NORCLM 0
> +#define RCLM_EASY   1
> +#define RCLM_TYPES  2
> +
> +#define for_each_rclmtype_order(type, order) \
> +	for (order = 0; order < MAX_ORDER; order++) \
> +		for (type = 0; type < RCLM_TYPES; type++)
> +
>  struct free_area {
> -	struct list_head	free_list;
> +	struct list_head	free_list[RCLM_TYPES];
>  	unsigned long		nr_free;
>  };
>  
> diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h
> --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h	2006-01-19 11:21:59.000000000 +0000
> +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h	2006-01-19 21:51:05.000000000 +0000
> @@ -76,6 +76,7 @@
>  #define PG_reclaim		17	/* To be reclaimed asap */
>  #define PG_nosave_free		18	/* Free, should not be written */
>  #define PG_uncached		19	/* Page has been mapped as uncached */
> +#define PG_easyrclm		20	/* Page is in an easy reclaim block */
>  
>  /*
>   * Global page accounting.  One instance per CPU.  Only unsigned longs are
> @@ -345,6 +346,12 @@ extern void __mod_page_state_offset(unsi
>  #define SetPageUncached(page)	set_bit(PG_uncached, &(page)->flags)
>  #define ClearPageUncached(page)	clear_bit(PG_uncached, &(page)->flags)
>  
> +#define PageEasyRclm(page)	test_bit(PG_easyrclm, &(page)->flags)
> +#define SetPageEasyRclm(page)	set_bit(PG_easyrclm, &(page)->flags)
> +#define ClearPageEasyRclm(page)	clear_bit(PG_easyrclm, &(page)->flags)
> +#define __SetPageEasyRclm(page)	__set_bit(PG_easyrclm, &(page)->flags)
> +#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
> +

You can't read/write to page->flags non-atomically, except when you
guarantee that the page is not visible to other CPU's (eg at the very
end of the page freeing code).

Please use atomic operations.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] Split the free lists into kernel and user parts
  2006-01-22 13:31   ` Marcelo Tosatti
@ 2006-01-23  9:39     ` Mel Gorman
  2006-01-23 19:13       ` Marcelo Tosatti
  0 siblings, 1 reply; 12+ messages in thread
From: Mel Gorman @ 2006-01-23  9:39 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-mm, jschopp, linux-kernel, kamezawa.hiroyu, lhms-devel

On Sun, 22 Jan 2006, Marcelo Tosatti wrote:

> Hi Mel,
>
> On Fri, Jan 20, 2006 at 11:54:55AM +0000, Mel Gorman wrote:
> >
> > This patch adds the core of the anti-fragmentation strategy. It works by
> > grouping related allocation types together. The idea is that large groups of
> > pages that may be reclaimed are placed near each other. The zone->free_area
> > list is broken into RCLM_TYPES number of lists.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
> > diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h
> > --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h	2006-01-19 11:21:59.000000000 +0000
> > +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h	2006-01-19 21:51:05.000000000 +0000
> > @@ -22,8 +22,16 @@
> >  #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
> >  #endif
> >
> > +#define RCLM_NORCLM 0
> > +#define RCLM_EASY   1
> > +#define RCLM_TYPES  2
> > +
> > +#define for_each_rclmtype_order(type, order) \
> > +	for (order = 0; order < MAX_ORDER; order++) \
> > +		for (type = 0; type < RCLM_TYPES; type++)
> > +
> >  struct free_area {
> > -	struct list_head	free_list;
> > +	struct list_head	free_list[RCLM_TYPES];
> >  	unsigned long		nr_free;
> >  };
> >
> > diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h
> > --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h	2006-01-19 11:21:59.000000000 +0000
> > +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h	2006-01-19 21:51:05.000000000 +0000
> > @@ -76,6 +76,7 @@
> >  #define PG_reclaim		17	/* To be reclaimed asap */
> >  #define PG_nosave_free		18	/* Free, should not be written */
> >  #define PG_uncached		19	/* Page has been mapped as uncached */
> > +#define PG_easyrclm		20	/* Page is in an easy reclaim block */
> >
> >  /*
> >   * Global page accounting.  One instance per CPU.  Only unsigned longs are
> > @@ -345,6 +346,12 @@ extern void __mod_page_state_offset(unsi
> >  #define SetPageUncached(page)	set_bit(PG_uncached, &(page)->flags)
> >  #define ClearPageUncached(page)	clear_bit(PG_uncached, &(page)->flags)
> >
> > +#define PageEasyRclm(page)	test_bit(PG_easyrclm, &(page)->flags)
> > +#define SetPageEasyRclm(page)	set_bit(PG_easyrclm, &(page)->flags)
> > +#define ClearPageEasyRclm(page)	clear_bit(PG_easyrclm, &(page)->flags)
> > +#define __SetPageEasyRclm(page)	__set_bit(PG_easyrclm, &(page)->flags)
> > +#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
> > +
>
> You can't read/write to page->flags non-atomically, except when you
> guarantee that the page is not visible to other CPU's (eg at the very
> end of the page freeing code).
>

The helper PageEasyRclm is only used when either the spinlock is held or a
per-cpu page is being released so it should be safe. The Set and Clear
helpers are only used with a spinlock held.

> Please use atomic operations.
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] Split the free lists into kernel and user parts
  2006-01-23  9:39     ` Mel Gorman
@ 2006-01-23 19:13       ` Marcelo Tosatti
  2006-01-26 15:55         ` Mel Gorman
  0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2006-01-23 19:13 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, jschopp, linux-kernel, kamezawa.hiroyu, lhms-devel

On Mon, Jan 23, 2006 at 09:39:16AM +0000, Mel Gorman wrote:
> On Sun, 22 Jan 2006, Marcelo Tosatti wrote:
> 
> > Hi Mel,
> >
> > On Fri, Jan 20, 2006 at 11:54:55AM +0000, Mel Gorman wrote:
> > >
> > > This patch adds the core of the anti-fragmentation strategy. It works by
> > > grouping related allocation types together. The idea is that large groups of
> > > pages that may be reclaimed are placed near each other. The zone->free_area
> > > list is broken into RCLM_TYPES number of lists.
> > >
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
> > > diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h
> > > --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h	2006-01-19 11:21:59.000000000 +0000
> > > +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h	2006-01-19 21:51:05.000000000 +0000
> > > @@ -22,8 +22,16 @@
> > >  #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
> > >  #endif
> > >
> > > +#define RCLM_NORCLM 0
> > > +#define RCLM_EASY   1
> > > +#define RCLM_TYPES  2
> > > +
> > > +#define for_each_rclmtype_order(type, order) \
> > > +	for (order = 0; order < MAX_ORDER; order++) \
> > > +		for (type = 0; type < RCLM_TYPES; type++)
> > > +
> > >  struct free_area {
> > > -	struct list_head	free_list;
> > > +	struct list_head	free_list[RCLM_TYPES];
> > >  	unsigned long		nr_free;
> > >  };
> > >
> > > diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h
> > > --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h	2006-01-19 11:21:59.000000000 +0000
> > > +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h	2006-01-19 21:51:05.000000000 +0000
> > > @@ -76,6 +76,7 @@
> > >  #define PG_reclaim		17	/* To be reclaimed asap */
> > >  #define PG_nosave_free		18	/* Free, should not be written */
> > >  #define PG_uncached		19	/* Page has been mapped as uncached */
> > > +#define PG_easyrclm		20	/* Page is in an easy reclaim block */
> > >
> > >  /*
> > >   * Global page accounting.  One instance per CPU.  Only unsigned longs are
> > > @@ -345,6 +346,12 @@ extern void __mod_page_state_offset(unsi
> > >  #define SetPageUncached(page)	set_bit(PG_uncached, &(page)->flags)
> > >  #define ClearPageUncached(page)	clear_bit(PG_uncached, &(page)->flags)
> > >
> > > +#define PageEasyRclm(page)	test_bit(PG_easyrclm, &(page)->flags)
> > > +#define SetPageEasyRclm(page)	set_bit(PG_easyrclm, &(page)->flags)
> > > +#define ClearPageEasyRclm(page)	clear_bit(PG_easyrclm, &(page)->flags)
> > > +#define __SetPageEasyRclm(page)	__set_bit(PG_easyrclm, &(page)->flags)
> > > +#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
> > > +
> >
> > You can't read/write to page->flags non-atomically, except when you
> > guarantee that the page is not visible to other CPU's (eg at the very
> > end of the page freeing code).
> >
> 
> The helper PageEasyRclm is only used when either the spinlock is held or a
> per-cpu page is being released so it should be safe. The Set and Clear
> helpers are only used with a spinlock held.

Mel,

Other codepaths which touch page->flags do not hold any lock, so you
really must use atomic operations, except when you've guarantee that the
page is being freed and won't be reused.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] Split the free lists into kernel and user parts
  2006-01-23 19:13       ` Marcelo Tosatti
@ 2006-01-26 15:55         ` Mel Gorman
  2006-01-31 19:57           ` Marcelo Tosatti
  0 siblings, 1 reply; 12+ messages in thread
From: Mel Gorman @ 2006-01-26 15:55 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-mm, jschopp, linux-kernel, kamezawa.hiroyu, lhms-devel

On Mon, 23 Jan 2006, Marcelo Tosatti wrote:

> On Mon, Jan 23, 2006 at 09:39:16AM +0000, Mel Gorman wrote:
> > On Sun, 22 Jan 2006, Marcelo Tosatti wrote:
> >
> > > Hi Mel,
> > >
> > > On Fri, Jan 20, 2006 at 11:54:55AM +0000, Mel Gorman wrote:
> > > >
> > > > This patch adds the core of the anti-fragmentation strategy. It works by
> > > > grouping related allocation types together. The idea is that large groups of
> > > > pages that may be reclaimed are placed near each other. The zone->free_area
> > > > list is broken into RCLM_TYPES number of lists.
> > > >
> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > > Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
> > > > diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h
> > > > --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h	2006-01-19 11:21:59.000000000 +0000
> > > > +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h	2006-01-19 21:51:05.000000000 +0000
> > > > @@ -22,8 +22,16 @@
> > > >  #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
> > > >  #endif
> > > >
> > > > +#define RCLM_NORCLM 0
> > > > +#define RCLM_EASY   1
> > > > +#define RCLM_TYPES  2
> > > > +
> > > > +#define for_each_rclmtype_order(type, order) \
> > > > +	for (order = 0; order < MAX_ORDER; order++) \
> > > > +		for (type = 0; type < RCLM_TYPES; type++)
> > > > +
> > > >  struct free_area {
> > > > -	struct list_head	free_list;
> > > > +	struct list_head	free_list[RCLM_TYPES];
> > > >  	unsigned long		nr_free;
> > > >  };
> > > >
> > > > diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h
> > > > --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/page-flags.h	2006-01-19 11:21:59.000000000 +0000
> > > > +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/page-flags.h	2006-01-19 21:51:05.000000000 +0000
> > > > @@ -76,6 +76,7 @@
> > > >  #define PG_reclaim		17	/* To be reclaimed asap */
> > > >  #define PG_nosave_free		18	/* Free, should not be written */
> > > >  #define PG_uncached		19	/* Page has been mapped as uncached */
> > > > +#define PG_easyrclm		20	/* Page is in an easy reclaim block */
> > > >
> > > >  /*
> > > >   * Global page accounting.  One instance per CPU.  Only unsigned longs are
> > > > @@ -345,6 +346,12 @@ extern void __mod_page_state_offset(unsi
> > > >  #define SetPageUncached(page)	set_bit(PG_uncached, &(page)->flags)
> > > >  #define ClearPageUncached(page)	clear_bit(PG_uncached, &(page)->flags)
> > > >
> > > > +#define PageEasyRclm(page)	test_bit(PG_easyrclm, &(page)->flags)
> > > > +#define SetPageEasyRclm(page)	set_bit(PG_easyrclm, &(page)->flags)
> > > > +#define ClearPageEasyRclm(page)	clear_bit(PG_easyrclm, &(page)->flags)
> > > > +#define __SetPageEasyRclm(page)	__set_bit(PG_easyrclm, &(page)->flags)
> > > > +#define __ClearPageEasyRclm(page) __clear_bit(PG_easyrclm, &(page)->flags)
> > > > +
> > >
> > > You can't read/write to page->flags non-atomically, except when you
> > > guarantee that the page is not visible to other CPU's (eg at the very
> > > end of the page freeing code).
> > >
> >
> > The helper PageEasyRclm is only used when either the spinlock is held or a
> > per-cpu page is being released so it should be safe. The Set and Clear
> > helpers are only used with a spinlock held.
>
> Mel,
>
> Other codepaths which touch page->flags do not hold any lock, so you
> really must use atomic operations, except when you've guarantee that the
> page is being freed and won't be reused.
>

Understood, so I took another look to be sure;

PageEasyRclm() is used on pages that are about to be freed to the main
or per-cpu allocator so it should be safe.

__SetPageEasyRclm is called when the page is about to be freed. It should
be safe from concurrent access.

__ClearPageEasyRclm is called when the page is about to be allocated. It
should be safe.

I think it is guaranteed that there are on concurrent accessing of the
page flags. Is there something I have missed?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] Split the free lists into kernel and user parts
  2006-01-26 15:55         ` Mel Gorman
@ 2006-01-31 19:57           ` Marcelo Tosatti
  0 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2006-01-31 19:57 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, jschopp, kamezawa.hiroyu, lhms-devel

> > Other codepaths which touch page->flags do not hold any lock, so you
> > really must use atomic operations, except when you've guarantee that the
> > page is being freed and won't be reused.
> >
> 
> Understood, so I took another look to be sure;
> 
> PageEasyRclm() is used on pages that are about to be freed to the main
> or per-cpu allocator so it should be safe.
> 
> __SetPageEasyRclm is called when the page is about to be freed. It should
> be safe from concurrent access.
> 
> __ClearPageEasyRclm is called when the page is about to be allocated. It
> should be safe.
> 
> I think it is guaranteed that there are on concurrent accessing of the
> page flags. Is there something I have missed?

Nope, you are right.

The usage is safe.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] Split the free lists into kernel and user parts
  2006-01-20 11:54 ` [PATCH 2/4] Split the free lists into kernel and user parts Mel Gorman
  2006-01-22 13:31   ` Marcelo Tosatti
@ 2006-02-05  8:57   ` Coywolf Qi Hunt
  2006-02-05  9:12     ` Coywolf Qi Hunt
  1 sibling, 1 reply; 12+ messages in thread
From: Coywolf Qi Hunt @ 2006-02-05  8:57 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, jschopp, linux-kernel, kamezawa.hiroyu, lhms-devel

2006/1/20, Mel Gorman <mel@csn.ul.ie>:
>
> This patch adds the core of the anti-fragmentation strategy. It works by
> grouping related allocation types together. The idea is that large groups of
> pages that may be reclaimed are placed near each other. The zone->free_area
> list is broken into RCLM_TYPES number of lists.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
> diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h
> --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h      2006-01-19 11:21:59.000000000 +0000
> +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h    2006-01-19 21:51:05.000000000 +0000
> @@ -22,8 +22,16 @@
>  #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
>  #endif
>
> +#define RCLM_NORCLM 0

better be RCLM_NORMAL

> +#define RCLM_EASY   1
> +#define RCLM_TYPES  2
> +
> +#define for_each_rclmtype_order(type, order) \
> +       for (order = 0; order < MAX_ORDER; order++) \
> +               for (type = 0; type < RCLM_TYPES; type++)
> +
>  struct free_area {
> -       struct list_head        free_list;
> +       struct list_head        free_list[RCLM_TYPES];
>         unsigned long           nr_free;
>  };
>


--
Coywolf Qi Hunt

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] Split the free lists into kernel and user parts
  2006-02-05  8:57   ` Coywolf Qi Hunt
@ 2006-02-05  9:12     ` Coywolf Qi Hunt
  0 siblings, 0 replies; 12+ messages in thread
From: Coywolf Qi Hunt @ 2006-02-05  9:12 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, jschopp, linux-kernel, kamezawa.hiroyu, lhms-devel

2006/2/5, Coywolf Qi Hunt <coywolf@gmail.com>:
> 2006/1/20, Mel Gorman <mel@csn.ul.ie>:
> >
> > This patch adds the core of the anti-fragmentation strategy. It works by
> > grouping related allocation types together. The idea is that large groups of
> > pages that may be reclaimed are placed near each other. The zone->free_area
> > list is broken into RCLM_TYPES number of lists.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
> > diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h
> > --- linux-2.6.16-rc1-mm1-001_antifrag_flags/include/linux/mmzone.h      2006-01-19 11:21:59.000000000 +0000
> > +++ linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h    2006-01-19 21:51:05.000000000 +0000
> > @@ -22,8 +22,16 @@
> >  #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
> >  #endif
> >
> > +#define RCLM_NORCLM 0
>
> better be RCLM_NORMAL

err, RCLM_NONRCLM, or RCLM_NONE

>
> > +#define RCLM_EASY   1
> > +#define RCLM_TYPES  2
> > +
> > +#define for_each_rclmtype_order(type, order) \
> > +       for (order = 0; order < MAX_ORDER; order++) \
> > +               for (type = 0; type < RCLM_TYPES; type++)
> > +
> >  struct free_area {
> > -       struct list_head        free_list;
> > +       struct list_head        free_list[RCLM_TYPES];
> >         unsigned long           nr_free;
> >  };

--
Coywolf Qi Hunt

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/4] Split the per-cpu lists into kernel and user parts
  2006-01-20 11:54 [PATCH 0/4] Reducing fragmentation using lists (sub-zones) v22 Mel Gorman
  2006-01-20 11:54 ` [PATCH 1/4] Add __GFP_EASYRCLM flag and update callers Mel Gorman
  2006-01-20 11:54 ` [PATCH 2/4] Split the free lists into kernel and user parts Mel Gorman
@ 2006-01-20 11:55 ` Mel Gorman
  2006-01-20 11:55 ` [PATCH 4/4] Add a configure option for anti-fragmentation Mel Gorman
  3 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2006-01-20 11:55 UTC (permalink / raw)
  To: linux-mm; +Cc: jschopp, Mel Gorman, linux-kernel, kamezawa.hiroyu, lhms-devel

The freelists for each allocation type can slowly become corrupted due to
the per-cpu list. Consider what happens when the following happens

1. A 2^(MAX_ORDER-1) list is reserved for __GFP_EASYRCLM pages
2. An order-0 page is allocated from the newly reserved block
3. The page is freed and placed on the per-cpu list
4. alloc_page() is called with GFP_KERNEL as the gfp_mask
5. The per-cpu list is used to satisfy the allocation

This results in a kernel page is in the middle of a RCLM_EASY region. This
means that over long periods of the time, the anti-fragmentation scheme
slowly degrades to the standard allocator.

This patch divides the per-cpu lists into RCLM_TYPES number of lists.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h linux-2.6.16-rc1-mm1-003_percpu/include/linux/mmzone.h
--- linux-2.6.16-rc1-mm1-002_fragcore/include/linux/mmzone.h	2006-01-19 21:51:05.000000000 +0000
+++ linux-2.6.16-rc1-mm1-003_percpu/include/linux/mmzone.h	2006-01-19 22:15:16.000000000 +0000
@@ -26,6 +26,8 @@
 #define RCLM_EASY   1
 #define RCLM_TYPES  2
 
+#define for_each_rclmtype(type) \
+	for (type = 0; type < RCLM_TYPES; type++)
 #define for_each_rclmtype_order(type, order) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < RCLM_TYPES; type++)
@@ -53,10 +55,10 @@ struct zone_padding {
 #endif
 
 struct per_cpu_pages {
-	int count;		/* number of pages in the list */
+	int count[RCLM_TYPES];	/* number of pages in the list */
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
-	struct list_head list;	/* the list of pages */
+	struct list_head list[RCLM_TYPES];	/* the list of pages */
 };
 
 struct per_cpu_pageset {
@@ -71,6 +73,11 @@ struct per_cpu_pageset {
 #endif
 } ____cacheline_aligned_in_smp;
 
+static inline int pcp_count(struct per_cpu_pages *pcp)
+{
+	return pcp->count[RCLM_NORCLM] + pcp->count[RCLM_EASY];
+}
+
 #ifdef CONFIG_NUMA
 #define zone_pcp(__z, __cpu) ((__z)->pageset[(__cpu)])
 #else
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-002_fragcore/mm/page_alloc.c linux-2.6.16-rc1-mm1-003_percpu/mm/page_alloc.c
--- linux-2.6.16-rc1-mm1-002_fragcore/mm/page_alloc.c	2006-01-19 22:12:09.000000000 +0000
+++ linux-2.6.16-rc1-mm1-003_percpu/mm/page_alloc.c	2006-01-19 22:26:45.000000000 +0000
@@ -663,7 +663,7 @@ static int rmqueue_bulk(struct zone *zon
 void drain_remote_pages(void)
 {
 	struct zone *zone;
-	int i;
+	int i, pindex;
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -679,8 +679,12 @@ void drain_remote_pages(void)
 			struct per_cpu_pages *pcp;
 
 			pcp = &pset->pcp[i];
-			free_pages_bulk(zone, pcp->count, &pcp->list, 0);
-			pcp->count = 0;
+			for_each_rclmtype(pindex) {
+				free_pages_bulk(zone,
+						pcp->count[pindex],
+						&pcp->list[pindex], 0);
+				pcp->count[pindex] = 0;
+			}
 		}
 	}
 	local_irq_restore(flags);
@@ -692,7 +696,7 @@ static void __drain_pages(unsigned int c
 {
 	unsigned long flags;
 	struct zone *zone;
-	int i;
+	int i, pindex;
 
 	for_each_zone(zone) {
 		struct per_cpu_pageset *pset;
@@ -703,8 +707,13 @@ static void __drain_pages(unsigned int c
 
 			pcp = &pset->pcp[i];
 			local_irq_save(flags);
-			free_pages_bulk(zone, pcp->count, &pcp->list, 0);
-			pcp->count = 0;
+			for_each_rclmtype(pindex) {
+				free_pages_bulk(zone,
+						pcp->count[pindex],
+						&pcp->list[pindex], 0);
+
+				pcp->count[pindex] = 0;
+			}
 			local_irq_restore(flags);
 		}
 	}
@@ -780,6 +789,7 @@ static void zone_statistics(struct zonel
 static void fastcall free_hot_cold_page(struct page *page, int cold)
 {
 	struct zone *zone = page_zone(page);
+	int pindex = get_pageblock_type(page);
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
 
@@ -795,11 +805,11 @@ static void fastcall free_hot_cold_page(
 	pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
 	local_irq_save(flags);
 	__inc_page_state(pgfree);
-	list_add(&page->lru, &pcp->list);
-	pcp->count++;
-	if (pcp->count >= pcp->high) {
-		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
-		pcp->count -= pcp->batch;
+	list_add(&page->lru, &pcp->list[pindex]);
+	pcp->count[pindex]++;
+	if (pcp->count[pindex] >= pcp->high) {
+		free_pages_bulk(zone, pcp->batch, &pcp->list[pindex], 0);
+		pcp->count[pindex] -= pcp->batch;
 	}
 	local_irq_restore(flags);
 	put_cpu();
@@ -845,16 +855,17 @@ again:
 
 		pcp = &zone_pcp(zone, cpu)->pcp[cold];
 		local_irq_save(flags);
-		if (!pcp->count) {
-			pcp->count += rmqueue_bulk(zone, 0,
-						pcp->batch, &pcp->list,
+		if (!pcp->count[alloctype]) {
+			pcp->count[alloctype] += rmqueue_bulk(zone, 0,
+						pcp->batch,
+						&pcp->list[alloctype],
 						alloctype);
 			if (unlikely(!pcp->count))
 				goto failed;
 		}
-		page = list_entry(pcp->list.next, struct page, lru);
+		page = list_entry(pcp->list[alloctype].next, struct page, lru);
 		list_del(&page->lru);
-		pcp->count--;
+		pcp->count[alloctype]--;
 	} else {
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order, alloctype);
@@ -1534,7 +1545,7 @@ void show_free_areas(void)
 					temperature ? "cold" : "hot",
 					pageset->pcp[temperature].high,
 					pageset->pcp[temperature].batch,
-					pageset->pcp[temperature].count);
+					pcp_count(&pageset->pcp[temperature]));
 		}
 	}
 
@@ -1978,16 +1989,20 @@ inline void setup_pageset(struct per_cpu
 	memset(p, 0, sizeof(*p));
 
 	pcp = &p->pcp[0];		/* hot */
-	pcp->count = 0;
+	pcp->count[RCLM_NORCLM] = 0;
+	pcp->count[RCLM_EASY] = 0;
 	pcp->high = 6 * batch;
 	pcp->batch = max(1UL, 1 * batch);
-	INIT_LIST_HEAD(&pcp->list);
+	INIT_LIST_HEAD(&pcp->list[RCLM_NORCLM]);
+	INIT_LIST_HEAD(&pcp->list[RCLM_EASY]);
 
 	pcp = &p->pcp[1];		/* cold*/
-	pcp->count = 0;
+	pcp->count[RCLM_NORCLM] = 0;
+	pcp->count[RCLM_EASY] = 0;
 	pcp->high = 2 * batch;
 	pcp->batch = max(1UL, batch/2);
-	INIT_LIST_HEAD(&pcp->list);
+	INIT_LIST_HEAD(&pcp->list[RCLM_NORCLM]);
+	INIT_LIST_HEAD(&pcp->list[RCLM_EASY]);
 }
 
 /*
@@ -2403,7 +2418,7 @@ static int zoneinfo_show(struct seq_file
 					   "\n              high:  %i"
 					   "\n              batch: %i",
 					   i, j,
-					   pageset->pcp[j].count,
+					   pcp_count(&pageset->pcp[j]),
 					   pageset->pcp[j].high,
 					   pageset->pcp[j].batch);
 			}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/4] Add a configure option for anti-fragmentation
  2006-01-20 11:54 [PATCH 0/4] Reducing fragmentation using lists (sub-zones) v22 Mel Gorman
                   ` (2 preceding siblings ...)
  2006-01-20 11:55 ` [PATCH 3/4] Split the per-cpu " Mel Gorman
@ 2006-01-20 11:55 ` Mel Gorman
  3 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2006-01-20 11:55 UTC (permalink / raw)
  To: linux-mm; +Cc: jschopp, Mel Gorman, linux-kernel, kamezawa.hiroyu, lhms-devel

The anti-fragmentation strategy has memory overhead. This patch allows
the strategy to be disabled for small memory systems or if it is known the
workload is suffering because of the strategy. It also acts to show where
the anti-frag strategy interacts with the standard buddy allocator.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-003_percpu/include/linux/mmzone.h linux-2.6.16-rc1-mm1-004_configurable/include/linux/mmzone.h
--- linux-2.6.16-rc1-mm1-003_percpu/include/linux/mmzone.h	2006-01-19 22:15:16.000000000 +0000
+++ linux-2.6.16-rc1-mm1-004_configurable/include/linux/mmzone.h	2006-01-19 22:27:50.000000000 +0000
@@ -73,10 +73,17 @@ struct per_cpu_pageset {
 #endif
 } ____cacheline_aligned_in_smp;
 
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
 static inline int pcp_count(struct per_cpu_pages *pcp)
 {
 	return pcp->count[RCLM_NORCLM] + pcp->count[RCLM_EASY];
 }
+#else
+static inline int pcp_count(struct per_cpu_pages *pcp)
+{
+	return pcp->count[RCLM_NORCLM];
+}
+#endif /* CONFIG_PAGEALLOC_ANTIFRAG */
 
 #ifdef CONFIG_NUMA
 #define zone_pcp(__z, __cpu) ((__z)->pageset[(__cpu)])
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-003_percpu/init/Kconfig linux-2.6.16-rc1-mm1-004_configurable/init/Kconfig
--- linux-2.6.16-rc1-mm1-003_percpu/init/Kconfig	2006-01-19 11:21:59.000000000 +0000
+++ linux-2.6.16-rc1-mm1-004_configurable/init/Kconfig	2006-01-19 22:27:50.000000000 +0000
@@ -376,6 +376,18 @@ config CC_ALIGN_FUNCTIONS
 	  32-byte boundary only if this can be done by skipping 23 bytes or less.
 	  Zero means use compiler's default.
 
+config PAGEALLOC_ANTIFRAG
+	bool "Avoid fragmentation in the page allocator"
+	def_bool n
+	help
+	  The standard allocator will fragment memory over time which means that
+	  high order allocations will fail even if kswapd is running. If this
+	  option is set, the allocator will try and group page types into
+	  two groups, kernel and easy reclaimable. The gain is a best effort
+	  attempt at lowering fragmentation which a few workloads care about.
+	  The loss is a more complex allocactor that performs slower.
+	  If unsure, say N
+
 config CC_ALIGN_LABELS
 	int "Label alignment" if EMBEDDED
 	default 0
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.16-rc1-mm1-003_percpu/mm/page_alloc.c linux-2.6.16-rc1-mm1-004_configurable/mm/page_alloc.c
--- linux-2.6.16-rc1-mm1-003_percpu/mm/page_alloc.c	2006-01-19 22:26:45.000000000 +0000
+++ linux-2.6.16-rc1-mm1-004_configurable/mm/page_alloc.c	2006-01-19 22:27:50.000000000 +0000
@@ -72,6 +72,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
 
 EXPORT_SYMBOL(totalram_pages);
 
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
 static inline int get_pageblock_type(struct page *page)
 {
 	return (PageEasyRclm(page) != 0);
@@ -81,6 +82,17 @@ static inline int gfpflags_to_alloctype(
 {
 	return ((gfp_flags & __GFP_EASYRCLM) != 0);
 }
+#else
+static inline int get_pageblock_type(struct page *page)
+{
+	return RCLM_NORCLM;
+}
+
+static inline int gfpflags_to_alloctype(unsigned long gfp_flags)
+{
+	return RCLM_NORCLM;
+}
+#endif /* CONFIG_PAGEALLOC_ANTIFRAG */
 
 /*
  * Used by page_zone() to look up the address of the struct zone whose
@@ -563,6 +575,7 @@ static int prep_new_page(struct page *pa
 	return 0;
 }
 
+#ifdef CONFIG_PAGEALLOC_ANTIFRAG
 /* Remove an element from the buddy allocator from the fallback list */
  static struct page *__rmqueue_fallback(struct zone *zone, int order,
 							int alloctype)
@@ -599,6 +612,13 @@ static int prep_new_page(struct page *pa
 
 	return NULL;
 }
+#else
+static struct page *__rmqueue_fallback(struct zone *zone, unsigned int order,
+							int alloctype)
+{
+	return NULL;
+}
+#endif /* CONFIG_PAGEALLOC_ANTIFRAG */
 
 /* 
  * Do the hard work of removing an element from the buddy allocator.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-02-05  9:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-20 11:54 [PATCH 0/4] Reducing fragmentation using lists (sub-zones) v22 Mel Gorman
2006-01-20 11:54 ` [PATCH 1/4] Add __GFP_EASYRCLM flag and update callers Mel Gorman
2006-01-20 11:54 ` [PATCH 2/4] Split the free lists into kernel and user parts Mel Gorman
2006-01-22 13:31   ` Marcelo Tosatti
2006-01-23  9:39     ` Mel Gorman
2006-01-23 19:13       ` Marcelo Tosatti
2006-01-26 15:55         ` Mel Gorman
2006-01-31 19:57           ` Marcelo Tosatti
2006-02-05  8:57   ` Coywolf Qi Hunt
2006-02-05  9:12     ` Coywolf Qi Hunt
2006-01-20 11:55 ` [PATCH 3/4] Split the per-cpu " Mel Gorman
2006-01-20 11:55 ` [PATCH 4/4] Add a configure option for anti-fragmentation Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox