* [PATCH 0/7] Fragmentation Avoidance V16
@ 2005-10-05 14:45 Mel Gorman
  2005-10-05 14:45 ` [PATCH 1/7] Fragmentation Avoidance V16: 001_antidefrag_flags Mel Gorman
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:45 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

Changelog since V14 (V15 not released)
o Update against 2.6.14-rc3
o Resync with Joel's work. All suggestions made on fix-ups to his last
  set of patches should also be in here. e.g. __GFP_USER is still __GFP_USER
  but is better commented.
o Large amount of CodingStyle, readability cleanups and corrections pointed
  out by Dave Hansen.
o Fix CONFIG_NUMA error that corrupted per-cpu lists
o Patches broken out to have one-feature-per-patch rather than
  more-code-per-patch
o Fix fallback bug where pages for RCLM_NORCLM end up on random other
  free lists.

Changelog since V13
o Patches are now broken out
o Added per-cpu draining of userrclm pages
o Brought the patch more in line with memory hotplug work
o Fine-grained use of the __GFP_USER and __GFP_KERNRCLM flags
o Many coding-style corrections
o Many whitespace-damage corrections

Changelog since V12
o Minor whitespace damage fixed as pointed out by Joel Schopp

Changelog since V11
o Mainly a rediff against 2.6.12-rc5
o Use #defines for indexing into pcpu lists
o Fix rounding error in the size of usemap

Changelog since V10
o All allocation types now use per-cpu caches like the standard allocator
o Removed all the additional buddy allocator statistic code
o Eliminated three zone fields that can be lived without
o Simplified some loops
o Removed many unnecessary calculations

Changelog since V9
o Tightened what pools are used for fallbacks, less likely to fragment
o Many micro-optimisations to have the same performance as the standard 
  allocator. Modified allocator now faster than standard allocator using
  gcc 3.3.5
o Add counter for splits/coalescing

Changelog since V8
o rmqueue_bulk() allocates pages in large blocks and breaks them up into the
  requested size. This reduces the number of calls to __rmqueue()
o Beancounters are now a configurable option under "Kernel Hacking"
o Broke out some code into inline functions to be more Hotplug-friendly
o Increased the size of reserve for fallbacks from 10% to 12.5%. 

Changelog since V7
o Updated to 2.6.11-rc4
o Lots of cleanups, mainly related to beancounters
o Fixed up a miscalculation in the bitmap size as pointed out by Mike Kravetz
  (thanks Mike)
o Introduced a 10% reserve for fallbacks. Drastically reduces the number of
  kernnorclm allocations that go to the wrong places
o Don't trigger OOM when large allocations are involved

Changelog since V6
o Updated to 2.6.11-rc2
o Minor change to allow prezeroing to be a cleaner looking patch

Changelog since V5
o Fixed up gcc-2.95 errors
o Fixed up whitespace damage

Changelog since V4
o No changes. Applies cleanly against 2.6.11-rc1 and 2.6.11-rc1-bk6. Applies
  with offsets to 2.6.11-rc1-mm1

Changelog since V3
o inlined get_pageblock_type() and set_pageblock_type()
o set_pageblock_type() now takes a zone parameter to avoid a call to page_zone()
o When taking from the global pool, do not scan all the low-order lists

Changelog since V2
o Do not interfere with the "min" decay
o Update the __GFP_BITS_SHIFT properly. Old value broke fsync and probably
  anything to do with asynchronous IO
  
Changelog since V1
o Update patch to 2.6.11-rc1
o Cleaned up bug where memory was wasted on a large bitmap
o Remove code that needed the binary buddy bitmaps
o Update flags to avoid colliding with __GFP_ZERO changes
o Extended fallback_count bean counters to show the fallback count for each
  allocation type
o In-code documentation

Version 1
o Initial release against 2.6.9

This patch is designed to reduce fragmentation in the standard buddy allocator
without impairing the allocator's performance. High fragmentation in the
standard binary buddy allocator means that high-order allocations can rarely
be serviced. The patch works by dividing allocations into three different
types;

UserReclaimable - These are userspace pages that are easily reclaimable. This
	flag is set when it is known that the pages will be trivially reclaimed
	by writing the page out to swap or syncing with backing storage

KernelReclaimable - These are pages allocated by the kernel that are easily
	reclaimed. This is stuff like inode caches, dcache, buffer_heads etc.
	These types of pages could potentially be reclaimed by dumping the
	caches and reaping the slabs

KernelNonReclaimable - These are pages that are allocated by the kernel that
	are not trivially reclaimed. For example, the memory allocated for a
	loaded module would be in this category. By default, allocations are
	considered to be of this type

Instead of having one global MAX_ORDER-sized array of free lists,
there are four, one for each type of allocation and another reserve for
fallbacks. 
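
As a rough sketch of how the later patches in this series arrange this (the
exact declarations are in patches 3 and 5; RCLM_TYPES only reaches four once
the fallback pool is added), the single free_area array in struct zone becomes
a two-dimensional array indexed by allocation type:

    /* Sketch only -- see patches 3 and 5 for the real declarations */
    #define RCLM_NORCLM   0
    #define RCLM_USER     1
    #define RCLM_KERN     2
    #define RCLM_FALLBACK 3
    #define RCLM_TYPES    4

    struct zone {
    	...
    	/* One set of MAX_ORDER free lists per allocation type */
    	struct free_area	free_area_lists[RCLM_TYPES][MAX_ORDER];
    	...
    };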

Once a 2^MAX_ORDER block of pages is split for a type of allocation, it is
added to the free-lists for that type, in effect reserving it. Hence, over
time, pages of the different types can be clustered together. This means that
if 2^MAX_ORDER number of pages were required, the system could linearly scan
a block of pages allocated for UserReclaimable and page each of them out.

Fallback is used when there are no 2^MAX_ORDER pages available and there
are no free pages of the desired type. The fallback lists were chosen in a
way that keeps the most easily reclaimable pages together.

Three benchmark results are included, all based on a 2.6.14-rc3 kernel
compiled with gcc 3.4 (it is known that gcc 2.95 produces different results).
The first is the output of portions of AIM9 for the vanilla allocator and
the modified one;

(Tests run with bench-aim9.sh from VMRegress 0.14)
2.6.14-rc3
------------------------------------------------------------------------------------------------------------
 Test        Test        Elapsed  Iteration    Iteration          Operation
Number       Name      Time (sec)   Count   Rate (loops/sec)    Rate (ops/sec)
------------------------------------------------------------------------------------------------------------
     1 creat-clo           60.04        963   16.03931        16039.31 File Creations and Closes/second
     2 page_test           60.01       4194   69.88835       118810.20 System Allocations & Pages/second
     3 brk_test            60.03       1573   26.20356       445460.60 System Memory Allocations/second
     4 jmp_test            60.01     251144 4185.03583      4185035.83 Non-local gotos/second
     5 signal_test         60.02       5118   85.27158        85271.58 Signal Traps/second
     6 exec_test           60.03        758   12.62702           63.14 Program Loads/second
     7 fork_test           60.03        820   13.65984         1365.98 Task Creations/second
     8 link_test           60.02       5326   88.73709         5590.44 Link/Unlink Pairs/second

2.6.14-rc3-mbuddy-v16
------------------------------------------------------------------------------------------------------------
 Test        Test        Elapsed  Iteration    Iteration          Operation
Number       Name      Time (sec)   Count   Rate (loops/sec)    Rate (ops/sec)
------------------------------------------------------------------------------------------------------------
     1 creat-clo           60.06        959   15.96737        15967.37 File Creations and Closes/second
     2 page_test           60.02       4067   67.76075       115193.27 System Allocations & Pages/second
     3 brk_test            60.01       1578   26.29562       447025.50 System Memory Allocations/second
     4 jmp_test            60.01     251498 4190.93484      4190934.84 Non-local gotos/second
     5 signal_test         60.01       5127   85.43576        85435.76 Signal Traps/second
     6 exec_test           60.03        753   12.54373           62.72 Program Loads/second
     7 fork_test           60.02        802   13.36221         1336.22 Task Creations/second
     8 link_test           60.01       5307   88.43526         5571.42 Link/Unlink Pairs/second

Difference in performance operations report generated by diff-aim9.sh
                   Clean   mbuddy-v16
                ---------- ----------
 1 creat-clo      16039.31   15967.37     -71.94 -0.45% File Creations and Closes/second
 2 page_test     118810.20  115193.27   -3616.93 -3.04% System Allocations & Pages/second
 3 brk_test      445460.60  447025.50    1564.90  0.35% System Memory Allocations/second
 4 jmp_test     4185035.83 4190934.84    5899.01  0.14% Non-local gotos/second
 5 signal_test    85271.58   85435.76     164.18  0.19% Signal Traps/second
 6 exec_test         63.14      62.72      -0.42 -0.67% Program Loads/second
 7 fork_test       1365.98    1336.22     -29.76 -2.18% Task Creations/second
 8 link_test       5590.44    5571.42     -19.02 -0.34% Link/Unlink Pairs/second

In this test, there were regressions in the page_test. However, it is known
that different kernel configurations, compilers and even different runs show
similar variances of +/- 3%, so I do not consider it significant.

The second benchmark tested CPU cache usage to make sure it was not getting
clobbered. The test was to render a large postscript file 10 times and take
the average. The results are;

2.6.13-clean:      Average: 42.725 real, 42.626 user, 0.041 sys
2.6.13-mbuddy-v14: Average: 42.793 real, 42.695 user, 0.034 sys

So there are no adverse cache effects. The last test is to show that the
allocator can satisfy more high-order allocations, especially under load,
than the standard allocator. The test performs the following;

1. Start updatedb running in the background
2. Load a kernel module that tries to allocate high-order blocks on demand
3. Clean a kernel tree
4. Make 4 copies of the tree. As each copy finishes, a compile starts at -j4
5. Start compiling the primary tree
6. Sleep 3 minutes while the 5 trees are being compiled
7. Use the kernel module to attempt 160 times to allocate a 2^10 block of pages
    - note, it only attempts 160 times, no matter how often it succeeds
    - An allocation is attempted every 1/10th of a second
    - Performance will get badly shot as it forces considerable amounts of pageout
    - A sketch of such an allocation loop follows this list
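
The test module itself comes from VMRegress and is not part of this series;
purely as an illustration of the loop described in step 7 (whether the real
module holds or frees the pages between attempts is a detail of that module),
it has roughly this shape:

    #include <linux/module.h>
    #include <linux/mm.h>
    #include <linux/delay.h>

    static int __init highorder_test_init(void)
    {
    	int attempt, success = 0;

    	for (attempt = 0; attempt < 160; attempt++) {
    		struct page *page = alloc_pages(GFP_HIGHUSER, 10);

    		if (page) {
    			success++;
    			printk(KERN_INFO "order-10 alloc ok in zone %s\n",
    			       page_zone(page)->name);
    			__free_pages(page, 10);
    		}
    		msleep(100);		/* one attempt every 1/10th second */
    	}
    	printk(KERN_INFO "%d/160 order-10 allocations succeeded\n", success);
    	return 0;
    }

    static void __exit highorder_test_exit(void)
    {
    }

    module_init(highorder_test_init);
    module_exit(highorder_test_exit);
    MODULE_LICENSE("GPL");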

The results of the allocations under load (load average of 18) were;

2.6.14-rc3 Clean
Order:                 10
Allocation type:       HighMem
Attempted allocations: 160
Success allocs:        46
Failed allocs:         114
DMA zone allocs:       1
Normal zone allocs:    32
HighMem zone allocs:   13
% Success:            28

2.6.14-rc3 MBuddy V14
Order:                 10
Allocation type:       HighMem
Attempted allocations: 160
Success allocs:        97
Failed allocs:         63
DMA zone allocs:       1
Normal zone allocs:    89
HighMem zone allocs:   7
% Success:            60


One thing that had to be changed for the 2.6.14-rc3-clean test was to disable
the OOM killer. During one run, the clean kernel had better results but invoked
the OOM killer 457 times to achieve them. The kernel with the placement policy
never invoked the OOM killer.

With the system at rest after the test and the kernel trees deleted, the
standard allocator was able to allocate 54 order-10 blocks of pages and the
modified allocator allocated 159.

The results show that the modified allocator has comparable speed, has no
adverse cache effects, but is far less fragmented and better able to satisfy
high-order allocations.
-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* [PATCH 1/7] Fragmentation Avoidance V16: 001_antidefrag_flags
  2005-10-05 14:45 [PATCH 0/7] Fragmentation Avoidance V16 Mel Gorman
@ 2005-10-05 14:45 ` Mel Gorman
  2005-10-06 15:11   ` Paul Jackson
  2005-10-05 14:45 ` [PATCH 2/7] Fragmentation Avoidance V16: 002_usemap Mel Gorman
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:45 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

This patch adds two flags, __GFP_USER and __GFP_KERNRCLM, that are used to
track the type of allocation the caller is making. Allocations using the
__GFP_USER flag are expected to be easily reclaimed by syncing with backing
storage (be it a file or swap) or by cleaning the buffers and discarding.
Allocations using the __GFP_KERNRCLM flag belong to slab caches that can be
shrunk by the kernel.
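
In practice, the call-site changes in this patch reduce to three patterns; the
first two are lifted from the hunks below and the third shows the untagged
default:

    /* user-reclaimable: page cache and anonymous pages */
    page = alloc_page(GFP_HIGHUSER | __GFP_USER);

    /* kernel-reclaimable: shrinkable slab caches such as the dcache */
    dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL | __GFP_KERNRCLM);

    /* neither flag set: treated as kernel non-reclaimable, the default */
    buf = kmalloc(size, GFP_KERNEL);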

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/buffer.c linux-2.6.14-rc3-001_antidefrag_flags/fs/buffer.c
--- linux-2.6.14-rc3-clean/fs/buffer.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/buffer.c	2005-10-05 12:13:19.000000000 +0100
@@ -1119,7 +1119,12 @@ grow_dev_page(struct block_device *bdev,
 	struct page *page;
 	struct buffer_head *bh;
 
-	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
+	/*
+	 * Mark as __GFP_USER because from a fragmentation avoidance and
+	 * reclamation point of view this memory behaves like user memory.
+	 */
+	page = find_or_create_page(inode->i_mapping, index,
+				   GFP_NOFS | __GFP_USER);
 	if (!page)
 		return NULL;
 
@@ -3047,7 +3052,8 @@ static void recalc_bh_state(void)
 	
 struct buffer_head *alloc_buffer_head(unsigned int __nocast gfp_flags)
 {
-	struct buffer_head *ret = kmem_cache_alloc(bh_cachep, gfp_flags);
+	struct buffer_head *ret = kmem_cache_alloc(bh_cachep,
+						   gfp_flags|__GFP_KERNRCLM);
 	if (ret) {
 		get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/compat.c linux-2.6.14-rc3-001_antidefrag_flags/fs/compat.c
--- linux-2.6.14-rc3-clean/fs/compat.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/compat.c	2005-10-05 12:13:19.000000000 +0100
@@ -1352,7 +1352,7 @@ static int compat_copy_strings(int argc,
 			page = bprm->page[i];
 			new = 0;
 			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
+				page = alloc_page(GFP_HIGHUSER|__GFP_USER);
 				bprm->page[i] = page;
 				if (!page) {
 					ret = -ENOMEM;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/dcache.c linux-2.6.14-rc3-001_antidefrag_flags/fs/dcache.c
--- linux-2.6.14-rc3-clean/fs/dcache.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/dcache.c	2005-10-05 12:13:19.000000000 +0100
@@ -714,7 +714,7 @@ struct dentry *d_alloc(struct dentry * p
 	struct dentry *dentry;
 	char *dname;
 
-	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL); 
+	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL|__GFP_KERNRCLM);
 	if (!dentry)
 		return NULL;
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/exec.c linux-2.6.14-rc3-001_antidefrag_flags/fs/exec.c
--- linux-2.6.14-rc3-clean/fs/exec.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/exec.c	2005-10-05 12:13:19.000000000 +0100
@@ -237,7 +237,7 @@ static int copy_strings(int argc, char _
 			page = bprm->page[i];
 			new = 0;
 			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
+				page = alloc_page(GFP_HIGHUSER|__GFP_USER);
 				bprm->page[i] = page;
 				if (!page) {
 					ret = -ENOMEM;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/ext2/super.c linux-2.6.14-rc3-001_antidefrag_flags/fs/ext2/super.c
--- linux-2.6.14-rc3-clean/fs/ext2/super.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/ext2/super.c	2005-10-05 12:13:19.000000000 +0100
@@ -141,7 +141,8 @@ static kmem_cache_t * ext2_inode_cachep;
 static struct inode *ext2_alloc_inode(struct super_block *sb)
 {
 	struct ext2_inode_info *ei;
-	ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep, SLAB_KERNEL);
+	ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep,
+						SLAB_KERNEL|__GFP_KERNRCLM);
 	if (!ei)
 		return NULL;
 #ifdef CONFIG_EXT2_FS_POSIX_ACL
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/ext3/super.c linux-2.6.14-rc3-001_antidefrag_flags/fs/ext3/super.c
--- linux-2.6.14-rc3-clean/fs/ext3/super.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/ext3/super.c	2005-10-05 12:13:19.000000000 +0100
@@ -441,7 +441,7 @@ static struct inode *ext3_alloc_inode(st
 {
 	struct ext3_inode_info *ei;
 
-	ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS);
+	ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS|__GFP_KERNRCLM);
 	if (!ei)
 		return NULL;
 #ifdef CONFIG_EXT3_FS_POSIX_ACL
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/inode.c linux-2.6.14-rc3-001_antidefrag_flags/fs/inode.c
--- linux-2.6.14-rc3-clean/fs/inode.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/inode.c	2005-10-05 12:13:19.000000000 +0100
@@ -146,7 +146,7 @@ static struct inode *alloc_inode(struct 
 		mapping->a_ops = &empty_aops;
  		mapping->host = inode;
 		mapping->flags = 0;
-		mapping_set_gfp_mask(mapping, GFP_HIGHUSER);
+		mapping_set_gfp_mask(mapping, GFP_HIGHUSER|__GFP_USER);
 		mapping->assoc_mapping = NULL;
 		mapping->backing_dev_info = &default_backing_dev_info;
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/fs/ntfs/inode.c linux-2.6.14-rc3-001_antidefrag_flags/fs/ntfs/inode.c
--- linux-2.6.14-rc3-clean/fs/ntfs/inode.c	2005-10-04 22:58:33.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/fs/ntfs/inode.c	2005-10-05 12:13:19.000000000 +0100
@@ -317,7 +317,7 @@ struct inode *ntfs_alloc_big_inode(struc
 	ntfs_inode *ni;
 
 	ntfs_debug("Entering.");
-	ni = kmem_cache_alloc(ntfs_big_inode_cache, SLAB_NOFS);
+	ni = kmem_cache_alloc(ntfs_big_inode_cache, SLAB_NOFS|__GFP_KERNRCLM);
 	if (likely(ni != NULL)) {
 		ni->state = 0;
 		return VFS_I(ni);
@@ -342,7 +342,7 @@ static inline ntfs_inode *ntfs_alloc_ext
 	ntfs_inode *ni;
 
 	ntfs_debug("Entering.");
-	ni = kmem_cache_alloc(ntfs_inode_cache, SLAB_NOFS);
+	ni = kmem_cache_alloc(ntfs_inode_cache, SLAB_NOFS|__GFP_KERNRCLM);
 	if (likely(ni != NULL)) {
 		ni->state = 0;
 		return ni;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/include/asm-i386/page.h linux-2.6.14-rc3-001_antidefrag_flags/include/asm-i386/page.h
--- linux-2.6.14-rc3-clean/include/asm-i386/page.h	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/include/asm-i386/page.h	2005-10-05 12:13:19.000000000 +0100
@@ -36,7 +36,7 @@
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | __GFP_USER, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 /*
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/include/linux/gfp.h linux-2.6.14-rc3-001_antidefrag_flags/include/linux/gfp.h
--- linux-2.6.14-rc3-clean/include/linux/gfp.h	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/include/linux/gfp.h	2005-10-05 12:13:19.000000000 +0100
@@ -42,14 +42,20 @@ struct vm_area_struct;
 #define __GFP_NORECLAIM  0x20000u /* No realy zone reclaim during allocation */
 #define __GFP_HARDWALL   0x40000u /* Enforce hardwall cpuset memory allocs */
 
-#define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
+/* Allocation type modifiers, group together if possible */
+#define __GFP_USER       0x80000u  /* User and other easily reclaimed pages */
+#define __GFP_KERNRCLM   0x100000u /* Kernel page that is reclaimable */
+#define __GFP_RCLM_BITS  (__GFP_USER|__GFP_KERNRCLM)
+
+#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
 
 /* if you forget to add the bitmask here kernel will crash, period */
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-			__GFP_NOMEMALLOC|__GFP_NORECLAIM|__GFP_HARDWALL)
+			__GFP_NOMEMALLOC|__GFP_NORECLAIM|__GFP_HARDWALL | \
+			__GFP_USER | __GFP_KERNRCLM)
 
 #define GFP_ATOMIC	(__GFP_HIGH)
 #define GFP_NOIO	(__GFP_WAIT)
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/include/linux/highmem.h linux-2.6.14-rc3-001_antidefrag_flags/include/linux/highmem.h
--- linux-2.6.14-rc3-clean/include/linux/highmem.h	2005-08-29 00:41:01.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/include/linux/highmem.h	2005-10-05 12:13:19.000000000 +0100
@@ -47,7 +47,7 @@ static inline void clear_user_highpage(s
 static inline struct page *
 alloc_zeroed_user_highpage(struct vm_area_struct *vma, unsigned long vaddr)
 {
-	struct page *page = alloc_page_vma(GFP_HIGHUSER, vma, vaddr);
+	struct page *page = alloc_page_vma(GFP_HIGHUSER|__GFP_USER, vma, vaddr);
 
 	if (page)
 		clear_user_highpage(page, vaddr);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/mm/memory.c linux-2.6.14-rc3-001_antidefrag_flags/mm/memory.c
--- linux-2.6.14-rc3-clean/mm/memory.c	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/mm/memory.c	2005-10-05 12:13:19.000000000 +0100
@@ -1296,7 +1296,8 @@ static int do_wp_page(struct mm_struct *
 		if (!new_page)
 			goto no_new_page;
 	} else {
-		new_page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+		new_page = alloc_page_vma(GFP_HIGHUSER|__GFP_USER,
+							vma, address);
 		if (!new_page)
 			goto no_new_page;
 		copy_user_highpage(new_page, old_page, address);
@@ -1871,7 +1872,7 @@ retry:
 
 		if (unlikely(anon_vma_prepare(vma)))
 			goto oom;
-		page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+		page = alloc_page_vma(GFP_HIGHUSER | __GFP_USER, vma, address);
 		if (!page)
 			goto oom;
 		copy_user_highpage(page, new_page, address);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/mm/shmem.c linux-2.6.14-rc3-001_antidefrag_flags/mm/shmem.c
--- linux-2.6.14-rc3-clean/mm/shmem.c	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/mm/shmem.c	2005-10-05 12:13:19.000000000 +0100
@@ -908,7 +908,7 @@ shmem_alloc_page(unsigned long gfp, stru
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx);
 	pvma.vm_pgoff = idx;
 	pvma.vm_end = PAGE_SIZE;
-	page = alloc_page_vma(gfp | __GFP_ZERO, &pvma, 0);
+	page = alloc_page_vma(gfp | __GFP_ZERO | __GFP_USER, &pvma, 0);
 	mpol_free(pvma.vm_policy);
 	return page;
 }
@@ -924,7 +924,7 @@ static inline struct page *
 shmem_alloc_page(unsigned int __nocast gfp,struct shmem_inode_info *info,
 				 unsigned long idx)
 {
-	return alloc_page(gfp | __GFP_ZERO);
+	return alloc_page(gfp | __GFP_ZERO | __GFP_USER);
 }
 #endif
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-clean/mm/swap_state.c linux-2.6.14-rc3-001_antidefrag_flags/mm/swap_state.c
--- linux-2.6.14-rc3-clean/mm/swap_state.c	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-001_antidefrag_flags/mm/swap_state.c	2005-10-05 12:13:19.000000000 +0100
@@ -335,7 +335,8 @@ struct page *read_swap_cache_async(swp_e
 		 * Get a new page to read into from swap.
 		 */
 		if (!new_page) {
-			new_page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+			new_page = alloc_page_vma(GFP_HIGHUSER | __GFP_USER,
+							vma, addr);
 			if (!new_page)
 				break;		/* Out of memory */
 		}


* [PATCH 2/7] Fragmentation Avoidance V16: 002_usemap
  2005-10-05 14:45 [PATCH 0/7] Fragmentation Avoidance V16 Mel Gorman
  2005-10-05 14:45 ` [PATCH 1/7] Fragmentation Avoidance V16: 001_antidefrag_flags Mel Gorman
@ 2005-10-05 14:45 ` Mel Gorman
  2005-10-05 23:38   ` David S. Miller, Mel Gorman
  2005-10-05 14:46 ` [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore Mel Gorman
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:45 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

This patch adds a "usemap" to the allocator. When a PAGES_PER_MAXORDER block
of pages (i.e. 2^(MAX_ORDER-1)) is split, the usemap records the type of
allocation that caused the split. This information is used in a later
anti-fragmentation patch to group related allocation types together.

The __GFP_USER and __GFP_KERNRCLM bits are used to enumerate three allocation
types;

RCLM_NORCLM:	These are kernel allocations that cannot be reclaimed
		on demand.
RCLM_USER:	These are pages allocated with the __GFP_USER flag set. They are
		considered to be user and other easily reclaimed pages such
		as buffers
RCLM_KERN:	Allocated for the kernel but for caches that can be reclaimed
		on demand.

gfpflags_to_rclmtype() converts gfp_flags to their corresponding RCLM_TYPE
by masking out irrelevant bits and shifting the result right by RCLM_SHIFT.
Compile-time checks are made on RCLM_SHIFT to ensure gfpflags_to_rclmtype()
keeps working. ffz() could be used to avoid static checks, but it would be
runtime overhead for a compile-time constant.
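
As a worked example using the values defined in this patch (__GFP_USER is
0x80000, __GFP_KERNRCLM is 0x100000 and RCLM_SHIFT is 19):

    (__GFP_USER     & __GFP_RCLM_BITS) >> 19 = 0x080000 >> 19 = 1 = RCLM_USER
    (__GFP_KERNRCLM & __GFP_RCLM_BITS) >> 19 = 0x100000 >> 19 = 2 = RCLM_KERN
    (neither bit set)                        = 0x000000 >> 19 = 0 = RCLM_NORCLM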

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-001_antidefrag_flags/include/linux/mm.h linux-2.6.14-rc3-002_usemap/include/linux/mm.h
--- linux-2.6.14-rc3-001_antidefrag_flags/include/linux/mm.h	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-002_usemap/include/linux/mm.h	2005-10-05 12:14:02.000000000 +0100
@@ -522,6 +522,13 @@ static inline void set_page_links(struct
 extern struct page *mem_map;
 #endif
 
+/*
+ * Return what type of page is being allocated from this 2^MAX_ORDER-1 block
+ * of pages.
+ */
+extern unsigned int get_pageblock_type(struct zone *zone,
+						struct page *page);
+
 static inline void *lowmem_page_address(struct page *page)
 {
 	return __va(page_to_pfn(page) << PAGE_SHIFT);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-001_antidefrag_flags/include/linux/mmzone.h linux-2.6.14-rc3-002_usemap/include/linux/mmzone.h
--- linux-2.6.14-rc3-001_antidefrag_flags/include/linux/mmzone.h	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-002_usemap/include/linux/mmzone.h	2005-10-05 12:14:02.000000000 +0100
@@ -20,6 +20,17 @@
 #else
 #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
 #endif
+#define PAGES_PER_MAXORDER (1 << (MAX_ORDER-1))
+
+/*
+ * The two bit field __GFP_RECLAIMBITS enumerates the following 4 kinds of
+ * page reclaimability.
+ */
+#define RCLM_NORCLM   0
+#define RCLM_USER     1
+#define RCLM_KERN     2
+#define RCLM_TYPES    3
+#define BITS_PER_RCLM_TYPE 2
 
 struct free_area {
 	struct list_head	free_list;
@@ -139,6 +150,13 @@ struct zone {
 	spinlock_t		lock;
 	struct free_area	free_area[MAX_ORDER];
 
+#ifndef CONFIG_SPARSEMEM
+	/*
+	 * The map tracks what each 2^MAX_ORDER-1 sized block is being used for.
+	 * Each 2^MAX_ORDER block of pages uses BITS_PER_RCLM_TYPE bits
+	 */
+	unsigned long		*free_area_usemap;
+#endif
 
 	ZONE_PADDING(_pad1_)
 
@@ -473,6 +491,15 @@ extern struct pglist_data contig_page_da
 #if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
+#if ((SECTION_SIZE_BITS - MAX_ORDER) * BITS_PER_RCLM_TYPE) > 64
+#error free_area_usemap is not big enough
+#endif
+
+/* Usemap initialisation */
+#ifdef CONFIG_SPARSEMEM
+static inline void setup_usemap(struct pglist_data *pgdat,
+				struct zone *zone, unsigned long zonesize) {}
+#endif /* CONFIG_SPARSEMEM */
 
 struct page;
 struct mem_section {
@@ -485,6 +512,7 @@ struct mem_section {
 	 * before using it wrong.
 	 */
 	unsigned long section_mem_map;
+	DECLARE_BITMAP(free_area_usemap,64);
 };
 
 #ifdef CONFIG_SPARSEMEM_EXTREME
@@ -552,6 +580,17 @@ static inline struct mem_section *__pfn_
 	return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
+static inline unsigned long *pfn_to_usemap(struct zone *zone, unsigned long pfn)
+{
+	return &__pfn_to_section(pfn)->free_area_usemap[0];
+}
+
+static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn)
+{
+	pfn &= (PAGES_PER_SECTION-1);
+	return (int)((pfn >> (MAX_ORDER-1)) * BITS_PER_RCLM_TYPE);
+}
+
 #define pfn_to_page(pfn) 						\
 ({ 									\
 	unsigned long __pfn = (pfn);					\
@@ -589,6 +628,16 @@ void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
+static inline unsigned long *pfn_to_usemap(struct zone *zone, 
+						unsigned long pfn)
+{
+	return (zone->free_area_usemap);
+}
+static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn)
+{
+	pfn = pfn - zone->zone_start_pfn;
+	return (int)((pfn >> (MAX_ORDER-1)) * BITS_PER_RCLM_TYPE);
+}
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-001_antidefrag_flags/mm/page_alloc.c linux-2.6.14-rc3-002_usemap/mm/page_alloc.c
--- linux-2.6.14-rc3-001_antidefrag_flags/mm/page_alloc.c	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-002_usemap/mm/page_alloc.c	2005-10-05 12:14:02.000000000 +0100
@@ -66,6 +66,90 @@ EXPORT_SYMBOL(totalram_pages);
 EXPORT_SYMBOL(nr_swap_pages);
 
 /*
+ * RCLM_SHIFT is the number of bits that a gfp_mask has to be shifted right
+ * to have just the __GFP_USER and __GFP_KERNRCLM bits. The static check is
+ * made afterwards in case the GFP flags are not updated without updating 
+ * this number
+ */
+#define RCLM_SHIFT 19
+#if (__GFP_USER >> RCLM_SHIFT) != RCLM_USER
+#error __GFP_USER not mapping to RCLM_USER
+#endif
+#if (__GFP_KERNRCLM >> RCLM_SHIFT) != RCLM_KERN
+#error __GFP_KERNRCLM not mapping to RCLM_KERN
+#endif
+
+/*
+ * This function maps gfpflags to their RCLM_TYPE. It makes assumptions
+ * on the location of the GFP flags
+ */
+static inline int gfpflags_to_rclmtype(unsigned long gfp_flags) {
+	return (gfp_flags & __GFP_RCLM_BITS) >> RCLM_SHIFT;
+}
+
+/*
+ * copy_bits - Copy bits between bitmaps
+ * @dstaddr: The destination bitmap to copy to
+ * @srcaddr: The source bitmap to copy from
+ * @sindex_dst: The start bit index within the destination map to copy to
+ * @sindex_src: The start bit index within the source map to copy from
+ * @nr: The number of bits to copy
+ */
+static inline void copy_bits(unsigned long *dstaddr,
+		unsigned long *srcaddr, 
+		int sindex_dst, 
+		int sindex_src,
+		int nr)
+{
+	/*
+	 * Written like this to take advantage of arch-specific 
+	 * set_bit() and clear_bit() functions
+	 */
+	for (nr = nr-1; nr >= 0; nr--) {
+		int bit = test_bit(sindex_src + nr, srcaddr);
+		if (bit)
+			set_bit(sindex_dst + nr, dstaddr);
+		else
+			clear_bit(sindex_dst + nr, dstaddr);
+	}
+}
+
+unsigned int get_pageblock_type(struct zone *zone,
+						struct page *page)
+{
+	unsigned long pfn = page_to_pfn(page);
+	int bitidx;
+	unsigned int type = 0;
+	unsigned long *usemap;
+
+	bitidx = pfn_to_bitidx(zone, pfn);
+	usemap = pfn_to_usemap(zone, pfn);
+
+	copy_bits((unsigned long *)&type, usemap, 
+			0, bitidx, BITS_PER_RCLM_TYPE);
+
+	return type;
+}
+
+/*
+ * Reserve a block of pages for an allocation type & enforce function
+ * being changed if more bits are added to keep track of additional types
+ */
+static inline void set_pageblock_type(struct zone *zone, struct page *page,
+					int type)
+{
+	unsigned long pfn = page_to_pfn(page);
+	int bitidx;
+	unsigned long *usemap;
+
+	bitidx = pfn_to_bitidx(zone, pfn);
+	usemap = pfn_to_usemap(zone, pfn);
+
+	copy_bits(usemap, (unsigned long *)&type, 
+			bitidx, 0, BITS_PER_RCLM_TYPE);
+}
+
+/*
  * Used by page_zone() to look up the address of the struct zone whose
  * id is encoded in the upper bits of page->flags
  */
@@ -470,7 +554,8 @@ static void prep_new_page(struct page *p
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
  */
-static struct page *__rmqueue(struct zone *zone, unsigned int order)
+static struct page *__rmqueue(struct zone *zone, unsigned int order, 
+		int alloctype)
 {
 	struct free_area * area;
 	unsigned int current_order;
@@ -486,6 +571,14 @@ static struct page *__rmqueue(struct zon
 		rmv_page_order(page);
 		area->nr_free--;
 		zone->free_pages -= 1UL << order;
+
+		/*
+		 * If splitting a large block, record what the block is being
+		 * used for in the usemap
+		 */
+		if (current_order == MAX_ORDER-1)
+			set_pageblock_type(zone, page, alloctype);
+
 		return expand(zone, page, order, current_order, area);
 	}
 
@@ -498,7 +591,8 @@ static struct page *__rmqueue(struct zon
  * Returns the number of new pages which were placed at *list.
  */
 static int rmqueue_bulk(struct zone *zone, unsigned int order, 
-			unsigned long count, struct list_head *list)
+			unsigned long count, struct list_head *list,
+			int alloctype)
 {
 	unsigned long flags;
 	int i;
@@ -507,7 +601,7 @@ static int rmqueue_bulk(struct zone *zon
 	
 	spin_lock_irqsave(&zone->lock, flags);
 	for (i = 0; i < count; ++i) {
-		page = __rmqueue(zone, order);
+		page = __rmqueue(zone, order, alloctype);
 		if (page == NULL)
 			break;
 		allocated++;
@@ -691,7 +785,8 @@ buffered_rmqueue(struct zone *zone, int 
 	unsigned long flags;
 	struct page *page = NULL;
 	int cold = !!(gfp_flags & __GFP_COLD);
-
+	int alloctype = gfpflags_to_rclmtype(gfp_flags);
+	
 	if (order == 0) {
 		struct per_cpu_pages *pcp;
 
@@ -699,7 +794,8 @@ buffered_rmqueue(struct zone *zone, int 
 		local_irq_save(flags);
 		if (pcp->count <= pcp->low)
 			pcp->count += rmqueue_bulk(zone, 0,
-						pcp->batch, &pcp->list);
+						pcp->batch, &pcp->list,
+						alloctype);
 		if (pcp->count) {
 			page = list_entry(pcp->list.next, struct page, lru);
 			list_del(&page->lru);
@@ -711,7 +807,7 @@ buffered_rmqueue(struct zone *zone, int 
 
 	if (page == NULL) {
 		spin_lock_irqsave(&zone->lock, flags);
-		page = __rmqueue(zone, order);
+		page = __rmqueue(zone, order, alloctype);
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 
@@ -1870,6 +1966,36 @@ void __init setup_per_cpu_pageset()
 
 #endif
 
+#ifndef CONFIG_SPARSEMEM
+#define roundup(x, y) ((((x)+((y)-1))/(y))*(y))
+/*
+ * Calculate the size of the zone->usemap in bytes rounded to an unsigned long
+ * Start by making sure zonesize is a multiple of MAX_ORDER-1 by rounding up
+ * Then figure 1 RCLM_TYPE worth of bits per MAX_ORDER-1, finally round up
+ * what is now in bits to nearest long in bits, then return it in bytes.
+ */
+static unsigned long __init usemap_size(unsigned long zonesize)
+{
+	unsigned long usemapsize;
+
+	usemapsize = roundup(zonesize, PAGES_PER_MAXORDER);
+	usemapsize = usemapsize >> (MAX_ORDER-1);
+	usemapsize *= BITS_PER_RCLM_TYPE;
+	usemapsize = roundup(usemapsize, 8 * sizeof(unsigned long));
+
+	return usemapsize / 8;
+}
+
+static void __init setup_usemap(struct pglist_data *pgdat,
+				struct zone *zone, unsigned long zonesize)
+{
+	unsigned long usemapsize = usemap_size(zonesize);
+	zone->free_area_usemap = alloc_bootmem_node(pgdat, usemapsize);
+	memset(zone->free_area_usemap, RCLM_NORCLM, usemapsize);
+}
+#endif /* CONFIG_SPARSEMEM */
+
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -1955,6 +2081,7 @@ static void __init free_area_init_core(s
 		memmap_init(size, nid, j, zone_start_pfn);
 
 		zonetable_add(zone, nid, j, zone_start_pfn, size);
+		setup_usemap(pgdat, zone, size);
 
 		zone_start_pfn += size;
 


* [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore
  2005-10-05 14:45 [PATCH 0/7] Fragmentation Avoidance V16 Mel Gorman
  2005-10-05 14:45 ` [PATCH 1/7] Fragmentation Avoidance V16: 001_antidefrag_flags Mel Gorman
  2005-10-05 14:45 ` [PATCH 2/7] Fragmentation Avoidance V16: 002_usemap Mel Gorman
@ 2005-10-05 14:46 ` Mel Gorman
  2005-10-05 16:48   ` Dave Hansen
  2005-10-05 14:46 ` [PATCH 4/7] Fragmentation Avoidance V16: 004_largealloc_tryharder Mel Gorman
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:46 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

This patch adds the core of the anti-fragmentation strategy. It works by
grouping related allocation types together. The idea is that large groups of
pages that may be reclaimed are placed near each other. The single
zone->free_area array is replaced by one array of free lists per RCLM_TYPE.
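
Condensed from the hunks below, the effect on the allocation and free paths is
roughly the following (error handling and locking omitted):

    /* free path: route the page back to the lists its block was split for */
    type = get_pageblock_type(zone, page);
    list_add_tail(&page->lru, &zone->free_area_lists[type][order].free_list);

    /* alloc path: search only the lists for the requested allocation type */
    area = &zone->free_area_lists[alloctype][current_order];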

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-002_usemap/include/linux/mmzone.h linux-2.6.14-rc3-003_fragcore/include/linux/mmzone.h
--- linux-2.6.14-rc3-002_usemap/include/linux/mmzone.h	2005-10-05 12:14:02.000000000 +0100
+++ linux-2.6.14-rc3-003_fragcore/include/linux/mmzone.h	2005-10-05 12:14:44.000000000 +0100
@@ -148,7 +148,6 @@ struct zone {
 	 * free areas of different sizes
 	 */
 	spinlock_t		lock;
-	struct free_area	free_area[MAX_ORDER];
 
 #ifndef CONFIG_SPARSEMEM
 	/*
@@ -158,6 +157,8 @@ struct zone {
 	unsigned long		*free_area_usemap;
 #endif
 
+	struct free_area	free_area_lists[RCLM_TYPES][MAX_ORDER];
+
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-002_usemap/mm/page_alloc.c linux-2.6.14-rc3-003_fragcore/mm/page_alloc.c
--- linux-2.6.14-rc3-002_usemap/mm/page_alloc.c	2005-10-05 12:14:02.000000000 +0100
+++ linux-2.6.14-rc3-003_fragcore/mm/page_alloc.c	2005-10-05 12:14:44.000000000 +0100
@@ -316,6 +316,16 @@ __find_combined_index(unsigned long page
 }
 
 /*
+ * Return the free list for a given page within a zone
+ */
+static inline struct free_area *__page_find_freelist(struct zone *zone,
+						struct page *page)
+{
+	int type = get_pageblock_type(zone, page);
+	return zone->free_area_lists[type];
+}
+
+/*
  * This function checks whether a page is free && is the buddy
  * we can do coalesce a page and its buddy if
  * (a) the buddy is free &&
@@ -363,6 +373,8 @@ static inline void __free_pages_bulk (st
 {
 	unsigned long page_idx;
 	int order_size = 1 << order;
+	struct free_area *area;
+	struct free_area *freelist;
 
 	if (unlikely(order))
 		destroy_compound_page(page, order);
@@ -372,10 +384,11 @@ static inline void __free_pages_bulk (st
 	BUG_ON(page_idx & (order_size - 1));
 	BUG_ON(bad_range(zone, page));
 
+	freelist = __page_find_freelist(zone, page);
+
 	zone->free_pages += order_size;
 	while (order < MAX_ORDER-1) {
 		unsigned long combined_idx;
-		struct free_area *area;
 		struct page *buddy;
 
 		combined_idx = __find_combined_index(page_idx, order);
@@ -386,7 +399,7 @@ static inline void __free_pages_bulk (st
 		if (!page_is_buddy(buddy, order))
 			break;		/* Move the buddy up one level. */
 		list_del(&buddy->lru);
-		area = zone->free_area + order;
+		area = &(freelist[order]);
 		area->nr_free--;
 		rmv_page_order(buddy);
 		page = page + (combined_idx - page_idx);
@@ -394,8 +407,8 @@ static inline void __free_pages_bulk (st
 		order++;
 	}
 	set_page_order(page, order);
-	list_add(&page->lru, &zone->free_area[order].free_list);
-	zone->free_area[order].nr_free++;
+	list_add_tail(&page->lru, &freelist[order].free_list);
+	freelist[order].nr_free++;
 }
 
 static inline void free_pages_check(const char *function, struct page *page)
@@ -550,6 +563,47 @@ static void prep_new_page(struct page *p
 	kernel_map_pages(page, 1 << order, 1);
 }
 
+/*
+ * Find a list that has a 2^MAX_ORDER-1 block of pages available and
+ * return it
+ */
+struct page* steal_largepage(struct zone *zone, int alloctype)
+{
+	struct page *page;
+	struct free_area *area = NULL;
+	int i=0;
+
+	for(i = 0; i < RCLM_TYPES; i++) {
+		if(i == alloctype)
+			continue;
+
+		area = &zone->free_area_lists[i][MAX_ORDER-1];
+		if (!list_empty(&area->free_list))
+			break;
+	}
+	if (i == RCLM_TYPES) 
+		return NULL;
+
+	page = list_entry(area->free_list.next, struct page, lru);
+	area->nr_free--;
+
+	set_pageblock_type(zone, page, alloctype);
+
+	return page;
+}
+
+
+static inline struct page *
+remove_page(struct zone *zone, struct page *page, unsigned int order,
+		unsigned int current_order, struct free_area *area)
+{
+	list_del(&page->lru);
+	rmv_page_order(page);
+	zone->free_pages -= 1UL << order;
+	return expand(zone, page, order, current_order, area);
+
+}
+
 /* 
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -557,31 +611,25 @@ static void prep_new_page(struct page *p
 static struct page *__rmqueue(struct zone *zone, unsigned int order, 
 		int alloctype)
 {
-	struct free_area * area;
+	struct free_area * area = NULL;
 	unsigned int current_order;
 	struct page *page;
 
 	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
-		area = zone->free_area + current_order;
+		area = &(zone->free_area_lists[alloctype][current_order]);
 		if (list_empty(&area->free_list))
 			continue;
 
 		page = list_entry(area->free_list.next, struct page, lru);
-		list_del(&page->lru);
-		rmv_page_order(page);
 		area->nr_free--;
-		zone->free_pages -= 1UL << order;
-
-		/*
-		 * If splitting a large block, record what the block is being
-		 * used for in the usemap
-		 */
-		if (current_order == MAX_ORDER-1)
-			set_pageblock_type(zone, page, alloctype);
-
-		return expand(zone, page, order, current_order, area);
+		return remove_page(zone, page, order, current_order, area);
 	}
 
+	/* Allocate a MAX_ORDER block */
+	page = steal_largepage(zone, alloctype);
+	if (page != NULL)
+		return remove_page(zone, page, order, MAX_ORDER-1, area);
+
 	return NULL;
 }
 
@@ -662,14 +710,17 @@ static void __drain_pages(unsigned int c
 }
 #endif /* CONFIG_PM || CONFIG_HOTPLUG_CPU */
 
+#define for_each_rclmtype_order(type, order) \
+	for (type=0; type < RCLM_TYPES; type++) \
+		for (order = MAX_ORDER - 1; order >= 0; order--)
+	
 #ifdef CONFIG_PM
-
 void mark_free_pages(struct zone *zone)
 {
 	unsigned long zone_pfn, flags;
-	int order;
+	int order, t;
+	unsigned long start_pfn, i;
 	struct list_head *curr;
-
 	if (!zone->spanned_pages)
 		return;
 
@@ -677,14 +728,12 @@ void mark_free_pages(struct zone *zone)
 	for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
 		ClearPageNosaveFree(pfn_to_page(zone_pfn + zone->zone_start_pfn));
 
-	for (order = MAX_ORDER - 1; order >= 0; --order)
-		list_for_each(curr, &zone->free_area[order].free_list) {
-			unsigned long start_pfn, i;
-
+	for_each_rclmtype_order(t, order) {
+		list_for_each(curr,&zone->free_area_lists[t][order].free_list){
 			start_pfn = page_to_pfn(list_entry(curr, struct page, lru));
-
 			for (i=0; i < (1<<order); i++)
 				SetPageNosaveFree(pfn_to_page(start_pfn+i));
+		}
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -835,6 +884,7 @@ int zone_watermark_ok(struct zone *z, in
 	/* free_pages my go negative - that's OK */
 	long min = mark, free_pages = z->free_pages - (1 << order) + 1;
 	int o;
+	struct free_area *kernnorclm, *kernrclm, *userrclm;
 
 	if (gfp_high)
 		min -= min / 2;
@@ -843,15 +893,21 @@ int zone_watermark_ok(struct zone *z, in
 
 	if (free_pages <= min + z->lowmem_reserve[classzone_idx])
 		return 0;
+	kernnorclm = z->free_area_lists[RCLM_NORCLM];
+	kernrclm = z->free_area_lists[RCLM_KERN];
+	userrclm = z->free_area_lists[RCLM_USER];
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
-		free_pages -= z->free_area[o].nr_free << o;
-
+		free_pages -= (kernnorclm->nr_free + kernrclm->nr_free +
+				userrclm->nr_free) << o;
 		/* Require fewer higher order pages to be free */
 		min >>= 1;
 
 		if (free_pages <= min)
 			return 0;
+		kernnorclm++;
+		kernrclm++;
+		userrclm++;
 	}
 	return 1;
 }
@@ -1390,6 +1446,7 @@ void show_free_areas(void)
 	unsigned long inactive;
 	unsigned long free;
 	struct zone *zone;
+	int type;
 
 	for_each_zone(zone) {
 		show_node(zone);
@@ -1483,8 +1540,10 @@ void show_free_areas(void)
 
 		spin_lock_irqsave(&zone->lock, flags);
 		for (order = 0; order < MAX_ORDER; order++) {
-			nr = zone->free_area[order].nr_free;
-			total += nr << order;
+			for (type=0; type < RCLM_TYPES; type++) {
+				nr = zone->free_area_lists[type][order].nr_free;
+				total += nr << order;
+			}
 			printk("%lu*%lukB ", nr, K(1UL) << order);
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
@@ -1785,9 +1844,14 @@ void zone_init_free_lists(struct pglist_
 				unsigned long size)
 {
 	int order;
-	for (order = 0; order < MAX_ORDER ; order++) {
-		INIT_LIST_HEAD(&zone->free_area[order].free_list);
-		zone->free_area[order].nr_free = 0;
+	int type;
+	struct free_area *area;
+
+	/* Initialse the three size ordered lists of free_areas */
+	for_each_rclmtype_order(type, order) {
+		area = &(zone->free_area_lists[type][order]);
+		INIT_LIST_HEAD(&area->free_list);
+		area->nr_free = 0;
 	}
 }
 
@@ -2179,16 +2243,26 @@ static int frag_show(struct seq_file *m,
 	struct zone *zone;
 	struct zone *node_zones = pgdat->node_zones;
 	unsigned long flags;
-	int order;
+	int order, type;
+	struct free_area *area;
+	unsigned long nr_bufs = 0;
 
 	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
 		if (!zone->present_pages)
 			continue;
 
 		spin_lock_irqsave(&zone->lock, flags);
-		seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
-		for (order = 0; order < MAX_ORDER; ++order)
-			seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
+		seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
+		for (order = 0; order < MAX_ORDER; ++order) {
+			nr_bufs = 0;
+
+			for (type=0; type < RCLM_TYPES; type++) {
+				area = &(zone->free_area_lists[type][order]);
+				nr_bufs += area->nr_free;
+			}
+			seq_printf(m, "%6lu ", nr_bufs);
+		}
+
 		spin_unlock_irqrestore(&zone->lock, flags);
 		seq_putc(m, '\n');
 	}


* [PATCH 4/7] Fragmentation Avoidance V16: 004_largealloc_tryharder
  2005-10-05 14:45 [PATCH 0/7] Fragmentation Avoidance V16 Mel Gorman
                   ` (2 preceding siblings ...)
  2005-10-05 14:46 ` [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore Mel Gorman
@ 2005-10-05 14:46 ` Mel Gorman
  2005-10-05 14:46 ` [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback Mel Gorman
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:46 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

Fragmentation avoidance patches increase our chances of satisfying high
order allocations.  So this patch takes more than one iteration at trying
to fulfill those allocations because, unlike before, the extra iterations
are often useful.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-003_fragcore/mm/page_alloc.c linux-2.6.14-rc3-004_largealloc_tryharder/mm/page_alloc.c
--- linux-2.6.14-rc3-003_fragcore/mm/page_alloc.c	2005-10-05 12:14:44.000000000 +0100
+++ linux-2.6.14-rc3-004_largealloc_tryharder/mm/page_alloc.c	2005-10-05 12:15:23.000000000 +0100
@@ -939,6 +939,7 @@ __alloc_pages(unsigned int __nocast gfp_
 	int do_retry;
 	int can_try_harder;
 	int did_some_progress;
+	int highorder_retry = 3;
 
 	might_sleep_if(wait);
 
@@ -1087,7 +1088,16 @@ rebalance:
 				goto got_pg;
 		}
 
-		out_of_memory(gfp_mask, order);
+		if (order < MAX_ORDER/2)
+			out_of_memory(gfp_mask, order);
+
+		/*
+		 * Due to low fragmentation efforts, we should try a little
+		 * harder to satisfy high order allocations
+		 */
+		if (order >= MAX_ORDER/2 && --highorder_retry > 0)
+			goto rebalance;
+
 		goto restart;
 	}
 
@@ -1104,6 +1114,8 @@ rebalance:
 			do_retry = 1;
 		if (gfp_mask & __GFP_NOFAIL)
 			do_retry = 1;
+		if (order >= MAX_ORDER/2 && --highorder_retry > 0)
+			do_retry = 1;
 	}
 	if (do_retry) {
 		blk_congestion_wait(WRITE, HZ/50);


* [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 14:45 [PATCH 0/7] Fragmentation Avoidance V16 Mel Gorman
                   ` (3 preceding siblings ...)
  2005-10-05 14:46 ` [PATCH 4/7] Fragmentation Avoidance V16: 004_largealloc_tryharder Mel Gorman
@ 2005-10-05 14:46 ` Mel Gorman
  2005-10-05 16:49   ` Dave Hansen
                     ` (2 more replies)
  2005-10-05 14:46 ` [PATCH 6/7] Fragmentation Avoidance V16: 006_percpu Mel Gorman
  2005-10-05 14:46 ` [PATCH 7/7] Fragmentation Avoidance V16: 007_stats Mel Gorman
  6 siblings, 3 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:46 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

This patch implements fallback logic. In the event there are no free
2^(MAX_ORDER-1) blocks of pages left, it helps the system decide which free
list to use next. The highlights of the patch are;

o Define a RCLM_FALLBACK type for fallbacks
o Use a percentage of each zone for fallbacks. When a reserved pool of pages
  is depleted, it will try to use RCLM_FALLBACK before using anything else.
  This greatly reduces the number of fallbacks causing fragmentation without
  needing complex balancing algorithms
o Add a fallback_reserve and fallback_balance so that the system knows how
  many 2^(MAX_ORDER-1) blocks are being used for fallbacks and whether more
  need to be reserved
o Add a fallback_allocs[] array that determines the order in which free lists
  are used for each allocation type (see the worked example after this list)
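
Two worked examples based on the hunks below. First, the fallback search order
comes straight from fallback_allocs[]; for a RCLM_USER allocation whose own
free lists (and the global MAX_ORDER pool) are already empty, the lists are
tried in the order:

    RCLM_FALLBACK -> RCLM_NORCLM -> RCLM_KERN

Second, the 12.5% figure falls out of the initial fallback_balance set in
free_area_init_core():

    zone->fallback_balance = -(realsize >> (MAX_ORDER + 2));

A zone of realsize pages contains realsize >> (MAX_ORDER - 1) blocks of
2^(MAX_ORDER-1) pages, so the reserve works out at one block in eight, or
12.5% of the zone.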

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-004_largealloc_tryharder/include/linux/mmzone.h linux-2.6.14-rc3-005_fallback/include/linux/mmzone.h
--- linux-2.6.14-rc3-004_largealloc_tryharder/include/linux/mmzone.h	2005-10-05 12:14:44.000000000 +0100
+++ linux-2.6.14-rc3-005_fallback/include/linux/mmzone.h	2005-10-05 12:16:05.000000000 +0100
@@ -29,7 +29,8 @@
 #define RCLM_NORCLM   0
 #define RCLM_USER     1
 #define RCLM_KERN     2
-#define RCLM_TYPES    3
+#define RCLM_FALLBACK 3
+#define RCLM_TYPES    4
 #define BITS_PER_RCLM_TYPE 2
 
 struct free_area {
@@ -159,6 +160,13 @@ struct zone {
 
 	struct free_area	free_area_lists[RCLM_TYPES][MAX_ORDER];
 
+	/*
+	 * Track what percentage of the zone should be used for fallbacks and
+	 * how much is being currently used
+	 */
+	unsigned long		fallback_reserve;
+	long			fallback_balance;
+
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
@@ -271,6 +279,24 @@ struct zonelist {
 	struct zone *zones[MAX_NUMNODES * MAX_NR_ZONES + 1]; // NULL delimited
 };
 
+static inline void inc_reserve_count(struct zone* zone, int type)
+{
+	if (type == RCLM_FALLBACK)
+		zone->fallback_reserve++;
+#ifdef CONFIG_ALLOCSTATS
+	zone->reserve_count[type]++;
+#endif
+}
+
+static inline void dec_reserve_count(struct zone* zone, int type)
+{
+	if (type == RCLM_FALLBACK && zone->fallback_reserve)
+		zone->fallback_reserve--;
+#ifdef CONFIG_ALLOCSTATS
+	if (zone->reserve_count[type] > 0)
+		zone->reserve_count[type]--;
+#endif
+}
 
 /*
  * The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-004_largealloc_tryharder/mm/page_alloc.c linux-2.6.14-rc3-005_fallback/mm/page_alloc.c
--- linux-2.6.14-rc3-004_largealloc_tryharder/mm/page_alloc.c	2005-10-05 12:15:23.000000000 +0100
+++ linux-2.6.14-rc3-005_fallback/mm/page_alloc.c	2005-10-05 12:16:05.000000000 +0100
@@ -53,6 +53,38 @@ unsigned long totalhigh_pages __read_mos
 long nr_swap_pages;
 
 /*
+ * fallback_allocs contains the fallback types for low memory conditions
+ * where the preferred alloction type if not available.
+ */
+int fallback_allocs[RCLM_TYPES][RCLM_TYPES+1] = {
+	{RCLM_NORCLM,RCLM_FALLBACK,  RCLM_KERN,  RCLM_USER,-1},
+	{RCLM_KERN,  RCLM_FALLBACK,  RCLM_NORCLM,RCLM_USER,-1},
+	{RCLM_USER,  RCLM_FALLBACK,  RCLM_NORCLM,RCLM_KERN,-1},
+	{RCLM_FALLBACK,  RCLM_NORCLM,RCLM_KERN,  RCLM_USER,-1}
+};
+
+/*
+ * Returns 1 if the required presentage of the zone if reserved for fallbacks
+ *
+ * fallback_balance and fallback_reserve are used to detect when the required
+ * percentage is reserved. fallback_balance is decremented when a
+ * 2^(MAX_ORDER-1) block is split and incremented when coalesced.
+ * fallback_reserve is incremented when a block is reserved for fallbacks
+ * and decremented when reassigned elsewhere.
+ *
+ * When fallback_balance is negative, a reserve is required. The number
+ * of reserved blocks required is related to the negative value of
+ * fallback_balance
+ */
+static inline int min_fallback_reserved(struct zone *zone) {
+	/* If fallback_balance is positive, we do not need to reserve */
+	if (zone->fallback_balance > 0)
+		return 1;
+
+	return -(zone->fallback_balance) < zone->fallback_reserve;
+}
+
+/*
  * results with 256, 32 in the lowmem_reserve sysctl:
  *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
  *	1G machine -> (16M dma, 784M normal, 224M high)
@@ -406,6 +438,8 @@ static inline void __free_pages_bulk (st
 		page_idx = combined_idx;
 		order++;
 	}
+	if (unlikely(order == MAX_ORDER-1))
+		zone->fallback_balance++;
 	set_page_order(page, order);
 	list_add_tail(&page->lru, &freelist[order].free_list);
 	freelist[order].nr_free++;
@@ -587,7 +621,12 @@ struct page* steal_largepage(struct zone
 	page = list_entry(area->free_list.next, struct page, lru);
 	area->nr_free--;
 
+	if (!min_fallback_reserved(zone))
+		alloctype = RCLM_FALLBACK;
+
 	set_pageblock_type(zone, page, alloctype);
+	dec_reserve_count(zone, i);
+	inc_reserve_count(zone, alloctype);
 
 	return page;
 }
@@ -597,6 +636,8 @@ static inline struct page *
 remove_page(struct zone *zone, struct page *page, unsigned int order,
 		unsigned int current_order, struct free_area *area)
 {
+	if (unlikely(current_order == MAX_ORDER-1))
+		zone->fallback_balance--;
 	list_del(&page->lru);
 	rmv_page_order(page);
 	zone->free_pages -= 1UL << order;
@@ -604,6 +645,80 @@ remove_page(struct zone *zone, struct pa
 
 }
 
+/*
+ * If we are falling back, and the allocation is KERNNORCLM,
+ * then reserve any buddies for the KERNNORCLM pool. These
+ * allocations fragment the worst so this helps keep them
+ * in the one place
+ */
+static inline struct free_area *
+fallback_buddy_reserve(int start_alloctype, struct zone *zone,
+			unsigned int current_order, struct page *page,
+			struct free_area *area)
+{
+	if (start_alloctype != RCLM_NORCLM)
+		return area;
+
+	area = &(zone->free_area_lists[RCLM_NORCLM][current_order]);
+
+	/* Reserve the whole block if this is a large split */
+	if (current_order >= MAX_ORDER / 2) {
+		int reserve_type=RCLM_NORCLM;
+		dec_reserve_count(zone, get_pageblock_type(zone,page));
+
+		/*
+		 * Use this block for fallbacks if the
+		 * minimum reserve is not being met
+		 */
+		if (!min_fallback_reserved(zone))
+			reserve_type = RCLM_FALLBACK;
+
+		set_pageblock_type(zone, page, reserve_type);
+		inc_reserve_count(zone, reserve_type);
+	}
+	return area;
+}
+
+static struct page *
+fallback_alloc(int alloctype, struct zone *zone, unsigned int order) {
+	int *fallback_list;
+	int start_alloctype = alloctype;
+	struct free_area *area;
+	unsigned int current_order;
+	struct page *page;
+
+	/* Ok, pick the fallback order based on the type */
+	BUG_ON(alloctype < 0 || alloctype >= RCLM_TYPES);
+	fallback_list = fallback_allocs[alloctype];
+
+	/*
+	 * Here, the alloc type's lists have been depleted as well as the global
+	 * pool, so fallback. When falling back, the largest possible block
+	 * will be taken to keep the fallbacks clustered if possible
+	 */
+	while ((alloctype = *(++fallback_list)) != -1) {
+
+		/* Find a block to allocate */
+		area = &(zone->free_area_lists[alloctype][MAX_ORDER-1]);
+		for (current_order = MAX_ORDER - 1; current_order > order;
+				current_order--, area--) {
+			if (list_empty(&area->free_list))
+				continue;
+
+			page = list_entry(area->free_list.next, 
+						struct page, lru);
+			area->nr_free--;
+			area = fallback_buddy_reserve(start_alloctype, zone,
+					current_order, page, area);
+			return remove_page(zone, page, order,
+					current_order, area);
+
+		}
+	}
+
+	return NULL;
+}
+
 /* 
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -630,7 +745,8 @@ static struct page *__rmqueue(struct zon
 	if (page != NULL)
 		return remove_page(zone, page, order, MAX_ORDER-1, area);
 
-	return NULL;
+	/* Try falling back */
+	return fallback_alloc(alloctype, zone, order);
 }
 
 /* 
@@ -2109,6 +2225,10 @@ static void __init free_area_init_core(s
 		spin_lock_init(&zone->lru_lock);
 		zone->zone_pgdat = pgdat;
 		zone->free_pages = 0;
+		zone->fallback_reserve = 0;
+
+		/* Set the balance so about 12.5% will be used for fallbacks */
+		zone->fallback_balance = -(realsize >> (MAX_ORDER+2));
 
 		zone->temp_priority = zone->prev_priority = DEF_PRIORITY;
 

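As a quick sanity check on the "about 12.5%" comment in the last hunk, the
arithmetic can be modelled in userspace; MAX_ORDER == 11 and the zone size
below are illustrative assumptions rather than values taken from the patch:

#include <stdio.h>

#define MAX_ORDER 11	/* assumed; the common default */

int main(void)
{
	unsigned long realsize = 262144;	/* e.g. a 1GB zone of 4K pages */
	unsigned long blocks  = realsize >> (MAX_ORDER - 1);	/* MAX_ORDER-1 blocks in the zone */
	unsigned long reserve = realsize >> (MAX_ORDER + 2);	/* initial -fallback_balance */

	printf("%lu of %lu MAX_ORDER-1 blocks reserved for fallbacks (%.1f%%)\n",
	       reserve, blocks, 100.0 * reserve / blocks);
	return 0;
}

With those numbers it prints 32 of 256 blocks, i.e. 12.5%, which is where the
figure in the comment comes from.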

* [PATCH 6/7] Fragmentation Avoidance V16: 006_percpu
  2005-10-05 14:45 [PATCH 0/7] Fragmentation Avoidance V16 Mel Gorman
                   ` (4 preceding siblings ...)
  2005-10-05 14:46 ` [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback Mel Gorman
@ 2005-10-05 14:46 ` Mel Gorman
  2005-10-05 14:46 ` [PATCH 7/7] Fragmentation Avoidance V16: 007_stats Mel Gorman
  6 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:46 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

The freelists for each allocation type can slowly become polluted due to
the per-cpu lists. Consider the following sequence of events:

1. A 2^(MAX_ORDER-1) block is reserved for __GFP_USER pages
2. An order-0 page is allocated from the newly reserved block
3. The page is freed and placed on the per-cpu list
4. alloc_page() is called with GFP_KERNEL as the gfp_mask
5. The per-cpu list is used to satisfy the allocation

Now, a kernel page is in the middle of a block reserved for __GFP_USER pages.
This means that over long periods of time, the anti-fragmentation scheme
slowly degrades to the standard allocator.

This patch divides the per-cpu lists into Kernel and User lists. RCLM_NORCLM
and RCLM_KERN use the Kernel list and RCLM_USER uses the user list. Strictly
speaking, there should be three lists but as little effort is made to reclaim
RCLM_KERN pages, it is not worth the overhead *yet*.
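
To make the split concrete, here is a self-contained userspace model of the
routing rule; the RCLM_*/PCPU_* values and the scaffolding are illustrative
assumptions, only the rule itself (RCLM_USER frees to the user list,
everything else to the kernel list) is taken from the patch below:

#include <stdio.h>

/* Illustrative values; the real definitions live in mmzone.h */
enum { RCLM_NORCLM, RCLM_KERN, RCLM_USER, RCLM_FALLBACK, RCLM_TYPES };
enum { PCPU_KERNEL, PCPU_USER, PCPU_TYPES };

/* Mirrors the decision made in free_hot_cold_page() below */
static int pcpu_index(int pageblock_type)
{
	return pageblock_type == RCLM_USER ? PCPU_USER : PCPU_KERNEL;
}

int main(void)
{
	int count[PCPU_TYPES] = { 0, 0 };
	int frees[] = { RCLM_USER, RCLM_KERN, RCLM_NORCLM, RCLM_USER };
	unsigned int i;

	/* Route a few order-0 frees to the two per-cpu sublists */
	for (i = 0; i < sizeof(frees) / sizeof(frees[0]); i++)
		count[pcpu_index(frees[i])]++;

	printf("kernel list: %d pages, user list: %d pages\n",
	       count[PCPU_KERNEL], count[PCPU_USER]);
	return 0;
}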

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-005_fallback/include/linux/mmzone.h linux-2.6.14-rc3-006_percpu/include/linux/mmzone.h
--- linux-2.6.14-rc3-005_fallback/include/linux/mmzone.h	2005-10-05 12:16:05.000000000 +0100
+++ linux-2.6.14-rc3-006_percpu/include/linux/mmzone.h	2005-10-05 12:26:56.000000000 +0100
@@ -55,12 +55,17 @@ struct zone_padding {
 #define ZONE_PADDING(name)
 #endif
 
+/* Indices into pcpu_list */
+#define PCPU_KERNEL 0
+#define PCPU_USER   1
+#define PCPU_TYPES  2
+
 struct per_cpu_pages {
-	int count;		/* number of pages in the list */
+	int count[PCPU_TYPES];  /* Number of pages on each list */
 	int low;		/* low watermark, refill needed */
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
-	struct list_head list;	/* the list of pages */
+	struct list_head list[PCPU_TYPES]; /* the lists of pages */
 };
 
 struct per_cpu_pageset {
@@ -75,6 +80,10 @@ struct per_cpu_pageset {
 #endif
 } ____cacheline_aligned_in_smp;
 
+/* Helpers for per_cpu_pages */
+#define pset_count(pset) (pset.count[PCPU_KERNEL] + pset.count[PCPU_USER])
+#define for_each_pcputype(pindex) \
+	for (pindex = 0; pindex < PCPU_TYPES; pindex++)
 #ifdef CONFIG_NUMA
 #define zone_pcp(__z, __cpu) ((__z)->pageset[(__cpu)])
 #else
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-005_fallback/mm/page_alloc.c linux-2.6.14-rc3-006_percpu/mm/page_alloc.c
--- linux-2.6.14-rc3-005_fallback/mm/page_alloc.c	2005-10-05 12:16:05.000000000 +0100
+++ linux-2.6.14-rc3-006_percpu/mm/page_alloc.c	2005-10-05 12:27:48.000000000 +0100
@@ -780,7 +780,7 @@ static int rmqueue_bulk(struct zone *zon
 void drain_remote_pages(void)
 {
 	struct zone *zone;
-	int i;
+	int i, pindex;
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -796,9 +796,16 @@ void drain_remote_pages(void)
 			struct per_cpu_pages *pcp;
 
 			pcp = &pset->pcp[i];
-			if (pcp->count)
-				pcp->count -= free_pages_bulk(zone, pcp->count,
-						&pcp->list, 0);
+			for_each_pcputype(pindex) {
+				if (!pcp->count[pindex])
+					continue;
+
+				/* Try remove all pages from the pcpu list */
+				pcp->count[pindex] -= 
+					free_pages_bulk(zone, 
+						pcp->count[pindex],
+						&pcp->list[pindex], 0);
+			}
 		}
 	}
 	local_irq_restore(flags);
@@ -809,7 +816,7 @@ void drain_remote_pages(void)
 static void __drain_pages(unsigned int cpu)
 {
 	struct zone *zone;
-	int i;
+	int i, pindex;
 
 	for_each_zone(zone) {
 		struct per_cpu_pageset *pset;
@@ -819,8 +826,16 @@ static void __drain_pages(unsigned int c
 			struct per_cpu_pages *pcp;
 
 			pcp = &pset->pcp[i];
-			pcp->count -= free_pages_bulk(zone, pcp->count,
-						&pcp->list, 0);
+			for_each_pcputype(pindex) {
+				if (!pcp->count[pindex])
+					continue;
+
+				/* Try remove all pages from the pcpu list */
+				pcp->count[pindex] -= 
+					free_pages_bulk(zone, 
+						pcp->count[pindex],
+						&pcp->list[pindex], 0);
+			}
 		}
 	}
 }
@@ -902,6 +917,7 @@ static void fastcall free_hot_cold_page(
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
+	int pindex;
 
 	arch_free_page(page, 0);
 
@@ -911,11 +927,21 @@ static void fastcall free_hot_cold_page(
 		page->mapping = NULL;
 	free_pages_check(__FUNCTION__, page);
 	pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
+
+	/*
+	 * Strictly speaking, we should not be accessing the zone information
+	 * here without the zone lock. A stale read only misplaces the page
+	 */
+	if (get_pageblock_type(zone, page) == RCLM_USER)
+		pindex = PCPU_USER;
+	else
+		pindex = PCPU_KERNEL;
 	local_irq_save(flags);
-	list_add(&page->lru, &pcp->list);
-	pcp->count++;
-	if (pcp->count >= pcp->high)
-		pcp->count -= free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
+	list_add(&page->lru, &pcp->list[pindex]);
+	pcp->count[pindex]++;
+	if (pcp->count[pindex] >= pcp->high)
+		pcp->count[pindex] -= free_pages_bulk(zone, pcp->batch, 
+				&pcp->list[pindex], 0);
 	local_irq_restore(flags);
 	put_cpu();
 }
@@ -954,18 +980,25 @@ buffered_rmqueue(struct zone *zone, int 
 	
 	if (order == 0) {
 		struct per_cpu_pages *pcp;
-
+		int pindex = PCPU_KERNEL;
+		if (alloctype == RCLM_USER)
+			pindex = PCPU_USER;
+		
 		pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
 		local_irq_save(flags);
-		if (pcp->count <= pcp->low)
-			pcp->count += rmqueue_bulk(zone, 0,
-						pcp->batch, &pcp->list,
-						alloctype);
-		if (pcp->count) {
-			page = list_entry(pcp->list.next, struct page, lru);
+		if (pcp->count[pindex] <= pcp->low)
+			pcp->count[pindex] += rmqueue_bulk(zone,
+					0, pcp->batch,
+					&(pcp->list[pindex]),
+					alloctype);
+
+		if (pcp->count[pindex]) {
+			page = list_entry(pcp->list[pindex].next,
+					struct page, lru);
 			list_del(&page->lru);
-			pcp->count--;
+			pcp->count[pindex]--;
 		}
+
 		local_irq_restore(flags);
 		put_cpu();
 	}
@@ -1601,7 +1634,7 @@ void show_free_areas(void)
 					pageset->pcp[temperature].low,
 					pageset->pcp[temperature].high,
 					pageset->pcp[temperature].batch,
-					pageset->pcp[temperature].count);
+					pset_count(pageset->pcp[temperature]));
 		}
 	}
 
@@ -2039,18 +2072,22 @@ inline void setup_pageset(struct per_cpu
 	struct per_cpu_pages *pcp;
 
 	pcp = &p->pcp[0];		/* hot */
-	pcp->count = 0;
+	pcp->count[PCPU_KERNEL] = 0;
+	pcp->count[PCPU_USER] = 0;
 	pcp->low = 2 * batch;
-	pcp->high = 6 * batch;
+	pcp->high = 3 * batch;
 	pcp->batch = max(1UL, 1 * batch);
-	INIT_LIST_HEAD(&pcp->list);
+	INIT_LIST_HEAD(&pcp->list[PCPU_KERNEL]);
+	INIT_LIST_HEAD(&pcp->list[PCPU_USER]);
 
 	pcp = &p->pcp[1];		/* cold*/
-	pcp->count = 0;
+	pcp->count[PCPU_KERNEL] = 0;
+	pcp->count[PCPU_USER] = 0;
 	pcp->low = 0;
-	pcp->high = 2 * batch;
+	pcp->high = batch;
 	pcp->batch = max(1UL, 1 * batch);
-	INIT_LIST_HEAD(&pcp->list);
+	INIT_LIST_HEAD(&pcp->list[PCPU_KERNEL]);
+	INIT_LIST_HEAD(&pcp->list[PCPU_USER]);
 }
 
 #ifdef CONFIG_NUMA
@@ -2460,7 +2497,7 @@ static int zoneinfo_show(struct seq_file
 
 			pageset = zone_pcp(zone, i);
 			for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
-				if (pageset->pcp[j].count)
+				if (pset_count(pageset->pcp[j]))
 					break;
 			}
 			if (j == ARRAY_SIZE(pageset->pcp))
@@ -2473,7 +2510,7 @@ static int zoneinfo_show(struct seq_file
 					   "\n              high:  %i"
 					   "\n              batch: %i",
 					   i, j,
-					   pageset->pcp[j].count,
+					   pset_count(pageset->pcp[j]),
 					   pageset->pcp[j].low,
 					   pageset->pcp[j].high,
 					   pageset->pcp[j].batch);


* [PATCH 7/7] Fragmentation Avoidance V16: 007_stats
  2005-10-05 14:45 [PATCH 0/7] Fragmentation Avoidance V16 Mel Gorman
                   ` (5 preceding siblings ...)
  2005-10-05 14:46 ` [PATCH 6/7] Fragmentation Avoidance V16: 006_percpu Mel Gorman
@ 2005-10-05 14:46 ` Mel Gorman
  6 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 14:46 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Mel Gorman, kravetz, linux-kernel, jschopp, lhms-devel

This patch is not needed for the anti-fragmentation code to work. It adds a
new config option called CONFIG_ALLOCSTATS. If
set, a number of new bean counters are added that are related to the
anti-fragmentation code. The information is exported via /proc/buddyinfo. This
is very useful when debugging why high-order pages are not available for
allocation.
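
For reference, a trivial userspace reader that picks out the new beancounter
lines; the substrings it matches ("beancounters", "steal", "Allocs") come from
the frag_show() additions below, and the program itself is only an
illustrative sketch:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[512];
	FILE *f = fopen("/proc/buddyinfo", "r");

	if (!f) {
		perror("/proc/buddyinfo");
		return 1;
	}
	/* Print only the lines added by this patch */
	while (fgets(line, sizeof(line), f))
		if (strstr(line, "beancounters") || strstr(line, "steal") ||
		    strstr(line, "Allocs"))
			fputs(line, stdout);
	fclose(f);
	return 0;
}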

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-006_percpu/include/linux/mmzone.h linux-2.6.14-rc3-007_stats/include/linux/mmzone.h
--- linux-2.6.14-rc3-006_percpu/include/linux/mmzone.h	2005-10-05 12:26:56.000000000 +0100
+++ linux-2.6.14-rc3-007_stats/include/linux/mmzone.h	2005-10-05 12:29:57.000000000 +0100
@@ -176,6 +176,19 @@ struct zone {
 	unsigned long		fallback_reserve;
 	long			fallback_balance;
 
+#ifdef CONFIG_ALLOCSTATS
+	/*
+	 * These are beancounters that track how the placement policy
+	 * of the buddy allocator is performing
+	 */
+	unsigned long fallback_count[RCLM_TYPES];
+	unsigned long alloc_count[RCLM_TYPES];
+	unsigned long reserve_count[RCLM_TYPES];
+	unsigned long kernnorclm_full_steal;
+	unsigned long kernnorclm_partial_steal;
+#endif
+
+
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
@@ -265,6 +278,17 @@ struct zone {
 	char			*name;
 } ____cacheline_maxaligned_in_smp;
 
+#ifdef CONFIG_ALLOCSTATS
+#define inc_fallback_count(zone, type) zone->fallback_count[type]++
+#define inc_alloc_count(zone, type) zone->alloc_count[type]++
+#define inc_kernnorclm_partial_steal(zone) zone->kernnorclm_partial_steal++
+#define inc_kernnorclm_full_steal(zone) zone->kernnorclm_full_steal++
+#else
+#define inc_fallback_count(zone, type) do {} while (0)
+#define inc_alloc_count(zone, type) do {} while (0)
+#define inc_kernnorclm_partial_steal(zone) do {} while (0)
+#define inc_kernnorclm_full_steal(zone) do {} while (0)
+#endif
 
 /*
  * The "priority" of VM scanning is how much of the queues we will scan in one
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-006_percpu/lib/Kconfig.debug linux-2.6.14-rc3-007_stats/lib/Kconfig.debug
--- linux-2.6.14-rc3-006_percpu/lib/Kconfig.debug	2005-10-04 22:58:34.000000000 +0100
+++ linux-2.6.14-rc3-007_stats/lib/Kconfig.debug	2005-10-05 12:29:57.000000000 +0100
@@ -77,6 +77,17 @@ config SCHEDSTATS
 	  application, you can say N to avoid the very slight overhead
 	  this adds.
 
+config ALLOCSTATS
+	bool "Collect buddy allocator statistics"
+	depends on DEBUG_KERNEL && PROC_FS
+	help
+	  If you say Y here, additional code will be inserted into the
+	  page allocator routines to collect statistics on the allocator
+	  behavior and provide them in /proc/buddyinfo. These stats are
+	  useful for measuring fragmentation in the buddy allocator. If
+	  you are not debugging or measuring the allocator, you can say N
+	  to avoid the slight overhead this adds.
+
 config DEBUG_SLAB
 	bool "Debug memory allocations"
 	depends on DEBUG_KERNEL
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.14-rc3-006_percpu/mm/page_alloc.c linux-2.6.14-rc3-007_stats/mm/page_alloc.c
--- linux-2.6.14-rc3-006_percpu/mm/page_alloc.c	2005-10-05 12:27:48.000000000 +0100
+++ linux-2.6.14-rc3-007_stats/mm/page_alloc.c	2005-10-05 12:29:57.000000000 +0100
@@ -191,6 +191,11 @@ EXPORT_SYMBOL(zone_table);
 static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };
 int min_free_kbytes = 1024;
 
+#ifdef CONFIG_ALLOCSTATS
+static char *type_names[RCLM_TYPES] = { "KernNoRclm", "UserRclm",
+					"KernRclm", "Fallback"};
+#endif /* CONFIG_ALLOCSTATS */
+
 unsigned long __initdata nr_kernel_pages;
 unsigned long __initdata nr_all_pages;
 
@@ -675,6 +680,9 @@ fallback_buddy_reserve(int start_allocty
 
 		set_pageblock_type(zone, page, reserve_type);
 		inc_reserve_count(zone, reserve_type);
+		inc_kernnorclm_full_steal(zone);
+	} else {
+		inc_kernnorclm_partial_steal(zone);
 	}
 	return area;
 }
@@ -714,6 +722,15 @@ fallback_alloc(int alloctype, struct zon
 					current_order, area);
 
 		}
+
+		/*
+		 * If the current alloctype is RCLM_FALLBACK, it means
+		 * that the requested pool and fallback pool are both
+		 * depleted and we are falling back to other pools.
+		 * At this point, pools are starting to get fragmented
+		 */
+		if (alloctype == RCLM_FALLBACK)
+			inc_fallback_count(zone, start_alloctype);
 	}
 
 	return NULL;
@@ -730,6 +747,8 @@ static struct page *__rmqueue(struct zon
 	unsigned int current_order;
 	struct page *page;
 
+	inc_alloc_count(zone, alloctype);
+
 	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
 		area = &(zone->free_area_lists[alloctype][current_order]);
 		if (list_empty(&area->free_list))
@@ -2319,6 +2338,20 @@ static void __init free_area_init_core(s
 		zone_start_pfn += size;
 
 		zone_init_free_lists(pgdat, zone, zone->spanned_pages);
+
+#ifdef CONFIG_ALLOCSTATS
+		memset(zone->fallback_count, 0,
+				sizeof(zone->fallback_count));
+		memset(zone->alloc_count, 0,
+				sizeof(zone->alloc_count));
+		memset(zone->reserve_count, 0,
+				sizeof(zone->reserve_count));
+		zone->kernnorclm_partial_steal = 0;
+		zone->kernnorclm_full_steal = 0;
+		zone->reserve_count[RCLM_NORCLM] =
+				realsize >> (MAX_ORDER-1);
+#endif
+
 	}
 }
 
@@ -2415,6 +2448,18 @@ static int frag_show(struct seq_file *m,
 	int order, type;
 	struct free_area *area;
 	unsigned long nr_bufs = 0;
+#ifdef CONFIG_ALLOCSTATS
+	int i;
+	unsigned long kernnorclm_full_steal=0;
+	unsigned long kernnorclm_partial_steal=0;
+	unsigned long reserve_count[RCLM_TYPES];
+	unsigned long fallback_count[RCLM_TYPES];
+	unsigned long alloc_count[RCLM_TYPES];
+
+	memset(reserve_count, 0, sizeof(reserve_count));
+	memset(fallback_count, 0, sizeof(fallback_count));
+	memset(alloc_count, 0, sizeof(alloc_count));
+#endif
 
 	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
 		if (!zone->present_pages)
@@ -2435,6 +2480,79 @@ static int frag_show(struct seq_file *m,
 		spin_unlock_irqrestore(&zone->lock, flags);
 		seq_putc(m, '\n');
 	}
+
+#ifdef CONFIG_ALLOCSTATS
+ 	/* Show statistics for each allocation type */
+ 	seq_printf(m, "\nPer-allocation-type statistics");
+ 	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
+ 		if (!zone->present_pages)
+ 			continue;
+ 
+ 		spin_lock_irqsave(&zone->lock, flags);
+ 		for (type=0; type < RCLM_TYPES; type++) {
+			struct list_head *elem;
+ 			seq_printf(m, "\nNode %d, zone %8s, type %10s ", 
+ 					pgdat->node_id, zone->name,
+ 					type_names[type]);
+ 			for (order = 0; order < MAX_ORDER; ++order) {
+ 				nr_bufs = 0;
+
+ 				list_for_each(elem, &(zone->free_area_lists[type][order].free_list))
+ 					++nr_bufs;
+ 				seq_printf(m, "%6lu ", nr_bufs);
+ 			}
+		}
+ 
+ 		/* Scan global list */
+ 		seq_printf(m, "\n");
+ 		seq_printf(m, "Node %d, zone %8s, type %10s", 
+ 					pgdat->node_id, zone->name,
+ 					"MAX_ORDER");
+		nr_bufs = 0;
+		for (type=0; type < RCLM_TYPES; type++) {
+			nr_bufs += zone->free_area_lists[type][MAX_ORDER-1].nr_free;
+		}
+ 		seq_printf(m, "%6lu ", nr_bufs);
+		seq_printf(m, "\n");
+
+ 		seq_printf(m, "%s Zone beancounters\n", zone->name);
+		seq_printf(m, "Fallback balance: %d\n", (int)zone->fallback_balance);
+		seq_printf(m, "Fallback reserve: %d\n", (int)zone->fallback_reserve);
+		seq_printf(m, "Partial steal:    %lu\n", zone->kernnorclm_partial_steal);
+		seq_printf(m, "Full steal:       %lu\n", zone->kernnorclm_full_steal);
+
+		kernnorclm_partial_steal += zone->kernnorclm_partial_steal;
+		kernnorclm_full_steal += zone->kernnorclm_full_steal;
+		seq_putc(m, '\n');
+
+		for (i=0; i< RCLM_TYPES; i++) {
+			seq_printf(m, "%-10s Allocs: %-10lu Reserve: %-10lu Fallbacks: %-10lu\n",
+					type_names[i],
+					zone->alloc_count[i],
+					zone->reserve_count[i],
+					zone->fallback_count[i]);
+			alloc_count[i] += zone->alloc_count[i];
+			reserve_count[i] += zone->reserve_count[i];
+			fallback_count[i] += zone->fallback_count[i];
+		}
+
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+
+
+ 	/* Show bean counters */
+ 	seq_printf(m, "\nGlobal beancounters\n");
+	seq_printf(m, "Partial steal:    %lu\n", kernnorclm_partial_steal);
+	seq_printf(m, "Full steal:       %lu\n", kernnorclm_full_steal);
+
+	for (i=0; i< RCLM_TYPES; i++) {
+ 		seq_printf(m, "%-10s Allocs: %-10lu Reserve: %-10lu Fallbacks: %-10lu\n", 
+				type_names[i], 
+				alloc_count[i],
+				reserve_count[i],
+				fallback_count[i]);
+	}
+#endif /* CONFIG_ALLOCSTATS */
 	return 0;
 }
 


* Re: [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore
  2005-10-05 14:46 ` [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore Mel Gorman
@ 2005-10-05 16:48   ` Dave Hansen
  2005-10-05 16:57     ` Mel Gorman
  2005-10-05 17:14     ` Mel Gorman
  0 siblings, 2 replies; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 16:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> 
> @@ -1483,8 +1540,10 @@ void show_free_areas(void)
>  
>                 spin_lock_irqsave(&zone->lock, flags);
>                 for (order = 0; order < MAX_ORDER; order++) {
> -                       nr = zone->free_area[order].nr_free;
> -                       total += nr << order;
> +                       for (type=0; type < RCLM_TYPES; type++) {
> +                               nr = zone->free_area_lists[type][order].nr_free;
> +                               total += nr << order;
> +                       }

Can that use the new for_each_ macro?

-- Dave


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 14:46 ` [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback Mel Gorman
@ 2005-10-05 16:49   ` Dave Hansen
  2005-10-05 17:11     ` Mel Gorman
  2005-10-05 16:51   ` Dave Hansen
  2005-10-05 16:53   ` Dave Hansen
  2 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 16:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> 
> +#ifdef CONFIG_ALLOCSTATS
> +       zone->reserve_count[type]++;
> +#endif

Did this sneak up from another patch?

-- Dave


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 14:46 ` [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback Mel Gorman
  2005-10-05 16:49   ` Dave Hansen
@ 2005-10-05 16:51   ` Dave Hansen
  2005-10-05 17:16     ` Mel Gorman
  2005-10-05 16:53   ` Dave Hansen
  2 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 16:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> 
> + */
> +static inline struct free_area *
> +fallback_buddy_reserve(int start_alloctype, struct zone *zone,
> +                       unsigned int current_order, struct page *page,
> +                       struct free_area *area)
> +{
> +       if (start_alloctype != RCLM_NORCLM)
> +               return area;
> +
> +       area = &(zone->free_area_lists[RCLM_NORCLM][current_order]);
> +
> +       /* Reserve the whole block if this is a large split */
> +       if (current_order >= MAX_ORDER / 2) {
> +               int reserve_type=RCLM_NORCLM;

-EBADCODINGSTYLE.

-- Dave


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 14:46 ` [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback Mel Gorman
  2005-10-05 16:49   ` Dave Hansen
  2005-10-05 16:51   ` Dave Hansen
@ 2005-10-05 16:53   ` Dave Hansen
  2005-10-05 17:20     ` Mel Gorman
  2 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 16:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> +static struct page *
> +fallback_alloc(int alloctype, struct zone *zone, unsigned int order)
> {
...
> +       /*
> +        * Here, the alloc type lists has been depleted as well as the global
> +        * pool, so fallback. When falling back, the largest possible block
> +        * will be taken to keep the fallbacks clustered if possible
> +        */
> +       while ((alloctype = *(++fallback_list)) != -1) {

That's a bit obtuse.  Is there no way to simplify it?  Just keeping an
index instead of a fallback_list pointer should make it quite a bit
easier to grok.

-- Dave


* Re: [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore
  2005-10-05 16:48   ` Dave Hansen
@ 2005-10-05 16:57     ` Mel Gorman
  2005-10-05 17:14     ` Mel Gorman
  1 sibling, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 16:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 5 Oct 2005, Dave Hansen wrote:

> On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> >
> > @@ -1483,8 +1540,10 @@ void show_free_areas(void)
> >
> >                 spin_lock_irqsave(&zone->lock, flags);
> >                 for (order = 0; order < MAX_ORDER; order++) {
> > -                       nr = zone->free_area[order].nr_free;
> > -                       total += nr << order;
> > +                       for (type=0; type < RCLM_TYPES; type++) {
> > +                               nr = zone->free_area_lists[type][order].nr_free;
> > +                               total += nr << order;
> > +                       }
>
> Can that use the new for_each_ macro?
>

Yes, yes it can. Not sure why I dismissed it the first time around.

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 16:49   ` Dave Hansen
@ 2005-10-05 17:11     ` Mel Gorman
  0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 17:11 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 5 Oct 2005, Dave Hansen wrote:

> On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> >
> > +#ifdef CONFIG_ALLOCSTATS
> > +       zone->reserve_count[type]++;
> > +#endif
>
> Did this sneak up from another patch?
>

Worse, it is not active until a later patch - 007_stats. Both patches
fixed now.

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore
  2005-10-05 16:48   ` Dave Hansen
  2005-10-05 16:57     ` Mel Gorman
@ 2005-10-05 17:14     ` Mel Gorman
  2005-10-05 17:22       ` Dave Hansen
  1 sibling, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 17:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 5 Oct 2005, Dave Hansen wrote:

> On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> >
> > @@ -1483,8 +1540,10 @@ void show_free_areas(void)
> >
> >                 spin_lock_irqsave(&zone->lock, flags);
> >                 for (order = 0; order < MAX_ORDER; order++) {
> > -                       nr = zone->free_area[order].nr_free;
> > -                       total += nr << order;
> > +                       for (type=0; type < RCLM_TYPES; type++) {
> > +                               nr = zone->free_area_lists[type][order].nr_free;
> > +                               total += nr << order;
> > +                       }
>
> Can that use the new for_each_ macro?
>

Now I remember why: it's because of the printf below the "for (type=0" loop.
The printf has to happen once for each order. With the for_each_ macro, it
would happen for each type *and* order.

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 16:51   ` Dave Hansen
@ 2005-10-05 17:16     ` Mel Gorman
  2005-10-05 17:20       ` Dave Hansen
  0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 17:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 5 Oct 2005, Dave Hansen wrote:

> On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> >
> > + */
> > +static inline struct free_area *
> > +fallback_buddy_reserve(int start_alloctype, struct zone *zone,
> > +                       unsigned int current_order, struct page *page,
> > +                       struct free_area *area)
> > +{
> > +       if (start_alloctype != RCLM_NORCLM)
> > +               return area;
> > +
> > +       area = &(zone->free_area_lists[RCLM_NORCLM][current_order]);
> > +
> > +       /* Reserve the whole block if this is a large split */
> > +       if (current_order >= MAX_ORDER / 2) {
> > +               int reserve_type=RCLM_NORCLM;
>
> -EBADCODINGSTYLE.
>

Changed to;

+static inline struct free_area *
+fallback_buddy_reserve(int start_alloctype, struct zone *zone,
+                       unsigned int current_order, struct page *page,
+                       struct free_area *area)
+{
+       int reserve_type;
+       if (start_alloctype != RCLM_NORCLM)
+               return area;
+
+       area = &(zone->free_area_lists[RCLM_NORCLM][current_order]);
+
+       /* Reserve the whole block if this is a large split */
+       if (current_order >= MAX_ORDER / 2) {
+               reserve_type=RCLM_NORCLM;

(Ignore the whitespace damage, cutting and pasting just so you can see it)

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 16:53   ` Dave Hansen
@ 2005-10-05 17:20     ` Mel Gorman
  2005-10-05 17:25       ` Dave Hansen
  0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 17:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 5 Oct 2005, Dave Hansen wrote:

> On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> > +static struct page *
> > +fallback_alloc(int alloctype, struct zone *zone, unsigned int order)
> > {
> ...
> > +       /*
> > +        * Here, the alloc type lists has been depleted as well as the global
> > +        * pool, so fallback. When falling back, the largest possible block
> > +        * will be taken to keep the fallbacks clustered if possible
> > +        */
> > +       while ((alloctype = *(++fallback_list)) != -1) {
>
> That's a bit obtuse.  Is there no way to simplify it?  Just keeping an
> index instead of a fallback_list pointer should make it quite a bit
> easier to grok.
>

Changed to

for (i = 0; (alloctype = fallback_list[i]) != -1; i++) {

where i is declared at the start of the function. It's essentially the same
as how we move through the zones' fallback list, so it should seem familiar.
Is that better?
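
For reference, the walk being discussed can be sketched standalone like this;
the enum values and the userspace scaffolding are illustrative assumptions,
while the table rows mirror fallback_allocs from the patch:

#include <stdio.h>

enum { RCLM_NORCLM, RCLM_KERN, RCLM_USER, RCLM_FALLBACK, RCLM_TYPES };

/* One row per starting type; entry 0 is the type itself, -1 terminates */
static const int fallback_allocs[RCLM_TYPES][RCLM_TYPES + 1] = {
	{RCLM_NORCLM,   RCLM_FALLBACK, RCLM_KERN,   RCLM_USER, -1},
	{RCLM_KERN,     RCLM_FALLBACK, RCLM_NORCLM, RCLM_USER, -1},
	{RCLM_USER,     RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN, -1},
	{RCLM_FALLBACK, RCLM_NORCLM,   RCLM_KERN,   RCLM_USER, -1},
};

int main(void)
{
	const int *fallback_list = fallback_allocs[RCLM_USER];
	int alloctype;
	int i;

	/* Skip entry 0 (the starting type), stop at the -1 sentinel */
	for (i = 1; (alloctype = fallback_list[i]) != -1; i++)
		printf("fall back to pool %d\n", alloctype);
	return 0;
}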

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 17:16     ` Mel Gorman
@ 2005-10-05 17:20       ` Dave Hansen
  2005-10-05 17:32         ` Mel Gorman
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 17:20 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 18:16 +0100, Mel Gorman wrote:
> 
> +               reserve_type=RCLM_NORCLM;
> 
> (Ignore the whitespace damage, cutting and pasting just so you can see
> it)

Sorry, should have been more specific.  You need spaces around the '='.

-- Dave


* Re: [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore
  2005-10-05 17:14     ` Mel Gorman
@ 2005-10-05 17:22       ` Dave Hansen
  2005-10-05 17:45         ` Mel Gorman
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 17:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 18:14 +0100, Mel Gorman wrote:
> On Wed, 5 Oct 2005, Dave Hansen wrote:
> > On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> > >
> > > @@ -1483,8 +1540,10 @@ void show_free_areas(void)
> > >
> > >                 spin_lock_irqsave(&zone->lock, flags);
> > >                 for (order = 0; order < MAX_ORDER; order++) {
> > > -                       nr = zone->free_area[order].nr_free;
> > > -                       total += nr << order;
> > > +                       for (type=0; type < RCLM_TYPES; type++) {
> > > +                               nr = zone->free_area_lists[type][order].nr_free;
> > > +                               total += nr << order;
> > > +                       }
> >
> > Can that use the new for_each_ macro?
> 
> Now I remember why, it's because of the printf below "for (type=0" . The
> printf has to happen once for each order. With the for_each_macro, it
> would happen for each type *and* order.

Actually, that's for debugging, so we might want to do that anyway.  Can
you put it in a separate patch and explain?

-- Dave


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 17:20     ` Mel Gorman
@ 2005-10-05 17:25       ` Dave Hansen
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 17:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 18:20 +0100, Mel Gorman wrote:
> Changed to
> 
> for (i = 0; (alloctype = fallback_list[i]) != -1; i++) {
> 
> where i is declared a the start of the function. It's essentially the same
> as how we move through the zones fallback list so should seem familiar. Is
> that better?

Yep, at least I understand what it's doing :)

One thing you might consider is not doing the assignment in the for()
body:

	for (i = 0; fallback_list[i] != -1; i++) {
		alloctype = fallback_list[i];
		...

-- Dave


* Re: [PATCH 5/7] Fragmentation Avoidance V16: 005_fallback
  2005-10-05 17:20       ` Dave Hansen
@ 2005-10-05 17:32         ` Mel Gorman
  0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 17:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 5 Oct 2005, Dave Hansen wrote:

> On Wed, 2005-10-05 at 18:16 +0100, Mel Gorman wrote:
> >
> > +               reserve_type=RCLM_NORCLM;
> >
> > (Ignore the whitespace damage, cutting and pasting just so you can see
> > it)
>
> Sorry, should have been more specific.  You need spaces around the '='.
>

Ah. It's now added to my growing "highlight bad style in red" list of vim
macros. Shouldn't happen again.

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore
  2005-10-05 17:22       ` Dave Hansen
@ 2005-10-05 17:45         ` Mel Gorman
  2005-10-05 17:51           ` Dave Hansen
  0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2005-10-05 17:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 5 Oct 2005, Dave Hansen wrote:

> On Wed, 2005-10-05 at 18:14 +0100, Mel Gorman wrote:
> > On Wed, 5 Oct 2005, Dave Hansen wrote:
> > > On Wed, 2005-10-05 at 15:46 +0100, Mel Gorman wrote:
> > > >
> > > > @@ -1483,8 +1540,10 @@ void show_free_areas(void)
> > > >
> > > >                 spin_lock_irqsave(&zone->lock, flags);
> > > >                 for (order = 0; order < MAX_ORDER; order++) {
> > > > -                       nr = zone->free_area[order].nr_free;
> > > > -                       total += nr << order;
> > > > +                       for (type=0; type < RCLM_TYPES; type++) {
> > > > +                               nr = zone->free_area_lists[type][order].nr_free;
> > > > +                               total += nr << order;
> > > > +                       }
> > >
> > > Can that use the new for_each_ macro?
> >
> > Now I remember why, it's because of the printf below "for (type=0" . The
> > printf has to happen once for each order. With the for_each_macro, it
> > would happen for each type *and* order.
>
> Actually, that's for debugging, so we might want to do that anyway.  Can
> you put it in a separate patch and explain?
>

To print out for each type and order, I'll need the type_names[] array
from 007_stats but I don't see it as a problem.

The problem is that by putting all the changes to this function in another
patch, the kernel will not build after applying 003_fragcore. I am
assuming that is bad. I think it makes sense to leave this patch as it is,
but have a 004_showfree patch that adds the type_names[] array and a more
detailed printout in show_free_areas. The remaining patches get bumped up
a number.

Would you be happy with that?

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 3/7] Fragmentation Avoidance V16: 003_fragcore
  2005-10-05 17:45         ` Mel Gorman
@ 2005-10-05 17:51           ` Dave Hansen
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Hansen @ 2005-10-05 17:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, kravetz, Linux Kernel Mailing List,
	jschopp, lhms

On Wed, 2005-10-05 at 18:45 +0100, Mel Gorman wrote:
> The problem is that by putting all the changes to this function in another
> patch, the kernel will not build after applying 003_fragcore. I am
> assuming that is bad. I think it makes sense to leave this patch as it is,
> but have a 004_showfree patch that adds the type_names[] array and a more
> detailed printout in show_free_areas. The remaining patches get bumped up
> a number.
> 
> Would you be happy with that?

Seems reasonable to me.

-- Dave


* Re: [PATCH 2/7] Fragmentation Avoidance V16: 002_usemap
  2005-10-05 14:45 ` [PATCH 2/7] Fragmentation Avoidance V16: 002_usemap Mel Gorman
@ 2005-10-05 23:38   ` David S. Miller, Mel Gorman
  2005-10-06 15:12     ` Mel Gorman
  0 siblings, 1 reply; 27+ messages in thread
From: David S. Miller, Mel Gorman @ 2005-10-05 23:38 UTC (permalink / raw)
  To: mel; +Cc: linux-mm, akpm, kravetz, linux-kernel, jschopp, lhms-devel

> +	unsigned int type = 0;
 ...
> +	bitidx = pfn_to_bitidx(zone, pfn);
> +	usemap = pfn_to_usemap(zone, pfn);
> +

There seems no strong reason not to use "unsigned long" for "type" and
besides that will provide the required alignment for the bitops
interfaces.  "unsigned int" is not sufficient.

Then we also don't need to think about "does this work on big-endian
64-bit" and things of that nature.

Please audit your other bitops uses for this issue.

Thanks.
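
As a userspace illustration of the width issue (model_set_bit below only
mimics the unsigned long word size the generic bitops operate on; it is not
the kernel interface):

#include <stdio.h>

#define BITS_PER_LONG_MODEL (8 * sizeof(unsigned long))

/*
 * The generic bitops address memory in unsigned long words.  If 'addr'
 * really pointed at an unsigned int, a 64-bit word access would touch
 * bytes beyond the variable, and big-endian 64-bit would look in the
 * wrong half of the word.  Declaring the flag word as unsigned long
 * avoids both problems.
 */
static void model_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG_MODEL] |= 1UL << (nr % BITS_PER_LONG_MODEL);
}

int main(void)
{
	unsigned long type = 0;		/* unsigned long, as suggested above */

	model_set_bit(2, &type);
	printf("type = %#lx\n", type);
	return 0;
}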


* Re: [PATCH 1/7] Fragmentation Avoidance V16: 001_antidefrag_flags
  2005-10-05 14:45 ` [PATCH 1/7] Fragmentation Avoidance V16: 001_antidefrag_flags Mel Gorman
@ 2005-10-06 15:11   ` Paul Jackson
  2005-10-06 15:15     ` Mel Gorman
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Jackson @ 2005-10-06 15:11 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, akpm, kravetz, linux-kernel, jschopp, lhms-devel

Mel wrote:
> +/* Allocation type modifiers, group together if possible */

Isn't that "if possible" bogus.  I thought these two bits
_had_ to be grouped together, at least with the current code.

What happened to the comment that Joel added to gfp.h:

+/* Allocation type modifiers, these are required to be adjacent
+ * __GFP_USER: Allocation for user page or a buffer page
+ * __GFP_KERNRCLM: Short-lived or reclaimable kernel allocation
+ * Both bits off: Kernel non-reclaimable or very hard to reclaim
+ * RCLM_SHIFT (defined elsewhere) depends on the location of these bits

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [PATCH 2/7] Fragmentation Avoidance V16: 002_usemap
  2005-10-05 23:38   ` David S. Miller, Mel Gorman
@ 2005-10-06 15:12     ` Mel Gorman
  0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-06 15:12 UTC (permalink / raw)
  To: David S. Miller
  Cc: linux-mm, akpm, kravetz, linux-kernel, jschopp, lhms-devel

On Wed, 5 Oct 2005, David S. Miller wrote:

> From: Mel Gorman <mel@csn.ul.ie>
> Date: Wed,  5 Oct 2005 15:45:57 +0100 (IST)
>
> > +	unsigned int type = 0;
>  ...
> > +	bitidx = pfn_to_bitidx(zone, pfn);
> > +	usemap = pfn_to_usemap(zone, pfn);
> > +
>
> There seems no strong reason not to use "unsigned long" for "type" and
> besides that will provide the required alignment for the bitops
> interfaces.  "unsigned int" is not sufficient.
>

There is no strong reason. I'll convert them to unsigned longs and check
for implicit type conversions.

> Then we also don't need to think about "does this work on big-endian
> 64-bit" and things of that nature.
>

Always a plus.

> Please audit your other bitops uses for this issue.
>

I will. Thanks

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 1/7] Fragmentation Avoidance V16: 001_antidefrag_flags
  2005-10-06 15:11   ` Paul Jackson
@ 2005-10-06 15:15     ` Mel Gorman
  0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2005-10-06 15:15 UTC (permalink / raw)
  To: Paul Jackson; +Cc: linux-mm, akpm, kravetz, linux-kernel, jschopp, lhms-devel

On Thu, 6 Oct 2005, Paul Jackson wrote:

> Mel wrote:
> > +/* Allocation type modifiers, group together if possible */
>
> Isn't that "if possible" bogus.  I thought these two bits
> _had_ to be grouped together, at least with the current code.
>
> What happened to the comment that Joel added to gpl.h:
>

My bad. I missed it while resyncing with Joel. I have it in now.

-- 
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab

