* [PATCH 0/8] Review-based updates to grouping pages by mobility
From: Mel Gorman @ 2007-05-15 15:03 UTC
To: clameter; +Cc: Mel Gorman, linux-mm
Hi Christoph,
The following patches address points brought up by your review of the
grouping pages by mobility patches. There are quite a number of patches here.
The first patch allows grouping by mobility at sizes other than
MAX_ORDER_NR_PAGES. The size is based on the order of the system huge page
where that is defined and, when possible, it is specified as a compile-time
constant to help the optimiser. The patch does change the handling of
hugepagesz from __setup() to early_param(), which needs looking at.
The second and third patches provide some statistics in relation to
fragmentation avoidance.
Patches four and five are fixes for incorrectly flagged allocation sites.
Patches six, seven and eight extend the allocation types available and
convert allocation sites to use them. This corrects a number of areas
where call-sites are annotated incorrectly.
This set of patches handles most of the items in the TODO list that were
brought up during your review. There is another patch which groups page
cache pages separately from other allocations, but I'm holding off on it
for the moment in light of Nicolas's bug reports, although they now appear
to be resolved. The remaining two items are SLAB_PERSISTENT and resizing
ZONE_MOVABLE at runtime. I took a quick look at SLAB_PERSISTENT but it does
not appear to be useful yet, and runtime resizing of ZONE_MOVABLE is still
outstanding.
--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* [PATCH 1/8] Do not depend on MAX_ORDER when grouping pages by mobility
From: Mel Gorman @ 2007-05-15 15:03 UTC
To: clameter; +Cc: Mel Gorman, linux-mm
Currently, mobility grouping works at the MAX_ORDER_NR_PAGES level. This
makes sense for the majority of users, where this is also the huge page
size. However, on platforms like IA64, where the huge page size is runtime
configurable, it is desirable to group at a lower order. On x86_64, and
occasionally on x86, the huge page size may not always correspond to
MAX_ORDER_NR_PAGES.
This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It
uses a compile-time constant if possible and a variable where the huge page
size is runtime configurable.
It is assumed that grouping should be done at the lowest sensible order
and that the user would not want to override this. If this is not true,
the pageblock order could be forced to a variable initialised via a
boot-time kernel parameter.
One potential issue with this patch is that IA64 now parses hugepagesz
with early_param() instead of __setup(). __setup() is called after the
memory allocator has been initialised and the pageblock bitmaps have
already been set up. In tests on one IA64 machine there did not seem to be
any problem with using early_param(), and it may in fact be more correct
as it guarantees the parameter is handled before hugepages= is parsed.
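For reference, the conversion depends on the standard early_param()
handler convention: handlers run during early boot, before the memory
allocator is initialised, and return 0 once the parameter is consumed,
whereas __setup() handlers run later and return 1 on success. A minimal
sketch of the shape, with example_setup_sz and examplesz as illustrative
names that are not part of this patch:

	/*
	 * Hypothetical early_param() handler. It runs before
	 * free_area_init_core(), so any value recorded here is
	 * available when the pageblock bitmaps are sized.
	 */
	static int __init example_setup_sz(char *str)
	{
		unsigned long size;

		if (!str)
			return -EINVAL;
		size = memparse(str, &str);
		/* ... validate and record the requested size ... */
		return 0;
	}
	early_param("examplesz", example_setup_sz);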
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
arch/ia64/Kconfig | 5 ++
arch/ia64/mm/hugetlbpage.c | 4 -
include/linux/mmzone.h | 4 -
include/linux/pageblock-flags.h | 24 +++++++++++
mm/page_alloc.c | 71 ++++++++++++++++++++++++-----------
5 files changed, 82 insertions(+), 26 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-001_v1r2/arch/ia64/Kconfig linux-2.6.21-mm2-002_group_arbitrary/arch/ia64/Kconfig
--- linux-2.6.21-mm2-001_v1r2/arch/ia64/Kconfig 2007-05-11 21:16:07.000000000 +0100
+++ linux-2.6.21-mm2-002_group_arbitrary/arch/ia64/Kconfig 2007-05-15 12:23:22.000000000 +0100
@@ -54,6 +54,11 @@ config ARCH_HAS_ILOG2_U64
bool
default n
+config HUGETLB_PAGE_SIZE_VARIABLE
+ depends on HUGETLB_PAGE
+ bool
+ default y
+
config GENERIC_FIND_NEXT_BIT
bool
default y
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-001_v1r2/arch/ia64/mm/hugetlbpage.c linux-2.6.21-mm2-002_group_arbitrary/arch/ia64/mm/hugetlbpage.c
--- linux-2.6.21-mm2-001_v1r2/arch/ia64/mm/hugetlbpage.c 2007-05-11 21:16:07.000000000 +0100
+++ linux-2.6.21-mm2-002_group_arbitrary/arch/ia64/mm/hugetlbpage.c 2007-05-15 12:23:22.000000000 +0100
@@ -195,6 +195,6 @@ static int __init hugetlb_setup_sz(char
* override here with new page shift.
*/
ia64_set_rr(HPAGE_REGION_BASE, hpage_shift << 2);
- return 1;
+ return 0;
}
-__setup("hugepagesz=", hugetlb_setup_sz);
+early_param("hugepagesz", hugetlb_setup_sz);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-001_v1r2/include/linux/mmzone.h linux-2.6.21-mm2-002_group_arbitrary/include/linux/mmzone.h
--- linux-2.6.21-mm2-001_v1r2/include/linux/mmzone.h 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-002_group_arbitrary/include/linux/mmzone.h 2007-05-15 12:23:22.000000000 +0100
@@ -238,7 +238,7 @@ struct zone {
#ifndef CONFIG_SPARSEMEM
/*
- * Flags for a MAX_ORDER_NR_PAGES block. See pageblock-flags.h.
+ * Flags for a nr_pages_pageblock block. See pageblock-flags.h.
* In SPARSEMEM, this map is stored in struct mem_section
*/
unsigned long *pageblock_flags;
@@ -707,7 +707,7 @@ extern struct zone *next_zone(struct zon
#define PAGE_SECTION_MASK (~(PAGES_PER_SECTION-1))
#define SECTION_BLOCKFLAGS_BITS \
- ((1 << (PFN_SECTION_SHIFT - (MAX_ORDER-1))) * NR_PAGEBLOCK_BITS)
+ ((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
#error Allocator MAX_ORDER exceeds SECTION_SIZE
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-001_v1r2/include/linux/pageblock-flags.h linux-2.6.21-mm2-002_group_arbitrary/include/linux/pageblock-flags.h
--- linux-2.6.21-mm2-001_v1r2/include/linux/pageblock-flags.h 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-002_group_arbitrary/include/linux/pageblock-flags.h 2007-05-15 12:23:22.000000000 +0100
@@ -1,6 +1,6 @@
/*
* Macros for manipulating and testing flags related to a
- * MAX_ORDER_NR_PAGES block of pages.
+ * nr_pages_pageblock number of pages.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -35,6 +35,28 @@ enum pageblock_bits {
NR_PAGEBLOCK_BITS
};
+#ifdef CONFIG_HUGETLB_PAGE
+
+#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
+
+/* Huge page sizes are variable */
+extern int pageblock_order;
+
+#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+
+/* Huge pages are a constant size */
+#define pageblock_order HUGETLB_PAGE_ORDER
+
+#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+
+#else /* CONFIG_HUGETLB_PAGE */
+
+/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
+#define pageblock_order (MAX_ORDER-1)
+#endif /* CONFIG_HUGETLB_PAGE */
+
+#define nr_pages_pageblock (1UL << pageblock_order)
+
/* Forward declaration */
struct page;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-001_v1r2/mm/page_alloc.c linux-2.6.21-mm2-002_group_arbitrary/mm/page_alloc.c
--- linux-2.6.21-mm2-001_v1r2/mm/page_alloc.c 2007-05-15 12:21:44.000000000 +0100
+++ linux-2.6.21-mm2-002_group_arbitrary/mm/page_alloc.c 2007-05-15 12:23:22.000000000 +0100
@@ -59,6 +59,10 @@ unsigned long totalreserve_pages __read_
long nr_swap_pages;
int percpu_pagelist_fraction;
+#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
+int pageblock_order __read_mostly;
+#endif
+
static void __free_pages_ok(struct page *page, unsigned int order);
/*
@@ -721,7 +725,7 @@ static int fallbacks[MIGRATE_TYPES][MIGR
/*
* Move the free pages in a range to the free lists of the requested type.
- * Note that start_page and end_pages are not aligned in a MAX_ORDER_NR_PAGES
+ * Note that start_page and end_pages are not aligned on a pageblock
* boundary. If alignment is required, use move_freepages_block()
*/
int move_freepages(struct zone *zone,
@@ -771,10 +775,10 @@ int move_freepages_block(struct zone *zo
struct page *start_page, *end_page;
start_pfn = page_to_pfn(page);
- start_pfn = start_pfn & ~(MAX_ORDER_NR_PAGES-1);
+ start_pfn = start_pfn & ~(nr_pages_pageblock-1);
start_page = pfn_to_page(start_pfn);
- end_page = start_page + MAX_ORDER_NR_PAGES - 1;
- end_pfn = start_pfn + MAX_ORDER_NR_PAGES - 1;
+ end_page = start_page + nr_pages_pageblock - 1;
+ end_pfn = start_pfn + nr_pages_pageblock - 1;
/* Do not cross zone boundaries */
if (start_pfn < zone->zone_start_pfn)
@@ -838,14 +842,15 @@ static struct page *__rmqueue_fallback(s
* back for a reclaimable kernel allocation, be more
* agressive about taking ownership of free pages
*/
- if (unlikely(current_order >= MAX_ORDER / 2) ||
+ if (unlikely(current_order >= (pageblock_order >> 1)) ||
start_migratetype == MIGRATE_RECLAIMABLE) {
unsigned long pages;
pages = move_freepages_block(zone, page,
start_migratetype);
+ pages <<= current_order;
/* Claim the whole block if over half of it is free */
- if ((pages << current_order) >= (1 << (MAX_ORDER-2)))
+ if (pages >= (1 << (pageblock_order-1)))
set_pageblock_migratetype(page,
start_migratetype);
@@ -858,7 +863,7 @@ static struct page *__rmqueue_fallback(s
__mod_zone_page_state(zone, NR_FREE_PAGES,
-(1UL << order));
- if (current_order == MAX_ORDER - 1)
+ if (current_order == pageblock_order)
set_pageblock_migratetype(page,
start_migratetype);
@@ -2204,14 +2209,16 @@ void __meminit build_all_zonelists(void)
* made on memory-hotadd so a system can start with mobility
* disabled and enable it later
*/
- if (vm_total_pages < (MAX_ORDER_NR_PAGES * MIGRATE_TYPES))
+ if (vm_total_pages < (nr_pages_pageblock * MIGRATE_TYPES))
page_group_by_mobility_disabled = 1;
else
page_group_by_mobility_disabled = 0;
- printk("Built %i zonelists, mobility grouping %s. Total pages: %ld\n",
+ printk(KERN_INFO "Built %i zonelists, mobility grouping %s order %d. "
+ "Total pages: %ld\n",
num_online_nodes(),
page_group_by_mobility_disabled ? "off" : "on",
+ pageblock_order,
vm_total_pages);
}
@@ -2284,7 +2291,7 @@ static inline unsigned long wait_table_b
#define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
/*
- * Mark a number of MAX_ORDER_NR_PAGES blocks as MIGRATE_RESERVE. The number
+ * Mark a number of pageblocks as MIGRATE_RESERVE. The number
* of blocks reserved is based on zone->pages_min. The memory within the
* reserve will tend to store contiguous free pages. Setting min_free_kbytes
* higher will lead to a bigger reserve which will get freed as contiguous
@@ -2299,9 +2306,10 @@ static void setup_zone_migrate_reserve(s
/* Get the start pfn, end pfn and the number of blocks to reserve */
start_pfn = zone->zone_start_pfn;
end_pfn = start_pfn + zone->spanned_pages;
- reserve = roundup(zone->pages_min, MAX_ORDER_NR_PAGES) >> (MAX_ORDER-1);
+ reserve = roundup(zone->pages_min, nr_pages_pageblock) >>
+ pageblock_order;
- for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES) {
+ for (pfn = start_pfn; pfn < end_pfn; pfn += nr_pages_pageblock) {
if (!pfn_valid(pfn))
continue;
page = pfn_to_page(pfn);
@@ -2376,7 +2384,7 @@ void __meminit memmap_init_zone(unsigned
* the start are marked MIGRATE_RESERVE by
* setup_zone_migrate_reserve()
*/
- if ((pfn & (MAX_ORDER_NR_PAGES-1)))
+ if ((pfn & (nr_pages_pageblock-1)))
set_pageblock_migratetype(page, MIGRATE_MOVABLE);
INIT_LIST_HEAD(&page->lru);
@@ -3080,8 +3088,8 @@ static void __meminit calculate_node_tot
#ifndef CONFIG_SPARSEMEM
/*
* Calculate the size of the zone->blockflags rounded to an unsigned long
- * Start by making sure zonesize is a multiple of MAX_ORDER-1 by rounding up
- * Then figure 1 NR_PAGEBLOCK_BITS worth of bits per MAX_ORDER-1, finally
+ * Start by making sure zonesize is a multiple of pageblock_order by rounding up
+ * Then figure 1 NR_PAGEBLOCK_BITS worth of bits per pageblock, finally
* round what is now in bits to nearest long in bits, then return it in
* bytes.
*/
@@ -3089,8 +3097,8 @@ static unsigned long __init usemap_size(
{
unsigned long usemapsize;
- usemapsize = roundup(zonesize, MAX_ORDER_NR_PAGES);
- usemapsize = usemapsize >> (MAX_ORDER-1);
+ usemapsize = roundup(zonesize, nr_pages_pageblock);
+ usemapsize = usemapsize >> pageblock_order;
usemapsize *= NR_PAGEBLOCK_BITS;
usemapsize = roundup(usemapsize, 8 * sizeof(unsigned long));
@@ -3112,6 +3120,26 @@ static void inline setup_usemap(struct p
struct zone *zone, unsigned long zonesize) {}
#endif /* CONFIG_SPARSEMEM */
+#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
+/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
+void __init set_pageblock_order(unsigned int order)
+{
+ /* Check that nr_pages_pageblock has not already been setup */
+ if (pageblock_order)
+ return;
+
+ /*
+ * Assume the largest contiguous order of interest is a huge page.
+ * This value may be variable depending on boot parameters on IA64
+ */
+ pageblock_order = order;
+}
+#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+void __init set_pageblock_order(unsigned int order)
+{
+}
+#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -3192,6 +3220,7 @@ static void __meminit free_area_init_cor
if (!size)
continue;
+ set_pageblock_order(HUGETLB_PAGE_ORDER);
setup_usemap(pgdat, zone, size);
ret = init_currently_empty_zone(zone, zone_start_pfn,
size, MEMMAP_EARLY);
@@ -4083,15 +4112,15 @@ static inline int pfn_to_bitidx(struct z
{
#ifdef CONFIG_SPARSEMEM
pfn &= (PAGES_PER_SECTION-1);
- return (pfn >> (MAX_ORDER-1)) * NR_PAGEBLOCK_BITS;
+ return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
#else
pfn = pfn - zone->zone_start_pfn;
- return (pfn >> (MAX_ORDER-1)) * NR_PAGEBLOCK_BITS;
+ return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
#endif /* CONFIG_SPARSEMEM */
}
/**
- * get_pageblock_flags_group - Return the requested group of flags for the MAX_ORDER_NR_PAGES block of pages
+ * get_pageblock_flags_group - Return the requested group of flags for the nr_pages_pageblock block of pages
* @page: The page within the block of interest
* @start_bitidx: The first bit of interest to retrieve
* @end_bitidx: The last bit of interest
@@ -4119,7 +4148,7 @@ unsigned long get_pageblock_flags_group(
}
/**
- * set_pageblock_flags_group - Set the requested group of flags for a MAX_ORDER_NR_PAGES block of pages
+ * set_pageblock_flags_group - Set the requested group of flags for a nr_pages_pageblock block of pages
* @page: The page within the block of interest
* @start_bitidx: The first bit of interest
* @end_bitidx: The last bit of interest
* [PATCH 2/8] Print out statistics in relation to fragmentation avoidance to /proc/fragavoidance
From: Mel Gorman @ 2007-05-15 15:03 UTC
To: clameter; +Cc: Mel Gorman, linux-mm
This patch provides fragmentation avoidance statistics via
/proc/fragavoidance. The information is collected only on request so there
is no runtime overhead. The statistics are in two parts:
The first part is a more detailed version of /proc/buddyinfo and looks like
Free pages count per migrate type
Node 0, zone DMA, type Unmovable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Reclaimable 1 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Reserve 0 4 4 0 0 0 0 1 0 1 0
Node 0, zone Normal, type Unmovable 111 8 4 4 2 3 1 0 0 0 0
Node 0, zone Normal, type Reclaimable 293 89 8 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Movable 1 6 13 9 7 6 3 0 0 0 0
Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 4
The second part looks like
Number of blocks type Unmovable Reclaimable Movable Reserve
Node 0, zone DMA 0 1 2 1
Node 0, zone Normal 3 17 94 4
To walk the zones within a node with interrupts disabled,
walk_zones_in_node() is introduced and shared between /proc/buddyinfo,
/proc/zoneinfo and /proc/fragavoidance to reduce code duplication. It is
currently specific to what vmstat.c requires, but it could be broken out
as a general utility function in mmzone.c if other potential users
appeared.
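To illustrate the pattern, a new per-zone dumper only needs to supply a
print callback; a minimal sketch, where my_show_print() and my_show() are
hypothetical names and the real callbacks are in the mm/vmstat.c hunks
below:

	/* Hypothetical callback; invoked with zone->lock held and IRQs disabled */
	static void my_show_print(struct seq_file *m, pg_data_t *pgdat,
					struct zone *zone)
	{
		seq_printf(m, "Node %d, zone %8s ...\n",
				pgdat->node_id, zone->name);
	}

	static int my_show(struct seq_file *m, void *arg)
	{
		pg_data_t *pgdat = (pg_data_t *)arg;

		walk_zones_in_node(m, pgdat, my_show_print);
		return 0;
	}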
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
fs/proc/proc_misc.c | 14 ++
include/linux/gfp.h | 12 +
include/linux/mmzone.h | 10 +
mm/page_alloc.c | 20 ---
mm/vmstat.c | 275 +++++++++++++++++++++++++++++++-------------
5 files changed, 231 insertions(+), 100 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-002_group_arbitrary/fs/proc/proc_misc.c linux-2.6.21-mm2-005_statistics/fs/proc/proc_misc.c
--- linux-2.6.21-mm2-002_group_arbitrary/fs/proc/proc_misc.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-005_statistics/fs/proc/proc_misc.c 2007-05-15 12:24:58.000000000 +0100
@@ -231,6 +231,19 @@ static const struct file_operations frag
.release = seq_release,
};
+extern struct seq_operations fragavoidance_op;
+static int fragavoidance_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &fragavoidance_op);
+}
+
+static const struct file_operations fragavoidance_file_ops = {
+ .open = fragavoidance_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
extern struct seq_operations zoneinfo_op;
static int zoneinfo_open(struct inode *inode, struct file *file)
{
@@ -873,6 +886,7 @@ void __init proc_misc_init(void)
#endif
#endif
create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
+ create_seq_entry("fragavoidance", S_IRUGO, &fragavoidance_file_ops);
create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations);
create_seq_entry("zoneinfo",S_IRUGO, &proc_zoneinfo_file_operations);
#ifdef CONFIG_BLOCK
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-002_group_arbitrary/include/linux/gfp.h linux-2.6.21-mm2-005_statistics/include/linux/gfp.h
--- linux-2.6.21-mm2-002_group_arbitrary/include/linux/gfp.h 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-005_statistics/include/linux/gfp.h 2007-05-15 12:24:58.000000000 +0100
@@ -93,6 +93,18 @@ struct vm_area_struct;
/* 4GB DMA on some platforms */
#define GFP_DMA32 __GFP_DMA32
+/* Convert GFP flags to their corresponding migrate type */
+static inline int allocflags_to_migratetype(gfp_t gfp_flags)
+{
+ WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
+
+ if (unlikely(page_group_by_mobility_disabled))
+ return MIGRATE_UNMOVABLE;
+
+ /* Cluster based on mobility */
+ return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
+ ((gfp_flags & __GFP_RECLAIMABLE) != 0);
+}
static inline enum zone_type gfp_zone(gfp_t flags)
{
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-002_group_arbitrary/include/linux/mmzone.h linux-2.6.21-mm2-005_statistics/include/linux/mmzone.h
--- linux-2.6.21-mm2-002_group_arbitrary/include/linux/mmzone.h 2007-05-15 12:23:22.000000000 +0100
+++ linux-2.6.21-mm2-005_statistics/include/linux/mmzone.h 2007-05-15 12:24:58.000000000 +0100
@@ -45,6 +45,16 @@ extern int page_group_by_mobility_disabl
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
+extern int page_group_by_mobility_disabled;
+
+static inline int get_pageblock_migratetype(struct page *page)
+{
+ if (unlikely(page_group_by_mobility_disabled))
+ return MIGRATE_UNMOVABLE;
+
+ return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
+}
+
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-002_group_arbitrary/mm/page_alloc.c linux-2.6.21-mm2-005_statistics/mm/page_alloc.c
--- linux-2.6.21-mm2-002_group_arbitrary/mm/page_alloc.c 2007-05-15 12:23:22.000000000 +0100
+++ linux-2.6.21-mm2-005_statistics/mm/page_alloc.c 2007-05-15 12:24:58.000000000 +0100
@@ -150,32 +150,12 @@ static unsigned long __meminitdata dma_r
int page_group_by_mobility_disabled __read_mostly;
-static inline int get_pageblock_migratetype(struct page *page)
-{
- if (unlikely(page_group_by_mobility_disabled))
- return MIGRATE_UNMOVABLE;
-
- return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
-}
-
static void set_pageblock_migratetype(struct page *page, int migratetype)
{
set_pageblock_flags_group(page, (unsigned long)migratetype,
PB_migrate, PB_migrate_end);
}
-static inline int allocflags_to_migratetype(gfp_t gfp_flags)
-{
- WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
-
- if (unlikely(page_group_by_mobility_disabled))
- return MIGRATE_UNMOVABLE;
-
- /* Cluster based on mobility */
- return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
- ((gfp_flags & __GFP_RECLAIMABLE) != 0);
-}
-
#ifdef CONFIG_DEBUG_VM
static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
{
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-002_group_arbitrary/mm/vmstat.c linux-2.6.21-mm2-005_statistics/mm/vmstat.c
--- linux-2.6.21-mm2-002_group_arbitrary/mm/vmstat.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-005_statistics/mm/vmstat.c 2007-05-15 12:24:58.000000000 +0100
@@ -396,6 +396,13 @@ void zone_statistics(struct zonelist *zo
#include <linux/seq_file.h>
+static char * const migratetype_names[MIGRATE_TYPES] = {
+ "Unmovable",
+ "Reclaimable",
+ "Movable",
+ "Reserve",
+};
+
static void *frag_start(struct seq_file *m, loff_t *pos)
{
pg_data_t *pgdat;
@@ -420,28 +427,135 @@ static void frag_stop(struct seq_file *m
{
}
-/*
- * This walks the free areas for each zone.
- */
-static int frag_show(struct seq_file *m, void *arg)
+/* Walk all the zones in a node and print using a callback */
+static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
+ void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
{
- pg_data_t *pgdat = (pg_data_t *)arg;
struct zone *zone;
struct zone *node_zones = pgdat->node_zones;
unsigned long flags;
- int order;
for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
if (!populated_zone(zone))
continue;
spin_lock_irqsave(&zone->lock, flags);
- seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
- for (order = 0; order < MAX_ORDER; ++order)
- seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
+ print(m, pgdat, zone);
spin_unlock_irqrestore(&zone->lock, flags);
+ }
+}
+
+static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
+ struct zone *zone)
+{
+ int order;
+
+ seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ for (order = 0; order < MAX_ORDER; ++order)
+ seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
+ seq_putc(m, '\n');
+}
+
+/*
+ * This walks the free areas for each zone.
+ */
+static int frag_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+ walk_zones_in_node(m, pgdat, frag_show_print);
+ return 0;
+}
+
+static void fragavoidance_showfree_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ int order, mtype;
+
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
+ seq_printf(m, "Node %d, zone %8s, type %12s ",
+ pgdat->node_id,
+ zone->name,
+ migratetype_names[mtype]);
+ for (order = 0; order < MAX_ORDER; ++order) {
+ unsigned long freecount = 0;
+ struct free_area *area;
+ struct list_head *curr;
+
+ area = &(zone->free_area[order]);
+
+ list_for_each(curr, &area->free_list[mtype])
+ freecount++;
+ seq_printf(m, "%6lu ", freecount);
+ }
seq_putc(m, '\n');
}
+}
+
+/* Print out the free pages at each order for each migratetype */
+static int fragavoidance_showfree(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ seq_printf(m, "Free pages count per migrate type\n");
+ walk_zones_in_node(m, pgdat, fragavoidance_showfree_print);
+
+ return 0;
+}
+
+static void fragavoidance_showblockcount_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ int mtype;
+ unsigned long pfn;
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = start_pfn + zone->spanned_pages;
+ unsigned long count[MIGRATE_TYPES] = { 0, };
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn += nr_pages_pageblock) {
+ struct page *page;
+
+ if (!pfn_valid(pfn))
+ continue;
+
+ page = pfn_to_page(pfn);
+ mtype = get_pageblock_migratetype(page);
+
+ count[mtype]++;
+ }
+
+ /* Print counts */
+ seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12lu ", count[mtype]);
+ seq_putc(m, '\n');
+}
+
+/* Print out the number of pageblocks of each migratetype */
+static int fragavoidance_showblockcount(struct seq_file *m, void *arg)
+{
+ int mtype;
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ seq_printf(m, "\n%-23s", "Number of blocks type ");
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12s ", migratetype_names[mtype]);
+ seq_putc(m, '\n');
+ walk_zones_in_node(m, pgdat, fragavoidance_showblockcount_print);
+
+ return 0;
+}
+
+/*
+ * This prints out statistics in relation to grouping pages by mobility.
+ * It is expensive to collect, so do not read the file constantly.
+ */
+static int fragavoidance_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ fragavoidance_showfree(m, pgdat);
+ fragavoidance_showblockcount(m, pgdat);
+
return 0;
}
@@ -452,6 +566,13 @@ const struct seq_operations fragmentatio
.show = frag_show,
};
+const struct seq_operations fragavoidance_op = {
+ .start = frag_start,
+ .next = frag_next,
+ .stop = frag_stop,
+ .show = fragavoidance_show,
+};
+
#ifdef CONFIG_ZONE_DMA
#define TEXT_FOR_DMA(xx) xx "_dma",
#else
@@ -530,84 +651,78 @@ static const char * const vmstat_text[]
#endif
};
-/*
- * Output information about zones in @pgdat.
- */
-static int zoneinfo_show(struct seq_file *m, void *arg)
+static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
+ struct zone *zone)
{
- pg_data_t *pgdat = arg;
- struct zone *zone;
- struct zone *node_zones = pgdat->node_zones;
- unsigned long flags;
-
- for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; zone++) {
- int i;
+ int i;
+ seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
+ seq_printf(m,
+ "\n pages free %lu"
+ "\n min %lu"
+ "\n low %lu"
+ "\n high %lu"
+ "\n scanned %lu (a: %lu i: %lu)"
+ "\n spanned %lu"
+ "\n present %lu",
+ zone_page_state(zone, NR_FREE_PAGES),
+ zone->pages_min,
+ zone->pages_low,
+ zone->pages_high,
+ zone->pages_scanned,
+ zone->nr_scan_active, zone->nr_scan_inactive,
+ zone->spanned_pages,
+ zone->present_pages);
- if (!populated_zone(zone))
- continue;
+ for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
+ seq_printf(m, "\n %-12s %lu", vmstat_text[i],
+ zone_page_state(zone, i));
- spin_lock_irqsave(&zone->lock, flags);
- seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
- seq_printf(m,
- "\n pages free %lu"
- "\n min %lu"
- "\n low %lu"
- "\n high %lu"
- "\n scanned %lu (a: %lu i: %lu)"
- "\n spanned %lu"
- "\n present %lu",
- zone_page_state(zone, NR_FREE_PAGES),
- zone->pages_min,
- zone->pages_low,
- zone->pages_high,
- zone->pages_scanned,
- zone->nr_scan_active, zone->nr_scan_inactive,
- zone->spanned_pages,
- zone->present_pages);
-
- for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
- seq_printf(m, "\n %-12s %lu", vmstat_text[i],
- zone_page_state(zone, i));
-
- seq_printf(m,
- "\n protection: (%lu",
- zone->lowmem_reserve[0]);
- for (i = 1; i < ARRAY_SIZE(zone->lowmem_reserve); i++)
- seq_printf(m, ", %lu", zone->lowmem_reserve[i]);
- seq_printf(m,
- ")"
- "\n pagesets");
- for_each_online_cpu(i) {
- struct per_cpu_pageset *pageset;
- int j;
-
- pageset = zone_pcp(zone, i);
- for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
- seq_printf(m,
- "\n cpu: %i pcp: %i"
- "\n count: %i"
- "\n high: %i"
- "\n batch: %i",
- i, j,
- pageset->pcp[j].count,
- pageset->pcp[j].high,
- pageset->pcp[j].batch);
+ seq_printf(m,
+ "\n protection: (%lu",
+ zone->lowmem_reserve[0]);
+ for (i = 1; i < ARRAY_SIZE(zone->lowmem_reserve); i++)
+ seq_printf(m, ", %lu", zone->lowmem_reserve[i]);
+ seq_printf(m,
+ ")"
+ "\n pagesets");
+ for_each_online_cpu(i) {
+ struct per_cpu_pageset *pageset;
+ int j;
+
+ pageset = zone_pcp(zone, i);
+ for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
+ seq_printf(m,
+ "\n cpu: %i pcp: %i"
+ "\n count: %i"
+ "\n high: %i"
+ "\n batch: %i",
+ i, j,
+ pageset->pcp[j].count,
+ pageset->pcp[j].high,
+ pageset->pcp[j].batch);
}
#ifdef CONFIG_SMP
- seq_printf(m, "\n vm stats threshold: %d",
- pageset->stat_threshold);
+ seq_printf(m, "\n vm stats threshold: %d",
+ pageset->stat_threshold);
#endif
- }
- seq_printf(m,
- "\n all_unreclaimable: %u"
- "\n prev_priority: %i"
- "\n start_pfn: %lu",
- zone->all_unreclaimable,
- zone->prev_priority,
- zone->zone_start_pfn);
- spin_unlock_irqrestore(&zone->lock, flags);
- seq_putc(m, '\n');
}
+ seq_printf(m,
+ "\n all_unreclaimable: %u"
+ "\n prev_priority: %i"
+ "\n start_pfn: %lu",
+ zone->all_unreclaimable,
+ zone->prev_priority,
+ zone->zone_start_pfn);
+ seq_putc(m, '\n');
+}
+
+/*
+ * Output information about zones in @pgdat.
+ */
+static int zoneinfo_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+ walk_zones_in_node(m, pgdat, zoneinfo_show_print);
return 0;
}
* [PATCH 3/8] Print out PAGE_OWNER statistics in relation to fragmentation avoidance
From: Mel Gorman @ 2007-05-15 15:04 UTC
To: clameter; +Cc: Mel Gorman, linux-mm
When PAGE_OWNER is set, more information of relevance to fragmentation
avoidance is available. A second line is added to /proc/page_owner showing
the PFN, the pageblock number, the mobility type of the page based on its
allocation flags, whether the allocation is improperly placed and the page
flags. A sample entry looks like
Page allocated via order 0, mask 0x1280d2
PFN 7355 Block 7 type 3 Fallback Flags LA
[0xc01528c6] __handle_mm_fault+598
[0xc0320427] do_page_fault+279
[0xc031ed9a] error_code+114
This information can be used to identify pages that are improperly placed. As
the format of PAGE_OWNER data is now different, the comment at the top of
Documentation/page_owner.c is updated with new instructions.
As PAGE_OWNER tracks the GFP flags used to allocate the pages,
/proc/fragavoidance is enhanced to show how many mixed pageblocks exist.
The additional output looks like
Number of mixed blocks Unmovable Reclaimable Movable Reserve
Node 0, zone DMA 0 1 2 1
Node 0, zone Normal 2 11 33 0
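Both the Fallback marker and the mixed-block count come down to comparing
the migrate type of the pageblock with the type implied by the GFP flags
recorded by PAGE_OWNER for the page. A minimal sketch of that check, where
page_is_misplaced() is a hypothetical helper and page->gfp_mask only
exists with CONFIG_PAGE_OWNER:

	/* Hypothetical helper built from the calls used in the hunks below */
	static int page_is_misplaced(struct page *page)
	{
		int blocktype = get_pageblock_migratetype(page);
		int pagetype = allocflags_to_migratetype(page->gfp_mask);

		/* A mismatch means the page fell back into a foreign block */
		return blocktype != pagetype;
	}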
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
Documentation/page_owner.c | 3 -
fs/proc/proc_misc.c | 28 ++++++++++++
mm/vmstat.c | 92 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 122 insertions(+), 1 deletion(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-005_statistics/Documentation/page_owner.c linux-2.6.21-mm2-006_statistics_owner/Documentation/page_owner.c
--- linux-2.6.21-mm2-005_statistics/Documentation/page_owner.c 2007-05-11 21:16:06.000000000 +0100
+++ linux-2.6.21-mm2-006_statistics_owner/Documentation/page_owner.c 2007-05-15 12:26:35.000000000 +0100
@@ -2,7 +2,8 @@
* User-space helper to sort the output of /proc/page_owner
*
* Example use:
- * cat /proc/page_owner > page_owner.txt
+ * cat /proc/page_owner > page_owner_full.txt
+ * grep -v ^PFN page_owner_full.txt > page_owner.txt
* ./sort page_owner.txt sorted_page_owner.txt
*/
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-005_statistics/fs/proc/proc_misc.c linux-2.6.21-mm2-006_statistics_owner/fs/proc/proc_misc.c
--- linux-2.6.21-mm2-005_statistics/fs/proc/proc_misc.c 2007-05-15 12:24:58.000000000 +0100
+++ linux-2.6.21-mm2-006_statistics_owner/fs/proc/proc_misc.c 2007-05-15 12:26:35.000000000 +0100
@@ -760,6 +760,7 @@ read_page_owner(struct file *file, char
unsigned long offset = 0, symsize;
int i;
ssize_t num_written = 0;
+ int blocktype = 0, pagetype = 0;
pfn = min_low_pfn + *ppos;
page = pfn_to_page(pfn);
@@ -788,6 +789,33 @@ read_page_owner(struct file *file, char
goto out;
}
+ /* Print information relevant to grouping pages by mobility */
+ blocktype = get_pageblock_migratetype(page);
+ pagetype = allocflags_to_migratetype(page->gfp_mask);
+ ret += snprintf(kbuf+ret, count-ret,
+ "PFN %lu Block %lu type %d %s "
+ "Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
+ pfn,
+ pfn >> pageblock_order,
+ blocktype,
+ blocktype != pagetype ? "Fallback" : " ",
+ PageLocked(page) ? "K" : " ",
+ PageError(page) ? "E" : " ",
+ PageReferenced(page) ? "R" : " ",
+ PageUptodate(page) ? "U" : " ",
+ PageDirty(page) ? "D" : " ",
+ PageLRU(page) ? "L" : " ",
+ PageActive(page) ? "A" : " ",
+ PageSlab(page) ? "S" : " ",
+ PageWriteback(page) ? "W" : " ",
+ PageCompound(page) ? "C" : " ",
+ PageSwapCache(page) ? "B" : " ",
+ PageMappedToDisk(page) ? "M" : " ");
+ if (ret >= count) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
num_written = ret;
for (i = 0; i < 8; i++) {
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-005_statistics/mm/vmstat.c linux-2.6.21-mm2-006_statistics_owner/mm/vmstat.c
--- linux-2.6.21-mm2-005_statistics/mm/vmstat.c 2007-05-15 12:24:58.000000000 +0100
+++ linux-2.6.21-mm2-006_statistics_owner/mm/vmstat.c 2007-05-15 12:26:35.000000000 +0100
@@ -427,6 +427,77 @@ static void frag_stop(struct seq_file *m
{
}
+#ifdef CONFIG_PAGE_OWNER
+static void fragavoidance_showmixedcount_print(struct seq_file *m,
+ pg_data_t *pgdat,
+ struct zone *zone)
+{
+ int mtype, pagetype;
+ unsigned long pfn;
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = start_pfn + zone->spanned_pages;
+ unsigned long count[MIGRATE_TYPES] = { 0, };
+
+ /* Align PFNs to nr_pages_pageblock boundary */
+ pfn = start_pfn & ~(nr_pages_pageblock-1);
+
+ /*
+ * Walk the zone in nr_pages_pageblock steps. If a page block spans
+ * a zone boundary, it will be double counted between zones. This does
+ * not matter as the mixed block count will still be correct
+ */
+ for (; pfn < end_pfn; pfn += nr_pages_pageblock) {
+ struct page *page;
+ unsigned long offset = 0;
+
+ /* Do not read before the zone start */
+ if (pfn < start_pfn)
+ offset = start_pfn - pfn;
+
+ if (!pfn_valid(pfn + offset))
+ continue;
+
+ page = pfn_to_page(pfn + offset);
+ mtype = get_pageblock_migratetype(page);
+
+ /* Check the block for bad migrate types */
+ for (; offset < nr_pages_pageblock; offset++) {
+ /* Do not read past the end of the zone */
+ if (pfn + offset >= end_pfn)
+ break;
+
+ if (!pfn_valid_within(pfn + offset))
+ continue;
+
+ page = pfn_to_page(pfn + offset);
+
+ /* Skip free pages */
+ if (PageBuddy(page)) {
+ offset += (1UL << page_private(page)) - 1UL;
+ continue;
+ }
+ if (page->order < 0)
+ continue;
+
+ pagetype = allocflags_to_migratetype(page->gfp_mask);
+ if (pagetype != mtype) {
+ count[mtype]++;
+ break;
+ }
+
+ /* Move to end of this allocation */
+ offset += (1 << page->order) - 1;
+ }
+ }
+
+ /* Print counts */
+ seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12lu ", count[mtype]);
+ seq_putc(m, '\n');
+}
+#endif /* CONFIG_PAGE_OWNER */
+
/* Walk all the zones in a node and print using a callback */
static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
@@ -546,6 +617,26 @@ static int fragavoidance_showblockcount(
}
/*
+ * Print out the number of pageblocks for each migratetype that contain pages
+ * of other types. This gives an indication of how well fallbacks are being
+ * contained by rmqueue_fallback(). It requires information from PAGE_OWNER
+ * to determine what is going on
+ */
+static void fragavoidance_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
+{
+#ifdef CONFIG_PAGE_OWNER
+ int mtype;
+
+ seq_printf(m, "\n%-23s", "Number of mixed blocks ");
+ /* Print header */
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12s ", migratetype_names[mtype]);
+ seq_putc(m, '\n');
+ walk_zones_in_node(m, pgdat, fragavoidance_showmixedcount_print);
+#endif /* CONFIG_PAGE_OWNER */
+}
+
+/*
* This prints out statistics in relation to grouping pages by mobility.
* It is expensive to collect, so do not read the file constantly.
*/
@@ -555,6 +646,7 @@ static int fragavoidance_show(struct seq
fragavoidance_showfree(m, pgdat);
fragavoidance_showblockcount(m, pgdat);
+ fragavoidance_showmixedcount(m, pgdat);
return 0;
}
* [PATCH 4/8] Mark bio_alloc() allocations correctly
From: Mel Gorman @ 2007-05-15 15:04 UTC
To: clameter; +Cc: Mel Gorman, linux-mm
bio_alloc() currently uses __GFP_MOVABLE, which is plain wrong. Objects
are allocated with that gfp mask via a mempool, and the slab that is
ultimately used is neither reclaimable nor movable.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-006_statistics_owner/fs/buffer.c linux-2.6.21-mm2-010_biomovable/fs/buffer.c
--- linux-2.6.21-mm2-006_statistics_owner/fs/buffer.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-010_biomovable/fs/buffer.c 2007-05-15 12:28:11.000000000 +0100
@@ -2621,7 +2621,7 @@ int submit_bh(int rw, struct buffer_head
* from here on down, it's all bio -- do the initial mapping,
* submit_bio -> generic_make_request may further map this bio around
*/
- bio = bio_alloc(GFP_NOIO|__GFP_MOVABLE, 1);
+ bio = bio_alloc(GFP_NOIO, 1);
bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
* [PATCH 5/8] Do not annotate shmem allocations explicitly
From: Mel Gorman @ 2007-05-15 15:04 UTC
To: clameter; +Cc: Mel Gorman, linux-mm
shmem allocates pages for two purposes. Firstly, shmem_dir_alloc()
allocates pages to track swap vectors; these are not movable, so this
patch clears all mobility flags related to the allocation. Secondly,
shmem_alloc_page() allocates pages on behalf of shmem_getpage(), whose
flags come from a file mapping that already sets the appropriate mobility
flags. These allocations do not need to be explicitly flagged, so this
patch removes the unnecessary annotations.
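The idiom for the first case is to strip every mobility-related flag
before the allocation. A minimal sketch of the pattern, where
alloc_unmovable() is an illustrative wrapper and not part of this patch:

	/* Hypothetical wrapper showing the GFP_MOVABLE_MASK idiom */
	static struct page *alloc_unmovable(gfp_t gfp_mask, int order)
	{
		/*
		 * Clear __GFP_MOVABLE and __GFP_RECLAIMABLE so the pages
		 * are grouped with other unmovable allocations, as the
		 * shmem_dir_alloc() hunk below does.
		 */
		return alloc_pages(gfp_mask & ~GFP_MOVABLE_MASK, order);
	}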
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
shmem.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-010_biomovable/mm/shmem.c linux-2.6.21-mm2-012_shmem/mm/shmem.c
--- linux-2.6.21-mm2-010_biomovable/mm/shmem.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-012_shmem/mm/shmem.c 2007-05-15 12:29:52.000000000 +0100
@@ -95,9 +95,9 @@ static inline struct page *shmem_dir_all
* BLOCKS_PER_PAGE on indirect pages, assume PAGE_CACHE_SIZE:
* might be reconsidered if it ever diverges from PAGE_SIZE.
*
- * __GFP_MOVABLE is masked out as swap vectors cannot move
+ * Mobility flags are masked out as swap vectors cannot move
*/
- return alloc_pages((gfp_mask & ~__GFP_MOVABLE) | __GFP_ZERO,
+ return alloc_pages((gfp_mask & ~GFP_MOVABLE_MASK) | __GFP_ZERO,
PAGE_CACHE_SHIFT-PAGE_SHIFT);
}
@@ -1053,9 +1053,7 @@ shmem_alloc_page(gfp_t gfp, struct shmem
pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx);
pvma.vm_pgoff = idx;
pvma.vm_end = PAGE_SIZE;
- page = alloc_page_vma(
- set_migrateflags(gfp | __GFP_ZERO, __GFP_RECLAIMABLE),
- &pvma, 0);
+ page = alloc_page_vma(gfp | __GFP_ZERO, &pvma, 0);
mpol_free(pvma.vm_policy);
return page;
}
@@ -1075,8 +1073,7 @@ shmem_swapin(struct shmem_inode_info *in
static inline struct page *
shmem_alloc_page(gfp_t gfp,struct shmem_inode_info *info, unsigned long idx)
{
- return alloc_page(
- set_migrateflags(gfp | __GFP_ZERO, __GFP_RECLAIMABLE));
+ return alloc_page(gfp | __GFP_ZERO);
}
#endif
* [PATCH 6/8] Add __GFP_TEMPORARY to identify allocations that are short-lived
From: Mel Gorman @ 2007-05-15 15:05 UTC
To: clameter; +Cc: Mel Gorman, linux-mm
Currently allocations that are short-lived or reclaimable by the kernel are
grouped together by specifying __GFP_RECLAIMABLE in the GFP flags. However,
it is confusing when reading code to see a temporary allocation using
__GFP_RECLAIMABLE when it is clearly not reclaimable.
This patch adds __GFP_TEMPORARY, GFP_TEMPORARY and SLAB_TEMPORARY for
temporary allocations. The journal_handle, journal_head, revoke_table,
revoke_record, skbuff_head_cache and skbuff_fclone_cache slabs are converted
to use SLAB_TEMPORARY instead of flagging the allocation call-sites. In the
implementation, reclaimable and temporary allocations are grouped into the
same pageblocks, but this might change in the future. This change makes
call-sites for temporary allocations clearer. Not all temporary
allocations were previously flagged; this patch flags a few additional
ones appropriately.
Note that some GFP_USER and GFP_KERNEL allocations are changed to
GFP_TEMPORARY. The difference between GFP_USER and GFP_KERNEL is only in
how cpuset boundaries are handled, which is unimportant for temporary
allocations.
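For clarity, the two conversion patterns this patch applies look roughly
as follows; page, my_cache and struct my_obj are illustrative names only:

	/* A short-lived buffer: was GFP_KERNEL or GFP_USER, now explicit */
	page = (char *)__get_free_page(GFP_TEMPORARY);

	/*
	 * A slab whose objects are short-lived: flag the cache once
	 * instead of annotating every kmem_cache_alloc() call-site
	 */
	my_cache = kmem_cache_create("my_cache", sizeof(struct my_obj),
					0, SLAB_TEMPORARY, NULL, NULL);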
This patch can be considered a fix to
group-short-lived-and-reclaimable-kernel-allocations.patch.
Credit goes to Christoph Lameter for identifying the problems in relation to
temporary allocations during review and providing an illustration-of-concept
patch to act as a starting point.
[clameter@sgi.com: patch framework]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
drivers/block/acsi_slm.c | 2 +-
fs/jbd/journal.c | 10 ++++------
fs/jbd/revoke.c | 14 ++++++++------
fs/proc/base.c | 12 ++++++------
fs/proc/generic.c | 2 +-
include/linux/gfp.h | 2 ++
include/linux/slab.h | 5 ++++-
kernel/cpuset.c | 2 +-
mm/slub.c | 2 +-
net/core/skbuff.c | 19 +++++++++----------
10 files changed, 37 insertions(+), 33 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/drivers/block/acsi_slm.c linux-2.6.21-mm2-020_temporary/drivers/block/acsi_slm.c
--- linux-2.6.21-mm2-012_shmem/drivers/block/acsi_slm.c 2007-05-11 21:16:08.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/drivers/block/acsi_slm.c 2007-05-15 12:31:22.000000000 +0100
@@ -367,7 +367,7 @@ static ssize_t slm_read( struct file *fi
int length;
int end;
- if (!(page = __get_free_page( GFP_KERNEL )))
+ if (!(page = __get_free_page(GFP_TEMPORARY)))
return( -ENOMEM );
length = slm_getstats( (char *)page, iminor(node) );
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/fs/jbd/journal.c linux-2.6.21-mm2-020_temporary/fs/jbd/journal.c
--- linux-2.6.21-mm2-012_shmem/fs/jbd/journal.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/fs/jbd/journal.c 2007-05-15 12:31:22.000000000 +0100
@@ -1710,7 +1710,7 @@ static int journal_init_journal_head_cac
journal_head_cache = kmem_cache_create("journal_head",
sizeof(struct journal_head),
0, /* offset */
- 0, /* flags */
+ SLAB_TEMPORARY, /* flags */
NULL, /* ctor */
NULL); /* dtor */
retval = 0;
@@ -1739,8 +1739,7 @@ static struct journal_head *journal_allo
#ifdef CONFIG_JBD_DEBUG
atomic_inc(&nr_journal_heads);
#endif
- ret = kmem_cache_alloc(journal_head_cache,
- set_migrateflags(GFP_NOFS, __GFP_RECLAIMABLE));
+ ret = kmem_cache_alloc(journal_head_cache, GFP_NOFS);
if (ret == 0) {
jbd_debug(1, "out of memory for journal_head\n");
if (time_after(jiffies, last_warning + 5*HZ)) {
@@ -1750,8 +1749,7 @@ static struct journal_head *journal_allo
}
while (ret == 0) {
yield();
- ret = kmem_cache_alloc(journal_head_cache,
- GFP_NOFS|__GFP_RECLAIMABLE);
+ ret = kmem_cache_alloc(journal_head_cache, GFP_NOFS);
}
}
return ret;
@@ -2017,7 +2015,7 @@ static int __init journal_init_handle_ca
jbd_handle_cache = kmem_cache_create("journal_handle",
sizeof(handle_t),
0, /* offset */
- 0, /* flags */
+ SLAB_TEMPORARY, /* flags */
NULL, /* ctor */
NULL); /* dtor */
if (jbd_handle_cache == NULL) {
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/fs/jbd/revoke.c linux-2.6.21-mm2-020_temporary/fs/jbd/revoke.c
--- linux-2.6.21-mm2-012_shmem/fs/jbd/revoke.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/fs/jbd/revoke.c 2007-05-15 12:31:22.000000000 +0100
@@ -169,13 +169,17 @@ int __init journal_init_revoke_caches(vo
{
revoke_record_cache = kmem_cache_create("revoke_record",
sizeof(struct jbd_revoke_record_s),
- 0, SLAB_HWCACHE_ALIGN, NULL, NULL);
+ 0,
+ SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
+ NULL, NULL);
if (revoke_record_cache == 0)
return -ENOMEM;
revoke_table_cache = kmem_cache_create("revoke_table",
sizeof(struct jbd_revoke_table_s),
- 0, 0, NULL, NULL);
+ 0,
+ SLAB_TEMPORARY,
+ NULL, NULL);
if (revoke_table_cache == 0) {
kmem_cache_destroy(revoke_record_cache);
revoke_record_cache = NULL;
@@ -205,8 +209,7 @@ int journal_init_revoke(journal_t *journ
while((tmp >>= 1UL) != 0UL)
shift++;
- journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache,
- GFP_KERNEL|__GFP_RECLAIMABLE);
+ journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
if (!journal->j_revoke_table[0])
return -ENOMEM;
journal->j_revoke = journal->j_revoke_table[0];
@@ -229,8 +232,7 @@ int journal_init_revoke(journal_t *journ
for (tmp = 0; tmp < hash_size; tmp++)
INIT_LIST_HEAD(&journal->j_revoke->hash_table[tmp]);
- journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache,
- GFP_KERNEL|__GFP_RECLAIMABLE);
+ journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
if (!journal->j_revoke_table[1]) {
kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/fs/proc/base.c linux-2.6.21-mm2-020_temporary/fs/proc/base.c
--- linux-2.6.21-mm2-012_shmem/fs/proc/base.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/fs/proc/base.c 2007-05-15 12:31:22.000000000 +0100
@@ -487,7 +487,7 @@ static ssize_t proc_info_read(struct fil
count = PROC_BLOCK_SIZE;
length = -ENOMEM;
- if (!(page = __get_free_page(GFP_KERNEL|__GFP_RECLAIMABLE)))
+ if (!(page = __get_free_page(GFP_TEMPORARY)))
goto out;
length = PROC_I(inode)->op.proc_read(task, (char*)page);
@@ -527,7 +527,7 @@ static ssize_t mem_read(struct file * fi
goto out;
ret = -ENOMEM;
- page = (char *)__get_free_page(GFP_USER);
+ page = (char *)__get_free_page(GFP_TEMPORARY);
if (!page)
goto out;
@@ -597,7 +597,7 @@ static ssize_t mem_write(struct file * f
goto out;
copied = -ENOMEM;
- page = (char *)__get_free_page(GFP_USER|__GFP_RECLAIMABLE);
+ page = (char *)__get_free_page(GFP_TEMPORARY);
if (!page)
goto out;
@@ -747,7 +747,7 @@ static ssize_t proc_loginuid_write(struc
/* No partial writes. */
return -EINVAL;
}
- page = (char*)__get_free_page(GFP_USER|__GFP_RECLAIMABLE);
+ page = (char*)__get_free_page(GFP_TEMPORARY);
if (!page)
return -ENOMEM;
length = -EFAULT;
@@ -915,7 +915,7 @@ static int do_proc_readlink(struct dentr
char __user *buffer, int buflen)
{
struct inode * inode;
- char *tmp = (char*)__get_free_page(GFP_KERNEL|__GFP_RECLAIMABLE);
+ char *tmp = (char*)__get_free_page(GFP_TEMPORARY);
char *path;
int len;
@@ -1688,7 +1688,7 @@ static ssize_t proc_pid_attr_write(struc
goto out;
length = -ENOMEM;
- page = (char*)__get_free_page(GFP_USER|__GFP_RECLAIMABLE);
+ page = (char*)__get_free_page(GFP_TEMPORARY);
if (!page)
goto out;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/fs/proc/generic.c linux-2.6.21-mm2-020_temporary/fs/proc/generic.c
--- linux-2.6.21-mm2-012_shmem/fs/proc/generic.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/fs/proc/generic.c 2007-05-15 12:31:22.000000000 +0100
@@ -74,7 +74,7 @@ proc_file_read(struct file *file, char _
nbytes = MAX_NON_LFS - pos;
dp = PDE(inode);
- if (!(page = (char*) __get_free_page(GFP_KERNEL|__GFP_RECLAIMABLE)))
+ if (!(page = (char*) __get_free_page(GFP_TEMPORARY)))
return -ENOMEM;
while ((nbytes > 0) && !eof) {
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/include/linux/gfp.h linux-2.6.21-mm2-020_temporary/include/linux/gfp.h
--- linux-2.6.21-mm2-012_shmem/include/linux/gfp.h 2007-05-15 12:24:58.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/include/linux/gfp.h 2007-05-15 12:31:22.000000000 +0100
@@ -71,6 +71,8 @@ struct vm_area_struct;
#define GFP_NOIO (__GFP_WAIT)
#define GFP_NOFS (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)
+#define GFP_TEMPORARY (__GFP_WAIT | __GFP_IO | __GFP_FS | \
+ __GFP_RECLAIMABLE)
#define GFP_USER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
#define GFP_HIGHUSER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
__GFP_HIGHMEM)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/include/linux/slab.h linux-2.6.21-mm2-020_temporary/include/linux/slab.h
--- linux-2.6.21-mm2-012_shmem/include/linux/slab.h 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/include/linux/slab.h 2007-05-15 12:31:22.000000000 +0100
@@ -26,12 +26,15 @@ typedef struct kmem_cache kmem_cache_t _
#define SLAB_HWCACHE_ALIGN 0x00002000UL /* Align objs on cache lines */
#define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */
#define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
-#define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */
#define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
#define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
#define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
#define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */
+/* The following flags affect the page allocator grouping pages by mobility */
+#define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */
+#define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */
+
/* Flags passed to a constructor functions */
#define SLAB_CTOR_CONSTRUCTOR 0x001UL /* If not set, then deconstructor */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/kernel/cpuset.c linux-2.6.21-mm2-020_temporary/kernel/cpuset.c
--- linux-2.6.21-mm2-012_shmem/kernel/cpuset.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/kernel/cpuset.c 2007-05-15 12:31:22.000000000 +0100
@@ -1383,7 +1383,7 @@ static ssize_t cpuset_common_file_read(s
ssize_t retval = 0;
char *s;
- if (!(page = (char *)__get_free_page(GFP_KERNEL)))
+ if (!(page = (char *)__get_free_page(GFP_TEMPORARY)))
return -ENOMEM;
s = page;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/mm/slub.c linux-2.6.21-mm2-020_temporary/mm/slub.c
--- linux-2.6.21-mm2-012_shmem/mm/slub.c 2007-05-15 12:21:44.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/mm/slub.c 2007-05-15 12:31:22.000000000 +0100
@@ -2846,7 +2846,7 @@ static int alloc_loc_track(struct loc_tr
order = get_order(sizeof(struct location) * max);
- l = (void *)__get_free_pages(GFP_KERNEL, order);
+ l = (void *)__get_free_pages(GFP_TEMPORARY, order);
if (!l)
return 0;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-012_shmem/net/core/skbuff.c linux-2.6.21-mm2-020_temporary/net/core/skbuff.c
--- linux-2.6.21-mm2-012_shmem/net/core/skbuff.c 2007-05-11 21:16:12.000000000 +0100
+++ linux-2.6.21-mm2-020_temporary/net/core/skbuff.c 2007-05-15 12:31:22.000000000 +0100
@@ -152,7 +152,6 @@ struct sk_buff *__alloc_skb(unsigned int
u8 *data;
cache = fclone ? skbuff_fclone_cache : skbuff_head_cache;
- gfp_mask = set_migrateflags(gfp_mask, __GFP_RECLAIMABLE);
/* Get the HEAD */
skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
@@ -2002,16 +2001,16 @@ EXPORT_SYMBOL_GPL(skb_segment);
void __init skb_init(void)
{
skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
- sizeof(struct sk_buff),
- 0,
- SLAB_HWCACHE_ALIGN|SLAB_PANIC,
- NULL, NULL);
+ sizeof(struct sk_buff),
+ 0,
+ SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TEMPORARY,
+ NULL, NULL);
skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
- (2*sizeof(struct sk_buff)) +
- sizeof(atomic_t),
- 0,
- SLAB_HWCACHE_ALIGN|SLAB_PANIC,
- NULL, NULL);
+ (2*sizeof(struct sk_buff)) +
+ sizeof(atomic_t),
+ 0,
+ SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TEMPORARY,
+ NULL, NULL);
}
/**
* [PATCH 7/8] Rename GFP_HIGH_MOVABLE to GFP_HIGHUSER_MOVABLE
From: Mel Gorman @ 2007-05-15 15:05 UTC (permalink / raw)
To: clameter; +Cc: Mel Gorman, linux-mm
__GFP_HIGH is used to flag allocations that can access emergency
pools. GFP_HIGH_MOVABLE has little to do with __GFP_HIGH, so the name is
misleading. This patch renames GFP_HIGH_MOVABLE to GFP_HIGHUSER_MOVABLE
to make the intent clearer.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
fs/inode.c | 6 +++---
include/linux/gfp.h | 2 +-
mm/hugetlb.c | 2 +-
mm/memory.c | 5 +++--
mm/mempolicy.c | 5 +++--
mm/migrate.c | 3 ++-
mm/page_alloc.c | 2 +-
mm/swap_prefetch.c | 2 +-
mm/swap_state.c | 3 ++-
9 files changed, 17 insertions(+), 13 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/fs/inode.c linux-2.6.21-mm2-025_gfphighuser/fs/inode.c
--- linux-2.6.21-mm2-020_temporary/fs/inode.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/fs/inode.c 2007-05-15 15:49:58.000000000 +0100
@@ -154,7 +154,7 @@ static struct inode *alloc_inode(struct
mapping->a_ops = &empty_aops;
mapping->host = inode;
mapping->flags = 0;
- mapping_set_gfp_mask(mapping, GFP_HIGH_MOVABLE);
+ mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
mapping->assoc_mapping = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
@@ -536,8 +536,8 @@ repeat:
* @sb: superblock
*
* Allocates a new inode for given superblock. The default gfp_mask
- * for allocations related to inode->i_mapping is GFP_HIGH_MOVABLE. If
- * HIGHMEM pages are unsuitable or it is known that pages allocated
+ * for allocations related to inode->i_mapping is GFP_HIGHUSER_MOVABLE.
+ * If HIGHMEM pages are unsuitable or it is known that pages allocated
* for the page cache are not reclaimable or migratable,
* mapping_set_gfp_mask() must be called with suitable flags on the
* newly created inode's mapping
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/include/linux/gfp.h linux-2.6.21-mm2-025_gfphighuser/include/linux/gfp.h
--- linux-2.6.21-mm2-020_temporary/include/linux/gfp.h 2007-05-15 15:48:18.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/include/linux/gfp.h 2007-05-15 15:49:58.000000000 +0100
@@ -76,7 +76,7 @@ struct vm_area_struct;
#define GFP_USER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
#define GFP_HIGHUSER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
__GFP_HIGHMEM)
-#define GFP_HIGH_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
+#define GFP_HIGHUSER_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
__GFP_HARDWALL | __GFP_HIGHMEM | \
__GFP_MOVABLE)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/mm/hugetlb.c linux-2.6.21-mm2-025_gfphighuser/mm/hugetlb.c
--- linux-2.6.21-mm2-020_temporary/mm/hugetlb.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/mm/hugetlb.c 2007-05-15 15:49:58.000000000 +0100
@@ -267,7 +267,7 @@ int hugetlb_treat_movable_handler(struct
{
proc_dointvec(table, write, file, buffer, length, ppos);
if (hugepages_treat_as_movable)
- htlb_alloc_mask = GFP_HIGH_MOVABLE;
+ htlb_alloc_mask = GFP_HIGHUSER_MOVABLE;
else
htlb_alloc_mask = GFP_HIGHUSER;
return 0;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/mm/memory.c linux-2.6.21-mm2-025_gfphighuser/mm/memory.c
--- linux-2.6.21-mm2-020_temporary/mm/memory.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/mm/memory.c 2007-05-15 15:49:58.000000000 +0100
@@ -1746,7 +1746,7 @@ gotten:
if (!new_page)
goto oom;
} else {
- new_page = alloc_page_vma(GFP_HIGH_MOVABLE, vma, address);
+ new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
if (!new_page)
goto oom;
cow_user_page(new_page, old_page, address, vma);
@@ -2392,7 +2392,8 @@ static int __do_fault(struct mm_struct *
fdata.type = VM_FAULT_OOM;
goto out;
}
- page = alloc_page_vma(GFP_HIGH_MOVABLE, vma, address);
+ page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
+ vma, address);
if (!page) {
fdata.type = VM_FAULT_OOM;
goto out;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/mm/mempolicy.c linux-2.6.21-mm2-025_gfphighuser/mm/mempolicy.c
--- linux-2.6.21-mm2-020_temporary/mm/mempolicy.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/mm/mempolicy.c 2007-05-15 15:49:58.000000000 +0100
@@ -594,7 +594,7 @@ static void migrate_page_add(struct page
static struct page *new_node_page(struct page *page, unsigned long node, int **x)
{
- return alloc_pages_node(node, GFP_HIGH_MOVABLE, 0);
+ return alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
}
/*
@@ -710,7 +710,8 @@ static struct page *new_vma_page(struct
{
struct vm_area_struct *vma = (struct vm_area_struct *)private;
- return alloc_page_vma(GFP_HIGH_MOVABLE, vma, page_address_in_vma(page, vma));
+ return alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
+ page_address_in_vma(page, vma));
}
#else
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/mm/migrate.c linux-2.6.21-mm2-025_gfphighuser/mm/migrate.c
--- linux-2.6.21-mm2-020_temporary/mm/migrate.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/mm/migrate.c 2007-05-15 15:49:58.000000000 +0100
@@ -761,7 +761,8 @@ static struct page *new_page_node(struct
*result = &pm->status;
- return alloc_pages_node(pm->node, GFP_HIGH_MOVABLE | GFP_THISNODE, 0);
+ return alloc_pages_node(pm->node,
+ GFP_HIGHUSER_MOVABLE | GFP_THISNODE, 0);
}
/*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/mm/page_alloc.c linux-2.6.21-mm2-025_gfphighuser/mm/page_alloc.c
--- linux-2.6.21-mm2-020_temporary/mm/page_alloc.c 2007-05-15 15:48:04.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/mm/page_alloc.c 2007-05-15 15:49:58.000000000 +0100
@@ -1808,7 +1808,7 @@ unsigned int nr_free_buffer_pages(void)
*/
unsigned int nr_free_pagecache_pages(void)
{
- return nr_free_zone_pages(gfp_zone(GFP_HIGH_MOVABLE));
+ return nr_free_zone_pages(gfp_zone(GFP_HIGHUSER_MOVABLE));
}
static inline void show_node(struct zone *zone)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/mm/swap_prefetch.c linux-2.6.21-mm2-025_gfphighuser/mm/swap_prefetch.c
--- linux-2.6.21-mm2-020_temporary/mm/swap_prefetch.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/mm/swap_prefetch.c 2007-05-15 15:49:58.000000000 +0100
@@ -204,7 +204,7 @@ static enum trickle_return trickle_swap_
* Get a new page to read from swap. We have already checked the
* watermarks so __alloc_pages will not call on reclaim.
*/
- page = alloc_pages_node(node, GFP_HIGH_MOVABLE & ~__GFP_WAIT, 0);
+ page = alloc_pages_node(node, GFP_HIGHUSER_MOVABLE & ~__GFP_WAIT, 0);
if (unlikely(!page)) {
ret = TRICKLE_DELAY;
goto out;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-020_temporary/mm/swap_state.c linux-2.6.21-mm2-025_gfphighuser/mm/swap_state.c
--- linux-2.6.21-mm2-020_temporary/mm/swap_state.c 2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-025_gfphighuser/mm/swap_state.c 2007-05-15 15:49:58.000000000 +0100
@@ -343,7 +343,8 @@ struct page *read_swap_cache_async(swp_e
* Get a new page to read into from swap.
*/
if (!new_page) {
- new_page = alloc_page_vma(GFP_HIGH_MOVABLE, vma, addr);
+ new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
+ vma, addr);
if (!new_page)
break; /* Out of memory */
}
* [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE
2007-05-15 15:03 [PATCH 0/8] Review-based updates to grouping pages by mobility Mel Gorman
` (6 preceding siblings ...)
2007-05-15 15:05 ` [PATCH 7/8] Rename GFP_HIGH_MOVABLE to GFP_HIGHUSER_MOVABLE Mel Gorman
@ 2007-05-15 15:05 ` Mel Gorman
2007-05-15 18:31 ` Christoph Lameter
2007-05-16 2:33 ` [PATCH 0/8] Review-based updates to grouping pages by mobility KAMEZAWA Hiroyuki
8 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2007-05-15 15:05 UTC (permalink / raw)
To: clameter; +Cc: Mel Gorman, linux-mm
This patch marks page cache allocations as __GFP_PAGECACHE instead of
__GFP_MOVABLE. To make the code easier to read, a set of three GFP flags
is added: GFP_USER_PAGECACHE, GFP_NOFS_PAGECACHE and GFP_HIGHUSER_PAGECACHE.
Note that allocations required for radix trees are still treated as
RECLAIMABLE after this patch is applied. bdget() now uses GFP_USER_PAGECACHE
instead of MOVABLE. Previously, it was using MOVABLE even though the
resulting pages were not always directly reclaimable. grow_dev_page() is
changed to use GFP_NOFS_PAGECACHE instead of __GFP_RECLAIMABLE so that
its pages are grouped with other page cache pages.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---
fs/block_dev.c | 2 +-
fs/buffer.c | 2 +-
fs/inode.c | 6 +++---
include/linux/gfp.h | 6 ++++++
4 files changed, 11 insertions(+), 5 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-025_gfphighuser/fs/block_dev.c linux-2.6.21-mm2-030_pagecache_mark/fs/block_dev.c
--- linux-2.6.21-mm2-025_gfphighuser/fs/block_dev.c 2007-05-11 21:16:10.000000000 +0100
+++ linux-2.6.21-mm2-030_pagecache_mark/fs/block_dev.c 2007-05-15 12:34:45.000000000 +0100
@@ -578,7 +578,7 @@ struct block_device *bdget(dev_t dev)
inode->i_rdev = dev;
inode->i_bdev = bdev;
inode->i_data.a_ops = &def_blk_aops;
- mapping_set_gfp_mask(&inode->i_data, GFP_USER|__GFP_MOVABLE);
+ mapping_set_gfp_mask(&inode->i_data, GFP_USER_PAGECACHE);
inode->i_data.backing_dev_info = &default_backing_dev_info;
spin_lock(&bdev_lock);
list_add(&bdev->bd_list, &all_bdevs);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-025_gfphighuser/fs/buffer.c linux-2.6.21-mm2-030_pagecache_mark/fs/buffer.c
--- linux-2.6.21-mm2-025_gfphighuser/fs/buffer.c 2007-05-15 12:28:11.000000000 +0100
+++ linux-2.6.21-mm2-030_pagecache_mark/fs/buffer.c 2007-05-15 12:34:45.000000000 +0100
@@ -990,7 +990,7 @@ grow_dev_page(struct block_device *bdev,
struct buffer_head *bh;
page = find_or_create_page(inode->i_mapping, index,
- GFP_NOFS|__GFP_RECLAIMABLE);
+ GFP_NOFS_PAGECACHE);
if (!page)
return NULL;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-025_gfphighuser/fs/inode.c linux-2.6.21-mm2-030_pagecache_mark/fs/inode.c
--- linux-2.6.21-mm2-025_gfphighuser/fs/inode.c 2007-05-15 12:32:57.000000000 +0100
+++ linux-2.6.21-mm2-030_pagecache_mark/fs/inode.c 2007-05-15 12:34:45.000000000 +0100
@@ -154,7 +154,7 @@ static struct inode *alloc_inode(struct
mapping->a_ops = &empty_aops;
mapping->host = inode;
mapping->flags = 0;
- mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
+ mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
mapping->assoc_mapping = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
@@ -536,8 +536,8 @@ repeat:
* @sb: superblock
*
* Allocates a new inode for given superblock. The default gfp_mask
- * for allocations related to inode->i_mapping is GFP_HIGHUSER_MOVABLE.
- * If HIGHMEM pages are unsuitable or it is known that pages allocated
+ * for allocations related to inode->i_mapping is GFP_HIGHUSER_PAGECACHE.
+ * If HIGHMEM pages are unsuitable or it is known that pages allocated
* for the page cache are not reclaimable or migratable,
* mapping_set_gfp_mask() must be called with suitable flags on the
* newly created inode's mapping
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-025_gfphighuser/include/linux/gfp.h linux-2.6.21-mm2-030_pagecache_mark/include/linux/gfp.h
--- linux-2.6.21-mm2-025_gfphighuser/include/linux/gfp.h 2007-05-15 12:32:57.000000000 +0100
+++ linux-2.6.21-mm2-030_pagecache_mark/include/linux/gfp.h 2007-05-15 12:34:45.000000000 +0100
@@ -79,6 +79,12 @@ struct vm_area_struct;
#define GFP_HIGHUSER_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
__GFP_HARDWALL | __GFP_HIGHMEM | \
__GFP_MOVABLE)
+#define GFP_NOFS_PAGECACHE (__GFP_WAIT | __GFP_IO | __GFP_MOVABLE)
+#define GFP_USER_PAGECACHE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
+ __GFP_HARDWALL | __GFP_MOVABLE)
+#define GFP_HIGHUSER_PAGECACHE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
+ __GFP_HARDWALL | __GFP_HIGHMEM | \
+ __GFP_MOVABLE)
#ifdef CONFIG_NUMA
#define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
* Re: [PATCH 1/8] Do not depend on MAX_ORDER when grouping pages by mobility
2007-05-15 15:03 ` [PATCH 1/8] Do not depend on MAX_ORDER when " Mel Gorman
@ 2007-05-15 18:19 ` Christoph Lameter
2007-05-15 19:19 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2007-05-15 18:19 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
On Tue, 15 May 2007, Mel Gorman wrote:
>
> #define SECTION_BLOCKFLAGS_BITS \
> - ((1 << (PFN_SECTION_SHIFT - (MAX_ORDER-1))) * NR_PAGEBLOCK_BITS)
> + ((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
>
Ahh, Blockflags so this is not related to SPARSEMEM...
* Re: [PATCH 2/8] Print out statistics in relation to fragmentation avoidance to /proc/fragavoidance
2007-05-15 15:03 ` [PATCH 2/8] Print out statistics in relation to fragmentation avoidance to /proc/fragavoidance Mel Gorman
@ 2007-05-15 18:25 ` Christoph Lameter
2007-05-15 19:23 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2007-05-15 18:25 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
On Tue, 15 May 2007, Mel Gorman wrote:
>
> This patch provides fragmentation avoidance statistics via
> /proc/fragavoidance. The information is collected only on request so there
The name is probably a bit strange.
/proc/pagetypeinfo or so?
> The first part is a more detailed version of /proc/buddyinfo and looks like
>
> Free pages count per migrate type
If you have a header ^^^ then maybe add order on top of the numbers?
> Node 0, zone DMA, type Unmovable 0 0 0 0 0 0 0 0 0 0 0
> Node 0, zone DMA, type Reclaimable 1 0 0 0 0 0 0 0 0 0 0
> Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 0 0
> Node 0, zone DMA, type Reserve 0 4 4 0 0 0 0 1 0 1 0
> Node 0, zone Normal, type Unmovable 111 8 4 4 2 3 1 0 0 0 0
> Node 0, zone Normal, type Reclaimable 293 89 8 0 0 0 0 0 0 0 0
> Node 0, zone Normal, type Movable 1 6 13 9 7 6 3 0 0 0 0
> Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 4
>
> The second part looks like
>
> Number of blocks type Unmovable Reclaimable Movable Reserve
> Node 0, zone DMA 0 1 2 1
> Node 0, zone Normal 3 17 94 4
What is "blocks"? maxorder blocks? how do I figure out the blocksize?
Could you include the blocksize here?
* Re: [PATCH 6/8] Add __GFP_TEMPORARY to identify allocations that are short-lived
2007-05-15 15:05 ` [PATCH 6/8] Add __GFP_TEMPORARY to identify allocations that are short-lived Mel Gorman
@ 2007-05-15 18:29 ` Christoph Lameter
2007-05-16 0:36 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2007-05-15 18:29 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
* Re: [PATCH 7/8] Rename GFP_HIGH_MOVABLE to GFP_HIGHUSER_MOVABLE
2007-05-15 15:05 ` [PATCH 7/8] Rename GFP_HIGH_MOVABLE to GFP_HIGHUSER_MOVABLE Mel Gorman
@ 2007-05-15 18:29 ` Christoph Lameter
0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2007-05-15 18:29 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
* Re: [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE
2007-05-15 15:05 ` [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE Mel Gorman
@ 2007-05-15 18:31 ` Christoph Lameter
2007-05-15 19:52 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2007-05-15 18:31 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
On Tue, 15 May 2007, Mel Gorman wrote:
> This patch marks page cache allocations as __GFP_PAGECACHE instead of
> __GFP_MOVABLE. To make the code easier to read, a set of three GFP flags
> is added: GFP_USER_PAGECACHE, GFP_NOFS_PAGECACHE and GFP_HIGHUSER_PAGECACHE.
What motivated this patch? Are there any special flags that are needed for
the pagecache?
If we have this flag then we could move the functionality from
__page_cache_alloc (mm/filemap.c) into the page allocator?
* Re: [PATCH 1/8] Do not depend on MAX_ORDER when grouping pages by mobility
2007-05-15 18:19 ` Christoph Lameter
@ 2007-05-15 19:19 ` Mel Gorman
0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2007-05-15 19:19 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm
On Tue, 15 May 2007, Christoph Lameter wrote:
> On Tue, 15 May 2007, Mel Gorman wrote:
>
>>
>> #define SECTION_BLOCKFLAGS_BITS \
>> - ((1 << (PFN_SECTION_SHIFT - (MAX_ORDER-1))) * NR_PAGEBLOCK_BITS)
>> + ((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
>>
>
> Ahh, Blockflags so this is not related to SPARSEMEM...
>
Only in that a bitmap is allocated per memory section instead of having a
sparsely populated bitmap allocated for the zone.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 2/8] Print out statistics in relation to fragmentation avoidance to /proc/fragavoidance
2007-05-15 18:25 ` Christoph Lameter
@ 2007-05-15 19:23 ` Mel Gorman
2007-05-16 0:27 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2007-05-15 19:23 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm
On Tue, 15 May 2007, Christoph Lameter wrote:
> On Tue, 15 May 2007, Mel Gorman wrote:
>
>>
>> This patch provides fragmentation avoidance statistics via
>> /proc/fragavoidance. The information is collected only on request so there
>
> The name is probably a bit strange.
>
> /proc/pagetypeinfo or so?
>
/proc/mobilityinfo?
>> The first part is a more detailed version of /proc/buddyinfo and looks like
>>
>> Free pages count per migrate type
> If you have a header ^^^ then maybe add order on top of the numbers?
I can do that.
>> Node 0, zone DMA, type Unmovable 0 0 0 0 0 0 0 0 0 0 0
>> Node 0, zone DMA, type Reclaimable 1 0 0 0 0 0 0 0 0 0 0
>> Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 0 0
>> Node 0, zone DMA, type Reserve 0 4 4 0 0 0 0 1 0 1 0
>> Node 0, zone Normal, type Unmovable 111 8 4 4 2 3 1 0 0 0 0
>> Node 0, zone Normal, type Reclaimable 293 89 8 0 0 0 0 0 0 0 0
>> Node 0, zone Normal, type Movable 1 6 13 9 7 6 3 0 0 0 0
>> Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 4
>>
>> The second part looks like
>>
>> Number of blocks type Unmovable Reclaimable Movable Reserve
>> Node 0, zone DMA 0 1 2 1
>> Node 0, zone Normal 3 17 94 4
>
> What is "blocks"? maxorder blocks? How do I figure out the blocksize?
> Could you include the blocksize here?
>
Each block contains nr_pages_pageblock pages. The number of
pages can be determined from the dmesg output, like:
Built 1 zonelists, mobility grouping on order 10.
In that case, nr_pages_pageblock would be (1UL << 10).
However, the block size can be printed here as well, along with whether
mobility grouping is enabled or not.
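As a worked example, assuming a 4KiB base page size (an assumption for
illustration, not something the message above prints), order 10 gives:
==
unsigned long nr_pages_pageblock = 1UL << 10;	/* 1024 pages per block */
unsigned long block_bytes = nr_pages_pageblock * 4096;	/* 4MiB per block */
==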
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE
2007-05-15 18:31 ` Christoph Lameter
@ 2007-05-15 19:52 ` Mel Gorman
2007-05-15 20:04 ` Christoph Lameter
0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2007-05-15 19:52 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm
On (15/05/07 11:31), Christoph Lameter didst pronounce:
> On Tue, 15 May 2007, Mel Gorman wrote:
>
> > This patch marks page cache allocations as __GFP_PAGECACHE instead of
> > __GFP_MOVABLE. To make the code easier to read, a set of three GFP flags
> > is added: GFP_USER_PAGECACHE, GFP_NOFS_PAGECACHE and GFP_HIGHUSER_PAGECACHE.
>
> What motivated this patch? Are there any special flags that are needed for
> the pagecache?
>
Initially, it was for reasons similar to why GFP_TEMPORARY was defined
instead of using __GFP_RECLAIMABLE directly. It is clearer when reading
the code if an allocation is marked PAGECACHE, even if it is implemented
as __GFP_MOVABLE for grouping purposes.
> Are there any special flags that are needed for
> the pagecache?
>
Not at the moment in this patchset. However, I have another patch that groups
PAGECACHE pages separately from MOVABLE pages based on a __GFP_PAGECACHE
flag. If large pages were used for IO, it would make sense to group them
together from an internal fragmentation perspective. As readahead pages can
exist in private pools outside of the LRU, it also makes sense to keep page
cache pages away from movable pages referenced by page tables. It didn't
seem urgent enough to post now though.
> If we have this flag then we could move the functionality from
> __page_cache_alloc (mm/filemap.c) into the page allocator?
>
If __GFP_PAGECACHE were being used, I think __page_cache_alloc() could be
replaced by a call to alloc_pages() once the flag was set. I can look into
it because it sounds like a nice cleanup.
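As a minimal sketch of what that cleanup might look like (an assumption
about the end result rather than code from this patchset; the real
__page_cache_alloc() also handles cpuset memory spreading, which is
glossed over here):
==
/* Hypothetical: the mapping's gfp mask already carries __GFP_PAGECACHE,
 * so the wrapper no longer needs to fix up migrate flags itself */
static inline struct page *__page_cache_alloc(gfp_t gfp)
{
	return alloc_pages(gfp, 0);
}
==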
I've included the group-pagecache-pages-together patch below. I haven't
tested it in a while but you'll at least see how the __GFP_ flag is
defined. The part that defines __GFP_PAGECACHE can easily be separated out.
========
Subject: Group page cache pages together when grouping pages by mobility
Currently page cache pages are grouped with MOVABLE allocations. This appears
to work well in practice as page cache pages are usually reclaimable via
the LRU. However, this is not strictly correct as page cache pages can only
be cleaned and discarded, not migrated. During readahead, pages may also
exist on a pool for a period of time instead of on the LRU, giving them a
different lifecycle to ordinary movable pages.
This patch adds a separate MIGRATE type for page cache pages so they are
grouped together. With the possibility of the page cache using different
page sizes, it is beneficial to keep allocations of the same size within
the same contiguous blocks to reduce interference from other allocation
sizes.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/linux/gfp.h | 34 +++++++++++++++++++++++++++-------
include/linux/mmzone.h | 5 +++--
include/linux/pageblock-flags.h | 2 +-
mm/page_alloc.c | 9 +++++----
mm/vmstat.c | 1 +
5 files changed, 37 insertions(+), 14 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-lameter_v2r7/include/linux/gfp.h linux-2.6.21-mm2-031_pagecache_gfp/include/linux/gfp.h
--- linux-2.6.21-mm2-lameter_v2r7/include/linux/gfp.h 2007-05-15 15:54:23.000000000 +0100
+++ linux-2.6.21-mm2-031_pagecache_gfp/include/linux/gfp.h 2007-05-15 20:36:55.000000000 +0100
@@ -50,8 +50,9 @@ struct vm_area_struct;
#define __GFP_THISNODE ((__force gfp_t)0x40000u)/* No fallback, no policies */
#define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
#define __GFP_MOVABLE ((__force gfp_t)0x100000u) /* Page is movable */
+#define __GFP_PAGECACHE ((__force gfp_t)0x200000u) /* Page cache page */
-#define __GFP_BITS_SHIFT 21 /* Room for 21 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 22 /* Room for 22 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/* if you forget to add the bitmask here kernel will crash, period */
@@ -59,10 +60,10 @@ struct vm_area_struct;
__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
__GFP_NOFAIL|__GFP_NORETRY|__GFP_COMP| \
__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE| \
- __GFP_RECLAIMABLE|__GFP_MOVABLE)
+ __GFP_RECLAIMABLE|__GFP_MOVABLE|__GFP_PAGECACHE)
/* This mask makes up all the page movable related flags */
-#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
+#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE|__GFP_PAGECACHE)
/* This equals 0, but use constants in case they ever change */
#define GFP_NOWAIT (GFP_ATOMIC & ~__GFP_HIGH)
@@ -79,12 +80,12 @@ struct vm_area_struct;
#define GFP_HIGHUSER_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
__GFP_HARDWALL | __GFP_HIGHMEM | \
__GFP_MOVABLE)
-#define GFP_NOFS_PAGECACHE (__GFP_WAIT | __GFP_IO | __GFP_MOVABLE)
+#define GFP_NOFS_PAGECACHE (__GFP_WAIT | __GFP_IO | __GFP_PAGECACHE)
#define GFP_USER_PAGECACHE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
- __GFP_HARDWALL | __GFP_MOVABLE)
+ __GFP_HARDWALL | __GFP_PAGECACHE)
#define GFP_HIGHUSER_PAGECACHE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
__GFP_HARDWALL | __GFP_HIGHMEM | \
- __GFP_MOVABLE)
+ __GFP_PAGECACHE)
#ifdef CONFIG_NUMA
#define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
@@ -104,11 +105,27 @@ struct vm_area_struct;
/* Convert GFP flags to their corresponding migrate type */
static inline int allocflags_to_migratetype(gfp_t gfp_flags)
{
- WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
+#ifdef CONFIG_DEBUG_VM
+ /*
+ * This is an expensive check for the valid usage of migrate flags when
+ * DEBUG_VM is set. It seemed the quickest way to check for multiple
+ * bits being set
+ */
+ int nr_bits;
+ unsigned long mask = gfp_flags & GFP_MOVABLE_MASK;
+
+ for (nr_bits = 0; mask; nr_bits++)
+ mask ^= mask & -mask;
+
+ BUG_ON(nr_bits > 1);
+#endif
if (unlikely(page_group_by_mobility_disabled))
return MIGRATE_UNMOVABLE;
+ if (gfp_flags & __GFP_PAGECACHE)
+ return MIGRATE_PAGECACHE;
+
/* Cluster based on mobility */
return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
((gfp_flags & __GFP_RECLAIMABLE) != 0);
@@ -127,6 +144,9 @@ static inline enum zone_type gfp_zone(gf
if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
(__GFP_HIGHMEM | __GFP_MOVABLE))
return ZONE_MOVABLE;
+ if ((flags & (__GFP_HIGHMEM | __GFP_PAGECACHE)) ==
+ (__GFP_HIGHMEM | __GFP_PAGECACHE))
+ return ZONE_MOVABLE;
#ifdef CONFIG_HIGHMEM
if (flags & __GFP_HIGHMEM)
return ZONE_HIGHMEM;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-lameter_v2r7/include/linux/mmzone.h linux-2.6.21-mm2-031_pagecache_gfp/include/linux/mmzone.h
--- linux-2.6.21-mm2-lameter_v2r7/include/linux/mmzone.h 2007-05-15 15:54:22.000000000 +0100
+++ linux-2.6.21-mm2-031_pagecache_gfp/include/linux/mmzone.h 2007-05-15 20:36:55.000000000 +0100
@@ -38,8 +38,9 @@ extern int page_group_by_mobility_disabl
#define MIGRATE_UNMOVABLE 0
#define MIGRATE_RECLAIMABLE 1
#define MIGRATE_MOVABLE 2
-#define MIGRATE_RESERVE 3
-#define MIGRATE_TYPES 4
+#define MIGRATE_PAGECACHE 3
+#define MIGRATE_RESERVE 4
+#define MIGRATE_TYPES 5
#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-lameter_v2r7/include/linux/pageblock-flags.h linux-2.6.21-mm2-031_pagecache_gfp/include/linux/pageblock-flags.h
--- linux-2.6.21-mm2-lameter_v2r7/include/linux/pageblock-flags.h 2007-05-15 15:54:22.000000000 +0100
+++ linux-2.6.21-mm2-031_pagecache_gfp/include/linux/pageblock-flags.h 2007-05-15 20:36:55.000000000 +0100
@@ -31,7 +31,7 @@
/* Bit indices that affect a whole block of pages */
enum pageblock_bits {
- PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
+ PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
NR_PAGEBLOCK_BITS
};
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-lameter_v2r7/mm/page_alloc.c linux-2.6.21-mm2-031_pagecache_gfp/mm/page_alloc.c
--- linux-2.6.21-mm2-lameter_v2r7/mm/page_alloc.c 2007-05-15 15:54:23.000000000 +0100
+++ linux-2.6.21-mm2-031_pagecache_gfp/mm/page_alloc.c 2007-05-15 20:36:55.000000000 +0100
@@ -697,10 +697,11 @@ static struct page *__rmqueue_smallest(s
* the free lists for the desirable migrate type are depleted
*/
static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
- [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
- [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
+ [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_PAGECACHE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
+ [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_PAGECACHE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
+ [MIGRATE_MOVABLE] = { MIGRATE_PAGECACHE, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
+ [MIGRATE_PAGECACHE] = { MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
+ [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
};
/*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-lameter_v2r7/mm/vmstat.c linux-2.6.21-mm2-031_pagecache_gfp/mm/vmstat.c
--- linux-2.6.21-mm2-lameter_v2r7/mm/vmstat.c 2007-05-15 15:54:22.000000000 +0100
+++ linux-2.6.21-mm2-031_pagecache_gfp/mm/vmstat.c 2007-05-15 20:36:55.000000000 +0100
@@ -400,6 +400,7 @@ static char * const migratetype_names[MI
"Unmovable",
"Reclaimable",
"Movable",
+ "Pagecache",
"Reserve",
};
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE
2007-05-15 19:52 ` Mel Gorman
@ 2007-05-15 20:04 ` Christoph Lameter
2007-05-15 20:20 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2007-05-15 20:04 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
On Tue, 15 May 2007, Mel Gorman wrote:
> Currently page cache pages are grouped with MOVABLE allocations. This appears
> to work well in practice as page cache pages are usually reclaimable via
> the LRU. However, this is not strictly correct as page cache pages can only
> be cleaned and discarded, not migrated. During readahead, pages may also
> exist on a pool for a period of time instead of on the LRU giving them a
> different lifecycle to ordinary movable pages.
Sorry but pagecache pages can be migrated.
* Re: [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE
2007-05-15 20:04 ` Christoph Lameter
@ 2007-05-15 20:20 ` Mel Gorman
2007-05-15 20:36 ` Christoph Lameter
0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2007-05-15 20:20 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Linux Memory Management List
On Tue, 15 May 2007, Christoph Lameter wrote:
> On Tue, 15 May 2007, Mel Gorman wrote:
>
>> Currently page cache pages are grouped with MOVABLE allocations. This appears
>> to work well in practice as page cache pages are usually reclaimable via
>> the LRU. However, this is not strictly correct as page cache pages can only
>> be cleaned and discarded, not migrated. During readahead, pages may also
>> exist on a pool for a period of time instead of on the LRU giving them a
>> different lifecycle to ordinary movable pages.
>
> Sorry but pagecache pages can be migrated.
>
Poor phrasing perhaps. I was under the impression that page migration was
only concerned with pages mapped by process page tables for the
move_pages() call. The statement above was also referring to pages read by
readahead and normal file IO. I'm pretty sure they could be migrated
without difficulty though once the source pages are identified. Either
way, the separate grouping of page cache is probably not worthwhile for
the moment.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE
2007-05-15 20:20 ` Mel Gorman
@ 2007-05-15 20:36 ` Christoph Lameter
2007-05-15 20:50 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2007-05-15 20:36 UTC (permalink / raw)
To: Mel Gorman; +Cc: Linux Memory Management List
On Tue, 15 May 2007, Mel Gorman wrote:
> On Tue, 15 May 2007, Christoph Lameter wrote:
>
> > On Tue, 15 May 2007, Mel Gorman wrote:
> >
> > > Currently page cache pages are grouped with MOVABLE allocations. This
> > > appears
> > > to work well in practice as page cache pages are usually reclaimable via
> > > the LRU. However, this is not strictly correct as page cache pages can
> > > only
> > > be cleaned and discarded, not migrated. During readahead, pages may also
> > > exist on a pool for a period of time instead of on the LRU giving them a
> > > different lifecycle to ordinary movable pages.
> >
> > Sorry but pagecache pages can be migrated.
> >
>
> Poor phrasing perhaps. I was under the impression that page migration was only
> concerned with pages mapped by process page tables for the move_pages() call.
> The statement above was also referring to pages read by readahead and normal
> file IO. I'm pretty sure they could be migrated without difficulty though once
> the source pages are identified. Either way, the separate grouping of page
> cache is probably not worthwhile for the moment.
So page cache = unmapped I/O pages? These can also be migrated. They still
carry a refcount of the radix tree and page migration will have to update
that pointer.
Page migration in its current form is indeed only used to move mapped
pages but that is incidental to the current usage patterns. It is intended
to be a generic page migration framework.
* Re: [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE
2007-05-15 20:36 ` Christoph Lameter
@ 2007-05-15 20:50 ` Mel Gorman
0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2007-05-15 20:50 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Linux Memory Management List
On Tue, 15 May 2007, Christoph Lameter wrote:
> On Tue, 15 May 2007, Mel Gorman wrote:
>
>> On Tue, 15 May 2007, Christoph Lameter wrote:
>>
>>> On Tue, 15 May 2007, Mel Gorman wrote:
>>>
>>>> Currently page cache pages are grouped with MOVABLE allocations. This
>>>> appears
>>>> to work well in practice as page cache pages are usually reclaimable via
>>>> the LRU. However, this is not strictly correct as page cache pages can
>>>> only
>>>> be cleaned and discarded, not migrated. During readahead, pages may also
>>>> exist on a pool for a period of time instead of on the LRU giving them a
>>>> different lifecycle to ordinary movable pages.
>>>
>>> Sorry but pagecache pages can be migrated.
>>>
>>
>> Poor phrasing perhaps. I was under the impression that page migration was only
>> concerned with pages mapped by process page tables for the move_pages() call.
>> The statement above was also referring to pages read by readahead and normal
>> file IO. I'm pretty sure they could be migrated without difficulty though once
>> the source pages are identified. Either way, the separate grouping of page
>> cache is probably not worthwhile for the moment.
>
> So page cache = unmapped I/O pages?
Unmapped IO pages as well as mapped pages. As far as grouping pages by
mobility is concerned, it is difficult to tell the difference at the time
of allocation without excessive use of __GFP flags. The separate grouping
is probably not worthwhile but, for clarity, the use of GFP_*_PAGECACHE is.
Using __GFP_PAGECACHE to clean up __page_cache_alloc() is worth looking at
but I'm not sure the cost of a __GFP_ flag is justified.
> These can also be migrated. They still
> carry a refcount of the radix tree and page migration will have to update
> that pointer.
>
> Page migration in its current form is indeed only used to move mapped
> pages but that is incidental to the current usage patterns. It is intended
> to be a generic page migration framework.
>
Perfect. That matches my current understanding.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 2/8] Print out statistics in relation to fragmentation avoidance to /proc/fragavoidance
2007-05-15 19:23 ` Mel Gorman
@ 2007-05-16 0:27 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-16 0:27 UTC (permalink / raw)
To: Mel Gorman; +Cc: clameter, linux-mm
On Tue, 15 May 2007 20:23:21 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:
> On Tue, 15 May 2007, Christoph Lameter wrote:
>
> > On Tue, 15 May 2007, Mel Gorman wrote:
> >
> >>
> >> This patch provides fragmentation avoidance statistics via
> >> /proc/fragavoidance. The information is collected only on request so there
> >
> > The name is probably a bit strange.
> >
> > /proc/pagetypeinfo or so?
> >
>
> /proc/mobilityinfo?
>
I vote pagetypeinfo or pagegroupinfo :)
-Kame
* Re: [PATCH 6/8] Add __GFP_TEMPORARY to identify allocations that are short-lived
2007-05-15 15:05 ` [PATCH 6/8] Add __GFP_TEMPORARY to identify allocations that are short-lived Mel Gorman
2007-05-15 18:29 ` Christoph Lameter
@ 2007-05-16 0:36 ` KAMEZAWA Hiroyuki
2007-05-16 0:52 ` Christoph Lameter
1 sibling, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-16 0:36 UTC (permalink / raw)
To: Mel Gorman; +Cc: clameter, linux-mm
On Tue, 15 May 2007 16:05:12 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:
> Currently allocations that are short-lived or reclaimable by the kernel are
> grouped together by specifying __GFP_RECLAIMABLE in the GFP flags. However,
> it is confusing when reading code to see a temporary allocation using
> __GFP_RECLAIMABLE when it is clearly not reclaimable.
>
> This patch adds __GFP_TEMPORARY, GFP_TEMPORARY and SLAB_TEMPORARY for
> temporary allocations.
What kind of objects should be considered TEMPORARY (short-lived)?
It seems hard to use without documentation.
Could you add a clear explanation in the header file?
In my understanding, the following case is typical.
==
foo() {
alloc();
do some work
free();
}
==
Other cases?
-Kame
* Re: [PATCH 6/8] Add __GFP_TEMPORARY to identify allocations that are short-lived
2007-05-16 0:36 ` KAMEZAWA Hiroyuki
@ 2007-05-16 0:52 ` Christoph Lameter
2007-05-16 9:04 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2007-05-16 0:52 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Mel Gorman, linux-mm
On Wed, 16 May 2007, KAMEZAWA Hiroyuki wrote:
> What kind of objects should be considered TEMPORARY (short-lived)?
> It seems hard to use without documentation.
> Could you add a clear explanation in the header file?
>
> In my understanding, the following case is typical.
>
> ==
> foo() {
> alloc();
> do some work
> free();
> }
> ==
>
> Other cases?
GFP_TEMPORARY means that the memory will be freed in a short time without
further kernel intervention. I.e. there is no reclaim pass, user
intervention or other cleanup needed. I think network slabs also fit that
description.
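As an illustrative sketch of the pattern (hypothetical code, modelled on
the cpuset_common_file_read() conversion earlier in this series):
==
char *buf = (char *)__get_free_page(GFP_TEMPORARY);
if (!buf)
	return -ENOMEM;
/* ... format output into buf ... */
free_page((unsigned long)buf);	/* freed before returning; no reclaim
				   pass or other cleanup is needed */
==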
* Re: [PATCH 0/8] Review-based updates to grouping pages by mobility
2007-05-15 15:03 [PATCH 0/8] Review-based updates to grouping pages by mobility Mel Gorman
` (7 preceding siblings ...)
2007-05-15 15:05 ` [PATCH 8/8] Mark page cache pages as __GFP_PAGECACHE instead of __GFP_MOVABLE Mel Gorman
@ 2007-05-16 2:33 ` KAMEZAWA Hiroyuki
2007-05-16 8:58 ` Mel Gorman
8 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-16 2:33 UTC (permalink / raw)
To: Mel Gorman; +Cc: clameter, linux-mm
On Tue, 15 May 2007 16:03:11 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:
> Hi Christoph,
>
> The following patches address points brought up by your review of the
> grouping pages by mobility patches. There are quite a number of patches here.
>
May I ask a question?
It is not about this patch but about 2.6.21-mm2.
In free_hot_cold_page()
==
static void fastcall free_hot_cold_page(struct page *page, int cold)
{
struct zone *zone = page_zone(page);
struct per_cpu_pages *pcp;
unsigned long flags;
<snip>
set_page_private(page, get_pageblock_migratetype(page));
pcp->count++;
if (pcp->count >= pcp->high) {
free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
pcp->count -= pcp->batch;
}
==
get_pageblock_migratetype(page) is called without zone->lock.
Is this safe? Or should we add a seqlock (or something) to access the
migrate type bitmap?
-Kame
* Re: [PATCH 0/8] Review-based updates to grouping pages by mobility
2007-05-16 2:33 ` [PATCH 0/8] Review-based updates to grouping pages by mobility KAMEZAWA Hiroyuki
@ 2007-05-16 8:58 ` Mel Gorman
0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2007-05-16 8:58 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: clameter, linux-mm
On Wed, 16 May 2007, KAMEZAWA Hiroyuki wrote:
> On Tue, 15 May 2007 16:03:11 +0100 (IST)
> Mel Gorman <mel@csn.ul.ie> wrote:
>
>> Hi Christoph,
>>
>> The following patches address points brought up by your review of the
>> grouping pages by mobility patches. There are quite a number of patches here.
>>
> May I ask a question?
> It is not about this patch but about 2.6.21-mm2.
>
> In free_hot_cold_page()
>
> ==
> static void fastcall free_hot_cold_page(struct page *page, int cold)
> {
> struct zone *zone = page_zone(page);
> struct per_cpu_pages *pcp;
> unsigned long flags;
> <snip>
> set_page_private(page, get_pageblock_migratetype(page));
> pcp->count++;
> if (pcp->count >= pcp->high) {
> free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
> pcp->count -= pcp->batch;
> }
>
> ==
>
> get_pageblock_migratetype(page) is called without zone->lock.
>
Indeed, this is the per-cpu allocator so acquiring a lock defeats the
point.
> Is this safe? Or should we add a seqlock (or something) to access the
> migrate type bitmap?
>
It's safe.
At worst, the pcp free calls get_pageblock_migratetype() and gets the
wrong migrate type. For that to happen, an allocator holding the lock must
have "stolen" the block and it already contains pages of mixed types. As
the block is already mixed, the situation has not gotten any worse.
If a pcp page is cached with the wrong migrate type, it will remain on the
pcp lists until a batch free occurs, at which point
get_pageblock_migratetype() is called again, this time under zone->lock,
so the right type is found and the page is freed correctly.
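A simplified sketch of the two paths (illustrative only, not the exact
2.6.21-mm2 code):
==
/* Fast path, no zone->lock held: a possibly-stale type is cached */
set_page_private(page, get_pageblock_migratetype(page));

/* Batch free, under zone->lock: the type is looked up again, so a
 * stale value cached above is corrected before the page goes back
 * onto the buddy free lists */
spin_lock(&zone->lock);
migratetype = get_pageblock_migratetype(page);
list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
spin_unlock(&zone->lock);
==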
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 6/8] Add __GFP_TEMPORARY to identify allocations that are short-lived
2007-05-16 0:52 ` Christoph Lameter
@ 2007-05-16 9:04 ` Mel Gorman
0 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2007-05-16 9:04 UTC (permalink / raw)
To: Christoph Lameter; +Cc: KAMEZAWA Hiroyuki, linux-mm
On Tue, 15 May 2007, Christoph Lameter wrote:
> On Wed, 16 May 2007, KAMEZAWA Hiroyuki wrote:
>
>> What kind of objects should be considered to be TEMPORARY (short-lived) ?
>> It seems hard-to-use if no documentation.
>> Could you add clear explanation in header file ?
>>
>> In my understanding, following case is typical.
>>
>> ==
>> foo() {
>> alloc();
>> do some work
>> free();
>> }
>> ==
>>
>> Other cases ?
>
> GFP_TEMPORARY means that the memory will be freed in a short time without
> further kernel intervention. I.e. there is no reclaim pass, user
> intervention or other cleanup needed. I think network slabs also fit that
> description.
>
Exactly.
Hint taken though. Better documentation of the flags is on the TODO list.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab