linux-mm.kvack.org archive mirror
* [RFC PATCH 0/4] Reclaim page capture v1
@ 2008-07-01 17:58 Andy Whitcroft
  2008-07-01 17:58 ` [PATCH 1/4] pull out the page pre-release and sanity check logic for reuse Andy Whitcroft
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Andy Whitcroft @ 2008-07-01 17:58 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Mel Gorman, Andy Whitcroft

For some time we have been looking at mechanisms for improving the availability
of larger allocations under load.  One of the options we have explored is
the capturing of pages freed under direct reclaim in order to increase the
chances of free pages coalescing before they are subject to reallocation
by racing allocators.
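
To make the coalescing argument concrete, here is a minimal standalone
sketch of the index arithmetic buddy uses when merging free blocks (plain
userspace C, illustrative only and not part of the series; the helper names
simply echo __page_find_buddy()/__find_combined_index()):

#include <stdio.h>

/*
 * For a free block of 2^order pages starting at page_idx, its buddy is
 * the block starting at page_idx ^ (1 << order); if both halves are free
 * they merge into an order+1 block starting at page_idx & ~(1 << order).
 */
static unsigned long find_buddy_index(unsigned long page_idx, unsigned int order)
{
	return page_idx ^ (1UL << order);
}

static unsigned long find_combined_index(unsigned long page_idx, unsigned int order)
{
	return page_idx & ~(1UL << order);
}

int main(void)
{
	unsigned long page_idx = 12;	/* an order-2 block covering pages 12..15 */
	unsigned int order = 2;

	printf("buddy of block %lu at order %u starts at page %lu\n",
	       page_idx, order, find_buddy_index(page_idx, order));
	printf("merged block would start at page %lu (order %u)\n",
	       find_combined_index(page_idx, order), order + 1);
	return 0;
}

A racing allocator grabbing either half before the merge can happen is
exactly the window the capture list described below is intended to close.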

Following this email is a patch stack implementing page capture during
direct reclaim.  It consists of four patches.  The first two simply pull
out existing code into helpers for reuse.  The third makes buddy's use
of struct page explicit.  The fourth contains the meat of the changes,
and its leader contains a much fuller description of the feature.

I have done a fair amount of comparative testing with and without
this patch set and, in broad brush, I am seeing improvements in hugepage
allocation (worst case size) success rates of the order of 5%, which under
load for systems with larger hugepages represents a doubling of the number
of pages available.  Testing is still ongoing to confirm these results.

Against: 2.6.26-rc6 (with the explicit page flags patches)

Comments?

-apw


* [PATCH 1/4] pull out the page pre-release and sanity check logic for reuse
  2008-07-01 17:58 [RFC PATCH 0/4] Reclaim page capture v1 Andy Whitcroft
@ 2008-07-01 17:58 ` Andy Whitcroft
  2008-07-01 17:58 ` [PATCH 2/4] pull out zone cpuset and watermark checks " Andy Whitcroft
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Andy Whitcroft @ 2008-07-01 17:58 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Mel Gorman, Andy Whitcroft

When we are about to release a page we perform a number of actions
on that page.  We clear down any anonymous mappings, confirm that
the page is safe to release, check for locks held over the memory
being freed, and unmap the page from the kernel should that be
required.  Pull this processing out into a helper function for reuse
in a later patch.

Note that we do not convert the similar cleardown in free_hot_cold_page()
as the optimiser is unable to squash the loops when the helper is inlined
there.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/page_alloc.c |   43 ++++++++++++++++++++++++++++++-------------
 1 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8aa93f3..758ecf1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -468,6 +468,35 @@ static inline int free_pages_check(struct page *page)
 }
 
 /*
+ * Prepare this page for release to the buddy.  Sanity check the page.
+ * Returns 1 if the page is safe to free.
+ */
+static inline int free_page_prepare(struct page *page, int order)
+{
+	int i;
+	int reserved = 0;
+
+	if (PageAnon(page))
+		page->mapping = NULL;
+
+	for (i = 0 ; i < (1 << order) ; ++i)
+		reserved += free_pages_check(page + i);
+	if (reserved)
+		return 0;
+
+	if (!PageHighMem(page)) {
+		debug_check_no_locks_freed(page_address(page),
+							PAGE_SIZE << order);
+		debug_check_no_obj_freed(page_address(page),
+					   PAGE_SIZE << order);
+	}
+	arch_free_page(page, order);
+	kernel_map_pages(page, 1 << order, 0);
+
+	return 1;
+}
+
+/*
  * Frees a list of pages. 
  * Assumes all pages on list are in same zone, and of same order.
  * count is the number of pages to free.
@@ -508,22 +537,10 @@ static void free_one_page(struct zone *zone, struct page *page, int order)
 static void __free_pages_ok(struct page *page, unsigned int order)
 {
 	unsigned long flags;
-	int i;
-	int reserved = 0;
 
-	for (i = 0 ; i < (1 << order) ; ++i)
-		reserved += free_pages_check(page + i);
-	if (reserved)
+	if (!free_page_prepare(page, order))
 		return;
 
-	if (!PageHighMem(page)) {
-		debug_check_no_locks_freed(page_address(page),PAGE_SIZE<<order);
-		debug_check_no_obj_freed(page_address(page),
-					   PAGE_SIZE << order);
-	}
-	arch_free_page(page, order);
-	kernel_map_pages(page, 1 << order, 0);
-
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
 	free_one_page(page_zone(page), page, order);
-- 
1.5.6.1.201.g3e7d3


* [PATCH 2/4] pull out zone cpuset and watermark checks for reuse
  2008-07-01 17:58 [RFC PATCH 0/4] Reclaim page capture v1 Andy Whitcroft
  2008-07-01 17:58 ` [PATCH 1/4] pull out the page pre-release and sanity check logic for reuse Andy Whitcroft
@ 2008-07-01 17:58 ` Andy Whitcroft
  2008-07-02  8:06   ` KOSAKI Motohiro
  2008-07-01 17:58 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
  2008-07-01 17:58 ` [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer Andy Whitcroft
  3 siblings, 1 reply; 15+ messages in thread
From: Andy Whitcroft @ 2008-07-01 17:58 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Mel Gorman, Andy Whitcroft

When allocating we need to confirm that the zone we are about to allocate
from is acceptable to the CPUSET we are in, and that it does not violate
the zone watermarks.  Pull these checks out so we can reuse them in a
later patch.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/page_alloc.c |   62 ++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 758ecf1..4d9c4e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1248,6 +1248,44 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 	return 1;
 }
 
+/*
+ * Return 1 if this zone is an acceptable source given the cpuset
+ * constraints.
+ */
+static inline int zone_cpuset_ok(struct zone *zone,
+					int alloc_flags, gfp_t gfp_mask)
+{
+	if ((alloc_flags & ALLOC_CPUSET) &&
+	    !cpuset_zone_allowed_softwall(zone, gfp_mask))
+		return 0;
+	return 1;
+}
+
+/*
+ * Return 1 if this zone is within the watermarks specified by the
+ * allocation flags.
+ */
+static inline int zone_alloc_ok(struct zone *zone, int order,
+			int classzone_idx, int alloc_flags, gfp_t gfp_mask)
+{
+	if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
+		unsigned long mark;
+		if (alloc_flags & ALLOC_WMARK_MIN)
+			mark = zone->pages_min;
+		else if (alloc_flags & ALLOC_WMARK_LOW)
+			mark = zone->pages_low;
+		else
+			mark = zone->pages_high;
+		if (!zone_watermark_ok(zone, order, mark,
+			    classzone_idx, alloc_flags)) {
+			if (!zone_reclaim_mode ||
+					!zone_reclaim(zone, gfp_mask, order))
+				return 0;
+		}
+	}
+	return 1;
+}
+
 #ifdef CONFIG_NUMA
 /*
  * zlc_setup - Setup for "zonelist cache".  Uses cached zone data to
@@ -1401,25 +1439,11 @@ zonelist_scan:
 		if (NUMA_BUILD && zlc_active &&
 			!zlc_zone_worth_trying(zonelist, z, allowednodes))
 				continue;
-		if ((alloc_flags & ALLOC_CPUSET) &&
-			!cpuset_zone_allowed_softwall(zone, gfp_mask))
-				goto try_next_zone;
-
-		if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
-			unsigned long mark;
-			if (alloc_flags & ALLOC_WMARK_MIN)
-				mark = zone->pages_min;
-			else if (alloc_flags & ALLOC_WMARK_LOW)
-				mark = zone->pages_low;
-			else
-				mark = zone->pages_high;
-			if (!zone_watermark_ok(zone, order, mark,
-				    classzone_idx, alloc_flags)) {
-				if (!zone_reclaim_mode ||
-				    !zone_reclaim(zone, gfp_mask, order))
-					goto this_zone_full;
-			}
-		}
+		if (!zone_cpuset_ok(zone, alloc_flags, gfp_mask))
+			goto try_next_zone;
+		if (!zone_alloc_ok(zone, order, classzone_idx,
+							alloc_flags, gfp_mask))
+			goto this_zone_full;
 
 		page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask);
 		if (page)
-- 
1.5.6.1.201.g3e7d3


* [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-07-01 17:58 [RFC PATCH 0/4] Reclaim page capture v1 Andy Whitcroft
  2008-07-01 17:58 ` [PATCH 1/4] pull out the page pre-release and sanity check logic for reuse Andy Whitcroft
  2008-07-01 17:58 ` [PATCH 2/4] pull out zone cpuset and watermark checks " Andy Whitcroft
@ 2008-07-01 17:58 ` Andy Whitcroft
  2008-07-01 17:58 ` [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer Andy Whitcroft
  3 siblings, 0 replies; 15+ messages in thread
From: Andy Whitcroft @ 2008-07-01 17:58 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Mel Gorman, Andy Whitcroft

Explicitly define the struct page fields which buddy uses when it owns
pages.  Defines a new anonymous struct to allow additional fields to
be defined in a later patch.
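
As an aside, the new buddy_order field lives in the same union as the
fields buddy previously reached via page_private(), so no space is added
to struct page.  A toy illustration of the anonymous-struct-in-union idiom
(standalone C, not the real struct page) might look like:

#include <stdio.h>

struct toy_page {
	union {
		unsigned long private;		/* generic owner data */
		struct {
			unsigned long buddy_order;	/* buddy's view of it */
		};
	};
};

int main(void)
{
	struct toy_page p;

	p.buddy_order = 3;
	/* Both names refer to the same storage within the union. */
	printf("buddy_order=%lu private=%lu\n", p.buddy_order, p.private);
	return 0;
}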

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 include/linux/mm_types.h |    3 +++
 mm/internal.h            |    2 +-
 mm/page_alloc.c          |    4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 02a27ae..45eb71f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -69,6 +69,9 @@ struct page {
 #endif
 	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
 	    struct page *first_page;	/* Compound tail pages */
+	    struct {
+		unsigned long buddy_order;     /* buddy: free page order */
+	    };
 	};
 	union {
 		pgoff_t index;		/* Our offset within mapping. */
diff --git a/mm/internal.h b/mm/internal.h
index 0034e94..ac0f600 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -44,7 +44,7 @@ extern void __free_pages_bootmem(struct page *page, unsigned int order);
 static inline unsigned long page_order(struct page *page)
 {
 	VM_BUG_ON(!PageBuddy(page));
-	return page_private(page);
+	return page->buddy_order;
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4d9c4e8..d73e1e1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -316,14 +316,14 @@ static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
 
 static inline void set_page_order(struct page *page, int order)
 {
-	set_page_private(page, order);
+	page->buddy_order = order;
 	__SetPageBuddy(page);
 }
 
 static inline void rmv_page_order(struct page *page)
 {
 	__ClearPageBuddy(page);
-	set_page_private(page, 0);
+	page->buddy_order = 0;
 }
 
 /*
-- 
1.5.6.1.201.g3e7d3


* [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer
  2008-07-01 17:58 [RFC PATCH 0/4] Reclaim page capture v1 Andy Whitcroft
                   ` (2 preceding siblings ...)
  2008-07-01 17:58 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
@ 2008-07-01 17:58 ` Andy Whitcroft
  2008-07-02 12:01   ` KOSAKI Motohiro
  3 siblings, 1 reply; 15+ messages in thread
From: Andy Whitcroft @ 2008-07-01 17:58 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Mel Gorman, Andy Whitcroft

When a process enters direct reclaim it will spend considerable effort
identifying and releasing pages in the hope of obtaining a page.  However,
as these pages are released asynchronously, there is every possibility
that the pages will have been consumed by other allocators before the
reclaimer gets a look in.  This is particularly problematic where the
reclaimer is attempting to allocate a higher order page.  It is highly
likely that a parallel allocation will consume the lower order constituent
pages as we release them, preventing them from coalescing into the higher
order page the reclaimer desires.

This patch set attempts to address this by temporarily collecting the pages
we are releasing onto a local free list.  Instead of freeing them to the
main buddy lists, pages are collected and coalesced on this per direct
reclaimer free list.  Pages which are freed by other processes are also
considered: where they coalesce with a page already under capture they
are moved to the capture list.  When pressure has been applied to a zone
we then consult the capture list and, if there is an appropriately sized
page available, it is taken immediately and the remainder returned to
the free pool.  Capture is only enabled when the reclaimer's allocation
order exceeds PAGE_ALLOC_COSTLY_ORDER, as free pages below this order
should naturally occur in large numbers following regular reclaim.
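
As a rough summary of the lifecycle, the toy model below (standalone C;
the function names merely echo capture_one_page()/capture_alloc_or_return(),
and splitting any excess off the chosen block is omitted) shows the shape
of the scheme: freed blocks go onto the reclaimer's private list, and once
reclaim has made progress the first sufficiently large block is taken while
everything else is handed back:

#include <stdio.h>

/* A toy "block" standing in for a free buddy block of 2^order pages. */
struct block {
	unsigned int order;
	struct block *next;
};

static struct block *global_pool;	/* stand-in for the zone free lists */

/* Freed blocks go onto the reclaimer's local capture list, not the pool. */
static void capture_block(struct block **capture_list, struct block *b)
{
	b->next = *capture_list;
	*capture_list = b;
}

/* Keep the first captured block big enough for 'order'; return the rest. */
static struct block *capture_alloc_or_return(struct block **capture_list,
					     unsigned int order)
{
	struct block *b, *next, *taken = NULL;

	for (b = *capture_list; b; b = next) {
		next = b->next;
		if (!taken && b->order >= order) {
			taken = b;
		} else {
			b->next = global_pool;
			global_pool = b;
		}
	}
	*capture_list = NULL;
	return taken;
}

int main(void)
{
	struct block freed[3] = { { 0 }, { 3 }, { 1 } };
	struct block *capture_list = NULL;
	struct block *page;
	int i;

	for (i = 0; i < 3; i++)
		capture_block(&capture_list, &freed[i]);

	page = capture_alloc_or_return(&capture_list, 2);
	if (page)
		printf("captured an order-%u block, remainder returned\n",
		       page->order);
	return 0;
}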

Thanks go to Mel Gorman for numerous discussions during the development
of this patch and for his repeated reviews.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 include/linux/mm_types.h   |    1 +
 include/linux/page-flags.h |    6 ++
 mm/internal.h              |    6 ++
 mm/page_alloc.c            |  142 +++++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c                |  112 +++++++++++++++++++++++++++++------
 5 files changed, 247 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45eb71f..67229ba 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -71,6 +71,7 @@ struct page {
 	    struct page *first_page;	/* Compound tail pages */
 	    struct {
 		unsigned long buddy_order;     /* buddy: free page order */
+		struct list_head *buddy_free;  /* buddy: free list pointer */
 	    };
 	};
 	union {
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0df0e75..405db40 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -111,6 +111,9 @@ enum pageflags {
 	/* SLUB */
 	PG_slub_frozen = PG_active,
 	PG_slub_debug = PG_error,
+
+	/* BUDDY overlays. */
+	PG_buddy_capture = PG_owner_priv_1,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -187,6 +190,9 @@ __PAGEFLAG(SlubDebug, slub_debug)
  */
 TESTPAGEFLAG(Writeback, writeback) TESTSCFLAG(Writeback, writeback)
 __PAGEFLAG(Buddy, buddy)
+PAGEFLAG(BuddyCapture, buddy_capture)	/* A buddy page, but reserved. */
+	__SETPAGEFLAG(BuddyCapture, buddy_capture)
+	__CLEARPAGEFLAG(BuddyCapture, buddy_capture)
 PAGEFLAG(MappedToDisk, mappedtodisk)
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
diff --git a/mm/internal.h b/mm/internal.h
index ac0f600..e17f7f7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -59,4 +59,10 @@ static inline unsigned long page_order(struct page *page)
 #define __paginginit __init
 #endif
 
+extern struct page *capture_alloc_or_return(struct zone *, struct zone *,
+					struct list_head *, int, int, gfp_t);
+void capture_one_page(struct list_head *, struct zone *, struct page *, int);
+unsigned long try_to_free_pages_alloc(struct page **, struct zonelist *,
+					nodemask_t *, int, gfp_t, int);
+
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d73e1e1..1ac703d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -410,6 +410,51 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
  * -- wli
  */
 
+static inline void __capture_one_page(struct list_head *capture_list,
+		struct page *page, struct zone *zone, unsigned int order)
+{
+	unsigned long page_idx;
+	unsigned long order_size = 1UL << order;
+
+	if (unlikely(PageCompound(page)))
+		destroy_compound_page(page, order);
+
+	page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
+
+	VM_BUG_ON(page_idx & (order_size - 1));
+	VM_BUG_ON(bad_range(zone, page));
+
+	while (order < MAX_ORDER-1) {
+		unsigned long combined_idx;
+		struct page *buddy;
+
+		buddy = __page_find_buddy(page, page_idx, order);
+		if (!page_is_buddy(page, buddy, order))
+			break;
+
+		/* Our buddy is free, merge with it and move up one order. */
+		list_del(&buddy->lru);
+		if (PageBuddyCapture(buddy)) {
+			buddy->buddy_free = 0;
+			__ClearPageBuddyCapture(buddy);
+		} else {
+			zone->free_area[order].nr_free--;
+			__mod_zone_page_state(zone,
+					NR_FREE_PAGES, -(1UL << order));
+		}
+		rmv_page_order(buddy);
+		combined_idx = __find_combined_index(page_idx, order);
+		page = page + (combined_idx - page_idx);
+		page_idx = combined_idx;
+		order++;
+	}
+	set_page_order(page, order);
+	__SetPageBuddyCapture(page);
+	page->buddy_free = capture_list;
+
+	list_add(&page->lru, capture_list);
+}
+
 static inline void __free_one_page(struct page *page,
 		struct zone *zone, unsigned int order)
 {
@@ -433,6 +478,12 @@ static inline void __free_one_page(struct page *page,
 		buddy = __page_find_buddy(page, page_idx, order);
 		if (!page_is_buddy(page, buddy, order))
 			break;
+		if (PageBuddyCapture(buddy)) {
+			__mod_zone_page_state(zone,
+					NR_FREE_PAGES, -(1UL << order));
+			return __capture_one_page(buddy->buddy_free,
+							page, zone, order);
+		}
 
 		/* Our buddy is free, merge with it and move up one order. */
 		list_del(&buddy->lru);
@@ -534,6 +585,19 @@ static void free_one_page(struct zone *zone, struct page *page, int order)
 	spin_unlock(&zone->lock);
 }
 
+void capture_one_page(struct list_head *free_list,
+			struct zone *zone, struct page *page, int order)
+{
+	unsigned long flags;
+
+	if (!free_page_prepare(page, order))
+		return;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	__capture_one_page(free_list, page, zone, order);
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
 static void __free_pages_ok(struct page *page, unsigned int order)
 {
 	unsigned long flags;
@@ -607,6 +671,18 @@ static inline void expand(struct zone *zone, struct page *page,
 	}
 }
 
+void __carve_off(struct page *page, unsigned long actual_order,
+					unsigned long desired_order)
+{
+	int migratetype = get_pageblock_migratetype(page);
+	struct zone *zone = page_zone(page);
+	struct free_area *area = &(zone->free_area[actual_order]);
+
+	__mod_zone_page_state(zone, NR_FREE_PAGES,
+				(1UL << actual_order) - (1UL << desired_order));
+	expand(zone, page, desired_order, actual_order, area, migratetype);
+}
+
 /*
  * This page is about to be returned from the page allocator
  */
@@ -1585,11 +1661,15 @@ nofail_alloc:
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
+	did_some_progress = try_to_free_pages_alloc(&page, zonelist, nodemask,
+						order, gfp_mask, alloc_flags);
 
 	p->reclaim_state = NULL;
 	p->flags &= ~PF_MEMALLOC;
 
+	if (page)
+		goto got_pg;
+
 	cond_resched();
 
 	if (order != 0)
@@ -4561,6 +4641,66 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
+
+/*
+ * Run through the accumulated list of captured pages and take the first
+ * which is big enough to satisfy the original allocation.  Free
+ * the remainder of that page and all other pages.
+ */
+struct page *capture_alloc_or_return(struct zone *zone,
+		struct zone *preferred_zone, struct list_head *capture_list,
+		int order, int alloc_flags, gfp_t gfp_mask)
+{
+	struct page *capture_page = 0;
+	unsigned long flags;
+	int classzone_idx = zone_idx(preferred_zone);
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	while (!list_empty(capture_list)) {
+		struct page *page;
+		int pg_order;
+
+		page = lru_to_page(capture_list);
+		list_del(&page->lru);
+		pg_order = page_order(page);
+
+		/* This page is being allocated, clear our buddy info. */
+		rmv_page_order(page);
+		page->buddy_free = 0;
+		ClearPageBuddyCapture(page);
+
+		if (!capture_page && pg_order >= order) {
+			__carve_off(page, pg_order, order);
+			capture_page = page;
+		} else
+			__free_one_page(page, zone, pg_order);
+	}
+
+	/* Ensure that this capture would not violate the watermarks. */
+	if (capture_page &&
+	    (!zone_cpuset_ok(zone, alloc_flags, gfp_mask) ||
+	     !zone_alloc_ok(zone, order, classzone_idx,
+					     alloc_flags, gfp_mask))) {
+		__free_one_page(capture_page, zone, order);
+		capture_page = NULL;
+	}
+
+	if (capture_page)
+		__count_zone_vm_events(PGALLOC, zone, 1 << order);
+
+	zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
+	zone->pages_scanned = 0;
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	if (capture_page)
+		prep_new_page(capture_page, order, gfp_mask);
+
+	return capture_page;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9a29901..c9d99ff 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,6 +53,8 @@ struct scan_control {
 	/* This context's GFP mask */
 	gfp_t gfp_mask;
 
+	int alloc_flags;
+
 	int may_writepage;
 
 	/* Can pages be swapped as part of reclaim? */
@@ -78,6 +80,12 @@ struct scan_control {
 			unsigned long *scanned, int order, int mode,
 			struct zone *z, struct mem_cgroup *mem_cont,
 			int active);
+
+	/* Captured page. */
+	struct page **capture;
+	
+	/* Nodemask for acceptable allocations. */
+	nodemask_t *nodemask;
 };
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -454,7 +462,8 @@ cannot_free:
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
-static unsigned long shrink_page_list(struct list_head *page_list,
+static unsigned long shrink_page_list(struct list_head *free_list,
+					struct list_head *page_list,
 					struct scan_control *sc,
 					enum pageout_io sync_writeback)
 {
@@ -607,8 +616,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 free_it:
 		unlock_page(page);
 		nr_reclaimed++;
-		if (!pagevec_add(&freed_pvec, page))
-			__pagevec_release_nonlru(&freed_pvec);
+		if (free_list) {
+			if (put_page_testzero(page))
+				capture_one_page(free_list,
+						page_zone(page), page, 0);
+		} else {
+			if (!pagevec_add(&freed_pvec, page))
+				__pagevec_release_nonlru(&freed_pvec);
+		}
+
 		continue;
 
 activate_locked:
@@ -813,8 +829,8 @@ static unsigned long clear_active_flags(struct list_head *page_list)
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
-static unsigned long shrink_inactive_list(unsigned long max_scan,
-				struct zone *zone, struct scan_control *sc)
+static unsigned long shrink_inactive_list(struct list_head *free_list,
+	unsigned long max_scan, struct zone *zone, struct scan_control *sc)
 {
 	LIST_HEAD(page_list);
 	struct pagevec pvec;
@@ -848,7 +864,8 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 		spin_unlock_irq(&zone->lru_lock);
 
 		nr_scanned += nr_scan;
-		nr_freed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
+		nr_freed = shrink_page_list(free_list, &page_list,
+							sc, PAGEOUT_IO_ASYNC);
 
 		/*
 		 * If we are direct reclaiming for contiguous pages and we do
@@ -867,8 +884,8 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 			nr_active = clear_active_flags(&page_list);
 			count_vm_events(PGDEACTIVATE, nr_active);
 
-			nr_freed += shrink_page_list(&page_list, sc,
-							PAGEOUT_IO_SYNC);
+			nr_freed += shrink_page_list(free_list, &page_list,
+							sc, PAGEOUT_IO_SYNC);
 		}
 
 		nr_reclaimed += nr_freed;
@@ -1168,13 +1185,30 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
  * This is a basic per-zone page freer.  Used by both kswapd and direct reclaim.
  */
 static unsigned long shrink_zone(int priority, struct zone *zone,
-				struct scan_control *sc)
+			struct zone *preferred_zone, struct scan_control *sc)
 {
 	unsigned long nr_active;
 	unsigned long nr_inactive;
 	unsigned long nr_to_scan;
 	unsigned long nr_reclaimed = 0;
 
+	struct list_head __capture_list;
+	struct list_head *capture_list = NULL;
+	struct page *capture_page;
+
+	/*
+	 * When direct reclaimers are asking for larger orders
+	 * capture pages for them.  There is no point if we already
+	 * have an acceptable page or if this zone is not within the
+	 * nodemask.
+	 */
+	if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
+	    sc->capture && !*(sc->capture) && (sc->nodemask == NULL ||
+	    node_isset(zone_to_nid(zone), *sc->nodemask))) {
+		capture_list = &__capture_list;
+		INIT_LIST_HEAD(capture_list);
+	}
+
 	if (scan_global_lru(sc)) {
 		/*
 		 * Add one to nr_to_scan just to make sure that the kernel
@@ -1208,6 +1242,7 @@ static unsigned long shrink_zone(int priority, struct zone *zone,
 					zone, priority);
 	}
 
+	capture_page = NULL;
 
 	while (nr_active || nr_inactive) {
 		if (nr_active) {
@@ -1221,11 +1256,22 @@ static unsigned long shrink_zone(int priority, struct zone *zone,
 			nr_to_scan = min(nr_inactive,
 					(unsigned long)sc->swap_cluster_max);
 			nr_inactive -= nr_to_scan;
-			nr_reclaimed += shrink_inactive_list(nr_to_scan, zone,
-								sc);
+			nr_reclaimed += shrink_inactive_list(capture_list,
+							nr_to_scan, zone, sc);
+		}
+
+		if (capture_list) {
+			capture_page = capture_alloc_or_return(zone,
+				preferred_zone, capture_list, sc->order,
+				sc->alloc_flags, sc->gfp_mask);
+			if (capture_page)
+				capture_list = NULL;
 		}
 	}
 
+	if (capture_page)
+		*(sc->capture) = capture_page;
+
 	throttle_vm_writeout(sc->gfp_mask);
 	return nr_reclaimed;
 }
@@ -1247,7 +1293,7 @@ static unsigned long shrink_zone(int priority, struct zone *zone,
  * scan then give up on it.
  */
 static unsigned long shrink_zones(int priority, struct zonelist *zonelist,
-					struct scan_control *sc)
+		struct zone *preferred_zone, struct scan_control *sc)
 {
 	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
 	unsigned long nr_reclaimed = 0;
@@ -1281,7 +1327,7 @@ static unsigned long shrink_zones(int priority, struct zonelist *zonelist,
 							priority);
 		}
 
-		nr_reclaimed += shrink_zone(priority, zone, sc);
+		nr_reclaimed += shrink_zone(priority, zone, preferred_zone, sc);
 	}
 
 	return nr_reclaimed;
@@ -1314,8 +1360,14 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 	unsigned long lru_pages = 0;
 	struct zoneref *z;
 	struct zone *zone;
+	struct zone *preferred_zone;
 	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
 
+	/* This should never fail as we should be scanning a real zonelist. */
+	(void)first_zones_zonelist(zonelist, high_zoneidx, sc->nodemask,
+							&preferred_zone);
+	BUG_ON(!preferred_zone);
+
 	if (scan_global_lru(sc))
 		count_vm_event(ALLOCSTALL);
 	/*
@@ -1336,7 +1388,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		sc->nr_scanned = 0;
 		if (!priority)
 			disable_swap_token();
-		nr_reclaimed += shrink_zones(priority, zonelist, sc);
+		nr_reclaimed += shrink_zones(priority, zonelist,
+							preferred_zone, sc);
 		/*
 		 * Don't shrink slabs when reclaiming memory from
 		 * over limit cgroups
@@ -1399,11 +1452,13 @@ out:
 	return ret;
 }
 
-unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-								gfp_t gfp_mask)
+unsigned long try_to_free_pages_alloc(struct page **capture_pagep,
+		struct zonelist *zonelist, nodemask_t *nodemask,
+		int order, gfp_t gfp_mask, int alloc_flags)
 {
 	struct scan_control sc = {
 		.gfp_mask = gfp_mask,
+		.alloc_flags = alloc_flags,
 		.may_writepage = !laptop_mode,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.may_swap = 1,
@@ -1411,17 +1466,28 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 		.order = order,
 		.mem_cgroup = NULL,
 		.isolate_pages = isolate_pages_global,
+		.capture = capture_pagep,
+		.nodemask = nodemask,
 	};
 
 	return do_try_to_free_pages(zonelist, &sc);
 }
 
+unsigned long try_to_free_pages(struct zonelist *zonelist,
+						int order, gfp_t gfp_mask)
+{
+	return try_to_free_pages_alloc(NULL, zonelist, NULL,
+							order, gfp_mask, 0);
+}
+
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 						gfp_t gfp_mask)
 {
 	struct scan_control sc = {
+		.gfp_mask = gfp_mask,
+		.alloc_flags = 0,
 		.may_writepage = !laptop_mode,
 		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
@@ -1429,6 +1495,8 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.order = 0,
 		.mem_cgroup = mem_cont,
 		.isolate_pages = mem_cgroup_isolate_pages,
+		.capture = NULL,
+		.nodemask = NULL,
 	};
 	struct zonelist *zonelist;
 
@@ -1470,12 +1538,15 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
+		.alloc_flags = 0,
 		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
 		.mem_cgroup = NULL,
 		.isolate_pages = isolate_pages_global,
+		.capture = NULL,
+		.nodemask = NULL,
 	};
 	/*
 	 * temp_priority is used to remember the scanning priority at which
@@ -1564,7 +1635,8 @@ loop_again:
 			 */
 			if (!zone_watermark_ok(zone, order, 8*zone->pages_high,
 						end_zone, 0))
-				nr_reclaimed += shrink_zone(priority, zone, &sc);
+				nr_reclaimed += shrink_zone(priority,
+							zone, zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
 			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
 						lru_pages);
@@ -1762,7 +1834,7 @@ static unsigned long shrink_all_zones(unsigned long nr_pages, int prio,
 			zone->nr_scan_inactive = 0;
 			nr_to_scan = min(nr_pages,
 				zone_page_state(zone, NR_INACTIVE));
-			ret += shrink_inactive_list(nr_to_scan, zone, sc);
+			ret += shrink_inactive_list(NULL, nr_to_scan, zone, sc);
 			if (ret >= nr_pages)
 				return ret;
 		}
@@ -1792,6 +1864,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
+		.alloc_flags = 0,
 		.may_swap = 0,
 		.swap_cluster_max = nr_pages,
 		.may_writepage = 1,
@@ -1980,6 +2053,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.swap_cluster_max = max_t(unsigned long, nr_pages,
 					SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,
+		.alloc_flags = 0,
 		.swappiness = vm_swappiness,
 		.isolate_pages = isolate_pages_global,
 	};
@@ -2006,7 +2080,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		priority = ZONE_RECLAIM_PRIORITY;
 		do {
 			note_zone_scanning_priority(zone, priority);
-			nr_reclaimed += shrink_zone(priority, zone, &sc);
+			nr_reclaimed += shrink_zone(priority, zone, zone, &sc);
 			priority--;
 		} while (priority >= 0 && nr_reclaimed < nr_pages);
 	}
-- 
1.5.6.1.201.g3e7d3


* Re: [PATCH 2/4] pull out zone cpuset and watermark checks for reuse
  2008-07-01 17:58 ` [PATCH 2/4] pull out zone cpuset and watermark checks " Andy Whitcroft
@ 2008-07-02  8:06   ` KOSAKI Motohiro
  0 siblings, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2008-07-02  8:06 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: kosaki.motohiro, linux-mm, linux-kernel, Mel Gorman

Hi Andy,

This is a nit.


> +/*
> + * Return 1 if this zone is an acceptable source given the cpuset
> + * constraints.
> + */
> +static inline int zone_cpuset_ok(struct zone *zone,
> +					int alloc_flags, gfp_t gfp_mask)
> +{
> +	if ((alloc_flags & ALLOC_CPUSET) &&
> +	    !cpuset_zone_allowed_softwall(zone, gfp_mask))
> +		return 0;
> +	return 1;
> +}

zone_cpuset_ok() sounds like a cpuset sanity check,
but it is really an "is this zone allocatable?" check.

In addition, "ok" is a slightly vague name, IMHO.


> +/*
> + * Return 1 if this zone is within the watermarks specified by the
> + * allocation flags.
> + */
> +static inline int zone_alloc_ok(struct zone *zone, int order,
> +			int classzone_idx, int alloc_flags, gfp_t gfp_mask)
> +{
> +	if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
> +		unsigned long mark;
> +		if (alloc_flags & ALLOC_WMARK_MIN)
> +			mark = zone->pages_min;
> +		else if (alloc_flags & ALLOC_WMARK_LOW)
> +			mark = zone->pages_low;
> +		else
> +			mark = zone->pages_high;
> +		if (!zone_watermark_ok(zone, order, mark,
> +			    classzone_idx, alloc_flags)) {
> +			if (!zone_reclaim_mode ||
> +					!zone_reclaim(zone, gfp_mask, order))
> +				return 0;
> +		}
> +	}
> +	return 1;
> +}

zone_alloc_ok() seems to check whether the zone is allocatable or not,
so I would like the zone_reclaim() call to move out of this function.




* Re: [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer
  2008-07-01 17:58 ` [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer Andy Whitcroft
@ 2008-07-02 12:01   ` KOSAKI Motohiro
  2008-07-02 14:44     ` Andy Whitcroft
  0 siblings, 1 reply; 15+ messages in thread
From: KOSAKI Motohiro @ 2008-07-02 12:01 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: kosaki.motohiro, linux-mm, linux-kernel, Mel Gorman

Hi Andy,

I feel this is an interesting patch,
but I am worried that it may make OOM situations occur more often.
What do you think?

Also, why don't you make the patch against the -mm tree?


> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d73e1e1..1ac703d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -410,6 +410,51 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>   * -- wli
>   */
>  
> +static inline void __capture_one_page(struct list_head *capture_list,
> +		struct page *page, struct zone *zone, unsigned int order)
> +{
> +	unsigned long page_idx;
> +	unsigned long order_size = 1UL << order;
> +
> +	if (unlikely(PageCompound(page)))
> +		destroy_compound_page(page, order);
> +
> +	page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
> +
> +	VM_BUG_ON(page_idx & (order_size - 1));
> +	VM_BUG_ON(bad_range(zone, page));
> +
> +	while (order < MAX_ORDER-1) {
> +		unsigned long combined_idx;
> +		struct page *buddy;
> +
> +		buddy = __page_find_buddy(page, page_idx, order);
> +		if (!page_is_buddy(page, buddy, order))
> +			break;
> +
> +		/* Our buddy is free, merge with it and move up one order. */
> +		list_del(&buddy->lru);
> +		if (PageBuddyCapture(buddy)) {
> +			buddy->buddy_free = 0;
> +			__ClearPageBuddyCapture(buddy);
> +		} else {
> +			zone->free_area[order].nr_free--;
> +			__mod_zone_page_state(zone,
> +					NR_FREE_PAGES, -(1UL << order));
> +		}
> +		rmv_page_order(buddy);
> +		combined_idx = __find_combined_index(page_idx, order);
> +		page = page + (combined_idx - page_idx);
> +		page_idx = combined_idx;
> +		order++;
> +	}
> +	set_page_order(page, order);
> +	__SetPageBuddyCapture(page);
> +	page->buddy_free = capture_list;
> +
> +	list_add(&page->lru, capture_list);
> +}

If we already have a page of a sufficient size,
shouldn't we release subsequent pages back to the buddy lists?

Otherwise the OOM risk increases.
Or am I misunderstanding?


>  static inline void __free_one_page(struct page *page,
>  		struct zone *zone, unsigned int order)
>  {
> @@ -433,6 +478,12 @@ static inline void __free_one_page(struct page *page,
>  		buddy = __page_find_buddy(page, page_idx, order);
>  		if (!page_is_buddy(page, buddy, order))
>  			break;
> +		if (PageBuddyCapture(buddy)) {
> +			__mod_zone_page_state(zone,
> +					NR_FREE_PAGES, -(1UL << order));
> +			return __capture_one_page(buddy->buddy_free,
> +							page, zone, order);
> +		}

Shouldn't captured pages be accounted in the zone statistics?
Otherwise the administrator can't troubleshoot.


>  	/* Can pages be swapped as part of reclaim? */
> @@ -78,6 +80,12 @@ struct scan_control {
>  			unsigned long *scanned, int order, int mode,
>  			struct zone *z, struct mem_cgroup *mem_cont,
>  			int active);
> +
> +	/* Captured page. */
> +	struct page **capture;
> +	
> +	/* Nodemask for acceptable allocations. */
> +	nodemask_t *nodemask;
>  };

Please write a longer comment here.
Everybody thinks of scan_control as a reclaim-purpose structure,
so they will probably wonder "Why is this member needed?".




> @@ -1314,8 +1360,14 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>  	unsigned long lru_pages = 0;
>  	struct zoneref *z;
>  	struct zone *zone;
> +	struct zone *preferred_zone;
>  	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
>  
> +	/* This should never fail as we should be scanning a real zonelist. */
> +	(void)first_zones_zonelist(zonelist, high_zoneidx, sc->nodemask,
> +							&preferred_zone);

Nit: the (void) cast is unnecessary.




* Re: [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer
  2008-07-02 12:01   ` KOSAKI Motohiro
@ 2008-07-02 14:44     ` Andy Whitcroft
  0 siblings, 0 replies; 15+ messages in thread
From: Andy Whitcroft @ 2008-07-02 14:44 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-mm, linux-kernel, Mel Gorman

On Wed, Jul 02, 2008 at 09:01:59PM +0900, KOSAKI Motohiro wrote:
> Hi Andy,
> 
> I feel this is interesting patch.
> 
> but I'm worry about it become increase OOM occur.
> What do you think?

We do hold onto some nearly free pages for a while longer, but only in
direct reclaim; assuming kswapd is running, its pages should not get
captured.  I am pushing our machines in test pretty hard, to the
unusable stage, mostly without OOM'ing, but that is still an artificial
test.  The amount of memory under capture is proportional to the size of
the allocations at the time of capture, so one would hope this would only
be significant at very high orders.

> and, Why don't you make patch against -mm tree?

That is mostly historical, as there was major churn in the same place when
I was originally making these patches, plus -mm was not bootable on any
of my test systems.  I am not sure if that is still true.  I will have
a look at a recent -mm and see if they will rebase and boot.

Thanks for looking.

-apw


* Re: [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-10-01 12:31 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
@ 2008-10-02  7:06   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-10-02  7:06 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-mm, linux-kernel, KOSAKI Motohiro, Peter Zijlstra,
	Christoph Lameter, Rik van Riel, Mel Gorman, Nick Piggin,
	Andrew Morton

On Wed,  1 Oct 2008 13:31:00 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> Explicitly define the struct page fields which buddy uses when it owns
> pages.  Defines a new anonymous struct to allow additional fields to
> be defined in a later patch.
> 
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: Christoph Lameter <cl@linux-foundation.org>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


* [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-10-01 12:30 [PATCH 0/4] Reclaim page capture v4 Andy Whitcroft
@ 2008-10-01 12:31 ` Andy Whitcroft
  2008-10-02  7:06   ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 15+ messages in thread
From: Andy Whitcroft @ 2008-10-01 12:31 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, KOSAKI Motohiro, Peter Zijlstra, Christoph Lameter,
	Rik van Riel, Mel Gorman, Andy Whitcroft, Nick Piggin,
	Andrew Morton

Explicitly define the struct page fields which buddy uses when it owns
pages.  Defines a new anonymous struct to allow additional fields to
be defined in a later patch.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
---
 include/linux/mm_types.h |    3 +++
 mm/internal.h            |    2 +-
 mm/page_alloc.c          |    4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 995c588..906d8e0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -70,6 +70,9 @@ struct page {
 #endif
 	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
 	    struct page *first_page;	/* Compound tail pages */
+	    struct {
+		unsigned long buddy_order;     /* buddy: free page order */
+	    };
 	};
 	union {
 		pgoff_t index;		/* Our offset within mapping. */
diff --git a/mm/internal.h b/mm/internal.h
index c0e4859..fcedcd0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -58,7 +58,7 @@ extern void __free_pages_bootmem(struct page *page, unsigned int order);
 static inline unsigned long page_order(struct page *page)
 {
 	VM_BUG_ON(!PageBuddy(page));
-	return page_private(page);
+	return page->buddy_order;
 }
 
 extern int mlock_vma_pages_range(struct vm_area_struct *vma,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 921c435..3a646e3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -331,7 +331,7 @@ static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
 
 static inline void set_page_order(struct page *page, int order)
 {
-	set_page_private(page, order);
+	page->buddy_order = order;
 	__SetPageBuddy(page);
 #ifdef CONFIG_PAGE_OWNER
 		page->order = -1;
@@ -341,7 +341,7 @@ static inline void set_page_order(struct page *page, int order)
 static inline void rmv_page_order(struct page *page)
 {
 	__ClearPageBuddy(page);
-	set_page_private(page, 0);
+	page->buddy_order = 0;
 }
 
 /*
-- 
1.6.0.1.451.gc8d31


* [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-09-05 10:19 [PATCH 0/4] Reclaim page capture v3 Andy Whitcroft
@ 2008-09-05 10:20 ` Andy Whitcroft
  0 siblings, 0 replies; 15+ messages in thread
From: Andy Whitcroft @ 2008-09-05 10:20 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, KOSAKI Motohiro, Peter Zijlstra, Christoph Lameter,
	Rik van Riel, Mel Gorman, Andy Whitcroft

Explicitly define the struct page fields which buddy uses when it owns
pages.  Defines a new anonymous struct to allow additional fields to
be defined in a later patch.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
---
 include/linux/mm_types.h |    3 +++
 mm/internal.h            |    2 +-
 mm/page_alloc.c          |    4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 995c588..906d8e0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -70,6 +70,9 @@ struct page {
 #endif
 	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
 	    struct page *first_page;	/* Compound tail pages */
+	    struct {
+		unsigned long buddy_order;     /* buddy: free page order */
+	    };
 	};
 	union {
 		pgoff_t index;		/* Our offset within mapping. */
diff --git a/mm/internal.h b/mm/internal.h
index c0e4859..fcedcd0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -58,7 +58,7 @@ extern void __free_pages_bootmem(struct page *page, unsigned int order);
 static inline unsigned long page_order(struct page *page)
 {
 	VM_BUG_ON(!PageBuddy(page));
-	return page_private(page);
+	return page->buddy_order;
 }
 
 extern int mlock_vma_pages_range(struct vm_area_struct *vma,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c3874e..db0dbd6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -331,7 +331,7 @@ static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
 
 static inline void set_page_order(struct page *page, int order)
 {
-	set_page_private(page, order);
+	page->buddy_order = order;
 	__SetPageBuddy(page);
 #ifdef CONFIG_PAGE_OWNER
 		page->order = -1;
@@ -341,7 +341,7 @@ static inline void set_page_order(struct page *page, int order)
 static inline void rmv_page_order(struct page *page)
 {
 	__ClearPageBuddy(page);
-	set_page_private(page, 0);
+	page->buddy_order = 0;
 }
 
 /*
-- 
1.6.0.rc1.258.g80295


* Re: [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-09-03 18:44 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
  2008-09-03 20:36   ` Christoph Lameter
  2008-09-04  1:25   ` Rik van Riel
@ 2008-09-05  1:52   ` KOSAKI Motohiro
  2 siblings, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2008-09-05  1:52 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: kosaki.motohiro, linux-mm, linux-kernel, Mel Gorman

> Explicitly define the struct page fields which buddy uses when it owns
> pages.  Defines a new anonymous struct to allow additional fields to
> be defined in a later patch.
> 
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




* Re: [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-09-03 18:44 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
  2008-09-03 20:36   ` Christoph Lameter
@ 2008-09-04  1:25   ` Rik van Riel
  2008-09-05  1:52   ` KOSAKI Motohiro
  2 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2008-09-04  1:25 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro, Mel Gorman

On Wed,  3 Sep 2008 19:44:11 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> Explicitly define the struct page fields which buddy uses when it owns
> pages.  Defines a new anonymous struct to allow additional fields to
> be defined in a later patch.
> 
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.


* Re: [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-09-03 18:44 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
@ 2008-09-03 20:36   ` Christoph Lameter
  2008-09-04  1:25   ` Rik van Riel
  2008-09-05  1:52   ` KOSAKI Motohiro
  2 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2008-09-03 20:36 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro, Mel Gorman

Andy Whitcroft wrote:
> Explicitly define the struct page fields which buddy uses when it owns
> pages.  Defines a new anonymous struct to allow additional fields to
> be defined in a later patch.

Good. I have a similar patch floating around.

Reviewed-by: Christoph Lameter <cl@linux-foundation.org>


* [PATCH 3/4] buddy: explicitly identify buddy field use in struct page
  2008-09-03 18:44 [RFC PATCH 0/4] Reclaim page capture v2 Andy Whitcroft
@ 2008-09-03 18:44 ` Andy Whitcroft
  2008-09-03 20:36   ` Christoph Lameter
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Andy Whitcroft @ 2008-09-03 18:44 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, KOSAKI Motohiro, Mel Gorman, Andy Whitcroft

Explicitly define the struct page fields which buddy uses when it owns
pages.  Defines a new anonymous struct to allow additional fields to
be defined in a later patch.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 include/linux/mm_types.h |    3 +++
 mm/internal.h            |    2 +-
 mm/page_alloc.c          |    4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 995c588..906d8e0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -70,6 +70,9 @@ struct page {
 #endif
 	    struct kmem_cache *slab;	/* SLUB: Pointer to slab */
 	    struct page *first_page;	/* Compound tail pages */
+	    struct {
+		unsigned long buddy_order;     /* buddy: free page order */
+	    };
 	};
 	union {
 		pgoff_t index;		/* Our offset within mapping. */
diff --git a/mm/internal.h b/mm/internal.h
index c0e4859..fcedcd0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -58,7 +58,7 @@ extern void __free_pages_bootmem(struct page *page, unsigned int order);
 static inline unsigned long page_order(struct page *page)
 {
 	VM_BUG_ON(!PageBuddy(page));
-	return page_private(page);
+	return page->buddy_order;
 }
 
 extern int mlock_vma_pages_range(struct vm_area_struct *vma,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c3874e..db0dbd6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -331,7 +331,7 @@ static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
 
 static inline void set_page_order(struct page *page, int order)
 {
-	set_page_private(page, order);
+	page->buddy_order = order;
 	__SetPageBuddy(page);
 #ifdef CONFIG_PAGE_OWNER
 		page->order = -1;
@@ -341,7 +341,7 @@ static inline void set_page_order(struct page *page, int order)
 static inline void rmv_page_order(struct page *page)
 {
 	__ClearPageBuddy(page);
-	set_page_private(page, 0);
+	page->buddy_order = 0;
 }
 
 /*
-- 
1.6.0.rc1.258.g80295


end of thread, other threads:[~2008-10-02  7:06 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-07-01 17:58 [RFC PATCH 0/4] Reclaim page capture v1 Andy Whitcroft
2008-07-01 17:58 ` [PATCH 1/4] pull out the page pre-release and sanity check logic for reuse Andy Whitcroft
2008-07-01 17:58 ` [PATCH 2/4] pull out zone cpuset and watermark checks " Andy Whitcroft
2008-07-02  8:06   ` KOSAKI Motohiro
2008-07-01 17:58 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
2008-07-01 17:58 ` [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer Andy Whitcroft
2008-07-02 12:01   ` KOSAKI Motohiro
2008-07-02 14:44     ` Andy Whitcroft
2008-09-03 18:44 [RFC PATCH 0/4] Reclaim page capture v2 Andy Whitcroft
2008-09-03 18:44 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
2008-09-03 20:36   ` Christoph Lameter
2008-09-04  1:25   ` Rik van Riel
2008-09-05  1:52   ` KOSAKI Motohiro
2008-09-05 10:19 [PATCH 0/4] Reclaim page capture v3 Andy Whitcroft
2008-09-05 10:20 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
2008-10-01 12:30 [PATCH 0/4] Reclaim page capture v4 Andy Whitcroft
2008-10-01 12:31 ` [PATCH 3/4] buddy: explicitly identify buddy field use in struct page Andy Whitcroft
2008-10-02  7:06   ` KAMEZAWA Hiroyuki
