* [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type
2009-08-18 11:15 [RFC PATCH 0/3] Reduce searching in the page allocator fast-path Mel Gorman
@ 2009-08-18 11:16 ` Mel Gorman
2009-08-18 11:43 ` Nick Piggin
2009-08-18 22:57 ` Vincent Li
2009-08-18 11:16 ` [PATCH 2/3] page-allocator: Maintain rolling count of pages to free from the PCP Mel Gorman
` (2 subsequent siblings)
3 siblings, 2 replies; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 11:16 UTC (permalink / raw)
To: Linux Memory Management List
Cc: Christoph Lameter, Nick Piggin, Linux Kernel Mailing List, Mel Gorman
Currently the per-cpu page allocator searches the PCP list for pages of the
correct migrate-type to reduce the possibility of pages being inappropriately
placed from a fragmentation perspective. This search is potentially expensive
in a fast-path and undesirable. Splitting the per-cpu list into multiple
lists increases the size of a per-cpu structure and this was potentially
a major problem at the time the search was introduced. That problem has
since been mitigated as only the necessary number of structures is now
allocated for the running system.
This patch replaces a list search in the per-cpu allocator with one list per
migrate type. The potential snag with this approach is when bulk freeing
pages. We round-robin free pages based on migrate type which has little
bearing on the cache hotness of the page and potentially checks empty lists
repeatedly in the event the majority of PCP pages are of one type.
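For illustration only, the round-robin idea can be sketched as a small
userspace toy. The array of page counts, NR_PCP_TYPES and the function
names below are invented stand-ins for pcp->lists[] and __free_one_page(),
not the kernel code itself, and the sketch assumes the caller never frees
more pages than are present (as the kernel caller guarantees):

#include <stdio.h>

#define NR_PCP_TYPES 3

/* Toy model: each "list" is just a count of pages of that migrate type */
static int pages[NR_PCP_TYPES] = { 1, 0, 7 };

static void drain_round_robin(int count)
{
        int type = 0;

        while (count--) {
                /*
                 * Advance to the next non-empty list. This can spin over
                 * empty lists when one type holds most of the pages,
                 * which is the snag described above.
                 */
                do {
                        if (++type == NR_PCP_TYPES)
                                type = 0;
                } while (pages[type] == 0);

                pages[type]--;  /* stand-in for __free_one_page() */
                printf("freed one page of type %d\n", type);
        }
}

int main(void)
{
        drain_round_robin(8);
        return 0;
}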
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/linux/mmzone.h | 5 ++-
mm/page_alloc.c | 106 ++++++++++++++++++++++++++---------------------
2 files changed, 63 insertions(+), 48 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9c50309..6e0b624 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -38,6 +38,7 @@
#define MIGRATE_UNMOVABLE 0
#define MIGRATE_RECLAIMABLE 1
#define MIGRATE_MOVABLE 2
+#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
#define MIGRATE_RESERVE 3
#define MIGRATE_ISOLATE 4 /* can't allocate from here */
#define MIGRATE_TYPES 5
@@ -169,7 +170,9 @@ struct per_cpu_pages {
int count; /* number of pages in the list */
int high; /* high watermark, emptying needed */
int batch; /* chunk size for buddy add/remove */
- struct list_head list; /* the list of pages */
+
+ /* Lists of pages, one per migrate type stored on the pcp-lists */
+ struct list_head lists[MIGRATE_PCPTYPES];
};
struct per_cpu_pageset {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e5baa9..a06ddf0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -522,7 +522,7 @@ static inline int free_pages_check(struct page *page)
}
/*
- * Frees a list of pages.
+ * Frees a number of pages from the PCP lists
* Assumes all pages on list are in same zone, and of same order.
* count is the number of pages to free.
*
@@ -532,23 +532,36 @@ static inline int free_pages_check(struct page *page)
* And clear the zone's pages_scanned counter, to hold off the "all pages are
* pinned" detection logic.
*/
-static void free_pages_bulk(struct zone *zone, int count,
- struct list_head *list, int order)
+static void free_pcppages_bulk(struct zone *zone, int count,
+ struct per_cpu_pages *pcp)
{
+ int migratetype = 0;
+
spin_lock(&zone->lock);
zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
zone->pages_scanned = 0;
- __mod_zone_page_state(zone, NR_FREE_PAGES, count << order);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, count);
while (count--) {
struct page *page;
+ struct list_head *list;
+
+ /*
+ * Remove pages from lists in a round-robin fashion. This spinning
+ * around potentially empty lists is bloody awful, alternatives that
+ * don't suck are welcome
+ */
+ do {
+ if (++migratetype == MIGRATE_PCPTYPES)
+ migratetype = 0;
+ list = &pcp->lists[migratetype];
+ } while (list_empty(list));
- VM_BUG_ON(list_empty(list));
page = list_entry(list->prev, struct page, lru);
/* have to delete it as __free_one_page list manipulates */
list_del(&page->lru);
- trace_mm_page_pcpu_drain(page, order, page_private(page));
- __free_one_page(page, zone, order, page_private(page));
+ trace_mm_page_pcpu_drain(page, 0, migratetype);
+ __free_one_page(page, zone, 0, migratetype);
}
spin_unlock(&zone->lock);
}
@@ -974,7 +987,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
to_drain = pcp->batch;
else
to_drain = pcp->count;
- free_pages_bulk(zone, to_drain, &pcp->list, 0);
+ free_pcppages_bulk(zone, to_drain, pcp);
pcp->count -= to_drain;
local_irq_restore(flags);
}
@@ -1000,7 +1013,7 @@ static void drain_pages(unsigned int cpu)
pcp = &pset->pcp;
local_irq_save(flags);
- free_pages_bulk(zone, pcp->count, &pcp->list, 0);
+ free_pcppages_bulk(zone, pcp->count, pcp);
pcp->count = 0;
local_irq_restore(flags);
}
@@ -1066,6 +1079,7 @@ static void free_hot_cold_page(struct page *page, int cold)
struct zone *zone = page_zone(page);
struct per_cpu_pages *pcp;
unsigned long flags;
+ int migratetype;
int wasMlocked = __TestClearPageMlocked(page);
kmemcheck_free_shadow(page, 0);
@@ -1083,21 +1097,39 @@ static void free_hot_cold_page(struct page *page, int cold)
kernel_map_pages(page, 1, 0);
pcp = &zone_pcp(zone, get_cpu())->pcp;
- set_page_private(page, get_pageblock_migratetype(page));
+ migratetype = get_pageblock_migratetype(page);
+ set_page_private(page, migratetype);
local_irq_save(flags);
if (unlikely(wasMlocked))
free_page_mlock(page);
__count_vm_event(PGFREE);
+ /*
+ * We only track unreclaimable, reclaimable and movable on pcp lists.
+ * Free ISOLATE pages back to the allocator because they are being
+ * offlined but treat RESERVE as movable pages so we can get those
+ * areas back if necessary. Otherwise, we may have to free
+ * excessively into the page allocator
+ */
+ if (migratetype >= MIGRATE_PCPTYPES) {
+ if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+ free_one_page(zone, page, 0, migratetype);
+ goto out;
+ }
+ migratetype = MIGRATE_MOVABLE;
+ }
+
if (cold)
- list_add_tail(&page->lru, &pcp->list);
+ list_add_tail(&page->lru, &pcp->lists[migratetype]);
else
- list_add(&page->lru, &pcp->list);
+ list_add(&page->lru, &pcp->lists[migratetype]);
pcp->count++;
if (pcp->count >= pcp->high) {
- free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
+ free_pcppages_bulk(zone, pcp->batch, pcp);
pcp->count -= pcp->batch;
}
+
+out:
local_irq_restore(flags);
put_cpu();
}
@@ -1155,46 +1187,24 @@ again:
cpu = get_cpu();
if (likely(order == 0)) {
struct per_cpu_pages *pcp;
+ struct list_head *list;
pcp = &zone_pcp(zone, cpu)->pcp;
+ list = &pcp->lists[migratetype];
local_irq_save(flags);
- if (!pcp->count) {
- pcp->count = rmqueue_bulk(zone, 0,
- pcp->batch, &pcp->list,
- migratetype, cold);
- if (unlikely(!pcp->count))
- goto failed;
- }
-
- /* Find a page of the appropriate migrate type */
- if (cold) {
- list_for_each_entry_reverse(page, &pcp->list, lru)
- if (page_private(page) == migratetype)
- break;
- } else {
- list_for_each_entry(page, &pcp->list, lru)
- if (page_private(page) == migratetype)
- break;
- }
-
- /* Allocate more to the pcp list if necessary */
- if (unlikely(&page->lru == &pcp->list)) {
- int get_one_page = 0;
-
+ if (list_empty(list)) {
pcp->count += rmqueue_bulk(zone, 0,
- pcp->batch, &pcp->list,
+ pcp->batch, list,
migratetype, cold);
- list_for_each_entry(page, &pcp->list, lru) {
- if (get_pageblock_migratetype(page) !=
- MIGRATE_ISOLATE) {
- get_one_page = 1;
- break;
- }
- }
- if (!get_one_page)
+ if (unlikely(list_empty(list)))
goto failed;
}
+ if (cold)
+ page = list_entry(list->prev, struct page, lru);
+ else
+ page = list_entry(list->next, struct page, lru);
+
list_del(&page->lru);
pcp->count--;
} else {
@@ -3033,6 +3043,7 @@ static int zone_batchsize(struct zone *zone)
static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
{
struct per_cpu_pages *pcp;
+ int migratetype;
memset(p, 0, sizeof(*p));
@@ -3040,7 +3051,8 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
pcp->count = 0;
pcp->high = 6 * batch;
pcp->batch = max(1UL, 1 * batch);
- INIT_LIST_HEAD(&pcp->list);
+ for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
+ INIT_LIST_HEAD(&pcp->lists[migratetype]);
}
/*
@@ -3232,7 +3244,7 @@ static int __zone_pcp_update(void *data)
pcp = &pset->pcp;
local_irq_save(flags);
- free_pages_bulk(zone, pcp->count, &pcp->list, 0);
+ free_pcppages_bulk(zone, pcp->count, pcp);
setup_pageset(pset, batch);
local_irq_restore(flags);
}
--
1.6.3.3
* Re: [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type
2009-08-18 11:16 ` [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type Mel Gorman
@ 2009-08-18 11:43 ` Nick Piggin
2009-08-18 13:10 ` Mel Gorman
2009-08-18 22:57 ` Vincent Li
1 sibling, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2009-08-18 11:43 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Christoph Lameter,
Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 12:16:00PM +0100, Mel Gorman wrote:
> Currently the per-cpu page allocator searches the PCP list for pages of the
> correct migrate-type to reduce the possibility of pages being inappropriately
> placed from a fragmentation perspective. This search is potentially expensive
> in a fast-path and undesirable. Splitting the per-cpu list into multiple
> lists increases the size of a per-cpu structure and this was potentially
> a major problem at the time the search was introduced. That problem has
> since been mitigated as only the necessary number of structures is now
> allocated for the running system.
>
> This patch replaces a list search in the per-cpu allocator with one list per
> migrate type. The potential snag with this approach is when bulk freeing
> pages. We round-robin free pages based on migrate type which has little
> bearing on the cache hotness of the page and potentially checks empty lists
> repeatedly in the event the majority of PCP pages are of one type.
Seems OK I guess. Trading off icache and branches for dcache and
algorithmic gains. Too bad everything is always a tradeoff ;)
But no I think this is a good idea.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
> include/linux/mmzone.h | 5 ++-
> mm/page_alloc.c | 106 ++++++++++++++++++++++++++---------------------
> 2 files changed, 63 insertions(+), 48 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9c50309..6e0b624 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -38,6 +38,7 @@
> #define MIGRATE_UNMOVABLE 0
> #define MIGRATE_RECLAIMABLE 1
> #define MIGRATE_MOVABLE 2
> +#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
> #define MIGRATE_RESERVE 3
> #define MIGRATE_ISOLATE 4 /* can't allocate from here */
> #define MIGRATE_TYPES 5
> @@ -169,7 +170,9 @@ struct per_cpu_pages {
> int count; /* number of pages in the list */
> int high; /* high watermark, emptying needed */
> int batch; /* chunk size for buddy add/remove */
> - struct list_head list; /* the list of pages */
> +
> + /* Lists of pages, one per migrate type stored on the pcp-lists */
> + struct list_head lists[MIGRATE_PCPTYPES];
> };
>
> struct per_cpu_pageset {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0e5baa9..a06ddf0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -522,7 +522,7 @@ static inline int free_pages_check(struct page *page)
> }
>
> /*
> - * Frees a list of pages.
> + * Frees a number of pages from the PCP lists
> * Assumes all pages on list are in same zone, and of same order.
> * count is the number of pages to free.
> *
> @@ -532,23 +532,36 @@ static inline int free_pages_check(struct page *page)
> * And clear the zone's pages_scanned counter, to hold off the "all pages are
> * pinned" detection logic.
> */
> -static void free_pages_bulk(struct zone *zone, int count,
> - struct list_head *list, int order)
> +static void free_pcppages_bulk(struct zone *zone, int count,
> + struct per_cpu_pages *pcp)
> {
> + int migratetype = 0;
> +
> spin_lock(&zone->lock);
> zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
> zone->pages_scanned = 0;
>
> - __mod_zone_page_state(zone, NR_FREE_PAGES, count << order);
> + __mod_zone_page_state(zone, NR_FREE_PAGES, count);
> while (count--) {
> struct page *page;
> + struct list_head *list;
> +
> + /*
> + * Remove pages from lists in a round-robin fashion. This spinning
> + * around potentially empty lists is bloody awful, alternatives that
> + * don't suck are welcome
> + */
> + do {
> + if (++migratetype == MIGRATE_PCPTYPES)
> + migratetype = 0;
> + list = &pcp->lists[migratetype];
> + } while (list_empty(list));
>
> - VM_BUG_ON(list_empty(list));
> page = list_entry(list->prev, struct page, lru);
> /* have to delete it as __free_one_page list manipulates */
> list_del(&page->lru);
> - trace_mm_page_pcpu_drain(page, order, page_private(page));
> - __free_one_page(page, zone, order, page_private(page));
> + trace_mm_page_pcpu_drain(page, 0, migratetype);
> + __free_one_page(page, zone, 0, migratetype);
> }
> spin_unlock(&zone->lock);
> }
> @@ -974,7 +987,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
> to_drain = pcp->batch;
> else
> to_drain = pcp->count;
> - free_pages_bulk(zone, to_drain, &pcp->list, 0);
> + free_pcppages_bulk(zone, to_drain, pcp);
> pcp->count -= to_drain;
> local_irq_restore(flags);
> }
> @@ -1000,7 +1013,7 @@ static void drain_pages(unsigned int cpu)
>
> pcp = &pset->pcp;
> local_irq_save(flags);
> - free_pages_bulk(zone, pcp->count, &pcp->list, 0);
> + free_pcppages_bulk(zone, pcp->count, pcp);
> pcp->count = 0;
> local_irq_restore(flags);
> }
> @@ -1066,6 +1079,7 @@ static void free_hot_cold_page(struct page *page, int cold)
> struct zone *zone = page_zone(page);
> struct per_cpu_pages *pcp;
> unsigned long flags;
> + int migratetype;
> int wasMlocked = __TestClearPageMlocked(page);
>
> kmemcheck_free_shadow(page, 0);
> @@ -1083,21 +1097,39 @@ static void free_hot_cold_page(struct page *page, int cold)
> kernel_map_pages(page, 1, 0);
>
> pcp = &zone_pcp(zone, get_cpu())->pcp;
> - set_page_private(page, get_pageblock_migratetype(page));
> + migratetype = get_pageblock_migratetype(page);
> + set_page_private(page, migratetype);
> local_irq_save(flags);
> if (unlikely(wasMlocked))
> free_page_mlock(page);
> __count_vm_event(PGFREE);
>
> + /*
> + * We only track unreclaimable, reclaimable and movable on pcp lists.
> + * Free ISOLATE pages back to the allocator because they are being
> + * offlined but treat RESERVE as movable pages so we can get those
> + * areas back if necessary. Otherwise, we may have to free
> + * excessively into the page allocator
> + */
> + if (migratetype >= MIGRATE_PCPTYPES) {
> + if (unlikely(migratetype == MIGRATE_ISOLATE)) {
> + free_one_page(zone, page, 0, migratetype);
> + goto out;
> + }
> + migratetype = MIGRATE_MOVABLE;
> + }
> +
> if (cold)
> - list_add_tail(&page->lru, &pcp->list);
> + list_add_tail(&page->lru, &pcp->lists[migratetype]);
> else
> - list_add(&page->lru, &pcp->list);
> + list_add(&page->lru, &pcp->lists[migratetype]);
> pcp->count++;
> if (pcp->count >= pcp->high) {
> - free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
> + free_pcppages_bulk(zone, pcp->batch, pcp);
> pcp->count -= pcp->batch;
> }
> +
> +out:
> local_irq_restore(flags);
> put_cpu();
> }
> @@ -1155,46 +1187,24 @@ again:
> cpu = get_cpu();
> if (likely(order == 0)) {
> struct per_cpu_pages *pcp;
> + struct list_head *list;
>
> pcp = &zone_pcp(zone, cpu)->pcp;
> + list = &pcp->lists[migratetype];
> local_irq_save(flags);
> - if (!pcp->count) {
> - pcp->count = rmqueue_bulk(zone, 0,
> - pcp->batch, &pcp->list,
> - migratetype, cold);
> - if (unlikely(!pcp->count))
> - goto failed;
> - }
> -
> - /* Find a page of the appropriate migrate type */
> - if (cold) {
> - list_for_each_entry_reverse(page, &pcp->list, lru)
> - if (page_private(page) == migratetype)
> - break;
> - } else {
> - list_for_each_entry(page, &pcp->list, lru)
> - if (page_private(page) == migratetype)
> - break;
> - }
> -
> - /* Allocate more to the pcp list if necessary */
> - if (unlikely(&page->lru == &pcp->list)) {
> - int get_one_page = 0;
> -
> + if (list_empty(list)) {
> pcp->count += rmqueue_bulk(zone, 0,
> - pcp->batch, &pcp->list,
> + pcp->batch, list,
> migratetype, cold);
> - list_for_each_entry(page, &pcp->list, lru) {
> - if (get_pageblock_migratetype(page) !=
> - MIGRATE_ISOLATE) {
> - get_one_page = 1;
> - break;
> - }
> - }
> - if (!get_one_page)
> + if (unlikely(list_empty(list)))
> goto failed;
> }
>
> + if (cold)
> + page = list_entry(list->prev, struct page, lru);
> + else
> + page = list_entry(list->next, struct page, lru);
> +
> list_del(&page->lru);
> pcp->count--;
> } else {
> @@ -3033,6 +3043,7 @@ static int zone_batchsize(struct zone *zone)
> static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> {
> struct per_cpu_pages *pcp;
> + int migratetype;
>
> memset(p, 0, sizeof(*p));
>
> @@ -3040,7 +3051,8 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> pcp->count = 0;
> pcp->high = 6 * batch;
> pcp->batch = max(1UL, 1 * batch);
> - INIT_LIST_HEAD(&pcp->list);
> + for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
> + INIT_LIST_HEAD(&pcp->lists[migratetype]);
> }
>
> /*
> @@ -3232,7 +3244,7 @@ static int __zone_pcp_update(void *data)
> pcp = &pset->pcp;
>
> local_irq_save(flags);
> - free_pages_bulk(zone, pcp->count, &pcp->list, 0);
> + free_pcppages_bulk(zone, pcp->count, pcp);
> setup_pageset(pset, batch);
> local_irq_restore(flags);
> }
> --
> 1.6.3.3
* Re: [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type
2009-08-18 11:43 ` Nick Piggin
@ 2009-08-18 13:10 ` Mel Gorman
2009-08-18 13:12 ` Nick Piggin
0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 13:10 UTC (permalink / raw)
To: Nick Piggin
Cc: Linux Memory Management List, Christoph Lameter,
Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 01:43:35PM +0200, Nick Piggin wrote:
> On Tue, Aug 18, 2009 at 12:16:00PM +0100, Mel Gorman wrote:
> > Currently the per-cpu page allocator searches the PCP list for pages of the
> > correct migrate-type to reduce the possibility of pages being inappropriately
> > placed from a fragmentation perspective. This search is potentially expensive
> > in a fast-path and undesirable. Splitting the per-cpu list into multiple
> > lists increases the size of a per-cpu structure and this was potentially
> > a major problem at the time the search was introduced. That problem has
> > since been mitigated as only the necessary number of structures is now
> > allocated for the running system.
> >
> > This patch replaces a list search in the per-cpu allocator with one list per
> > migrate type. The potential snag with this approach is when bulk freeing
> > pages. We round-robin free pages based on migrate type which has little
> > bearing on the cache hotness of the page and potentially checks empty lists
> > repeatedly in the event the majority of PCP pages are of one type.
>
> Seems OK I guess. Trading off icache and branches for dcache and
> algorithmic gains. Too bad everything is always a tradeoff ;)
>
Tell me about it. The dcache overhead of this is a problem, although I
tried to limit the damage by using pahole to see how much padding I had to
play with and by staying within it where possible.
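(For reference, the sort of layout check involved can be mimicked in
userspace with offsetof/sizeof on a mock structure. The two-pointer mock
list_head, the three-entry array and the offsets this prints are
assumptions about a 64-bit build used purely for illustration; they are
not pahole output from the real structure.)

#include <stdio.h>
#include <stddef.h>

struct mock_list_head { void *next, *prev; };

/* Userspace-only mirror of the shape of per_cpu_pages after patch 1 */
struct mock_per_cpu_pages {
        int count;
        int high;
        int batch;
        struct mock_list_head lists[3];
};

int main(void)
{
        /* On a typical 64-bit build this shows a 4-byte hole before lists[] */
        printf("count at %zu, high at %zu, batch at %zu, lists at %zu\n",
               offsetof(struct mock_per_cpu_pages, count),
               offsetof(struct mock_per_cpu_pages, high),
               offsetof(struct mock_per_cpu_pages, batch),
               offsetof(struct mock_per_cpu_pages, lists));
        printf("total size %zu bytes\n", sizeof(struct mock_per_cpu_pages));
        return 0;
}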
> But no I think this is a good idea.
>
Thanks. Is that an Ack?
> > <SNIP>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type
2009-08-18 13:10 ` Mel Gorman
@ 2009-08-18 13:12 ` Nick Piggin
0 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2009-08-18 13:12 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Christoph Lameter,
Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 02:10:24PM +0100, Mel Gorman wrote:
> On Tue, Aug 18, 2009 at 01:43:35PM +0200, Nick Piggin wrote:
> > On Tue, Aug 18, 2009 at 12:16:00PM +0100, Mel Gorman wrote:
> Tell me about it. The dcache overhead of this is a problem although I
> tried to limit the damage using pahole to see how much padding I had to
> play with and staying within it where possible.
>
> > But no I think this is a good idea.
> >
>
> Thanks. Is that an Ack?
Sure, your numbers seem OK. I don't know if there is much more you
can do without having it merged somewhere...
Acked-by: Nick Piggin <npiggin@suse.de>
* Re: [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type
2009-08-18 11:16 ` [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type Mel Gorman
2009-08-18 11:43 ` Nick Piggin
@ 2009-08-18 22:57 ` Vincent Li
2009-08-19 8:57 ` Mel Gorman
1 sibling, 1 reply; 20+ messages in thread
From: Vincent Li @ 2009-08-18 22:57 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Christoph Lameter, Nick Piggin,
Linux Kernel Mailing List
On Tue, 18 Aug 2009, Mel Gorman wrote:
> + /*
> + * We only track unreclaimable, reclaimable and movable on pcp lists.
^^^^^^^^^^^^^
Is it unmovable? I don't see an unreclaimable migrate type on the pcp lists.
Just asking to make sure I understand the comment right.
> + * Free ISOLATE pages back to the allocator because they are being
> + * offlined but treat RESERVE as movable pages so we can get those
> + * areas back if necessary. Otherwise, we may have to free
> + * excessively into the page allocator
> + */
> + if (migratetype >= MIGRATE_PCPTYPES) {
> + if (unlikely(migratetype == MIGRATE_ISOLATE)) {
> + free_one_page(zone, page, 0, migratetype);
> + goto out;
> + }
> + migratetype = MIGRATE_MOVABLE;
> + }
> +
Vincent Li
Biomedical Research Center
University of British Columbia
* Re: [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type
2009-08-18 22:57 ` Vincent Li
@ 2009-08-19 8:57 ` Mel Gorman
0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2009-08-19 8:57 UTC (permalink / raw)
To: Vincent Li
Cc: Linux Memory Management List, Christoph Lameter, Nick Piggin,
Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 03:57:00PM -0700, Vincent Li wrote:
> On Tue, 18 Aug 2009, Mel Gorman wrote:
>
> > + /*
> > + * We only track unreclaimable, reclaimable and movable on pcp lists.
> ^^^^^^^^^^^^^
> Is it unmovable? I don't see an unreclaimable migrate type on the pcp lists.
> Just asking to make sure I understand the comment right.
>
It should have said unmovable. Sorry
> > + * Free ISOLATE pages back to the allocator because they are being
> > + * offlined but treat RESERVE as movable pages so we can get those
> > + * areas back if necessary. Otherwise, we may have to free
> > + * excessively into the page allocator
> > + */
> > + if (migratetype >= MIGRATE_PCPTYPES) {
> > + if (unlikely(migratetype == MIGRATE_ISOLATE)) {
> > + free_one_page(zone, page, 0, migratetype);
> > + goto out;
> > + }
> > + migratetype = MIGRATE_MOVABLE;
> > + }
> > +
>
> Vincent Li
> Biomedical Research Center
> University of British Columbia
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* [PATCH 2/3] page-allocator: Maintain rolling count of pages to free from the PCP
2009-08-18 11:15 [RFC PATCH 0/3] Reduce searching in the page allocator fast-path Mel Gorman
2009-08-18 11:16 ` [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type Mel Gorman
@ 2009-08-18 11:16 ` Mel Gorman
2009-08-18 11:16 ` [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone Mel Gorman
2009-08-18 14:22 ` [RFC PATCH 0/3] Reduce searching in the page allocator fast-path Christoph Lameter
3 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 11:16 UTC (permalink / raw)
To: Linux Memory Management List
Cc: Christoph Lameter, Nick Piggin, Linux Kernel Mailing List, Mel Gorman
When round-robin freeing pages from the PCP lists, empty lists may be
encountered. In the event one of the lists has more pages than the others,
there may be numerous checks for list_empty(), which is undesirable. This
patch maintains a count of pages to free that is incremented when empty
lists are encountered. The intention is that more pages will then be freed
from fuller lists than from the empty ones, reducing the number of
empty-list checks in the free path.
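For illustration only, the batch_free idea can be sketched as a small
userspace toy. The array of page counts, NR_PCP_TYPES and the function name
are invented stand-ins for pcp->lists[] and __free_one_page(), not the
kernel code itself, and the sketch assumes the caller never frees more
pages than are present:

#include <stdio.h>

#define NR_PCP_TYPES 3

/* Toy model: each "list" is just a count of pages of that migrate type */
static int pages[NR_PCP_TYPES] = { 12, 0, 2 };

static void drain_with_batch_free(int count)
{
        int type = 0;
        int batch_free = 0;

        while (count) {
                /*
                 * batch_free grows by one for each list inspected, so
                 * fuller lists are drained in larger batches and empty
                 * lists are re-checked less often.
                 */
                do {
                        batch_free++;
                        if (++type == NR_PCP_TYPES)
                                type = 0;
                } while (pages[type] == 0);

                do {
                        pages[type]--;  /* stand-in for __free_one_page() */
                        printf("freed one page of type %d\n", type);
                } while (--count && --batch_free && pages[type]);
        }
}

int main(void)
{
        drain_with_batch_free(10);
        return 0;
}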
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/page_alloc.c | 23 ++++++++++++++---------
1 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a06ddf0..dd3f306 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -536,32 +536,37 @@ static void free_pcppages_bulk(struct zone *zone, int count,
struct per_cpu_pages *pcp)
{
int migratetype = 0;
+ int batch_free = 0;
spin_lock(&zone->lock);
zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
zone->pages_scanned = 0;
__mod_zone_page_state(zone, NR_FREE_PAGES, count);
- while (count--) {
+ while (count) {
struct page *page;
struct list_head *list;
/*
- * Remove pages from lists in a round-robin fashion. This spinning
- * around potentially empty lists is bloody awful, alternatives that
- * don't suck are welcome
+ * Remove pages from lists in a round-robin fashion. A batch_free
+ * count is maintained that is incremented when an empty list is
+ * encountered. This is so more pages are freed off fuller lists
+ * instead of spinning excessively around empty lists
*/
do {
+ batch_free++;
if (++migratetype == MIGRATE_PCPTYPES)
migratetype = 0;
list = &pcp->lists[migratetype];
} while (list_empty(list));
- page = list_entry(list->prev, struct page, lru);
- /* have to delete it as __free_one_page list manipulates */
- list_del(&page->lru);
- trace_mm_page_pcpu_drain(page, 0, migratetype);
- __free_one_page(page, zone, 0, migratetype);
+ do {
+ page = list_entry(list->prev, struct page, lru);
+ /* must delete as __free_one_page list manipulates */
+ list_del(&page->lru);
+ __free_one_page(page, zone, 0, migratetype);
+ trace_mm_page_pcpu_drain(page, 0, migratetype);
+ } while (--count && --batch_free && !list_empty(list));
}
spin_unlock(&zone->lock);
}
--
1.6.3.3
* [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone
2009-08-18 11:15 [RFC PATCH 0/3] Reduce searching in the page allocator fast-path Mel Gorman
2009-08-18 11:16 ` [PATCH 1/3] page-allocator: Split per-cpu list into one-list-per-migrate-type Mel Gorman
2009-08-18 11:16 ` [PATCH 2/3] page-allocator: Maintain rolling count of pages to free from the PCP Mel Gorman
@ 2009-08-18 11:16 ` Mel Gorman
2009-08-18 11:47 ` Nick Piggin
2009-08-18 14:18 ` Christoph Lameter
2009-08-18 14:22 ` [RFC PATCH 0/3] Reduce searching in the page allocator fast-path Christoph Lameter
3 siblings, 2 replies; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 11:16 UTC (permalink / raw)
To: Linux Memory Management List
Cc: Christoph Lameter, Nick Piggin, Linux Kernel Mailing List, Mel Gorman
Having multiple lists per PCPU increased the size of the per-cpu
structure. Two of the fields, high and batch, do not change within a
zone, making per-CPU copies of that information redundant. This patch
moves those fields off the PCP structure and onto the zone to reduce the
size of the PCPU structure.
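As a rough illustration of the redundancy being removed, the saving can be
estimated with the toy calculation below; the CPU and zone counts are
made-up example values, not measurements from any system:

#include <stdio.h>

int main(void)
{
        int nr_cpus = 64;                       /* hypothetical machine */
        int nr_zones = 3;                       /* e.g. DMA, DMA32, Normal */
        size_t per_copy = 2 * sizeof(int);      /* high + batch */

        /* one copy per (cpu, zone) pair before, one copy per zone after */
        size_t before = (size_t)nr_cpus * nr_zones * per_copy;
        size_t after = (size_t)nr_zones * per_copy;

        printf("high/batch storage: %zu bytes -> %zu bytes\n", before, after);
        return 0;
}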
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/linux/mmzone.h | 9 +++++----
mm/page_alloc.c | 47 +++++++++++++++++++++++++----------------------
mm/vmstat.c | 4 ++--
3 files changed, 32 insertions(+), 28 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e0b624..57a3ef0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -167,12 +167,10 @@ enum zone_watermarks {
#define high_wmark_pages(z) (z->watermark[WMARK_HIGH])
struct per_cpu_pages {
- int count; /* number of pages in the list */
- int high; /* high watermark, emptying needed */
- int batch; /* chunk size for buddy add/remove */
-
/* Lists of pages, one per migrate type stored on the pcp-lists */
struct list_head lists[MIGRATE_PCPTYPES];
+
+ int count; /* number of pages in the list */
};
struct per_cpu_pageset {
@@ -284,6 +282,9 @@ struct zone {
/* zone watermarks, access with *_wmark_pages(zone) macros */
unsigned long watermark[NR_WMARK];
+ int pcp_high; /* high watermark, emptying needed */
+ int pcp_batch; /* chunk size for buddy add/remove */
+
/*
* We don't know if the memory that we're going to allocate will be freeable
* or/and it will be released eventually, so to avoid totally wasting several
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd3f306..65cdfbf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -988,8 +988,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
int to_drain;
local_irq_save(flags);
- if (pcp->count >= pcp->batch)
- to_drain = pcp->batch;
+ if (pcp->count >= zone->pcp_batch)
+ to_drain = zone->pcp_batch;
else
to_drain = pcp->count;
free_pcppages_bulk(zone, to_drain, pcp);
@@ -1129,9 +1129,9 @@ static void free_hot_cold_page(struct page *page, int cold)
else
list_add(&page->lru, &pcp->lists[migratetype]);
pcp->count++;
- if (pcp->count >= pcp->high) {
- free_pcppages_bulk(zone, pcp->batch, pcp);
- pcp->count -= pcp->batch;
+ if (pcp->count >= zone->pcp_high) {
+ free_pcppages_bulk(zone, zone->pcp_batch, pcp);
+ pcp->count -= zone->pcp_batch;
}
out:
@@ -1199,7 +1199,7 @@ again:
local_irq_save(flags);
if (list_empty(list)) {
pcp->count += rmqueue_bulk(zone, 0,
- pcp->batch, list,
+ zone->pcp_batch, list,
migratetype, cold);
if (unlikely(list_empty(list)))
goto failed;
@@ -2178,8 +2178,8 @@ void show_free_areas(void)
pageset = zone_pcp(zone, cpu);
printk("CPU %4d: hi:%5d, btch:%4d usd:%4d\n",
- cpu, pageset->pcp.high,
- pageset->pcp.batch, pageset->pcp.count);
+ cpu, zone->pcp_high,
+ zone->pcp_batch, pageset->pcp.count);
}
}
@@ -3045,7 +3045,9 @@ static int zone_batchsize(struct zone *zone)
#endif
}
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static void setup_pageset(struct zone *zone,
+ struct per_cpu_pageset *p,
+ unsigned long batch)
{
struct per_cpu_pages *pcp;
int migratetype;
@@ -3054,8 +3056,8 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
pcp = &p->pcp;
pcp->count = 0;
- pcp->high = 6 * batch;
- pcp->batch = max(1UL, 1 * batch);
+ zone->pcp_high = 6 * batch;
+ zone->pcp_batch = max(1UL, 1 * batch);
for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
INIT_LIST_HEAD(&pcp->lists[migratetype]);
}
@@ -3065,16 +3067,17 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
* to the value high for the pageset p.
*/
-static void setup_pagelist_highmark(struct per_cpu_pageset *p,
+static void setup_pagelist_highmark(struct zone *zone,
+ struct per_cpu_pageset *p,
unsigned long high)
{
struct per_cpu_pages *pcp;
pcp = &p->pcp;
- pcp->high = high;
- pcp->batch = max(1UL, high/4);
+ zone->pcp_high = high;
+ zone->pcp_batch = max(1UL, high/4);
if ((high/4) > (PAGE_SHIFT * 8))
- pcp->batch = PAGE_SHIFT * 8;
+ zone->pcp_batch = PAGE_SHIFT * 8;
}
@@ -3115,10 +3118,10 @@ static int __cpuinit process_zones(int cpu)
if (!zone_pcp(zone, cpu))
goto bad;
- setup_pageset(zone_pcp(zone, cpu), zone_batchsize(zone));
+ setup_pageset(zone, zone_pcp(zone, cpu), zone_batchsize(zone));
if (percpu_pagelist_fraction)
- setup_pagelist_highmark(zone_pcp(zone, cpu),
+ setup_pagelist_highmark(zone, zone_pcp(zone, cpu),
(zone->present_pages / percpu_pagelist_fraction));
}
@@ -3250,7 +3253,7 @@ static int __zone_pcp_update(void *data)
local_irq_save(flags);
free_pcppages_bulk(zone, pcp->count, pcp);
- setup_pageset(pset, batch);
+ setup_pageset(zone, pset, batch);
local_irq_restore(flags);
}
return 0;
@@ -3270,9 +3273,9 @@ static __meminit void zone_pcp_init(struct zone *zone)
#ifdef CONFIG_NUMA
/* Early boot. Slab allocator not functional yet */
zone_pcp(zone, cpu) = &boot_pageset[cpu];
- setup_pageset(&boot_pageset[cpu],0);
+ setup_pageset(zone, &boot_pageset[cpu],0);
#else
- setup_pageset(zone_pcp(zone,cpu), batch);
+ setup_pageset(zone, zone_pcp(zone,cpu), batch);
#endif
}
if (zone->present_pages)
@@ -4781,7 +4784,7 @@ int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
}
/*
- * percpu_pagelist_fraction - changes the pcp->high for each zone on each
+ * percpu_pagelist_fraction - changes the zone->pcp_high for each zone on each
* cpu. It is the fraction of total pages in each zone that a hot per cpu pagelist
* can have before it gets flushed back to buddy allocator.
*/
@@ -4800,7 +4803,7 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
for_each_online_cpu(cpu) {
unsigned long high;
high = zone->present_pages / percpu_pagelist_fraction;
- setup_pagelist_highmark(zone_pcp(zone, cpu), high);
+ setup_pagelist_highmark(zone, zone_pcp(zone, cpu), high);
}
}
return 0;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c81321f..a9d23c3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -746,8 +746,8 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
"\n batch: %i",
i,
pageset->pcp.count,
- pageset->pcp.high,
- pageset->pcp.batch);
+ zone->pcp_high,
+ zone->pcp_batch);
#ifdef CONFIG_SMP
seq_printf(m, "\n vm stats threshold: %d",
pageset->stat_threshold);
--
1.6.3.3
* Re: [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone
2009-08-18 11:16 ` [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone Mel Gorman
@ 2009-08-18 11:47 ` Nick Piggin
2009-08-18 12:57 ` Mel Gorman
2009-08-18 14:18 ` Christoph Lameter
1 sibling, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2009-08-18 11:47 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Christoph Lameter,
Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 12:16:02PM +0100, Mel Gorman wrote:
> Having multiple lists per PCPU increased the size of the per-pcpu
> structure. Two of the fields, high and batch, do not change within a
> zone making that information redundant. This patch moves those fields
> off the PCP and onto the zone to reduce the size of the PCPU.
Hmm.. I did have some patches a long long time ago that among other
things made the lists larger for the local node only....
But I guess if something like that is ever shown to be a good idea
then we can go back to the old scheme. So yeah this seems OK.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
> include/linux/mmzone.h | 9 +++++----
> mm/page_alloc.c | 47 +++++++++++++++++++++++++----------------------
> mm/vmstat.c | 4 ++--
> 3 files changed, 32 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6e0b624..57a3ef0 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -167,12 +167,10 @@ enum zone_watermarks {
> #define high_wmark_pages(z) (z->watermark[WMARK_HIGH])
>
> struct per_cpu_pages {
> - int count; /* number of pages in the list */
> - int high; /* high watermark, emptying needed */
> - int batch; /* chunk size for buddy add/remove */
> -
> /* Lists of pages, one per migrate type stored on the pcp-lists */
> struct list_head lists[MIGRATE_PCPTYPES];
> +
> + int count; /* number of pages in the list */
> };
>
> struct per_cpu_pageset {
> @@ -284,6 +282,9 @@ struct zone {
> /* zone watermarks, access with *_wmark_pages(zone) macros */
> unsigned long watermark[NR_WMARK];
>
> + int pcp_high; /* high watermark, emptying needed */
> + int pcp_batch; /* chunk size for buddy add/remove */
> +
> /*
> * We don't know if the memory that we're going to allocate will be freeable
> * or/and it will be released eventually, so to avoid totally wasting several
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dd3f306..65cdfbf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -988,8 +988,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
> int to_drain;
>
> local_irq_save(flags);
> - if (pcp->count >= pcp->batch)
> - to_drain = pcp->batch;
> + if (pcp->count >= zone->pcp_batch)
> + to_drain = zone->pcp_batch;
> else
> to_drain = pcp->count;
> free_pcppages_bulk(zone, to_drain, pcp);
> @@ -1129,9 +1129,9 @@ static void free_hot_cold_page(struct page *page, int cold)
> else
> list_add(&page->lru, &pcp->lists[migratetype]);
> pcp->count++;
> - if (pcp->count >= pcp->high) {
> - free_pcppages_bulk(zone, pcp->batch, pcp);
> - pcp->count -= pcp->batch;
> + if (pcp->count >= zone->pcp_high) {
> + free_pcppages_bulk(zone, zone->pcp_batch, pcp);
> + pcp->count -= zone->pcp_batch;
> }
>
> out:
> @@ -1199,7 +1199,7 @@ again:
> local_irq_save(flags);
> if (list_empty(list)) {
> pcp->count += rmqueue_bulk(zone, 0,
> - pcp->batch, list,
> + zone->pcp_batch, list,
> migratetype, cold);
> if (unlikely(list_empty(list)))
> goto failed;
> @@ -2178,8 +2178,8 @@ void show_free_areas(void)
> pageset = zone_pcp(zone, cpu);
>
> printk("CPU %4d: hi:%5d, btch:%4d usd:%4d\n",
> - cpu, pageset->pcp.high,
> - pageset->pcp.batch, pageset->pcp.count);
> + cpu, zone->pcp_high,
> + zone->pcp_batch, pageset->pcp.count);
> }
> }
>
> @@ -3045,7 +3045,9 @@ static int zone_batchsize(struct zone *zone)
> #endif
> }
>
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> +static void setup_pageset(struct zone *zone,
> + struct per_cpu_pageset *p,
> + unsigned long batch)
> {
> struct per_cpu_pages *pcp;
> int migratetype;
> @@ -3054,8 +3056,8 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
>
> pcp = &p->pcp;
> pcp->count = 0;
> - pcp->high = 6 * batch;
> - pcp->batch = max(1UL, 1 * batch);
> + zone->pcp_high = 6 * batch;
> + zone->pcp_batch = max(1UL, 1 * batch);
> for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
> INIT_LIST_HEAD(&pcp->lists[migratetype]);
> }
> @@ -3065,16 +3067,17 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> * to the value high for the pageset p.
> */
>
> -static void setup_pagelist_highmark(struct per_cpu_pageset *p,
> +static void setup_pagelist_highmark(struct zone *zone,
> + struct per_cpu_pageset *p,
> unsigned long high)
> {
> struct per_cpu_pages *pcp;
>
> pcp = &p->pcp;
> - pcp->high = high;
> - pcp->batch = max(1UL, high/4);
> + zone->pcp_high = high;
> + zone->pcp_batch = max(1UL, high/4);
> if ((high/4) > (PAGE_SHIFT * 8))
> - pcp->batch = PAGE_SHIFT * 8;
> + zone->pcp_batch = PAGE_SHIFT * 8;
> }
>
>
> @@ -3115,10 +3118,10 @@ static int __cpuinit process_zones(int cpu)
> if (!zone_pcp(zone, cpu))
> goto bad;
>
> - setup_pageset(zone_pcp(zone, cpu), zone_batchsize(zone));
> + setup_pageset(zone, zone_pcp(zone, cpu), zone_batchsize(zone));
>
> if (percpu_pagelist_fraction)
> - setup_pagelist_highmark(zone_pcp(zone, cpu),
> + setup_pagelist_highmark(zone, zone_pcp(zone, cpu),
> (zone->present_pages / percpu_pagelist_fraction));
> }
>
> @@ -3250,7 +3253,7 @@ static int __zone_pcp_update(void *data)
>
> local_irq_save(flags);
> free_pcppages_bulk(zone, pcp->count, pcp);
> - setup_pageset(pset, batch);
> + setup_pageset(zone, pset, batch);
> local_irq_restore(flags);
> }
> return 0;
> @@ -3270,9 +3273,9 @@ static __meminit void zone_pcp_init(struct zone *zone)
> #ifdef CONFIG_NUMA
> /* Early boot. Slab allocator not functional yet */
> zone_pcp(zone, cpu) = &boot_pageset[cpu];
> - setup_pageset(&boot_pageset[cpu],0);
> + setup_pageset(zone, &boot_pageset[cpu],0);
> #else
> - setup_pageset(zone_pcp(zone,cpu), batch);
> + setup_pageset(zone, zone_pcp(zone,cpu), batch);
> #endif
> }
> if (zone->present_pages)
> @@ -4781,7 +4784,7 @@ int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
> }
>
> /*
> - * percpu_pagelist_fraction - changes the pcp->high for each zone on each
> + * percpu_pagelist_fraction - changes the zone->pcp_high for each zone on each
> * cpu. It is the fraction of total pages in each zone that a hot per cpu pagelist
> * can have before it gets flushed back to buddy allocator.
> */
> @@ -4800,7 +4803,7 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
> for_each_online_cpu(cpu) {
> unsigned long high;
> high = zone->present_pages / percpu_pagelist_fraction;
> - setup_pagelist_highmark(zone_pcp(zone, cpu), high);
> + setup_pagelist_highmark(zone, zone_pcp(zone, cpu), high);
> }
> }
> return 0;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index c81321f..a9d23c3 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -746,8 +746,8 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
> "\n batch: %i",
> i,
> pageset->pcp.count,
> - pageset->pcp.high,
> - pageset->pcp.batch);
> + zone->pcp_high,
> + zone->pcp_batch);
> #ifdef CONFIG_SMP
> seq_printf(m, "\n vm stats threshold: %d",
> pageset->stat_threshold);
> --
> 1.6.3.3
* Re: [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone
2009-08-18 11:47 ` Nick Piggin
@ 2009-08-18 12:57 ` Mel Gorman
0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 12:57 UTC (permalink / raw)
To: Nick Piggin
Cc: Linux Memory Management List, Christoph Lameter,
Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 01:47:52PM +0200, Nick Piggin wrote:
> On Tue, Aug 18, 2009 at 12:16:02PM +0100, Mel Gorman wrote:
> > Having multiple lists per PCPU increased the size of the per-pcpu
> > structure. Two of the fields, high and batch, do not change within a
> > zone making that information redundant. This patch moves those fields
> > off the PCP and onto the zone to reduce the size of the PCPU.
>
> Hmm.. I did have some patches a long long time ago that among other
> things made the lists larger for the local node only....
>
To reduce the remote node lists, one could look at applying some fixed factor
to the high value or basing remote lists on some percentage of high.
> But I guess if something like that is ever shown to be a good idea
> then we can go back to the old scheme. So yeah this seems OK.
>
Thanks.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> > include/linux/mmzone.h | 9 +++++----
> > mm/page_alloc.c | 47 +++++++++++++++++++++++++----------------------
> > mm/vmstat.c | 4 ++--
> > 3 files changed, 32 insertions(+), 28 deletions(-)
> >
> > <SNIP>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone
2009-08-18 11:16 ` [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone Mel Gorman
2009-08-18 11:47 ` Nick Piggin
@ 2009-08-18 14:18 ` Christoph Lameter
2009-08-18 16:42 ` Mel Gorman
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2009-08-18 14:18 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
This will increase the cache footprint for the hot code path. Could these
new variable be moved next to zone fields that are already in use there?
The pageset array is used f.e.
* Re: [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone
2009-08-18 14:18 ` Christoph Lameter
@ 2009-08-18 16:42 ` Mel Gorman
2009-08-18 17:56 ` Christoph Lameter
0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 16:42 UTC (permalink / raw)
To: Christoph Lameter
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 10:18:48AM -0400, Christoph Lameter wrote:
>
> This will increase the cache footprint for the hot code path. Could these
> new variable be moved next to zone fields that are already in use there?
> The pageset array is used f.e.
>
pageset is ____cacheline_aligned_in_smp so putting pcp->high/batch near
it won't help in terms of cache footprint. This is why I located them near
the watermarks: it's known they'll be needed at roughly the same time
pcp->high/batch would normally be accessed.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone
2009-08-18 16:42 ` Mel Gorman
@ 2009-08-18 17:56 ` Christoph Lameter
2009-08-18 20:50 ` Mel Gorman
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2009-08-18 17:56 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
On Tue, 18 Aug 2009, Mel Gorman wrote:
> On Tue, Aug 18, 2009 at 10:18:48AM -0400, Christoph Lameter wrote:
> >
> > This will increase the cache footprint for the hot code path. Could these
> > new variable be moved next to zone fields that are already in use there?
> > The pageset array is used f.e.
> >
>
> pageset is ____cacheline_aligned_in_smp so putting pcp->high/batch near
> it won't help in terms of cache footprint. This is why I located it near
> watermarks because it's known they'll be needed at roughly the same time
> pcp->high/batch would be normally accessed.
watermarks are not accessed from the hot code path in free_hot_cold page.
* Re: [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone
2009-08-18 17:56 ` Christoph Lameter
@ 2009-08-18 20:50 ` Mel Gorman
0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 20:50 UTC (permalink / raw)
To: Christoph Lameter
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 01:56:22PM -0400, Christoph Lameter wrote:
> On Tue, 18 Aug 2009, Mel Gorman wrote:
>
> > On Tue, Aug 18, 2009 at 10:18:48AM -0400, Christoph Lameter wrote:
> > >
> > > This will increase the cache footprint for the hot code path. Could these
> > > new variable be moved next to zone fields that are already in use there?
> > > The pageset array is used f.e.
> > >
> >
> > pageset is ____cacheline_aligned_in_smp so putting pcp->high/batch near
> > it won't help in terms of cache footprint. This is why I located it near
> > watermarks because it's known they'll be needed at roughly the same time
> > pcp->high/batch would be normally accessed.
>
> watermarks are not accessed from the hot code path in free_hot_cold page.
>
They are used in a commonly-used path for allocation so there is some
advantage. Put beside pageset, there is no advantage as that structure
is already aligned to a cache-line.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [RFC PATCH 0/3] Reduce searching in the page allocator fast-path
2009-08-18 11:15 [RFC PATCH 0/3] Reduce searching in the page allocator fast-path Mel Gorman
` (2 preceding siblings ...)
2009-08-18 11:16 ` [PATCH 3/3] page-allocator: Move pcp static fields for high and batch off-pcp and onto the zone Mel Gorman
@ 2009-08-18 14:22 ` Christoph Lameter
2009-08-18 16:53 ` Mel Gorman
3 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2009-08-18 14:22 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
This could be combined with the per cpu ops patch that makes the page
allocator use alloc_percpu for its per cpu data needs. That in turn would
allow the use of per cpu atomics in the hot paths; maybe we can get to a
point where we can drop the irq disabling there.
* Re: [RFC PATCH 0/3] Reduce searching in the page allocator fast-path
2009-08-18 14:22 ` [RFC PATCH 0/3] Reduce searching in the page allocator fast-path Christoph Lameter
@ 2009-08-18 16:53 ` Mel Gorman
2009-08-18 19:05 ` Christoph Lameter
0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2009-08-18 16:53 UTC (permalink / raw)
To: Christoph Lameter
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 10:22:01AM -0400, Christoph Lameter wrote:
>
> This could be combined with the per cpu ops patch that makes the page
> allocator use alloc_percpu for its per cpu data needs. That in turn would
> allow the use of per cpu atomics in the hot paths, maybe we can
> get to a point where we can drop the irq disable there.
>
It would appear that getting rid of IRQ disabling and using per-cpu atomics
would be a problem independent of searching the free lists. Either would
be good and both would be better, or am I missing something that makes
them mutually exclusive?
Can you point me to which patchset you are talking about specifically that
uses per-cpu atomics in the hot path? There are a lot of per-cpu patches
related to you that have been posted in the last few months and I'm not
sure what their merge status is.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [RFC PATCH 0/3] Reduce searching in the page allocator fast-path
2009-08-18 16:53 ` Mel Gorman
@ 2009-08-18 19:05 ` Christoph Lameter
2009-08-19 9:08 ` Mel Gorman
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2009-08-18 19:05 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
On Tue, 18 Aug 2009, Mel Gorman wrote:
> Can you point me to which patchset you are talking about specifically that
> uses per-cpu atomics in the hot path? There are a lot of per-cpu patches
> related to you that have been posted in the last few months and I'm not sure
> what any of their merge status' is.
The following patch just moved the page allocator to use the new per cpu
allocator. It does not use per cpu atomic yet but its possible then.
http://marc.info/?l=linux-mm&m=124527414206546&w=2
* Re: [RFC PATCH 0/3] Reduce searching in the page allocator fast-path
2009-08-18 19:05 ` Christoph Lameter
@ 2009-08-19 9:08 ` Mel Gorman
2009-08-19 11:48 ` Christoph Lameter
0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2009-08-19 9:08 UTC (permalink / raw)
To: Christoph Lameter
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
On Tue, Aug 18, 2009 at 03:05:25PM -0400, Christoph Lameter wrote:
> On Tue, 18 Aug 2009, Mel Gorman wrote:
>
> > Can you point me to which patchset you are talking about specifically that
> > uses per-cpu atomics in the hot path? There are a lot of per-cpu patches
> > related to you that have been posted in the last few months and I'm not sure
> > what any of their merge status' is.
>
> The following patch just moved the page allocator to use the new per cpu
> allocator. It does not use per cpu atomic yet but its possible then.
>
> http://marc.info/?l=linux-mm&m=124527414206546&w=2
>
Ok, I don't see this particular patch merged; is it in a merge queue
somewhere? After glancing through, I can see how it might help. I'm going
to drop patch 3 of this set, which shuffles data from the PCP to the zone,
and take a closer look at those patches. Patches 1 and 2 of this set should
still go ahead. Do you agree?
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [RFC PATCH 0/3] Reduce searching in the page allocator fast-path
2009-08-19 9:08 ` Mel Gorman
@ 2009-08-19 11:48 ` Christoph Lameter
0 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2009-08-19 11:48 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux Memory Management List, Nick Piggin, Linux Kernel Mailing List
On Wed, 19 Aug 2009, Mel Gorman wrote:
> Ok, I don't see this particular patch merged, is it in a merge queue somewhere?
The patch depends on Tejun's work, which makes the per cpu allocator
available on all platforms, being merged. I believe that is in the queue
for 2.6.32.
> After glancing through, I can see how it might help. I'm going to drop patch
> 3 of this set that shuffles data from the PCP to the zone and take a closer
> look at those patches. Patch 1 and 2 of this set should still go ahead. Do
> you agree?
Yes.