Date: Fri, 31 Aug 2007 16:52:25 +0900
From: KAMEZAWA Hiroyuki
Subject: [RFC] patch for multiple lru in a zone [2/2] separate lru from zone (just to hear advice/opinions)
Message-Id: <20070831165225.3f12d7a4.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20070831164611.2c29de69.kamezawa.hiroyu@jp.fujitsu.com>
References: <20070831164611.2c29de69.kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki
Cc: "linux-mm@kvack.org"

Remove the LRU lists from struct zone and add a page_group struct.
This is a base patch for plural LRUs in a zone.

- a page_group struct is added to support LRU lists of pages.
- page_group has active_list, inactive_list and flags.
- I'm now considering adding the following flag types:
  - can't reclaim - for swapless anon/shmfs and ramdisk.
  - locked - mlocked pages.
  - pages in special range - ?

Pros (possible):
- there may be good uses for this lru-group; an lru-group does not
  necessarily have to be tied to a zone.
- a list for "not reclaimable pages" can be built on this.
- Can we "add an lru list for DMA pages and remove ZONE_DMA"?
- By reusing the page->group pointer, we can remove PG_buddy.
- Can this kind of structure be used for a container's memory
  control method?

Cons:
- overhead in the reclaim path.
- needs good scheduling of where pages should be reclaimed from.
- increases the size of struct page.

This patch is just a scratch and is not well tested/reviewed; it was
only tested on an i386/UP system, where it works well. I'm now trying
to add an LRU for mlocked pages (but it's not done yet).

Do you have any ideas around the zone's LRU?
Any comment (including "Don't do that") is welcome.

Signed-off-by: KAMEZAWA Hiroyuki

 include/linux/mm_inline.h |   26 ++---
 include/linux/mmzone.h    |   34 ++++++-
 mm/filemap.c              |    4 
 mm/migrate.c              |   10 +-
 mm/page_alloc.c           |    5 -
 mm/swap.c                 |   94 +++++++++----------
 mm/vmscan.c               |  222 +++++++++++++++++++++++++++++-----------------
 7 files changed, 242 insertions(+), 153 deletions(-)

Index: linux-2.6.23-rc4/include/linux/mmzone.h
===================================================================
--- linux-2.6.23-rc4.orig/include/linux/mmzone.h
+++ linux-2.6.23-rc4/include/linux/mmzone.h
@@ -188,6 +188,29 @@ enum zone_type {
 #endif
 #undef __ZONE_COUNT
 
+
+/*
+ * Page grouping structure for memory reclaim.
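+ * A page_group carries its own lru_lock and its own active/inactive
+ * lists, so one zone can host several LRUs; each page points back at
+ * its group via page->group, set when the page is allocated.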
+ */
+
+struct page_group {
+	struct list_head	list;	/* link in the zone's list of page groups */
+	struct zone		*z;	/* zone this group belongs to */
+	spinlock_t		lru_lock; /* lock for active/inactive list */
+	struct list_head	active_list;
+	struct list_head	inactive_list;
+	unsigned long		flags;
+#define DONTRECLAIM	0x1	/* this page group contains only non-reclaimable pages */
+	atomic_t		refcnt;	/* not used yet; for dynamic page group handling */
+};
+
+/* for the page reclaiming interface */
+enum lru_type {
+	LRU_ACTIVE,
+	LRU_INACTIVE,
+};
+
+
 struct zone {
 	/* Fields commonly accessed by the page allocator */
 	unsigned long		pages_min, pages_low, pages_high;
@@ -224,11 +247,9 @@ struct zone {
 
 	ZONE_PADDING(_pad1_)
-
 	/* Fields commonly accessed by the page reclaim scanner */
-	spinlock_t		lru_lock;
-	struct list_head	active_list;
-	struct list_head	inactive_list;
+	spinlock_t		pg_lock;
+	struct page_group	*pg_token;
+	struct page_group	zone_lru;	/* default lru for page reclaim */
 	unsigned long		nr_scan_active;
 	unsigned long		nr_scan_inactive;
 	unsigned long		pages_scanned;	   /* since last reclaim */
@@ -612,6 +633,7 @@ extern int numa_zonelist_order_handler(s
 extern char numa_zonelist_order[];
 #define NUMA_ZONELIST_ORDER_LEN	16	/* string buffer size */
 
+extern void init_page_group(struct zone *z, struct page_group *pg);
 #include 
 /* Returns the number of the current Node. */
 #ifndef numa_node_id
Index: linux-2.6.23-rc4/include/linux/mm_inline.h
===================================================================
--- linux-2.6.23-rc4.orig/include/linux/mm_inline.h
+++ linux-2.6.23-rc4/include/linux/mm_inline.h
@@ -1,40 +1,64 @@
 static inline void
-add_page_to_active_list(struct zone *zone, struct page *page)
+add_page_to_active_list(struct page_group *pg, struct page *page)
 {
-	list_add(&page->lru, &zone->active_list);
-	__inc_zone_state(zone, NR_ACTIVE);
+	list_add(&page->lru, &pg->active_list);
+	__inc_zone_state(page_zone(page), NR_ACTIVE);
 }
 
 static inline void
-add_page_to_inactive_list(struct zone *zone, struct page *page)
+add_page_to_inactive_list(struct page_group *pg, struct page *page)
 {
-	list_add(&page->lru, &zone->inactive_list);
-	__inc_zone_state(zone, NR_INACTIVE);
+	list_add(&page->lru, &pg->inactive_list);
+	__inc_zone_state(page_zone(page), NR_INACTIVE);
 }
 
 static inline void
-del_page_from_active_list(struct zone *zone, struct page *page)
+del_page_from_active_list(struct page_group *pg, struct page *page)
 {
 	list_del(&page->lru);
-	__dec_zone_state(zone, NR_ACTIVE);
+	__dec_zone_state(page_zone(page), NR_ACTIVE);
 }
 
 static inline void
-del_page_from_inactive_list(struct zone *zone, struct page *page)
+del_page_from_inactive_list(struct page_group *pg, struct page *page)
 {
 	list_del(&page->lru);
-	__dec_zone_state(zone, NR_INACTIVE);
+	__dec_zone_state(page_zone(page), NR_INACTIVE);
 }
 
 static inline void
-del_page_from_lru(struct zone *zone, struct page *page)
+del_page_from_lru(struct page_group *pg, struct page *page)
 {
 	list_del(&page->lru);
 	if (PageActive(page)) {
 		__ClearPageActive(page);
-		__dec_zone_state(zone, NR_ACTIVE);
+		__dec_zone_state(page_zone(page), NR_ACTIVE);
 	} else {
-		__dec_zone_state(zone, NR_INACTIVE);
+		__dec_zone_state(page_zone(page), NR_INACTIVE);
 	}
 }
 
+/*
+ * Should this be done under page_lock()?
+ */
+static inline struct page_group *page_group(struct page *page)
+{
+	return page->group;
+}
+
+static inline void set_page_group(struct page *page, struct page_group *pg)
+{
+	page->group = pg;
+}
+
+/*
+ * Preparation for dynamic page group addition/deletion;
+ * it is not used now.
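+ * Once groups can be created and destroyed at run time, these
+ * should take and drop pg->refcnt (set to 1 in init_page_group()).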
+ */
+static inline void get_pagegroup(struct page_group *pg)
+{
+}
+
+static inline void put_pagegroup(struct page_group *pg)
+{
+}
Index: linux-2.6.23-rc4/mm/vmscan.c
===================================================================
--- linux-2.6.23-rc4.orig/mm/vmscan.c
+++ linux-2.6.23-rc4/mm/vmscan.c
@@ -132,6 +132,19 @@ void unregister_shrinker(struct shrinker
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
+
+int pagegroup_should_be_reclaimed(struct page_group *pg, enum lru_type type)
+{
+	if (pg->flags & DONTRECLAIM)
+		return 0;
+	if (type == LRU_ACTIVE && list_empty(&pg->active_list))
+		return 0;
+	if (type == LRU_INACTIVE && list_empty(&pg->inactive_list))
+		return 0;
+	return 1;
+}
+
+
 #define SHRINK_BATCH 128
 /*
  * Call the shrink functions to age shrinkable caches
@@ -657,7 +670,7 @@ static int __isolate_lru_page(struct pag
 }
 
 /*
- * zone->lru_lock is heavily contended.  Some of the functions that
+ * page_group->lru_lock is heavily contended.  Some of the functions that
  * shrink the lists perform better by taking out a batch of pages
  * and working on them outside the LRU lock.
  *
@@ -778,6 +791,48 @@ static unsigned long clear_active_flags(
 	return nr_active;
 }
 
+static struct page_group *zone_isolate_lru(unsigned long nr_to_scan,
+	struct zone *zone, struct list_head *dst, enum lru_type type,
+	unsigned long *scanned, unsigned long *taken, int order, int mode)
+{
+	struct page_group *pg, *start_pg;
+	*taken = 0;
+	*scanned = 0;
+
+	spin_lock(&zone->pg_lock);
+	/* do we need some intelligent scheduler ? */
+	if (zone->pg_token)
+		pg = zone->pg_token;
+	else
+		pg = &zone->zone_lru;
+	start_pg = pg;
+	do {
+		if (pagegroup_should_be_reclaimed(pg, type))
+			break;
+		pg = list_entry(pg->list.next, struct page_group, list);
+	} while (pg != start_pg);
+	get_pagegroup(pg);
+	spin_unlock(&zone->pg_lock);
+
+	if (pagegroup_should_be_reclaimed(pg, type)) {
+		/* the lists must not change while isolate_lru_pages() runs */
+		spin_lock_irq(&pg->lru_lock);
+		if (type == LRU_ACTIVE)
+			*taken = isolate_lru_pages(nr_to_scan,
+				&pg->active_list, dst, scanned, order, mode);
+		else
+			*taken = isolate_lru_pages(nr_to_scan,
+				&pg->inactive_list, dst, scanned, order, mode);
+		spin_unlock_irq(&pg->lru_lock);
+	} else
+		pg = NULL;
+
+	if (zone->pg_token) {
+		put_pagegroup(zone->pg_token);
+		get_pagegroup(pg);
+		zone->pg_token = pg;
+	}
+	return pg;
+}
+
+
 /*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
@@ -789,11 +844,11 @@ static unsigned long shrink_inactive_lis
 	struct pagevec pvec;
 	unsigned long nr_scanned = 0;
 	unsigned long nr_reclaimed = 0;
+	struct page_group *pg;
 
 	pagevec_init(&pvec, 1);
 
 	lru_add_drain();
-	spin_lock_irq(&zone->lru_lock);
 	do {
 		struct page *page;
 		unsigned long nr_taken;
@@ -801,9 +856,9 @@ static unsigned long shrink_inactive_lis
 		unsigned long nr_freed;
 		unsigned long nr_active;
 
-		nr_taken = isolate_lru_pages(sc->swap_cluster_max,
-			     &zone->inactive_list,
-			     &page_list, &nr_scan, sc->order,
+		pg = zone_isolate_lru(sc->swap_cluster_max,
+			     zone, &page_list, LRU_INACTIVE,
+			     &nr_scan, &nr_taken, sc->order,
 			     (sc->order > PAGE_ALLOC_COSTLY_ORDER)?
 					     ISOLATE_BOTH : ISOLATE_INACTIVE);
 		nr_active = clear_active_flags(&page_list);
@@ -813,7 +868,6 @@ static unsigned long shrink_inactive_lis
 		__mod_zone_page_state(zone, NR_INACTIVE,
 						-(nr_taken - nr_active));
 		zone->pages_scanned += nr_scan;
-		spin_unlock_irq(&zone->lru_lock);
 
 		nr_scanned += nr_scan;
 		nr_freed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
@@ -851,27 +905,27 @@ static unsigned long shrink_inactive_lis
 		if (nr_taken == 0)
 			goto done;
 
-		spin_lock(&zone->lru_lock);
 		/*
 		 * Put back any unfreeable pages.
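+		 * They go back to the page_group they were isolated
+		 * from, under that group's lru_lock.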
*/ + spin_lock(&pg->lru_lock); while (!list_empty(&page_list)) { page = lru_to_page(&page_list); VM_BUG_ON(PageLRU(page)); SetPageLRU(page); list_del(&page->lru); if (PageActive(page)) - add_page_to_active_list(zone, page); + add_page_to_active_list(pg, page); else - add_page_to_inactive_list(zone, page); + add_page_to_inactive_list(pg, page); if (!pagevec_add(&pvec, page)) { - spin_unlock_irq(&zone->lru_lock); + spin_unlock_irq(&pg->lru_lock); __pagevec_release(&pvec); - spin_lock_irq(&zone->lru_lock); + spin_lock_irq(&pg->lru_lock); } } + spin_unlock(&pg->lru_lock); } while (nr_scanned < max_scan); - spin_unlock(&zone->lru_lock); done: local_irq_enable(); pagevec_release(&pvec); @@ -926,6 +980,7 @@ static void shrink_active_list(unsigned LIST_HEAD(l_active); /* Pages to go onto the active_list */ struct page *page; struct pagevec pvec; + struct page_group *pg; int reclaim_mapped = 0; if (sc->may_swap) { @@ -975,83 +1030,85 @@ force_reclaim_mapped: reclaim_mapped = 1; } - lru_add_drain(); - spin_lock_irq(&zone->lru_lock); - pgmoved = isolate_lru_pages(nr_pages, &zone->active_list, - &l_hold, &pgscanned, sc->order, ISOLATE_ACTIVE); - zone->pages_scanned += pgscanned; - __mod_zone_page_state(zone, NR_ACTIVE, -pgmoved); - spin_unlock_irq(&zone->lru_lock); - - while (!list_empty(&l_hold)) { - cond_resched(); - page = lru_to_page(&l_hold); - list_del(&page->lru); - if (page_mapped(page)) { - if (!reclaim_mapped || - (total_swap_pages == 0 && PageAnon(page)) || - page_referenced(page, 0)) { - list_add(&page->lru, &l_active); + while (nr_pages > 0) { + lru_add_drain(); + pg = zone_isolate_lru(nr_pages, zone, &l_hold, LRU_ACTIVE, + &pgscanned, &pgmoved, sc->order, ISOLATE_ACTIVE); + zone->pages_scanned += pgscanned; + __mod_zone_page_state(zone, NR_ACTIVE, -pgmoved); + if (!pgscanned) + break; + nr_pages -= pgscanned; + while (!list_empty(&l_hold)) { + cond_resched(); + page = lru_to_page(&l_hold); + list_del(&page->lru); + if (page_mapped(page)) { + if (!reclaim_mapped || + (total_swap_pages == 0 && PageAnon(page)) || + page_referenced(page, 0)) { + list_add(&page->lru, &l_active); continue; + } } + list_add(&page->lru, &l_inactive); } - list_add(&page->lru, &l_inactive); - } - pagevec_init(&pvec, 1); - pgmoved = 0; - spin_lock_irq(&zone->lru_lock); - while (!list_empty(&l_inactive)) { - page = lru_to_page(&l_inactive); - prefetchw_prev_lru_page(page, &l_inactive, flags); - VM_BUG_ON(PageLRU(page)); - SetPageLRU(page); - VM_BUG_ON(!PageActive(page)); - ClearPageActive(page); + pagevec_init(&pvec, 1); + pgmoved = 0; + spin_lock_irq(&pg->lru_lock); + while (!list_empty(&l_inactive)) { + page = lru_to_page(&l_inactive); + prefetchw_prev_lru_page(page, &l_inactive, flags); + VM_BUG_ON(PageLRU(page)); + SetPageLRU(page); + VM_BUG_ON(!PageActive(page)); + ClearPageActive(page); - list_move(&page->lru, &zone->inactive_list); - pgmoved++; - if (!pagevec_add(&pvec, page)) { - __mod_zone_page_state(zone, NR_INACTIVE, pgmoved); - spin_unlock_irq(&zone->lru_lock); - pgdeactivate += pgmoved; - pgmoved = 0; - if (buffer_heads_over_limit) - pagevec_strip(&pvec); - __pagevec_release(&pvec); - spin_lock_irq(&zone->lru_lock); + list_move(&page->lru, &pg->inactive_list); + pgmoved++; + if (!pagevec_add(&pvec, page)) { + __mod_zone_page_state(zone, NR_INACTIVE, pgmoved); + spin_unlock_irq(&pg->lru_lock); + pgdeactivate += pgmoved; + pgmoved = 0; + if (buffer_heads_over_limit) + pagevec_strip(&pvec); + __pagevec_release(&pvec); + spin_lock_irq(&pg->lru_lock); + } + } + __mod_zone_page_state(zone, NR_INACTIVE, 
pgmoved); + pgdeactivate += pgmoved; + if (buffer_heads_over_limit) { + spin_unlock_irq(&pg->lru_lock); + pagevec_strip(&pvec); + spin_lock_irq(&pg->lru_lock); } - } - __mod_zone_page_state(zone, NR_INACTIVE, pgmoved); - pgdeactivate += pgmoved; - if (buffer_heads_over_limit) { - spin_unlock_irq(&zone->lru_lock); - pagevec_strip(&pvec); - spin_lock_irq(&zone->lru_lock); - } - pgmoved = 0; - while (!list_empty(&l_active)) { - page = lru_to_page(&l_active); - prefetchw_prev_lru_page(page, &l_active, flags); - VM_BUG_ON(PageLRU(page)); - SetPageLRU(page); - VM_BUG_ON(!PageActive(page)); - list_move(&page->lru, &zone->active_list); - pgmoved++; - if (!pagevec_add(&pvec, page)) { - __mod_zone_page_state(zone, NR_ACTIVE, pgmoved); - pgmoved = 0; - spin_unlock_irq(&zone->lru_lock); - __pagevec_release(&pvec); - spin_lock_irq(&zone->lru_lock); + pgmoved = 0; + while (!list_empty(&l_active)) { + page = lru_to_page(&l_active); + prefetchw_prev_lru_page(page, &l_active, flags); + VM_BUG_ON(PageLRU(page)); + SetPageLRU(page); + VM_BUG_ON(!PageActive(page)); + list_move(&page->lru, &pg->active_list); + pgmoved++; + if (!pagevec_add(&pvec, page)) { + __mod_zone_page_state(zone, NR_ACTIVE, pgmoved); + pgmoved = 0; + spin_unlock_irq(&pg->lru_lock); + __pagevec_release(&pvec); + spin_lock_irq(&pg->lru_lock); + } } - } - __mod_zone_page_state(zone, NR_ACTIVE, pgmoved); + __mod_zone_page_state(zone, NR_ACTIVE, pgmoved); - __count_zone_vm_events(PGREFILL, zone, pgscanned); - __count_vm_events(PGDEACTIVATE, pgdeactivate); - spin_unlock_irq(&zone->lru_lock); + __count_zone_vm_events(PGREFILL, zone, pgscanned); + __count_vm_events(PGDEACTIVATE, pgdeactivate); + spin_unlock_irq(&pg->lru_lock); + } pagevec_release(&pvec); } Index: linux-2.6.23-rc4/mm/swap.c =================================================================== --- linux-2.6.23-rc4.orig/mm/swap.c +++ linux-2.6.23-rc4/mm/swap.c @@ -42,13 +42,13 @@ static void fastcall __page_cache_releas { if (PageLRU(page)) { unsigned long flags; - struct zone *zone = page_zone(page); + struct page_group *pg = page_group(page); - spin_lock_irqsave(&zone->lru_lock, flags); + spin_lock_irqsave(&pg->lru_lock, flags); VM_BUG_ON(!PageLRU(page)); __ClearPageLRU(page); - del_page_from_lru(zone, page); - spin_unlock_irqrestore(&zone->lru_lock, flags); + del_page_from_lru(pg, page); + spin_unlock_irqrestore(&pg->lru_lock, flags); } free_hot_page(page); } @@ -110,7 +110,7 @@ EXPORT_SYMBOL(put_pages_list); */ int rotate_reclaimable_page(struct page *page) { - struct zone *zone; + struct page_group *pg; unsigned long flags; if (PageLocked(page)) @@ -122,15 +122,15 @@ int rotate_reclaimable_page(struct page if (!PageLRU(page)) return 1; - zone = page_zone(page); - spin_lock_irqsave(&zone->lru_lock, flags); + pg = page_group(page); + spin_lock_irqsave(&pg->lru_lock, flags); if (PageLRU(page) && !PageActive(page)) { - list_move_tail(&page->lru, &zone->inactive_list); + list_move_tail(&page->lru, &pg->inactive_list); __count_vm_event(PGROTATED); } if (!test_clear_page_writeback(page)) BUG(); - spin_unlock_irqrestore(&zone->lru_lock, flags); + spin_unlock_irqrestore(&pg->lru_lock, flags); return 0; } @@ -139,16 +139,16 @@ int rotate_reclaimable_page(struct page */ void fastcall activate_page(struct page *page) { - struct zone *zone = page_zone(page); + struct page_group *pg = page_group(page); - spin_lock_irq(&zone->lru_lock); + spin_lock_irq(&pg->lru_lock); if (PageLRU(page) && !PageActive(page)) { - del_page_from_inactive_list(zone, page); + del_page_from_inactive_list(pg, page); 
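+		/* the page stays in the same page_group; it only moves
+		 * from the group's inactive list to its active list */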
SetPageActive(page); - add_page_to_active_list(zone, page); + add_page_to_active_list(pg, page); __count_vm_event(PGACTIVATE); } - spin_unlock_irq(&zone->lru_lock); + spin_unlock_irq(&pg->lru_lock); } /* @@ -257,16 +257,16 @@ void release_pages(struct page **pages, { int i; struct pagevec pages_to_free; - struct zone *zone = NULL; + struct page_group *pg= NULL; pagevec_init(&pages_to_free, cold); for (i = 0; i < nr; i++) { struct page *page = pages[i]; if (unlikely(PageCompound(page))) { - if (zone) { - spin_unlock_irq(&zone->lru_lock); - zone = NULL; + if (pg) { + spin_unlock_irq(&pg->lru_lock); + pg = NULL; } put_compound_page(page); continue; @@ -276,29 +276,29 @@ void release_pages(struct page **pages, continue; if (PageLRU(page)) { - struct zone *pagezone = page_zone(page); - if (pagezone != zone) { - if (zone) - spin_unlock_irq(&zone->lru_lock); - zone = pagezone; - spin_lock_irq(&zone->lru_lock); + struct page_group *group = page_group(page); + if (group != pg) { + if (pg) + spin_unlock_irq(&pg->lru_lock); + pg = group; + spin_lock_irq(&pg->lru_lock); } VM_BUG_ON(!PageLRU(page)); __ClearPageLRU(page); - del_page_from_lru(zone, page); + del_page_from_lru(pg, page); } if (!pagevec_add(&pages_to_free, page)) { - if (zone) { - spin_unlock_irq(&zone->lru_lock); - zone = NULL; + if (pg) { + spin_unlock_irq(&pg->lru_lock); + pg = NULL; } __pagevec_free(&pages_to_free); pagevec_reinit(&pages_to_free); } } - if (zone) - spin_unlock_irq(&zone->lru_lock); + if (pg) + spin_unlock_irq(&pg->lru_lock); pagevec_free(&pages_to_free); } @@ -351,24 +351,24 @@ void __pagevec_release_nonlru(struct pag void __pagevec_lru_add(struct pagevec *pvec) { int i; - struct zone *zone = NULL; + struct page_group *pg = NULL; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct zone *pagezone = page_zone(page); + struct page_group *group = page_group(page); - if (pagezone != zone) { - if (zone) - spin_unlock_irq(&zone->lru_lock); - zone = pagezone; - spin_lock_irq(&zone->lru_lock); + if (group != pg) { + if (pg) + spin_unlock_irq(&pg->lru_lock); + pg = group; + spin_lock_irq(&pg->lru_lock); } VM_BUG_ON(PageLRU(page)); SetPageLRU(page); - add_page_to_inactive_list(zone, page); + add_page_to_inactive_list(pg, page); } - if (zone) - spin_unlock_irq(&zone->lru_lock); + if (pg) + spin_unlock_irq(&pg->lru_lock); release_pages(pvec->pages, pvec->nr, pvec->cold); pagevec_reinit(pvec); } @@ -378,26 +378,26 @@ EXPORT_SYMBOL(__pagevec_lru_add); void __pagevec_lru_add_active(struct pagevec *pvec) { int i; - struct zone *zone = NULL; + struct page_group *pg = NULL; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct zone *pagezone = page_zone(page); + struct page_group *group = page_group(page); - if (pagezone != zone) { - if (zone) - spin_unlock_irq(&zone->lru_lock); - zone = pagezone; - spin_lock_irq(&zone->lru_lock); + if (group != pg) { + if (pg) + spin_unlock_irq(&pg->lru_lock); + pg = group; + spin_lock_irq(&pg->lru_lock); } VM_BUG_ON(PageLRU(page)); SetPageLRU(page); VM_BUG_ON(PageActive(page)); SetPageActive(page); - add_page_to_active_list(zone, page); + add_page_to_active_list(pg, page); } - if (zone) - spin_unlock_irq(&zone->lru_lock); + if (pg) + spin_unlock_irq(&pg->lru_lock); release_pages(pvec->pages, pvec->nr, pvec->cold); pagevec_reinit(pvec); } Index: linux-2.6.23-rc4/mm/page_alloc.c =================================================================== --- linux-2.6.23-rc4.orig/mm/page_alloc.c +++ linux-2.6.23-rc4/mm/page_alloc.c @@ -41,6 
+41,7 @@ #include #include #include +#include #include #include @@ -615,6 +616,7 @@ static int prep_new_page(struct page *pa 1 << PG_owner_priv_1 | 1 << PG_mappedtodisk); set_page_private(page, 0); set_page_refcounted(page); + set_page_group(page, &(page_zone(page)->zone_lru)); /* default */ arch_alloc_page(page, order); kernel_map_pages(page, 1 << order, 1); @@ -2963,15 +2965,14 @@ static void __meminit free_area_init_cor #endif zone->name = zone_names[j]; spin_lock_init(&zone->lock); - spin_lock_init(&zone->lru_lock); + spin_lock_init(&zone->pg_lock); + init_page_group(zone, &zone->zone_lru); zone_seqlock_init(zone); zone->zone_pgdat = pgdat; zone->prev_priority = DEF_PRIORITY; zone_pcp_init(zone); - INIT_LIST_HEAD(&zone->active_list); - INIT_LIST_HEAD(&zone->inactive_list); zone->nr_scan_active = 0; zone->nr_scan_inactive = 0; zap_zone_vm_stats(zone); Index: linux-2.6.23-rc4/mm/filemap.c =================================================================== --- linux-2.6.23-rc4.orig/mm/filemap.c +++ linux-2.6.23-rc4/mm/filemap.c @@ -95,8 +95,8 @@ generic_file_direct_IO(int rw, struct ki * ->swap_lock (try_to_unmap_one) * ->private_lock (try_to_unmap_one) * ->tree_lock (try_to_unmap_one) - * ->zone.lru_lock (follow_page->mark_page_accessed) - * ->zone.lru_lock (check_pte_range->isolate_lru_page) + * ->page_group.lru_lock (follow_page->mark_page_accessed) + * ->page_group.lru_lock (check_pte_range->isolate_lru_page) * ->private_lock (page_remove_rmap->set_page_dirty) * ->tree_lock (page_remove_rmap->set_page_dirty) * ->inode_lock (page_remove_rmap->set_page_dirty) Index: linux-2.6.23-rc4/mm/rmap.c =================================================================== --- linux-2.6.23-rc4.orig/mm/rmap.c +++ linux-2.6.23-rc4/mm/rmap.c @@ -27,7 +27,7 @@ * mapping->i_mmap_lock * anon_vma->lock * mm->page_table_lock or pte_lock - * zone->lru_lock (in mark_page_accessed, isolate_lru_page) + * page_group->lru_lock (in mark_page_accessed, isolate_lru_page) * swap_lock (in swap_duplicate, swap_info_get) * mmlist_lock (in mmput, drain_mmlist and others) * mapping->private_lock (in __set_page_dirty_buffers) Index: linux-2.6.23-rc4/mm/mmzone.c =================================================================== --- linux-2.6.23-rc4.orig/mm/mmzone.c +++ linux-2.6.23-rc4/mm/mmzone.c @@ -9,6 +9,18 @@ #include #include +void init_page_group(struct zone *zone, struct page_group *pg) +{ + INIT_LIST_HEAD(&pg->list); + INIT_LIST_HEAD(&pg->active_list); + INIT_LIST_HEAD(&pg->inactive_list); + spin_lock_init(&pg->lru_lock); + pg->z = zone; + pg->flags = 0; + atomic_set(&pg->refcnt, 1); +} + + struct pglist_data *first_online_pgdat(void) { return NODE_DATA(first_online_node); Index: linux-2.6.23-rc4/include/linux/mm_types.h =================================================================== --- linux-2.6.23-rc4.orig/include/linux/mm_types.h +++ linux-2.6.23-rc4/include/linux/mm_types.h @@ -64,6 +64,7 @@ struct page { struct list_head lru; /* Pageout list, eg. active_list * protected by zone->lru_lock ! */ + struct page_group *group; /* * On machines where all RAM is mapped into kernel address space, * we can simply calculate the virtual address. 
On machines with
Index: linux-2.6.23-rc4/mm/migrate.c
===================================================================
--- linux-2.6.23-rc4.orig/mm/migrate.c
+++ linux-2.6.23-rc4/mm/migrate.c
@@ -46,9 +46,9 @@ int isolate_lru_page(struct page *page, 
 	int ret = -EBUSY;
 
 	if (PageLRU(page)) {
-		struct zone *zone = page_zone(page);
+		struct page_group *pg = page_group(page);
 
-		spin_lock_irq(&zone->lru_lock);
+		spin_lock_irq(&pg->lru_lock);
 		if (PageLRU(page) && get_page_unless_zero(page)) {
 			ret = 0;
 			ClearPageLRU(page);
@@ -58,7 +58,7 @@ int isolate_lru_page(struct page *page, 
-			del_page_from_inactive_list(zone, page);
+			del_page_from_inactive_list(pg, page);
 			list_add_tail(&page->lru, pagelist);
 		}
-		spin_unlock_irq(&zone->lru_lock);
+		spin_unlock_irq(&pg->lru_lock);
 	}
 	return ret;
 }