From: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 02 Sep 2004 17:00:24 +0900
Subject: [RFC] buddy allocator without bitmap(3) [1/3]
Message-id: <4136D318.9060102@jp.fujitsu.com>
To: Linux Kernel ML
Cc: linux-mm, LHMS

This part is for initialization.

-- Kame
======
This is the 2nd patch. It implements some initialization for the buddy
allocator.

New function: calculate_aligned_end()

calculate_aligned_end() removes some pages from the system so that the
main loop of __free_pages_bulk() (which is in the 4th patch) never
accesses an invalid mem_map entry. See below.

================== main loop in __free_pages_bulk(page, order) ===========
	while (order < MAX_ORDER) {
		struct page *buddy;
		......
		buddy_idx = page_idx ^ (1 << order);
		buddy = zone->zone_mem_map + buddy_idx;
		if (page_count(buddy) != 0   --------------------(*)
		.......
	}
===============================================================

At (*), we have to guarantee that "buddy" is a valid page struct in a
valid zone.

Let MASK = (1 << (MAX_ORDER - 1)) - 1. A page of index 'X' can be
coalesced with pages from (X & ~MASK) to (X | MASK). If both the start
index and the length of a mem_map are multiples of
(1 << (MAX_ORDER - 1)), every page has a valid buddy at every order.
Namely, if a mem_map is aligned to (1 << (MAX_ORDER - 1)), there is no
invalid "buddy" access at (*). If a mem_map is unaligned, there will be
invalid accesses at (*).

One way to fix this is to force the mem_map to be aligned.
(example) IA32's MAX_ORDER is 11, so the alignment of the buddy system
is (1 << (11 - 1)) = 1024 pages. Making a mem_map aligned to that is
not so difficult. But, for example, IA64's MAX_ORDER is 18 !!!
Forcing a mem_map to be aligned is not good in general, so I use
another approach.

calculate_aligned_end() discards the pages which could be coalesced
with an invalid "buddy". It is called from memmap_init(), which runs
once per contiguous mem_map, and it removes just enough pages from the
mem_map to guarantee that there is no invalid "buddy" access at (*).

What is done in calculate_aligned_end()?
(1) It checks whether the mem_map is aligned or not.
(2) If the start of the mem_map is unaligned, it finds the pages which
    could be coalesced with invalid pages and marks them with PG_locked
    (see the sketch after this list).
(3) If the end of the mem_map is unaligned, it marks the end page with
    PG_locked.
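For illustration only (this is not part of the patch), here is a minimal
user-space C sketch of the index walk described in (2) and (3). MAX_ORDER
and the start/length of the example mem_map are made-up values; the
program just prints which page indices would be marked with PG_locked
because their buddy falls outside the map.

================ sketch: which pages would be marked =================
#include <stdio.h>

#define MAX_ORDER	11	/* IA32's value, used only as an example */

int main(void)
{
	long start_idx = 100;	/* made-up unaligned start index  */
	long nr_pages  = 5000;	/* made-up length of this mem_map */
	long end_idx   = start_idx + nr_pages - 1;
	unsigned long alignment_mask = (1UL << MAX_ORDER) - 1;
	long edge_idx, buddy_idx;
	int order;

	/* (2) walk the unaligned start edge, order by order */
	if (start_idx & alignment_mask) {
		for (edge_idx = start_idx, order = 0; order < MAX_ORDER; order++) {
			if (edge_idx > end_idx)
				break;
			buddy_idx = edge_idx ^ (1L << order);
			if (buddy_idx > end_idx)
				break;
			if (buddy_idx < edge_idx) {
				/* the buddy lies below the start of this mem_map */
				printf("order %2d: mark index %ld (buddy %ld is out of range)\n",
				       order, edge_idx, buddy_idx);
				edge_idx += (1L << order);
			}
		}
	}

	/* (3) an unaligned end gets its last page marked as a stopper */
	if ((end_idx & alignment_mask) != alignment_mask)
		printf("mark end index %ld\n", end_idx);

	return 0;
}
======================================================================

With these made-up values the sketch marks indices 100, 104, 112, 128,
256, 512 and 1024 at the start edge, plus the last index 5099; in the
patch those are the pages calculate_aligned_end() sets PG_locked on and
free_pages_at_init() later keeps out of the free lists.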
Pages marked with PG_locked are removed by free_pages_at_init() and are
not added to the buddy system. I think most machines' mem_map has good
alignment, so there are usually no removed pages :).

-- Kame
---

 include/linux/gfp.h                  |    0
 test-kernel-kamezawa/mm/bootmem.c    |    9 --
 test-kernel-kamezawa/mm/page_alloc.c |  140 +++++++++++++++++++++++++----------
 3 files changed, 103 insertions(+), 46 deletions(-)

diff -puN mm/page_alloc.c~eliminate-bitmap-init mm/page_alloc.c
--- test-kernel/mm/page_alloc.c~eliminate-bitmap-init	2004-09-02 15:18:48.572933656 +0900
+++ test-kernel-kamezawa/mm/page_alloc.c	2004-09-02 15:45:14.692806784 +0900
@@ -917,6 +917,33 @@ fastcall void free_pages(unsigned long a
 EXPORT_SYMBOL(free_pages);
 
 /*
+ * This function doesn't pass PG_locked pages to the buddy allocator.
+ * PG_locked pages are silently removed from the buddy system: they are
+ * marked with PG_private and their page_order() is set to PAGE_INVALID_ORDER.
+ */
+int __init free_pages_at_init(struct page *base, unsigned int order)
+{
+	struct page *page;
+	int nr_pages = (1 << order);
+	int nr_freed_pages;
+
+	if (PageReserved(base) || !put_page_testzero(base))
+		return 0;
+
+	for (page = base, nr_freed_pages = 0; page != base + nr_pages; page++) {
+		if (PageLocked(page)) {
+			/* this page is kept out of the buddy system */
+			SetPagePrivate(page);
+			set_page_order(page, PAGE_INVALID_ORDER);
+		} else {
+			free_hot_page(page);
+			nr_freed_pages++;
+		}
+	}
+	return nr_freed_pages;
+}
+
+/*
  * Total amount of free (allocatable) RAM:
  */
 unsigned int nr_free_pages(void)
@@ -1499,6 +1526,67 @@ static void __init calculate_zone_totalp
 	printk(KERN_DEBUG "On node %d totalpages: %lu\n",
 			pgdat->node_id, realtotalpages);
 }
+
+/*
+ * (1) We check whether the mem_map is aligned to MAX_ORDER or not.
+ * (2) If the mem_map is not aligned at its start address, we find the pages
+ *     which are at the top of the buddy list in each order and check whether
+ *     their buddies are out of the mem_map. If so, we mark them with PG_locked.
+ * (3) If the mem_map is not aligned at its end address, we mark its end page
+ *     with PG_locked.
+ *
+ * Marked pages are not added to the buddy system. Please see free_pages_at_init().
+ */
+static void __init calculate_aligned_end(struct zone *zone,
+					unsigned long start_pfn,
+					int nr_pages)
+{
+	struct page *base;
+	unsigned long alignment_mask;
+	long start_idx, end_idx;
+
+	start_idx = start_pfn - zone->zone_start_pfn;
+	end_idx = start_idx + nr_pages - 1;
+	alignment_mask = (1 << MAX_ORDER) - 1;
+	base = zone->zone_mem_map;
+	printk("calculate_aligned_end() %lx %d\n", start_pfn, nr_pages);
+
+	if (start_idx & alignment_mask) {
+		long edge_idx, buddy_idx;
+		int order;
+
+		for (edge_idx = start_idx, order = 0;
+		     order < MAX_ORDER;
+		     order++) {
+			if (edge_idx > end_idx)
+				break;
+			buddy_idx = edge_idx ^ (1 << order);
+			if (buddy_idx > end_idx)
+				break;
+			if (buddy_idx < edge_idx) {
+				/*
+				 * Reserve the top page at this order as a
+				 * stopper for the buddy allocator, because
+				 * it is the only page with an out-of-range
+				 * buddy at this order.
+				 */
+				printk("victim top page %lx\n",
+					zone->zone_start_pfn + edge_idx);
+				SetPageLocked(base + edge_idx);
+				edge_idx += (1 << order);
+			}
+		}
+	}
+	if ((end_idx & alignment_mask) != alignment_mask) {
+		/*
+		 * Reserve the last page as the stopper for the buddy allocator.
+		 * No coalescing that includes this page will occur.
+		 */
+		printk("victim end page %lx\n", zone->zone_start_pfn + end_idx);
+		SetPageLocked(base + end_idx);
+	}
+	return;
+}
 
 /*
  * Initially all pages are reserved - free ones are freed
@@ -1510,9 +1598,13 @@ void __init memmap_init_zone(unsigned lo
 {
 	struct page *start = pfn_to_page(start_pfn);
 	struct page *page;
-
+	unsigned long saved_start_pfn = start_pfn;
+	int zoneid = NODEZONE(nid, zone);
+	struct zone *zonep;
+	zonep = zone_table[zoneid];
+
 	for (page = start; page < (start + size); page++) {
-		set_page_zone(page, NODEZONE(nid, zone));
+		set_page_zone(page, zoneid);
 		set_page_count(page, 0);
 		reset_page_mapcount(page);
 		SetPageReserved(page);
@@ -1524,51 +1616,19 @@ void __init memmap_init_zone(unsigned lo
 #endif
 		start_pfn++;
 	}
-}
-
-/*
- * Page buddy system uses "index >> (i+1)", where "index" is
- * at most "size-1".
- *
- * The extra "+3" is to round down to byte size (8 bits per byte
- * assumption). Thus we get "(size-1) >> (i+4)" as the last byte
- * we can access.
- *
- * The "+1" is because we want to round the byte allocation up
- * rather than down. So we should have had a "+7" before we shifted
- * down by three. Also, we have to add one as we actually _use_ the
- * last bit (it's [0,n] inclusive, not [0,n[).
- *
- * So we actually had +7+1 before we shift down by 3. But
- * (n+8) >> 3 == (n >> 3) + 1 (modulo overflows, which we do not have).
- *
- * Finally, we LONG_ALIGN because all bitmap operations are on longs.
- */
-unsigned long pages_to_bitmap_size(unsigned long order, unsigned long nr_pages)
-{
-	unsigned long bitmap_size;
-
-	bitmap_size = (nr_pages-1) >> (order+4);
-	bitmap_size = LONG_ALIGN(bitmap_size+1);
+	/*
+	 * calculate_aligned_end() needs to be called once per
+	 * contiguous mem_map.
+	 */
+	calculate_aligned_end(zonep, saved_start_pfn, size);
 
-	return bitmap_size;
 }
 
 void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone, unsigned long size)
 {
 	int order;
-	for (order = 0; ; order++) {
-		unsigned long bitmap_size;
-
+	for (order = 0; order < MAX_ORDER; order++) {
 		INIT_LIST_HEAD(&zone->free_area[order].free_list);
-		if (order == MAX_ORDER-1) {
-			zone->free_area[order].map = NULL;
-			break;
-		}
-
-		bitmap_size = pages_to_bitmap_size(order, size);
-		zone->free_area[order].map =
-			(unsigned long *) alloc_bootmem_node(pgdat, bitmap_size);
 	}
 }
diff -puN include/linux/gfp.h~eliminate-bitmap-init include/linux/gfp.h
diff -puN mm/bootmem.c~eliminate-bitmap-init mm/bootmem.c
--- test-kernel/mm/bootmem.c~eliminate-bitmap-init	2004-09-02 15:18:48.576933048 +0900
+++ test-kernel-kamezawa/mm/bootmem.c	2004-09-02 15:18:48.583931984 +0900
@@ -277,7 +277,6 @@ static unsigned long __init free_all_boo
 		if (gofast && v == ~0UL) {
 			int j;
 
-			count += BITS_PER_LONG;
 			__ClearPageReserved(page);
 			set_page_count(page, 1);
 			for (j = 1; j < BITS_PER_LONG; j++) {
@@ -285,17 +284,16 @@ static unsigned long __init free_all_boo
 					prefetchw(page + j + 16);
 				__ClearPageReserved(page + j);
 			}
-			__free_pages(page, ffs(BITS_PER_LONG)-1);
+			count += free_pages_at_init(page, ffs(BITS_PER_LONG)-1);
 			i += BITS_PER_LONG;
 			page += BITS_PER_LONG;
 		} else if (v) {
 			unsigned long m;
 			for (m = 1; m && i < idx; m<<=1, page++, i++) {
 				if (v & m) {
-					count++;
 					__ClearPageReserved(page);
 					set_page_count(page, 1);
-					__free_page(page);
+					count += free_pages_at_init(page, 0);
 				}
 			}
 		} else {
@@ -312,10 +310,9 @@ static unsigned long __init free_all_boo
 	page = virt_to_page(bdata->node_bootmem_map);
 	count = 0;
 	for (i = 0; i < ((bdata->node_low_pfn-(bdata->node_boot_start >> PAGE_SHIFT))/8 +
 			PAGE_SIZE-1)/PAGE_SIZE; i++,page++) {
-		count++;
 		__ClearPageReserved(page);
 		set_page_count(page, 1);
-		__free_page(page);
+		count += free_pages_at_init(page, 0);
 	}
 	total += count;
 	bdata->node_bootmem_map = NULL;
_

--
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki