Date: Tue, 31 Aug 2004 19:36:01 +0900
From: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
To: Linux Kernel ML
Cc: linux-mm, LHMS
Subject: [RFC] buddy allocator without bitmap(2) [0/3]
Message-id: <41345491.1020209@jp.fujitsu.com>

Hi, thanks for the many pieces of advice on the previous RFC.

This new patch is against 2.6.9-rc1-mm1 and implements a new algorithm.
Considering the advice on the previous patch, I think its problems were:

(1) vagueness of zone->aligned_order
(2) vagueness of zone->nr_mem_map
(3) using pfn_valid() in the core of the buddy allocator only for a
    rare case (e.g., readability ;)

I removed (1)-(3), and this version is not so micro-optimized. I think
the inner loop of free_pages_bulk() becomes simpler.

I added one more PG_xxx flag, PG_buddyend, and reserve one page as a
stopper. PG_buddyend is set at boot time and never cleared. This flag
works as the lower-address stopper of the buddy allocator; the reserved
page works as the higher-address stopper. Together, these two guarantee
that a non-existent page is never accessed.

For the usage of the PG_buddyend flag, please read the patch; it
contains a sample illustration.

Advantages:
- information about the buddy system is fully stored in the mem_map
- no bitmap
- works well on a discontiguous mem_map

Disadvantages:
- uses one more PG_xxx flag
- if the mem_map is not aligned, one page is reserved as a victim for
  the buddy allocator

How about this approach?

Regards
-- Kame

This patch removes the bitmap from the buddy allocator: it removes
struct free_area's bitmap in include/linux/mmzone.h and adds some
definitions in include/linux/mm.h. This is the 1st patch.

Currently, Linux's page allocator uses the buddy algorithm, and the
buddy allocator code uses bitmaps. What are the bitmaps used for?

(*) for recording that a page is free, together with the page's order.
    Here, a page's order means the size of its chunk of contiguous free
    pages: if a free page[X]'s order is Y, then page[X] through
    page[X + 2^Y - 1] are contiguous free pages.

If a page is free and is the head of a chunk of contiguous free pages
of order X, we can record that by

  set_bit(free_area[X]->map, <index of page[X] at this order>)

For coalescing, when there is a chunk of free pages of order X, we can
test whether we can coalesce by

  test_bit(free_area[X]->map, index_of_buddy)

where index_of_buddy can be calculated as (index_of_page ^ (1 << order)).
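As an aside, the buddy-index arithmetic above can be tried out in user
space. Below is a minimal, self-contained sketch; the name
buddy_index() is illustrative only and is not the kernel's:

-----------------------
/* buddy-index arithmetic demo; builds with any C compiler */
#include <stdio.h>

/* The buddy of a chunk at 'page_idx' and 'order' differs from it in
 * exactly one bit: bit 'order' of the page index. */
static unsigned long buddy_index(unsigned long page_idx, int order)
{
	return page_idx ^ (1UL << order);
}

int main(void)
{
	/* The order-2 chunk at page 4 pairs with the order-2 chunk
	 * at page 0, and vice versa. */
	printf("%lu\n", buddy_index(4, 2));	/* prints 0 */
	printf("%lu\n", buddy_index(0, 2));	/* prints 4 */
	return 0;
}
-----------------------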
This patch removes the bitmap and instead records a free page's order
in its page structure's page->private field.

If a page is free and is the head of a free contiguous memory chunk,
page->private indicates the order of the page. The PG_private bit is
used to show that this field is valid.

For coalescing, when there is a page which heads a chunk of contiguous
free pages of order X, we can test whether the buddy is to be coalesced
or not by

  (page_is_free(buddy) && PagePrivate(buddy) && page_order(buddy) == X)

The address of the buddy can be calculated in the same way as in the
bitmap case.

If a page is free and on the buddy system, its PG_private bit is set
and its order is held in page->private. This scheme is safe because:

(a) while a page is being freed, PG_private is not set
    (see free_pages_check()).
(b) while a page is free and on the buddy system, PG_private is set.

These facts are guaranteed by zone->lock: only one thread can change a
free page's PG_private bit and private field at any time.

[ Not MAX_ORDER aligned memory ]

If all memory is aligned to the system's MAX_ORDER, the buddy algorithm
works fine and there is no trouble. But if memory is not aligned, we
must check whether the buddy exists at all. Checking this in the main
loop of free_pages_bulk() is ugly, but the pages which have no buddy at
some order can be calculated at boot time.

A new PG_xxx flag, PG_buddyend, is introduced. This flag works as a
stopper for buddy coalescing. A page which is at the lower-address end
of the mem_map and which cannot have a lower-address buddy is marked
PG_buddyend. Please see below.

If the mem_map's start address is aligned, pages and buddies look like
this:

order   start_pfn -> higher address
      -------------------------------------
  0   |  |  |  |  |  |  |  |  |  |  |  |  | ..
      -------------------------------------
  1   |     |     |     |     |     |     |
      -------------------------------------
  2   |           |           |           |
      -------------------------------------
  3   |                       |
      -------------------------------------

If the mem_map's start address is not aligned, some of the
lower-address buddies disappear:

      ----------------------------------
  0   |X |  |  |  |  |  |  |  |  |  |  | ..
      ----------------------------------
  1   |X    |     |     |     |     |
         -------------------------------
  2   |X          |           |
               -------------------------
  3   |X
                           -------------

The pages marked 'X' at each order never have a lower-address buddy;
these 'X' pages are marked PG_buddyend.

Now, this check works well enough to avoid coalescing unaligned pages:

-----------------------
buddy_index = page_index ^ (1 << order);
if ((buddy_index < page_index) && PageBuddyend(base + page_index))
        stop coalescing.
-----------------------

The above handles the lower-address case. What about the higher-address
case? If the higher end of the mem_map is not aligned to MAX_ORDER, we
reserve the highest-address page and never put it into the buddy system
(it is never passed to free_pages()). This single page is a victim for
the buddy allocator, and that is enough.
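To show how the pieces fit together, here is a condensed sketch of the
coalescing step with the PG_buddyend stopper. page_is_free() is used
exactly as in the text above; the function name and loop structure are
illustrative, and this is not the exact free_pages_bulk() from the patch:

-----------------------
#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/list.h>

/* Illustrative sketch only. 'base' is the zone's mem_map, 'page_index'
 * the index of the freed chunk's head within it, 'order' its order.
 * The caller holds zone->lock. */
static void buddy_coalesce_sketch(struct zone *zone, struct page *base,
				  unsigned long page_index, int order)
{
	while (order < MAX_ORDER - 1) {
		unsigned long buddy_index = page_index ^ (1UL << order);
		struct page *buddy = base + buddy_index;

		/* lower-address stopper: a PG_buddyend page has no
		 * buddy below itself */
		if (buddy_index < page_index &&
		    PageBuddyend(base + page_index))
			break;

		/* the buddy must head a free chunk of the same order */
		if (!(page_is_free(buddy) && PagePrivate(buddy) &&
		      page_order(buddy) == order))
			break;

		/* detach the buddy and merge; the head of the merged
		 * chunk is the lower of the two indexes */
		list_del(&buddy->lru);
		ClearPagePrivate(buddy);
		page_index &= buddy_index;
		order++;
	}
	set_page_order(base + page_index, order);
	SetPagePrivate(base + page_index);
	list_add(&(base + page_index)->lru,
		 &zone->free_area[order].free_list);
}
-----------------------

The higher-address side needs no explicit check in this loop: as
described above, the reserved victim page never enters the free lists,
so no chunk whose buddy would lie beyond the end of the mem_map can
ever form.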
-- Kame

---

 linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h         |   24 ++++++++++++++
 linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h     |    1 -
 linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h |   10 +++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff -puN include/linux/mm.h~eliminate-bitmap-includes include/linux/mm.h
--- linux-2.6.9-rc1-mm1-k/include/linux/mm.h~eliminate-bitmap-includes	2004-08-31 18:37:04.186101664 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h	2004-08-31 18:37:04.194100448 +0900
@@ -209,6 +209,9 @@ struct page {
 					 * usually used for buffer_heads
 					 * if PagePrivate set; used for
 					 * swp_entry_t if PageSwapCache
+					 * When page is free:
+					 * this indicates order of page
+					 * in buddy allocator.
 					 */
 	struct address_space *mapping;	/* If low bit clear, points to
 					 * inode address_space, or NULL.
@@ -322,6 +325,27 @@ static inline void put_page(struct page
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 /*
+ * These functions are used in alloc_pages()/free_pages(), the buddy allocator.
+ * page_order(page) returns the order of a free page in the buddy allocator.
+ *
+ * This is used together with the PG_private flag.
+ *
+ * Note: all PG_private operations used in the buddy system are done while
+ * zone->lock is acquired, so the set and clear operations on the
+ * PG_private bit do not need to be atomic.
+ */
+
+static inline int page_order(struct page *page)
+{
+	return page->private;
+}
+
+static inline void set_page_order(struct page *page, int order)
+{
+	page->private = order;
+}
+
+/*
  * Multiple processes may "see" the same page. E.g. for untouched
  * mappings of /dev/null, all processes see the same page full of
  * zeroes, and text pages of executables and shared libraries have
diff -puN include/linux/mmzone.h~eliminate-bitmap-includes include/linux/mmzone.h
--- linux-2.6.9-rc1-mm1-k/include/linux/mmzone.h~eliminate-bitmap-includes	2004-08-31 18:37:04.188101360 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h	2004-08-31 18:37:04.195100296 +0900
@@ -22,7 +22,6 @@
 
 struct free_area {
 	struct list_head	free_list;
-	unsigned long		*map;
 };
 
 struct pglist_data;
diff -puN include/linux/page-flags.h~eliminate-bitmap-includes include/linux/page-flags.h
--- linux-2.6.9-rc1-mm1-k/include/linux/page-flags.h~eliminate-bitmap-includes	2004-08-31 18:37:04.190101056 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h	2004-08-31 18:37:04.196100144 +0900
@@ -44,6 +44,12 @@
  * space, they need to be kmapped separately for doing IO on the pages.  The
  * struct page (these bits with information) are always mapped into kernel
  * address space...
+ *
+ * PG_buddyend pages, in some sense, have no buddy in the buddy allocator.
+ * If a page is at the lower address end of mem_map and its buddy, at some
+ * order, is below that end, the page is marked PG_buddyend. If a page is
+ * at the higher address end of mem_map and its buddy, at some order, is
+ * above that end, the page is marked PG_buddyend.
  */
 
 /*
@@ -75,6 +81,7 @@
 #define PG_mappedtodisk		17	/* Has blocks allocated on-disk */
 #define PG_reclaim		18	/* To be reclaimed asap */
+#define PG_buddyend		19	/* end of the buddy allocator */
 
 /*
  * Global page accounting.  One instance per CPU.  Only unsigned longs are
@@ -290,6 +297,9 @@ extern unsigned long __read_page_state(u
 #define SetPageCompound(page)	set_bit(PG_compound, &(page)->flags)
 #define ClearPageCompound(page)	clear_bit(PG_compound, &(page)->flags)
 
+#define PageBuddyend(page)	test_bit(PG_buddyend, &(page)->flags)
+#define SetPageBuddyend(page)	set_bit(PG_buddyend, &(page)->flags)
+
 #ifdef CONFIG_SWAP
 #define PageSwapCache(page)	test_bit(PG_swapcache, &(page)->flags)
 #define SetPageSwapCache(page)	set_bit(PG_swapcache, &(page)->flags)
_