* [RFC] buddy allocator without bitmap(2) [0/3]
@ 2004-08-31 10:36 Hiroyuki KAMEZAWA
2004-08-31 16:26 ` Dave Hansen
0 siblings, 1 reply; 5+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-08-31 10:36 UTC (permalink / raw)
To: Linux Kernel ML; +Cc: linux-mm, LHMS
Hi,
thanks for many advices on previous RFC.
This new patch is against 2.6.9-rc1-mm1.
A new algorithm is implemented.
Considering advises on previous patch, I think the problems of that was
(1) vagueness of zone->aligned_order
(2) vagueness of zone->nr_mem_map
(3) using pfn_valid() in the core of buddy allocator only for rare case.
(ex) readability ;)
I removed these (1)-(3) and this version is not so micro-optimized.
I think inner-loop of free_pages_bulk() becomes simpler.
I added one more PG_xxx flag, PG_buddyend and reserve one page as a stopper.
PG_buddyend is set in boot-time and never cleared. This flags works as a
lower-address stopper of buddy allocator.The reserved one page works as a
higher-address stopper. "No accessing not-existing page" is guaranteed by these two.
About a usage of PG_buddyend flag, please read patch.
There is a sample illustration in attached patch.
Advantage:
- information about buddy system is fully stored in the mem_map.
- no bitmap
- can work well on discontiguous mem_map.
Disadvantage:
- using one more PG_xxx flag.
- If mem_map is not aligned, reserve one page as a victim for buddy allocater.
How about this approach ?
Regards
-- Kame
This patch removes bitmap from buddy allocator,
removes free_area_t's bitmap in include/linux/mmzone.h
and adds some definition in include/linux/mm.h
This is the 1st patch.
Currently,Linux's page allocator uses buddy algorithm and codes for buddy
allocator uses bitmaps. For what is bitmaps are used ?
(*) for recording "a page is free" and page's order.
here, page's order means size of contiguous free pages.
if a free page[x] 's order is Y, there are contiguous free pages
among page[X] to page[X + 2^(Y) - 1]
If a page is free and is a head of contiguous free pages of order 'X',
we can record it by
set_bit(free_area[X]->map, index of page[X] in this order)
For coalescing, when there is a chunk of free pages of order 'X',
we can test whether we can coalesce or not by,
test_bit(free_aera[X]->bitmap,index_of_buddy)
index_of_buddy can be calculated by (index_of_page ^ (1 << order))
This patch removes bitmap and recording a free page's order
in its page structure's page->private field. If a page is free and
it is a head of a free contiguous memory chunk, page->private indicates
the order of the page.PG_private bit is used to show propriety of this field.
For coalescing, when there is a page which is a chunk of contiguous free pages
of order 'X', we can test whether the page is to be coalesced or not by
(page_is_free(buddy) && PagePrivate(buddy) && page_order(buddy) == 'X')
address of buddy can be calculated by the same way in bitmap case.
If page is free and on the buddy system, PG_private bit is set
and has its order in page->private. This scheme is safe because...
(a) when page is being freed, PG_private is not set. (see free_pages_check())
(b) when page is free and on the buddy system, PG_private is set.
These facts are guaranteed by zone->lock.
Only one thread can change a free page's PG_private bit and private field
at anytime.
[ Not MAX_ORDER aligned memory ]
If all memory are aligned to system's MAX_ORDER, all buddy algorithm works
fine and there is no trouble.But if memory is not aligned, we must check
"whether buddy exists or not?".
Checking this in main-loop of free_pages_bulk() is ugly but a page which
has no buddy in some order can be calculated in boot time.
New PG_xxx flag, PG_buddyend flag is introduced.
This flag works as a stopper of buddy coalescing.
A page is on the lower address end of mem_map and it cannot have
its lower address buddy is marked as PG_buddyend. Please see below.
If mem_map's start address is aligned, pages & buddy looks like this
order start_pfn -> higher address
-------------------------------------
0 | | | | | | | | | | | | | ..
-------------------------------------
1 | | | | | | |
-------------------------------------
2 | | | |
-------------------------------------
3 | |
-------------------------------------
If mem_map start address is not aligned, some of "lower address buddy"
disappears.
----------------------------------
0 |X | | | | | | | | | | | ..
----------------------------------
1 |X | | | | |
-------------------------------
2 |X | |
-------------------------
3 |X
-------------
A group of pages marked 'X' in each order never has its lower address buddy.
'X' pages are marked as PG_buddyend.
Now,this check can work well enough to avoid coalescing not aligned pages.
-----------------------
buddy_index = page_index ^ (1 << order)
if ((buddy_index < page_index) &&
PageBuddyend(base + page_index))
stop coalescing.
-----------------------
Above is for lower address case, How about higher address ?
If higher end of mem_map is not aligned to MAX_ORDER, we reserve
the highest address page and don't put it into buddy system. (don't pass to
free_pages())
This one page is a victim for buddy allocator, and this is enough.
-- Kame
---
linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h | 24 ++++++++++++++
linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h | 1
linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h | 10 +++++
3 files changed, 34 insertions(+), 1 deletion(-)
diff -puN include/linux/mm.h~eliminate-bitmap-includes include/linux/mm.h
--- linux-2.6.9-rc1-mm1-k/include/linux/mm.h~eliminate-bitmap-includes 2004-08-31 18:37:04.186101664 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h 2004-08-31 18:37:04.194100448 +0900
@@ -209,6 +209,9 @@ struct page {
* usually used for buffer_heads
* if PagePrivate set; used for
* swp_entry_t if PageSwapCache
+ * When page is free:
+ * this indicates order of page
+ * in buddy allocator.
*/
struct address_space *mapping; /* If low bit clear, points to
* inode address_space, or NULL.
@@ -322,6 +325,27 @@ static inline void put_page(struct page
#endif /* CONFIG_HUGETLB_PAGE */
/*
+ * These functions are used in alloc_pages()/free_pages(), buddy allocator.
+ * page_order(page) returns an order of a free page in buddy allocator.
+ *
+ * this is used with PG_private flag
+ *
+ * Note : all PG_private operations used in buddy system is done while
+ * zone->lock is acquired. So set and clear PG_private bit operation
+ * does not need to be atomic.
+ */
+
+static inline int page_order(struct page *page)
+{
+ return page->private;
+}
+
+static inline void set_page_order(struct page *page, int order)
+{
+ page->private = order;
+}
+
+/*
* Multiple processes may "see" the same page. E.g. for untouched
* mappings of /dev/null, all processes see the same page full of
* zeroes, and text pages of executables and shared libraries have
diff -puN include/linux/mmzone.h~eliminate-bitmap-includes include/linux/mmzone.h
--- linux-2.6.9-rc1-mm1-k/include/linux/mmzone.h~eliminate-bitmap-includes 2004-08-31 18:37:04.188101360 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h 2004-08-31 18:37:04.195100296 +0900
@@ -22,7 +22,6 @@
struct free_area {
struct list_head free_list;
- unsigned long *map;
};
struct pglist_data;
diff -puN include/linux/page-flags.h~eliminate-bitmap-includes include/linux/page-flags.h
--- linux-2.6.9-rc1-mm1-k/include/linux/page-flags.h~eliminate-bitmap-includes 2004-08-31 18:37:04.190101056 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h 2004-08-31 18:37:04.196100144 +0900
@@ -44,6 +44,12 @@
* space, they need to be kmapped separately for doing IO on the pages. The
* struct page (these bits with information) are always mapped into kernel
* address space...
+ *
+ * PG_buddyend pages don't have its buddy in buddy allocator in some meaning.
+ * If a page is on the lower address end of mem_map and a buddy of it, in some
+ * order, is below the end, a page is marked as PG_buddyend. If a page is on
+ * the higher addres end of mem_map and a buddy of it,in some order, is above
+ * the end, a page is marked as PG_buddyend.
*/
/*
@@ -75,6 +81,7 @@
#define PG_mappedtodisk 17 /* Has blocks allocated on-disk */
#define PG_reclaim 18 /* To be reclaimed asap */
+#define PG_buddyend 19 /* end of the buddy allocator */
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
@@ -290,6 +297,9 @@ extern unsigned long __read_page_state(u
#define SetPageCompound(page) set_bit(PG_compound, &(page)->flags)
#define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags)
+#define PageBuddyend(page) test_bit(PG_buddyend, &(page)->flags)
+#define SetPageBuddyend(page) set_bit(PG_buddyend, &(page)->flags)
+
#ifdef CONFIG_SWAP
#define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags)
#define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags)
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] buddy allocator without bitmap(2) [0/3]
2004-08-31 10:36 [RFC] buddy allocator without bitmap(2) [0/3] Hiroyuki KAMEZAWA
@ 2004-08-31 16:26 ` Dave Hansen
2004-08-31 22:44 ` [Lhms-devel] " Hiroyuki KAMEZAWA
0 siblings, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2004-08-31 16:26 UTC (permalink / raw)
To: Hiroyuki KAMEZAWA; +Cc: Linux Kernel ML, linux-mm, lhms
On Tue, 2004-08-31 at 03:36, Hiroyuki KAMEZAWA wrote:
> Disadvantage:
> - using one more PG_xxx flag.
> - If mem_map is not aligned, reserve one page as a victim for buddy allocater.
>
> How about this approach ?
Granted, we have some free wiggle room in page->flags right now, but
using another bit effectively shifts the entire benefit of your patch.
Instead of getting rid of the buddy bitmaps, you simply consume a
page->flag instead. While you don't have to allocate anything (because
of the page->flags use), the number of bits consumed in the operation is
still the same as before. And the patch is getting more complex by the
minute.
Something ate your patch:
* Global page accounting. One instance per CPU. Only unsigned longs are
@@ -290,6 +297,9 @@ extern unsigned long __read_page_state(u
#define SetPageCompound(page) set_bit(PG_compound, &(page)->flags)
#define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags)
+#define PageBuddyend(page) test_bit(PG_buddyend, &(page)->flags)
+#define SetPageBuddyend(page) set_bit(PG_buddyend, &(page)->flags)
+
#ifdef CONFIG_SWAP
#define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags)
#define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags)
-- Dave
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap(2) [0/3]
2004-08-31 16:26 ` Dave Hansen
@ 2004-08-31 22:44 ` Hiroyuki KAMEZAWA
2004-08-31 23:24 ` Andrew Morton
0 siblings, 1 reply; 5+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-08-31 22:44 UTC (permalink / raw)
To: Dave Hansen; +Cc: Linux Kernel ML, linux-mm, lhms
Dave Hansen wrote:
> On Tue, 2004-08-31 at 03:36, Hiroyuki KAMEZAWA wrote:
>
>>Disadvantage:
>> - using one more PG_xxx flag.
>> - If mem_map is not aligned, reserve one page as a victim for buddy allocater.
>>
>>How about this approach ?
>
>
> Granted, we have some free wiggle room in page->flags right now, but
> using another bit effectively shifts the entire benefit of your patch.
> Instead of getting rid of the buddy bitmaps, you simply consume a
> page->flag instead. While you don't have to allocate anything (because
> of the page->flags use), the number of bits consumed in the operation is
> still the same as before. And the patch is getting more complex by the
> minute.
Hmm...I understand what you say. Consuming PG_xxx bit in buddy allocator is harmful
because no PG_xxx bit is used in current kernel's one.
What this patch implements is
"How to record shape of the mem_map needed by buddy allocator without using some
structure which must be resized at memory resizing."
Because I had to record some information about shape of mem_map, I used PG_xxx bit.
1 bit is maybe minimum consumption.
If consumption of 1 bit in a page structure is too harmful,
I have to record information in some structure which needs to be resized
at memory resizing. When I do so, this patch itself is meaningless, I think.
I'll consider more, but...
> Something ate your patch:
>
> * Global page accounting. One instance per CPU. Only unsigned longs are
> @@ -290,6 +297,9 @@ extern unsigned long __read_page_state(u
> #define SetPageCompound(page) set_bit(PG_compound, &(page)->flags)
> #define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags)
>
> +#define PageBuddyend(page) test_bit(PG_buddyend, &(page)->flags)
> +#define SetPageBuddyend(page) set_bit(PG_buddyend, &(page)->flags)
> +
> #ifdef CONFIG_SWAP
> #define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags)
> #define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags)
>
>
> -- Dave
--
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap(2) [0/3]
2004-08-31 22:44 ` [Lhms-devel] " Hiroyuki KAMEZAWA
@ 2004-08-31 23:24 ` Andrew Morton
2004-08-31 23:53 ` Hiroyuki KAMEZAWA
0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2004-08-31 23:24 UTC (permalink / raw)
To: Hiroyuki KAMEZAWA; +Cc: haveblue, linux-kernel, linux-mm, lhms-devel
Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Because I had to record some information about shape of mem_map, I used PG_xxx bit.
> 1 bit is maybe minimum consumption.
The point is that we're running out of bits in page.flags.
You should be able to reuse an existing bit for this application. PG_lock would suit.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap(2) [0/3]
2004-08-31 23:24 ` Andrew Morton
@ 2004-08-31 23:53 ` Hiroyuki KAMEZAWA
0 siblings, 0 replies; 5+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-08-31 23:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: haveblue, linux-kernel, linux-mm, lhms-devel
Andrew Morton wrote:
> Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>>Because I had to record some information about shape of mem_map, I used PG_xxx bit.
>>1 bit is maybe minimum consumption.
>
>
> The point is that we're running out of bits in page.flags.
>
yes.
> You should be able to reuse an existing bit for this application. PG_lock would suit.
Hmm... PG_buddyend pages in the top of mem_map can be allocated and used as normal pages
,which can be used for Disk I/O.
If I make them as victims to buddy allocator and don't allow to use them,
I can reuse an existing bit.
I'll consider more.
--Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2004-08-31 23:47 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-31 10:36 [RFC] buddy allocator without bitmap(2) [0/3] Hiroyuki KAMEZAWA
2004-08-31 16:26 ` Dave Hansen
2004-08-31 22:44 ` [Lhms-devel] " Hiroyuki KAMEZAWA
2004-08-31 23:24 ` Andrew Morton
2004-08-31 23:53 ` Hiroyuki KAMEZAWA
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox