* [RFC] buddy allocator without bitmap [3/4]
@ 2004-08-26 12:10 Hiroyuki KAMEZAWA
2004-08-26 15:55 ` Dave Hansen
0 siblings, 1 reply; 5+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-08-26 12:10 UTC (permalink / raw)
To: Linux Kernel ML; +Cc: linux-mm, LHMS, William Lee Irwin III, Dave Hansen
This part4 is for freeing pages.
Look of function free_pages_bulk() is different from orignal code,
but algorithm is unchanged.
-- Kame
=====
This patch removes bitmap operation from free_pages()
Instead of using bitmap, this patch records page's order in
page struct itself, page->private field.
As a side effect of removing a bitmap, we must test whether a buddy of a page
is existing or not. This is done by zone->aligned_order and bad_range(page,zone).
zone->aligned_order guarantees "a page of an order below zone->aligned_order has
its buddy in the zone".This is calculated in the zone initialization time.
If order > zone->aligned_order, we must call bad_range(zone, page) and
this is enough when zone doesn't contain memory-hole.
But....
In IA64 case, additionally, there can be memory holes in a zone and size of a hole
is smaller than 1 << MAX_ORDER. So pfn_valid() is inserted.
But pfn_valid() looks a bit heavy in ia64 and replacing this with faster one
is desirable.
example) a series of free_pages()
consider these 8 pages
page 0 1 2 3 4 5 6 7
[A][F][F][-][F][A][A2][-]
page[0] is allocated , order=0
page[1] is free, order=0
page[2-3] is free, order=1
page[4] is free, order=0
page[5] is allocated , order=0
page[6-7] is allocated, order=1
0) status
free_area[3] ->
free_area[2] ->
free_area[1] -> page[2-3]
free_area[0] -> page[1], page[4]
allocated -> page[0], page[5], page[6-7]
1) free_pages (page[6],order=1)
1-1) loop 1st
free_area[3] ->
free_area[2] ->
free_area[1] -> page[2-3], page[6-7]
free_area[0] -> page[1], page[4]
allocated -> page[0], page[5]
buddy of page 6 in order 1 is page[4].
page[4] is free,but its order is 0
stop here.
2) free_pages(page[5], order=0)
2-1) loop 1st
free_area[3] ->
free_area[2] ->
free_area[1] -> page[2-3], page[6-7]
free_area[0] -> page[1], page[4], page[5]
allocated -> page[0]
buddy of page[5] in order 0 is page[4].
page[4] is free and its order is 0.
do coalesce page[4] & page[5].
2-2) loop 2nd
free_area[3] ->
free_area[2] ->
free_area[1] -> page[2-3], page[6-7],page[4-5]
free_area[0] -> page[1]
allocated -> page[0]
buddy of page[4] in order 1 is page[6]
page[6] is free and its order is 1
coalesce page[4-5] and page[6-7]
2-3) loop 3rd
free_area[3] ->
free_area[2] -> page[4-7]
free_area[1] -> page[2-3],
free_area[0] -> page[1]
allocated -> page[0]
buddy of page[4] in order 2 is page[0]
page[0] is not free.
stop here.
3) free_pages(page[0],order=0)
3-1) 1st loop
free_area[3] ->
free_area[2] -> page[4-7]
free_area[1] -> page[2-3],
free_area[0] -> page[1],page[0] -> coalesce
allocated ->
3-2) 2nd loop
free_area[3] ->
free_area[2] -> page[4-7]
free_area[1] -> page[0-1],page[2-3] -> coalesce
free_area[0] ->
allocated ->
3-3) 3rd
free_area[3] ->
free_area[2] -> page[4-7] , page[0-3] -> coalesce
free_area[1] ->
free_area[0] ->
allocated ->
3-4) 4th
free_area[3] -> page[0-7]
free_area[2] ->
free_area[1] ->
free_area[0] ->
allocated ->
---
linux-2.6.8.1-mm4-kame-kamezawa/mm/page_alloc.c | 100 +++++++++++++++++-------
1 files changed, 74 insertions(+), 26 deletions(-)
diff -puN mm/page_alloc.c~eliminate-bitmap-free mm/page_alloc.c
--- linux-2.6.8.1-mm4-kame/mm/page_alloc.c~eliminate-bitmap-free 2004-08-26 11:41:46.000000000 +0900
+++ linux-2.6.8.1-mm4-kame-kamezawa/mm/page_alloc.c 2004-08-26 19:30:58.202099448 +0900
@@ -157,6 +157,27 @@ static void destroy_compound_page(struct
#endif /* CONFIG_HUGETLB_PAGE */
/*
+ * This function checks whether a page is free && is the buddy
+ * we can do coalesce if
+ * (a) the buddy is free and
+ * (b) the buddy is on the buddy system
+ * (c) the buddy has the same order.
+ * for recording page's order, we use private field and PG_private.
+ *
+ * Because page_count(page) == 0, and zone->lock is aquired.
+ * Atomic page->flags operation is needless here.
+ */
+static inline int page_is_buddy(struct page *page, int order)
+{
+ if ((page->flags & (1 << PG_private)) &&
+ (page_order(page) == order) &&
+ !(page->flags & (1 << PG_reserved)) &&
+ page_count(page) == 0)
+ return 1;
+ return 0;
+}
+
+/*
* Freeing function for a buddy system allocator.
*
* The concept of a buddy system is to maintain direct-mapped table
@@ -168,9 +189,12 @@ static void destroy_compound_page(struct
* at the bottom level available, and propagating the changes upward
* as necessary, plus some accounting needed to play nicely with other
* parts of the VM system.
- * At each level, we keep one bit for each pair of blocks, which
- * is set to 1 iff only one of the pair is allocated. So when we
- * are allocating or freeing one, we can derive the state of the
+ *
+ * At each level, we keep a list of pages, which are head of chunk of
+ * pages at the level. A page, which is a head of chunks, has its order
+ * in page structure itself and PG_private flag is set. we can get an
+ * order of a page by calling page_order().
+ * So we are allocating or freeing one, we can derive the state of the
* other. That is, if we allocate a small block, and both were
* free, the remainder of the region must be split into blocks.
* If a block is freed, and its buddy is also free, then this
@@ -180,42 +204,68 @@ static void destroy_compound_page(struct
*/
static inline void __free_pages_bulk (struct page *page, struct page *base,
- struct zone *zone, struct free_area *area, unsigned int order)
+ struct zone *zone, unsigned int order)
{
- unsigned long page_idx, index, mask;
-
+ unsigned long page_idx, mask;
+ struct page *coalesced_page;
+
if (order)
destroy_compound_page(page, order);
+
mask = (~0UL) << order;
page_idx = page - base;
if (page_idx & ~mask)
BUG();
- index = page_idx >> (1 + order);
-
zone->free_pages += 1 << order;
+ BUG_ON(bad_range(zone,page));
+
while (order < MAX_ORDER-1) {
- struct page *buddy1, *buddy2;
+ struct page *buddy;
+
+ buddy = base + (page_idx ^ (1 << order));
- BUG_ON(area >= zone->free_area + MAX_ORDER);
- if (!__test_and_change_bit(index, area->map))
+ /*
+ * quick check, if order < zone->aligned_order
+ * every page has its buddy in this order.
+ * buddy is guaranteed to be valid page.
+ */
+ if (order >= zone->aligned_order) {
+ if (zone->nr_mem_map > 1) {
+ /*
+ * there may be hole in zone's memmap &&
+ * hole is not aligned in this order.
+ * currently, I think CONFIG_VIRTUAL_MEM_MAP
+ * case is only case to reach here.
+ * Is there any other case ?
+ */
+ /*
+ * Is there better call than pfn_valid ?
+ */
+ if (!pfn_valid(zone->zone_start_pfn
+ + (page_idx ^ (1 << order))))
+ break;
+ }
+ /* we need range check */
+ if (bad_range(zone, buddy))
+ break;
+ }
+ if (!page_is_buddy(buddy, order))
/*
* the buddy page is still allocated.
*/
break;
-
- /* Move the buddy up one level. */
- buddy1 = base + (page_idx ^ (1 << order));
- buddy2 = base + page_idx;
- BUG_ON(bad_range(zone, buddy1));
- BUG_ON(bad_range(zone, buddy2));
- list_del(&buddy1->lru);
- mask <<= 1;
order++;
- area++;
- index >>= 1;
+ mask <<= 1;
page_idx &= mask;
- }
- list_add(&(base + page_idx)->lru, &area->free_list);
+ list_del(&buddy->lru);
+ /* for propriety of PG_private bit, we clear it */
+ buddy->flags &= ~(1 << PG_private);
+ }
+ /* record the final order of the page */
+ coalesced_page = base + page_idx;
+ coalesced_page->flags |= (1 << PG_private);
+ coalesced_page->private = order;
+ list_add(&coalesced_page->lru, &zone->free_area[order].free_list);
}
static inline void free_pages_check(const char *function, struct page *page)
@@ -253,12 +303,10 @@ free_pages_bulk(struct zone *zone, int c
struct list_head *list, unsigned int order)
{
unsigned long flags;
- struct free_area *area;
struct page *base, *page = NULL;
int ret = 0;
base = zone->zone_mem_map;
- area = zone->free_area + order;
spin_lock_irqsave(&zone->lock, flags);
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;
@@ -266,7 +314,7 @@ free_pages_bulk(struct zone *zone, int c
page = list_entry(list->prev, struct page, lru);
/* have to delete it as __free_pages_bulk list manipulates */
list_del(&page->lru);
- __free_pages_bulk(page, base, zone, area, order);
+ __free_pages_bulk(page, base, zone, order);
ret++;
}
spin_unlock_irqrestore(&zone->lock, flags);
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] buddy allocator without bitmap [3/4]
2004-08-26 12:10 [RFC] buddy allocator without bitmap [3/4] Hiroyuki KAMEZAWA
@ 2004-08-26 15:55 ` Dave Hansen
2004-08-27 0:05 ` [Lhms-devel] " Hiroyuki KAMEZAWA
0 siblings, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2004-08-26 15:55 UTC (permalink / raw)
To: Hiroyuki KAMEZAWA; +Cc: Linux Kernel ML, linux-mm, lhms, William Lee Irwin III
> + if (zone->nr_mem_map > 1) {
> + /*
> + * there may be hole in zone's memmap &&
> + * hole is not aligned in this order.
> + * currently, I think CONFIG_VIRTUAL_MEM_MAP
> + * case is only case to reach here.
> + * Is there any other case ?
> + */
> + /*
> + * Is there better call than pfn_valid ?
> + */
> + if (!pfn_valid(zone->zone_start_pfn
> + + (page_idx ^ (1 << order))))
> + break;
> + }
Nice try. How about putting the ia64 code in a macro or header function
that you can #ifdef out on all the other architectures? We used to be
able to see that entire while loop on one screen. That's a bit harder
now.
-- Dave
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap [3/4]
2004-08-26 15:55 ` Dave Hansen
@ 2004-08-27 0:05 ` Hiroyuki KAMEZAWA
2004-08-27 0:15 ` Dave Hansen
0 siblings, 1 reply; 5+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-08-27 0:05 UTC (permalink / raw)
To: Dave Hansen; +Cc: Linux Kernel ML, linux-mm, lhms, William Lee Irwin III
Dave Hansen wrote:
>>+ if (zone->nr_mem_map > 1) {
>>+ /*
>>+ * there may be hole in zone's memmap &&
>>+ * hole is not aligned in this order.
>>+ * currently, I think CONFIG_VIRTUAL_MEM_MAP
>>+ * case is only case to reach here.
>>+ * Is there any other case ?
>>+ */
>>+ /*
>>+ * Is there better call than pfn_valid ?
>>+ */
>>+ if (!pfn_valid(zone->zone_start_pfn
>>+ + (page_idx ^ (1 << order))))
>>+ break;
>>+ }
>
>
> Nice try. How about putting the ia64 code in a macro or header function
> that you can #ifdef out on all the other architectures? We used to be
> able to see that entire while loop on one screen. That's a bit harder
> now.
>
Currently, I think zone->nr_mem_map itself is very vague.
I'm now looking for another way to remove this part entirely.
I think mem_section approarch may be helpful to remove this part,
but to implement full feature of CONFIG_NONLINEAR,
I'll need lots of different kind of patches.
(If mem_map is guaranteed to be contiguous in one mem_section)
1. Now, I think some small parts, some essence of mem_section which
makes pfn_valid() faster may be good.
And another way,
2. A method which enables page -> page's max_order calculation
may be good and consistent way in this no-bitmap approach.
But this problem would be my week-end homework :).
--Kame
--
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap [3/4]
2004-08-27 0:05 ` [Lhms-devel] " Hiroyuki KAMEZAWA
@ 2004-08-27 0:15 ` Dave Hansen
2004-08-27 0:42 ` Hiroyuki KAMEZAWA
0 siblings, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2004-08-27 0:15 UTC (permalink / raw)
To: Hiroyuki KAMEZAWA; +Cc: Linux Kernel ML, linux-mm, lhms, William Lee Irwin III
On Thu, 2004-08-26 at 17:05, Hiroyuki KAMEZAWA wrote:
> Currently, I think zone->nr_mem_map itself is very vague.
> I'm now looking for another way to remove this part entirely.
>
> I think mem_section approarch may be helpful to remove this part,
> but to implement full feature of CONFIG_NONLINEAR,
> I'll need lots of different kind of patches.
> (If mem_map is guaranteed to be contiguous in one mem_section)
This is definitely a true assumption right now.
> 1. Now, I think some small parts, some essence of mem_section which
> makes pfn_valid() faster may be good.
The only question is what it will take when there's a partially populate
mem_section. We'll almost certainly have to allow it, but the real
question is whether or not we will ever have a partially populated one
that's not at the end of memory.
> And another way,
>
> 2. A method which enables page -> page's max_order calculation
> may be good and consistent way in this no-bitmap approach.
>
> But this problem would be my week-end homework :).
Instead of adding more stuff to the mem_section, we might be able to
(ab)use more stuff in the mem_map's mem_map, like I am with
page->section right now.
-- Dave
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Lhms-devel] Re: [RFC] buddy allocator without bitmap [3/4]
2004-08-27 0:15 ` Dave Hansen
@ 2004-08-27 0:42 ` Hiroyuki KAMEZAWA
0 siblings, 0 replies; 5+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-08-27 0:42 UTC (permalink / raw)
To: Dave Hansen; +Cc: Linux Kernel ML, linux-mm, lhms, William Lee Irwin III
Dave Hansen wrote:
>>1. Now, I think some small parts, some essence of mem_section which
>> makes pfn_valid() faster may be good.
>
>
> The only question is what it will take when there's a partially populate
> mem_section. We'll almost certainly have to allow it, but the real
> question is whether or not we will ever have a partially populated one
> that's not at the end of memory.
>
Hmm....I cannot answer it fully.
My tiger4 (Itanium x 2) shows aligned_order=0, because it has a mem_map
start with address 0x????????3(I forget now), odd number ;(.
I like a mechine in which all memory are aligned.....
>>And another way,
>>
>>2. A method which enables page -> page's max_order calculation
>> may be good and consistent way in this no-bitmap approach.
>>
>>But this problem would be my week-end homework :).
>
>
> Instead of adding more stuff to the mem_section, we might be able to
> (ab)use more stuff in the mem_map's mem_map, like I am with
> page->section right now.
I wonder if there is another way which doesn't increase memory usage
in boottime, it will be better.
I'll going on considering the way to fix nr_mem_map things.
Thanks
-- Kame
--
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2004-08-27 0:36 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-26 12:10 [RFC] buddy allocator without bitmap [3/4] Hiroyuki KAMEZAWA
2004-08-26 15:55 ` Dave Hansen
2004-08-27 0:05 ` [Lhms-devel] " Hiroyuki KAMEZAWA
2004-08-27 0:15 ` Dave Hansen
2004-08-27 0:42 ` Hiroyuki KAMEZAWA
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox