From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Thu, 4 Nov 2004 06:57:45 -0200 From: Marcelo Tosatti Subject: Re: [PATCH 2/3] higher order watermarks Message-ID: <20041104085745.GA7186@logos.cnet> References: <417F5584.2070400@yahoo.com.au> <417F55B9.7090306@yahoo.com.au> <417F5604.3000908@yahoo.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <417F5604.3000908@yahoo.com.au> Sender: owner-linux-mm@kvack.org Return-Path: To: Nick Piggin Cc: Andrew Morton , Linux Memory Management , Linus Torvalds List-ID: On Wed, Oct 27, 2004 at 06:02:12PM +1000, Nick Piggin wrote: > 2/3 > > > Move the watermark checking code into a single function. Extend it to account > for the order of the allocation and the number of free pages that could satisfy > such a request. > > Signed-off-by: Nick Piggin Hi Nick, I have a few comments and doubts. > linux-2.6-npiggin/include/linux/mmzone.h | 2 + > linux-2.6-npiggin/mm/page_alloc.c | 58 ++++++++++++++++++++----------- > 2 files changed, 41 insertions(+), 19 deletions(-) > > diff -puN mm/page_alloc.c~vm-alloc-order-watermarks mm/page_alloc.c > --- linux-2.6/mm/page_alloc.c~vm-alloc-order-watermarks 2004-10-27 16:41:32.000000000 +1000 > +++ linux-2.6-npiggin/mm/page_alloc.c 2004-10-27 17:53:33.000000000 +1000 > @@ -586,6 +586,37 @@ buffered_rmqueue(struct zone *zone, int > } > > /* > + * Return 1 if free pages are above 'mark'. This takes into account the order > + * of the allocation. > + */ > +int zone_watermark_ok(struct zone *z, int order, unsigned long mark, > + int alloc_type, int can_try_harder, int gfp_high) > +{ > + /* free_pages my go negative - that's OK */ > + long min = mark, free_pages = z->free_pages - (1 << order) + 1; > + int o; > + > + if (gfp_high) > + min -= min / 2; > + if (can_try_harder) > + min -= min / 4; > + > + if (free_pages <= min + z->protection[alloc_type]) > + return 0; > + for (o = 0; o < order; o++) { > + /* At the next order, this order's pages become unavailable */ > + free_pages -= z->free_area[order].nr_free << o; > + > + /* Require fewer higher order pages to be free */ > + min >>= 1; I can't understand this. You decrease from free_pages nr_order_free_pages << o, in a loop, and divide min by two. What is the meaning of "nr_free_pages[order] << o" ? Its only meaningful when o == order? You're multiplying the number of free pages of the order the allocation wants by "0, 1..order". The two values have different meanings, until o == order. In the first iteration of the loop, order is 0, so you decrease from free_pages "z->free_area[order].nr_free". Again, the two values mean different things. Can you enlight me? I see you're trying to have some kind of extra protection, but the calculation is difficult to understand for me. > + > + if (free_pages <= min) > + return 0; > + } > + return 1; > +} > + > +/* > * This is the 'heart' of the zoned buddy allocator. > * > * Herein lies the mysterious "incremental min". That's the > @@ -606,7 +637,6 @@ __alloc_pages(unsigned int gfp_mask, uns > struct zonelist *zonelist) > { > const int wait = gfp_mask & __GFP_WAIT; > - unsigned long min; > struct zone **zones, *z; > struct page *page; > struct reclaim_state reclaim_state; > @@ -636,9 +666,9 @@ __alloc_pages(unsigned int gfp_mask, uns > > /* Go through the zonelist once, looking for a zone with enough free */ > for (i = 0; (z = zones[i]) != NULL; i++) { > - min = z->pages_low + (1<protection[alloc_type]; > > - if (z->free_pages < min) > + if (!zone_watermark_ok(z, order, z->pages_low, > + alloc_type, 0, 0)) The original code didnt had the can_try_harder/gfp_high decrease which is now on zone_watermark_ok. Means that those allocations will now be successful earlier, instead of going to the next zonelist iteration. kswapd will not be awake when it used to be. Hopefully it doesnt matter that much. You did this by intention? TIA > continue; > > page = buffered_rmqueue(z, order, gfp_mask); > @@ -654,14 +684,9 @@ __alloc_pages(unsigned int gfp_mask, uns > * coming from realtime tasks to go deeper into reserves > */ > for (i = 0; (z = zones[i]) != NULL; i++) { > - min = z->pages_min; > - if (gfp_mask & __GFP_HIGH) > - min /= 2; > - if (can_try_harder) > - min -= min / 4; > - min += (1<protection[alloc_type]; > - > - if (z->free_pages < min) > + if (!zone_watermark_ok(z, order, z->pages_min, > + alloc_type, can_try_harder, > + gfp_mask & __GFP_HIGH)) > continue; > > page = buffered_rmqueue(z, order, gfp_mask); > @@ -697,14 +722,9 @@ rebalance: > > /* go through the zonelist yet one more time */ > for (i = 0; (z = zones[i]) != NULL; i++) { > - min = z->pages_min; > - if (gfp_mask & __GFP_HIGH) > - min /= 2; > - if (can_try_harder) > - min -= min / 4; > - min += (1<protection[alloc_type]; > - > - if (z->free_pages < min) > + if (!zone_watermark_ok(z, order, z->pages_min, > + alloc_type, can_try_harder, > + gfp_mask & __GFP_HIGH)) > continue; > > page = buffered_rmqueue(z, order, gfp_mask); > diff -puN include/linux/mmzone.h~vm-alloc-order-watermarks include/linux/mmzone.h > --- linux-2.6/include/linux/mmzone.h~vm-alloc-order-watermarks 2004-10-27 16:41:32.000000000 +1000 > +++ linux-2.6-npiggin/include/linux/mmzone.h 2004-10-27 17:52:07.000000000 +1000 > @@ -279,6 +279,8 @@ void get_zone_counts(unsigned long *acti > unsigned long *free); > void build_all_zonelists(void); > void wakeup_kswapd(struct zone *zone); > +int zone_watermark_ok(struct zone *z, int order, unsigned long mark, > + int alloc_type, int can_try_harder, int gfp_high); > > /* > * zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc. > > _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: aart@kvack.org