From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, y-goto@jp.fujitsu.com, clameter@sgi.com
Subject: Re: [Patch] memory unplug v3 [1/4] page isolation
Date: Tue, 22 May 2007 20:01:39 +0900
Message-ID: <20070522200139.e7ac1987.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <Pine.LNX.4.64.0705221023020.16461@skynet.skynet.ie>

On Tue, 22 May 2007 11:19:27 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:
> > If isolate_pages(start,end) is called,
> > - migratetype of the range becomes MIGRATE_ISOLATE if
> >  its current type is MIGRATE_MOVABLE or MIGRATE_RESERVE.
> 
> Why not MIGRATE_RECLAIMABLE as well?
> 

To allow that, I would have to implement something like
page_reclaime_range(start_pfn, end_pfn); for now, I just use migration.
I'll consider it as future work.
Maybe Christoph's work will help me.

> > - MIGRATE_ISOLATE is not on migratetype fallback list.
> >
> > Then, pages of this migratetype will not be allocated even if it is free.
> >
> > Now, isolate_pages() only can treat the range aligned to MAX_ORDER.
> > This can be adjusted if necessary...maybe.
> >
> 
> I have a patch ready that groups pages by an arbitrary order. Right now it 
> is related to the size of the huge page on the system but it's a single 
> variable pageblock_order that determines the range. You may find you want 
> to adjust this value.
> 
I see. I'll support it in the patches for the next -mm.


> > +#define MIGRATE_UNMOVABLE     0		/* not reclaimable pages */
> > +#define MIGRATE_RECLAIMABLE   1		/* shrink_xxx routine can reap this */
> > +#define MIGRATE_MOVABLE       2		/* migrate_page can migrate this */
> > +#define MIGRATE_RESERVE       3		/* no type yet */
> 
> MIGRATE_RESERVE is where the min_free_kbytes pages are kept if possible 
> and the number of RESERVE blocks depends on the value of it. It is only 
> allocated from if the alternative is to fail the allocation so this 
> comment should read
> 
> /* min_free_kbytes free pages here */
> 
ok.

> Later we may find a way of using MIGRATE_RESERVE to isolate ranges but 
> it's not necessary now because it would obscure how the patch works.
> 
> > +#define MIGRATE_ISOLATE       4		/* never allocated from */
> > +#define MIGRATE_TYPES         5
> >
> 
> The documentation changes probably belong in a separate patch but thanks, 
> it nudges me again into getting around to it.
> 
Ok, I'll just keep the comment for MIGRATE_ISOLATE.


>
> > +
> > +	migrate_type = get_pageblock_migratetype(page);
> > +	if (migrate_type == MIGRATE_ISOLATE) {
> > +		__free_pages_ok(page, 0);
> > +		return;
> > +	}
> 
> This change to the PCP allocator may be unnecessary. If you let the page 
> free to the pcp lists, they will never be allocated from there because 
> allocflags_to_migratetype() will never return MIGRATE_ISOLATE. What you 
> could do is drain the PCP lists just before you try to hot-remove or call 
> test_pages_isolated() so that the pcp pages will free back to the
> MIGRATE_ISOLATE lists.
> 
Ah.. thanks. I'll remove this.


> The extra drain is undesirable but probably better than checking for 
> isolate every time a free occurs to the pcp lists.
> 
yes.
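
To make sure I follow, on the hot-remove side the flow would then be
roughly like the sketch below. (Untested; offline_isolated_range() is just
a placeholder name for wherever this check ends up living.)

/*
 * Untested sketch: isolated pages may sit on the pcp lists after
 * free_hot_cold_page(), so flush them back to the buddy freelists
 * once, just before the range is checked.
 */
static int offline_isolated_range(unsigned long start_pfn,
				  unsigned long end_pfn)
{
	/* pcp pages fall back into the MIGRATE_ISOLATE freelists */
	drain_all_local_pages();
	return test_pages_isolated(start_pfn, end_pfn);
}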

>
> > +/*
> > + * set/clear page block's type to be ISOLATE.
> > + * page allocator never allocates memory from an ISOLATE block.
> > + */
> > +
> > +int is_page_isolated(struct page *page)
> > +{
> > +	if ((page_count(page) == 0) &&
> > +	    (get_pageblock_migratetype(page) == MIGRATE_ISOLATE))
> 
> (PageBuddy(page) || (page_count(page) == 0 && PagePrivate(page))) &&
>  	(get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
> 
> PageBuddy(page) for free pages and page_count(page) with PagePrivate 
> should indicate pages that are on the pcp lists.
> 
> As you currently prevent ISOLATE pages going to the pcp lists, only the 
> PageBuddy check is necessary right now but if you drain before you check
> for isolated pages, you only need the PageBuddy() check. If you choose to 
> let pages on the pcp lists until a drain occurs, then you need the second 
> check.
> 
> This page_count() check instead of PageBuddy() appears to be related to 
> how test_pages_isolated() is implemented - more on that later.
> 
PG_buddy is set only if the page is linked to a freelist. IOW, if the page
is not the head of its buddy block, PG_buddy is not set.
So I didn't use PageBuddy().

(*) If I used PG_buddy to check whether a page is free or not, I would have
    to search for the head of its buddy block and its order.

> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +int set_migratetype_isolate(struct page *page)
> > +{
> 
> set_pageblock_isolate() maybe to match set_pageblock_migratetype() naming?
> 

> > +	struct zone *zone;
> > +	unsigned long flags;
> > +	int migrate_type;
> > +	int ret = -EBUSY;
> > +
> > +	zone = page_zone(page);
> > +	spin_lock_irqsave(&zone->lock, flags);
> 
> It may be more appropriate to have the caller take this lock. More later 
> in isolates_pages()
> 
ok.

> > +	migrate_type = get_pageblock_migratetype(page);
> > +	if ((migrate_type != MIGRATE_MOVABLE) &&
> > +	    (migrate_type != MIGRATE_RESERVE))
> > +		goto out;
> 
> and maybe MIGRATE_RECLAIMABLE here particularly in view of Christoph's 
> work with kmem_cache_vacate().
> 
ok. I'll look into it.


> > +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > +	move_freepages_block(zone, page, MIGRATE_ISOLATE);
> > +	ret = 0;
> > +out:
> > +	spin_unlock_irqrestore(&zone->lock, flags);
> > +	if (!ret)
> > +		drain_all_local_pages();
> 
> It's not clear why you drain the pcp lists when you encounter a block of 
> the wrong migrate_type. Draining the pcp lists is unlikely to help you.
> 
Ah, drain_all_local_pages() is called when MIGRATE_ISOLATE is successfully set.
But I'll change this, because I'll remove the hook in free_hot_cold_page() and
call drain_all_local_pages() somewhere else.


> > +	return ret;
> > +}
> > +
> > +void clear_migratetype_isolate(struct page *page)
> > +{
> > +	struct zone *zone;
> > +	unsigned long flags;
> > +	zone = page_zone(page);
> > +	spin_lock_irqsave(&zone->lock, flags);
> > +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> > +		goto out;
> > +	set_pageblock_migratetype(page, MIGRATE_RESERVE);
> > +	move_freepages_block(zone, page, MIGRATE_RESERVE);
> 
> MIGRATE_RESERVE is likely not what you want to do here. The number of 
> MIGRATE_RESERVE blocks in a zone is determined by 
> setup_zone_migrate_reserve(). If you are setting blocks like this, then 
> you need to call setup_zone_migrate_reserve() with the zone->lru_lock held 
> after you have call clear_migratetype_isolate() for all the necessary 
> blocks.
> 
> It may be easier to just set the blocks MIGRATE_MOVABLE.
> 
Ok.
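
Just to confirm I read you right, the un-isolate path would then become
something like this (sketch, untested):

void clear_migratetype_isolate(struct page *page)
{
	struct zone *zone = page_zone(page);
	unsigned long flags;

	spin_lock_irqsave(&zone->lock, flags);
	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
		goto out;
	/* back to MOVABLE, so setup_zone_migrate_reserve() need not be rerun */
	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
	move_freepages_block(zone, page, MIGRATE_MOVABLE);
out:
	spin_unlock_irqrestore(&zone->lock, flags);
}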



> > +out:
> > +	spin_unlock_irqrestore(&zone->lock, flags);
> > +}
> > Index: devel-2.6.22-rc1-mm1/mm/page_isolation.c
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ devel-2.6.22-rc1-mm1/mm/page_isolation.c	2007-05-22 15:12:28.000000000 +0900
> > @@ -0,0 +1,67 @@
> > +/*
> > + * linux/mm/page_isolation.c
> > + */
> > +
> > +#include <stddef.h>
> > +#include <linux/mm.h>
> > +#include <linux/page-isolation.h>
> > +
> > +#define ROUND_DOWN(x,y)	((x) & ~((y) - 1))
> > +#define ROUND_UP(x,y)	(((x) + (y) -1) & ~((y) - 1))
> 
> A roundup() macro already exists in kernel.h. You may want to use that and 
> define a new rounddown() macro there instead.
Oh... I couldn't find it. Thank you.
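
So the two local macros could be dropped in favour of something like the
lines below. (Sketch only: it assumes a rounddown() counterpart gets added
next to roundup() in kernel.h; it does not exist there yet.)

/* roundup() is already in include/linux/kernel.h; rounddown() would be new */
#define rounddown(x, y)	((x) & ~((y) - 1))	/* y must be a power of two */

	start_pfn_aligned = rounddown(start_pfn, NR_PAGES_ISOLATION_BLOCK);
	end_pfn_aligned   = roundup(end_pfn, NR_PAGES_ISOLATION_BLOCK);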


> 
> > +int
> > +isolate_pages(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > +	unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> > +	unsigned long undo_pfn;
> > +
> > +	start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> > +	end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
> > +
> > +	for (pfn = start_pfn_aligned;
> > +	     pfn < end_pfn_aligned;
> > +	     pfn += NR_PAGES_ISOLATION_BLOCK)
> > +		if (set_migratetype_isolate(pfn_to_page(pfn))) {
> 
> You will need to call pfn_valid() in the non-SPARSEMEM case before calling 
> pfn_to_page() or this will crash in some circumstances.
ok.

> 
> You also need to check zone boundaries. Lets say start_pfn is the start of 
> a non-MAX_ORDER aligned zone. Aligning it could make you start isolating 
> in the wrong zone - perhaps this is intentional, I don't know.

Ah, ok. At least the pfn_valid() check is necessary.



> 
> > +			undo_pfn = pfn;
> > +			goto undo;
> > +		}
> > +	return 0;
> > +undo:
> > +	for (pfn = start_pfn_aligned;
> > +	     pfn <= undo_pfn;
> > +	     pfn += NR_PAGES_ISOLATION_BLOCK)
> > +		clear_migratetype_isolate(pfn_to_page(pfn));
> > +
> 
> We fail if we encounter any non-MIGRATE_MOVABLE block in the start_pfn to 
> end_pfn range but at that point we've done a lot of work. We also take and 
> release an interrupt safe lock for each NR_PAGES_ISOLATION_BLOCK block 
> because set_migratetype_isolate() is responsible for lock taking.
> 
> It might be better if you took the lock here, scanned first to make sure 
> all the blocks were suitable for isolation and only then, call 
> set_migratetype_isolate() for each of them before releasing the lock.

Hm, ok. I'll restructure it along those lines; a rough sketch is below.

> 
> That would take the lock once and avoid the need for back-out code that 
> changes all the MIGRATE types in the range. Even for large ranges of 
> memory, it should not be too long to be holding a lock particularly in 
> this path.
> 
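
A rough, untested sketch of that restructuring (it assumes start_pfn and
end_pfn are already aligned to NR_PAGES_ISOLATION_BLOCK, the whole range
lies in one zone, and the caller has done the pfn_valid() checks):

int isolate_pages(unsigned long start_pfn, unsigned long end_pfn)
{
	struct zone *zone = page_zone(pfn_to_page(start_pfn));
	unsigned long pfn, flags;
	int ret = -EBUSY;

	spin_lock_irqsave(&zone->lock, flags);

	/* first pass: fail before touching anything if a block is unsuitable */
	for (pfn = start_pfn; pfn < end_pfn; pfn += NR_PAGES_ISOLATION_BLOCK) {
		int type = get_pageblock_migratetype(pfn_to_page(pfn));

		if (type != MIGRATE_MOVABLE && type != MIGRATE_RESERVE)
			goto out;
	}

	/* second pass: nothing can fail now, so no undo loop is needed */
	for (pfn = start_pfn; pfn < end_pfn; pfn += NR_PAGES_ISOLATION_BLOCK) {
		struct page *page = pfn_to_page(pfn);

		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
		move_freepages_block(zone, page, MIGRATE_ISOLATE);
	}
	ret = 0;
out:
	spin_unlock_irqrestore(&zone->lock, flags);
	return ret;
}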


> > +	return -EBUSY;
> > +}
> > +
> > +
> > +int
> > +free_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > +	unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> > +	start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> > +        end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
> 
> spaces instead of tabs there before end_pfn_aligned.
> 
> > +
> > +	for (pfn = start_pfn_aligned;
> > +	     pfn < end_pfn_aligned;
> > +	     pfn += MAX_ORDER_NR_PAGES)
> 
> pfn += NR_PAGES_ISOLATION_BLOCK ?
> 
Yes, it should be.

> pfn_valid() ?
> 
ok.

> > +		clear_migratetype_isolate(pfn_to_page(pfn));
> > +	return 0;
> > +}
> > +
> > +int
> > +test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > +	unsigned long pfn;
> > +	int ret = 0;
> > +
> 
> You didn't align here, intentional?
> 
Ah... no, that wasn't intentional. I'll check the alignment in the next version.


> > +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > +		if (!pfn_valid(pfn))
> > +			continue;
> > +		if (!is_page_isolated(pfn_to_page(pfn))) {
> > +			ret = 1;
> > +			break;
> > +		}
> 
> If the page is isolated, it's free and assuming you've drained the pcp 
> lists, it will have PageBuddy() set. In that case, you should be checking 
> what order the page is free at and skipping forward that number of pages. 
> I am guessing this pfn++ walk here is why you are checking 
> page_count(page) == 0 in is_page_isolated() instead of PageBuddy()
> 
Yes. In the next version, I'd like to handle the PageBuddy() and page_order() cases properly.
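
Roughly this kind of walk, I think. (Untested sketch; it assumes the pcp
lists were drained first, that page_order() - currently a static helper in
page_alloc.c - is made available here, and it already uses the
test_bit()-style return value you suggest below.)

int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn = start_pfn;
	struct page *page;

	while (pfn < end_pfn) {
		if (!pfn_valid(pfn)) {
			pfn++;
			continue;
		}
		page = pfn_to_page(pfn);
		if (!PageBuddy(page) ||
		    get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
			return 0;	/* in-use page, or block not isolated */
		/* skip the whole free buddy block in one step */
		pfn += 1UL << page_order(page);
	}
	return 1;	/* whole range is free and isolated */
}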


> > +	}
> > +	return ret;
> 
> The return value is a little counter-intuitive. It returns 1 if they are 
> not isolated. I would expect it to return 1 if isolated like test_bit() 
> returns 1 if it's set.
> 
ok.

> > +#define PAGE_ISOLATION_ORDER	(MAX_ORDER - 1)
> > +#define NR_PAGES_ISOLATION_BLOCK	(1 << PAGE_ISOLATION_ORDER)
> > +
> 
> When grouping-pages-by-arbitary-order goes in, there will be a value 
> available called pageblock_order and nr_pages_pageblock which will be 
> identical to these two values.
> 
ok.


> All in all, I like this implementation. I found it nice and relatively 
> straight-forward to read. Thanks
> 
Thank you for the review. I'll reflect your comments in the next version.

-Kame
