Re: [PATCH] hibernate / memory hotplug: always use for_each_populated_zone()


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Nigel Cunningham <ncunningham@crca.org.au>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Yasunori Goto <y-goto@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>,
	linux-mm@kvack.org
Subject: Re: [PATCH] hibernate / memory hotplug: always use for_each_populated_zone()
Date: Wed, 22 Jul 2009 09:25:35 +0900
Message-ID: <20090722092535.5eac1ff6.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <200907211611.09525.rjw@sisk.pl>

On Tue, 21 Jul 2009 16:11:08 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Tuesday 21 July 2009, KAMEZAWA Hiroyuki wrote:
> > On Tue, 21 Jul 2009 09:15:08 +0200
> > Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > 
> > > On Tue, Jul 21, 2009 at 07:29:58AM +1000, Nigel Cunningham wrote:
> > > > Hi.
> > > > 
> > > > Gerald Schaefer wrote:
> > > > > From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> > > > > 
> > > > > Use for_each_populated_zone() instead of for_each_zone() in hibernation
> > > > > code. This fixes a bug on s390, where we allow both config options
> > > > > HIBERNATION and MEMORY_HOTPLUG, so that we also have a ZONE_MOVABLE
> > > > > here. We only allow hibernation if no memory hotplug operation was
> > > > > performed, so in fact both features can only be used exclusively, but
> > > > > this way we don't need 2 differently configured (distribution) kernels.
> > > > > 
> > > > > If we have an unpopulated ZONE_MOVABLE, we allow hibernation but run
> > > > > into a BUG_ON() in memory_bm_test/set/clear_bit() because hibernation
> > > > > code iterates through all zones, not only the populated zones, in
> > > > > several places. For example, swsusp_free() does for_each_zone() and
> > > > > then checks for pfn_valid(), which is true even if the zone is not
> > > > > populated, resulting in a BUG_ON() later because the pfn cannot be
> > > > > found in the memory bitmap.
> > > > 
> > > > I agree with your logic and patch, but doesn't this also imply that the
> > > > s390 implementation pfn_valid should be changed to return false for
> > > > those pages?
> > > 
> > > For CONFIG_SPARSEMEM, which s390 uses, there is no architecture specific
> > > pfn_valid() implementation.
> > > Also it looks like the semantics of pfn_valid() aren't clear.
> > > At least for sparsemem it means nothing but "the memmap for the section
> > > this page belongs to exists". So it just means the struct page for the
> > > pfn exists.
> > 
> > Historically, pfn_valid() just means "there is a memmap." no other meanings
> > in any configs/archs.
> 
> Is this documented anywhere actually?
> 
When I helped develop SPARSEMEM, I googled and found that Linus had said so ;)
But from the implementation it's a very clear fact. See CONFIG_FLATMEM, the simplest
implementation of memmap. It uses a contiguous mem_map regardless of memory holes,
and pfn_valid() returns true if pfn < max_mapnr:
#define pfn_valid(pfn)          ((pfn) < max_mapnr)
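
To make the point concrete, here is a minimal user-space sketch of that
FLATMEM behaviour. It is not kernel code: the value of max_mapnr, the hole
layout and the helper model_pfn_has_ram() are all assumptions made up for
illustration. It only shows that pfn_valid() stays true for a pfn that sits
in a memory hole.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of FLATMEM: one flat memmap covers pfns
 * [0, max_mapnr), including any holes in physical memory. */
unsigned long max_mapnr = 1024;	/* illustrative value only */

/* Same shape as the FLATMEM definition quoted above. */
#define pfn_valid(pfn)          ((pfn) < max_mapnr)

/* Hypothetical layout: pfns [256, 512) are a hole with no RAM behind them. */
bool model_pfn_has_ram(unsigned long pfn)
{
	assert(pfn_valid(pfn));	/* only meaningful inside the memmap */
	return pfn < 256 || pfn >= 512;
}
```

In this model pfn 300 is "valid" although there is no memory behind it,
which is exactly why pfn_valid() alone cannot answer the hibernation question.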



> > > We still have pfn_present() for CONFIG_SPARSEMEM. But that just means
> > > "some pages in the section this pfn belongs to are present."
> > 
> > It just exists for sparsemem internal purpose IIUC.
> > 
> > 
> > > So it looks like checking for pfn_valid() and afterwards checking
> > > for PG_Reserved (?) might give what one would expect.
> > I think so, too. If memory is offline, PG_reserved is always set.
> > 
> > In general, it's expected that "page is contiguous in MAX_ORDER range"
> > and no memory holes in MAX_ORDER. In most case, PG_reserved is checked
> > for skipping not-existing memory.
> 
> PG_reserved is also set for kernel text, at least on some architectures, and
> for some other areas that we want to save.
> 
yes.

> > > Looks all a bit confusing to me.
> > > Or maybe it's just me who is confused? :)
> > > 
> > IIRC, there are no generic interface to know whether there is a physical page.
> 
> We need to know that for hibernation, though.
> 
> Well, there is a mechanism for marking address ranges that are never
> to be saved, but they need to be known during initialisation already.
> 
> > pfn_valid() is only for memmap and people have used
> > 	if (pfn_valid(pfn) && !PageReserved(page))
> > check.
> > But, hmm, If hibernation have to save PG_reserved memory, general solution is
> > use copy_user_page() and handle fault.
> 
> That's not exactly straightforward IMHO.
> 
See ia64's ia64_pfn_valid(). It uses __get_user() very effectively.
(I think the cost of this is small on any arch...)

int
ia64_pfn_valid (unsigned long pfn)
{
        char byte;
        struct page *pg = pfn_to_page(pfn);

        return     (__get_user(byte, (char __user *) pg) == 0)
                && ((((u64)pg & PAGE_MASK) == (((u64)(pg + 1) - 1) & PAGE_MASK))
                        || (__get_user(byte, (char __user *) (pg + 1) - 1) == 0));
}
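
The middle condition above only decides whether a second probe is needed:
if the struct page does not cross a page boundary, reading its first byte is
enough. A small user-space sketch of just that arithmetic follows. The 4096
byte page size and the names MODEL_PAGE_SIZE, MODEL_PAGE_MASK and
model_fits_in_one_page() are assumptions for illustration, not kernel
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL			/* assumed page size */
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* True if the object [addr, addr + size) lies within a single page, so
 * that probing its first byte also covers its last byte. */
bool model_fits_in_one_page(uint64_t addr, uint64_t size)
{
	assert(size > 0);
	return (addr & MODEL_PAGE_MASK) ==
	       ((addr + size - 1) & MODEL_PAGE_MASK);
}
```

When this returns false, the object straddles two pages and its last byte
has to be probed separately, which is what the second __get_user() does.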

Adding a function like this is not very hard.

bool can_access_physmem(unsigned long pfn)
{
	char byte;
	/* address of the page frame in the kernel direct mapping */
	char *pg = __va(pfn << PAGE_SHIFT);

	/* __get_user() returns 0 if the read did not fault */
	return __get_user(byte, (char __user *)pg) == 0;
}

and simple enough. But this may let you access a remapped device's memory...
So some range check will be required anyway.
Can we detect io-remapped ranges from the memmap or anything else?
(I think we'll have to skip PG_reserved pages...)
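
For reference, the usual pfn_valid() plus PG_reserved filter discussed in
this thread can be modelled in user space as below. This is only a sketch:
struct model_page, model_memmap and the choice of reserved pfns are
hypothetical stand-ins, not the kernel's struct page or page flags.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_PAGES 8UL

/* Hypothetical per-page state standing in for struct page flags. */
struct model_page {
	bool reserved;		/* models PG_reserved */
};

struct model_page model_memmap[MODEL_NR_PAGES] = {
	[2] = { .reserved = true },	/* e.g. offline memory */
	[5] = { .reserved = true },
};

/* Models pfn_valid(): "there is a memmap entry", nothing more. */
bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_NR_PAGES;
}

/* Models PageReserved(); must only be asked for pfns with a memmap. */
bool model_page_reserved(unsigned long pfn)
{
	assert(model_pfn_valid(pfn));
	return model_memmap[pfn].reserved;
}

/* Models: if (pfn_valid(pfn) && !PageReserved(page)) */
bool model_page_is_candidate(unsigned long pfn)
{
	return model_pfn_valid(pfn) && !model_page_reserved(pfn);
}
```

As noted earlier in the thread, PG_reserved is also set for areas that
hibernation wants to save, so this filter alone would skip too much.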

> > Alternative is making use of walk_memory_resource() as memory hotplug does.
> > It checks resource information registered.
> 
> I'd be fine with any _simple_ mechanism allowing us to check whether there's
> a physical page frame for given page (or given PFN).
> 

walk_memory_resource() is _simple_ enough, IMHO.
I'm currently removing the #ifdef CONFIG_MEMORY_HOTPLUG around
walk_memory_resource() in order to rewrite /proc/kcore.
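
The idea behind a resource-based check is roughly the following user-space
sketch. It does not reproduce the kernel's walk_memory_resource() interface;
struct model_ram_range, the model_ram table and model_pfn_is_ram() are
hypothetical names, and the two-bank layout is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered "System RAM" resource,
 * expressed as a half-open pfn range [start_pfn, end_pfn). */
struct model_ram_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Illustrative layout: two RAM banks with a hole between them. */
const struct model_ram_range model_ram[] = {
	{ 0,   256 },
	{ 512, 1024 },
};

/* Returns true if pfn lies inside any registered RAM range. */
bool model_pfn_is_ram(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(model_ram) / sizeof(model_ram[0]); i++) {
		assert(model_ram[i].start_pfn < model_ram[i].end_pfn);
		if (pfn >= model_ram[i].start_pfn &&
		    pfn < model_ram[i].end_pfn)
			return true;
	}
	return false;
}
```

Unlike pfn_valid(), the answer here depends only on what was registered as
RAM, so a pfn inside a hole is rejected even if a memmap entry exists for it.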

Thanks,
-Kame

