linux-mm.kvack.org archive mirror
* problems with memory hotplug/remove on 3.0.1
@ 2011-10-18 22:27 Larry Bassel
  2011-10-20  3:03 ` Larry Bassel
  0 siblings, 1 reply; 2+ messages in thread
From: Larry Bassel @ 2011-10-18 22:27 UTC
  To: linux-mm; +Cc: kparsha, vgandhi

We have encountered two problems with memory hotplug/hotremove
in 3.0.1 -- this is a port of memory hotplug to ARM with a few
small changes noted below.

Neither of these occurred on a similar 2.6.38-based port
we did to the same hardware.

The memory is essentially two 512M memory banks: the lower
is always on, and the upper is the one we are powering on
and off. ARCH_POPULATES_NODE_MAP was ported to ARM,
and a small change was made to ensure that
the movable zone could be placed exactly where desired
(movablecore= does not allow this, and it must be specified
on the command line, whereas we don't know where the movable
zone must be until the kernel starts coming up).
Also, the upper 512M is forced to be highmem, as
the movable zone must come from the highest physical
memory zone (of course highmem may be larger than
512M, just not smaller).
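
For readers who want the layout arithmetic spelled out, here is a
minimal sketch. It is not code from the port: the struct and
function names are hypothetical, sizes are byte offsets from the
start of RAM, and it only assumes what is described above (1024M
total, movable zone carved out of the top of highmem).

```c
#define MB (1024UL * 1024UL)

/* Hypothetical description of the resulting zones; sizes in bytes. */
struct zone_sizes {
    unsigned long normal;   /* lowmem below the highmem boundary */
    unsigned long highmem;  /* highmem left over after the movable zone */
    unsigned long movable;  /* movable zone (the hot-removable bank) */
};

/*
 * Split total_ram into zones, given the offset at which highmem starts
 * and the size of the movable zone taken from the top of highmem.
 * Returns 0 on success, or -1 if the movable zone does not fit
 * inside highmem.
 */
static int split_zones(unsigned long total_ram, unsigned long highmem_start,
                       unsigned long movable_size, struct zone_sizes *out)
{
    if (highmem_start > total_ram ||
        movable_size > total_ram - highmem_start)
        return -1;

    out->normal  = highmem_start;
    out->highmem = total_ram - highmem_start - movable_size;
    out->movable = movable_size;
    return 0;
}
```

With total_ram = 1024 * MB, highmem_start = 512 * MB and
movable_size = 512 * MB, the leftover highmem is zero bytes.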

1. If highmem is set to start at exactly 512M, then
all of highmem is used up when forming the movable
zone. This seems to confuse the memory management
subsystem (page reclaim?): although the memory
hotremove of the upper 512M succeeds, running a command
that takes a page fault after the hotremove causes
the system to hang in this call chain (innermost first):

try_to_free_pages
__alloc_pages_nodemask
do_wp_page
handle_pte_fault
handle_mm_fault
do_page_fault

try_to_free_pages() is called repeatedly (forever), making no
apparent progress. After some experimentation, I
discovered that making the highmem zone at least 5M
larger than the 512M movable zone appears to make the
problem disappear.
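
As a worked example of the workaround, the sketch below computes
where highmem would have to start to leave a given amount of slack
below the movable zone. The helper names are hypothetical and the
4K page size is an assumption; only the 512M and 5M figures come
from the description above.

```c
#define MB (1024UL * 1024UL)
#define ASSUMED_PAGE_SIZE 4096UL  /* assumption: 4K pages */

/*
 * Offset (from the start of RAM) at which highmem must begin so that
 * it is `slack` bytes larger than the movable zone at the top of RAM.
 * Returns 0 if the movable zone plus slack does not fit in total_ram.
 */
static unsigned long highmem_start_with_slack(unsigned long total_ram,
                                              unsigned long movable_size,
                                              unsigned long slack)
{
    if (movable_size > total_ram || slack > total_ram - movable_size)
        return 0;
    return total_ram - movable_size - slack;
}

/* Number of whole pages in `bytes`, under the assumed page size. */
static unsigned long bytes_to_pages(unsigned long bytes)
{
    return bytes / ASSUMED_PAGE_SIZE;
}
```

For 1024M of RAM, a 512M movable zone and 5M of slack, this puts the
start of highmem at 507M, leaving 1280 pages of highmem outside the
movable zone if pages are 4K.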

If I don't run anything that provokes the above bug,
I can hotplug the 512M back in, and after that this
problem does not occur.

I've seen some discussion about very small zones causing
problems. Is what we are seeing a known problem?
Is there a known fix (or at least a patch we could try)?

2. Assuming the workaround we have for #1 is present,
we see memory hotremove occasionally fail. After a few
seconds, this seems to corrupt init's state, provoking
a panic; sometimes (but not always) init's PC is 0.
Sometimes additional (not always the same) processes
also exit unexpectedly after the memory hotremove attempt.

Thanks in advance for any insight you might have.

Larry Bassel

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/


* Re: problems with memory hotplug/remove on 3.0.1
  2011-10-18 22:27 problems with memory hotplug/remove on 3.0.1 Larry Bassel
@ 2011-10-20  3:03 ` Larry Bassel
  0 siblings, 0 replies; 2+ messages in thread
From: Larry Bassel @ 2011-10-20  3:03 UTC
  To: Larry Bassel; +Cc: linux-mm, kparsha, vgandhi

On 18 Oct 11 15:27, Larry Bassel wrote:
> We have encountered two problems with memory hotplug/hotremove
> in 3.0.1 [...]

Sorry to reply to my own post, but the second problem
was due to an error on our part -- I still believe
the first one is real and would appreciate help with it.

Thanks.

Larry


