Date: Wed, 19 Oct 2011 20:03:00 -0700
From: Larry Bassel
Subject: Re: problems with memory hotplug/remove on 3.0.1
Message-ID: <20111020030300.GB3841@labbmf-linux.qualcomm.com>
References: <20111018222756.GA3841@labbmf-linux.qualcomm.com>
In-Reply-To: <20111018222756.GA3841@labbmf-linux.qualcomm.com>
To: Larry Bassel
Cc: linux-mm@kvack.org, kparsha@codeaurora.org, vgandhi@codeaurora.org

On 18 Oct 11 15:27, Larry Bassel wrote:
> We have encountered two problems with memory hotplug/hotremove
> in 3.0.1 -- this is a port of memory hotplug to ARM with a few
> small changes noted below.
>
> Neither of these occurred on a similar 2.6.38-based port
> we did to the same hardware.
>
> The memory is essentially two 512M memory banks: the lower
> one is always on, and the upper one is the one we are powering
> on and off. ARCH_POPULATES_NODE_MAP was ported to ARM,
> and a small change was made to ensure that
> the movable zone could be placed exactly where desired
> (movablecore= does not allow this, and it must be specified on
> the command line -- we don't know where the movable
> zone must be until the kernel starts coming up).
> Also, the upper 512M is forced to be highmem, because
> the movable zone must come from the highest physical
> memory zone (of course highmem may be larger than
> 512M, just not smaller).
>
> 1. If highmem is set to start at exactly 512M, then
> all of highmem is used up when forming the movable
> zone. This seems to confuse the memory management
> subsystem (page reclaim?)
> because although the memory
> hotremove of the upper 512M succeeds, running a command
> that takes a page fault after hotremove causes
> the system to hang:
>
>   try_to_free_pages
>   __alloc_pages_nodemask
>   do_wp_page
>   handle_pte_fault
>   handle_mm_fault
>   do_page_fault
>
> try_to_free_pages() is called repeatedly (forever), making no
> apparent progress. After some experimentation, I
> discovered that making the highmem zone at least 5M
> larger than the 512M movable zone appears to make the
> problem disappear.
>
> I can (if I don't run anything that provokes the
> above bug) hotplug the 512M back in, and then this
> problem does not occur.
>
> I've seen some discussion about very small zones causing
> problems. Is what we are seeing a known problem?
> Is there a known fix (or at least a patch we could try)?
>
> 2. Assuming the workaround we have for #1 is present,
> we occasionally see memory hotremove fail. After a few
> seconds, this seems to corrupt init's state, provoking
> a panic -- sometimes (but not always) init's PC is 0.
> Other processes (not always the same ones) sometimes
> also exit unexpectedly after the memory hotremove attempt.

Sorry to reply to my own post, but the second problem was due
to an error on our part. I still believe the first one is real
and would appreciate help with it.

Thanks.

Larry

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
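A minimal sketch of the layout constraint behind the workaround for #1, for anyone trying to reproduce it. It assumes addresses are offsets from the start of RAM and that the movable zone is carved from the top of highmem starting at `movable_start`. The helper names are hypothetical, and the margin check only encodes the empirical ~5M figure reported above; it is not a known kernel limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Bytes of highmem lying below the movable zone, i.e. the highmem that
 * remains outside ZONE_MOVABLE once the movable zone is formed.
 * Assumes the movable zone is carved from the top of highmem, so it must
 * start at or above the start of highmem. */
static uint64_t non_movable_highmem(uint64_t highmem_start, uint64_t movable_start)
{
    assert(movable_start >= highmem_start);
    return movable_start - highmem_start;
}

/* Hypothetical check mirroring the empirical workaround: require at least
 * `margin` bytes of highmem to stay outside the movable zone. */
static bool layout_has_margin(uint64_t highmem_start, uint64_t movable_start,
                              uint64_t margin)
{
    return non_movable_highmem(highmem_start, movable_start) >= margin;
}
```

With the layout described above, `layout_has_margin(512 * MiB, 512 * MiB, 5 * MiB)` is false (highmem is entirely movable, the case that hangs), while `layout_has_margin(507 * MiB, 512 * MiB, 5 * MiB)` is true.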
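For reference, a sketch of how the upper bank maps onto memory blocks under /sys/devices/system/memory, the sysfs interface normally used to offline and online memory (by writing `offline` or `online` to `memoryN/state`). It assumes N is the block's physical start address divided by the value in `block_size_bytes` (which is read as hex) and that the bank is block-aligned. The helper names, the 128M block size, and the bank start of 512M in the example are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Inclusive range of memory block ids covering one bank. */
struct block_range {
    uint64_t first;
    uint64_t last;
};

/* Assumes block id N (as in memoryN) is the block's physical start
 * address divided by the memory block size. */
static struct block_range bank_blocks(uint64_t bank_start, uint64_t bank_size,
                                      uint64_t block_size)
{
    assert(block_size != 0 && bank_size != 0);
    /* The bank must start and end on block boundaries. */
    assert(bank_start % block_size == 0 && bank_size % block_size == 0);
    struct block_range r = {
        .first = bank_start / block_size,
        .last = (bank_start + bank_size) / block_size - 1,
    };
    return r;
}

/* Write the sysfs state path for block id n into buf; returns the
 * length snprintf reports (truncated if len is too small). */
static int block_state_path(char *buf, size_t len, uint64_t n)
{
    return snprintf(buf, len, "/sys/devices/system/memory/memory%llu/state",
                    (unsigned long long)n);
}
```

With a 128M block size and the upper 512M bank starting at 512M, `bank_blocks()` returns blocks 4 through 7, and `block_state_path()` for block 4 gives `/sys/devices/system/memory/memory4/state`.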