From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: Update: [Automatic] NUMA replicated pagecache on 2.6.23-rc4-mm1
From: Lee Schermerhorn
In-Reply-To: <46E74679.9020805@linux.vnet.ibm.com>
References: <20070727084252.GA9347@wotan.suse.de>
	 <1186604723.5055.47.camel@localhost>
	 <1186780099.5246.6.camel@localhost>
	 <20070813074351.GA15609@wotan.suse.de>
	 <1189543962.5036.97.camel@localhost>
	 <46E74679.9020805@linux.vnet.ibm.com>
Content-Type: text/plain
Date: Wed, 12 Sep 2007 09:48:47 -0400
Message-Id: <1189604927.5004.12.camel@localhost>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: balbir@linux.vnet.ibm.com
Cc: Nick Piggin, Linux Memory Management List, Joachim Deguara,
	Christoph Lameter, Mel Gorman, Eric Whitney
List-ID:

On Wed, 2007-09-12 at 07:22 +0530, Balbir Singh wrote:
> Lee Schermerhorn wrote:
> > [Balbir: see notes re: replication and memory controller below]
> >
> > A quick update: I have rebased the automatic/lazy page migration
> > and replication patches to 23-rc4-mm1. If interested, you can find
> > the entire series that I push in the '070911' tarball at:
> >
> > http://free.linux.hp.com/~lts/Patches/Replication/
> >
> > I haven't gotten around to some of the things you suggested to
> > address the soft lockups, etc. I just wanted to keep the patches
> > up to date.
> >
> > In the process of doing a quick sanity test, I encountered an
> > issue with replication and the new memory controller patches. I
> > had built the kernel with the memory controller enabled. I hit a
> > panic in reclaim, while attempting to "drop caches", because
> > replication was not "charging" the replicated pages, and reclaim
> > tried to dereference a NULL "page_container" pointer. [!!! new
> > member in the page struct !!!]
> >
> > I added code to try_to_create_replica(), __remove_replicated_page()
> > and release_pcache_desc() to charge/uncharge where I thought
> > appropriate [replication patch #02].
> > That seemed to solve the panic during drop_caches-triggered
> > reclaim. However, when I tried a more stressful load, I hit
> > another panic ["NaT Consumption" == ia64-ese for invalid pointer
> > deref, I think] in shrink_active_list(), called from direct
> > reclaim. Still to be investigated. I wanted to give you and Balbir
> > a heads up about the interaction of memory controllers with page
> > replication.

> Hi, Lee,
>
> Thanks for testing the memory controller with page replication. I do
> have some questions on the problem you are seeing.
>
> Did you see the problem with direct reclaim or container reclaim?
> drop_caches calls remove_mapping(), which should eventually call
> the uncharge routine. We have some sanity checks in there.

Sorry, this one wasn't in reclaim. It was in the fault path, via
activate_page(). The bug in reclaim occurred after I "fixed" page
replication to charge for replicated pages, thus adding the
page_container. The second panic resulted from a bad pointer
dereference in shrink_active_list(), called from direct reclaim.
[Abbreviated] stack traces are attached below.

I took a look at an assembly language objdump, and it appears that
the bad pointer dereference occurred in the
"while (!list_empty(&l_inactive))" loop. I see that there is also a
mem_container_move_lists() call there. I will try to rerun the
workload on an unpatched 23-rc4-mm1 today to see whether it is
reproducible there. I can believe that this is a race between
replication [possibly "unreplicate"] and vmscan. I don't know what
type of protection, if any, we have against that.

> We do check in several places whether page->page_container is NULL.
> I'll look at your patches to see if there are any changes to the
> reclaim logic. I tried looking for the oops you mentioned, but could
> not find it in your directory; I did see the soft lockup logs,
> though. Do you still have the oops saved somewhere?
> I think the fix you have is correct and makes things work, but it
> worries me that in direct reclaim we dereference the page_container
> pointer without the page belonging to a container. What are the
> properties of replicated pages? Are they assumed to be exact replicas
> of the replicated page (struct page mappings, with page_container
> expected to be the same for all replicated pages)?

Before the "fix":

Running the spol+lpm+repl patches on 23-rc4-mm1; kernel build test;

	echo 1 >/proc/sys/vm/drop_caches

Then [perhaps a coincidence]:

Unable to handle kernel NULL pointer dereference (address 0000000000000008)
cc1[23366]: Oops 11003706212352 [1]
Modules linked in: sunrpc binfmt_misc fan dock sg thermal processor
 container button sr_mod scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore
Pid: 23366, CPU 6, comm: cc1

 [] __mem_container_move_lists+0x50/0x100  sp=e0000720449a7d60 bsp=e0000720449a1040
 [] mem_container_move_lists+0x50/0x80     sp=e0000720449a7d60 bsp=e0000720449a1010
 [] activate_page+0x1d0/0x220              sp=e0000720449a7d60 bsp=e0000720449a0fd0
 [] mark_page_accessed+0xe0/0x160          sp=e0000720449a7d60 bsp=e0000720449a0fb0
 [] filemap_fault+0x390/0x840              sp=e0000720449a7d60 bsp=e0000720449a0f10
 [] __do_fault+0xd0/0xbc0                  sp=e0000720449a7d60 bsp=e0000720449a0e90
 [] handle_mm_fault+0x280/0x1540           sp=e0000720449a7d90 bsp=e0000720449a0e00
 [] ia64_do_page_fault+0x600/0xa80         sp=e0000720449a7da0 bsp=e0000720449a0da0
 [] ia64_leave_kernel+0x0/0x270            sp=e0000720449a7e30 bsp=e0000720449a0da0

After the "fix":

Running a "usex" [unix systems exerciser] load, with kernel build, io
tests, vm tests, memtoy "lock" tests, ...
as[15608]: NaT consumption 2216203124768 [1]
Modules linked in: sunrpc binfmt_misc fan dock sg container thermal
 button processor sr_mod scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore
Pid: 15608, CPU 8, comm: as

 [] ia64_leave_kernel+0x0/0x270            sp=e00007401f53fab0 bsp=e00007401f539238
 [] shrink_active_list+0x160/0xe80         sp=e00007401f53fc80 bsp=e00007401f539158
 [] shrink_zone+0x240/0x280                sp=e00007401f53fd40 bsp=e00007401f539100
 [] zone_reclaim+0x3c0/0x580               sp=e00007401f53fd40 bsp=e00007401f539098
 [] get_page_from_freelist+0xb30/0x1360    sp=e00007401f53fd80 bsp=e00007401f538f08
 [] __alloc_pages+0xd0/0x620               sp=e00007401f53fd80 bsp=e00007401f538e38
 [] alloc_page_pol+0x100/0x180             sp=e00007401f53fd90 bsp=e00007401f538e08
 [] alloc_page_vma+0xf0/0x120              sp=e00007401f53fd90 bsp=e00007401f538dc8
 [] handle_mm_fault+0x740/0x1540           sp=e00007401f53fd90 bsp=e00007401f538d38
 [] ia64_do_page_fault+0x600/0xa80         sp=e00007401f53fda0 bsp=e00007401f538ce0
 [] ia64_leave_kernel+0x0/0x270            sp=e00007401f53fe30 bsp=e00007401f538ce0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org