From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f49.google.com (mail-la0-f49.google.com [209.85.215.49]) by kanga.kvack.org (Postfix) with ESMTP id 5D3616B0082 for ; Mon, 18 May 2015 03:32:32 -0400 (EDT) Received: by laat2 with SMTP id t2so205859632laa.1 for ; Mon, 18 May 2015 00:32:31 -0700 (PDT) Received: from mail-la0-x22b.google.com (mail-la0-x22b.google.com. [2a00:1450:4010:c03::22b]) by mx.google.com with ESMTPS id yj8si6144281lab.53.2015.05.18.00.32.29 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 18 May 2015 00:32:30 -0700 (PDT) Received: by labbd9 with SMTP id bd9so206372633lab.2 for ; Mon, 18 May 2015 00:32:29 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20150514093304.GS2462@suse.de> Date: Mon, 18 May 2015 00:32:29 -0700 Message-ID: Subject: Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!) From: Haren Myneni Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Haren Myneni , aneesh.kumar@linux.vnet.ibm.com, srikar@linux.vnet.ibm.com Mel, I am hitting this issue with 4.0 kernel and even with 3.19 and 3.17 kernels. I will also try with previous versions. Please let me know any suggestions on the debugging. Thanks Haren On 5/14/15, Haren Myneni wrote: > On 5/14/15, Mel Gorman wrote: >> On Wed, May 13, 2015 at 01:17:54AM -0700, Haren Myneni wrote: >>> Hi, >>> >>> I am getting BUG_ON in migration_entry_to_page() with 4.1.0-rc2 >>> kernel on powerpc system which has 512 CPUs (64 cores - 16 nodes) and >>> 1.6 TB memory. We can easily recreate this issue with kernel compile >>> (make -j500). But I could not reproduce with numa_balancing=disable. >>> >> >> Is this patched in any way? I ask because line 134 on 4.1.0-rc2 does not >> match up with a BUG_ON. It's close to a PageLocked check but I want to >> be sure there are no other modifications. > > Mel, Thanks for your help. I added some printks and dump_page() to get > the page struct and swp_entry information. > >> >> Otherwise, when was the last time this worked? Was 4.0 ok? As it can be >> easily reproduced, can the problem be bisected please? > > I did not try previous versions other than RHEL kernel (3.10.*). I > will try with previous versions. > > In the failure case, also noticed pte and address values are matched > in try_to_unmap_one() and remove_migration_pte(), but entry > (swp_entry_t) value is different. So looks like page strut address in > migration_entry_to_page() is not valid. > > try_to_unmap_one() > { > > ... > } else if (IS_ENABLED(CONFIG_MIGRATION)) { > /* > * Store the pfn of the page in a special migration > * pte. do_swap_page() will wait until the > migration > * pte is removed and then restart fault handling. > */ > BUG_ON(!(flags & TTU_MIGRATION)); > entry = make_migration_entry(page, > pte_write(pteval)); > } > swp_pte = swp_entry_to_pte(entry); > if (pte_soft_dirty(pteval)) > swp_pte = pte_swp_mksoft_dirty(swp_pte); > set_pte_at(mm, address, pte, swp_pte); > > /*pte=0xb16b8d0f80000000 address=0x100008150000 > page=0xf000000513f3e1e0 entry=0x3e0000000ec5ae34 */ > ... > } > > remove_migration_pte() > { > ... > /* address=0x100008150000 pte=0xb16b8d0f80000000 > *old=0xf000000513f3e1e0 */ > if (!is_migration_entry(entry) || > migration_entry_to_page(entry) != old) > goto unlock; > ... > } > > migration_entry_to_page() { > pte=0xb16b8d0f80000000 entry=0x3e00000002c5ae34 > page=0xf0000000f3f3e1e0 > } > > > Thanks > Haren > >> >> -- >> Mel Gorman >> SUSE Labs >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org