From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46]) by kanga.kvack.org (Postfix) with ESMTP id C2B3F6B0032 for ; Wed, 1 Apr 2015 16:09:46 -0400 (EDT)
Received: by pactp5 with SMTP id tp5so61778664pac.1 for ; Wed, 01 Apr 2015 13:09:46 -0700 (PDT)
Received: from mail-pd0-x233.google.com (mail-pd0-x233.google.com. [2607:f8b0:400e:c02::233]) by mx.google.com with ESMTPS id fa2si4301854pbd.12.2015.04.01.13.09.45 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 01 Apr 2015 13:09:45 -0700 (PDT)
Received: by pdea3 with SMTP id a3so14123153pde.3 for ; Wed, 01 Apr 2015 13:09:45 -0700 (PDT)
Date: Wed, 1 Apr 2015 13:09:32 -0700 (PDT)
From: Hugh Dickins
Subject: Re: kernel 3.18.10: THP refcounting bug
In-Reply-To: <20150401134132.GB17886@node.dhcp.inet.fi>
Message-ID:
References: <551BBE1A.4040404@profihost.ag> <20150401113122.GA17153@node.dhcp.inet.fi> <551BDC4F.4010000@profihost.ag> <20150401134132.GB17886@node.dhcp.inet.fi>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Kirill A. Shutemov"
Cc: Stefan Priebe - Profihost AG , linux-mm@kvack.org, sasha.levin@oracle.com, Hugh Dickins , Konstantin Khlebnikov

On Wed, 1 Apr 2015, Kirill A. Shutemov wrote:
> On Wed, Apr 01, 2015 at 01:53:51PM +0200, Stefan Priebe - Profihost AG wrote:
> > Hi,
> >
> > while using 3.18.9 i got several times the following stack trace:
> >
> > kernel BUG at mm/filemap.c:203!
> > invalid opcode: 0000 [#1] SMP
> > Modules linked in: dm_mod netconsole usbhid sd_mod sg ata_generic
> > virtio_net virtio_scsi uhci_hcd ehci_hcd usbcore virtio_pci usb_common
> > virtio_ring ata_piix virtio floppy
> > CPU: 3 PID: 1 Comm: busybox Tainted: G B 3.18.9 #1

The "B" in that Tainted string means that earlier Stefan got a "Bad page"
report. Please look for that in /var/log/messages, and post us what it said.
It's not at all surprising to hit a BUG_ON(page_mapped(page)) after we
already know that the page refcounting is messed up; though of course
it's possible that the two are unrelated (and perhaps weeks apart).

> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
> > task: ffff880137b98000 ti: ffff880137b94000 task.ti: ffff880137b94000
> > RIP: 0010:[] []
> > __delete_from_page_cache+0x2b5/0x2c0
> > RSP: 0018:ffff880137b97be8 EFLAGS: 00010046
> > RAX: 0000000000000000 RBX: 0000000000000003 RCX: 00000000ffffffd0
> > RDX: 0000000000000030 RSI: 000000000000000a RDI: ffff88013f9696c0
> > RBP: ffff880137b97c38 R08: 0000000000000000 R09: ffffea0002e927c0
> > R10: ffff8800bba92da0 R11: ffff880137b97c00 R12: ffffea0002e92480
> > R13: ffff8800bba8c4c8 R14: 0000000000000000 R15: ffff8800bba8c4d0
> > FS: 00007f5a79e0b700(0000) GS:ffff880139060000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000023c3138 CR3: 00000000b84ba000 CR4: 00000000000006e0
> > Stack:
> > 000000000000000e ffff880137b97d48 ffff8800bba92da0 ffff8800bba92dc8
> > ffff880137b97c68 ffffea0002e92480 ffff8800bba8c4c8 0000000000000000
> > 0000000000000000 0000000000000000 ffff880137b97c68 ffffffff81134604
> > Call Trace:
> > [] delete_from_page_cache+0x44/0x70
> > [] truncate_inode_page+0x5b/0x90
> > [] truncate_inode_pages_range+0x1a4/0x6c0
> > [] truncate_inode_pages+0x15/0x20
> > [] truncate_inode_pages_final+0x3c/0x50
> > [] evict+0x16c/0x180
> > [] iput+0x105/0x190
> > [] do_unlinkat+0x189/0x2b0
> > [] SyS_unlink+0x16/0x20
> > [] system_call_fastpath+0x12/0x17
> > Code: 66 0f 1f 44 00 00 48 8b 75 c0 4c 89 ff e8 e4 5d 1f 00 84 c0 0f 85
> > 5e fe ff ff e9 41 fe ff ff 0f 1f 80 00 00 00 00 e8 75 70 4b 00 <0f> 0b
> > 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 83 e2 fd 48
> > RIP [] __delete_from_page_cache+0x2b5/0x2c0
> > RSP
> > ---[ end trace a4727cb71335dbd4 ]---
> >
> > Is this a known bug?
>
> +Hugh, Konstantin.
>
> Nothing I recognize. Looks somewhat like[1], but not really.
>
> Do you have a way to reproduce? What fs it was?
>
> [1] lkml.kernel.org/g/20140603042121.GA27177@redhat.com

I put a lot of thought into that one, but never found a convincing answer.
Either it went away, or Dave grew tired of re-reporting it and getting no fix.

For an instant, I wondered if your recent discovery of page mapcount
being used for two purposes on a compound tail could account for these;
but I don't think so, Stefan's stacktrace shows we're dealing with an
ordinary filesystem page, which should be neither compound nor tail.

Hugh
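
The request above to look for the earlier "Bad page" report in /var/log/messages could be carried out with something like the sketch below. It is only an illustration under stated assumptions: the helper names find_bad_page_reports and tainted_by_bad_page are hypothetical, the pattern assumes the report starts with "BUG: Bad page state" or "BUG: Bad page map" as kernels of this era print it, and the taint check assumes the "B" flag is bit 5 of the mask exposed in /proc/sys/kernel/tainted.

```python
import re
from typing import Iterable, List

# Assumed report prefixes; adjust if your kernel words them differently.
BAD_PAGE = re.compile(r"BUG: Bad page (state|map)")

# Assumed bit position of the "B" (bad page) taint flag.
TAINT_BAD_PAGE_BIT = 5


def find_bad_page_reports(lines: Iterable[str], context: int = 10) -> List[List[str]]:
    """Return each matching line together with up to `context` lines after it."""
    buffered = list(lines)
    reports = []
    for index, line in enumerate(buffered):
        if BAD_PAGE.search(line):
            reports.append(buffered[index:index + context + 1])
    return reports


def tainted_by_bad_page(taint_mask: int) -> bool:
    """True if the numeric taint mask has the bad-page bit set."""
    return bool(taint_mask & (1 << TAINT_BAD_PAGE_BIT))
```

To use it, pass the lines of /var/log/messages (or of a rotated copy, since the report may be weeks old) to find_bad_page_reports and post each returned block in full; the lines that follow the "Bad page" line are the part that says what was wrong with the page.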