From: Matthew Wilcox
To: Thorsten Leemhuis
Cc: LKML, Linux-MM, Andrew Morton, Mikhail Pletnev
Date: Thu, 3 Nov 2022 16:21:26 +0000
Subject: Re: [REGRESSION] Bug 216646 - having TRANSPARENT_HUGEPAGE enabled hangs some applications
References: <37cd0a8d-bbd1-baf3-9c37-0cb8325b4cb3@leemhuis.info>
In-Reply-To: <37cd0a8d-bbd1-baf3-9c37-0cb8325b4cb3@leemhuis.info>

On Thu, Nov 03, 2022 at 02:51:48PM +0100, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
>
> Matthew, I noticed a regression report in bugzilla.kernel.org. As many
> (most?) kernel developer don't keep an eye on it, I decided to forward
> it by mail. Quoting from

Thanks, Thorsten.  I had no idea this issue had been filed.  The sooner
kernel bug tracking switches to something useful like debbugs, the better.

> https://bugzilla.kernel.org/show_bug.cgi?id=216646 :
>
> > Mikhail Pletnev 2022-11-01 02:43:59 UTC
> >
> > Created attachment 303112 [details]
> > dmesg error
> >
> > After updating kernel past 5.17 (checked in 5.19, 6.06), deluge torrent client began to hang after 1-4 hours of runtime, (when under heavy load - thousands of files mmapped and read at 20+MB/s) with following message in dmesg:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000096
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 15 PID: 8263 Comm: Disk Not tainted 5.17.0-rc4_ap-00165-g56a4d67c264e-dirty #36
> > Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022
> > RIP: 0010:__filemap_get_folio+0x9e/0x350
> > Code: 10 e8 46 06 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28
> > RSP: 0000:ffffbe1044ad3cb0 EFLAGS: 00010246
> > RAX: 0000000000000062 RBX: 0000000000000062 RCX: 0000000000000002
> > RDX: 000000000000001c RSI: ffffbe1044ad3cc0 RDI: ffff9fca83239ff0
> > RBP: 0000000000000000 R08: ffffbe1044ad3d40 R09: 0000000000000000
> > R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
> > R13: ffff9fcbee9efa78 R14: 000000000004285e R15: fff000003fffffff
> > FS: 00007f0a763fc640(0000) GS:ffff9fd23edc0000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000096 CR3: 0000000122c60000 CR4: 0000000000750ee0

Very interesting.  The code disassembles to:

   9:	48 3d 02 04 00 00	cmp    $0x402,%rax
   f:	74 e2			je     0xfffffffffffffff3
  11:	48 3d 06 04 00 00	cmp    $0x406,%rax
  17:	74 da			je     0xfffffffffffffff3
  19:	48 85 c0		test   %rax,%rax
  1c:	0f 84 3e 02 00 00	je     0x260
  22:	a8 01			test   $0x1,%al
  24:	0f 85 40 02 00 00	jne    0x26a
  2a:*	8b 40 34		mov    0x34(%rax),%eax		<-- trapping instruction
  2d:	85 c0			test   %eax,%eax
  2f:	74 c2			je     0xfffffffffffffff3

which I recognise as this part of mapping_get_entry() (must have been
inlined into __filemap_get_folio() -- in future, it helps enormously
if you can run the trace through scripts/decode_stacktrace.sh)

	folio = xas_load(&xas);
	if (xas_retry(&xas, folio))
		goto repeat;
	/*
	 * A shadow entry of a recently evicted page, or a swap entry from
	 * shmem/tmpfs.  Return it without attempting to raise page count.
	 */
	if (!folio || xa_is_value(folio))
		goto out;

	if (!folio_try_get_rcu(folio))
		goto repeat;

The trap happens when we attempt to load from offset 0x34 of rax -- the
refcount field of struct folio.  And rax is 0x62 instead of being a
valid pointer.
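
For anyone following along, the constants in the disassembly can be checked by hand. What follows is a minimal userspace sketch, not kernel code: the `sk_` helpers are hypothetical names that mirror my reading of the entry encoding in include/linux/xarray.h, and the 64-slot node size is an assumption (the usual configuration). It shows that 0x402 and 0x406 are the retry and zero entries, that 0x62 decodes as a sibling entry for slot 24, and that 0x62 + 0x34 gives the faulting address 0x96 reported in CR2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64 slots per XArray node (XA_CHUNK_SHIFT == 6). */
#define SK_CHUNK_SIZE 64UL

/* Offset taken from the trapping "mov 0x34(%rax),%eax" in the oops. */
#define SK_REFCOUNT_OFFSET 0x34UL

/* Internal entries carry the bit pattern 10 in their two low bits. */
static inline uintptr_t sk_mk_internal(unsigned long v)
{
	return ((uintptr_t)v << 2) | 2;
}

/* Value (shadow/swap) entries have the low bit set: "test $0x1,%al". */
static inline bool sk_is_value(uintptr_t entry)
{
	return entry & 1;
}

static inline bool sk_is_internal(uintptr_t entry)
{
	return (entry & 3) == 2;
}

/* Sibling entries are the internal entries that encode a slot number. */
static inline bool sk_is_sibling(uintptr_t entry)
{
	return sk_is_internal(entry) &&
	       entry < sk_mk_internal(SK_CHUNK_SIZE - 1);
}

/* Slot that a sibling entry refers to. */
static inline unsigned long sk_sibling_slot(uintptr_t entry)
{
	assert(sk_is_sibling(entry));
	return entry >> 2;
}

/* Address touched when the entry is misused as a struct folio pointer. */
static inline uintptr_t sk_fault_address(uintptr_t entry)
{
	return entry + SK_REFCOUNT_OFFSET;
}
```

With these helpers, `sk_mk_internal(256)` and `sk_mk_internal(257)` reproduce the two `cmp` immediates, and `sk_fault_address(0x62)` reproduces CR2.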
This should not be possible; 0x62 is used to represent a "sibling entry"
in the XArray that underlies the page cache.  xas_descend() checks if you
hit a sibling entry, and if you did, it loads the canonical entry instead.
The only way I can see this happening is if there's a sibling entry
pointing to another sibling entry.

What I don't know is whether your machine is experiencing a temporary
glitch in the tree (because it's RCU-protected, it might be observing a
store in progress) or whether it has a corrupted tree where one sibling
entry is pointing to another and this will be observable by any future
load (until something happens to overwrite these entries in the cache).

> > git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40

I suspect this is where your bisection went astray.  This commit should
have been marked bad, and marking it good led you to the wrong commit.
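
To make the sibling-to-sibling scenario above concrete, here is a simplified userspace model of one tree node. It is a sketch under stated assumptions, not the kernel's xas_descend(): the `xs_` names are hypothetical, the node is a bare array of 64 slots, and a sibling is resolved exactly once, as described above. In a healthy node every sibling points at the canonical slot, so one step is enough. If a sibling points at another sibling, the lookup hands back a sibling entry such as 0x62 to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XS_SLOTS 64u	/* assumption: 64 slots per node */

/* Hypothetical stand-in for a tree node: just the slot array. */
struct xs_node {
	uintptr_t slots[XS_SLOTS];
};

/* A sibling entry encodes the slot it refers to, tagged with 10b. */
static inline uintptr_t xs_mk_sibling(unsigned int slot)
{
	return ((uintptr_t)slot << 2) | 2;
}

static inline bool xs_is_sibling(uintptr_t entry)
{
	return (entry & 3) == 2 && entry < xs_mk_sibling(XS_SLOTS - 1);
}

/* Load a slot; if it holds a sibling entry, follow it exactly once. */
static inline uintptr_t xs_descend(const struct xs_node *node,
				   unsigned int offset)
{
	uintptr_t entry;

	assert(offset < XS_SLOTS);
	entry = node->slots[offset];
	if (xs_is_sibling(entry))
		entry = node->slots[entry >> 2];
	return entry;
}

/* Healthy layout: canonical entry in slot 24, siblings in 25..27. */
static inline void xs_fill_healthy(struct xs_node *node, uintptr_t folio)
{
	unsigned int i;

	for (i = 0; i < XS_SLOTS; i++)
		node->slots[i] = 0;
	node->slots[24] = folio;
	for (i = 25; i < 28; i++)
		node->slots[i] = xs_mk_sibling(24);
}
```

In this model, overwriting slot 24 with a sibling entry (or pointing slot 26 at slot 25) makes `xs_descend()` return a value for which `xs_is_sibling()` is still true. Whether the real tree got into that state transiently or permanently is exactly the open question.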