From: Vlastimil Babka
To: Oliver Sang, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, Mike Rapoport, Christoph Lameter, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox
Subject: Re: [linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h
Date: Fri, 6 Jan 2023 11:13:15 +0100
Message-ID: <3f7fa3b3-9623-5c4c-94b1-a41dea6eaaf2@suse.cz>
References: <202212312021.bc1efe86-oliver.sang@intel.com> <41276905-b8a5-76ae-8a17-a8ec6558e988@suse.cz>

On 1/5/23 02:46, Oliver Sang wrote:
> hi, Hyeonggon, hi, Vlastimil,
>
> On Wed, Jan 04, 2023 at 06:04:20PM +0900, Hyeonggon Yoo wrote:
>> On Tue, Jan 03, 2023 at 09:46:33PM +0800, Oliver Sang wrote:
>> > On Tue, Jan 03, 2023 at 11:42:11AM +0100, Vlastimil Babka wrote:
>> > > So the events leading up to this could be something like:
>> > >
>> > > - 0x2daee is order-1 slab folio of the inode cache, sitting on the partial list
>> > > - despite being on partial list, it's freed ???
>> > > - somebody else allocates order-2 page 0x2daec and uses it for whatever,
>> > >   then frees it
>> > > - 0x2daec is reallocated as order-1 slab from names_cache, then freed
>> > > - we try to allocate from the slab page 0x2daee and trip on the PageTail
>> > >
>> > > Except, the freeing of order-2 page would have reset the PageTail and
>> > > compound_head in 0x2daec, so this is even more complicated or involves some
>> > > extra race?
>> >
>> > FYI, we ran tests more up to 500 times, then saw different issues but rate is
>> > actually low
>> >
>> > 56d5a2b9ba85a390 0af8489b0216fa1dd83e264bef8
>> > ---------------- ---------------------------
>> >        fail:runs  %reproduction    fail:runs
>> >            |             |             |
>> >             :500            12%       61:500   dmesg.invalid_opcode:#[##]
>> >             :500             3%       14:500   dmesg.kernel_BUG_at_include/linux/mm.h
>> >             :500             3%       17:500   dmesg.kernel_BUG_at_include/linux/page-flags.h
>> >             :500             5%       26:500   dmesg.kernel_BUG_at_lib/list_debug.c
>> >             :500             0%        2:500   dmesg.kernel_BUG_at_mm/page_alloc.c
>> >             :500             0%        2:500   dmesg.kernel_BUG_at_mm/usercopy.c
>> >
>
> hi Vlastimil,
>
> as you mentioned
>> Hm even if rate is low, the different kinds of reports could be useful to
>> see, if all of that is caused by the commit.
>
> we tried to run tests even more times, but with the config which enable
> CONFIG_DEBUG_PAGEALLOC
> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
> (config is attached as
> config-6.1.0-rc2-00014-g0af8489b0216+CONFIG_DEBUG_PAGEALLOC+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
> the only diff with previous config is
> @@ -5601,7 +5601,8 @@ CONFIG_HAVE_KCSAN_COMPILER=y
>  # Memory Debugging
>  #
>  CONFIG_PAGE_EXTENSION=y
> -# CONFIG_DEBUG_PAGEALLOC is not set
> +CONFIG_DEBUG_PAGEALLOC=y
> +CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
>  CONFIG_PAGE_OWNER=y
>  # CONFIG_PAGE_POISONING is not set
>  CONFIG_DEBUG_PAGE_REF=y
> )
>
> what we found now is some issues are also reproduced on parent now (still by
> rcutorture tests here), though seems lower rate on parent.
>
> =========================================================================================
> compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type:
> gcc-11/i386-randconfig-a012-20221226+CONFIG_DEBUG_PAGEALLOC+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT/debian-11.1-i386-20220923.cgz/300s/vm-snb/default/rcutorture/tasks-tracing
>
> 56d5a2b9ba85a390 0af8489b0216fa1dd83e264bef8
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            8:985            19%      199:990   dmesg.invalid_opcode:#[##]
>             :985             5%       51:990   dmesg.kernel_BUG_at_include/linux/mm.h
>            3:985             4%       41:990   dmesg.kernel_BUG_at_include/linux/page-flags.h
>            4:985            10%      102:990   dmesg.kernel_BUG_at_lib/list_debug.c
>             :985             0%        2:990   dmesg.kernel_BUG_at_mm/page_alloc.c
>            1:985             0%        3:990   dmesg.kernel_BUG_at_mm/usercopy.c
>
> however, we noticed dmesg.kernel_BUG_at_include/linux/mm.h still have
> relatively high rate on this commit but keeps clean on parent.

Well that's interesting. As long as any bugs happen in the parent, it could
mean the commit we suspect is just changing the circumstances and creating
conditions that make the bug more likely to happen - e.g. because it causes
slab pages to be always immediately freed when the last object is freed.

So I would be curious about what some of the reports from the parent look like
in detail. And whether the rate at the parent (has it increased thanks to
DEBUG_PAGEALLOC?) is sufficient to bisect to the truly first bad commit.

Thanks!
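
P.S. To put a rough number on "sufficient to bisect", here is a back-of-the-envelope sketch, not a definitive method. It assumes each test run fails independently with the probability observed in the table above, and that a bisection step may call a commit "good" once the chance of having missed the bug by luck drops below 5%. Both the 95% threshold and the helper name runs_needed() are my own assumptions for illustration.

```python
import math


def runs_needed(fail, runs, confidence=0.95):
    """Clean runs needed before calling a commit 'good' at the given confidence.

    Assumption: runs are independent and each fails with probability fail/runs.
    """
    p = fail / runs
    if not 0.0 < p < 1.0:
        raise ValueError("need 0 < fail < runs")
    # P(no failure in n runs) = (1 - p)**n must not exceed 1 - confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# dmesg.invalid_opcode on the parent 56d5a2b9ba85a390: 8 failures in 985 runs
parent_runs = runs_needed(8, 985)
# same signature on 0af8489b0216: 199 failures in 990 runs
commit_runs = runs_needed(199, 990)

print("clean runs per bisection step at the parent's rate:", parent_runs)
print("clean runs per bisection step at the commit's rate:", commit_runs)
```

Under these assumptions, a step at the parent's rate needs a few hundred clean 300s runs, so a bisection looks feasible but slow, and it only works if the failures seen on the parent really share one root cause.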