From: Vlastimil Babka <vbabka@suse.cz>
To: Linus Torvalds
Cc: Steven Rostedt, Mike Rapoport, Andrew Morton, LKML, Ingo Molnar, Masami Hiramatsu, Linux-MM
Subject: Re: [GIT PULL] tracing: Fixes to bootconfig memory management
Date: Wed, 15 Sep 2021 11:28:26 +0200
Message-ID: <8a32b437-4cea-f265-b26e-509466d5290b@suse.cz>
References: <20210914105620.677b90e5@oasis.local.home> <20210914145953.189f15dc@oasis.local.home> <20210914170553.7c1e1faa@oasis.local.home> <4392e867-0cce-d04a-e3d1-cba152daaa1f@suse.cz>

On 9/15/21 01:29, Linus Torvalds wrote:
> On Tue, Sep 14, 2021 at 3:48 PM Vlastimil Babka wrote:
>>
>> Well, looks like I can't. Commit 77e02cf57b6cf does boot fine for me,
>> multiple times. But so now does the parent commit 6a4746ba06191. Looks like
>> the magic is gone. I'm now surprised how deterministic it was during the
>> bisect (most bad cases manifested on the first boot, only a few on the second).
>
> Well, your report was clearly memory corruption by the invalid
> memblock_free() just ending up causing random problems later on.
> So it could easily be 100% deterministic with a certain memory layout
> at a particular commit. And then enough other changes later, and it's
> all gone, because the memory corruption now hits something else that
> didn't even care.
>
> The code for your oops was
>
>    0:  48 8b 17       mov    (%rdi),%rdx
>    3:  48 39 d7       cmp    %rdx,%rdi
>    6:  74 43          je     0x4b
>    8:  48 8b 47 08    mov    0x8(%rdi),%rax
>    c:  48 85 c0       test   %rax,%rax
>    f:  74 23          je     0x34
>   11:  49 89 c0       mov    %rax,%r8
>   14:* 48 8b 40 10    mov    0x10(%rax),%rax    <-- trapping instruction
>
> and that's the start of rb_next(), so what's going on is that
> "rb->rb_right" (the second word of 'struct rb_node') ends up having
> that value in %rax:
>
>    RAX: 343479726f6d656d
>
> which is ASCII "44yromem" rather than a valid pointer if I looked that up right.

Yep, I was pretty sure it was related to the "/sys/bus/memory/devices/memory44"
sysfs object and that bisection would lead to kobject/sysfs or some memory
hotplug related changes. So the result was a surprise.

> And just _slightly_ different allocation patterns, and your 'struct
> rb_node' gets allocated somewhere else, and you don't see the oops at
> all, or you get it later in some different place.
>
> Most memory corruption doesn't cause oopses, because most memory isn't
> used as pointers etc.
>
> What you _could_ try if you care enough is
>
>  - go back to the thing you bisected to where you can still hopefully
>    recreate the problem
>
>  - apply that patch at that point with no other changes
>
> and then the test would hopefully be closer to the state you could
> re-create the problem.
>
> And hopefully it would still not reproduce, just because the bug is
> fixed, of course ;)

Yeah, that worked! Commit 40caa127f3c7 was still broken, and cherry-picking
77e02cf57b6cf on top of it fixed the problem. Thanks!

> The very unlikely alternative is that your bisect was just pure random
> bad luck and hit the wrong commit entirely, and the oops was due to
> some other problem.
>
> But it does seem unlikely to be something else. Usually when bisects
> go off into the weeds due to not being reproducible, they go very
> obviously off into the weeds rather than point to something that ends
> up having a very similar bug.
>
>               Linus