From: Linus Torvalds
Date: Tue, 14 Sep 2021 16:29:27 -0700
Subject: Re: [GIT PULL] tracing: Fixes to bootconfig memory management
To: Vlastimil Babka
Cc: Steven Rostedt, Mike Rapoport, Andrew Morton, LKML, Ingo Molnar, Masami Hiramatsu, Linux-MM
In-Reply-To: <4392e867-0cce-d04a-e3d1-cba152daaa1f@suse.cz>
References: <20210914105620.677b90e5@oasis.local.home> <20210914145953.189f15dc@oasis.local.home> <20210914170553.7c1e1faa@oasis.local.home> <4392e867-0cce-d04a-e3d1-cba152daaa1f@suse.cz>

On Tue, Sep 14, 2021 at 3:48 PM Vlastimil Babka wrote:
>
> Well, looks like I can't. Commit 77e02cf57b6cf does boot fine for me,
> multiple times. But so now does the parent commit 6a4746ba06191. Looks like
> the magic is gone. I'm now surprised how deterministic it was during the
> bisect (most bad cases manifested on first boot, only few at second).

Well, your report was clearly memory corruption by the invalid
memblock_free() just ending up causing random problems later on.

So it could easily be 100% deterministic with a certain memory layout
at a particular commit. And then enough other changes later, and it's
all gone, because the memory corruption now hits something else that
didn't even care.

The code for your oops was

   0: 48 8b 17             mov    (%rdi),%rdx
   3: 48 39 d7             cmp    %rdx,%rdi
   6: 74 43                je     0x4b
   8: 48 8b 47 08          mov    0x8(%rdi),%rax
   c: 48 85 c0             test   %rax,%rax
   f: 74 23                je     0x34
  11: 49 89 c0             mov    %rax,%r8
  14:* 48 8b 40 10          mov    0x10(%rax),%rax <-- trapping instruction

and that's the start of rb_next(), so what's going on is that
"rb->rb_right" (the second word of 'struct rb_node') ends up having
that value in %rax:

  RAX: 343479726f6d656d

which is ASCII "44yromem" rather than a valid pointer if I looked that
up right.

And just _slightly_ different allocation patterns, and your 'struct
rb_node' gets allocated somewhere else, and you don't see the oops at
all, or you get it later in some different place.

Most memory corruption doesn't cause oopses, because most memory isn't
used as pointers etc.

What you _could_ try if you care enough is

 - go back to the thing you bisected to where you can still hopefully
   recreate the problem

 - apply that patch at that point with no other changes

and then the test would hopefully be closer to the state you could
re-create the problem. And hopefully it would still not reproduce,
just because the bug is fixed, of course ;)

The very unlikely alternative is that your bisect was just pure random
bad luck and hit the wrong commit entirely, and the oops was due to
some other problem.

But it does seem unlikely to be something else. Usually when bisects
go off into the weeds due to not being reproducible, they go very
obviously off into the weeds rather than point to something that ends
up having a very similar bug.

              Linus
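
A minimal sketch of how the RAX value quoted in the message above can be checked for ASCII content. Python is an assumption here (the thread itself contains no code beyond the disassembly), and the variable names are illustrative only.

```python
import struct

# Value of RAX copied from the oops quoted in the message.
rax = 0x343479726f6d656d

# Reading the hex digits left to right (most significant byte first)
# gives the string in the order it appears in the register dump.
as_printed = rax.to_bytes(8, "big").decode("ascii")

# x86-64 is little-endian, so the same eight bytes sit in memory
# in the opposite order.
in_memory = struct.pack("<Q", rax).decode("ascii")

print(as_printed)  # 44yromem
print(in_memory)   # memory44
```

Read in memory order the bytes spell "memory44", which would be consistent with string data having been written over the slot that rb_next() expected to hold a pointer.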