From: Hugh Dickins
Date: Mon, 10 Jul 2023 21:34:42 -0700 (PDT)
To: Mark Brown
Cc: Lorenzo Stoakes, Hugh Dickins, Andrew Morton, Mike Kravetz,
    Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi,
    Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao,
    Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price,
    SeongJae Park, Huang Ying, Naoya Horiguchi, Christophe Leroy,
    Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual,
    Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig,
    Song Liu, Thomas Hellstrom, Ryan Roberts,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    regressions@leemhuis.info, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 12/32] mm/vmalloc: vmalloc_to_page() use pte_offset_kernel()
Message-ID: <977ddee4-35f0-fcd1-2fd-1c3057e7ea2a@google.com>
References: <696386a-84f8-b33c-82e5-f865ed6eb39@google.com>
    <42279f1f-7b82-40dc-8546-86171018729c@sirena.org.uk>
    <901ae88d-ad0c-4e9d-b199-f1566cc62a00@lucifer.local>

On Mon, 10 Jul 2023, Mark Brown wrote:
> On Mon, Jul 10, 2023 at 06:18:27PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Jul 10, 2023 at 03:42:31PM +0100, Mark Brown wrote:
>
> > > We end up seeing NULL or otherwise bad pointer dereferences, the
> > > specific error does vary a bit though it mostly appears to be in the
> > > pinctrl code.
> > > A bisect (full log below) identified this patch as
> > > introducing the failure, nothing is jumping out at me about the patch
> > > and it's not affecting everything so I'd not be surprised if it's just
> > > uncovering some bug in the platform support but I'm not super familiar
> > > with the code.
>
> > Yeah seems likely. Do you have a .config you can share for this board? For
> > a 64-bit device you'd expect that this change would probably be a nop.
>
> It's definitely happening with arm64 defconfig, possibly with other
> configs but that's the main one.

I'm sorry for dropping you in it, Mark, but I'm totally baffled.
I've spent most of the day trying to come up with ideas, but failed.
I've no doubt that you're seeing what you're seeing, but how it
comes about is a mystery.

Lorenzo is right that the change should be a no-op - compared with 6.4.
But it's not quite a no-op in this series, because 04/32 0d940a9b270b
("mm/pgtable: allow pte_offset_map[_lock]() to fail") diverts the old
pte_offset_map() macro off to a new function in mm/pgtable-generic.c;
then this commit restores it back to being the pte_offset_kernel() macro.

So the asm in vmalloc_to_page() is expected to change in this commit,
but change back to what it would have been in 6.4.

This feels like one of those bugs which depends on the code size in
some way (a bit like those bugs we used to have, where a function was
mistakenly marked __init, then in some configs its code landed on a
page which got freed at startup - I'm not saying this is that at all,
just saying it feels weird in that way). Yet your bisection converges
convincingly, which I wouldn't expect in that case.

I suppose I should ask you to try reverting this 0d1c81edc61e alone
from 6.5-rc1: the consistency of your bisection implies that it will
"fix" the issues, and it is a commit which we could drop.
It makes me a little nervous, applying userspace-pagetable validation
to kernel pagetables, so I don't want to drop it; and it would really
be cargo-culting to drop it without understanding. But we could drop it.

I guess it would be interesting to know whether vmalloc_to_page() is
ever even called in your kernel, before it crashes on the pinctrl stuff.
But putting in a printk to report on that may change everything.

And I guess it would be interesting to know (from a DEBUG_INFO build of
the crashing kernel) which line of dt_remember_or_free_map() it oopses
on i.e. which pointer is NULL when it shouldn't be - or maybe you
already worked that out. And what device (which ->dt_node_to_map)
is involved.

If one of the many dt_node_to_map's fails to initialize *map to NULL
when it should, and has relied on it happening to be a NULL on the
stack already... that might explain it.

Another thing to try, would be the kernel at 0d940a9b270b^, just before
pte_offset_map() grew a function call: there's a faint possibility that
the bug came in before this series, that 0d940a9b270b somehow masked it
(I don't see how: vmalloc_to_page() does sensible validation itself),
and then 0d1c81edc61e unmasked it again - so that the bisection skipped
over, and converged on the wrong point.

But I'm thrashing about: I have no confidence that any of this info
will help us. Sorry for wasting your time.

Thanks,
Hugh
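
Footnote on the "masked, then unmasked" possibility mentioned above: a
toy sketch, in Python, of how a bisection can converge on the commit
that unmasks a bug rather than the one that introduced it. Everything
here is hypothetical: the 16-commit history, the INTRODUCED / MASKED /
UNMASKED indices and the crashes() predicate are made up for
illustration and are not taken from the real series. The search is a
plain binary search, which only approximates what git bisect does.

```python
def bisect_first_bad(is_bad, good, bad):
    """Binary search between a known-good and a known-bad commit index.

    Assumes is_bad(good) is False and is_bad(bad) is True, and returns
    the index where it observes a good -> bad transition.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


# Hypothetical linear history, commits numbered 0..15.
INTRODUCED = 3   # the real bug lands here
MASKED = 6       # an unrelated change hides the symptom
UNMASKED = 12    # a later change exposes the symptom again


def crashes(commit):
    """Toy test result: the symptom shows only while the bug is unmasked."""
    return INTRODUCED <= commit < MASKED or commit >= UNMASKED


result = bisect_first_bad(crashes, good=0, bad=15)
print("bisection converged on commit", result)
```

In this toy history the search never tests commits 3 to 5, so it lands
on the unmasking commit with every step looking consistent. That is why
testing just before the suspected masking commit (here, index 5; in the
thread, 0d940a9b270b^) is a cheap way to rule the scenario in or out.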