Date: Wed, 15 Nov 2023 23:23:32 -0800 (PST)
From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox
Cc: José Pekkarinen, akpm@linux-foundation.org, skhan@linuxfoundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-kernel-mentees@lists.linux.dev,
    syzbot+89edd67979b52675ddec@syzkaller.appspotmail.com,
    Hugh Dickins, Jann Horn
Subject: Re: [PATCH] mm/pgtable: return null if no ptl in __pte_offset_map_lock
Message-ID: <515cb9c1-abcd-c3f3-cc0d-c3cd248b9d6f@google.com>
References: <20231115065506.19780-1-jose.pekkarinen@foxhound.fi>
    <1c4cb1959829ecf4f0c59691d833618c@foxhound.fi>

On Wed, 15 Nov 2023, Matthew Wilcox wrote:
> On Wed, Nov 15, 2023 at 06:05:30PM +0200, José Pekkarinen wrote:
>
> > > I don't think we should be changing ptlock_ptr().
> >
> > This is where the null ptr dereference originates, so the only
> > alternative I can think of is to protect the life cycle of the ptdesc
> > to prevent it to die between the pte check and the spin_unlock of
> > __pte_offset_map_lock. Would that work for you?

Thanks for pursuing this, José, but I agree with Matthew: I don't think
your patch is right at all.  The change in ptlock_ptr() did not make
sense to me, and the change in __pte_offset_map_lock() leaves us still
wondering what has gone wrong (and misses an rcu_read_unlock()).

You mentioned "I tested the syzbot reproducer in x86 and it doesn't
produce this kasan report anymore": are you saying that you were able
to reproduce the issue on x86 (without your patch)?  That would be very
interesting (and I think would disprove my hypothesis below).  I ought
to try on x86 if you managed to reproduce on it, but it's not worth the
effort if you did not.  If you have an x86 stack and registers, please
show (though I'm uncertain how much KASAN spoils the output).

>
> Ah! I think I found the problem.
>
> If ALLOC_SPLIT_PTLOCKS is not set, there is no problem as ->ptl
> is embedded in the struct page.  But if ALLOC_SPLIT_PTLOCKS is set
> (eg you have LOCKDEP enabled), we can _return_ a NULL pointer from
> ptlock_ptr.  The NULL pointer dereference isn't in ptlock_ptr(), it's
> in __pte_offset_map_lock().
>
> So, how to solve this?  We can't just check the ptl against NULL; the
> memory that ptl points to may have been freed.
> We could grab a reference
> to the pmd_page, possibly conditionally on ALLOC_SPLIT_LOCK being set,
> but that will slow down everything.  We could make page_ptl_cachep
> SLAB_TYPESAFE_BY_RCU, preventing the memory from being freed (even if
> the lock might not be associated with this page any more).

But I don't think that's quite right either: or, I hope it's not right,
because it would say that all my business of rcu_read_lock() and
pte_free_defer() etc was a failing waste of everyone's time: if the
page table (and/or its lock) can be freed while someone is in that
RCU-protected area, then I've simply got it wrong.

What I thought, when yesterday's flurry of syzbot messages came in, was
perhaps the problem is at the other end - when the new page table is
allocated, rather than when it's being freed: a barrier missing there?
But pmd_install() does have an smp_wmb(), with a (suspiciously) long
comment about why no smp_rmb()s are needed, except on alpha.

I'm not much good on barriers, but the thought now in my mind is that
that comment is not quite accurate: that at the bottom level an
smp_rmb() is (very seldom!) needed - not just to avoid a NULL (or
previous garbage) ptl pointer in the ALLOC_SPLIT_LOCK case, but to
make sure that the ptl is visibly correctly initialized (or would
spin_lock on it achieve that?); and that what pte_offset_map() points
to is visibly empty.  (I'm imagining stale cache lines left behind on
the oopsing CPU, which need to be refetched after pmdval has been read.)
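The publish/consume ordering described above can be modelled in
userspace C11 - purely a sketch, not kernel code: the struct, the
run_demo() helper and the value 42 are invented for illustration, and
atomic_thread_fence(memory_order_release/acquire) stands in for the
kernel's smp_wmb()/smp_rmb():

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for a freshly allocated page table plus its split ptlock. */
struct payload { int lock_ready; int data; };

static struct payload table;
static struct payload *_Atomic published;   /* plays the role of the pmd entry */

static void *writer(void *arg)
{
	(void)arg;
	table.lock_ready = 1;                   /* cf. initializing the ptl */
	table.data = 42;                        /* cf. zeroing the pte page */
	/* cf. the smp_wmb() in pmd_install(): init must be visible first */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&published, &table, memory_order_relaxed);
	return NULL;
}

/* Reader side: spin until the "pmd" is populated, then issue the read
 * barrier (cf. the proposed smp_rmb() in __pte_offset_map()) before
 * trusting the pointed-to contents.  Returns the data seen (42), or -1
 * if the initialization were not yet visible. */
int run_demo(void)
{
	struct payload *p;
	pthread_t w;

	pthread_create(&w, NULL, writer, NULL);
	while (!(p = atomic_load_explicit(&published, memory_order_relaxed)))
		;                               /* cf. rereading pmdval */
	atomic_thread_fence(memory_order_acquire);
	pthread_join(w, NULL);
	return p->lock_ready ? p->data : -1;
}
```

Compile with -pthread; without the acquire fence, nothing in C11 (as on
a weakly ordered CPU) would forbid seeing the pointer before the
payload's initialization.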
And that this issue has, strictly speaking, always been there, but in
practice never a problem, because of mmap_lock held for write while (or
prior to) freeing page table, and racers holding mmap_lock for read
(vma lock not changing the calculus); but collapse_pte_mapped_thp() can
now be done with mmap_lock for read, letting those racers in (and maybe
your filemap_map_pages() has helped speed these races up - any blame I
can shift on to you, the better ;)

But I may well be spouting nonsense.  And even if I'm right, is adding
an smp_rmb() in there too high a price to pay on some architectures?

I did make an attempt to reproduce on an arm64, but there's too much
more I'd need to muck around with to get it working right.  Let's ask
for a syz test on top of v6.7-rc1 - hmm, how do I tell syzbot that it's
arm64 that it needs to try on?  I hope it gets that from the Cc'ed syz
number.

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b85ea95d086471afb4ad062012a4d73cd328fa86

[PATCH] mm/pgtable: smp_rmb() to match smp_wmb() in pmd_install()

Not-Yet-Signed-off-by: Hugh Dickins
---
 mm/memory.c          | 2 ++
 mm/pgtable-generic.c | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 1f18ed4a5497..8939357f1509 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -425,6 +425,8 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
 		 * being the notable exception) will already guarantee loads are
 		 * seen in-order. See the alpha page table accessors for the
 		 * smp_rmb() barriers in page table walking code.
+		 *
+		 * See __pte_offset_map() for the smp_rmb() at the pte level.
 		 */
 		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
 		pmd_populate(mm, pmd, *pte);

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4fcd959dcc4d..3330b666e9c3 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -297,6 +297,11 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 		pmd_clear_bad(pmd);
 		goto nomap;
 	}
+	/*
+	 * Pair with the smp_wmb() in pmd_install(): make sure that the
+	 * page table lock and page table contents are visibly initialized.
+	 */
+	smp_rmb();
 	return __pte_map(&pmdval, addr);
 nomap:
 	rcu_read_unlock();
--
2.35.3