Date: Thu, 16 Nov 2023 00:26:39 -0800 (PST)
From: Hugh Dickins <hughd@google.com>
To: Hugh Dickins
cc: Matthew Wilcox, José Pekkarinen, akpm@linux-foundation.org,
    skhan@linuxfoundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-kernel-mentees@lists.linux.dev,
    syzbot+89edd67979b52675ddec@syzkaller.appspotmail.com, Jann Horn
Subject: Re: [PATCH] mm/pgtable: return null if no ptl in __pte_offset_map_lock
In-Reply-To: <515cb9c1-abcd-c3f3-cc0d-c3cd248b9d6f@google.com>
Message-ID: <1b7019f3-0abf-a2f5-46ae-373af2043e56@google.com>
References: <20231115065506.19780-1-jose.pekkarinen@foxhound.fi>
 <1c4cb1959829ecf4f0c59691d833618c@foxhound.fi>
 <515cb9c1-abcd-c3f3-cc0d-c3cd248b9d6f@google.com>

On Wed, 15 Nov 2023, Hugh Dickins wrote:
> On Wed, 15 Nov 2023, Matthew Wilcox wrote:
> > On Wed, Nov 15, 2023 at 06:05:30PM +0200, José Pekkarinen wrote:
> > 
> > > > I don't think we should be changing ptlock_ptr().
> > > 
> > > This is where the null ptr dereference originates, so the only
> > > alternative I can think of is to protect the life cycle of the ptdesc
> > > to prevent it from dying between the pte check and the spin_unlock of
> > > __pte_offset_map_lock. Would that work for you?
> 
> Thanks for pursuing this, José, but I agree with Matthew: I don't
> think your patch is right at all. The change in ptlock_ptr() did not
> make sense to me, and the change in __pte_offset_map_lock() leaves us
> still wondering what has gone wrong (and misses an rcu_read_unlock()).
> 
> You mentioned "I tested the syzbot reproducer in x86 and it doesn't
> produce this kasan report anymore": are you saying that you were able
> to reproduce the issue on x86 (without your patch)? That would be very
> interesting (and I think would disprove my hypothesis below). I ought
> to try on x86 if you managed to reproduce on it, but it's not worth
> the effort if you did not. If you have an x86 stack and registers,
> please show (though I'm uncertain how much KASAN spoils the output).
> 
> > 
> > Ah! I think I found the problem.
> > 
> > If ALLOC_SPLIT_PTLOCKS is not set, there is no problem as ->ptl
> > is embedded in the struct page. But if ALLOC_SPLIT_PTLOCKS is set
> > (eg you have LOCKDEP enabled), we can _return_ a NULL pointer from
> > ptlock_ptr. The NULL pointer dereference isn't in ptlock_ptr(), it's
> > in __pte_offset_map_lock().
> > 
> > So, how to solve this? We can't just check the ptl against NULL; the
> > memory that ptl points to may have been freed. We could grab a reference
> > to the pmd_page, possibly conditionally on ALLOC_SPLIT_PTLOCKS being set,
> > but that will slow down everything. We could make page_ptl_cachep
> > SLAB_TYPESAFE_BY_RCU, preventing the memory from being freed (even if
> > the lock might not be associated with this page any more).
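
For readers following along, the sequence in question looks roughly like
this (a simplified paraphrase of __pte_offset_map_lock() in
mm/pgtable-generic.c around v6.6/v6.7, trimmed for illustration rather than
copied from the tree):

pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
			     unsigned long addr, spinlock_t **ptlp)
{
	spinlock_t *ptl;
	pmd_t pmdval;
	pte_t *pte;
again:
	/* Takes rcu_read_lock() and validates the pmd entry. */
	pte = __pte_offset_map(pmd, addr, &pmdval);
	if (unlikely(!pte))
		return pte;
	/*
	 * With ALLOC_SPLIT_PTLOCKS the ptl lives in a separate allocation
	 * reached via the pmd's page; the spin_lock() below is where a NULL
	 * or already-freed ptl would be dereferenced.
	 */
	ptl = pte_lockptr(mm, &pmdval);
	spin_lock(ptl);
	/* Recheck that the page table seen above is still the one installed. */
	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
		*ptlp = ptl;
		return pte;
	}
	pte_unmap_unlock(pte, ptl);
	goto again;
}
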
> 
> But I don't think that's quite right either: or, I hope it's not right,
> because it would say that all my business of rcu_read_lock() and
> pte_free_defer() etc was a failing waste of everyone's time: if the
> page table (and/or its lock) can be freed while someone is in that
> RCU-protected area, then I've simply got it wrong.
> 
> What I thought, when yesterday's flurry of syzbot messages came in,
> was perhaps the problem is at the other end - when the new page table
> is allocated, rather than when it's being freed: a barrier missing there?
> 
> But pmd_install() does have an smp_wmb(), with a (suspiciously) long
> comment about why no smp_rmb()s are needed, except on alpha.
> 
> I'm not much good on barriers, but the thought now in my mind is that
> that comment is not quite accurate: that at the bottom level an smp_rmb()
> is (very seldom!) needed - not just to avoid a NULL (or previous garbage)
> ptl pointer in the ALLOC_SPLIT_PTLOCKS case, but to make sure that the ptl
> is visibly correctly initialized (or would spin_lock on it achieve that?);
> and that what pte_offset_map() points to is visibly empty. (I'm imagining
> stale cache lines left behind on the oopsing CPU, which need to be
> refetched after pmdval has been read.)
> 
> And that this issue has, strictly speaking, always been there, but in
> practice never a problem, because of mmap_lock held for write while (or
> prior to) freeing page table, and racers holding mmap_lock for read
> (vma lock not changing the calculus); but collapse_pte_mapped_thp()
> can now be done with mmap_lock for read, letting those racers in
> (and maybe your filemap_map_pages() has helped speed these races up -
> any blame I can shift on to you, the better ;)
> 
> But I may well be spouting nonsense. And even if I'm right, is adding
> an smp_rmb() in there too high a price to pay on some architectures?
> 
> I did make an attempt to reproduce on an arm64, but there's too much
> more I'd need to muck around with to get it working right. Let's ask for
> a syz test on top of v6.7-rc1 - hmm, how do I tell syzbot that it's arm64
> that it needs to try on? I hope it gets that from the Cc'ed syz number.

Syzbot couldn't do it from this mail, but did it from the original thread:
https://lore.kernel.org/linux-mm/000000000000ba0007060a40644f@google.com/
tells us that I was spouting nonsense: this patch does not fix it.
I have no further idea yet.

> 
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b85ea95d086471afb4ad062012a4d73cd328fa86
> 
> [PATCH] mm/pgtable: smp_rmb() to match smp_wmb() in pmd_install()
> 
> Not-Yet-Signed-off-by: Hugh Dickins
> ---
>  mm/memory.c          | 2 ++
>  mm/pgtable-generic.c | 5 +++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 1f18ed4a5497..8939357f1509 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -425,6 +425,8 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
> 		 * being the notable exception) will already guarantee loads are
> 		 * seen in-order. See the alpha page table accessors for the
> 		 * smp_rmb() barriers in page table walking code.
> +		 *
> +		 * See __pte_offset_map() for the smp_rmb() at the pte level.
> 		 */
> 		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
> 		pmd_populate(mm, pmd, *pte);
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 4fcd959dcc4d..3330b666e9c3 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -297,6 +297,11 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
> 		pmd_clear_bad(pmd);
> 		goto nomap;
> 	}
> +	/*
> +	 * Pair with the smp_wmb() in pmd_install(): make sure that the
> +	 * page table lock and page table contents are visibly initialized.
> +	 */
> +	smp_rmb();
> 	return __pte_map(&pmdval, addr);
> nomap:
> 	rcu_read_unlock();
> -- 
> 2.35.3
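
The ordering the patch is trying to guarantee can be modelled in userspace
like this (C11 fences standing in for smp_wmb()/smp_rmb(); struct pte_table,
install() and walk() are invented names for the example, not kernel code):

#include <stdatomic.h>
#include <stdio.h>

/* Models the pte page plus its (possibly separately allocated) ptl. */
struct pte_table {
	int ptl_initialized;
};

static struct pte_table table;
static _Atomic(struct pte_table *) pmd_entry;	/* models the pmd slot */

/* Writer, as in pmd_install(): initialize everything, then publish. */
static void install(void)
{
	table.ptl_initialized = 1;			/* set up table + ptl */
	atomic_thread_fence(memory_order_release);	/* models smp_wmb() */
	atomic_store_explicit(&pmd_entry, &table,	/* models pmd_populate() */
			      memory_order_relaxed);
}

/* Reader, as in __pte_offset_map(): see the pmd entry, then its contents. */
static int walk(void)
{
	struct pte_table *t;

	t = atomic_load_explicit(&pmd_entry, memory_order_relaxed);
	if (!t)
		return 0;				/* pmd_none() case */
	atomic_thread_fence(memory_order_acquire);	/* models the proposed smp_rmb() */
	return t->ptl_initialized;			/* must observe 1 */
}

int main(void)
{
	install();
	/* Single-threaded here; the fences matter when install() and
	   walk() race on two CPUs. */
	printf("%d\n", walk());
	return 0;
}

As the syzbot result above showed, this hypothesis did not fix the reported
crash, but the sketch still shows the publish/consume ordering that
pmd_install()'s smp_wmb() provides on the writer side and that the proposed
smp_rmb() was meant to complete on the reader side.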