From: Jens Axboe
Date: Mon, 23 Jun 2025 08:58:09 -0600
Subject: Re: [syzbot] [mm?] kernel BUG in sanity_check_pinned_pages
To: David Hildenbrand, Alexander Potapenko
Cc: syzbot, akpm@linux-foundation.org, catalin.marinas@arm.com, jgg@ziepe.ca, jhubbard@nvidia.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterx@redhat.com, syzkaller-bugs@googlegroups.com, Pavel Begunkov
Message-ID: <014a3820-8082-43a6-8bb2-70859cabdbc0@kernel.dk>
References: <6857299a.a00a0220.137b3.0085.GAE@google.com> <56862a1d-71c0-4f07-9c1a-9d70069b4d9e@redhat.com>

On 6/23/25 6:22 AM, David Hildenbrand wrote:
> On 23.06.25 12:10, David Hildenbrand wrote:
>> On 23.06.25 11:53, Alexander Potapenko wrote:
>>> On Mon, Jun 23, 2025 at 11:29 AM 'David Hildenbrand' via
>>> syzkaller-bugs wrote:
>>>>
>>>> On 21.06.25 23:52, syzbot wrote:
>>>>> syzbot has found a reproducer for the following issue on:
>>>>>
>>>>> HEAD commit:    9aa9b43d689e Merge branch 'for-next/core' into for-kernelci
>>>>> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1525330c580000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=27f179c74d5c35cd
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=1d335893772467199ab6
>>>>> compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
>>>>> userspace arch: arm64
>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16d73370580000
>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=160ef30c580000
>>>>
>>>> There is not that much magic in there, I'm afraid.
>>>>
>>>> fork() is only used to spin up guests, but before the memory region of
>>>> interest is actually allocated, IIUC. No threading code that races.
>>>>
>>>> IIUC, it triggers fairly fast on aarch64. I've left it running for a
>>>> while on x86_64 without any luck.
>>>>
>>>> So maybe this is really some aarch64-special stuff (pointer tagging?).
>>>>
>>>> In particular, there is something very weird in the reproducer:
>>>>
>>>>     syscall(__NR_madvise, /*addr=*/0x20a93000ul, /*len=*/0x4000ul,
>>>>             /*advice=MADV_HUGEPAGE|0x800000000*/ 0x80000000eul);
>>>>
>>>> advice is supposed to be a 32bit int. What does the magical
>>>> "0x800000000" do?
>>>
>>> I am pretty sure this is a red herring.
>>> Syzkaller sometimes mutates integer flags, even if the result makes no
>>> sense - because sometimes it can trigger interesting bugs.
>>> This `advice` argument will be discarded by is_valid_madvise(),
>>> resulting in -EINVAL.
>>
>> I thought the same, but likely the upper bits are discarded, and we end
>> up with __NR_madvise succeeding.
>>
>> The kernel config has
>>
>>     CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
>>
>> So without MADV_HUGEPAGE, we wouldn't get a THP in the first place.
>>
>> So likely this is really just like dropping the "0x800000000"
>>
>> Anyhow, I managed to reproduce in the VM using the provided rootfs on
>> aarch64. It triggers immediately, so no races involved.
>>
>> Running the reproducer on a Fedora 42 debug-kernel in the hypervisor
>> does not trigger.
>
> Simplified reproducer that does not depend on a race with the
> child process.
>
> As expected previously, we have PAE cleared on the head page,
> because it is/was COW-shared with a child process.
>
> We are registering more than one consecutive tail page of that
> THP through iouring, GUP-pinning them. These pages are not
> COW-shared and, therefore, still have PAE set.
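On the `advice` value discussed in the quoted text above, here is a minimal userspace sketch of the "upper bits are discarded" theory. It assumes the syscall entry path simply narrows the 64-bit argument to the declared `int`, which is what the quoted discussion suggests and is not verified here. `narrow_advice()` and `repro_advice_is_hugepage()` are hypothetical helpers for illustration, not kernel functions.

```c
#include <stdint.h>
#include <sys/mman.h>

/* Fallback for strict-ISO builds; 0xe matches the reproducer's own comment. */
#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14
#endif

/*
 * Hypothetical helper: model a 64-bit syscall argument being narrowed
 * to a 32-bit int by keeping only the low 32 bits (an assumption about
 * what happens at syscall entry).
 */
static int narrow_advice(uint64_t raw)
{
        return (int)(uint32_t)raw;
}

/* Returns 1 if the reproducer's advice value narrows to MADV_HUGEPAGE. */
static int repro_advice_is_hugepage(void)
{
        return narrow_advice(0x80000000eULL) == MADV_HUGEPAGE;
}
```

If that assumption holds, the extra bit set by syzkaller is irrelevant and the call behaves like a plain madvise(..., MADV_HUGEPAGE), which fits the observation that the madvise succeeds.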
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <sys/uio.h>
> #include <linux/io_uring.h>
>
> int main(void)
> {
>         struct io_uring_params params = {
>                 .wq_fd = -1,
>         };
>         struct iovec iovec;
>         const size_t pagesize = getpagesize();
>         size_t size = 2048 * pagesize;
>         char *addr;
>         int fd;
>
>         /* We need a THP-aligned area. */
>         addr = mmap((char *)0x20000000u, size, PROT_WRITE|PROT_READ,
>                     MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>         if (addr == MAP_FAILED) {
>                 perror("MAP_FIXED failed\n");
>                 return 1;
>         }
>
>         if (madvise(addr, size, MADV_HUGEPAGE)) {
>                 perror("MADV_HUGEPAGE failed\n");
>                 return 1;
>         }
>
>         /* Populate a THP. */
>         memset(addr, 0, size);
>
>         /* COW-share only the first page ... */
>         if (madvise(addr + pagesize, size - pagesize, MADV_DONTFORK)) {
>                 perror("MADV_DONTFORK failed\n");
>                 return 1;
>         }
>
>         /* ... using fork(). This will clear PAE on the head page. */
>         if (fork() == 0)
>                 exit(0);
>
>         /* Setup iouring */
>         fd = syscall(__NR_io_uring_setup, 1024, &params);
>         if (fd < 0) {
>                 perror("__NR_io_uring_setup failed\n");
>                 return 1;
>         }
>
>         /* Register (GUP-pin) two consecutive tail pages. */
>         iovec.iov_base = addr + pagesize;
>         iovec.iov_len = 2 * pagesize;
>         syscall(__NR_io_uring_register, fd, IORING_REGISTER_BUFFERS, &iovec, 1);
>         return 0;
> }
>
> [ 108.070381][ T14] kernel BUG at mm/gup.c:71!
> [ 108.070502][ T14] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> [ 108.117202][ T14] Modules linked in:
> [ 108.119105][ T14] CPU: 1 UID: 0 PID: 14 Comm: kworker/u32:1 Not tainted 6.16.0-rc2-syzkaller-g9aa9b43d689e #0 PREEMPT
> [ 108.123672][ T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025
> [ 108.127458][ T14] Workqueue: iou_exit io_ring_exit_work
> [ 108.129812][ T14] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 108.133091][ T14] pc : sanity_check_pinned_pages+0x7cc/0x7d0
> [ 108.135566][ T14] lr : sanity_check_pinned_pages+0x7cc/0x7d0
> [ 108.138025][ T14] sp : ffff800097ac7640
> [ 108.139859][ T14] x29: ffff800097ac7660 x28: dfff800000000000 x27: 1fffffbff80d3000
> [ 108.143185][ T14] x26: 01ffc0000002007c x25: 01ffc0000002007c x24: fffffdffc0698000
> [ 108.146599][ T14] x23: fffffdffc0698000 x22: ffff800097ac76e0 x21: 01ffc0000002007c
> [ 108.150025][ T14] x20: 0000000000000000 x19: ffff800097ac76e0 x18: 00000000ffffffff
> [ 108.153449][ T14] x17: 703e2d6f696c6f66 x16: ffff80008ae33808 x15: ffff700011ed61d4
> [ 108.156892][ T14] x14: 1ffff00011ed61d4 x13: 0000000000000004 x12: ffffffffffffffff
> [ 108.160267][ T14] x11: ffff700011ed61d4 x10: 0000000000ff0100 x9 : f6672ecf4f89d700
> [ 108.163782][ T14] x8 : f6672ecf4f89d700 x7 : 0000000000000001 x6 : 0000000000000001
> [ 108.167180][ T14] x5 : ffff800097ac6d58 x4 : ffff80008f727060 x3 : ffff80008054c348
> [ 108.170807][ T14] x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000061
> [ 108.174205][ T14] Call trace:
> [ 108.175649][ T14]  sanity_check_pinned_pages+0x7cc/0x7d0 (P)
> [ 108.178138][ T14]  unpin_user_page+0x80/0x10c
> [ 108.180189][ T14]  io_release_ubuf+0x84/0xf8
> [ 108.182196][ T14]  io_free_rsrc_node+0x250/0x57c
> [ 108.184345][ T14]  io_rsrc_data_free+0x148/0x298
> [ 108.186493][ T14]  io_sqe_buffers_unregister+0x84/0xa0
> [ 108.188991][ T14]  io_ring_ctx_free+0x48/0x480
> [ 108.191057][ T14]  io_ring_exit_work+0x764/0x7d8
> [ 108.193207][ T14]  process_one_work+0x7e8/0x155c
> [ 108.195431][ T14]  worker_thread+0x958/0xed8
> [ 108.197561][ T14]  kthread+0x5fc/0x75c
> [ 108.199362][ T14]  ret_from_fork+0x10/0x20
>
>
> When only pinning a single tail page (iovec.iov_len = pagesize), it works as expected.
>
> So we pinned two tail pages but end up calling io_release_ubuf()->unpin_user_page()
> on the head page, meaning that "imu->bvec[i].bv_page" points at the wrong folio page
> (IOW, one we never pinned).
>
> So it's related to the io_coalesce_buffer() machinery.
>
> And in fact, in there, we have this weird logic:
>
> /* Store head pages only*/
> new_array = kvmalloc_array(nr_folios, sizeof(struct page *), GFP_KERNEL);
> ...
>
>
> Essentially discarding the subpage information when coalescing tail pages.
>
>
> I am afraid the whole io_check_coalesce_buffer + io_coalesce_buffer() logic might be
> flawed (we can -- in theory -- coalesce different folio page ranges in
> a GUP result?).
>
> @Jens, not sure if this only triggers a warning when unpinning or if we actually mess up
> imu->bvec[i].bv_page, to end up pointing at (reading/writing) pages we didn't even pin in the first
> place.
>
> Can you look into that, as you are more familiar with the logic?

Leaving this all quoted and adding Pavel, who wrote that code. I'm
currently away, so can't look into this right now.

-- 
Jens Axboe
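For anyone picking this up, the quoted analysis can be condensed into a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `model_*` name is hypothetical, the 512-page folio size assumes a 2 MiB THP with 4 KiB pages, and the unpin rule (the head page or the page being unpinned must be anon-exclusive) is a simplification of what the BUG in the trace appears to enforce.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FOLIO_PAGES 512   /* assumed: 2 MiB THP, 4 KiB base pages */

/* Hypothetical model of one large anon folio: per-page PAE flags, index 0 is the head. */
struct model_folio {
        bool anon_exclusive[MODEL_FOLIO_PAGES];
};

/* State after the reproducer's fork(): head COW-shared (PAE cleared), tails exclusive. */
static void model_after_fork(struct model_folio *f)
{
        f->anon_exclusive[0] = false;
        for (size_t i = 1; i < MODEL_FOLIO_PAGES; i++)
                f->anon_exclusive[i] = true;
}

/* Assumed unpin rule: fine if the head or the page being unpinned is exclusive. */
static bool model_unpin_ok(const struct model_folio *f, size_t idx)
{
        return f->anon_exclusive[0] || f->anon_exclusive[idx];
}

/*
 * Which page index survives coalescing: the first pinned page if the
 * subpage information is kept, or the head (index 0) if only head pages
 * are stored, as the quoted comment in the coalescing code suggests.
 */
static size_t model_stored_index(size_t first_pinned, bool keep_subpage)
{
        return keep_subpage ? first_pinned : 0;
}
```

In this model, unpinning through the stored head index fails the check while unpinning the originally pinned tail page passes, which lines up with the single-page case working and the two-page (coalesced) case hitting the BUG. Whether the real code also reads or writes the wrong pages is the open question above and is not something this model can answer.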