From: Eric Dumazet
Date: Mon, 22 Jan 2024 17:30:18 +0100
Subject: Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
To: Matthew Wilcox
Cc: "zhangpeng (AS)", linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev@vger.kernel.org, akpm@linux-foundation.org, davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org, pabeni@redhat.com, arjunroy@google.com, wangkefeng.wang@huawei.com
References: <20240119092024.193066-1-zhangpeng362@huawei.com> <5106a58e-04da-372a-b836-9d3d0bd2507b@huawei.com>

On Mon, Jan 22, 2024 at 5:04 PM Matthew Wilcox wrote:
>
> I'm disappointed to have no reaction from netdev so far.  Let's see if a
> more exciting subject line evinces some interest.

Hmm, perhaps some of us were enjoying their weekend ?

I also see '[RFC PATCH] filemap: add mapping_mapped check in
filemap_unaccount_folio()', and during the merge window, network
maintainers tend to prioritize their work based on tags.

If a stack trace was added, perhaps our attention would have been caught.
I don't really know what changed recently, all I know is that TCP zero
copy is for real network traffic. Real traffic uses order-0 pages, 4K
at a time.

If can_map_frag() needs to add another safety check, let's add it.

syzbot is usually quite good at bisections, was a bug origin found?

>
> On Sat, Jan 20, 2024 at 02:46:49PM +0800, zhangpeng (AS) wrote:
> > On 2024/1/19 21:40, Matthew Wilcox wrote:
> >
> > > On Fri, Jan 19, 2024 at 05:20:24PM +0800, Peng Zhang wrote:
> > > > Recently, we discovered a syzkaller issue that triggers
> > > > VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM
> > > > enabled, or bad page without CONFIG_DEBUG_VM.
> > > >
> > > > The specific scenarios are as follows:
> > > > (1) mmap: Use socket fd to create a TCP VMA.
> > > > (2) open(O_CREAT) + fallocate + sendfile: Read the ext4 file and create
> > > > the page cache. The mapping of the page cache is ext4 inode->i_mapping.
> > > > Send the ext4 page cache to the socket fd through sendfile.
> > > > (3) getsockopt TCP_ZEROCOPY_RECEIVE: Receive the ext4 page cache and use
> > > > vm_insert_pages() to insert the ext4 page cache to the TCP VMA. In this
> > > > case, mapcount changes from -1 to 0. The page cache mapping is ext4
> > > > inode->i_mapping, but the VMA of the page cache is the TCP VMA and
> > > > folio->mapping->i_mmap is empty.
> > > I think this is the bug.  We shouldn't be incrementing the mapcount
> > > in this scenario.  Assuming we want to support doing this at all and
> > > we don't want to include something like ...
> > >
> > > if (folio->mapping) {
> > >         if (folio->mapping != vma->vm_file->f_mapping)
> > >                 return -EINVAL;
> > >         if (page_to_pgoff(page) != linear_page_index(vma, address))
> > >                 return -EINVAL;
> > > }
> > >
> > > But maybe there's a reason for networking needing to map pages in this
> > > scenario?
> >
> > Agreed, and I'm also curious why.
> >
> > > > (4) open(O_TRUNC): Deletes the ext4 page cache.
> > > > In this case, the page
> > > > cache is still in the xarray tree of mapping->i_pages and these page
> > > > cache should also be deleted. However, folio->mapping->i_mmap is empty.
> > > > Therefore, truncate_cleanup_folio()->unmap_mapping_folio() can't unmap
> > > > i_mmap tree. In filemap_unaccount_folio(), the mapcount of the folio is
> > > > 0, causing BUG ON.
> > > >
> > > > Syz log that can be used to reproduce the issue:
> > > > r3 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > > mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
> > > > r4 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > > bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
> > > > connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
> > > > r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > > 0x181e42, 0x0)
> > > > fallocate(r5, 0x0, 0x0, 0x85b8)
> > > > sendfile(r4, r5, 0x0, 0x8ba0)
> > > > getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
> > > > &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
> > > > 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
> > > > r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > > 0x181e42, 0x0)
> > > >
> > > > In the current TCP zerocopy scenario, folio will be released normally.
> > > > When the process exits, if the page cache is truncated before the
> > > > process exits, BUG ON or Bad page occurs, which does not meet the
> > > > expectation.
> > > > To fix this issue, the mapping_mapped() check is added to
> > > > filemap_unaccount_folio(). In addition, to reduce the impact on
> > > > performance, no lock is added when mapping_mapped() is checked.
> > > NAK this patch, you're just preventing the assertion from firing.
> > > I think there's a deeper problem here.
> >
> > --
> > Best Regards,
> > Peng
> >
> >