From: Amir Goldstein
Date: Mon, 3 Feb 2025 21:39:27 +0100
Subject: Re: [REGRESSION] Re: [PATCH v8 15/19] mm: don't allow huge faults for files with pre content watches
To: Jan Kara, Alex Williamson
Cc: Christian Brauner, Linus Torvalds, Peter Xu, Josef Bacik, kernel-team@fb.com, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-mm@kvack.org, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org

On Mon, Feb 3, 2025 at 1:41 PM Jan Kara wrote:
>
> On Sun 02-02-25 11:04:02, Christian Brauner wrote:
> > On Sun, Feb 02,
> > 2025 at 08:46:21AM +0100, Amir Goldstein wrote:
> > > On Sun, Feb 2, 2025 at 1:58 AM Linus Torvalds
> > > wrote:
> > > >
> > > > On Sat, 1 Feb 2025 at 06:38, Christian Brauner wrote:
> > > > >
> > > > > Ok, but those "device fds" aren't really device fds in the sense that
> > > > > they are character fds. They are regular files afaict from:
> > > > >
> > > > > vfio_device_open_file(struct vfio_device *device)
> > > > >
> > > > > (Well, it's actually worse as anon_inode_getfile() files don't have any
> > > > > mode at all but that's beside the point.)?
> > > > >
> > > > > In any case, I think you're right that such files would (accidently?)
> > > > > qualify for content watches afaict. So at least that should probably get
> > > > > FMODE_NONOTIFY.
> > > >
> > > > Hmm. Can we just make all anon_inodes do that? I don't think you can
> > > > sanely have pre-content watches on anon-inodes, since you can't really
> > > > have access to them to _set_ the content watch from outside anyway..
> > > >
> > > > In fact, maybe do it in alloc_file_pseudo()?
> > > >
> > >
> > > The problem is that we cannot set FMODE_NONOTIFY -
> > > we tried that once but it regressed some workloads watching
> > > write on pipe fd or something.
> >
> > Ok, that might be true. But I would assume that most users of
> > alloc_file_pseudo() or the anonymous inode infrastructure will not care
> > about fanotify events. I would not go for a separate helper. It'd be
> > nice to keep the number of file allocation functions low.
> >
> > I'd rather have the subsystems that want it explicitly opt-in to
> > fanotify watches, i.e., remove FMODE_NONOTIFY. Because right now we have
> > broken fanotify support for e.g., nsfs already. So make the subsystems
> > think about whether they actually want to support it.
>
> Agreed, that would be a saner default.
>
> > I would disqualify all anonymous inodes and see what actually does
> > break.
> > I naively suspect that almost no one uses anonymous inodes +
> > fanotify. I'd be very surprised.
> >
> > I'm currently traveling (see you later btw) but from a very cursory
> > reading I would naively suspect the following:
> >
> > // Suspects for FMODE_NONOTIFY
> > drivers/dma-buf/dma-buf.c: file = alloc_file_pseudo(inode, dma_buf_mnt, "dmabuf",
> > drivers/misc/cxl/api.c: file = alloc_file_pseudo(inode, cxl_vfs_mount, name,
> > drivers/scsi/cxlflash/ocxl_hw.c: file = alloc_file_pseudo(inode, ocxlflash_vfs_mount, name,
> > fs/anon_inodes.c: file = alloc_file_pseudo(inode, anon_inode_mnt, name,
> > fs/hugetlbfs/inode.c: file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
> > kernel/bpf/token.c: file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > mm/secretmem.c: file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem",
> > block/bdev.c: bdev_file = alloc_file_pseudo_noaccount(BD_INODE(bdev),
> > drivers/tty/pty.c: static int ptmx_open(struct inode *inode, struct file *filp)
> >
> > // Suspects for ~FMODE_NONOTIFY
> > fs/aio.c: file = alloc_file_pseudo(inode, aio_mnt, "[aio]",
>
> This is just a helper file for managing aio context so I don't think any
> notification makes sense there (events are not well defined). So I'd say
> FMODE_NONOTIFY here as well.
>
> > fs/pipe.c: f = alloc_file_pseudo(inode, pipe_mnt, "",
> > mm/shmem.c: res = alloc_file_pseudo(inode, mnt, name, O_RDWR,
>
> This is actually used for stuff like IPC SEM where notification doesn't
> make sense. It's also used when mmapping /dev/zero but that struct file
> isn't easily accessible to userspace so overall I'd say this should be
> FMODE_NONOTIFY as well.

I think there is another code path that the audit missed for getting
these pseudo files not via alloc_file_pseudo():

ipc/shm.c: file = alloc_file_clone(base, f_flags,

which does not copy f_mode as far as I can tell.
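
To make the concern concrete, here is a small userspace toy model, not
kernel code. It assumes, as I said above, that the clone path starts
from fresh mode bits rather than copying f_mode from the base file.
MODEL_NONOTIFY, struct model_file and the model_*() helpers are
hypothetical names made up for this sketch, and the flag value is not
the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value, chosen for this sketch only; it is not the
 * kernel's FMODE_NONOTIFY definition. */
#define MODEL_NONOTIFY 0x1u

/* Toy stand-in for struct file: only the mode bits matter here. */
struct model_file {
	unsigned int f_mode;
};

/* Models an allocator that opts pseudo files out of notification. */
static struct model_file model_alloc_pseudo(void)
{
	struct model_file f = { .f_mode = MODEL_NONOTIFY };
	return f;
}

/* Models a clone path that starts from fresh mode bits instead of
 * copying them from the base file (the assumption stated above). */
static struct model_file model_alloc_clone(const struct model_file *base)
{
	struct model_file f = { .f_mode = 0 };
	(void)base; /* base->f_mode is deliberately not propagated */
	return f;
}

/* True when events would still be generated for this file. */
static bool model_notifies(const struct model_file *f)
{
	return !(f->f_mode & MODEL_NONOTIFY);
}
```

In this model, a file returned by model_alloc_clone() still generates
events even though its base file came from model_alloc_pseudo() and had
opted out. That is the kind of gap an opt-out set only in the allocator
would leave open, if the assumption about the clone path holds.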
>
> > // Unsure:
> > fs/nfs/nfs4file.c: filep = alloc_file_pseudo(r_ino, ss_mnt, read_name, O_RDONLY,
>
> AFAICS this struct file is for copy offload and doesn't leave the kernel.
> Hence FMODE_NONOTIFY should be fine.
>
> > net/socket.c: file = alloc_file_pseudo(SOCK_INODE(sock), sock_mnt, dname,
>
> In this case I think we need to be careful. It's a similar case as pipes so
> probably we should use ~FMODE_NONOTIFY here from pure caution.
>

I tried this approach with patch:
"fsnotify: disable notification by default for all pseudo files"

But I also added another patch:
"fsnotify: disable pre-content and permission events by default"

So that code paths that we missed such as alloc_file_clone()
will not have pre-content events enabled.

Alex,

Can you please try this branch:
https://github.com/amir73il/linux/commits/fsnotify-fixes/

and verify that it fixes your issue.

The branch contains one prep patch:
"fsnotify: use accessor to set FMODE_NONOTIFY_*"
and two independent Fixes patches.

Assuming that it fixes your issue, can you please test each of the
Fixes patches individually, because every one of them should be fixing
the issue independently and every one of them could break something,
so we may end up reverting it later on.

Thanks,
Amir.