From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 31 Jan 2025 20:19:22 -0500
From: Peter Xu
To: Linus Torvalds
Cc: Alex Williamson, Josef Bacik, kernel-team@fb.com,
	linux-fsdevel@vger.kernel.org, jack@suse.cz, amir73il@gmail.com,
	brauner@kernel.org, viro@zeniv.linux.org.uk,
	linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-mm@kvack.org, linux-ext4@vger.kernel.org,
	"linux-kernel@vger.kernel.org", "kvm@vger.kernel.org"
Subject: Re: [REGRESSION] Re: [PATCH v8 15/19] mm: don't allow huge faults for files with pre content watches
References: <9035b82cff08a3801cef3d06bbf2778b2e5a4dba.1731684329.git.josef@toxicpanda.com>
	<20250131121703.1e4d00a7.alex.williamson@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Fri, Jan 31, 2025 at 11:59:56AM -0800, Linus Torvalds wrote:
> On Fri, 31 Jan 2025 at 11:17, Alex Williamson wrote:
> >
> > 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
> >
> > This breaks huge_fault support for PFNMAPs that was recently added in
> > v6.12 and is used by vfio-pci to fault device memory using PMD and PUD
> > order mappings.
>
> Surely only for content watches?
>
> Which shouldn't be a valid situation *anyway*.
>
> IOW, there must be some unrelated bug somewhere: either somebody is
> allowed to set a pre-content match on a special device.
>
> That should be disabled by the whole
>
>         /*
>          * If there are permission event watchers but no pre-content event
>          * watchers, set FMODE_NONOTIFY | FMODE_NONOTIFY_PERM to indicate that.
>          */
>
> thing in file_set_fsnotify_mode() which only allows regular files and
> directories to be notified on.
>
> Or, alternatively, that check for huge-fault disabling is just
> checking the wrong bits.
>
> Or - quite possibly - I am missing something obvious?

Is it possible that some paths got overlooked when setting up the
fsnotify bits in f_mode?  Since the default is "no bit set" for those
bits, I think FMODE_FSNOTIFY_HSM() will always return true for such
files if they were overlooked.

One thing to mention: /dev/vfio/* are chardevs, but the PCI BARs are not
mmap()ed from these fds.  Whatever is under /dev/vfio/* represents IOMMU
groups rather than the device fd itself.  The app normally needs to first
open the IOMMU group fd under /dev/vfio/*, then use the VFIO
ioctl(VFIO_GROUP_GET_DEVICE_FD) to get the device fd, which will be the
mmap() target, instead of the ones under /dev.

I checked: those device fds are allocated from vfio_device_open_file()
within the ioctl, which internally uses anon_inode_getfile().  I don't
see anything in that path that would set the fanotify bits.

Further, I'm not sure whether some callers of alloc_file() can also
suffer from a similar issue, because at least the memfd_create() syscall
also uses that API.  It (hopefully?) used to allow THPs for shmem-backed
memfds on aligned mmap()s, but I'm not sure whether it will now also
wrongly trigger the FALLBACK path in create_huge_pmd(), just like vfio's
VMAs.  I didn't verify that though, nor have I checked more users yet.
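
For reference, the userspace sequence described above might look roughly
like the sketch below.  It is only an illustration: the function name
vfio_map_bar0() and its parameters are made up here, BAR0 and the type1
IOMMU are arbitrary choices, and the usual API-version and group-viability
checks are skipped.  The point is just that the fd passed to mmap() comes
back from the ioctl, not from an open() under /dev.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: map BAR0 of a VFIO PCI device.
 * group_path (e.g. "/dev/vfio/<group>") and bdf (the PCI address string)
 * are supplied by the caller.  Returns NULL on any failure.  On success
 * the fds are intentionally left open for the lifetime of the mapping.
 */
void *vfio_map_bar0(const char *group_path, const char *bdf, size_t *len)
{
	struct vfio_region_info reg = {
		.argsz = sizeof(reg),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	int container, group, device = -1;
	void *map = NULL;

	container = open("/dev/vfio/vfio", O_RDWR);
	group = open(group_path, O_RDWR);	/* the chardev under /dev/vfio */
	if (container < 0 || group < 0)
		goto fail;

	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU))
		goto fail;

	/* The device fd is created inside this ioctl, not by open(). */
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, bdf);
	if (device < 0 || ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg))
		goto fail;
	if (!(reg.flags & VFIO_REGION_INFO_FLAG_MMAP))
		goto fail;

	/* The mmap() target is the device fd, so its f_mode is what matters. */
	map = mmap(NULL, reg.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   device, reg.offset);
	if (map == MAP_FAILED)
		goto fail;

	*len = reg.size;
	return map;

fail:
	if (device >= 0)
		close(device);
	if (group >= 0)
		close(group);
	if (container >= 0)
		close(container);
	return NULL;
}
```

So any fsnotify setup that only happens on the regular open() path would
never see the file that actually backs the VMA.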
So I wonder whether we should set up the fanotify bits in at least
alloc_file() too (to FMODE_NONOTIFY?).

I'm totally not familiar with fanotify, and it's a bit late to try to
verify anything (I cannot quickly find my previous huge pfnmap setup, so
setting that up will also take time).  But maybe the above can provide
some clues for others.

Thanks,

-- 
Peter Xu
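
To make the f_mode concern above concrete, here is a small userspace
model.  It is a sketch under stated assumptions, not the kernel code: the
MODEL_* names and bit positions are invented for illustration, and the
only semantics assumed are what is described above, namely that having
neither notify bit set reads as "pre-content watchers may exist".

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real FMODE_* values differ. */
#define MODEL_NONOTIFY		(1u << 0)
#define MODEL_NONOTIFY_PERM	(1u << 1)
#define MODEL_FSNOTIFY_MASK	(MODEL_NONOTIFY | MODEL_NONOTIFY_PERM)

/* Assumed: both bits clear means pre-content (HSM) watchers may exist. */
bool model_fsnotify_hsm(uint32_t f_mode)
{
	return (f_mode & MODEL_FSNOTIFY_MASK) == 0;
}

/* Models the huge-fault gate: fall back whenever HSM may be watching. */
bool model_huge_fault_allowed(uint32_t f_mode)
{
	return !model_fsnotify_hsm(f_mode);
}
```

In this model a file whose creation path never touched the bits
(f_mode == 0) is refused huge faults, while one marked with
MODEL_NONOTIFY is not, which is the effect that setting the bit in
alloc_file() would be expected to have.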