linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Josef Bacik <josef@toxicpanda.com>,
	kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
	amir73il@gmail.com, brauner@kernel.org, viro@zeniv.linux.org.uk,
	linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-mm@kvack.org, linux-ext4@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [REGRESSION] Re: [PATCH v8 15/19] mm: don't allow huge faults for files with pre content watches
Date: Fri, 31 Jan 2025 20:19:22 -0500	[thread overview]
Message-ID: <Z512mt1hmX5Jg7iH@x1.local> (raw)
In-Reply-To: <CAHk-=wjMPZ7htPTzxtF52-ZPShfFOQ4R-pHVxLO+pfOW5avC4Q@mail.gmail.com>

On Fri, Jan 31, 2025 at 11:59:56AM -0800, Linus Torvalds wrote:
> On Fri, 31 Jan 2025 at 11:17, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches")
> >
> > This breaks huge_fault support for PFNMAPs that was recently added in
> > v6.12 and is used by vfio-pci to fault device memory using PMD and PUD
> > order mappings.
> 
> Surely only for content watches?
> 
> Which shouldn't be a valid situation *anyway*.
> 
> IOW, there must be some unrelated bug somewhere: either somebody is
> allowed to set a pre-content match on a special device.
> 
> That should be disabled by the whole
> 
>         /*
>          * If there are permission event watchers but no pre-content event
>          * watchers, set FMODE_NONOTIFY | FMODE_NONOTIFY_PERM to indicate that.
>          */
> 
> thing in file_set_fsnotify_mode() which only allows regular files and
> directories to be notified on.
> 
> Or, alternatively, that check for huge-fault disabling is just
> checking the wrong bits.
> 
> Or - quite possibly - I am missing something obvious?

Is it possible that we have some paths got overlooked in setting up the
fsnotify bits in f_mode? Meanwhile since the default is "no bit set" on
those bits, I think it means FMODE_FSNOTIFY_HSM() can always return true on
those if overlooked..

One thing to mention is, /dev/vfio/* are chardevs, however the PCI bars are
not mmap()ed from these fds - whatever under /dev/vfio/* represents IOMMU
groups rather than the device fd itself.

The app normally needs to first open the IOMMU group fd under /dev/vfio/*,
then using VFIO ioctl(VFIO_GROUP_GET_DEVICE_FD) to get the device fd, which
will be the mmap() target, instead of the ones under /dev.

I checked, those device fds were allocated from vfio_device_open_file()
within the ioctl, which internally uses anon_inode_getfile().  I don't see
anywhere in that path that will set the fanotify bits..

Further, I'm not sure whether some callers of alloc_file() can also suffer
from similar issue, because at least memfd_create() syscall also uses the
API, which (hopefully?) would used to allow THPs for shmem backed memfds on
aligned mmap()s, but not sure whether it'll also wrongly trigger the
FALLBACK path similarly in create_huge_pmd() just like vfio's VMAs.  I
didn't verify it though, nor did I yet check more users.

So I wonder whether we should setup the fanotify bits in at least
alloc_file() too (to FMODE_NONOTIFY?).

I'm totally not familiar with fanotify, and it's a bit late to try verify
anything (I cannot quickly find my previous huge pfnmap setup, so setup
those will also take time..). but maybe above can provide some clues for
others..

Thanks,

-- 
Peter Xu



  reply	other threads:[~2025-02-01  1:19 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-15 15:30 [PATCH v8 00/19] fanotify: add pre-content hooks Josef Bacik
2024-11-15 15:30 ` [PATCH v8 01/19] fs: get rid of __FMODE_NONOTIFY kludge Josef Bacik
2024-11-18 18:14   ` Jan Kara
2024-11-15 15:30 ` [PATCH v8 02/19] fsnotify: opt-in for permission events at file open time Josef Bacik
2024-11-20 15:53   ` Jan Kara
2024-11-20 16:12     ` Amir Goldstein
2024-11-21  9:39       ` Jan Kara
2024-11-21 10:09         ` Christian Brauner
2024-11-21 11:04           ` Amir Goldstein
2024-11-21 11:16             ` Jan Kara
2024-11-21 11:32               ` Amir Goldstein
2024-11-21  9:45     ` Christian Brauner
2024-11-21 11:39       ` Amir Goldstein
2024-11-15 15:30 ` [PATCH v8 03/19] fsnotify: add helper to check if file is actually being watched Josef Bacik
2024-11-20 16:02   ` Jan Kara
2024-11-20 16:42     ` Amir Goldstein
2024-11-21  8:54       ` Jan Kara
2024-11-15 15:30 ` [PATCH v8 04/19] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-11-15 15:30 ` [PATCH v8 05/19] fanotify: rename a misnamed constant Josef Bacik
2024-11-15 15:30 ` [PATCH v8 06/19] fanotify: reserve event bit of deprecated FAN_DIR_MODIFY Josef Bacik
2024-11-15 15:30 ` [PATCH v8 07/19] fsnotify: introduce pre-content permission events Josef Bacik
2024-11-15 15:30 ` [PATCH v8 08/19] fsnotify: pass optional file access range in pre-content event Josef Bacik
2024-11-15 15:30 ` [PATCH v8 09/19] fsnotify: generate pre-content permission event on truncate Josef Bacik
2024-11-20 15:23   ` Jan Kara
2024-11-20 15:57     ` Amir Goldstein
2024-11-20 16:16       ` Jan Kara
2024-11-15 15:30 ` [PATCH v8 10/19] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-11-15 15:59   ` Amir Goldstein
2024-11-21 10:44   ` Jan Kara
2024-11-21 14:18     ` Amir Goldstein
2024-11-21 16:36       ` Jan Kara
2024-11-21 18:31         ` Amir Goldstein
2024-11-21 18:37           ` Amir Goldstein
2024-11-22 12:42             ` Jan Kara
2024-11-22 13:51               ` Amir Goldstein
2024-11-27 12:18                 ` Jan Kara
2024-11-27 12:20                   ` Amir Goldstein
2024-11-15 15:30 ` [PATCH v8 11/19] fanotify: report file range info with pre-content events Josef Bacik
2024-11-15 15:30 ` [PATCH v8 12/19] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-11-15 15:30 ` [PATCH v8 13/19] fanotify: add a helper to check for pre content events Josef Bacik
2024-11-20 15:44   ` Jan Kara
2024-11-20 16:43     ` Amir Goldstein
2024-11-15 15:30 ` [PATCH v8 14/19] fanotify: disable readahead if we have pre-content watches Josef Bacik
2024-11-15 15:30 ` [PATCH v8 15/19] mm: don't allow huge faults for files with pre content watches Josef Bacik
2025-01-31 19:17   ` [REGRESSION] " Alex Williamson
2025-01-31 19:59     ` Linus Torvalds
2025-02-01  1:19       ` Peter Xu [this message]
2025-02-01 14:38         ` Christian Brauner
2025-02-02  0:58           ` Linus Torvalds
2025-02-02  7:46             ` Amir Goldstein
2025-02-02 10:04               ` Christian Brauner
2025-02-03 12:41                 ` Jan Kara
2025-02-03 20:39                   ` Amir Goldstein
2025-02-03 21:41                     ` Alex Williamson
2025-02-03 22:04                       ` Amir Goldstein
2024-11-15 15:30 ` [PATCH v8 16/19] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-12-08 16:58   ` Klara Modin
2024-12-09 10:45     ` Aithal, Srikanth
2024-12-09 12:34       ` Jan Kara
2024-12-09 12:31     ` Jan Kara
2024-12-09 12:56       ` Klara Modin
2024-12-09 14:16         ` Jan Kara
2024-12-10 21:12     ` Randy Dunlap
2024-12-11 16:30       ` Jan Kara
2024-11-15 15:30 ` [PATCH v8 17/19] xfs: add pre-content fsnotify hook for write faults Josef Bacik
2024-11-21 10:22   ` Jan Kara
2024-11-15 15:30 ` [PATCH v8 18/19] btrfs: disable defrag on pre-content watched files Josef Bacik
2024-11-15 15:30 ` [PATCH v8 19/19] fs: enable pre-content events on supported file systems Josef Bacik
2024-11-21 11:29 ` [PATCH v8 00/19] fanotify: add pre-content hooks Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z512mt1hmX5Jg7iH@x1.local \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox