From: Amir Goldstein
Date: Fri, 7 Mar 2025 18:45:13 +0100
Subject: Re: [syzbot] [xfs?] WARNING in fsnotify_file_area_perm
To: Josef Bacik
Cc: Jan Kara, syzbot, akpm@linux-foundation.org, axboe@kernel.dk,
 brauner@kernel.org, cem@kernel.org, chandan.babu@oracle.com,
 djwong@kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-xfs@vger.kernel.org, syzkaller-bugs@googlegroups.com

On Fri, Mar 7, 2025 at 5:07 PM Amir Goldstein wrote:
>
> On Fri, Mar 7, 2025 at 4:46 PM Josef Bacik wrote:
> >
> > On Tue, Mar 04, 2025 at 10:13:39PM +0100, Amir Goldstein wrote:
> > > On Tue, Mar 4, 2025 at 9:37 PM Josef Bacik wrote:
> > > >
> > > > On Tue, Mar 04, 2025 at 09:27:20PM +0100, Amir Goldstein wrote:
> > > > > On Tue, Mar 4, 2025 at 5:15 PM Josef Bacik wrote:
> > > > > >
> > > > > > On Tue, Mar 04, 2025 at 04:09:16PM +0100, Amir Goldstein wrote:
> > > > > > > On Tue, Mar 4, 2025 at 12:06 PM Jan Kara wrote:
> > > > > > > >
> > > > > > > > Josef, Amir,
> > > > > > > >
> > > > > > > > this is indeed an interesting case:
> > > > > > > >
> > > > > > > > On Sun 02-03-25 08:32:30, syzbot wrote:
> > > > > > > > > syzbot has found a reproducer for the following issue on:
> > > > > > > > ...
> > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > WARNING: CPU: 1 PID: 6440 at ./include/linux/fsnotify.h:145 fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145
> > > > > > > > > Modules linked in:
> > > > > > > > > CPU: 1 UID: 0 PID: 6440 Comm: syz-executor370 Not tainted 6.14.0-rc4-syzkaller-ge056da87c780 #0
> > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
> > > > > > > > > pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > > > > > > pc : fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145
> > > > > > > > > lr : fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145
> > > > > > > > > sp : ffff8000a42569d0
> > > > > > > > > x29: ffff8000a42569d0 x28: ffff0000dcec1b48 x27: ffff0000d68a1708
> > > > > > > > > x26: ffff0000d68a16c0 x25: dfff800000000000 x24: 0000000000008000
> > > > > > > > > x23: 0000000000000001 x22: ffff8000a4256b00 x21: 0000000000001000
> > > > > > > > > x20: 0000000000000010 x19: ffff0000d68a16c0 x18: ffff8000a42566e0
> > > > > > > > > x17: 000000000000e388 x16: ffff800080466c24 x15: 0000000000000001
> > > > > > > > > x14: 1fffe0001b31513c x13: 0000000000000000 x12: 0000000000000000
> > > > > > > > > x11: 0000000000000001 x10: 0000000000ff0100 x9 : 0000000000000000
> > > > > > > > > x8 : ffff0000c6d98000 x7 : 0000000000000000 x6 : 0000000000000000
> > > > > > > > > x5 : 0000000000000020 x4 : 0000000000000000 x3 : 0000000000001000
> > > > > > > > > x2 : ffff8000a4256b00 x1 : 0000000000000001 x0 : 0000000000000000
> > > > > > > > > Call trace:
> > > > > > > > >  fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145 (P)
> > > > > > > > >  filemap_fault+0x12b0/0x1518 mm/filemap.c:3509
> > > > > > > > >  xfs_filemap_fault+0xc4/0x194 fs/xfs/xfs_file.c:1543
> > > > > > > > >  __do_fault+0xf8/0x498 mm/memory.c:4988
> > > > > > > > >  do_read_fault mm/memory.c:5403 [inline]
> > > > > > > > >  do_fault mm/memory.c:5537 [inline]
> > > > > > > > >  do_pte_missing mm/memory.c:4058 [inline]
> > > > > > > > >  handle_pte_fault+0x3504/0x57b0 mm/memory.c:5900
> > > > > > > > >  __handle_mm_fault mm/memory.c:6043 [inline]
> > > > > > > > >  handle_mm_fault+0xfa8/0x188c mm/memory.c:6212
> > > > > > > > >  do_page_fault+0x570/0x10a8 arch/arm64/mm/fault.c:690
> > > > > > > > >  do_translation_fault+0xc4/0x114 arch/arm64/mm/fault.c:783
> > > > > > > > >  do_mem_abort+0x74/0x200 arch/arm64/mm/fault.c:919
> > > > > > > > >  el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:432
> > > > > > > > >  el1h_64_sync_handler+0x60/0xcc arch/arm64/kernel/entry-common.c:510
> > > > > > > > >  el1h_64_sync+0x6c/0x70 arch/arm64/kernel/entry.S:595
> > > > > > > > >  __uaccess_mask_ptr arch/arm64/include/asm/uaccess.h:169 [inline] (P)
> > > > > > > > >  fault_in_readable+0x168/0x310 mm/gup.c:2234 (P)
> > > > > > > > >  fault_in_iov_iter_readable+0x1dc/0x22c lib/iov_iter.c:94
> > > > > > > > >  iomap_write_iter fs/iomap/buffered-io.c:950 [inline]
> > > > > > > > >  iomap_file_buffered_write+0x490/0xd54 fs/iomap/buffered-io.c:1039
> > > > > > > > >  xfs_file_buffered_write+0x2dc/0xac8 fs/xfs/xfs_file.c:792
> > > > > > > > >  xfs_file_write_iter+0x2c4/0x6ac fs/xfs/xfs_file.c:881
> > > > > > > > >  new_sync_write fs/read_write.c:586 [inline]
> > > > > > > > >  vfs_write+0x704/0xa9c fs/read_write.c:679
> > > > > > > >
> > > > > > > > The backtrace actually explains it all. We had a buffered write whose
> > > > > > > > buffer was mmapped file on a filesystem with an HSM mark. Now the prefaulting
> > > > > > > > of the buffer happens already (quite deep) under the filesystem freeze
> > > > > > > > protection (obtained in vfs_write()) which breaks assumptions of HSM code
> > > > > > > > and introduces potential deadlock of HSM handler in userspace with filesystem
> > > > > > > > freezing. So we need to think how to deal with this case...
> > > > > > > >
> > > > > > > Ouch.
> > > > > > > It's like the splice mess all over again.
> > > > > > > Except we do not really care to make this use case work with HSM
> > > > > > > in the sense that we do not care to have to fill in the mmapped file content
> > > > > > > in this corner case - we just need to let HSM fail the access if content is
> > > > > > > not available.
> > > > > > >
> > > > > > > If you remember, in one of my very early versions of pre-content events,
> > > > > > > the pre-content event (or maybe it was FAN_ACCESS_PERM itself)
> > > > > > > carried a flag (I think it was called FAN_PRE_VFS) to communicate to
> > > > > > > HSM service if it was safe to write to fs in the context of event handling.
> > > > > > >
> > > > > > > At the moment, I cannot think of any elegant way out of this use case
> > > > > > > except annotating the event from fault_in_readable() as "unsafe-for-write".
> > > > > > > This will relax the debugging code assertion and notify the HSM service
> > > > > > > (via an event flag) that it can ALLOW/DENY, but it cannot fill the file.
> > > > > > > Maybe we can reuse the FAN_ACCESS_PERM event to communicate
> > > > > > > this case to HSM service.
> > > > > > >
> > > > > > > WDYT?
> > > > > >
> > > > > > I think that mmap was a mistake.
> > > > >
> > > > > What do you mean?
> > > > > Isn't the fault hook required for your large executables use case?
> > > >
> > > > I mean the mmap syscall was a mistake ;).
> > > >
> > >
> > > ah :)
> > >
> > > > >
> > > > > >
> > > > > > Is there a way to tell if we're currently in a path that is under fsfreeze
> > > > > > protection?
> > > > >
> > > > > Not at the moment.
> > > > > At the moment, file_write_not_started() is not a reliable check
> > > > > (has false positives) without CONFIG_LOCKDEP.
> > > > >
> > >
> > > One very ugly solution is to require CONFIG_LOCKDEP for
> > > pre-content events.
> > >
> > > > > > Just denying this case would be a simpler short term solution while
> > > > > > we come up with a long term solution. I think your solution is fine, but I'd be
> > > > > > just as happy with a simpler "this isn't allowed" solution. Thanks,
> > > > >
> > > > > Yeh, I don't mind that, but it's a bit of an overkill considering that
> > > > > file with no content may in fact be rare.
> > > >
> > > > Agreed, I'm fine with your solution.
> > >
> > > Well, my "solution" was quite hand-wavy - it did not really say how to
> > > propagate the fact that faults initiated from fault_in_readable().
> > > Do you guys have any ideas for a simple solution?
> >
> > Sorry I've been elbow deep in helping getting our machine replacements working
> > faster.
> >
> > I've been thinking about this, it's not like we can carry context from the reason
> > we are faulting in, at least not simply, so I think the best thing to do is
> > either
> >
> > 1) Emit a precontent event at mmap() time for the whole file, since really all I
> > care about is faulting at exec time, and then we can just skip the precontent
> > event if we're not exec.
>
> Sorry, not that familiar with exec code. Do you mean to issue pre-content
> for page fault only if memory is mapped executable or is there another way
> of knowing that we are in exec context?
>
> If the former, then syzbot will catch up with us and write a buffer which is
> mapped readable and exec.
>
> >
> > 2) Revert the page fault stuff, put back your thing to fault the whole file, and
> > wait until we think of a better way to deal with this.
> >
> > Obviously I'd prefer not #2, but I'd really, really rather not chuck all of HSM
> > because my page fault thing is silly. I'll carry what I need internally while
> > we figure out what to do upstream. #1 doesn't seem bad, but I haven't thought
> > about it that hard.
> > Thanks,
> >
>
> So I started to test this patch, but I may be doing something very
> terribly wrong
> with this.

Q: What is this something that is terribly wrong?

>
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2788df98080f8..a8822b44d4967 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3033,13 +3033,27 @@ static inline void file_start_write(struct file *file)
>  	if (!S_ISREG(file_inode(file)->i_mode))
>  		return;
>  	sb_start_write(file_inode(file)->i_sb);
> +	/*
> +	 * Prevent fault-in user pages that may call HSM hooks with
> +	 * sb_writers held.
> +	 */
> +	if (unlikely(FMODE_FSNOTIFY_HSM(file->f_mode)))
> +		pagefault_disable();
>  }
>
>  static inline bool file_start_write_trylock(struct file *file)
>  {
>  	if (!S_ISREG(file_inode(file)->i_mode))
>  		return true;
> -	return sb_start_write_trylock(file_inode(file)->i_sb);
> +	if (!sb_start_write_trylock(file_inode(file)->i_sb))
> +		return false;
> +	/*
> +	 * Prevent fault-in user pages that may call HSM hooks with
> +	 * sb_writers held.
> +	 */
> +	if (unlikely(FMODE_FSNOTIFY_HSM(file->f_mode)))
> +		pagefault_disable();
> +	return true;
>  }
>
>  /**
> @@ -3053,6 +3067,8 @@ static inline void file_end_write(struct file *file)
>  	if (!S_ISREG(file_inode(file)->i_mode))
>  		return;
>  	sb_end_write(file_inode(file)->i_sb);
> +	if (unlikely(FMODE_FSNOTIFY_HSM(file->f_mode)))
> +		pagefault_enable();
>  }

One thing that is wrong is that this checks whether the file being written
is marked for pre-content events, not the mmapped file backing the input
buffer.

What we would have needed here is a check of
unlikely(fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_PRE_CONTENT))

But Linus will not like that...

Do we even care about optimizing the pre-content hooks of sporadic files
that are not marked for pre-content events when there are pre-content
watches on the filesystem?

I think all of our use cases mark the sb for pre-content events anyway
and do not care about a bit of overhead for non-marked files.
If that is the case we can do away with the extra optimization and then the
changes above will really solve the issue.

I've squashed the followup change to the fsnotify-fixes branch.

One thing that this patch does not address is aio and io_uring, but the
comment above fault_in_iov_iter_readable() says:
" ...For async buffered writes the assumption is that the user
" page has already been faulted in.

IDK. Let me know what you think.

Thanks,
Amir.

--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -652,7 +652,6 @@ void file_set_fsnotify_mode_from_watchers(struct file *file)
 {
 	struct dentry *dentry = file->f_path.dentry, *parent;
 	struct super_block *sb = dentry->d_sb;
-	__u32 mnt_mask, p_mask;

 	/* Is it a file opened by fanotify? */
 	if (FMODE_FSNOTIFY_NONE(file->f_mode))
@@ -681,30 +680,10 @@ void file_set_fsnotify_mode_from_watchers(struct file *file)
 	}

 	/*
-	 * OK, there are some pre-content watchers. Check if anybody is
-	 * watching for pre-content events on *this* file.
+	 * OK, there are some pre-content watchers on this fs, so
+	 * Enable pre-content events.
 	 */
-	mnt_mask = READ_ONCE(real_mount(file->f_path.mnt)->mnt_fsnotify_mask);
-	if (unlikely(fsnotify_object_watched(d_inode(dentry), mnt_mask,
-					     FSNOTIFY_PRE_CONTENT_EVENTS))) {
-		/* Enable pre-content events */
-		file_set_fsnotify_mode(file, 0);
-		return;
-	}
-
-	/* Is parent watching for pre-content events on this file? */
-	if (dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) {
-		parent = dget_parent(dentry);
-		p_mask = fsnotify_inode_watches_children(d_inode(parent));
-		dput(parent);
-		if (p_mask & FSNOTIFY_PRE_CONTENT_EVENTS) {
-			/* Enable pre-content events */
-			file_set_fsnotify_mode(file, 0);
-			return;
-		}
-	}
-	/* Nobody watching for pre-content events from this file */
-	file_set_fsnotify_mode(file, FMODE_NONOTIFY | FMODE_NONOTIFY_PERM);
+	file_set_fsnotify_mode(file, 0);
 }
 #endif
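
For anyone following the argument, the interaction between the two changes in this mail can be modelled with a minimal userspace sketch. This is not kernel code and not the posted patch: every toy_* name is hypothetical, object_watched collapses the inode/mount/parent checks into a single flag, and a plain counter stands in for pagefault_disable()/pagefault_enable(). It only tries to show why a guard keyed on the written file's mode misses a buffer mmapped from a different, marked file, and why enabling the mode for every file on a watched sb would make the guard apply.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_sb {
	bool has_pre_content_watchers;	/* some mark on this fs wants pre-content events */
};

struct toy_file {
	struct toy_sb *sb;
	bool object_watched;	/* assumption: inode/mount/parent checks folded into one flag */
	bool hsm_mode;		/* stands in for FMODE_FSNOTIFY_HSM(file->f_mode) */
};

/* Depth counter standing in for pagefault_disable()/pagefault_enable(). */
static int toy_pagefault_disabled;

/* Per-file policy: enable pre-content events only if this file is watched. */
static void toy_set_mode_per_file(struct toy_file *f)
{
	f->hsm_mode = f->sb->has_pre_content_watchers && f->object_watched;
}

/* Per-sb policy: any pre-content watcher on the sb enables them. */
static void toy_set_mode_per_sb(struct toy_file *f)
{
	f->hsm_mode = f->sb->has_pre_content_watchers;
}

static void toy_file_start_write(struct toy_file *f)
{
	/* freeze protection (sb_writers) would be taken here */
	if (f->hsm_mode)
		toy_pagefault_disabled++;
}

static void toy_file_end_write(struct toy_file *f)
{
	if (f->hsm_mode) {
		assert(toy_pagefault_disabled > 0);
		toy_pagefault_disabled--;
	}
}

/*
 * Would prefaulting a buffer mapped from @buf reach the HSM hook while
 * a write to @dst holds freeze protection?
 */
static bool toy_write_hits_hsm_hook(struct toy_file *dst, struct toy_file *buf)
{
	bool hit;

	toy_file_start_write(dst);
	hit = buf->hsm_mode && toy_pagefault_disabled == 0;
	toy_file_end_write(dst);
	return hit;
}
```

In this model, writing to an unmarked file from a buffer mapped from a marked file reaches the hook under the per-file policy and does not under the per-sb policy. That result assumes both files live on the same sb; a buffer mapped from a file on a different, watched sb is outside what the sketch covers.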