From: Amir Goldstein
Date: Mon, 11 Nov 2024 21:27:13 +0100
Subject: Re: [PATCH v6 00/17] fanotify: add pre-content hooks
To: Josef Bacik
Cc: kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
    brauner@kernel.org, torvalds@linux-foundation.org,
    linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
    linux-mm@kvack.org, linux-ext4@vger.kernel.org

On Mon, Nov 11, 2024 at 9:19 PM Josef Bacik wrote:
>
> v5: https://lore.kernel.org/linux-fsdevel/cover.1725481503.git.josef@toxicpanda.com/
> v4: https://lore.kernel.org/linux-fsdevel/cover.1723670362.git.josef@toxicpanda.com/
> v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
> v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
> v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/
>
> v5->v6:
> - Linus had problems with this and rejected Jan's PR
>   (https://lore.kernel.org/linux-fsdevel/20240923110348.tbwihs42dxxltabc@quack3/),
>   so I'm respinning this series to address his concerns.  Hopefully this is more
>   acceptable.
> - Change the page fault hooks to happen only in the case where we have to add a
>   page, not where pages already exist.
> - Amir added a hook to truncate.
> - We made the flag per SB instead of per fstype, Amir wanted this because of
>   some potential issues with other file system specific work he's doing.

Not me :)
It is for the upcoming ps > bs patch set for xfs.
It would be easiest to opt-out of this config for HSM to begin with.

> - Dropped the bcachefs patch, there were some concerns that we were doing
>   something wrong, and it's not a huge deal to not have this feature for now.
> - Unfortunately the xfs write fault path still has to do the page fault hook

As Jan corrected me, this is only for the DAX page faults in xfs, so we should
be ok with the fsnotify hook being called on every fault in this case.

>   before we know if we have a page or not, this is because of the locking that's
>   done before we get to the part where we know if we have a page already or not,
>   so that's the path that is still the same from last iteration.
> - I've re-validated this series with btrfs, xfs, and ext4 to make sure I didn't
>   break anything.

Thanks!
Amir.

>
> v4->v5:
> - Cleaned up the various "I'll fix it on commit" notes that Jan made since I had
>   to respin the series anyway.
> - Renamed the filemap pagefault helper for fsnotify per Christian's suggestion.
> - Added a FS_ALLOW_HSM flag per Jan's comments, based on Amir's rough sketch.
> - Added a patch to disable btrfs defrag on pre-content watched files.
> - Added a patch to turn on FS_ALLOW_HSM for all the file systems that I tested.
> - Added two fstests (which will be posted separately) to validate everything,
>   re-validated the series with btrfs, xfs, ext4, and bcachefs to make sure I
>   didn't break anything.
>
> v3->v4:
> - Trying to send a final version Friday at 5pm before you go on vacation is a
>   recipe for silly mistakes, fixed the xfs handling yet again, per Christoph's
>   review.
> - Reworked the file system helper so its handling of fpin was a little less
>   silly, per Chinner's suggestion.
> - Updated the return values to not OR in VM_FAULT_RETRY, as we have a comment
>   in filemap_fault that says if VM_FAULT_ERROR is set we won't have
>   VM_FAULT_RETRY set.
>
> v2->v3:
> - Fix the pagefault path to do MAY_ACCESS instead, updated the perm handler to
>   emit PRE_ACCESS in this case, so we can avoid the extraneous perm event as per
>   Amir's suggestion.
> - Reworked the exported helper so the per-filesystem changes are much smaller,
>   per Amir's suggestion.
> - Fixed the screwup for DAX writes per Chinner's suggestion.
> - Added Christian's reviewed-by's where appropriate.
>
> v1->v2:
> - Reworked the page fault logic based on Jan's suggestion and turned it into a
>   helper.
> - Added 3 patches per-fs where we need to call the fsnotify helper from their
>   ->fault handlers.
> - Disabled readahead in the case that there's a pre-content watch in place.
> - Disabled huge faults when there's a pre-content watch in place (entirely
>   because it's untested, theoretically it should be straightforward to do).
> - Updated the command numbers.
> - Addressed the random spelling/grammar mistakes that Jan pointed out.
> - Addressed the other random nits from Jan.
>
> --- Original email ---
>
> Hello,
>
> These are the patches for the bare bones pre-content fanotify support.  The
> majority of this work is Amir's, my contribution to this has solely been around
> adding the page fault hooks, testing and validating everything.
> I'm sending it because Amir is traveling a bunch, and I touched it last so I'm
> going to take all the hate and he can take all the credit.
>
> There is a PoC that I've been using to validate this work, you can find the git
> repo here
>
> https://github.com/josefbacik/remote-fetch
>
> This consists of 3 different tools.
>
> 1. populate.  This just creates all the stub files in the directory from the
>    source directory.  Just run ./populate ~/linux ~/hsm-linux and it'll
>    recursively create all of the stub files and directories.
> 2. remote-fetch.  This is the actual PoC, you just point it at the source and
>    destination directory and then you can do whatever.  ./remote-fetch ~/linux
>    ~/hsm-linux.
> 3. mmap-validate.  This was to validate the pagefault thing, this is likely what
>    will be turned into the selftest with remote-fetch.  It creates a file and
>    then you can validate the file matches the right pattern with both normal
>    reads and mmap.  Normally I do something like
>
>    ./mmap-validate create ~/src/foo
>    ./populate ~/src ~/dst
>    ./remote-fetch ~/src ~/dst
>    ./mmap-validate validate ~/dst/foo
>
> I did a bunch of testing, I also got some performance numbers.  I copied a
> kernel tree, and then did remote-fetch, and then make -j4
>
> Normal
> real    9m49.709s
> user    28m11.372s
> sys     4m57.304s
>
> HSM
> real    10m6.454s
> user    29m10.517s
> sys     5m2.617s
>
> So ~17 seconds more to build with HSM.  I then did a make mrproper on both trees
> to see the size
>
> [root@fedora ~]# du -hs /src/linux
> 1.6G    /src/linux
> [root@fedora ~]# du -hs dst
> 125M    dst
>
> This mirrors the sort of savings we've seen in production.
>
> Meta has had these patches (minus the page fault patch) deployed in production
> for almost a year with our own utility for doing on-demand package fetching.
> The savings from this have been pretty significant.
>
> The page-fault hooks are necessary for the last thing we need, which is
> on-demand range fetching of executables.
> Some of our binaries are several gigs large, having the ability to remote fetch
> them on demand is a huge win for us not only with space savings, but with
> startup time of containers.
>
> There will be tests for this going into LTP once we're satisfied with the
> patches and they're on their way upstream.  Thanks,
>
> Josef
>
> Amir Goldstein (9):
>   fanotify: rename a misnamed constant
>   fanotify: reserve event bit of deprecated FAN_DIR_MODIFY
>   fsnotify: introduce pre-content permission events
>   fsnotify: pass optional file access range in pre-content event
>   fsnotify: generate pre-content permission event on open
>   fsnotify: generate pre-content permission event on truncate
>   fanotify: introduce FAN_PRE_ACCESS permission event
>   fanotify: report file range info with pre-content events
>   fanotify: allow to set errno in FAN_DENY permission response
>
> Josef Bacik (8):
>   fanotify: don't skip extra event info if no info_mode is set
>   fanotify: add a helper to check for pre content events
>   fanotify: disable readahead if we have pre-content watches
>   mm: don't allow huge faults for files with pre content watches
>   fsnotify: generate pre-content permission event on page fault
>   xfs: add pre-content fsnotify hook for write faults
>   btrfs: disable defrag on pre-content watched files
>   fs: enable pre-content events on supported file systems
>
>  fs/btrfs/ioctl.c                   |   9 +++
>  fs/btrfs/super.c                   |   5 +-
>  fs/ext4/super.c                    |   3 +
>  fs/namei.c                         |  10 ++-
>  fs/notify/fanotify/fanotify.c      |  33 ++++++--
>  fs/notify/fanotify/fanotify.h      |  15 ++++
>  fs/notify/fanotify/fanotify_user.c | 120 +++++++++++++++++++++++------
>  fs/notify/fsnotify.c               |  18 ++++-
>  fs/open.c                          |  31 +++++---
>  fs/xfs/xfs_file.c                  |   4 +
>  fs/xfs/xfs_super.c                 |   2 +-
>  include/linux/fanotify.h           |  19 +++--
>  include/linux/fs.h                 |   1 +
>  include/linux/fsnotify.h           |  73 ++++++++++++++++--
>  include/linux/fsnotify_backend.h   |  59 +++++++++++++-
>  include/linux/mm.h                 |   1 +
>  include/uapi/linux/fanotify.h      |  18 +++++
>  mm/filemap.c                       |  90 ++++++++++++++++++++++
>  mm/memory.c                        |  22 ++++++
>  mm/readahead.c                     |  13 ++++
>  security/selinux/hooks.c           |   3 +-
>  21 files changed, 491 insertions(+), 58 deletions(-)
>
> --
> 2.43.0
>
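
For readers who have not looked at the PoC repository: the mmap-validate step
in the quoted cover letter (create a file with a known pattern, then check it
through both normal reads and mmap) can be sketched roughly as below. This is
only an illustration under stated assumptions, not the actual tool. The page
size, the byte pattern, and the function names are all hypothetical and were
chosen for the example.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size, for illustration only


def expected_page(index: int) -> bytes:
    """Hypothetical pattern: every byte of page N holds N modulo 256."""
    return bytes([index % 256]) * PAGE_SIZE


def create(path: str, pages: int) -> None:
    """Write `pages` pages of the pattern to `path`."""
    with open(path, "wb") as f:
        for i in range(pages):
            f.write(expected_page(i))


def validate(path: str, pages: int) -> bool:
    """Check the pattern through read() and through a read-only mapping."""
    with open(path, "rb") as f:
        via_read = f.read()
        # Length 0 maps the whole file; touching the mapping is what would
        # exercise the page fault path on a watched file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            via_mmap = m[:]
    want = b"".join(expected_page(i) for i in range(pages))
    return via_read == want and via_mmap == want


def demo() -> bool:
    """Create a small file in a temporary directory and validate it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "foo")
        create(path, 4)
        return validate(path, 4)
```

In the workflow quoted above, the create step would run against the source
tree and the validate step against the populated destination, so that the
reads and the mapping both go through the on-demand fetch path.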