From: Christian Brauner <brauner@kernel.org>
To: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: David Hildenbrand <david@redhat.com>,
Jingbo Xu <jefflexu@linux.alibaba.com>,
Joanne Koong <joannelkoong@gmail.com>,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
shakeel.butt@linux.dev, josef@toxicpanda.com, linux-mm@kvack.org,
kernel-team@meta.com, Matthew Wilcox <willy@infradead.org>,
Zi Yan <ziy@nvidia.com>, Oscar Salvador <osalvador@suse.de>,
Michal Hocko <mhocko@kernel.org>,
Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
Date: Thu, 3 Apr 2025 11:35:23 +0200
Message-ID: <20250403-option-holztisch-de5d88079f59@brauner>
In-Reply-To: <cb6a5eb4-582b-42ba-a4b8-7ecaccbf5ba2@fastmail.fm>

On Thu, Apr 03, 2025 at 11:25:17AM +0200, Bernd Schubert wrote:
>
>
> On 4/3/25 11:18, David Hildenbrand wrote:
> > On 03.04.25 05:31, Jingbo Xu wrote:
> >>
> >>
> >> On 4/3/25 5:34 AM, Joanne Koong wrote:
> >>> On Thu, Dec 19, 2024 at 5:05 AM David Hildenbrand <david@redhat.com>
> >>> wrote:
> >>>>
> >>>> On 23.11.24 00:23, Joanne Koong wrote:
> >>>>> For migrations called in MIGRATE_SYNC mode, skip migrating the folio if
> >>>>> it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag set on its
> >>>>> mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the mapping, the
> >>>>> writeback may take an indeterminate amount of time to complete, and
> >>>>> waits may get stuck.
> >>>>>
> >>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>>>> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> >>>>> ---
> >>>>> mm/migrate.c | 5 ++++-
> >>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>>> index df91248755e4..fe73284e5246 100644
> >>>>> --- a/mm/migrate.c
> >>>>> +++ b/mm/migrate.c
> >>>>> @@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> >>>>>  		 */
> >>>>>  		switch (mode) {
> >>>>>  		case MIGRATE_SYNC:
> >>>>> -			break;
> >>>>> +			if (!src->mapping ||
> >>>>> +			    !mapping_writeback_indeterminate(src->mapping))
> >>>>> +				break;
> >>>>> +			fallthrough;
> >>>>>  		default:
> >>>>>  			rc = -EBUSY;
> >>>>>  			goto out;
> >>>>
> >>>> Ehm, doesn't this mean that any fuse user can essentially completely
> >>>> block CMA allocations, memory compaction, memory hotunplug, memory
> >>>> poisoning... ?!
> >>>>
> >>>> That sounds very bad.
> >>>
> >>> I took a closer look at the migration code and the FUSE code. In the
> >>> migration code in migrate_folio_unmap(), I see that any MIGRATE_SYNC
> >>> mode folio lock holds will block migration until that folio is
> >>> unlocked. This is the snippet in migrate_folio_unmap() I'm looking at:
> >>>
> >>> 	if (!folio_trylock(src)) {
> >>> 		if (mode == MIGRATE_ASYNC)
> >>> 			goto out;
> >>>
> >>> 		if (current->flags & PF_MEMALLOC)
> >>> 			goto out;
> >>>
> >>> 		if (mode == MIGRATE_SYNC_LIGHT && !folio_test_uptodate(src))
> >>> 			goto out;
> >>>
> >>> 		folio_lock(src);
> >>> 	}
> >>>
> >
> > Right, I raised that also in my LSF/MM talk: waiting for readahead
> > currently implies waiting for the folio lock (there is no separate
> > readahead flag like there would be for writeback).
> >
> > The more I look into this and fuse, the more I realize that what fuse
> > does is just completely broken right now.
> >
> >>> If this is all that is needed for a malicious FUSE server to block
> >>> migration, then it makes no difference if AS_WRITEBACK_INDETERMINATE
> >>> mappings are skipped in migration. A malicious server has easier and
> >>> more powerful ways of blocking migration in FUSE than trying to do it
> >>> through writeback. For a malicious fuse server, we in fact wouldn't
> >>> even get far enough to hit writeback - a write triggers
> >>> aops->write_begin() and a malicious server would deliberately hang
> >>> forever while the folio is locked in write_begin().
> >>
> >> Indeed it seems possible. A malicious FUSE server may already be
> >> capable of blocking the synchronous migration in this way.
> >
> > Yes, I think the conclusion is that we should advise people against
> > using unprivileged FUSE if they care about any features that rely on
> > page migration or page reclaim.
> >
> >>
> >>
> >>>
> >>> I looked into whether we could eradicate all the places in FUSE where
> >>> we may hold the folio lock for an indeterminate amount of time,
> >>> because if that is possible, then we should not add this writeback way
> >>> for a malicious fuse server to affect migration. But I don't think we
> >>> can, for example taking one case, the folio lock needs to be held as
> >>> we read in the folio from the server when servicing page faults, else
> >>> the page cache would contain stale data if there was a concurrent
> >>> write that happened just before, which would lead to data corruption
> >>> in the filesystem. Imo, we need a more encompassing solution for all
> >>> these cases if we're serious about preventing FUSE from blocking
> >>> migration, which probably looks like a globally enforced default
> >>> timeout of some sort or an mm solution for mitigating the blast radius
> >>> of how much memory can be blocked from migration, but that is outside
> >>> the scope of this patchset and is its own standalone topic.
> >
> > I'm still skeptical about timeouts: we can only get it wrong.
> >
> > I think a proper solution is making these pages movable, which does seem
> > feasible if (a) splice is not involved and (b) we can find a way to not
> > hold the folio lock forever e.g., in the readahead case.
> >
> > Maybe readahead would have to be handled more similar to writeback
> > (e.g., having a separate flag, or using a combination of e.g.,
> > writeback+uptodate flag, not sure)
> >
> > In both cases (readahead+writeback), we'd want to call into the FS to
> > migrate a folio that is under readahead/writeback. In case of fuse
> > without splice, a migration might be doable, and as discussed, splice
> > might just be avoided.
>
> My personal take here is that we should move away from splice.
> Keith (or colleague) is working on ZC with io-uring anyway, so
> maybe good timing. We should just ensure that the new approach
> doesn't have the same issue.

splice is problematic in a lot of other ways too. It's easy to abuse it
for weird userspace hangs since it clings onto the pipe_lock() and no
one wants to do the invasive surgery to wean it off of that. So +1 on
avoiding splice.