From: Shakeel Butt <shakeel.butt@linux.dev>
To: Zi Yan <ziy@nvidia.com>
Cc: Bernd Schubert <bernd.schubert@fastmail.fm>,
	David Hildenbrand <david@redhat.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	jefflexu@linux.alibaba.com, josef@toxicpanda.com,
	linux-mm@kvack.org, kernel-team@meta.com,
	Matthew Wilcox <willy@infradead.org>,
	Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
Date: Thu, 19 Dec 2024 08:26:06 -0800
Message-ID: <7qyun2waznrduxpf2i5eebqdvpigrd5ycu4rlpawu336kqkyvh@xmfmlsmr43gw>
In-Reply-To: <6FBDD501-25A0-4A21-8051-F8EE74AD177B@nvidia.com>

On Thu, Dec 19, 2024 at 11:14:49AM -0500, Zi Yan wrote:
> On 19 Dec 2024, at 11:09, Bernd Schubert wrote:
>
> > On 12/19/24 17:02, Zi Yan wrote:
> >> On 19 Dec 2024, at 11:00, Zi Yan wrote:
> >>> On 19 Dec 2024, at 10:56, Bernd Schubert wrote:
> >>>
> >>>> On 12/19/24 16:55, Zi Yan wrote:
> >>>>> On 19 Dec 2024, at 10:53, Shakeel Butt wrote:
> >>>>>
> >>>>>> On Thu, Dec 19, 2024 at 04:47:18PM +0100, David Hildenbrand wrote:
> >>>>>>> On 19.12.24 16:43, Shakeel Butt wrote:
> >>>>>>>> On Thu, Dec 19, 2024 at 02:05:04PM +0100, David Hildenbrand wrote:
> >>>>>>>>> On 23.11.24 00:23, Joanne Koong wrote:
> >>>>>>>>>> For migrations called in MIGRATE_SYNC mode, skip migrating the folio if
> >>>>>>>>>> it is under writeback and has the AS_WRITEBACK_INDETERMINATE flag set on its
> >>>>>>>>>> mapping. If the AS_WRITEBACK_INDETERMINATE flag is set on the mapping, the
> >>>>>>>>>> writeback may take an indeterminate amount of time to complete, and
> >>>>>>>>>> waits may get stuck.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>>>>>>>>> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> >>>>>>>>>> ---
> >>>>>>>>>> mm/migrate.c | 5 ++++-
> >>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>>>>>>>> index df91248755e4..fe73284e5246 100644
> >>>>>>>>>> --- a/mm/migrate.c
> >>>>>>>>>> +++ b/mm/migrate.c
> >>>>>>>>>> @@ -1260,7 +1260,10 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> >>>>>>>>>>  		 */
> >>>>>>>>>>  		switch (mode) {
> >>>>>>>>>>  		case MIGRATE_SYNC:
> >>>>>>>>>> -			break;
> >>>>>>>>>> +			if (!src->mapping ||
> >>>>>>>>>> +			    !mapping_writeback_indeterminate(src->mapping))
> >>>>>>>>>> +				break;
> >>>>>>>>>> +			fallthrough;
> >>>>>>>>>>  		default:
> >>>>>>>>>>  			rc = -EBUSY;
> >>>>>>>>>>  			goto out;
> >>>>>>>>>
> >>>>>>>>> Ehm, doesn't this mean that any fuse user can essentially completely block
> >>>>>>>>> CMA allocations, memory compaction, memory hotunplug, memory poisoning... ?!
> >>>>>>>>>
> >>>>>>>>> That sounds very bad.
> >>>>>>>>
> >>>>>>>> Pages under writeback are already unmovable while they are under
> >>>>>>>> writeback. This patch is only making potentially unrelated tasks to
> >>>>>>>> synchronously wait on writeback completion for such pages which in worst
> >>>>>>>> case can be indefinite. This actually is solving an isolation issue on a
> >>>>>>>> multi-tenant machine.
> >>>>>>>>
> >>>>>>> Are you sure, because I read in the cover letter:
> >>>>>>>
> >>>>>>> "In the current FUSE writeback design (see commit 3be5a52b30aa ("fuse:
> >>>>>>> support writable mmap"))), a temp page is allocated for every dirty
> >>>>>>> page to be written back, the contents of the dirty page are copied over to
> >>>>>>> the temp page, and the temp page gets handed to the server to write back.
> >>>>>>> This is done so that writeback may be immediately cleared on the dirty
> >>>>>>> page,"
> >>>>>>>
> >>>>>>> Which to me means that they are immediately movable again?
> >>>>>>
> >>>>>> Oh sorry, my mistake, yes this will become an isolation issue with the
> >>>>>> removal of the temp page in-between which this series is doing. I think
> >>>>>> the tradeoff is between extra memory plus slow write performance versus
> >>>>>> temporary unmovable memory.
> >>>>>
> >>>>> No, the tradeoff is slow FUSE performance vs whole system slowdown due to
> >>>>> memory fragmentation. AS_WRITEBACK_INDETERMINATE indicates it is not
> >>>>> temporary.
> >>>>
> >>>> Is there a difference between FUSE TMP page being unmovable and
> >>>> AS_WRITEBACK_INDETERMINATE folios/pages being unmovable?
> >>
> >> (Fix my response location)
> >>
> >> Both are unmovable, but you can control where FUSE TMP page
> >> can come from to avoid spread across the entire memory space. For example,
> >> allocate a contiguous region as a TMP page pool.
> >
> > Wouldn't it make sense to have that for fuse writeback pages as well?
> > Fuse tries to limit dirty pages anyway.
>
> Can fuse constrain the location of writeback pages? Something like what
> I proposed[1], migrating pages to a location before their writeback? Will
> that be a performance concern?
>
> In terms of the number of dirty pages, you only need one page out of 512
> pages to prevent 2MB THP from allocation. For CMA allocation, one unmovable
> page can kill one contiguous range. What is the limit of fuse dirty pages?
>
> [1] https://lore.kernel.org/linux-mm/90C41581-179F-40B6-9801-9C9DBBEB1AF4@nvidia.com/
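
To make the arithmetic in the quoted paragraph concrete, here is a rough
userspace sketch (not kernel code). It assumes 4 KiB base pages and 2 MB
huge pages, which is where the figure of 512 comes from; every MODEL_ and
model_ name is hypothetical and exists only for this illustration. The
helper counts how many 2 MB-aligned blocks contain at least one unmovable
page, given a sorted list of page frame numbers.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages and 2 MB huge pages (e.g. x86-64 defaults). */
#define MODEL_PAGE_SIZE		4096UL
#define MODEL_THP_SIZE		(2UL * 1024 * 1024)
#define MODEL_PAGES_PER_THP	(MODEL_THP_SIZE / MODEL_PAGE_SIZE)

/*
 * Count the 2 MB-aligned blocks that contain at least one unmovable page.
 * 'pfns' must be sorted in ascending order. One unmovable page is enough
 * to count the whole block, which is the point made in the quoted text.
 */
static size_t model_blocked_thp_blocks(const unsigned long *pfns, size_t n)
{
	size_t blocked = 0;
	unsigned long last_block = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long block = pfns[i] / MODEL_PAGES_PER_THP;

		/* Count each block only once, however many pages it holds. */
		if (i == 0 || block != last_block) {
			blocked++;
			last_block = block;
		}
	}
	return blocked;
}
```

With these assumptions, pages at frame numbers 0, 1 and 511 all land in the
same block, while 512 starts the next one, so the count depends on how the
unmovable pages are spread rather than on how many there are.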

I think this whole concern about fuse making system memory unmovable
forever is overblown. Fuse already uses a temp (unmovable) page for
writeback; that approach is slow, and this series removes it.
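
For readers following the quoted hunk, the decision it adds can be modelled
in userspace roughly as below. This is only a sketch under stated
assumptions: the struct and function names prefixed with model_ are
hypothetical, the enum is redeclared locally to mirror the kernel's migrate
modes, and the real code in migrate_folio_unmap() also handles the force
flag and the actual wait, both of which are left out here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in mirroring the kernel's migrate modes. */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

/* Hypothetical stand-ins for struct address_space and struct folio. */
struct model_mapping {
	bool writeback_indeterminate;	/* models AS_WRITEBACK_INDETERMINATE */
};

struct model_folio {
	bool under_writeback;
	const struct model_mapping *mapping;	/* may be NULL */
};

/*
 * Returns 0 when migration may go on (in the kernel, possibly after
 * waiting for writeback) and -EBUSY when the folio is skipped.
 */
static int model_writeback_check(const struct model_folio *src,
				 enum migrate_mode mode)
{
	if (!src->under_writeback)
		return 0;

	switch (mode) {
	case MIGRATE_SYNC:
		/* Only wait when writeback is expected to finish. */
		if (!src->mapping || !src->mapping->writeback_indeterminate)
			break;
		/* fall through */
	default:
		return -EBUSY;
	}
	return 0;
}
```

In this model a folio under writeback is skipped in every mode except
MIGRATE_SYNC, and with the patch it is also skipped in MIGRATE_SYNC when its
mapping has the indeterminate flag set.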