From: Harry Yoo <harry.yoo@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
"Tobin C. Harding" <tobin@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>, Rik van Riel <riel@surriel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Jann Horn <jannh@google.com>, Pedro Falcato <pfalcato@suse.de>,
David Hildenbrand <david@redhat.com>,
Oscar Salvador <osalvador@suse.de>,
Michal Hocko <mhocko@kernel.org>,
Byungchul Park <byungchul@sk.com>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [DISCUSSION] Revisiting Slab Movable Objects
Date: Fri, 25 Apr 2025 20:09:21 +0900
Message-ID: <aAttYSQsYc5y1AZO@harry>
In-Reply-To: <aAa-gCSHDFcNS3HS@dread.disaster.area>
On Tue, Apr 22, 2025 at 07:54:08AM +1000, Dave Chinner wrote:
> On Mon, Apr 21, 2025 at 10:47:39PM +0900, Harry Yoo wrote:
> > Hi folks,
> >
> > As a long term project, I'm starting to look into resurrecting
> > Slab Movable Objects. The goal is to make certain types of slab memory
> > movable and thus enable targeted reclamation, migration, and
> > defragmentation.
> >
> > The main purpose of this posting is to briefly review what's been tried
> > in the past, ask people why prior efforts have stalled (due to lack of
> > time or insufficient justification for additional complexity?),
> > and discuss what's feasible today.
> >
> > Please add anyone I may have missed to Cc. :)
>
> Adding -fsdevel because dentry/inode cache discussion needs to be
> visible to all the fs/VFS developers.
>
> I'm going to cut straight to the chase here, but I'll leave the rest
> of the original email quoted below for -fsdevel readers.
>
> > Previous Work on Slab Movable Objects
> > =====================================
>
> <snip>
>
> Without including any sort of viable proposal for dentry/inode
> relocation (i.e. the showstopper for past attempts), what is the
> point of trying to resurrect this?
Migrating slab objects still makes sense for other object types, such as
xarray and maple tree nodes, and VMAs.

Of course, if filesystem folks could extend it further and make more
dentry/inode objects movable, that would be very welcome.
> However, I can think of two possible solutions to the untracked
> external inode reference issue.
>
> The first is that external inode references need to take an active
> reference to the inode (like a dentry does), and this prevents
> inodes from being relocated whilst such external references exist.
>
> Josef has proposed an active/passive reference counting mechanism
> for all references to inodes recently on -fsdevel here:
>
> https://lore.kernel.org/linux-fsdevel/20250303170029.GA3964340@perftesting/
>
> However, the ability to revoke external references and/or resolve
> internal references atomically has not really been considered at
> this point in time.
...alright, I expect that'll be the trickier part.
> To allow referenced inodes to be relocated, I'd suggest that any
> subsystem that takes an external reference to the inode needs to
> provide something like a SRCU notifier block to allow the external
> reference to be dynamically removed. Once the relocation is done,
> another notifier method can be called allowing the external
> reference to be updated with the new inode address. Any attempt to
> access the inode whilst it is being relocated through that external
> mechanism should probably block.
>
> [ Note: this could be leveraged as a general ->revoke mechanism for
> external inode references. Instead of the external access blocking
> after reference recall, it would return an error if access
> revocation has occurred. This mechanism could likely also solve some
> of the current lifetime issues with fsnotify and landlock objects. ]
>
> This leaves internal (passive) references that can be resolved by
> locking the inode itself. e.g. getting rid of mapping tree
> references (e.g. folio->mapping->host) by invalidating the
> inode page cache.
Thank you so much for such a detailed writeup.
The first approach (making referenced inodes relocatable) would allow
allocating inodes from movable areas, help mm/compaction.c build
high-order folios, and help slab reduce fragmentation.
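To check that I'm reading the notifier idea correctly, here is a rough,
single-threaded userspace model of it. Everything in it is hypothetical:
struct toy_inode, struct ext_ref, relocate_inode() and revoke_ref() are
made-up names, not existing kernel APIs, and SRCU is left out entirely.
Where a real implementation would make the external accessor sleep until
->update() runs, the model just returns -EAGAIN; the -EIO used for the
->revoke variant is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an inode; NOT the kernel's struct inode. */
struct toy_inode {
	unsigned long ino;
	long long size;
};

enum ext_ref_state {
	REF_LIVE,     /* pointer valid, access allowed */
	REF_RECALLED, /* relocation in progress: real code would block */
	REF_REVOKED,  /* ->revoke variant: access fails from now on */
};

/* Hypothetical external reference with notifier-style callbacks. */
struct ext_ref {
	struct toy_inode *inode;
	enum ext_ref_state state;
	void (*recall)(struct ext_ref *ref);
	void (*update)(struct ext_ref *ref, struct toy_inode *new_inode);
};

static void default_recall(struct ext_ref *ref)
{
	ref->inode = NULL; /* forget the old address */
	ref->state = REF_RECALLED;
}

static void default_update(struct ext_ref *ref, struct toy_inode *new_inode)
{
	assert(ref->state == REF_RECALLED); /* update only follows a recall */
	ref->inode = new_inode;
	ref->state = REF_LIVE;
}

static void ext_ref_init(struct ext_ref *ref, struct toy_inode *inode)
{
	ref->inode = inode;
	ref->state = REF_LIVE;
	ref->recall = default_recall;
	ref->update = default_update;
}

/* External access path through a (possibly recalled) reference. */
static int ext_ref_get_ino(const struct ext_ref *ref, unsigned long *ino)
{
	switch (ref->state) {
	case REF_RECALLED:
		return -EAGAIN; /* stands in for "sleep until ->update()" */
	case REF_REVOKED:
		return -EIO;    /* arbitrary error for the revoke variant */
	case REF_LIVE:
		break;
	}
	*ino = ref->inode->ino;
	return 0;
}

/* Recall all external refs, copy the object, free the old copy, update refs. */
static struct toy_inode *relocate_inode(struct toy_inode *old,
					struct ext_ref *refs, size_t nr)
{
	struct toy_inode *new_inode = malloc(sizeof(*new_inode));

	if (!new_inode)
		return NULL; /* nothing recalled yet, @old stays valid */

	for (size_t i = 0; i < nr; i++)
		refs[i].recall(&refs[i]);

	memcpy(new_inode, old, sizeof(*new_inode));
	free(old); /* no external holder points at @old any more */

	for (size_t i = 0; i < nr; i++)
		refs[i].update(&refs[i], new_inode);

	return new_inode;
}

/* ->revoke variant: recall without a follow-up update. */
static void revoke_ref(struct ext_ref *ref)
{
	ref->recall(ref);
	ref->state = REF_REVOKED;
}
```

What the model tries to capture is the ordering: every external holder
drops its pointer before the copy is made, so the old object can be freed,
and accessors only see the new address after ->update(). Revocation is the
same recall step without the follow-up update.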
> The other solution is to prevent excessive inode slab cache
> fragmentation in the first place. i.e. *stop caching unreferenced
> inodes*. In this case, the inode LRU goes away and we rely fully on
> the dentry cache pinning inodes to maintain the working set of
> inodes in memory. This works with/without Josef's proposed reference
> counting changes - though Josef's proposed changes make getting rid
> of the inode LRU a lot easier.
>
> I talk about some of that stuff in the discussion of this superblock
> inode list iteration patchset here:
>
> https://lore.kernel.org/linux-fsdevel/20241002014017.3801899-1-david@fromorbit.com/
The second approach (not caching unreferenced inodes), while it does not
make inodes relocatable, would at least reduce fragmentation.
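A similarly rough model of what dropping the inode LRU changes: with the
LRU, the last reference put leaves an unreferenced inode cached until
reclaim gets to it; without it, the inode is freed as soon as the last
pinning reference (e.g. from a dentry) goes away. Again, struct
cached_inode, struct icache_model, model_iget() and model_iput() are
hypothetical names, and this is nowhere near the real fs/inode.c logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy inode: only a reference count and an LRU flag. */
struct cached_inode {
	int refcount;
	bool on_lru;
};

struct icache_model {
	bool keep_unreferenced; /* true: today's LRU behaviour */
	int nr_allocated;       /* inodes currently in memory */
	int nr_lru;             /* of those, unreferenced ones on the LRU */
};

/* Allocate an inode with one reference (e.g. held by a new dentry). */
static struct cached_inode *model_iget(struct icache_model *c)
{
	struct cached_inode *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return NULL;
	inode->refcount = 1;
	c->nr_allocated++;
	return inode;
}

/* Drop a reference; returns true if the inode was freed. */
static bool model_iput(struct icache_model *c, struct cached_inode *inode)
{
	assert(inode->refcount > 0);
	if (--inode->refcount > 0)
		return false;

	if (c->keep_unreferenced) {
		/* Stays in memory (and in its slab) until reclaimed. */
		inode->on_lru = true;
		c->nr_lru++;
		return false;
	}

	/* No LRU: lifetime ends with the last pinning reference. */
	free(inode);
	c->nr_allocated--;
	return true;
}
```

In the second mode, whether an inode stays in memory depends entirely on
something (in practice the dentry cache) still holding a reference to it.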
Unfortunately, as an MM developer, I don't have enough experience with
filesystems to assess which proposal is more feasible. It would be really
helpful to reach consensus among the FS folks before we push either path
forward, whether that's making inodes relocatable or avoiding their
fragmentation in the first place.
--
Cheers,
Harry / Hyeonggon