* [LSF/MM BPF Topic] Warming up to frozen pages
@ 2025-03-18 14:47 Hannes Reinecke
From: Hannes Reinecke @ 2025-03-18 14:47 UTC (permalink / raw)
To: Matthew Wilcox, lsf-pc, linux-mm, linux-block
Hey all,
(Thanks, Jon, for the title :-)
The recent discussion around frozen pages, and when to do a
get_page()/put_page() and when not to, left quite a few issues unresolved.
So I would like to propose a session at LSF/MM:
'Warming up to frozen pages'
With the frozen pages patchset from Willy, slab pages no longer need
(and, in fact, can no longer have) a page reference. While this is easy
to state, and easy to implement when using iov iterators, problems
arise when these iov iterators get mangled, e.g. when being passed
through the various layers of the kernel.
Case in point: 'recvmsg()', when called from userspace, is passed
an iov, and the iterator type determines whether a page reference
needs to be taken. However, when called from other kernel subsystems
(e.g. from nvme-tcp or iscsi), the iov is filled from a bvec which
itself is filled from an iov iterator from userspace, so the iov
iterator will assume it's a 'normal' bvec and take a reference on
all entries, as it has no way of knowing which entry is a 'normal'
page and which is a 'slab' page.
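To make the problematic path concrete, here is a rough sketch (not actual
nvme-tcp code; the socket and the kmalloc()ed buffer are assumed to exist)
of how a kernel caller ends up receiving into slab memory through a
bvec-backed iterator:

#include <linux/bvec.h>
#include <linux/net.h>
#include <linux/socket.h>
#include <linux/uio.h>

/*
 * Sketch only: receive into a kmalloc()ed (i.e. slab) buffer by wrapping
 * it in a bvec and handing an ITER_BVEC iterator to sock_recvmsg().  The
 * receive path only sees an ITER_BVEC; it cannot tell that the underlying
 * page is a slab page, so any get_page()/put_page() it does on the entries
 * is already wrong once slab pages are frozen.
 */
static int recv_into_slab_buffer(struct socket *sock, void *buf, size_t len)
{
        struct bio_vec bv;
        struct msghdr msg = { .msg_flags = MSG_DONTWAIT };

        bvec_set_virt(&bv, buf, len);   /* buf came from kmalloc() */
        iov_iter_bvec(&msg.msg_iter, ITER_DEST, &bv, 1, len);

        return sock_recvmsg(sock, &msg, msg.msg_flags);
}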
As Christoph indicated, this is _not_ how things should be, so
a discussion on how to disentangle this would be good.
Maybe we even manage to lay down some rules for when a page reference
should be taken and when not.
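One such rule, sketched here purely as a strawman (the helper name is
invented), could be to base the decision on the page type rather than on
the iterator type:

#include <linux/mm.h>

/*
 * Strawman only: take a reference unless the page is of a type whose
 * refcount is frozen (currently slab pages; large kmalloc folios may
 * follow).  Whether a helper like this is the right answer is exactly
 * what needs discussing.
 */
static inline bool iter_page_get_ref(struct page *page)
{
        if (PageSlab(page))
                return false;   /* frozen refcount: no get_page() */
        get_page(page);
        return true;
}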
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [LSF/MM BPF Topic] Warming up to frozen pages
From: Matthew Wilcox @ 2025-03-18 15:10 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: lsf-pc, linux-mm, linux-block
On Tue, Mar 18, 2025 at 03:47:31PM +0100, Hannes Reinecke wrote:
> 'Warming up to frozen pages'
> With the frozen pages patchset from Willy, slab pages no longer need
> (and, in fact, can no longer have) a page reference. While this is easy
> to state, and easy to implement when using iov iterators, problems
> arise when these iov iterators get mangled, e.g. when being passed
> through the various layers of the kernel.
> Case in point: 'recvmsg()', when called from userspace, is passed
> an iov, and the iterator type determines whether a page reference
> needs to be taken. However, when called from other kernel subsystems
> (e.g. from nvme-tcp or iscsi), the iov is filled from a bvec which
> itself is filled from an iov iterator from userspace, so the iov
> iterator will assume it's a 'normal' bvec and take a reference on
> all entries, as it has no way of knowing which entry is a 'normal'
> page and which is a 'slab' page.
> As Christoph indicated, this is _not_ how things should be, so
> a discussion on how to disentangle this would be good.
> Maybe we even manage to lay down some rules for when a page reference
> should be taken and when not.
My only concern is that we might not have anybody from networking to talk
about their side of all this. We need Dave Howells for this as one of
the network filesystem people; he probably understands this fairly well.
Anna might have some network stack knowledge too. Maybe we can get some
of the BPF people to join in, although their track looks very dense,
so we'll have to try hard to find a slot when there's a topic that the
networking-BPF people aren't so interested in.
* Re: [LSF/MM BPF Topic] Warming up to frozen pages
From: Vlastimil Babka @ 2025-03-19 15:11 UTC (permalink / raw)
To: Hannes Reinecke, Matthew Wilcox, lsf-pc, linux-mm, linux-block
On 3/18/25 15:47, Hannes Reinecke wrote:
> Hey all,
>
> (Thanks, Jon, for the title :-)
> The recent discussion around frozen pages, and when to do a
> get_page()/put_page() and when not to, left quite a few issues unresolved.
> So I would like to propose a session at LSF/MM:
>
> 'Warming up to frozen pages'
> With the frozen pages patchset from Willy, slab pages no longer need
> (and, in fact, can no longer have) a page reference. While this is easy
BTW, my hope is that large kmalloc folios would also drop the refcount. That
means anything obtained by kmalloc(N) where N is over 8k (an order-1 folio
with a 4k PAGE_SIZE). That 8k limit is an implementation detail of SLUB
(SLAB used to have a higher one), so all kmalloc() buffers had better behave
the same, to avoid surprises when some particular allocation size changes or
the code starts being used on an architecture with a page size larger than
4k, where the allocation becomes a "small" kmalloc() anyway.
Willy already added a page type for large kmalloc. Given that we can expect
trouble similar to what the small kmalloc() changes already caused, I'd first
just add get_page()/put_page() warnings for that large kmalloc page type, in
addition to the existing ones for the slab page type, and expose that to
-next (after the 6.15 merge window) to see if anything needs fixing.
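A sketch of the kind of warning meant here (the helper itself is
hypothetical, and folio_test_large_kmalloc() stands in for whatever the
test for the new large kmalloc page type ends up being called):

#include <linux/mm.h>
#include <linux/mmdebug.h>

/*
 * Sketch of the proposed debug check: warn when someone tries to take or
 * drop a reference on a slab or large-kmalloc folio.  The large-kmalloc
 * test name is a placeholder.
 */
static inline void warn_on_unrefcountable(struct folio *folio)
{
        VM_WARN_ON_ONCE_FOLIO(folio_test_slab(folio), folio);
        VM_WARN_ON_ONCE_FOLIO(folio_test_large_kmalloc(folio), folio);
}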
> to state, and easy to implement when using iov iterators, problems
> arise when these iov iterators get mangled, e.g. when being passed
> through the various layers of the kernel.
> Case in point: 'recvmsg()', when called from userspace, is passed
> an iov, and the iterator type determines whether a page reference
> needs to be taken. However, when called from other kernel subsystems
> (e.g. from nvme-tcp or iscsi), the iov is filled from a bvec which
> itself is filled from an iov iterator from userspace, so the iov
> iterator will assume it's a 'normal' bvec and take a reference on
> all entries, as it has no way of knowing which entry is a 'normal'
> page and which is a 'slab' page.
> As Christoph indicated, this is _not_ how things should be, so
> a discussion on how to disentangle this would be good.
> Maybe we even manage to lay down some rules for when a page reference
> should be taken and when not.
>
> Cheers,
>
> Hannes
* Re: [LSF/MM BPF Topic] Warming up to frozen pages
From: Christoph Hellwig @ 2025-03-20 6:02 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Hannes Reinecke, lsf-pc, linux-mm, linux-block
On Tue, Mar 18, 2025 at 03:10:01PM +0000, Matthew Wilcox wrote:
> My only concern is that we might not have anybody from networking to talk
> about their side of all this. We need Dave Howells for this as one of
> the network filesystem people; he probably understands this fairly well.
> Anna might have some network stack knowledge too. Maybe we can get some
> of the BPF people to join in, although their track looks very dense,
> so we'll have to try hard to find a slot when there's a topic that the
> networking-BPF people aren't so interested in.
The problem isn't the network file systems; they'd do much better
if the network stack didn't play silly refcount games. This
needs the core networking folks to agree and then someone to do
the work, which might or might not have to reach deep down into all
the various network protocols supported.