* [PATCH] mm: Decline to manipulate the refcount on a slab page
@ 2025-03-10 14:35 Matthew Wilcox (Oracle)
2025-03-10 14:37 ` Vlastimil Babka
2025-03-11 10:15 ` Jakub Kicinski
0 siblings, 2 replies; 15+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-03-10 14:35 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle),
netdev, Vlastimil Babka, linux-mm, Hannes Reinecke
Slab pages now have a refcount of 0, so nobody should be trying to
manipulate the refcount on them. Doing so has little effect; the object
could be freed and reallocated to a different purpose, although the slab
itself would not be freed until the refcount was put, making it behave
rather like TYPESAFE_BY_RCU.
Unfortunately, __iov_iter_get_pages_alloc() does take a refcount.
Fix that to not change the refcount, and make put_page() silently not
change the refcount. get_page() warns so that we can fix any other
callers that need to be changed.
Long-term, networking needs to stop taking a refcount on the pages that
it uses and rely on the caller to hold whatever references are necessary
to make the memory stable. In the medium term, more page types are going
to have a zero refcount, so we'll want to move get_page() and put_page()
out of line.
Reported-by: Hannes Reinecke <hare@suse.de>
Fixes: 9aec2fb0fd5e ("slab: allocate frozen pages")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
include/linux/mm.h | 7 ++++++-
lib/iov_iter.c | 8 ++++++--
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 61de65c4e430..4e118cbe0556 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1539,7 +1539,10 @@ static inline void folio_get(struct folio *folio)
static inline void get_page(struct page *page)
{
- folio_get(page_folio(page));
+ struct folio *folio = page_folio(page);
+ if (WARN_ON_ONCE(folio_test_slab(folio)))
+ return;
+ folio_get(folio);
}
static inline __must_check bool try_get_page(struct page *page)
@@ -1633,6 +1636,8 @@ static inline void put_page(struct page *page)
{
struct folio *folio = page_folio(page);
+ if (folio_test_slab(folio))
+ return;
folio_put(folio);
}
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 65f550cb5081..8c7fdb7d8c8f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1190,8 +1190,12 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
if (!n)
return -ENOMEM;
p = *pages;
- for (int k = 0; k < n; k++)
- get_page(p[k] = page + k);
+ for (int k = 0; k < n; k++) {
+ struct folio *folio = page_folio(page);
+ p[k] = page + k;
+ if (!folio_test_slab(folio))
+ folio_get(folio);
+ }
maxsize = min_t(size_t, maxsize, n * PAGE_SIZE - *start);
i->count -= maxsize;
i->iov_offset += maxsize;
--
2.47.2
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-10 14:35 [PATCH] mm: Decline to manipulate the refcount on a slab page Matthew Wilcox (Oracle)
@ 2025-03-10 14:37 ` Vlastimil Babka
2025-03-10 14:50 ` Matthew Wilcox
2025-03-11 10:15 ` Jakub Kicinski
1 sibling, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2025-03-10 14:37 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: netdev, linux-mm, Hannes Reinecke
On 3/10/25 15:35, Matthew Wilcox (Oracle) wrote:
> Slab pages now have a refcount of 0, so nobody should be trying to
> manipulate the refcount on them. Doing so has little effect; the object
> could be freed and reallocated to a different purpose, although the slab
> itself would not be freed until the refcount was put, making it behave
> rather like TYPESAFE_BY_RCU.
>
> Unfortunately, __iov_iter_get_pages_alloc() does take a refcount.
> Fix that to not change the refcount, and make put_page() silently not
> change the refcount. get_page() warns so that we can fix any other
> callers that need to be changed.
>
> Long-term, networking needs to stop taking a refcount on the pages that
> it uses and rely on the caller to hold whatever references are necessary
> to make the memory stable. In the medium term, more page types are going
> to have a zero refcount, so we'll want to move get_page() and put_page()
> out of line.
>
> Reported-by: Hannes Reinecke <hare@suse.de>
Closes:
https://lore.kernel.org/all/08c29e4b-2f71-4b6d-8046-27e407214d8c@suse.com/
> Fixes: 9aec2fb0fd5e ("slab: allocate frozen pages")
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Note it's a 6.14 hotfix for kernel oopses due to page refcount overflow.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/mm.h | 7 ++++++-
> lib/iov_iter.c | 8 ++++++--
> 2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 61de65c4e430..4e118cbe0556 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1539,7 +1539,10 @@ static inline void folio_get(struct folio *folio)
>
> static inline void get_page(struct page *page)
> {
> - folio_get(page_folio(page));
> + struct folio *folio = page_folio(page);
> + if (WARN_ON_ONCE(folio_test_slab(folio)))
> + return;
> + folio_get(folio);
> }
>
> static inline __must_check bool try_get_page(struct page *page)
> @@ -1633,6 +1636,8 @@ static inline void put_page(struct page *page)
> {
> struct folio *folio = page_folio(page);
>
> + if (folio_test_slab(folio))
> + return;
> folio_put(folio);
> }
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 65f550cb5081..8c7fdb7d8c8f 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1190,8 +1190,12 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
> if (!n)
> return -ENOMEM;
> p = *pages;
> - for (int k = 0; k < n; k++)
> - get_page(p[k] = page + k);
> + for (int k = 0; k < n; k++) {
> + struct folio *folio = page_folio(page);
> + p[k] = page + k;
> + if (!folio_test_slab(folio))
> + folio_get(folio);
> + }
> maxsize = min_t(size_t, maxsize, n * PAGE_SIZE - *start);
> i->count -= maxsize;
> i->iov_offset += maxsize;
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-10 14:37 ` Vlastimil Babka
@ 2025-03-10 14:50 ` Matthew Wilcox
0 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2025-03-10 14:50 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, netdev, linux-mm, Hannes Reinecke
On Mon, Mar 10, 2025 at 03:37:51PM +0100, Vlastimil Babka wrote:
> Note it's a 6.14 hotfix for kernel oopses due to page refcount overflow.
Not actually overflow ... without VM_DEBUG enabled, networking increases
the refcount from 0 to 1, then decrements it from 1 to 0, causing the
slab to be freed. So it's a UAF bug induced by a messed-up refcount.
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-10 14:35 [PATCH] mm: Decline to manipulate the refcount on a slab page Matthew Wilcox (Oracle)
2025-03-10 14:37 ` Vlastimil Babka
@ 2025-03-11 10:15 ` Jakub Kicinski
2025-03-11 15:46 ` Hannes Reinecke
2025-03-12 5:47 ` Christoph Hellwig
1 sibling, 2 replies; 15+ messages in thread
From: Jakub Kicinski @ 2025-03-11 10:15 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, netdev, Vlastimil Babka, linux-mm, Hannes Reinecke
On Mon, 10 Mar 2025 14:35:24 +0000 Matthew Wilcox (Oracle) wrote:
> Long-term, networking needs to stop taking a refcount on the pages that
> it uses and rely on the caller to hold whatever references are necessary
> to make the memory stable.
TBH I'm not clear on who is going to fix this.
IIRC we already told NVMe people that sending slab memory over sendpage
is not well supported. Plus the bug is in BPF integration, judging by
the stack traces (skmsg is a BPF thing). Joy.
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-11 10:15 ` Jakub Kicinski
@ 2025-03-11 15:46 ` Hannes Reinecke
2025-03-11 16:59 ` Matthew Wilcox
2025-03-12 5:47 ` Christoph Hellwig
1 sibling, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2025-03-11 15:46 UTC (permalink / raw)
To: Jakub Kicinski, Matthew Wilcox (Oracle)
Cc: Andrew Morton, netdev, Vlastimil Babka, linux-mm
On 3/11/25 11:15, Jakub Kicinski wrote:
> On Mon, 10 Mar 2025 14:35:24 +0000 Matthew Wilcox (Oracle) wrote:
>> Long-term, networking needs to stop taking a refcount on the pages that
>> it uses and rely on the caller to hold whatever references are necessary
>> to make the memory stable.
>
> TBH I'm not clear on who is going to fix this.
> IIRC we already told NVMe people that sending slab memory over sendpage
> is not well supported. Plus the bug is in BPF integration, judging by
> the stack traces (skmsg is a BPF thing). Joy.
Hmm. Did you? I seem to have missed it.
We make sure not to do it via the 'sendpage_ok()' call; but other than
that there's not much we can do.
And BPF is probably not the culprit; the issue here is that we have a
kvec, package it into a bio (where it gets converted into a bvec),
and then call an iov iterator in tls_sw to get to the pages.
But at that stage we only see the bvec iterator, and the information
that it was a kvec to start with has been lost.
All this wouldn't be so bad if we didn't call get_page/put_page (the
caller holds the reference, after all), but the iov iterators and the
skmsg code insist upon it.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-11 15:46 ` Hannes Reinecke
@ 2025-03-11 16:59 ` Matthew Wilcox
2025-03-12 5:48 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2025-03-11 16:59 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Jakub Kicinski, Andrew Morton, netdev, Vlastimil Babka, linux-mm
On Tue, Mar 11, 2025 at 04:46:45PM +0100, Hannes Reinecke wrote:
> On 3/11/25 11:15, Jakub Kicinski wrote:
> > On Mon, 10 Mar 2025 14:35:24 +0000 Matthew Wilcox (Oracle) wrote:
> > > Long-term, networking needs to stop taking a refcount on the pages that
> > > it uses and rely on the caller to hold whatever references are necessary
> > > to make the memory stable.
> >
> > TBH I'm not clear on who is going to fix this.
> > IIRC we already told NVMe people that sending slab memory over sendpage
> > is not well supported. Plus the bug is in BPF integration, judging by
> > the stack traces (skmsg is a BPF thing). Joy.
>
> Hmm. Did you? I seem to have missed it.
> We make sure not to do it via the 'sendpage_ok()' call; but other than
> that there's not much we can do.
>
> And BPF is probably not the culprit; the issue here is that we have a
> kvec, package it into a bio (where it gets converted into a bvec),
> and then call an iov iterator in tls_sw to get to the pages.
> But at that stage we only see the bvec iterator, and the information
> that it was a kvec to start with has been lost.
So I have two questions:
Hannes:
- Why does nvme need to turn the kvec into a bio rather than just
send it directly?
Jakub:
- Why does the socket code think it needs to get a refcount on a bvec
at all, since the block layer doesn't?
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-11 10:15 ` Jakub Kicinski
2025-03-11 15:46 ` Hannes Reinecke
@ 2025-03-12 5:47 ` Christoph Hellwig
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-12 5:47 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Matthew Wilcox (Oracle),
Andrew Morton, netdev, Vlastimil Babka, linux-mm,
Hannes Reinecke
On Tue, Mar 11, 2025 at 11:15:11AM +0100, Jakub Kicinski wrote:
> On Mon, 10 Mar 2025 14:35:24 +0000 Matthew Wilcox (Oracle) wrote:
> > Long-term, networking needs to stop taking a refcount on the pages that
> > it uses and rely on the caller to hold whatever references are necessary
> > to make the memory stable.
>
> TBH I'm not clear on who is going to fix this.
> IIRC we already told NVMe people that sending slab memory over sendpage
> is not well supported. Plus the bug is in BPF integration, judging by
> the stack traces (skmsg is a BPF thing). Joy.
slab over sendpage doesn't work because you refuse to take the patches
to make it work by transparently falling back to sendmsg. It's a giant
pain for all network storage drivers caused by the networking
maintainers. The ultimate root cause is the fact that networking messes
with the refcounts.
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-11 16:59 ` Matthew Wilcox
@ 2025-03-12 5:48 ` Christoph Hellwig
2025-03-13 7:22 ` Hannes Reinecke
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-12 5:48 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Hannes Reinecke, Jakub Kicinski, Andrew Morton, netdev,
Vlastimil Babka, linux-mm
On Tue, Mar 11, 2025 at 04:59:53PM +0000, Matthew Wilcox wrote:
> So I have two questions:
>
> Hannes:
> - Why does nvme need to turn the kvec into a bio rather than just
> send it directly?
It doesn't need to and in fact does not.
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-12 5:48 ` Christoph Hellwig
@ 2025-03-13 7:22 ` Hannes Reinecke
2025-03-13 7:36 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2025-03-13 7:22 UTC (permalink / raw)
To: Christoph Hellwig, Matthew Wilcox
Cc: Jakub Kicinski, Andrew Morton, netdev, Vlastimil Babka, linux-mm
On 3/12/25 06:48, Christoph Hellwig wrote:
> On Tue, Mar 11, 2025 at 04:59:53PM +0000, Matthew Wilcox wrote:
>> So I have two questions:
>>
>> Hannes:
>> - Why does nvme need to turn the kvec into a bio rather than just
>> send it directly?
>
> It doesn't need to and in fact does not.
>
Errm ... nvmf_connect_admin_queue()/nvmf_connect_io_queue() does ...
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-13 7:22 ` Hannes Reinecke
@ 2025-03-13 7:36 ` Christoph Hellwig
2025-03-13 8:34 ` Hannes Reinecke
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-13 7:36 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Matthew Wilcox, Jakub Kicinski, Andrew Morton,
netdev, Vlastimil Babka, linux-mm
On Thu, Mar 13, 2025 at 08:22:01AM +0100, Hannes Reinecke wrote:
> On 3/12/25 06:48, Christoph Hellwig wrote:
> > On Tue, Mar 11, 2025 at 04:59:53PM +0000, Matthew Wilcox wrote:
> > > So I have two questions:
> > >
> > > Hannes:
> > > - Why does nvme need to turn the kvec into a bio rather than just
> > > send it directly?
> >
> > It doesn't need to and in fact does not.
> >
> Errm ... nvmf_connect_admin_queue()/nvmf_connect_io_queue() does ...
No kvec there. Just plain old passthrough commands like many others.
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-13 7:36 ` Christoph Hellwig
@ 2025-03-13 8:34 ` Hannes Reinecke
2025-03-13 8:44 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2025-03-13 8:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Matthew Wilcox, Jakub Kicinski, Andrew Morton, netdev,
Vlastimil Babka, linux-mm
On 3/13/25 08:36, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 08:22:01AM +0100, Hannes Reinecke wrote:
>> On 3/12/25 06:48, Christoph Hellwig wrote:
>>> On Tue, Mar 11, 2025 at 04:59:53PM +0000, Matthew Wilcox wrote:
>>>> So I have two questions:
>>>>
>>>> Hannes:
>>>> - Why does nvme need to turn the kvec into a bio rather than just
>>>> send it directly?
>>>
>>> It doesn't need to and in fact does not.
>>>
>> Errm ... nvmf_connect_admin_queue()/nvmf_connect_io_queue() does ...
>
> No kvec there. Just plain old passthrough commands like many others.
I may have been misunderstood.
nvmf_connect_command_prep() returns a kmalloced buffer.
That is stored in a bvec in _nvme_submit_sync_cmd() via
blk_mq_rq_map_kern()->bio_map_kern().
And from that point on we are dealing with bvecs (iterators
and all), and lose the information that the page referenced
is a slab page.
The argument is that the network layer expected a kvec iterator
when slab pages are referred to, not a bvec iterator.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-13 8:34 ` Hannes Reinecke
@ 2025-03-13 8:44 ` Christoph Hellwig
2025-03-13 8:52 ` Hannes Reinecke
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-13 8:44 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Matthew Wilcox, Jakub Kicinski, Andrew Morton,
netdev, Vlastimil Babka, linux-mm
On Thu, Mar 13, 2025 at 09:34:39AM +0100, Hannes Reinecke wrote:
> nvmf_connect_command_prep() returns a kmalloced buffer.
Yes.
> That is stored in a bvec in _nvme_submit_sync_cmd() via
> blk_mq_rq_map_kern()->bio_map_kern().
> And from that point on we are dealing with bvecs (iterators
> and all), and losing the information that the page referenced
> is a slab page.
Yes. But so does every other consumer of the block layer that passes
slab memory, of which there are quite a few. Various internal scsi
and nvme commands come to mind, as does the XFS buffer cache.
> The argument is that the network layer expected a kvec iterator
> when slab pages are referred to, not a bvec iterator.
It doesn't. It just doesn't want you to use ->sendpage.
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-13 8:44 ` Christoph Hellwig
@ 2025-03-13 8:52 ` Hannes Reinecke
2025-03-13 9:15 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2025-03-13 8:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Matthew Wilcox, Jakub Kicinski, Andrew Morton, netdev,
Vlastimil Babka, linux-mm
On 3/13/25 09:44, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 09:34:39AM +0100, Hannes Reinecke wrote:
>> nvmf_connect_command_prep() returns a kmalloced buffer.
>
> Yes.
>
>> That is stored in a bvec in _nvme_submit_sync_cmd() via
>> blk_mq_rq_map_kern()->bio_map_kern().
>> And from that point on we are dealing with bvecs (iterators
>> and all), and losing the information that the page referenced
>> is a slab page.
>
> Yes. But so does every other consumer of the block layer that passes
> slab memory, of which there are quite a few. Various internal scsi
> and nvme commands come to mind, as does the XFS buffer cache.
>
>> The argument is that the network layer expected a kvec iterator
>> when slab pages are referred to, not a bvec iterator.
>
> It doesn't. It just doesn't want you to use ->sendpage.
>
But we don't; we call 'sendpage_ok()' and disable the MSG_SPLICE_PAGES
flag. The actual issue is that tls_sw is calling iov_iter_alloc_pages(),
which takes a page reference.
It probably should be calling iov_iter_extract_pages() (which does not
take a reference), but then one would need to review the entire network
stack, as the taking and releasing of page references is littered
throughout it.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
2025-03-13 8:52 ` Hannes Reinecke
@ 2025-03-13 9:15 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-13 9:15 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Matthew Wilcox, Jakub Kicinski, Andrew Morton,
netdev, Vlastimil Babka, linux-mm
On Thu, Mar 13, 2025 at 09:52:18AM +0100, Hannes Reinecke wrote:
> > It doesn't. It just doesn't want you to use ->sendpage.
> >
> But we don't; we call 'sendpage_ok()' and disabling the MSG_SPLICE_PAGES
> flag.
MSG_SPLICE_PAGES really just is the new name for the old ->sendpage.
Sorry for being stuck in the old naming.
> Actual issue is that tls_sw() is calling iov_iter_alloc_pages(),
> which is taking a page reference.
> It probably should be calling iov_iter_extract_pages() (which does not
> take a reference), but then one would need to review the entire network
> stack as taking and releasing page references are littered throughout
> the stack.
Yes, it needs to use the proper pinning helpers, if only to not
corrupt out-of-place-write file systems when receiving from a TLS
socket. But for the network stack below it that doesn't matter;
it expects to be able to grab and release references, and for that
you need page backing. Whether the page was pinned or referenced
when resolving the user address does not matter at all.
* Re: [PATCH] mm: Decline to manipulate the refcount on a slab page
[not found] ` <Z88vUFweLyk5s8UD@casper.infradead.org>
@ 2025-03-11 7:05 ` Hannes Reinecke
0 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2025-03-11 7:05 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: netdev, Vlastimil Babka, linux-mm
On 3/10/25 19:28, Matthew Wilcox wrote:
> On Mon, Mar 10, 2025 at 05:57:51PM +0100, Hannes Reinecke wrote:
>> On 3/10/25 15:27, Matthew Wilcox (Oracle) wrote:
>> I assume we will have a discussion at LSF around frozen pages/slab
>> behaviour?
>> It's not just networking; every driver using iov_alloc_pages() and
>> friends is potentially affected.
>> And it would be good to clarify the rules for how these iterators
>> should be used.
>
> Sure, we can do that. I haven't conducted a deep survey of how many
> page users really need a refcount, so I'll have things to learn too.
That would be awesome.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
end of thread, other threads:[~2025-03-13 9:15 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-10 14:35 [PATCH] mm: Decline to manipulate the refcount on a slab page Matthew Wilcox (Oracle)
2025-03-10 14:37 ` Vlastimil Babka
2025-03-10 14:50 ` Matthew Wilcox
2025-03-11 10:15 ` Jakub Kicinski
2025-03-11 15:46 ` Hannes Reinecke
2025-03-11 16:59 ` Matthew Wilcox
2025-03-12 5:48 ` Christoph Hellwig
2025-03-13 7:22 ` Hannes Reinecke
2025-03-13 7:36 ` Christoph Hellwig
2025-03-13 8:34 ` Hannes Reinecke
2025-03-13 8:44 ` Christoph Hellwig
2025-03-13 8:52 ` Hannes Reinecke
2025-03-13 9:15 ` Christoph Hellwig
2025-03-12 5:47 ` Christoph Hellwig
[not found] <20250310142750.1209192-1-willy@infradead.org>
[not found] ` <77fa8d7e-4752-4979-affe-aa45c8d7795a@suse.de>
[not found] ` <Z88vUFweLyk5s8UD@casper.infradead.org>
2025-03-11 7:05 ` Hannes Reinecke