* [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align()
@ 2025-03-01 2:03 Haiyang Zhang
2025-03-01 13:49 ` Yunsheng Lin
0 siblings, 1 reply; 5+ messages in thread
From: Haiyang Zhang @ 2025-03-01 2:03 UTC (permalink / raw)
To: linux-hyperv, akpm, linux-mm
Cc: haiyangz, decui, kys, paulros, olaf, vkuznets, davem, wei.liu,
longli, linux-kernel, linyunsheng, stable
In commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for
page_frag_alloc_align()"), the check for fragsz is moved earlier.
So when the cache is used up, and if the fragsz > PAGE_SIZE, it won't
try to refill, and just return NULL.
I tested it with fragsz:8192, cache-size:32768. After the initial four
successful allocations, it failed, even there is plenty of free memory
in the system.
To fix, revert the refill logic like before: the refill is attempted
before the check & return NULL.
Cc: linyunsheng@huawei.com
Cc: stable@vger.kernel.org
Fixes: 8218f62c9c9b ("mm: page_frag: use initial zero offset for page_frag_alloc_align()")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
mm/page_frag_cache.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index d2423f30577e..82935d7e53de 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -119,19 +119,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
size = PAGE_SIZE << encoded_page_decode_order(encoded_page);
offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
if (unlikely(offset + fragsz > size)) {
- if (unlikely(fragsz > PAGE_SIZE)) {
- /*
- * The caller is trying to allocate a fragment
- * with fragsz > PAGE_SIZE but the cache isn't big
- * enough to satisfy the request, this may
- * happen in low memory conditions.
- * We don't release the cache page because
- * it could make memory pressure worse
- * so we simply return NULL here.
- */
- return NULL;
- }
-
page = encoded_page_decode_page(encoded_page);
if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
@@ -149,6 +136,19 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
/* reset page count bias and offset to start of new frag */
nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
offset = 0;
+
+ if (unlikely(fragsz > size)) {
+ /*
+ * The caller is trying to allocate a fragment
+ * with fragsz > size but the cache isn't big
+ * enough to satisfy the request, this may
+ * happen in low memory conditions.
+ * We don't release the cache page because
+ * it could make memory pressure worse
+ * so we simply return NULL here.
+ */
+ return NULL;
+ }
}
nc->pagecnt_bias--;
--
2.34.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align()
2025-03-01 2:03 [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align() Haiyang Zhang
@ 2025-03-01 13:49 ` Yunsheng Lin
2025-03-01 17:00 ` [EXTERNAL] " Haiyang Zhang
0 siblings, 1 reply; 5+ messages in thread
From: Yunsheng Lin @ 2025-03-01 13:49 UTC (permalink / raw)
To: Haiyang Zhang, linux-hyperv, akpm, linux-mm
Cc: decui, kys, paulros, olaf, vkuznets, davem, wei.liu, longli,
linux-kernel, linyunsheng, stable, netdev, Alexander Duyck
+cc netdev ML & Alexander
On 3/1/2025 10:03 AM, Haiyang Zhang wrote:
> In commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for
> page_frag_alloc_align()"), the check for fragsz is moved earlier.
> So when the cache is used up, and if the fragsz > PAGE_SIZE, it won't
> try to refill, and just return NULL.
> I tested it with fragsz:8192, cache-size:32768. After the initial four
> successful allocations, it failed, even there is plenty of free memory
> in the system.
Hi, Haiyang
It seems the PAGE_SIZE is 4K for the tested system?
Which drivers or subsystems are passing the fragsz being bigger than
PAGE_SIZE to page_frag_alloc_align() related API?
> To fix, revert the refill logic like before: the refill is attempted
> before the check & return NULL.
page_frag API is not really for allocating memory being bigger than
PAGE_SIZE as __page_frag_cache_refill() will not try hard enough to
allocate order 3 compound page when calling __alloc_pages() and will
fail back to allocate base page as the discussed in below:
https://lore.kernel.org/all/ead00fb7-8538-45b3-8322-8a41386e7381@huawei.com/
>
> Cc: linyunsheng@huawei.com
> Cc: stable@vger.kernel.org
> Fixes: 8218f62c9c9b ("mm: page_frag: use initial zero offset for page_frag_alloc_align()")
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> mm/page_frag_cache.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index d2423f30577e..82935d7e53de 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -119,19 +119,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> size = PAGE_SIZE << encoded_page_decode_order(encoded_page);
> offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
> if (unlikely(offset + fragsz > size)) {
> - if (unlikely(fragsz > PAGE_SIZE)) {
> - /*
> - * The caller is trying to allocate a fragment
> - * with fragsz > PAGE_SIZE but the cache isn't big
> - * enough to satisfy the request, this may
> - * happen in low memory conditions.
> - * We don't release the cache page because
> - * it could make memory pressure worse
> - * so we simply return NULL here.
> - */
> - return NULL;
> - }
> -
> page = encoded_page_decode_page(encoded_page);
>
> if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> @@ -149,6 +136,19 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> /* reset page count bias and offset to start of new frag */
> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> offset = 0;
> +
> + if (unlikely(fragsz > size)) {
> + /*
> + * The caller is trying to allocate a fragment
> + * with fragsz > size but the cache isn't big
> + * enough to satisfy the request, this may
> + * happen in low memory conditions.
> + * We don't release the cache page because
> + * it could make memory pressure worse
> + * so we simply return NULL here.
> + */
> + return NULL;
> + }
> }
>
> nc->pagecnt_bias--;
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align()
2025-03-01 13:49 ` Yunsheng Lin
@ 2025-03-01 17:00 ` Haiyang Zhang
2025-03-01 23:16 ` Haiyang Zhang
0 siblings, 1 reply; 5+ messages in thread
From: Haiyang Zhang @ 2025-03-01 17:00 UTC (permalink / raw)
To: Yunsheng Lin, linux-hyperv, akpm, linux-mm
Cc: Dexuan Cui, KY Srinivasan, Paul Rosswurm, olaf, vkuznets, davem,
wei.liu, Long Li, linux-kernel, linyunsheng, stable, netdev,
Alexander Duyck
> -----Original Message-----
> From: Yunsheng Lin <yunshenglin0825@gmail.com>
> Sent: Saturday, March 1, 2025 8:50 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>; linux-hyperv@vger.kernel.org;
> akpm@linux-foundation.org; linux-mm@kvack.org
> Cc: Dexuan Cui <decui@microsoft.com>; KY Srinivasan <kys@microsoft.com>;
> Paul Rosswurm <paulros@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; wei.liu@kernel.org; Long Li
> <longli@microsoft.com>; linux-kernel@vger.kernel.org;
> linyunsheng@huawei.com; stable@vger.kernel.org; netdev@vger.kernel.org;
> Alexander Duyck <alexander.duyck@gmail.com>
> Subject: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in
> __page_frag_alloc_align()
>
> +cc netdev ML & Alexander
>
> On 3/1/2025 10:03 AM, Haiyang Zhang wrote:
> > In commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for
> > page_frag_alloc_align()"), the check for fragsz is moved earlier.
> > So when the cache is used up, and if the fragsz > PAGE_SIZE, it won't
> > try to refill, and just return NULL.
> > I tested it with fragsz:8192, cache-size:32768. After the initial four
> > successful allocations, it failed, even there is plenty of free memory
> > in the system.
>
> Hi, Haiyang
> It seems the PAGE_SIZE is 4K for the tested system?
Yes.
> Which drivers or subsystems are passing the fragsz being bigger than
> PAGE_SIZE to page_frag_alloc_align() related API?
For example, our MANA driver when using jumbo frame.
https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/ethernet/microsoft/mana
> > To fix, revert the refill logic like before: the refill is attempted
> > before the check & return NULL.
>
> page_frag API is not really for allocating memory being bigger than
> PAGE_SIZE as __page_frag_cache_refill() will not try hard enough to
> allocate order 3 compound page when calling __alloc_pages() and will
> fail back to allocate base page as the discussed in below:
> https://lore.ker/
> nel.org%2Fall%2Fead00fb7-8538-45b3-8322-
> 8a41386e7381%40huawei.com%2F&data=05%7C02%7Chaiyangz%40microsoft.com%7Cd73
> d6a0ae65b4a42681c08dd58c8087b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7
> C638764338396356411%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOi
> IwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7
> C&sdata=FJ7Ggrxxxv6QzKepUiHmtns1GZC2G2oJMcWSzOuFbsE%3D&reserved=0
We are already aware of this, and have error checking in place for the failover
case to "base page".
From the discussion thread above, there are other drivers using
page_frag_alloc_align() for over PAGE_SIZE too. If making the page_frag API
support only fragsz <= PAGE_SIZE is desired, can we create another API? One
keeps the existing API semantics (allowing > PAGE_SIZE), the other uses
your new code. By the way, it should add an explicit check and fail ALL requests
for fragsz > PAGE_SIZE. Currently your code successfully allocates big frags
for a few times, then fail. This is not a desired behavior. It's also a
breaking change for our MANA driver, which can no longer run Jumbo frames.
@Andrew Morton <akpm@linux-foundation.org>
And other maintainers, could you please also evaluate the idea above?
Thanks,
- Haiyang
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align()
2025-03-01 17:00 ` [EXTERNAL] " Haiyang Zhang
@ 2025-03-01 23:16 ` Haiyang Zhang
2025-03-02 2:22 ` Yunsheng Lin
0 siblings, 1 reply; 5+ messages in thread
From: Haiyang Zhang @ 2025-03-01 23:16 UTC (permalink / raw)
To: Haiyang Zhang, Yunsheng Lin, linux-hyperv, akpm, linux-mm
Cc: Dexuan Cui, KY Srinivasan, Paul Rosswurm, olaf, vkuznets, davem,
wei.liu, Long Li, linux-kernel, linyunsheng, stable, netdev,
Alexander Duyck
> -----Original Message-----
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Sent: Saturday, March 1, 2025 12:00 PM
> To: Yunsheng Lin <yunshenglin0825@gmail.com>; linux-
> hyperv@vger.kernel.org; akpm@linux-foundation.org; linux-mm@kvack.org
> Cc: Dexuan Cui <decui@microsoft.com>; KY Srinivasan <kys@microsoft.com>;
> Paul Rosswurm <paulros@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; wei.liu@kernel.org; Long Li
> <longli@microsoft.com>; linux-kernel@vger.kernel.org;
> linyunsheng@huawei.com; stable@vger.kernel.org; netdev@vger.kernel.org;
> Alexander Duyck <alexander.duyck@gmail.com>
> Subject: RE: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in
> __page_frag_alloc_align()
>
>
>
> > -----Original Message-----
> > From: Yunsheng Lin <yunshenglin0825@gmail.com>
> > Sent: Saturday, March 1, 2025 8:50 AM
> > To: Haiyang Zhang <haiyangz@microsoft.com>; linux-
> hyperv@vger.kernel.org;
> > akpm@linux-foundation.org; linux-mm@kvack.org
> > Cc: Dexuan Cui <decui@microsoft.com>; KY Srinivasan <kys@microsoft.com>;
> > Paul Rosswurm <paulros@microsoft.com>; olaf@aepfle.de; vkuznets
> > <vkuznets@redhat.com>; davem@davemloft.net; wei.liu@kernel.org; Long Li
> > <longli@microsoft.com>; linux-kernel@vger.kernel.org;
> > linyunsheng@huawei.com; stable@vger.kernel.org; netdev@vger.kernel.org;
> > Alexander Duyck <alexander.duyck@gmail.com>
> > Subject: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in
> > __page_frag_alloc_align()
> >
> > +cc netdev ML & Alexander
> >
> > On 3/1/2025 10:03 AM, Haiyang Zhang wrote:
> > > In commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for
> > > page_frag_alloc_align()"), the check for fragsz is moved earlier.
> > > So when the cache is used up, and if the fragsz > PAGE_SIZE, it won't
> > > try to refill, and just return NULL.
> > > I tested it with fragsz:8192, cache-size:32768. After the initial four
> > > successful allocations, it failed, even there is plenty of free memory
> > > in the system.
> >
> > Hi, Haiyang
> > It seems the PAGE_SIZE is 4K for the tested system?
> Yes.
>
> > Which drivers or subsystems are passing the fragsz being bigger than
> > PAGE_SIZE to page_frag_alloc_align() related API?
> For example, our MANA driver when using jumbo frame.
> https://web.git/.
> kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fnetdev%2Fnet-
> next.git%2Ftree%2Fdrivers%2Fnet%2Fethernet%2Fmicrosoft%2Fmana&data=05%7C02
> %7Chaiyangz%40microsoft.com%7Cea9cc3de8c904a5c720408dd58e2913a%7C72f988bf8
> 6f141af91ab2d7cd011db47%7C1%7C0%7C638764452327076527%7CUnknown%7CTWFpbGZsb
> 3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWF
> pbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=7%2BZ9hBYGudGZUeeF0i4UEa3zjx4ZLFd
> q5E3qcZxnIWE%3D&reserved=0
>
> > > To fix, revert the refill logic like before: the refill is attempted
> > > before the check & return NULL.
> >
> > page_frag API is not really for allocating memory being bigger than
> > PAGE_SIZE as __page_frag_cache_refill() will not try hard enough to
> > allocate order 3 compound page when calling __alloc_pages() and will
> > fail back to allocate base page as the discussed in below:
> >
> https://lore.ker/
> %2F&data=05%7C02%7Chaiyangz%40microsoft.com%7Cea9cc3de8c904a5c720408dd58e2
> 913a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638764452327105287%7CUnk
> nown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4
> zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=g4VAI8DbzUD95qgth
> vzFV0PYgOIA3%2F%2FI3gmQHzuLwbo%3D&reserved=0
> > nel.org%2Fall%2Fead00fb7-8538-45b3-8322-
> >
> 8a41386e7381%40huawei.com%2F&data=05%7C02%7Chaiyangz%40microsoft.com%7Cd73
> >
> d6a0ae65b4a42681c08dd58c8087b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7
> >
> C638764338396356411%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOi
> >
> IwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7
> > C&sdata=FJ7Ggrxxxv6QzKepUiHmtns1GZC2G2oJMcWSzOuFbsE%3D&reserved=0
> We are already aware of this, and have error checking in place for the
> failover
> case to "base page".
>
> From the discussion thread above, there are other drivers using
> page_frag_alloc_align() for over PAGE_SIZE too. If making the page_frag
> API
> support only fragsz <= PAGE_SIZE is desired, can we create another API?
> One
> keeps the existing API semantics (allowing > PAGE_SIZE), the other uses
> your new code. By the way, it should add an explicit check and fail ALL
> requests
> for fragsz > PAGE_SIZE. Currently your code successfully allocates big
> frags
> for a few times, then fail. This is not a desired behavior. It's also a
> breaking change for our MANA driver, which can no longer run Jumbo frames.
>
> @Andrew Morton <akpm@linux-foundation.org>
> And other maintainers, could you please also evaluate the idea above?
>
And, quote from current doc 6.14.0-rc4:
"A page fragment is an arbitrary-length arbitrary-offset area of memory
which resides within a 0 or higher order compound page."
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/mm/page_frags.rst
So, it is designed to be *arbitrary-length* within a 0 or higher order
compound page.
If the commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for
page_frag_alloc_align()") intended to change the existing API semantics
to be Page Frag Length <= PAGE_SIZE, the document and all breaking drivers
need to be updated.
Thanks,
- Haiyang
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [EXTERNAL] Re: [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align()
2025-03-01 23:16 ` Haiyang Zhang
@ 2025-03-02 2:22 ` Yunsheng Lin
0 siblings, 0 replies; 5+ messages in thread
From: Yunsheng Lin @ 2025-03-02 2:22 UTC (permalink / raw)
To: Haiyang Zhang, linux-hyperv, akpm, linux-mm
Cc: Dexuan Cui, KY Srinivasan, Paul Rosswurm, olaf, vkuznets, davem,
wei.liu, Long Li, linux-kernel, linyunsheng, stable, netdev,
Alexander Duyck
On 3/2/2025 7:16 AM, Haiyang Zhang wrote:
...
>> We are already aware of this, and have error checking in place for the
>> failover
>> case to "base page".
>>
>> From the discussion thread above, there are other drivers using
>> page_frag_alloc_align() for over PAGE_SIZE too. If making the page_frag
>> API
>> support only fragsz <= PAGE_SIZE is desired, can we create another API?
>> One
>> keeps the existing API semantics (allowing > PAGE_SIZE), the other uses
>> your new code. By the way, it should add an explicit check and fail ALL
>> requests
>> for fragsz > PAGE_SIZE. Currently your code successfully allocates big
>> frags
>> for a few times, then fail. This is not a desired behavior. It's also a
>> breaking change for our MANA driver, which can no longer run Jumbo frames.
It seems there was some memory corruption problem that may caused by
reuse of the previously allocated frag cache memory by following a
LARGER allocations before 'using initial zero offset'.
https://lore.kernel.org/netdev/b711ca5f-4180-d658-a330-89bd5dcb0acb@gmail.com/T/#m12ebdefb8ee653b281d477d2310218b4ac138cde
'using initial zero offset' seems to just make the API misuse problem
more obvious.
>>
>> @Andrew Morton <akpm@linux-foundation.org>
>> And other maintainers, could you please also evaluate the idea above?
>>
>
> And, quote from current doc 6.14.0-rc4:
> "A page fragment is an arbitrary-length arbitrary-offset area of memory
> which resides within a 0 or higher order compound page."
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/mm/page_frags.rst
>
> So, it is designed to be *arbitrary-length* within a 0 or higher order
> compound page.
>
> If the commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for
> page_frag_alloc_align()") intended to change the existing API semantics
> to be Page Frag Length <= PAGE_SIZE, the document and all breaking drivers
> need to be updated.
Yes, updating the Documemntation to make it more obvious seems like the
right thing to do.
>
> Thanks,
> - Haiyang
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-03-02 2:23 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-01 2:03 [PATCH] mm: page_frag: Fix refill handling in __page_frag_alloc_align() Haiyang Zhang
2025-03-01 13:49 ` Yunsheng Lin
2025-03-01 17:00 ` [EXTERNAL] " Haiyang Zhang
2025-03-01 23:16 ` Haiyang Zhang
2025-03-02 2:22 ` Yunsheng Lin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox