* [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
@ 2025-09-26 11:38 Toke Høiland-Jørgensen
2025-09-26 15:25 ` Helge Deller
2025-09-30 0:04 ` Mina Almasry
0 siblings, 2 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-26 11:38 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jesper Dangaard Brouer,
Ilias Apalodimas, Toke Høiland-Jørgensen, Mina Almasry,
Jakub Kicinski
Cc: Helge Deller, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, linux-mm, netdev
Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
boot on his 32-bit parisc machine. The cause of this is that the mask is
set too wide, so page_pool_page_is_pp() incurs false positives, which
crash the machine.
Just disabling the check in page_pool_page_is_pp() will lead to the page_pool
code itself malfunctioning; so instead of doing this, this patch changes
the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
pointers for page_pool-tagged pages.
The fix relies on the kernel pointers that alias with the pp_magic field
always being above PAGE_OFFSET. With this assumption, we can use the
lowest bit of the value of PAGE_OFFSET as the upper bound of the
PP_DMA_INDEX_MASK, which should avoid the false positives.
Because we cannot rely on PAGE_OFFSET always being a compile-time
constant, nor on it always being >0, we fall back to disabling the
dma_index storage when there are no bits available. This leaves us in
the situation we were in before the patch in the Fixes tag, but only on
a subset of architecture configurations. This seems to be the best we
can do until the transition to page types is complete for page_pool
pages.
Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
Sorry for the delay on getting this out. I have only compile-tested it,
since I don't have any hardware that triggers the original bug. Helge, I'm
hoping you can take it for a spin?
include/linux/mm.h | 18 +++++------
net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
2 files changed, 62 insertions(+), 32 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec7..28541cb40f69 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
* since this value becomes part of PP_SIGNATURE; meaning we can just use the
* space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
* lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
- * 0, we make sure that we leave the two topmost bits empty, as that guarantees
- * we won't mistake a valid kernel pointer for a value we set, regardless of the
- * VMSPLIT setting.
+ * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
+ * known at compile-time.
*
- * Altogether, this means that the number of bits available is constrained by
- * the size of an unsigned long (at the upper end, subtracting two bits per the
- * above), and the definition of PP_SIGNATURE (with or without
- * POISON_POINTER_DELTA).
+ * If the value of PAGE_OFFSET is not known at compile time, or if it is too
+ * small to leave some bits available above PP_SIGNATURE, we define the number
+ * of bits to be 0, which turns off the DMA index tracking altogether (see
+ * page_pool_register_dma_index()).
*/
#define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
#if POISON_POINTER_DELTA > 0
@@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
*/
#define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
#else
-/* Always leave out the topmost two; see above. */
-#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
+/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
+#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
+ MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
#endif
#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 36a98f2bcac3..e224d2145eed 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -472,11 +472,60 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
}
}
+static int page_pool_register_dma_index(struct page_pool *pool,
+ netmem_ref netmem, gfp_t gfp)
+{
+ int err = 0;
+ u32 id;
+
+ if (unlikely(!PP_DMA_INDEX_BITS))
+ goto out;
+
+ if (in_softirq())
+ err = xa_alloc(&pool->dma_mapped, &id, netmem_to_page(netmem),
+ PP_DMA_INDEX_LIMIT, gfp);
+ else
+ err = xa_alloc_bh(&pool->dma_mapped, &id, netmem_to_page(netmem),
+ PP_DMA_INDEX_LIMIT, gfp);
+ if (err) {
+ WARN_ONCE(err != -ENOMEM, "couldn't track DMA mapping, please report to netdev@");
+ goto out;
+ }
+
+ netmem_set_dma_index(netmem, id);
+out:
+ return err;
+}
+
+static int page_pool_release_dma_index(struct page_pool *pool,
+ netmem_ref netmem)
+{
+ struct page *old, *page = netmem_to_page(netmem);
+ unsigned long id;
+
+ if (unlikely(!PP_DMA_INDEX_BITS))
+ return 0;
+
+ id = netmem_get_dma_index(netmem);
+ if (!id)
+ return -1;
+
+ if (in_softirq())
+ old = xa_cmpxchg(&pool->dma_mapped, id, page, NULL, 0);
+ else
+ old = xa_cmpxchg_bh(&pool->dma_mapped, id, page, NULL, 0);
+ if (old != page)
+ return -1;
+
+ netmem_set_dma_index(netmem, 0);
+
+ return 0;
+}
+
static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t gfp)
{
dma_addr_t dma;
int err;
- u32 id;
/* Setup DMA mapping: use 'struct page' area for storing DMA-addr
* since dma_addr_t can be either 32 or 64 bits and does not always fit
@@ -495,18 +544,10 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t g
goto unmap_failed;
}
- if (in_softirq())
- err = xa_alloc(&pool->dma_mapped, &id, netmem_to_page(netmem),
- PP_DMA_INDEX_LIMIT, gfp);
- else
- err = xa_alloc_bh(&pool->dma_mapped, &id, netmem_to_page(netmem),
- PP_DMA_INDEX_LIMIT, gfp);
- if (err) {
- WARN_ONCE(err != -ENOMEM, "couldn't track DMA mapping, please report to netdev@");
+ err = page_pool_register_dma_index(pool, netmem, gfp);
+ if (err)
goto unset_failed;
- }
- netmem_set_dma_index(netmem, id);
page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len);
return true;
@@ -684,8 +725,6 @@ void page_pool_clear_pp_info(netmem_ref netmem)
static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
netmem_ref netmem)
{
- struct page *old, *page = netmem_to_page(netmem);
- unsigned long id;
dma_addr_t dma;
if (!pool->dma_map)
@@ -694,15 +733,7 @@ static __always_inline void __page_pool_release_netmem_dma(struct page_pool *poo
*/
return;
- id = netmem_get_dma_index(netmem);
- if (!id)
- return;
-
- if (in_softirq())
- old = xa_cmpxchg(&pool->dma_mapped, id, page, NULL, 0);
- else
- old = xa_cmpxchg_bh(&pool->dma_mapped, id, page, NULL, 0);
- if (old != page)
+ if (page_pool_release_dma_index(pool, netmem))
return;
dma = page_pool_get_dma_addr_netmem(netmem);
@@ -712,7 +743,6 @@ static __always_inline void __page_pool_release_netmem_dma(struct page_pool *poo
PAGE_SIZE << pool->p.order, pool->p.dma_dir,
DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
page_pool_set_dma_addr_netmem(netmem, 0);
- netmem_set_dma_index(netmem, 0);
}
/* Disconnects a page (from a page_pool). API users can have a need
--
2.51.0
* Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
2025-09-26 11:38 [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches Toke Høiland-Jørgensen
@ 2025-09-26 15:25 ` Helge Deller
2025-09-29 10:07 ` Toke Høiland-Jørgensen
2025-09-30 0:04 ` Mina Almasry
1 sibling, 1 reply; 7+ messages in thread
From: Helge Deller @ 2025-09-26 15:25 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Andrew Morton,
David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Jesper Dangaard Brouer, Ilias Apalodimas, Mina Almasry,
Jakub Kicinski
Cc: Helge Deller, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, linux-mm, netdev
On 9/26/25 13:38, Toke Høiland-Jørgensen wrote:
> Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
> boot on his 32-bit parisc machine. The cause of this is that the mask is
> set too wide, so page_pool_page_is_pp() incurs false positives, which
> crash the machine.
>
> Just disabling the check in page_pool_page_is_pp() will lead to the page_pool
> code itself malfunctioning; so instead of doing this, this patch changes
> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
> pointers for page_pool-tagged pages.
>
> The fix relies on the kernel pointers that alias with the pp_magic field
> always being above PAGE_OFFSET. With this assumption, we can use the
> lowest bit of the value of PAGE_OFFSET as the upper bound of the
> PP_DMA_INDEX_MASK, which should avoid the false positives.
>
> Because we cannot rely on PAGE_OFFSET always being a compile-time
> constant, nor on it always being >0, we fall back to disabling the
> dma_index storage when there are no bits available. This leaves us in
> the situation we were in before the patch in the Fixes tag, but only on
> a subset of architecture configurations. This seems to be the best we
> can do until the transition to page types is complete for page_pool
> pages.
>
> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> Sorry for the delay on getting this out. I have only compile-tested it,
> since I don't have any hardware that triggers the original bug. Helge, I'm
> hoping you can take it for a spin?
I can't comment on whether the patch is otherwise OK, but it does
indeed fix the boot problem for me, so:
Tested-by: Helge Deller <deller@gmx.de>
Btw, this can easily be tested with qemu:
./qemu-system-hppa -kernel vmlinux -nographic -serial mon:stdio
If the patch is accepted, can you add the CC-stable tag, so that
it gets pushed down to kernel 6.15+ too?
Thank you, Toke!
Helge
>
> include/linux/mm.h | 18 +++++------
> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
> 2 files changed, 62 insertions(+), 32 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec7..28541cb40f69 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
> - * VMSPLIT setting.
> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
> + * known at compile-time.
> *
> - * Altogether, this means that the number of bits available is constrained by
> - * the size of an unsigned long (at the upper end, subtracting two bits per the
> - * above), and the definition of PP_SIGNATURE (with or without
> - * POISON_POINTER_DELTA).
> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
> + * small to leave some bits available above PP_SIGNATURE, we define the number
> + * of bits to be 0, which turns off the DMA index tracking altogether (see
> + * page_pool_register_dma_index()).
> */
> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
> #if POISON_POINTER_DELTA > 0
> @@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> */
> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
> #else
> -/* Always leave out the topmost two; see above. */
> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
> +/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
> #endif
>
> #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 36a98f2bcac3..e224d2145eed 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -472,11 +472,60 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
> }
> }
>
> +static int page_pool_register_dma_index(struct page_pool *pool,
> + netmem_ref netmem, gfp_t gfp)
> +{
> + int err = 0;
> + u32 id;
> +
> + if (unlikely(!PP_DMA_INDEX_BITS))
> + goto out;
> +
> + if (in_softirq())
> + err = xa_alloc(&pool->dma_mapped, &id, netmem_to_page(netmem),
> + PP_DMA_INDEX_LIMIT, gfp);
> + else
> + err = xa_alloc_bh(&pool->dma_mapped, &id, netmem_to_page(netmem),
> + PP_DMA_INDEX_LIMIT, gfp);
> + if (err) {
> + WARN_ONCE(err != -ENOMEM, "couldn't track DMA mapping, please report to netdev@");
> + goto out;
> + }
> +
> + netmem_set_dma_index(netmem, id);
> +out:
> + return err;
> +}
> +
> +static int page_pool_release_dma_index(struct page_pool *pool,
> + netmem_ref netmem)
> +{
> + struct page *old, *page = netmem_to_page(netmem);
> + unsigned long id;
> +
> + if (unlikely(!PP_DMA_INDEX_BITS))
> + return 0;
> +
> + id = netmem_get_dma_index(netmem);
> + if (!id)
> + return -1;
> +
> + if (in_softirq())
> + old = xa_cmpxchg(&pool->dma_mapped, id, page, NULL, 0);
> + else
> + old = xa_cmpxchg_bh(&pool->dma_mapped, id, page, NULL, 0);
> + if (old != page)
> + return -1;
> +
> + netmem_set_dma_index(netmem, 0);
> +
> + return 0;
> +}
> +
> static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t gfp)
> {
> dma_addr_t dma;
> int err;
> - u32 id;
>
> /* Setup DMA mapping: use 'struct page' area for storing DMA-addr
> * since dma_addr_t can be either 32 or 64 bits and does not always fit
> @@ -495,18 +544,10 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t g
> goto unmap_failed;
> }
>
> - if (in_softirq())
> - err = xa_alloc(&pool->dma_mapped, &id, netmem_to_page(netmem),
> - PP_DMA_INDEX_LIMIT, gfp);
> - else
> - err = xa_alloc_bh(&pool->dma_mapped, &id, netmem_to_page(netmem),
> - PP_DMA_INDEX_LIMIT, gfp);
> - if (err) {
> - WARN_ONCE(err != -ENOMEM, "couldn't track DMA mapping, please report to netdev@");
> + err = page_pool_register_dma_index(pool, netmem, gfp);
> + if (err)
> goto unset_failed;
> - }
>
> - netmem_set_dma_index(netmem, id);
> page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len);
>
> return true;
> @@ -684,8 +725,6 @@ void page_pool_clear_pp_info(netmem_ref netmem)
> static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
> netmem_ref netmem)
> {
> - struct page *old, *page = netmem_to_page(netmem);
> - unsigned long id;
> dma_addr_t dma;
>
> if (!pool->dma_map)
> @@ -694,15 +733,7 @@ static __always_inline void __page_pool_release_netmem_dma(struct page_pool *poo
> */
> return;
>
> - id = netmem_get_dma_index(netmem);
> - if (!id)
> - return;
> -
> - if (in_softirq())
> - old = xa_cmpxchg(&pool->dma_mapped, id, page, NULL, 0);
> - else
> - old = xa_cmpxchg_bh(&pool->dma_mapped, id, page, NULL, 0);
> - if (old != page)
> + if (page_pool_release_dma_index(pool, netmem))
> return;
>
> dma = page_pool_get_dma_addr_netmem(netmem);
> @@ -712,7 +743,6 @@ static __always_inline void __page_pool_release_netmem_dma(struct page_pool *poo
> PAGE_SIZE << pool->p.order, pool->p.dma_dir,
> DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
> page_pool_set_dma_addr_netmem(netmem, 0);
> - netmem_set_dma_index(netmem, 0);
> }
>
> /* Disconnects a page (from a page_pool). API users can have a need
* Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
2025-09-26 15:25 ` Helge Deller
@ 2025-09-29 10:07 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-29 10:07 UTC (permalink / raw)
To: Helge Deller, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jesper Dangaard Brouer,
Ilias Apalodimas, Mina Almasry, Jakub Kicinski
Cc: Helge Deller, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, linux-mm, netdev
Helge Deller <deller@gmx.de> writes:
> On 9/26/25 13:38, Toke Høiland-Jørgensen wrote:
>> Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
>> boot on his 32-bit parisc machine. The cause of this is that the mask is
>> set too wide, so page_pool_page_is_pp() incurs false positives, which
>> crash the machine.
>>
>> Just disabling the check in page_pool_page_is_pp() will lead to the page_pool
>> code itself malfunctioning; so instead of doing this, this patch changes
>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>> pointers for page_pool-tagged pages.
>>
>> The fix relies on the kernel pointers that alias with the pp_magic field
>> always being above PAGE_OFFSET. With this assumption, we can use the
>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>
>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>> constant, nor on it always being >0, we fall back to disabling the
>> dma_index storage when there are no bits available. This leaves us in
>> the situation we were in before the patch in the Fixes tag, but only on
>> a subset of architecture configurations. This seems to be the best we
>> can do until the transition to page types is complete for page_pool
>> pages.
>>
>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>> Sorry for the delay on getting this out. I have only compile-tested it,
>> since I don't have any hardware that triggers the original bug. Helge, I'm
>> hoping you can take it for a spin?
>
> I can't comment on whether the patch is otherwise OK, but it does
> indeed fix the boot problem for me, so:
>
> Tested-by: Helge Deller <deller@gmx.de>
Great, thanks for testing :)
> Btw, this can easily be tested with qemu:
> ./qemu-system-hppa -kernel vmlinux -nographic -serial mon:stdio
Ah, neat, thank you for the pointer!
> If the patch is accepted, can you add the CC-stable tag, so that
> it gets pushed down to kernel 6.15+ too?
I find that networking patches make it to the stable trees based on the
Fixes tags, but I'll try to keep an eye on it just to make sure. I can
also add the Cc if I need to respin for some other reason.
-Toke
* Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
2025-09-26 11:38 [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches Toke Høiland-Jørgensen
2025-09-26 15:25 ` Helge Deller
@ 2025-09-30 0:04 ` Mina Almasry
2025-09-30 7:45 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 7+ messages in thread
From: Mina Almasry @ 2025-09-30 0:04 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jesper Dangaard Brouer,
Ilias Apalodimas, Jakub Kicinski, Helge Deller, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-mm, netdev
On Fri, Sep 26, 2025 at 4:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
> boot on his 32-bit parisc machine. The cause of this is that the mask is
> set too wide, so page_pool_page_is_pp() incurs false positives, which
> crash the machine.
>
> Just disabling the check in page_pool_page_is_pp() will lead to the page_pool
> code itself malfunctioning; so instead of doing this, this patch changes
> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
> pointers for page_pool-tagged pages.
>
> The fix relies on the kernel pointers that alias with the pp_magic field
> always being above PAGE_OFFSET. With this assumption, we can use the
> lowest bit of the value of PAGE_OFFSET as the upper bound of the
> PP_DMA_INDEX_MASK, which should avoid the false positives.
>
> Because we cannot rely on PAGE_OFFSET always being a compile-time
> constant, nor on it always being >0, we fall back to disabling the
> dma_index storage when there are no bits available. This leaves us in
> the situation we were in before the patch in the Fixes tag, but only on
> a subset of architecture configurations. This seems to be the best we
> can do until the transition to page types is complete for page_pool
> pages.
>
> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> Sorry for the delay on getting this out. I have only compile-tested it,
> since I don't have any hardware that triggers the original bug. Helge, I'm
> hoping you can take it for a spin?
>
> include/linux/mm.h | 18 +++++------
> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
> 2 files changed, 62 insertions(+), 32 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec7..28541cb40f69 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
> - * VMSPLIT setting.
> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
> + * known at compile-time.
> *
> - * Altogether, this means that the number of bits available is constrained by
> - * the size of an unsigned long (at the upper end, subtracting two bits per the
> - * above), and the definition of PP_SIGNATURE (with or without
> - * POISON_POINTER_DELTA).
> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
> + * small to leave some bits available above PP_SIGNATURE, we define the number
> + * of bits to be 0, which turns off the DMA index tracking altogether (see
> + * page_pool_register_dma_index()).
> */
> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
> #if POISON_POINTER_DELTA > 0
> @@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> */
> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
> #else
> -/* Always leave out the topmost two; see above. */
> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
> +/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
Do you have to watch out for an underflow of __ffs(PAGE_OFFSET) -
PP_DMA_INDEX_SHIFT (at which point we'll presumably use 32 here
instead of the expected 0)? Or is that guaranteed to be positive for
some reason I'm not immediately grasping.
--
Thanks,
Mina
* Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
2025-09-30 0:04 ` Mina Almasry
@ 2025-09-30 7:45 ` Toke Høiland-Jørgensen
2025-09-30 10:13 ` Paolo Abeni
0 siblings, 1 reply; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-30 7:45 UTC (permalink / raw)
To: Mina Almasry
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jesper Dangaard Brouer,
Ilias Apalodimas, Jakub Kicinski, Helge Deller, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-mm, netdev
Mina Almasry <almasrymina@google.com> writes:
> On Fri, Sep 26, 2025 at 4:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
>> boot on his 32-bit parisc machine. The cause of this is that the mask is
>> set too wide, so page_pool_page_is_pp() incurs false positives, which
>> crash the machine.
>>
>> Just disabling the check in page_pool_page_is_pp() will lead to the page_pool
>> code itself malfunctioning; so instead of doing this, this patch changes
>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>> pointers for page_pool-tagged pages.
>>
>> The fix relies on the kernel pointers that alias with the pp_magic field
>> always being above PAGE_OFFSET. With this assumption, we can use the
>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>
>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>> constant, nor on it always being >0, we fall back to disabling the
>> dma_index storage when there are no bits available. This leaves us in
>> the situation we were in before the patch in the Fixes tag, but only on
>> a subset of architecture configurations. This seems to be the best we
>> can do until the transition to page types is complete for page_pool
>> pages.
>>
>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>> Sorry for the delay on getting this out. I have only compile-tested it,
>> since I don't have any hardware that triggers the original bug. Helge, I'm
>> hoping you can take it for a spin?
>>
>> include/linux/mm.h | 18 +++++------
>> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
>> 2 files changed, 62 insertions(+), 32 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 1ae97a0b8ec7..28541cb40f69 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
>> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
>> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
>> - * VMSPLIT setting.
>> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
>> + * known at compile-time.
>> *
>> - * Altogether, this means that the number of bits available is constrained by
>> - * the size of an unsigned long (at the upper end, subtracting two bits per the
>> - * above), and the definition of PP_SIGNATURE (with or without
>> - * POISON_POINTER_DELTA).
>> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
>> + * small to leave some bits available above PP_SIGNATURE, we define the number
>> + * of bits to be 0, which turns off the DMA index tracking altogether (see
>> + * page_pool_register_dma_index()).
>> */
>> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
>> #if POISON_POINTER_DELTA > 0
>> @@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> */
>> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>> #else
>> -/* Always leave out the topmost two; see above. */
>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>> +/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
>> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
>
> Do you have to watch out for an underflow of __ffs(PAGE_OFFSET) -
> PP_DMA_INDEX_SHIFT (at which point we'll presumably use 32 here
> instead of the expected 0)? Or is that guaranteed to be positive for
> some reason I'm not immediately grasping.
That's what the 'PAGE_OFFSET > PP_SIGNATURE' in the ternary operator is
for. I'm assuming that PAGE_OFFSET is always a "round" number (e.g.,
0xc0000000), in which case that condition should be sufficient, no?
-Toke
* Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
2025-09-30 7:45 ` Toke Høiland-Jørgensen
@ 2025-09-30 10:13 ` Paolo Abeni
2025-09-30 11:30 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2025-09-30 10:13 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Mina Almasry
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jesper Dangaard Brouer,
Ilias Apalodimas, Jakub Kicinski, Helge Deller, David S. Miller,
Eric Dumazet, Simon Horman, linux-mm, netdev
On 9/30/25 9:45 AM, Toke Høiland-Jørgensen wrote:
> Mina Almasry <almasrymina@google.com> writes:
>> On Fri, Sep 26, 2025 at 4:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>> Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
>>> boot on his 32-bit parisc machine. The cause of this is that the mask is
>>> set too wide, so page_pool_page_is_pp() incurs false positives, which
>>> crash the machine.
>>>
>>> Just disabling the check in page_pool_page_is_pp() will lead to the page_pool
>>> code itself malfunctioning; so instead of doing this, this patch changes
>>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>>> pointers for page_pool-tagged pages.
>>>
>>> The fix relies on the kernel pointers that alias with the pp_magic field
>>> always being above PAGE_OFFSET. With this assumption, we can use the
>>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>>
>>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>>> constant, nor on it always being >0, we fall back to disabling the
>>> dma_index storage when there are no bits available. This leaves us in
>>> the situation we were in before the patch in the Fixes tag, but only on
>>> a subset of architecture configurations. This seems to be the best we
>>> can do until the transition to page types is complete for page_pool
>>> pages.
>>>
>>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>> Sorry for the delay on getting this out. I have only compile-tested it,
>>> since I don't have any hardware that triggers the original bug. Helge, I'm
>>> hoping you can take it for a spin?
>>>
>>> include/linux/mm.h | 18 +++++------
>>> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
>>> 2 files changed, 62 insertions(+), 32 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 1ae97a0b8ec7..28541cb40f69 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>>> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>>> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
>>> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
>>> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
>>> - * VMSPLIT setting.
>>> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
>>> + * known at compile-time.
>>> *
>>> - * Altogether, this means that the number of bits available is constrained by
>>> - * the size of an unsigned long (at the upper end, subtracting two bits per the
>>> - * above), and the definition of PP_SIGNATURE (with or without
>>> - * POISON_POINTER_DELTA).
>>> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
>>> + * small to leave some bits available above PP_SIGNATURE, we define the number
>>> + * of bits to be 0, which turns off the DMA index tracking altogether (see
>>> + * page_pool_register_dma_index()).
>>> */
>>> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
>>> #if POISON_POINTER_DELTA > 0
>>> @@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>> */
>>> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>>> #else
>>> -/* Always leave out the topmost two; see above. */
>>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>>> +/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
>>> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
>>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
>>
>> Do you have to watch out for an underflow of __ffs(PAGE_OFFSET) -
>> PP_DMA_INDEX_SHIFT (at which point we'll presumably use 32 here
>> instead of the expected 0)? Or is that guaranteed to be positive for
>> some reason I'm not immediately grasping.
>
> That's what the 'PAGE_OFFSET > PP_SIGNATURE' in the ternary operator is
> for. I'm assuming that PAGE_OFFSET is always a "round" number (e.g.,
> 0xc0000000), in which case that condition should be sufficient, no?
IDK if such an assumption is obviously true. I think it would be safer to
somehow express it with a build time constraint.
/P
* Re: [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
2025-09-30 10:13 ` Paolo Abeni
@ 2025-09-30 11:30 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-09-30 11:30 UTC (permalink / raw)
To: Paolo Abeni, Mina Almasry
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jesper Dangaard Brouer,
Ilias Apalodimas, Jakub Kicinski, Helge Deller, David S. Miller,
Eric Dumazet, Simon Horman, linux-mm, netdev
Paolo Abeni <pabeni@redhat.com> writes:
> On 9/30/25 9:45 AM, Toke Høiland-Jørgensen wrote:
>> Mina Almasry <almasrymina@google.com> writes:
>>> On Fri, Sep 26, 2025 at 4:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>
>>>> Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
>>>> boot on his 32-bit parisc machine. The cause is that the mask is set
>>>> too wide, so page_pool_page_is_pp() incurs false positives, which
>>>> crash the machine.
>>>>
>>>> Just disabling the check in page_pool_page_is_pp() would leave the
>>>> page_pool code itself malfunctioning; so instead, this patch changes
>>>> the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
>>>> pointers for page_pool-tagged pages.
>>>>
>>>> The fix relies on the kernel pointers that alias with the pp_magic field
>>>> always being above PAGE_OFFSET. With this assumption, we can use the
>>>> lowest bit of the value of PAGE_OFFSET as the upper bound of the
>>>> PP_DMA_INDEX_MASK, which should avoid the false positives.
>>>>
>>>> Because we cannot rely on PAGE_OFFSET always being a compile-time
>>>> constant, nor on it always being >0, we fall back to disabling the
>>>> dma_index storage when there are no bits available. This leaves us in
>>>> the situation we were in before the patch in the Fixes tag, but only on
>>>> a subset of architecture configurations. This seems to be the best we
>>>> can do until the transition to page types is complete for page_pool
>>>> pages.
>>>>
>>>> Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
>>>> Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>> ---
>>>> Sorry for the delay on getting this out. I have only compile-tested it,
>>>> since I don't have any hardware that triggers the original bug. Helge, I'm
>>>> hoping you can take it for a spin?
>>>>
>>>> include/linux/mm.h | 18 +++++------
>>>> net/core/page_pool.c | 76 ++++++++++++++++++++++++++++++--------------
>>>> 2 files changed, 62 insertions(+), 32 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 1ae97a0b8ec7..28541cb40f69 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -4159,14 +4159,13 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>> * since this value becomes part of PP_SIGNATURE; meaning we can just use the
>>>> * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
>>>> * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
>>>> - * 0, we make sure that we leave the two topmost bits empty, as that guarantees
>>>> - * we won't mistake a valid kernel pointer for a value we set, regardless of the
>>>> - * VMSPLIT setting.
>>>> + * 0, we use the lowest bit of PAGE_OFFSET as the boundary if that value is
>>>> + * known at compile-time.
>>>> *
>>>> - * Altogether, this means that the number of bits available is constrained by
>>>> - * the size of an unsigned long (at the upper end, subtracting two bits per the
>>>> - * above), and the definition of PP_SIGNATURE (with or without
>>>> - * POISON_POINTER_DELTA).
>>>> + * If the value of PAGE_OFFSET is not known at compile time, or if it is too
>>>> + * small to leave some bits available above PP_SIGNATURE, we define the number
>>>> + * of bits to be 0, which turns off the DMA index tracking altogether (see
>>>> + * page_pool_register_dma_index()).
>>>> */
>>>> #define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
>>>> #if POISON_POINTER_DELTA > 0
>>>> @@ -4175,8 +4174,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>> */
>>>> #define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
>>>> #else
>>>> -/* Always leave out the topmost two; see above. */
>>>> -#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
>>>> +/* Constrain to the lowest bit of PAGE_OFFSET if known; see above. */
>>>> +#define PP_DMA_INDEX_BITS ((__builtin_constant_p(PAGE_OFFSET) && PAGE_OFFSET > PP_SIGNATURE) ? \
>>>> + MIN(32, __ffs(PAGE_OFFSET) - PP_DMA_INDEX_SHIFT) : 0)
>>>
>>> Do you have to watch out for an underflow of __ffs(PAGE_OFFSET) -
>>> PP_DMA_INDEX_SHIFT (at which point we'll presumably use 32 here
>>> instead of the expected 0)? Or is that guaranteed to be positive for
>>> some reason I'm not immediately grasping.
>>
>> That's what the 'PAGE_OFFSET > PP_SIGNATURE' in the ternary operator is
>> for. I'm assuming that PAGE_OFFSET is always a "round" number (e.g.,
>> 0xc0000000), in which case that condition should be sufficient, no?
>
> IDK if such an assumption is obviously true. I think it would be safer to
> somehow express it with a build time constraint.
Alright, I'll respin and constrain it further.
-Toke
Thread overview: 7+ messages
2025-09-26 11:38 [PATCH net] page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches Toke Høiland-Jørgensen
2025-09-26 15:25 ` Helge Deller
2025-09-29 10:07 ` Toke Høiland-Jørgensen
2025-09-30 0:04 ` Mina Almasry
2025-09-30 7:45 ` Toke Høiland-Jørgensen
2025-09-30 10:13 ` Paolo Abeni
2025-09-30 11:30 ` Toke Høiland-Jørgensen