* [PATCH net-next v4 1/5] mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument
[not found] <20240130113710.34511-1-linyunsheng@huawei.com>
@ 2024-01-30 11:37 ` Yunsheng Lin
2024-01-30 11:37 ` [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation Yunsheng Lin
2024-01-30 11:37 ` [PATCH net-next v4 3/5] net: introduce page_frag_cache_drain() Yunsheng Lin
2 siblings, 0 replies; 7+ messages in thread
From: Yunsheng Lin @ 2024-01-30 11:37 UTC (permalink / raw)
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Alexander Duyck,
Andrew Morton, Eric Dumazet, linux-mm
napi_alloc_frag_align() and netdev_alloc_frag_align() accept
align as an argument; they are thin inline wrappers around the
__napi_alloc_frag_align() and __netdev_alloc_frag_align() APIs,
doing the alignment checking and the align-mask conversion needed
to call page_frag_alloc_align() directly. The intention is to keep
the alignment checking and the align-mask conversion in the inline
wrapper, so that those operations can usually be handled at compile
time instead of at execution time.
We are going to use page_frag_alloc_align() in vhost_net.c, which
needs the same kind of alignment checking and align-mask conversion.
So split page_frag_alloc_align() into an inline wrapper doing the
above operations, and add __page_frag_alloc_align(), which is passed
the align mask the original function expected, as suggested by
Alexander.
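For illustration, a hypothetical caller (not part of this patch) would
look roughly like the sketch below; with a constant, power-of-2 align,
the WARN_ON_ONCE() check and the align-to-mask conversion are resolved
at compile time:

/* Hypothetical caller: 'align' is SMP_CACHE_BYTES, a compile-time
 * power of 2, so -align == ~(SMP_CACHE_BYTES - 1) is the align mask
 * that the inline wrapper passes on to __page_frag_alloc_align().
 */
static void *frag_alloc_cacheline_aligned(struct page_frag_cache *nc,
					  unsigned int fragsz)
{
	return page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, SMP_CACHE_BYTES);
}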
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
---
include/linux/gfp.h | 15 +++++++++++----
mm/page_alloc.c | 8 ++++----
net/core/skbuff.c | 6 +++---
3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..28aea17fa59b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -312,14 +312,21 @@ extern void free_pages(unsigned long addr, unsigned int order);
struct page_frag_cache;
extern void __page_frag_cache_drain(struct page *page, unsigned int count);
-extern void *page_frag_alloc_align(struct page_frag_cache *nc,
- unsigned int fragsz, gfp_t gfp_mask,
- unsigned int align_mask);
+void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
+ gfp_t gfp_mask, unsigned int align_mask);
+
+static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
+ unsigned int fragsz, gfp_t gfp_mask,
+ unsigned int align)
+{
+ WARN_ON_ONCE(!is_power_of_2(align));
+ return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
+}
static inline void *page_frag_alloc(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask)
{
- return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
+ return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
}
extern void page_frag_free(void *addr);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f23b010..c0f7e67c4250 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4708,9 +4708,9 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
}
EXPORT_SYMBOL(__page_frag_cache_drain);
-void *page_frag_alloc_align(struct page_frag_cache *nc,
- unsigned int fragsz, gfp_t gfp_mask,
- unsigned int align_mask)
+void *__page_frag_alloc_align(struct page_frag_cache *nc,
+ unsigned int fragsz, gfp_t gfp_mask,
+ unsigned int align_mask)
{
unsigned int size = PAGE_SIZE;
struct page *page;
@@ -4779,7 +4779,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
return nc->va + offset;
}
-EXPORT_SYMBOL(page_frag_alloc_align);
+EXPORT_SYMBOL(__page_frag_alloc_align);
/*
* Frees a page fragment allocated out of either a compound or order 0 page.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index edbbef563d4d..bc8f3858bc1c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -297,7 +297,7 @@ void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
fragsz = SKB_DATA_ALIGN(fragsz);
- return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask);
+ return __page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask);
}
EXPORT_SYMBOL(__napi_alloc_frag_align);
@@ -309,13 +309,13 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
if (in_hardirq() || irqs_disabled()) {
struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache);
- data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask);
+ data = __page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask);
} else {
struct napi_alloc_cache *nc;
local_bh_disable();
nc = this_cpu_ptr(&napi_alloc_cache);
- data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask);
+ data = __page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask);
local_bh_enable();
}
return data;
--
2.33.0
* [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation
[not found] <20240130113710.34511-1-linyunsheng@huawei.com>
2024-01-30 11:37 ` [PATCH net-next v4 1/5] mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument Yunsheng Lin
@ 2024-01-30 11:37 ` Yunsheng Lin
2024-02-01 13:16 ` Paolo Abeni
2024-01-30 11:37 ` [PATCH net-next v4 3/5] net: introduce page_frag_cache_drain() Yunsheng Lin
2 siblings, 1 reply; 7+ messages in thread
From: Yunsheng Lin @ 2024-01-30 11:37 UTC (permalink / raw)
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Alexander Duyck,
Alexander Duyck, Michael S. Tsirkin, Jason Wang, Andrew Morton,
Eric Dumazet, kvm, virtualization, linux-mm
Currently there are three page frag implementations which all try
to allocate an order-3 page and, if that fails, fall back to
allocating an order-0 page; each of them allows the order-3 page
allocation to fail under certain conditions by using specific gfp
bits.
The gfp bits used for the order-3 page allocation differ between
the implementations: __GFP_NOMEMALLOC is OR'd in to forbid access
to the emergency memory reserves in __page_frag_cache_refill(),
but not in the other implementations, and __GFP_DIRECT_RECLAIM is
masked off to avoid direct reclaim in skb_page_frag_refill(), but
not in __page_frag_cache_refill().
This patch unifies the gfp bits used across the implementations
by OR'ing in __GFP_NOMEMALLOC and masking off __GFP_DIRECT_RECLAIM
for the order-3 page allocation, to avoid putting possible
pressure on mm.
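For reference, the unified handling amounts to the following (a sketch
using a hypothetical helper name; the actual patch open-codes it at
each call site):

/* gfp bits now shared by all three implementations for the order-3
 * attempt: kswapd may still be woken, but direct reclaim, retries,
 * allocation warnings and the emergency reserves are all avoided.
 * The order-0 fallback keeps the caller's original gfp flags.
 */
static inline gfp_t frag_high_order_gfp(gfp_t gfp)
{
	return (gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | __GFP_NOWARN |
	       __GFP_NORETRY | __GFP_NOMEMALLOC;
}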
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
---
drivers/vhost/net.c | 2 +-
mm/page_alloc.c | 4 ++--
net/core/sock.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f2ed7167c848..e574e21cc0ca 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
/* Avoid direct reclaim but allow kswapd to wake */
pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
__GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY,
+ __GFP_NORETRY | __GFP_NOMEMALLOC,
SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0f7e67c4250..636145c29f70 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4685,8 +4685,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
gfp_t gfp = gfp_mask;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
- gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
- __GFP_NOMEMALLOC;
+ gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
+ __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
PAGE_FRAG_CACHE_MAX_ORDER);
nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
diff --git a/net/core/sock.c b/net/core/sock.c
index 88bf810394a5..8289a3d8c375 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2919,7 +2919,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
/* Avoid direct reclaim but allow kswapd to wake */
pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
__GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY,
+ __GFP_NORETRY | __GFP_NOMEMALLOC,
SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
--
2.33.0
* [PATCH net-next v4 3/5] net: introduce page_frag_cache_drain()
[not found] <20240130113710.34511-1-linyunsheng@huawei.com>
2024-01-30 11:37 ` [PATCH net-next v4 1/5] mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument Yunsheng Lin
2024-01-30 11:37 ` [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation Yunsheng Lin
@ 2024-01-30 11:37 ` Yunsheng Lin
2 siblings, 0 replies; 7+ messages in thread
From: Yunsheng Lin @ 2024-01-30 11:37 UTC (permalink / raw)
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Jason Wang, Alexander Duyck,
Jeroen de Borst, Praveen Kaligineedi, Shailend Chand,
Eric Dumazet, Felix Fietkau, Sean Wang, Mark Lee,
Lorenzo Bianconi, Matthias Brugger, AngeloGioacchino Del Regno,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, Andrew Morton, linux-arm-kernel,
linux-mediatek, linux-nvme, linux-mm
When draining a page_frag_cache, most users follow similar steps,
so introduce an API to avoid code duplication.
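The conversion at each call site follows this pattern (a hypothetical
user is shown here; the real conversions are in the diff below):

/* Before: open-coded drain of a page_frag_cache embedded in a queue. */
if (q->cache.va) {
	__page_frag_cache_drain(virt_to_head_page(q->cache.va),
				q->cache.pagecnt_bias);
	q->cache.va = NULL;
}

/* After: a single call; the NULL-va check moves into the helper. */
page_frag_cache_drain(&q->cache);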
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---
drivers/net/ethernet/google/gve/gve_main.c | 11 ++---------
drivers/net/ethernet/mediatek/mtk_wed_wo.c | 17 ++---------------
drivers/nvme/host/tcp.c | 7 +------
drivers/nvme/target/tcp.c | 4 +---
include/linux/gfp.h | 1 +
mm/page_alloc.c | 10 ++++++++++
6 files changed, 17 insertions(+), 33 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index db6d9ae7cd78..dec6458bb8d7 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1276,17 +1276,10 @@ static void gve_unreg_xdp_info(struct gve_priv *priv)
static void gve_drain_page_cache(struct gve_priv *priv)
{
- struct page_frag_cache *nc;
int i;
- for (i = 0; i < priv->rx_cfg.num_queues; i++) {
- nc = &priv->rx[i].page_cache;
- if (nc->va) {
- __page_frag_cache_drain(virt_to_page(nc->va),
- nc->pagecnt_bias);
- nc->va = NULL;
- }
- }
+ for (i = 0; i < priv->rx_cfg.num_queues; i++)
+ page_frag_cache_drain(&priv->rx[i].page_cache);
}
static void gve_qpls_get_curr_alloc_cfg(struct gve_priv *priv,
diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
index d58b07e7e123..7063c78bd35f 100644
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -286,7 +286,6 @@ mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
static void
mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
{
- struct page *page;
int i;
for (i = 0; i < q->n_desc; i++) {
@@ -301,19 +300,12 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
entry->buf = NULL;
}
- if (!q->cache.va)
- return;
-
- page = virt_to_page(q->cache.va);
- __page_frag_cache_drain(page, q->cache.pagecnt_bias);
- memset(&q->cache, 0, sizeof(q->cache));
+ page_frag_cache_drain(&q->cache);
}
static void
mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
{
- struct page *page;
-
for (;;) {
void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true);
@@ -323,12 +315,7 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
skb_free_frag(buf);
}
- if (!q->cache.va)
- return;
-
- page = virt_to_page(q->cache.va);
- __page_frag_cache_drain(page, q->cache.pagecnt_bias);
- memset(&q->cache, 0, sizeof(q->cache));
+ page_frag_cache_drain(&q->cache);
}
static void
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index d058d990532b..22e1fb9c9c0f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1344,7 +1344,6 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl)
static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
{
- struct page *page;
struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
struct nvme_tcp_queue *queue = &ctrl->queues[qid];
unsigned int noreclaim_flag;
@@ -1355,11 +1354,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
if (queue->hdr_digest || queue->data_digest)
nvme_tcp_free_crypto(queue);
- if (queue->pf_cache.va) {
- page = virt_to_head_page(queue->pf_cache.va);
- __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
- queue->pf_cache.va = NULL;
- }
+ page_frag_cache_drain(&queue->pf_cache);
noreclaim_flag = memalloc_noreclaim_save();
/* ->sock will be released by fput() */
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 6a1e6bb80062..56224dc59f17 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1591,7 +1591,6 @@ static void nvmet_tcp_free_cmd_data_in_buffers(struct nvmet_tcp_queue *queue)
static void nvmet_tcp_release_queue_work(struct work_struct *w)
{
- struct page *page;
struct nvmet_tcp_queue *queue =
container_of(w, struct nvmet_tcp_queue, release_work);
@@ -1615,8 +1614,7 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
if (queue->hdr_digest || queue->data_digest)
nvmet_tcp_free_crypto(queue);
ida_free(&nvmet_tcp_queue_ida, queue->idx);
- page = virt_to_head_page(queue->pf_cache.va);
- __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
+ page_frag_cache_drain(&queue->pf_cache);
kfree(queue);
}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 28aea17fa59b..6cef1c241180 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -311,6 +311,7 @@ extern void __free_pages(struct page *page, unsigned int order);
extern void free_pages(unsigned long addr, unsigned int order);
struct page_frag_cache;
+void page_frag_cache_drain(struct page_frag_cache *nc);
extern void __page_frag_cache_drain(struct page *page, unsigned int count);
void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
gfp_t gfp_mask, unsigned int align_mask);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 636145c29f70..06aa1ebbd21c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4699,6 +4699,16 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
return page;
}
+void page_frag_cache_drain(struct page_frag_cache *nc)
+{
+ if (!nc->va)
+ return;
+
+ __page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias);
+ nc->va = NULL;
+}
+EXPORT_SYMBOL(page_frag_cache_drain);
+
void __page_frag_cache_drain(struct page *page, unsigned int count)
{
VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
--
2.33.0
* Re: [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation
2024-01-30 11:37 ` [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation Yunsheng Lin
@ 2024-02-01 13:16 ` Paolo Abeni
2024-02-02 2:10 ` Yunsheng Lin
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2024-02-01 13:16 UTC (permalink / raw)
To: Yunsheng Lin, davem, kuba
Cc: netdev, linux-kernel, Alexander Duyck, Alexander Duyck,
Michael S. Tsirkin, Jason Wang, Andrew Morton, Eric Dumazet, kvm,
virtualization, linux-mm
On Tue, 2024-01-30 at 19:37 +0800, Yunsheng Lin wrote:
> Currently there are three page frag implementations which all try to
> allocate an order-3 page and, if that fails, fall back to allocating
> an order-0 page; each of them allows the order-3 page allocation to
> fail under certain conditions by using specific gfp bits.
>
> The gfp bits used for the order-3 page allocation differ between the
> implementations: __GFP_NOMEMALLOC is OR'd in to forbid access to the
> emergency memory reserves in __page_frag_cache_refill(), but not in
> the other implementations, and __GFP_DIRECT_RECLAIM is masked off to
> avoid direct reclaim in skb_page_frag_refill(), but not in
> __page_frag_cache_refill().
>
> This patch unifies the gfp bits used across the implementations by
> OR'ing in __GFP_NOMEMALLOC and masking off __GFP_DIRECT_RECLAIM for
> the order-3 page allocation, to avoid putting possible pressure on mm.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> ---
> drivers/vhost/net.c | 2 +-
> mm/page_alloc.c | 4 ++--
> net/core/sock.c | 2 +-
> 3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index f2ed7167c848..e574e21cc0ca 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
> /* Avoid direct reclaim but allow kswapd to wake */
> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> __GFP_COMP | __GFP_NOWARN |
> - __GFP_NORETRY,
> + __GFP_NORETRY | __GFP_NOMEMALLOC,
> SKB_FRAG_PAGE_ORDER);
> if (likely(pfrag->page)) {
> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c0f7e67c4250..636145c29f70 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4685,8 +4685,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> gfp_t gfp = gfp_mask;
>
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
> - __GFP_NOMEMALLOC;
> + gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
> + __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
> PAGE_FRAG_CACHE_MAX_ORDER);
> nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 88bf810394a5..8289a3d8c375 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2919,7 +2919,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> /* Avoid direct reclaim but allow kswapd to wake */
> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> __GFP_COMP | __GFP_NOWARN |
> - __GFP_NORETRY,
> + __GFP_NORETRY | __GFP_NOMEMALLOC,
> SKB_FRAG_PAGE_ORDER);
This will prevent memory reserve usage when allocating order 3 pages,
but not when allocating a single page as a fallback. Still different
from the __page_frag_cache_refill() allocator - which never accesses
the memory reserves.
I'm unsure we want to propagate the __page_frag_cache_refill()
behavior here; the current behavior could be required by some systems.
It looks like this series still leaves the skb_page_frag_refill()
allocator alone; what about dropping this chunk, too?
Thanks!
Paolo
* Re: [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation
2024-02-01 13:16 ` Paolo Abeni
@ 2024-02-02 2:10 ` Yunsheng Lin
2024-02-02 8:36 ` Paolo Abeni
0 siblings, 1 reply; 7+ messages in thread
From: Yunsheng Lin @ 2024-02-02 2:10 UTC (permalink / raw)
To: Paolo Abeni, davem, kuba
Cc: netdev, linux-kernel, Alexander Duyck, Alexander Duyck,
Michael S. Tsirkin, Jason Wang, Andrew Morton, Eric Dumazet, kvm,
virtualization, linux-mm
On 2024/2/1 21:16, Paolo Abeni wrote:
> On Tue, 2024-01-30 at 19:37 +0800, Yunsheng Lin wrote:
>> Currently there are three page frag implementations which all try to
>> allocate an order-3 page and, if that fails, fall back to allocating
>> an order-0 page; each of them allows the order-3 page allocation to
>> fail under certain conditions by using specific gfp bits.
>>
>> The gfp bits used for the order-3 page allocation differ between the
>> implementations: __GFP_NOMEMALLOC is OR'd in to forbid access to the
>> emergency memory reserves in __page_frag_cache_refill(), but not in
>> the other implementations, and __GFP_DIRECT_RECLAIM is masked off to
>> avoid direct reclaim in skb_page_frag_refill(), but not in
>> __page_frag_cache_refill().
>>
>> This patch unifies the gfp bits used across the implementations by
>> OR'ing in __GFP_NOMEMALLOC and masking off __GFP_DIRECT_RECLAIM for
>> the order-3 page allocation, to avoid putting possible pressure on mm.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>> ---
>> drivers/vhost/net.c | 2 +-
>> mm/page_alloc.c | 4 ++--
>> net/core/sock.c | 2 +-
>> 3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index f2ed7167c848..e574e21cc0ca 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
>> /* Avoid direct reclaim but allow kswapd to wake */
>> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>> __GFP_COMP | __GFP_NOWARN |
>> - __GFP_NORETRY,
>> + __GFP_NORETRY | __GFP_NOMEMALLOC,
>> SKB_FRAG_PAGE_ORDER);
>
>> if (likely(pfrag->page)) {
>> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c0f7e67c4250..636145c29f70 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4685,8 +4685,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>> gfp_t gfp = gfp_mask;
>>
>> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
>> - __GFP_NOMEMALLOC;
>> + gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>> + __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>> page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>> PAGE_FRAG_CACHE_MAX_ORDER);
>> nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 88bf810394a5..8289a3d8c375 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2919,7 +2919,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>> /* Avoid direct reclaim but allow kswapd to wake */
>> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>> __GFP_COMP | __GFP_NOWARN |
>> - __GFP_NORETRY,
>> + __GFP_NORETRY | __GFP_NOMEMALLOC,
>> SKB_FRAG_PAGE_ORDER);
>
> This will prevent memory reserve usage when allocating order 3 pages,
> but not when allocating a single page as a fallback. Still different
More accurately, the above ensures the memory reserve is never used
for order-3 pages; whether the memory reserve is used for order-0
pages depends on the original 'gfp' flags: if 'gfp' does not have the
__GFP_NOMEMALLOC bit set, the memory reserve may still be used for
order-0 pages.
> from the __page_frag_cache_refill() allocator - which never accesses
> the memory reserves.
I am not really sure I understand the above comment.
My understanding is that the semantics are the same as in
skb_page_frag_refill(), as explained above. Note that
__page_frag_cache_refill() uses 'gfp_mask' for allocating order-3
pages and uses the original 'gfp' for allocating order-0 pages.
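In other words, the refill path being discussed is roughly the
following (a simplified sketch, not the exact kernel code):

/* Simplified sketch of __page_frag_cache_refill() after this patch:
 * the modified mask (no direct reclaim, no memory reserves) is only
 * used for the order-3 attempt, while the order-0 fallback keeps the
 * caller's original 'gfp' and so may still dip into the reserves if
 * the caller did not set __GFP_NOMEMALLOC.
 */
static struct page *refill_sketch(gfp_t gfp)
{
	gfp_t gfp_order3 = (gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
			   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
	struct page *page;

	page = alloc_pages_node(NUMA_NO_NODE, gfp_order3,
				PAGE_FRAG_CACHE_MAX_ORDER);
	if (!page)	/* order-0 fallback with the original flags */
		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);

	return page;
}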
>
> I'm unsure we want to propagate the __page_frag_cache_refill()
> behavior here; the current behavior could be required by some systems.
>
> It looks like this series still leaves the skb_page_frag_refill()
> allocator alone; what about dropping this chunk, too?
As explained above, I would prefer to keep it as it is, since it
seems quite obvious that we can avoid possible pressure on mm by not
using the memory reserve for order-3 pages, as we have the fallback
to order-0 pages.
Please let me know if there is anything obvious I missed.
>
> Thanks!
>
> Paolo
>
>
> .
>
* Re: [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation
2024-02-02 2:10 ` Yunsheng Lin
@ 2024-02-02 8:36 ` Paolo Abeni
2024-02-02 12:26 ` Yunsheng Lin
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2024-02-02 8:36 UTC (permalink / raw)
To: Yunsheng Lin, davem, kuba
Cc: netdev, linux-kernel, Alexander Duyck, Alexander Duyck,
Michael S. Tsirkin, Jason Wang, Andrew Morton, Eric Dumazet, kvm,
virtualization, linux-mm
On Fri, 2024-02-02 at 10:10 +0800, Yunsheng Lin wrote:
> On 2024/2/1 21:16, Paolo Abeni wrote:
>
> > from the __page_frag_cache_refill() allocator - which never accesses
> > the memory reserves.
>
> I am not really sure I understand the above comment.
> My understanding is that the semantics are the same as in
> skb_page_frag_refill(), as explained above. Note that
> __page_frag_cache_refill() uses 'gfp_mask' for allocating order-3
> pages and uses the original 'gfp' for allocating order-0 pages.
You are right! I got fooled misreading 'gfp' as 'gfp_mask' in there.
> > I'm unsure we want to propagate the __page_frag_cache_refill()
> > behavior here; the current behavior could be required by some systems.
> >
> > It looks like this series still leaves the skb_page_frag_refill()
> > allocator alone; what about dropping this chunk, too?
>
> As explained above, I would prefer to keep it as it is, since it
> seems quite obvious that we can avoid possible pressure on mm by not
> using the memory reserve for order-3 pages, as we have the fallback
> to order-0 pages.
>
> Please let me know if there is anything obvious I missed.
>
I still think/fear that behaviour changes here could have
subtle/negative side effects, even if I agree the change looks safe.
I think the series without this patch would still achieve its goals
and would be much less controversial. What about moving this patch
to a standalone follow-up?
Thanks!
Paolo
* Re: [PATCH net-next v4 2/5] page_frag: unify gfp bits for order 3 page allocation
2024-02-02 8:36 ` Paolo Abeni
@ 2024-02-02 12:26 ` Yunsheng Lin
0 siblings, 0 replies; 7+ messages in thread
From: Yunsheng Lin @ 2024-02-02 12:26 UTC (permalink / raw)
To: Paolo Abeni, davem, kuba
Cc: netdev, linux-kernel, Alexander Duyck, Alexander Duyck,
Michael S. Tsirkin, Jason Wang, Andrew Morton, Eric Dumazet, kvm,
virtualization, linux-mm
On 2024/2/2 16:36, Paolo Abeni wrote:
> On Fri, 2024-02-02 at 10:10 +0800, Yunsheng Lin wrote:
>> On 2024/2/1 21:16, Paolo Abeni wrote:
>>
>>> from the __page_frag_cache_refill() allocator - which never accesses
>>> the memory reserves.
>>
>> I am not really sure I understand the above comment.
>> My understanding is that the semantics are the same as in
>> skb_page_frag_refill(), as explained above. Note that
>> __page_frag_cache_refill() uses 'gfp_mask' for allocating order-3
>> pages and uses the original 'gfp' for allocating order-0 pages.
>
> You are right! I got fooled misreading 'gfp' as 'gfp_mask' in there.
>
>>> I'm unsure we want to propagate the __page_frag_cache_refill()
>>> behavior here; the current behavior could be required by some systems.
>>>
>>> It looks like this series still leaves the skb_page_frag_refill()
>>> allocator alone; what about dropping this chunk, too?
>>
>> As explained above, I would prefer to keep it as it is, since it
>> seems quite obvious that we can avoid possible pressure on mm by not
>> using the memory reserve for order-3 pages, as we have the fallback
>> to order-0 pages.
>>
>> Please let me know if there is anything obvious I missed.
>>
>
> I still think/fear that behaviour changes here could have
> subtle/negative side effects, even if I agree the change looks safe.
>
> I think the series without this patch would still achieve its goals
> and would be much less controversial. What about moving this patch
> to a standalone follow-up?
Fair enough, will remove that for now.
>
> Thanks!
>
> Paolo
>
> .
>