linux-mm.kvack.org archive mirror
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org
Subject: Re: [RFC v11 03/14] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Date: Sun, 21 Jul 2024 11:34:15 -0700
Message-ID: <CAKgT0UfMBo2K7c1UZgJOJt23hO+44Er7JwabrGT6ymGjLps+Gg@mail.gmail.com>
In-Reply-To: <20240719093338.55117-4-linyunsheng@huawei.com>

On Fri, Jul 19, 2024 at 2:37 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> We are about to use page_frag_alloc_*() API to not just
> allocate memory for skb->data, but also use them to do
> the memory allocation for skb frag too. Currently the
> implementation of page_frag in mm subsystem is running
> the offset as a countdown rather than count-up value,
> there may have several advantages to that as mentioned
> in [1], but it may have some disadvantages, for example,
> it may disable skb frag coaleasing and more correct cache
> prefetching

You misspelled "coalescing".

> We have a trade-off to make in order to have a unified
> implementation and API for page_frag, so use a initial zero
> offset in this patch, and the following patch will try to
> make some optimization to avoid the disadvantages as much
> as possible.
>
> Rename 'offset' to 'remaining' to retain the 'countdown'
> behavior as 'remaining countdown' instead of 'offset
> countdown'. Also, Renaming enable us to do a single
> 'fragsz > remaining' checking for the case of cache not
> being enough, which should be the fast path if we ensure
> 'remaining' is zero when 'va' == NULL by memset'ing
> 'struct page_frag_cache' in page_frag_cache_init() and
> page_frag_cache_drain().
>
> 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/
>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/linux/mm_types_task.h |  4 +-
>  mm/page_frag_cache.c          | 71 +++++++++++++++++++++--------------
>  2 files changed, 44 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index cdc1e3696439..b1c54b2b9308 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -52,10 +52,10 @@ struct page_frag {
>  struct page_frag_cache {
>         void *va;
>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -       __u16 offset;
> +       __u16 remaining;
>         __u16 size;
>  #else
> -       __u32 offset;
> +       __u32 remaining;
>  #endif
>         /* we maintain a pagecount bias, so that we dont dirty cache line
>          * containing page->_refcount every time we allocate a fragment.
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index 609a485cd02a..2958fe006fe7 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -22,6 +22,7 @@
>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>                                              gfp_t gfp_mask)
>  {
> +       unsigned int page_size = PAGE_FRAG_CACHE_MAX_SIZE;
>         struct page *page = NULL;
>         gfp_t gfp = gfp_mask;
>
> @@ -30,12 +31,21 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>                    __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>         page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>                                 PAGE_FRAG_CACHE_MAX_ORDER);
> -       nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
>  #endif
> -       if (unlikely(!page))
> +       if (unlikely(!page)) {
>                 page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> +               if (unlikely(!page)) {
> +                       nc->va = NULL;
> +                       return NULL;
> +               }
>
> -       nc->va = page ? page_address(page) : NULL;
> +               page_size = PAGE_SIZE;
> +       }
> +
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +       nc->size = page_size;
> +#endif
> +       nc->va = page_address(page);
>
>         return page;
>  }

Not a huge fan of the changes here. If we are changing the direction
then just do that. I don't see the point of these changes. As far as I
can tell they just add noise to the diff and have no effect on the
final code: the outcome is mostly the same, except that you no longer
update size in the event that you set nc->va to NULL.

> @@ -64,8 +74,8 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>                               unsigned int align_mask)
>  {
>         unsigned int size = PAGE_SIZE;
> +       unsigned int remaining;
>         struct page *page;
> -       int offset;
>
>         if (unlikely(!nc->va)) {
>  refill:
> @@ -82,35 +92,20 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>                  */
>                 page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>
> -               /* reset page count bias and offset to start of new frag */
> +               /* reset page count bias and remaining to start of new frag */
>                 nc->pfmemalloc = page_is_pfmemalloc(page);
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -               nc->offset = size;
> +               nc->remaining = size;
>         }
>
> -       offset = nc->offset - fragsz;
> -       if (unlikely(offset < 0)) {
> -               page = virt_to_page(nc->va);
> -
> -               if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> -                       goto refill;
> -
> -               if (unlikely(nc->pfmemalloc)) {
> -                       free_unref_page(page, compound_order(page));
> -                       goto refill;
> -               }
> -
>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -               /* if size can vary use size else just use PAGE_SIZE */
> -               size = nc->size;
> +       /* if size can vary use size else just use PAGE_SIZE */
> +       size = nc->size;
>  #endif

Rather than pulling this out and placing it here, it might make more
sense at the start of the function. Basically just set size to
either PAGE_SIZE or nc->size right at the start. Then if we have to
reallocate we overwrite it. That way we can avoid some redundancy and
this will be easier to read.
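
To make the suggestion concrete, here is a rough userspace sketch of the
shape being described. It is not the kernel code: struct model_cache,
model_refill(), model_alloc(), the MODEL_* constants and the 'big' flag
are all hypothetical stand-ins, and page recycling, refcounting and
alignment are left out on purpose. The only point is that 'size' is
chosen once at the top and overwritten only when a new buffer is set up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_SIZE  32768u

struct model_cache {
	char *va;               /* NULL until the first refill */
	unsigned int remaining; /* bytes still free in the buffer */
	unsigned int size;      /* size of the current backing buffer */
};

static char model_buf[MODEL_MAX_SIZE];

/* Refill stub: 'big' models whether the high-order allocation succeeded. */
static void model_refill(struct model_cache *nc, int big)
{
	nc->va = model_buf;
	nc->size = big ? MODEL_MAX_SIZE : MODEL_PAGE_SIZE;
}

static void *model_alloc(struct model_cache *nc, unsigned int fragsz, int big)
{
	/* Decide 'size' once, right at the start of the function. */
#if (MODEL_PAGE_SIZE < MODEL_MAX_SIZE)
	unsigned int size = nc->size;
#else
	unsigned int size = MODEL_PAGE_SIZE;
#endif
	unsigned int offset;

	if (!nc->va) {
		model_refill(nc, big);
		/* Overwrite 'size' only because a new buffer was just set up. */
		size = nc->size;
		nc->remaining = size;
	}

	assert(nc->remaining <= size);

	if (fragsz > nc->remaining)
		return NULL; /* this model does not recycle the buffer */

	/* Countdown of 'remaining' yields count-up offsets. */
	offset = size - nc->remaining;
	nc->remaining -= fragsz;

	return nc->va + offset;
}
```

With a zero-initialized cache, the first call refills and returns the
start of the buffer, and later calls reuse the 'size' read at the top
without a second, separate load further down.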

> -               /* OK, page count is 0, we can safely set it */
> -               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>
> -               /* reset page count bias and offset to start of new frag */
> -               nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -               offset = size - fragsz;
> -               if (unlikely(offset < 0)) {
> +       remaining = nc->remaining & align_mask;
> +       if (unlikely(remaining < fragsz)) {
> +               if (unlikely(fragsz > PAGE_SIZE)) {
>                         /*
>                          * The caller is trying to allocate a fragment
>                          * with fragsz > PAGE_SIZE but the cache isn't big
> @@ -122,13 +117,31 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>                          */
>                         return NULL;
>                 }
> +
> +               page = virt_to_page(nc->va);
> +
> +               if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> +                       goto refill;
> +
> +               if (unlikely(nc->pfmemalloc)) {
> +                       free_unref_page(page, compound_order(page));
> +                       goto refill;
> +               }
> +
> +               /* OK, page count is 0, we can safely set it */
> +               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
> +
> +               /* reset page count bias and remaining to start of new frag */
> +               nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> +               nc->remaining = size;

Why are you setting nc->remaining here? You set it a few lines below.
This is redundant.
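
As an illustration of why the extra store is dead, here is a small
userspace model of the tail of the function. Everything in it is a
hypothetical stand-in (struct model_cache, model_alloc(), MODEL_SIZE),
and the recycle path is reduced to resetting the local variable, with
no refcount handling. Under those assumptions, the single store at the
end covers both the fast path and the recycle path.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SIZE 4096u /* hypothetical fixed buffer size */

struct model_cache {
	char *va;               /* NULL until first use */
	unsigned int remaining; /* bytes still free, counts down */
};

static char model_buf[MODEL_SIZE];

/* Hands out fragsz bytes at count-up offsets while tracking a countdown. */
static void *model_alloc(struct model_cache *nc, unsigned int fragsz,
			 unsigned int align_mask)
{
	unsigned int size = MODEL_SIZE;
	unsigned int remaining;

	if (!nc->va) {
		nc->va = model_buf;
		nc->remaining = size;
	}

	/* Rounding 'remaining' down rounds the resulting offset up. */
	remaining = nc->remaining & align_mask;
	if (remaining < fragsz) {
		if (fragsz > size)
			return NULL;

		/*
		 * Model of recycling the buffer: only the local needs to be
		 * reset here, since nc->remaining is written just below.
		 */
		remaining = size;
	}

	assert(remaining >= fragsz);

	/* The one store that both paths rely on. */
	nc->remaining = remaining - fragsz;

	return nc->va + (size - remaining);
}
```

For example, with align_mask = ~(64u - 1), two 100-byte requests land at
offsets 0 and 128, and a request that does not fit restarts at offset 0
with nc->remaining already correct after the final store.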

> +
> +               remaining = size;
>         }
>
>         nc->pagecnt_bias--;
> -       offset &= align_mask;
> -       nc->offset = offset;
> +       nc->remaining = remaining - fragsz;
>
> -       return nc->va + offset;
> +       return nc->va + (size - remaining);
>  }
>  EXPORT_SYMBOL(__page_frag_alloc_align);

