From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander H Duyck <alexander.duyck@gmail.com>,
<davem@davemloft.net>, <kuba@kernel.org>, <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH net-next v2 05/15] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Date: Tue, 16 Apr 2024 21:11:02 +0800
Message-ID: <6a78b9ad-0d20-a495-52ca-fac180408658@huawei.com>
In-Reply-To: <b03bca93fba5a1c1a1bef3db89df11fbc755670b.camel@gmail.com>
On 2024/4/16 7:55, Alexander H Duyck wrote:
> On Mon, 2024-04-15 at 21:19 +0800, Yunsheng Lin wrote:
>> We are about to use the page_frag_alloc_*() API not just to
>> allocate memory for skb->data, but also to do the memory
>> allocation for skb frags. Currently the implementation of
>> page_frag in the mm subsystem runs the offset as a countdown
>> rather than a count-up value. There may be several advantages
>> to that, as mentioned in [1], but it also has some
>> disadvantages: for example, it may prevent skb frag coalescing
>> and more correct cache prefetching.
>>
>> We have a trade-off to make in order to have a unified
>> implementation and API for page_frag, so use an initial zero
>> offset in this patch; the following patch will try to make
>> some optimizations to avoid the disadvantages as much as
>> possible.
>>
>> 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/
>>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>> mm/page_frag_cache.c | 31 ++++++++++++++-----------------
>> 1 file changed, 14 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>> index 64993b5d1243..dc864ee09536 100644
>> --- a/mm/page_frag_cache.c
>> +++ b/mm/page_frag_cache.c
>> @@ -65,9 +65,8 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>> unsigned int fragsz, gfp_t gfp_mask,
>> unsigned int align_mask)
>> {
>> - unsigned int size = PAGE_SIZE;
>> + unsigned int size, offset;
>> struct page *page;
>> - int offset;
>>
>> if (unlikely(!nc->va)) {
>> refill:
>> @@ -75,10 +74,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>> if (!page)
>> return NULL;
>>
>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> - /* if size can vary use size else just use PAGE_SIZE */
>> - size = nc->size;
>> -#endif
>> /* Even if we own the page, we do not use atomic_set().
>> * This would break get_page_unless_zero() users.
>> */
>> @@ -87,11 +82,18 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>> /* reset page count bias and offset to start of new frag */
>> nc->pfmemalloc = page_is_pfmemalloc(page);
>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>> - nc->offset = size;
>> + nc->offset = 0;
>> }
>>
>> - offset = nc->offset - fragsz;
>> - if (unlikely(offset < 0)) {
>> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> + /* if size can vary use size else just use PAGE_SIZE */
>> + size = nc->size;
>> +#else
>> + size = PAGE_SIZE;
>> +#endif
>> +
>> + offset = ALIGN(nc->offset, -align_mask);
>
> I am not sure if using -align_mask here with the ALIGN macro is really
> to your benefit. I would be curious what the compiler is generating.
>
> Again, I think you would be much better off with:
> offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
>
> That will save you a number of conversions as the use of the ALIGN
> macro gives you:
> offset = (nc->offset + (-align_mask - 1)) & ~(-align_mask - 1);
>
> whereas what I am suggesting gives you:
> offset = (nc->offset + ~align_mask) & ~(~align_mask);
>
> My main concern is that I am not sure the compiler will optimize around
> the combination of bit operations and arithmetic operations. It seems
> much cleaner to me to stick to the bitwise operations for the alignment
> than to force this into the vhost approach which requires a power of 2
> aligned mask.

My argument about the above is in [1]. But since you seem not to have
worked through the next patch yet, I might just do it as you suggested in
the next version so that I don't have to repeat my argument again :(

1. https://lore.kernel.org/all/df826acf-8867-7eb6-e7f0-962c106bc28b@huawei.com/
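
For reference, the two forms discussed above can be compared in a small
userspace sketch. This is not kernel code: the macros are local copies of
the kernel's alignment helpers, the function names are made up for
illustration, and it assumes align_mask always has the "-align" form for a
power-of-2 align, as it does for the callers I am aware of. Under those
assumptions both forms should give the same offset, so the open question is
only what the compiler generates for each.

```c
#include <assert.h>

/* Userspace copies of the kernel alignment helpers (GNU C __typeof__). */
#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))
#define __ALIGN_KERNEL(x, a)		__ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
#define ALIGN(x, a)			__ALIGN_KERNEL((x), (a))

/* Form used in the patch: turn the mask back into a power-of-2 size. */
static inline unsigned int align_with_neg_mask(unsigned int offset,
					       unsigned int align_mask)
{
	return ALIGN(offset, -align_mask);
}

/* Suggested form: stay with bitwise operations on the mask. */
static inline unsigned int align_with_inverted_mask(unsigned int offset,
						    unsigned int align_mask)
{
	return __ALIGN_KERNEL_MASK(offset, ~align_mask);
}

/* Hypothetical helper: compare both forms for alignments 1..4096. */
static inline void check_forms_agree(void)
{
	unsigned int align, offset;

	for (align = 1; align <= 4096; align <<= 1) {
		unsigned int align_mask = ~(align - 1);	/* same value as -align */

		for (offset = 0; offset <= 8192; offset++)
			assert(align_with_neg_mask(offset, align_mask) ==
			       align_with_inverted_mask(offset, align_mask));
	}
}
```

Calling check_forms_agree() from a test program exercises every offset up
to 8192 for each alignment; it says nothing about the generated code, which
would still need to be checked in the compiler output.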
>
> Also the old code was aligning on the combination of offset AND fragsz.
> This new logic is aligning on offset only. Do we run the risk of
> overwriting blocks of neighbouring fragments if two users of
> napi_alloc_frag_align end up passing arguments that have different
> alignment values?

I am not sure I understand the question here.
As I understand it, both the old code and the new code align on the
offset, and both may have space reserved before the offset due to the
alignment. The memory returned to the caller is in the range
[offset, offset + fragsz). Am I missing something obvious here?

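
To make the range argument concrete, here is a simplified userspace model
of the two schemes. It is only a sketch: struct sim_cache, struct frag,
CACHE_SIZE and the function names are hypothetical, the refill path is
reduced to returning -1, and the count-up alignment is written in the
bitwise form (which should be equivalent to ALIGN(nc->offset, -align_mask)
for power-of-2 alignments).

```c
#include <assert.h>

#define CACHE_SIZE 4096u	/* hypothetical stand-in for nc->size */

struct sim_cache { unsigned int offset; };	/* models nc->offset only */
struct frag { unsigned int start, end; };	/* the range [start, end) */

/* Old scheme: offset counts down from CACHE_SIZE, then is aligned down. */
static inline int alloc_countdown(struct sim_cache *nc, unsigned int fragsz,
				  unsigned int align_mask, struct frag *out)
{
	unsigned int offset;

	if (fragsz > nc->offset)
		return -1;	/* the real code would refill here */

	offset = (nc->offset - fragsz) & align_mask;
	nc->offset = offset;
	out->start = offset;
	out->end = offset + fragsz;
	return 0;
}

/* New scheme: offset counts up from 0 and is aligned up before use. */
static inline int alloc_countup(struct sim_cache *nc, unsigned int fragsz,
				unsigned int align_mask, struct frag *out)
{
	unsigned int offset = (nc->offset + ~align_mask) & align_mask;

	if (offset + fragsz > CACHE_SIZE)
		return -1;	/* the real code would refill here */

	nc->offset = offset + fragsz;
	out->start = offset;
	out->end = offset + fragsz;
	return 0;
}

/* Nonzero if the two ranges share at least one byte. */
static inline int frags_overlap(const struct frag *a, const struct frag *b)
{
	return a->start < b->end && b->start < a->end;
}
```

In this model, two back-to-back allocations with different alignment masks
get disjoint ranges in both schemes, because the next allocation always
starts from the previous fragment's boundary before any alignment padding
is added.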
>
>> + if (unlikely(offset + fragsz > size)) {
>> page = virt_to_page(nc->va);
>>
>> if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>> @@ -102,17 +104,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>> goto refill;
>> }
>>
>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> - /* if size can vary use size else just use PAGE_SIZE */
>> - size = nc->size;
>> -#endif
>> /* OK, page count is 0, we can safely set it */
>> set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>>
>> /* reset page count bias and offset to start of new frag */
>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>> - offset = size - fragsz;
>> - if (unlikely(offset < 0)) {
>> + offset = 0;
>> + if (unlikely(fragsz > size)) {
>
> This check can probably be moved now. It was placed here to optimize
> things as a check of offset < 0 was a single jump command based on the
> signed flag being set as a result of the offset calculation.
>
> It might make sense to pull this out of here and instead place it at
> the start of this block after the initial check with offset + fragsz >
> size since that would shorten the need to carry the size variable.

Yes, that is better.
But does it make more sense to just do the 'fragsz > PAGE_SIZE' check
alongside the alignment check? We have a better chance of successfully
allocating an order-0 page than an order-3 page, so it seems the caller
is not allowed to pass a fragsz bigger than PAGE_SIZE anyway.

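
A rough userspace sketch of the ordering I have in mind, assuming the
'fragsz > PAGE_SIZE' check moves in front of the alignment. The names
pick_offset() and SIM_PAGE_SIZE are made up for illustration, and the
refill path is modelled as simply restarting at offset 0 on a fresh page:

```c
#include <assert.h>

#define SIM_PAGE_SIZE 4096u	/* hypothetical stand-in for PAGE_SIZE */

/*
 * Pick the offset for the next fragment. 'cur' models nc->offset and
 * 'size' the size of the backing page (order 0 up to order 3).
 * Returns 0 and stores the offset in *out, or -1 if fragsz is rejected.
 */
static inline int pick_offset(unsigned int cur, unsigned int size,
			      unsigned int fragsz, unsigned int align_mask,
			      unsigned int *out)
{
	unsigned int offset;

	/* Early check: the request could never fit into an order-0 page. */
	if (fragsz > SIM_PAGE_SIZE)
		return -1;

	offset = (cur + ~align_mask) & align_mask;
	if (offset + fragsz > size)
		offset = 0;	/* models the refill path */

	*out = offset;
	return 0;
}
```

With the early check in place, the refill path no longer needs its own
'fragsz > size' test in this model, since fragsz <= SIM_PAGE_SIZE <= size
always holds for a fresh page.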
>
>> /*
>> * The caller is trying to allocate a fragment
>> * with fragsz > PAGE_SIZE but the cache isn't big
>> @@ -127,8 +125,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>> }
>>
>> nc->pagecnt_bias--;
>> - offset &= align_mask;
>> - nc->offset = offset;
>> + nc->offset = offset + fragsz;
>>
>> return nc->va + offset;
>> }
>
> .
>