From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: <davem@davemloft.net>, <kuba@kernel.org>, <pabeni@redhat.com>,
<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH net-next 3/6] mm/page_alloc: use initial zero offset for page_frag_alloc_align()
Date: Wed, 10 Jan 2024 17:45:11 +0800
Message-ID: <b77ef32e-64b9-2e4f-8041-ccb46dea4caa@huawei.com>
In-Reply-To: <CAKgT0UcujEktOnHx7mxWd+Jah1J9mHFWnTx35vc3x25uUadxaA@mail.gmail.com>
On 2024/1/9 23:37, Alexander Duyck wrote:
> On Tue, Jan 9, 2024 at 3:22 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2024/1/9 0:25, Alexander Duyck wrote:
>>> On Mon, Jan 8, 2024 at 12:59 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> ...
>>
>>>
>>>>>
>>>>> 2. By starting at the end and working toward zero we can use built in
>>>>> functionality of the CPU to only have to check and see if our result
>>>>> would be signed rather than having to load two registers with the
>>>>> values and then compare them which saves us a few cycles. In addition
>>>>> it saves us from having to read both the size and the offset for every
>>>>> page.
>>>>
>>>> I suppose the above is ok if we only use the page_frag_alloc*() API to
>>>> allocate memory for skb->data, not for the frag in skb_shinfo(), as by
>>>> starting at the end and working toward zero, it means we can not do skb
>>>> coalescing.
>>>>
>>>> As page_frag_alloc*() is returning va now, I am assuming most of users
>>>> is using the API for skb->data, I guess it is ok to drop this patch for
>>>> now.
>>>>
>>>> If we allow page_frag_alloc*() to return struct page, we might need this
>>>> patch to enable coalescing.
>>>
>>> I would argue this is not the interface for enabling coalescing. This
>>> is one of the reasons why this is implemented the way it is. When you
>>> are aligning fragments you aren't going to be able to coalesce the
>>> frames anyway as the alignment would push the fragments apart.
>>
>> It seems the alignment requirement is the same for the same user of a page_frag
>> instance, so the aligning does not seem to be a problem for coalescing?
>
> I'm a bit confused as to what coalescing you are referring to. If you
> can provide a link it would be useful.
>
> The problem is page_frag is a very generic item and can be generated
> from a regular page on NICs that can internally reuse the same page
> instance for multiple buffers. So it is possible to coalesce page
> frags, however it is very unlikely to be coalescing them in the case
> of them being used for skb buffers since it would require aligned
> payloads on the network in order to really make it work without
> hardware intervention of some sort and on such devices they are likely
> allocating entire pages instead of page frags for the buffers.

The main use case I have in mind is the page_frag used on the tx side of
networking, if we are able to unify page_frag and page_frag_cache in the
future:
https://elixir.bootlin.com/linux/v6.7-rc8/source/net/ipv4/tcp.c#L1183

Do you think it makes sense to unify them using the struct below, and to
provide an API that returns both 'page' and 'va', as page_pool does now?
It may mean we need to add one pointer to the new struct and can no longer
do some of the performance tricks, but I suppose that is ok, as there is
always some trade-off for maintainability and evolvability?

struct page_frag {
	struct page *page;
	void *va;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
	__u16 offset;
	__u16 size;
#else
	__u32 offset;
#endif
	/* we maintain a pagecount bias, so that we don't dirty the cache line
	 * containing page->_refcount every time we allocate a fragment.
	 */
	unsigned int pagecnt_bias;
	bool pfmemalloc;
};
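
To make the question a bit more concrete, here is a rough userspace sketch of
how such a unified struct could hand back both 'page' and 'va' when the offset
starts at zero and grows upward. It is only an illustration, not kernel code
and not a proposed API: page_model, page_frag_model, frag_ref and the
frag_model_*() helpers are all hypothetical names, and the refcount bias and
pfmemalloc handling are left out. The coalescing check is meant to be similar
in spirit to what skb_can_coalesce() does, as far as I understand it.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct page; hypothetical. */
struct page_model {
	unsigned char data[4096];
};

/* Simplified model of the unified struct (bias and pfmemalloc omitted). */
struct page_frag_model {
	struct page_model *page;
	void *va;
	uint32_t offset;	/* next free byte, counting up from zero */
	uint32_t size;
};

/* One allocation result carrying both views of the same fragment. */
struct frag_ref {
	struct page_model *page;
	void *va;
	uint32_t offset;
	uint32_t len;
};

static void frag_model_init(struct page_frag_model *nc, struct page_model *page)
{
	nc->page = page;
	nc->va = page->data;
	nc->offset = 0;
	nc->size = sizeof(page->data);
}

/*
 * align must be a power of two. Returns 0 on success, -1 if the
 * fragment does not fit in what is left of the page.
 */
static int frag_model_alloc(struct page_frag_model *nc, uint32_t len,
			    uint32_t align, struct frag_ref *out)
{
	uint32_t start = (nc->offset + align - 1) & ~(align - 1);

	if (start > nc->size || len > nc->size - start)
		return -1;

	out->page = nc->page;
	out->va = (unsigned char *)nc->va + start;
	out->offset = start;
	out->len = len;
	nc->offset = start + len;
	return 0;
}

/* Two fragments can be merged when the second starts where the first ends. */
static int frag_model_can_coalesce(const struct frag_ref *a,
				   const struct frag_ref *b)
{
	return a->page == b->page && a->offset + a->len == b->offset;
}
```

With a zero initial offset, two back-to-back allocations using the same
alignment end up adjacent in the page, so the second one can extend the first.
An allocation whose alignment forces padding leaves a gap and cannot be merged.
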

Another use case, not really related, is this: hw may be configured with a
small BD buf size, e.g. 2K, together with a big MTU or with hw GRO enabled.
With a 4K page size, that means we may be able to halve the number of frags,
as it is usually the case that two consecutive BDs point to the same page.
I implemented a POC in hns3 a long time ago using the frag implementation in
page_pool, and it did show an obvious performance gain, but as priorities
shifted, I have not been able to continue that POC.

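
The frag-count arithmetic can be sketched roughly as below. This is a
standalone userspace model, not the hns3 POC: bd_model and
count_frags_coalesced() are hypothetical names, and page_id merely stands in
for the struct page pointer. It assumes a BD can extend the previous frag only
when it sits on the same page and starts exactly where the previous BD's data
ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one received buffer descriptor. */
struct bd_model {
	uint32_t page_id;	/* stand-in for the struct page pointer */
	uint32_t offset;	/* start of the data within the page */
	uint32_t len;		/* bytes written by hw into this BD */
};

/* Count the frags needed when adjacent BDs on the same page are merged. */
static size_t count_frags_coalesced(const struct bd_model *bd, size_t n)
{
	size_t frags = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i > 0 &&
		    bd[i].page_id == bd[i - 1].page_id &&
		    bd[i].offset == bd[i - 1].offset + bd[i - 1].len)
			continue;	/* extends the previous frag */
		frags++;
	}
	return frags;
}
```

For four full 2K BDs spread over two 4K pages this gives two frags instead of
four. A BD that is only partially filled leaves a gap before the next one, so
that pair is not merged.
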
>
> .
>