Subject: Re: [PATCH net-next v2 05/15] mm: page_frag: use initial zero offset for page_frag_alloc_align()
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander H Duyck
CC: Andrew Morton
Date: Wed, 17 Apr 2024 21:17:30 +0800
Message-ID: <3da33d4c-a70e-23a4-8e00-23fe96dd0c1a@huawei.com>
In-Reply-To: <712b14031ca37a12c1871d1745794b1f0be0498f.camel@gmail.com>
References: <20240415131941.51153-1-linyunsheng@huawei.com> <20240415131941.51153-6-linyunsheng@huawei.com> <6a78b9ad-0d20-a495-52ca-fac180408658@huawei.com> <712b14031ca37a12c1871d1745794b1f0be0498f.camel@gmail.com>
On 2024/4/16 23:51, Alexander H Duyck wrote:
> On Tue, 2024-04-16 at 21:11 +0800, Yunsheng Lin wrote:
>> On 2024/4/16 7:55, Alexander H Duyck wrote:
>>> On Mon, 2024-04-15 at 21:19 +0800, Yunsheng Lin wrote:
>>>> We are about to use the page_frag_alloc_*() API not just to
>>>> allocate memory for skb->data, but also to do the memory
>>>> allocation for skb frags. Currently the implementation of
>>>> page_frag in the mm subsystem runs the offset as a countdown
>>>> rather than a count-up value; there may be several advantages
>>>> to that as mentioned in [1], but it also has some disadvantages,
>>>> for example, it may disable skb frag coalescing and more correct
>>>> cache prefetching.
>>>>
>>>> We have a trade-off to make in order to have a unified
>>>> implementation and API for page_frag, so use an initial zero
>>>> offset in this patch, and the following patch will try to
>>>> make some optimizations to avoid the disadvantages as much
>>>> as possible.
>>>>
>>>> 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/
>>>>
>>>> CC: Alexander Duyck
>>>> Signed-off-by: Yunsheng Lin
>>>> ---
>>>>  mm/page_frag_cache.c | 31 ++++++++++++++-----------------
>>>>  1 file changed, 14 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>>>> index 64993b5d1243..dc864ee09536 100644
>>>> --- a/mm/page_frag_cache.c
>>>> +++ b/mm/page_frag_cache.c
>>>> @@ -65,9 +65,8 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>>>  			      unsigned int fragsz, gfp_t gfp_mask,
>>>>  			      unsigned int align_mask)
>>>>  {
>>>> -	unsigned int size = PAGE_SIZE;
>>>> +	unsigned int size, offset;
>>>>  	struct page *page;
>>>> -	int offset;
>>>>
>>>>  	if (unlikely(!nc->va)) {
>>>>  refill:
>>>> @@ -75,10 +74,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>>>  		if (!page)
>>>>  			return NULL;
>>>>
>>>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>> -		/* if size can vary use size else just use PAGE_SIZE */
>>>> -		size = nc->size;
>>>> -#endif
>>>>  		/* Even if we own the page, we do not use atomic_set().
>>>>  		 * This would break get_page_unless_zero() users.
>>>>  		 */
>>>> @@ -87,11 +82,18 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>>>  		/* reset page count bias and offset to start of new frag */
>>>>  		nc->pfmemalloc = page_is_pfmemalloc(page);
>>>>  		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>> -		nc->offset = size;
>>>> +		nc->offset = 0;
>>>>  	}
>>>>
>>>> -	offset = nc->offset - fragsz;
>>>> -	if (unlikely(offset < 0)) {
>>>> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>> +	/* if size can vary use size else just use PAGE_SIZE */
>>>> +	size = nc->size;
>>>> +#else
>>>> +	size = PAGE_SIZE;
>>>> +#endif
>>>> +
>>>> +	offset = ALIGN(nc->offset, -align_mask);
>>>
>>> I am not sure if using -align_mask here with the ALIGN macro is really
>>> to your benefit. I would be curious what the compiler is generating.
>>>
>>> Again, I think you would be much better off with:
>>> 	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
>>>
>>> That will save you a number of conversions as the use of the ALIGN
>>> macro gives you:
>>> 	offset = (nc->offset + (-align_mask - 1)) & ~(-align_mask - 1);
>>>
>>> whereas what I am suggesting gives you:
>>> 	offset = (nc->offset + ~align_mask) & ~(~align_mask);
>>>
>>> My main concern is that I am not sure the compiler will optimize around
>>> the combination of bit operations and arithmetic operations. It seems
>>> much cleaner to me to stick to the bitwise operations for the alignment
>>> than to force this into the vhost approach which requires a power of 2
>>> aligned mask.
>>
>> My argument about the above is in [1]. But since you don't seem to have
>> worked through the next patch yet, I might just do it as you suggested
>> in the next version so that I don't have to repeat my argument again:(
>>
>> 1. https://lore.kernel.org/all/df826acf-8867-7eb6-e7f0-962c106bc28b@huawei.com/
>
> Sorry, I didn't have time to go digging through the mailing list to
> review all the patches from the last set. I was only Cced on a few of

I thought adding 'CC: Alexander Duyck' in the cover letter would make
git send-email send all the patches to that address, but apparently it
did not. And it seems I only added that in the rfc and v1, but forgot to
add it in the newest v2 version:(

> them as I recall. As you know I have the fbnic patches I also have been
> trying to get pushed out so that was my primary focus the last couple
> weeks.

Understood.
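Just to double check we are talking about the same expressions: below is a
quick user-space sketch, only for illustration here (not part of the patch;
the two macro definitions are copied to match what include/uapi/linux/const.h
does), showing that ALIGN() with -align_mask and __ALIGN_KERNEL_MASK() with
~align_mask compute the same offset when align_mask is the usual ~(align - 1)
style mask, so the remaining difference should only be in what the compiler
generates for each form:

	#include <stdio.h>

	/* copied to match the kernel's definitions in include/uapi/linux/const.h */
	#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))
	#define ALIGN(x, a)	__ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)

	int main(void)
	{
		unsigned int offset = 9;
		/* e.g. a 64-byte alignment request from the aligned API */
		unsigned int align_mask = ~63u;

		/* both are expected to print 64 for the values above */
		printf("ALIGN(offset, -align_mask):               %u\n",
		       ALIGN(offset, -align_mask));
		printf("__ALIGN_KERNEL_MASK(offset, ~align_mask): %u\n",
		       __ALIGN_KERNEL_MASK(offset, ~align_mask));

		return 0;
	}

If the generated code for the two forms turns out to be the same, I am fine
with switching to __ALIGN_KERNEL_MASK() as you suggested.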
>
> That said, this just goes into my earlier complaints. You are now
> optimizing for the non-aligned paths. There are few callers that are
> asking for this to provide non-aligned segments. In most cases they are

I suppose that 'optimizing for the non-aligned paths' refers to doing the
data alignment in an inline helper for the aligned API callers and avoiding
the data alignment for the non-aligned API callers in patch 6?

For the existing users, it seems there are more callers of the non-aligned
API than of the aligned API:

Referenced in 13 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/napi_alloc_frag

Referenced in 2 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/napi_alloc_frag_align

Referenced in 15 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/netdev_alloc_frag

No references found in the database:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/netdev_alloc_frag_align

Referenced in 6 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/page_frag_alloc

Referenced in 3 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/page_frag_alloc_align

And the new users we are adding in patch 13 mostly ask for non-aligned
segments.

I would argue that it is not about optimizing for the non-aligned paths;
it is about avoiding the data alignment operation for the non-aligned API.

> at least cache aligned. Specifically the __netdev_alloc_frag_align and
> __napi_alloc_frag_align are aligning things at a minimum to
> SMP_CACHE_BYTES by aligning the fragsz argument using SKB_DATA_ALIGN.

It seems the above is doing the aligning operation on fragsz; most callers
are calling __netdev_alloc_frag_align() and __napi_alloc_frag_align() with
align_mask being ~0u.

> Perhaps it would be better to actually incorporate that alignment
> guarantee into the calls themselves by doing an &= with the align_mask
> request for those two functions to make this more transparent.

Did you mean doing something like below for fragsz too?

	fragsz = __ALIGN_KERNEL_MASK(fragsz, ~align_mask);

>
>>>
>>> Also the old code was aligning on the combination of offset AND fragsz.
>>> This new logic is aligning on offset only. Do we run the risk of
>>> overwriting blocks of neighbouring fragments if two users of
>>> napi_alloc_frag_align end up passing arguments that have different
>>> alignment values?
>>
>> I am not sure I understand the question here.
>> As my understanding, both the old code and the new code are aligning on
>> the offset, and both might have space reserved before the offset
>> due to aligning. The memory returned to the caller is in the range
>> of [offset, offset + fragsz). Am I missing something obvious here?
>
> My main concern is that by aligning offset - fragsz by the alignment
> mask we were taking care of all our variables immediately ourselves. If
> we didn't provide a correct value it was all traceable to one call as
> the assumption was that fragsz would be a multiple of the alignment
> value.
>
> With your change the alignment is done in the following call. So now it
> splits up the alignment of fragsz from the alignment of the offset. As
> such we will probably need to add additional overhead to guarantee
> fragsz is a multiple of the alignment.

I have not thought through yet how the above will affect the API callers
if different callers pass different alignments for the same
'page_frag_cache' instance; does it cause some cache bouncing, or a DMA
issue if the memory is used for DMA?

I suppose it depends on what alignment semantics we are providing here:
1. Ensure alignment for both offset and fragsz.
2. Ensure alignment for offset only.
3. Ensure alignment for fragsz only.

It seems you are in favour of option 1? I suppose it is a balance between
performance and API flexibility here.

Would it be possible to enforce that callers use the same alignment for the
same 'page_frag_cache' instance, and give a warning if a caller is not using
the same alignment? That way we would only need to ensure alignment for the
offset or for fragsz, but not both of them.

I am not sure there is a strong use case for supporting alignment of both
offset and fragsz; we might create a new API for it if such a strong use case
shows up.
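If we go that way, something like the rough sketch below is what I have in
mind for the warning. It assumes a new debug-only field (named
align_mask_used here just for illustration) is added to
'struct page_frag_cache' to remember the first alignment used; it is only a
sketch, not something I have tested:

	#ifdef CONFIG_DEBUG_VM
	/* align_mask_used is a hypothetical debug-only field that would need
	 * to be added to struct page_frag_cache for this sketch to build.
	 */
	static inline void page_frag_cache_check_align(struct page_frag_cache *nc,
						       unsigned int align_mask)
	{
		/* remember the alignment used by the first caller of this cache */
		if (!nc->align_mask_used)
			nc->align_mask_used = align_mask;

		/* warn once if a later caller asks for a different alignment */
		WARN_ON_ONCE(nc->align_mask_used != align_mask);
	}
	#else
	static inline void page_frag_cache_check_align(struct page_frag_cache *nc,
						       unsigned int align_mask)
	{
	}
	#endif

__page_frag_alloc_align() could call it before doing the alignment, so mixing
different alignments on the same instance would at least be visible instead of
us silently relying on whichever of the semantics above we pick.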