Subject: Re: [PATCH net-next v2 05/15] mm: page_frag: use initial zero offset for page_frag_alloc_align()
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander H Duyck
CC: Andrew Morton
Date: Wed, 17 Apr 2024 21:17:30 +0800
Message-ID: <3da33d4c-a70e-23a4-8e00-23fe96dd0c1a@huawei.com>
In-Reply-To: <712b14031ca37a12c1871d1745794b1f0be0498f.camel@gmail.com>
References: <20240415131941.51153-1-linyunsheng@huawei.com> <20240415131941.51153-6-linyunsheng@huawei.com> <6a78b9ad-0d20-a495-52ca-fac180408658@huawei.com> <712b14031ca37a12c1871d1745794b1f0be0498f.camel@gmail.com>
On 2024/4/16 23:51, Alexander H Duyck wrote:
> On Tue, 2024-04-16 at 21:11 +0800, Yunsheng Lin wrote:
>> On 2024/4/16 7:55, Alexander H Duyck wrote:
>>> On Mon, 2024-04-15 at 21:19 +0800, Yunsheng Lin wrote:
>>>> We are about to use the page_frag_alloc_*() API not just to
>>>> allocate memory for skb->data, but also to do the memory
>>>> allocation for skb frags. Currently the implementation of
>>>> page_frag in the mm subsystem runs the offset as a countdown
>>>> rather than a count-up value; there may be several advantages
>>>> to that as mentioned in [1], but it also has some disadvantages,
>>>> for example, it may disable skb frag coalescing and more correct
>>>> cache prefetching.
>>>>
>>>> We have a trade-off to make in order to have a unified
>>>> implementation and API for page_frag, so use an initial zero
>>>> offset in this patch, and the following patch will try to
>>>> make some optimizations to avoid the disadvantages as much
>>>> as possible.
>>>>
>>>> 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/
>>>>
>>>> CC: Alexander Duyck
>>>> Signed-off-by: Yunsheng Lin
>>>> ---
>>>>  mm/page_frag_cache.c | 31 ++++++++++++++-----------------
>>>>  1 file changed, 14 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>>>> index 64993b5d1243..dc864ee09536 100644
>>>> --- a/mm/page_frag_cache.c
>>>> +++ b/mm/page_frag_cache.c
>>>> @@ -65,9 +65,8 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>>>  			      unsigned int fragsz, gfp_t gfp_mask,
>>>>  			      unsigned int align_mask)
>>>>  {
>>>> -	unsigned int size = PAGE_SIZE;
>>>> +	unsigned int size, offset;
>>>>  	struct page *page;
>>>> -	int offset;
>>>>
>>>>  	if (unlikely(!nc->va)) {
>>>>  refill:
>>>> @@ -75,10 +74,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>>>  		if (!page)
>>>>  			return NULL;
>>>>
>>>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>> -		/* if size can vary use size else just use PAGE_SIZE */
>>>> -		size = nc->size;
>>>> -#endif
>>>>  		/* Even if we own the page, we do not use atomic_set().
>>>>  		 * This would break get_page_unless_zero() users.
>>>>  		 */
>>>> @@ -87,11 +82,18 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>>>  		/* reset page count bias and offset to start of new frag */
>>>>  		nc->pfmemalloc = page_is_pfmemalloc(page);
>>>>  		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>> -		nc->offset = size;
>>>> +		nc->offset = 0;
>>>>  	}
>>>>
>>>> -	offset = nc->offset - fragsz;
>>>> -	if (unlikely(offset < 0)) {
>>>> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>> +	/* if size can vary use size else just use PAGE_SIZE */
>>>> +	size = nc->size;
>>>> +#else
>>>> +	size = PAGE_SIZE;
>>>> +#endif
>>>> +
>>>> +	offset = ALIGN(nc->offset, -align_mask);
>>>
>>> I am not sure if using -align_mask here with the ALIGN macro is really
>>> to your benefit. I would be curious what the compiler is generating.
>>>
>>> Again, I think you would be much better off with:
>>> 	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
>>>
>>> That will save you a number of conversions as the use of the ALIGN
>>> macro gives you:
>>> 	offset = (nc->offset + (-align_mask - 1)) & ~(-align_mask - 1);
>>>
>>> whereas what I am suggesting gives you:
>>> 	offset = (nc->offset + ~align_mask) & ~(~align_mask);
>>>
>>> My main concern is that I am not sure the compiler will optimize around
>>> the combination of bit operations and arithmetic operations. It seems
>>> much cleaner to me to stick to the bitwise operations for the alignment
>>> than to force this into the vhost approach which requires a power of 2
>>> aligned mask.
>>
>> My argument about the above is in [1]. But since you don't seem to have
>> worked through the next patch yet, I might just do it as you suggested
>> in the next version so that I don't have to repeat my argument again:(
>>
>> 1. https://lore.kernel.org/all/df826acf-8867-7eb6-e7f0-962c106bc28b@huawei.com/
>
> Sorry, I didn't have time to go digging through the mailing list to
> review all the patches from the last set. I was only Cced on a few of

I thought adding 'CC: Alexander Duyck' in the cover letter would make
git send-email send all the patches to that address, but apparently it
did not. And it seems I only added that in the rfc and v1, but forgot to
add it in the newest v2 version:(

> them as I recall. As you know I have the fbnic patches I also have been
> trying to get pushed out so that was my primary focus the last couple
> weeks.

Understood.
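Just to double check we are talking about the same expressions: below is a
quick user-space sketch, only for illustration here (not part of the patch;
the two macro definitions are copied to match what include/uapi/linux/const.h
does), showing that ALIGN() with -align_mask and __ALIGN_KERNEL_MASK() with
~align_mask compute the same offset when align_mask is the usual ~(align - 1)
style mask, so the remaining difference should only be in what the compiler
generates for each form:

	#include <stdio.h>

	/* copied to match the kernel's definitions in include/uapi/linux/const.h */
	#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))
	#define ALIGN(x, a)	__ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)

	int main(void)
	{
		unsigned int offset = 9;
		/* e.g. a 64-byte alignment request from the aligned API */
		unsigned int align_mask = ~63u;

		/* both are expected to print 64 for the values above */
		printf("ALIGN(offset, -align_mask):               %u\n",
		       ALIGN(offset, -align_mask));
		printf("__ALIGN_KERNEL_MASK(offset, ~align_mask): %u\n",
		       __ALIGN_KERNEL_MASK(offset, ~align_mask));

		return 0;
	}

If the generated code for the two forms turns out to be the same, I am fine
with switching to __ALIGN_KERNEL_MASK() as you suggested.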
>
> That said, this just goes into my earlier complaints. You are now
> optimizing for the non-aligned paths. There are few callers that are
> asking for this to provide non-aligned segments. In most cases they are

I suppose that 'optimizing for the non-aligned paths' refers to doing the
data alignment in an inline helper for the aligned API callers and avoiding
the data alignment for the non-aligned API callers in patch 6?

For the existing users, it seems there are more callers of the non-aligned
API than of the aligned API:

Referenced in 13 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/napi_alloc_frag

Referenced in 2 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/napi_alloc_frag_align

Referenced in 15 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/netdev_alloc_frag

No references found in the database:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/netdev_alloc_frag_align

Referenced in 6 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/page_frag_alloc

Referenced in 3 files:
https://elixir.bootlin.com/linux/v6.7-rc8/A/ident/page_frag_alloc_align

And the new users we are adding in patch 13 mostly ask for non-aligned
segments.

I would argue that it is not about optimizing for the non-aligned paths;
it is about avoiding the data alignment operation for the non-aligned API.

> at least cache aligned. Specifically the __netdev_alloc_frag_align and
> __napi_alloc_frag_align are aligning things at a minimum to
> SMP_CACHE_BYTES by aligning the fragsz argument using SKB_DATA_ALIGN.

It seems the above is doing the aligning operation on fragsz; most callers
are calling __netdev_alloc_frag_align() and __napi_alloc_frag_align() with
align_mask being ~0u.

> Perhaps it would be better to actually incorporate that alignment
> guarantee into the calls themselves by doing an &= with the align_mask
> request for those two functions to make this more transparent.

Did you mean doing something like below for fragsz too?

	fragsz = __ALIGN_KERNEL_MASK(fragsz, ~align_mask);

>
>>>
>>> Also the old code was aligning on the combination of offset AND fragsz.
>>> This new logic is aligning on offset only. Do we run the risk of
>>> overwriting blocks of neighbouring fragments if two users of
>>> napi_alloc_frag_align end up passing arguments that have different
>>> alignment values?
>>
>> I am not sure I understand the question here.
>> As my understanding, both the old code and the new code are aligning on
>> the offset, and both might have space reserved before the offset
>> due to aligning. The memory returned to the caller is in the range
>> of [offset, offset + fragsz). Am I missing something obvious here?
>
> My main concern is that by aligning offset - fragsz by the alignment
> mask we were taking care of all our variables immediately ourselves. If
> we didn't provide a correct value it was all traceable to one call as
> the assumption was that fragsz would be a multiple of the alignment
> value.
>
> With your change the alignment is done in the following call. So now it
> splits up the alignment of fragsz from the alignment of the offset. As
> such we will probably need to add additional overhead to guarantee
> fragsz is a multiple of the alignment.

I have not thought through yet how the above will affect the API callers
if different callers pass different alignments for the same
'page_frag_cache' instance; does it cause some cache bouncing, or a DMA
issue if the memory is used for DMA?

I suppose it depends on what alignment semantics we are providing here:
1. Ensure alignment for both offset and fragsz.
2. Ensure alignment for offset only.
3. Ensure alignment for fragsz only.

It seems you are in favour of option 1? I suppose it is a balance between
performance and API flexibility here.

Would it be possible to enforce that callers use the same alignment for the
same 'page_frag_cache' instance, and give a warning if a caller is not using
the same alignment? That way we would only need to ensure alignment for the
offset or for fragsz, but not both of them.

I am not sure there is a strong use case for supporting alignment of both
offset and fragsz; we might create a new API for it if such a strong use case
shows up.
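If we go that way, something like the rough sketch below is what I have in
mind for the warning. It assumes a new debug-only field (named
align_mask_used here just for illustration) is added to
'struct page_frag_cache' to remember the first alignment used; it is only a
sketch, not something I have tested:

	#ifdef CONFIG_DEBUG_VM
	/* align_mask_used is a hypothetical debug-only field that would need
	 * to be added to struct page_frag_cache for this sketch to build.
	 */
	static inline void page_frag_cache_check_align(struct page_frag_cache *nc,
						       unsigned int align_mask)
	{
		/* remember the alignment used by the first caller of this cache */
		if (!nc->align_mask_used)
			nc->align_mask_used = align_mask;

		/* warn once if a later caller asks for a different alignment */
		WARN_ON_ONCE(nc->align_mask_used != align_mask);
	}
	#else
	static inline void page_frag_cache_check_align(struct page_frag_cache *nc,
						       unsigned int align_mask)
	{
	}
	#endif

__page_frag_alloc_align() could call it before doing the alignment, so mixing
different alignments on the same instance would at least be visible instead of
us silently relying on whichever of the semantics above we pick.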