From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <5a0e12c1-0e98-426a-ab4d-50de2b09f36f@huawei.com>
Date: Tue, 23 Jul 2024 21:19:46 +0800
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander Duyck
CC: Andrew Morton
Subject: Re: [RFC v11 08/14] mm: page_frag: some minor refactoring before adding new API
References: <20240719093338.55117-1-linyunsheng@huawei.com> <20240719093338.55117-9-linyunsheng@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On 2024/7/22 23:32, Alexander Duyck wrote:
> On Mon, Jul 22, 2024 at 5:55 AM Yunsheng Lin wrote:
>>
>> On 2024/7/22 7:40, Alexander H Duyck wrote:
>>> On Fri, 2024-07-19 at 17:33 +0800, Yunsheng Lin wrote:
>>>> Refactor common codes from __page_frag_alloc_va_align()
>>>> to __page_frag_cache_refill(), so that the new API can
>>>> make use of them.
>>>>
>>>> CC: Alexander Duyck
>>>> Signed-off-by: Yunsheng Lin
>>>> ---
>>>>  include/linux/page_frag_cache.h |  2 +-
>>>>  mm/page_frag_cache.c            | 93 +++++++++++++++++----------
>>>>  2 files changed, 49 insertions(+), 46 deletions(-)
>>>>
>>>> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
>>>> index 12a16f8e8ad0..5aa45de7a9a5 100644
>>>> --- a/include/linux/page_frag_cache.h
>>>> +++ b/include/linux/page_frag_cache.h
>>>> @@ -50,7 +50,7 @@ static inline void *encoded_page_address(unsigned long encoded_va)
>>>>
>>>>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
>>>>  {
>>>> -	nc->encoded_va = 0;
>>>> +	memset(nc, 0, sizeof(*nc));
>>>>  }
>>>>
>>>
>>> I do not like requiring the entire structure to be reset as a part of
>>> init. If encoded_va is 0 then we have reset the page and the flags.
>>> There shouldn't be anything else we need to reset as remaining and bias
>>> will be reset when we reallocate.
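
To make the trade-off concrete, the two init variants being discussed look
roughly like the below. This is only a sketch: the field layout is an
assumption made to keep the example self-contained, not the exact
struct page_frag_cache definition in this series.

#include <linux/string.h>	/* memset */

/* assumed layout, for illustration only */
struct page_frag_cache_sketch {
	unsigned long encoded_va;	/* page address + order + pfmemalloc bit */
	unsigned int remaining;		/* bytes left in the current page */
	unsigned int pagecnt_bias;	/* page references we still hold */
};

static inline void init_clear_va_only(struct page_frag_cache_sketch *nc)
{
	/* clear only encoded_va; remaining and pagecnt_bias are rewritten
	 * on the next refill, so the fast path cannot trust them until then
	 */
	nc->encoded_va = 0;
}

static inline void init_clear_all(struct page_frag_cache_sketch *nc)
{
	/* clear everything, so the allocation fast path never needs an
	 * extra "was this cache initialised" check
	 */
	memset(nc, 0, sizeof(*nc));
}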
>>
>> The argument is about avoiding one check in the fast path by doing the
>> memset in the slow path, which you might already know according to your
>> comment on the previous version.
>>
>> It is just sometimes hard to understand your preference for maintainability
>> over performance here, as sometimes your comments seem to prefer performance
>> over maintainability, like the LEA trick you mentioned and the offset
>> count-down before this patchset. It would be good to be more consistent
>> about this, otherwise it is sometimes confusing when doing the refactoring.
>
> The use of a negative offset is arguably more maintainable in my mind
> rather than being a performance trick. Essentially if you use the
> negative value you can just mask off the upper bits and it is the
> offset in the page. As such it is actually easier for me to read
> versus "remaining" which is an offset from the end of the page.
> Assuming you read the offset in hex anyway.

Reading the above doesn't seem maintainable to me :(

>
>>>
>>>>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>>>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>>>> index 7928e5d50711..d9c9cad17af7 100644
>>>> --- a/mm/page_frag_cache.c
>>>> +++ b/mm/page_frag_cache.c
>>>> @@ -19,6 +19,28 @@
>>>>  #include 
>>>>  #include "internal.h"
>>>>
>>>> +static struct page *__page_frag_cache_recharge(struct page_frag_cache *nc)
>>>> +{
>>>> +	unsigned long encoded_va = nc->encoded_va;
>>>> +	struct page *page;
>>>> +
>>>> +	page = virt_to_page((void *)encoded_va);
>>>> +	if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>>>> +		return NULL;
>>>> +
>>>> +	if (unlikely(encoded_page_pfmemalloc(encoded_va))) {
>>>> +		VM_BUG_ON(compound_order(page) !=
>>>> +			  encoded_page_order(encoded_va));
>>>> +		free_unref_page(page, encoded_page_order(encoded_va));
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	/* OK, page count is 0, we can safely set it */
>>>> +	set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>>>> +
>>>> +	return page;
>>>> +}
>>>> +
>>>>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  					      gfp_t gfp_mask)
>>>>  {
>>>> @@ -26,6 +48,14 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  	struct page *page = NULL;
>>>>  	gfp_t gfp = gfp_mask;
>>>>
>>>> +	if (likely(nc->encoded_va)) {
>>>> +		page = __page_frag_cache_recharge(nc);
>>>> +		if (page) {
>>>> +			order = encoded_page_order(nc->encoded_va);
>>>> +			goto out;
>>>> +		}
>>>> +	}
>>>> +
>>>
>>> This code has no business here. This is refill, you just dropped
>>> recharge in here which will make a complete mess of the ordering and be
>>> confusing to say the least.
>>>
>>> The expectation was that if we are calling this function it is going to
>>> overwrite the virtual address to NULL on failure so we discard the old
>>> page if there is one present. This changes that behaviour. What you
>>> effectively did is made __page_frag_cache_refill into the recharge
>>> function.
>>
>> The idea is to reuse the below for both __page_frag_cache_refill() and
>> __page_frag_cache_recharge(), which seems to be about maintainability
>> by not having duplicated code. If there is a better idea to avoid that
>> duplicated code while keeping the old behaviour, I am happy to change
>> it.
>>
>> /* reset page count bias and remaining to start of new frag */
>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>> nc->remaining = PAGE_SIZE << order;
>
> The only piece that is really reused here is the pagecnt_bias
> assignment.
> What is obfuscated away is that the order is gotten
> through one of two paths. Really order isn't order here it is size.
> Which should have been fetched already. What you end up doing with
> this change is duplicating a bunch of code throughout the function.
> You end up having to fetch size multiple times multiple ways. Here you
> are generating it with order. Then you have to turn around and get it
> again at the start of the function, and again after calling this
> function in order to pull it back out.

I am assuming you would like to preserve the old behaviour as below?

	if (!encoded_va) {
refill:
		__page_frag_cache_refill();
	}

	if (remaining < fragsz) {
		if (!__page_frag_cache_recharge())
			goto refill;
	}

As we are adding new APIs, are we expecting the new APIs to also
duplicate the above pattern?

>
>>>
>>>>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>>  	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>>>>  		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>>>> @@ -35,7 +65,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  	if (unlikely(!page)) {
>>>>  		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>>>>  		if (unlikely(!page)) {
>>>> -			nc->encoded_va = 0;
>>>> +			memset(nc, 0, sizeof(*nc));
>>>>  			return NULL;
>>>>  		}
>>>>
>>>
>>> The memset will take a few more instructions than the existing code
>>> did. I would prefer to keep this as is if at all possible.
>>
>> It will not take more instructions for arm64, as it has the 'stp'
>> instruction when __HAVE_ARCH_MEMSET is set.
>> Is there something similar for x64?
>
> The x64 does not last I knew without getting into the SSE/AVX type
> stuff. This becomes two separate 8B store instructions.

I can check later when I get hold of an x64 server. But doesn't it make
sense to have one extra 8B store instruction in the slow path to avoid a
check in the fast path?

>
>>>
>>>> @@ -45,6 +75,16 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  	nc->encoded_va = encode_aligned_va(page_address(page), order,
>>>>  					   page_is_pfmemalloc(page));
>>>>
>>>> +	/* Even if we own the page, we do not use atomic_set().
>>>> +	 * This would break get_page_unless_zero() users.
>>>> +	 */
>>>> +	page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>>>> +
>>>> +out:
>>>> +	/* reset page count bias and remaining to start of new frag */
>>>> +	nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>> +	nc->remaining = PAGE_SIZE << order;
>>>> +
>>>>  	return page;
>>>>  }
>>>>
>>>
>>> Why bother returning a page at all? It doesn't seem like you use
>>> it anymore. It looks like the use cases you have for it in patches 11/12
>>> all appear to be broken from what I can tell, as you are adding page as
>>> a variable when we don't need to be passing internal details to the
>>> callers of the function when just a simple error return code would do.
>>
>> It would be good to be more specific about the 'broken' part here.
>
> We are passing internals to the caller. Basically this is generally
> frowned upon for many implementations of things as the general idea is
> that the internal page we are using should be a pseudo-private value.

It is an implementation detail, and it is about avoiding calls to
virt_to_page() as mentioned below. I am not sure why that is referred to
as 'broken'; it would be better to explain why it is a bad idea here, as
the 'pseudo-private' wording alone doesn't seem to justify the 'broken'
part.
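
To make the disagreement concrete, the two API shapes being compared are
roughly the below. The names are made up purely for illustration; they are
not the actual prototypes in this series or in patches 11/12:

/* shape used in this series: hand the page straight back to the caller,
 * so a caller that needs the struct page never has to derive it
 */
struct page *frag_prepare_pg(struct page_frag_cache *nc,
			     unsigned int fragsz, gfp_t gfp);

/* suggested alternative: return only the fragment address (or NULL on
 * failure), and let the few callers that really need the struct page
 * recover it themselves
 */
void *frag_prepare_va(struct page_frag_cache *nc,
		      unsigned int fragsz, gfp_t gfp);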
> I understand that you have one or two callers that need it for the use
> cases you have in patches 11/12, but it also seems like you are just
> passing it regardless. For example I noticed in a few cases you added
> the page pointer in 12 to handle the return value, but then just used
> it to check for NULL. My thought would be that rather than returning
> the page here you would be better off just returning 0 or an error and
> then doing the virt_to_page translation for all the cases where the
> page is actually needed since you have to go that route for a cached
> page anyway.

Yes, it is about avoiding calls to virt_to_page() as much as possible.
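
As a caller-side illustration of that difference (hypothetical caller and
names, not code from patches 11/12):

/* with the suggested shape, a caller that needs the struct page has to
 * translate the fragment address back into a page itself
 */
static struct page *grab_frag_page(struct page_frag_cache *nc,
				   unsigned int fragsz, gfp_t gfp)
{
	void *va = frag_prepare_va(nc, fragsz, gfp);

	if (!va)
		return NULL;

	return virt_to_page(va);	/* the extra step I would like to avoid */
}

With the shape in this series the virt_to_page() above simply goes away,
since frag_prepare_pg() already returns the page backing the fragment.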