From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <5a0e12c1-0e98-426a-ab4d-50de2b09f36f@huawei.com>
Date: Tue, 23 Jul 2024 21:19:46 +0800
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander Duyck
CC: Andrew Morton
Subject: Re: [RFC v11 08/14] mm: page_frag: some minor refactoring before adding new API
References: <20240719093338.55117-1-linyunsheng@huawei.com> <20240719093338.55117-9-linyunsheng@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On 2024/7/22 23:32, Alexander Duyck wrote:
> On Mon, Jul 22, 2024 at 5:55 AM Yunsheng Lin wrote:
>>
>> On 2024/7/22 7:40, Alexander H Duyck wrote:
>>> On Fri, 2024-07-19 at 17:33 +0800, Yunsheng Lin wrote:
>>>> Refactor common codes from __page_frag_alloc_va_align()
>>>> to __page_frag_cache_refill(), so that the new API can
>>>> make use of them.
>>>>
>>>> CC: Alexander Duyck
>>>> Signed-off-by: Yunsheng Lin
>>>> ---
>>>>  include/linux/page_frag_cache.h |  2 +-
>>>>  mm/page_frag_cache.c            | 93 +++++++++++++++++----------
>>>>  2 files changed, 49 insertions(+), 46 deletions(-)
>>>>
>>>> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
>>>> index 12a16f8e8ad0..5aa45de7a9a5 100644
>>>> --- a/include/linux/page_frag_cache.h
>>>> +++ b/include/linux/page_frag_cache.h
>>>> @@ -50,7 +50,7 @@ static inline void *encoded_page_address(unsigned long encoded_va)
>>>>
>>>>  static inline void page_frag_cache_init(struct page_frag_cache *nc)
>>>>  {
>>>> -	nc->encoded_va = 0;
>>>> +	memset(nc, 0, sizeof(*nc));
>>>>  }
>>>>
>>>
>>> I do not like requiring the entire structure to be reset as a part of
>>> init. If encoded_va is 0 then we have reset the page and the flags.
>>> There shouldn't be anything else we need to reset as remaining and bias
>>> will be reset when we reallocate.
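
To make the trade-off concrete, the two init variants being discussed look
roughly like the below. This is only a sketch: the field layout is an
assumption made to keep the example self-contained, not the exact
struct page_frag_cache definition in this series.

#include <linux/string.h>	/* memset */

/* assumed layout, for illustration only */
struct page_frag_cache_sketch {
	unsigned long encoded_va;	/* page address + order + pfmemalloc bit */
	unsigned int remaining;		/* bytes left in the current page */
	unsigned int pagecnt_bias;	/* page references we still hold */
};

static inline void init_clear_va_only(struct page_frag_cache_sketch *nc)
{
	/* clear only encoded_va; remaining and pagecnt_bias are rewritten
	 * on the next refill, so the fast path cannot trust them until then
	 */
	nc->encoded_va = 0;
}

static inline void init_clear_all(struct page_frag_cache_sketch *nc)
{
	/* clear everything, so the allocation fast path never needs an
	 * extra "was this cache initialised" check
	 */
	memset(nc, 0, sizeof(*nc));
}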
>>
>> The argument is about avoiding one check in the fast path by doing the
>> memset in the slow path, which you might already know according to your
>> comment on the previous version.
>>
>> It is just sometimes hard to understand your preference for maintainability
>> over performance here, as sometimes your comments seem to prefer performance
>> over maintainability, like the LEA trick you mentioned and the offset
>> count-down before this patchset. It would be good to be more consistent
>> about this, otherwise it is sometimes confusing when doing the refactoring.
>
> The use of a negative offset is arguably more maintainable in my mind
> rather than being a performance trick. Essentially if you use the
> negative value you can just mask off the upper bits and it is the
> offset in the page. As such it is actually easier for me to read
> versus "remaining" which is an offset from the end of the page.
> Assuming you read the offset in hex anyway.

Reading the above doesn't seem maintainable to me :(

>
>>>
>>>>  static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>>>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>>>> index 7928e5d50711..d9c9cad17af7 100644
>>>> --- a/mm/page_frag_cache.c
>>>> +++ b/mm/page_frag_cache.c
>>>> @@ -19,6 +19,28 @@
>>>>  #include 
>>>>  #include "internal.h"
>>>>
>>>> +static struct page *__page_frag_cache_recharge(struct page_frag_cache *nc)
>>>> +{
>>>> +	unsigned long encoded_va = nc->encoded_va;
>>>> +	struct page *page;
>>>> +
>>>> +	page = virt_to_page((void *)encoded_va);
>>>> +	if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>>>> +		return NULL;
>>>> +
>>>> +	if (unlikely(encoded_page_pfmemalloc(encoded_va))) {
>>>> +		VM_BUG_ON(compound_order(page) !=
>>>> +			  encoded_page_order(encoded_va));
>>>> +		free_unref_page(page, encoded_page_order(encoded_va));
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	/* OK, page count is 0, we can safely set it */
>>>> +	set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>>>> +
>>>> +	return page;
>>>> +}
>>>> +
>>>>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  					      gfp_t gfp_mask)
>>>>  {
>>>> @@ -26,6 +48,14 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  	struct page *page = NULL;
>>>>  	gfp_t gfp = gfp_mask;
>>>>
>>>> +	if (likely(nc->encoded_va)) {
>>>> +		page = __page_frag_cache_recharge(nc);
>>>> +		if (page) {
>>>> +			order = encoded_page_order(nc->encoded_va);
>>>> +			goto out;
>>>> +		}
>>>> +	}
>>>> +
>>>
>>> This code has no business here. This is refill, you just dropped
>>> recharge in here which will make a complete mess of the ordering and be
>>> confusing to say the least.
>>>
>>> The expectation was that if we are calling this function it is going to
>>> overwrite the virtual address to NULL on failure so we discard the old
>>> page if there is one present. This changes that behaviour. What you
>>> effectively did is made __page_frag_cache_refill into the recharge
>>> function.
>>
>> The idea is to reuse the below for both __page_frag_cache_refill() and
>> __page_frag_cache_recharge(), which seems to be about maintainability
>> by not having duplicated code. If there is a better idea to avoid that
>> duplicated code while keeping the old behaviour, I am happy to change
>> it.
>>
>> /* reset page count bias and remaining to start of new frag */
>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>> nc->remaining = PAGE_SIZE << order;
>
> The only piece that is really reused here is the pagecnt_bias
> assignment.
> What is obfuscated away is that the order is gotten
> through one of two paths. Really order isn't order here it is size.
> Which should have been fetched already. What you end up doing with
> this change is duplicating a bunch of code throughout the function.
> You end up having to fetch size multiple times multiple ways. Here you
> are generating it with order. Then you have to turn around and get it
> again at the start of the function, and again after calling this
> function in order to pull it back out.

I am assuming you would like to preserve the old behaviour as below?

	if (!encoded_va) {
refill:
		__page_frag_cache_refill();
	}

	if (remaining < fragsz) {
		if (!__page_frag_cache_recharge())
			goto refill;
	}

As we are adding new APIs, are we expecting the new APIs to also
duplicate the above pattern?

>
>>>
>>>>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>>  	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>>>>  		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>>>> @@ -35,7 +65,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  	if (unlikely(!page)) {
>>>>  		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>>>>  		if (unlikely(!page)) {
>>>> -			nc->encoded_va = 0;
>>>> +			memset(nc, 0, sizeof(*nc));
>>>>  			return NULL;
>>>>  		}
>>>>
>>>
>>> The memset will take a few more instructions than the existing code
>>> did. I would prefer to keep this as is if at all possible.
>>
>> It will not take more instructions for arm64, as it has the 'stp'
>> instruction when __HAVE_ARCH_MEMSET is set.
>> Is there something similar for x64?
>
> The x64 does not last I knew without getting into the SSE/AVX type
> stuff. This becomes two separate 8B store instructions.

I can check later when I get hold of an x64 server. But doesn't it make
sense to have one extra 8B store instruction in the slow path to avoid a
check in the fast path?

>
>>>
>>>> @@ -45,6 +75,16 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  	nc->encoded_va = encode_aligned_va(page_address(page), order,
>>>>  					   page_is_pfmemalloc(page));
>>>>
>>>> +	/* Even if we own the page, we do not use atomic_set().
>>>> +	 * This would break get_page_unless_zero() users.
>>>> +	 */
>>>> +	page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>>>> +
>>>> +out:
>>>> +	/* reset page count bias and remaining to start of new frag */
>>>> +	nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>> +	nc->remaining = PAGE_SIZE << order;
>>>> +
>>>>  	return page;
>>>>  }
>>>>
>>>
>>> Why bother returning a page at all? It doesn't seem like you use
>>> it anymore. It looks like the use cases you have for it in patches 11/12
>>> all appear to be broken from what I can tell, as you are adding page as
>>> a variable when we don't need to be passing internal details to the
>>> callers of the function when just a simple error return code would do.
>>
>> It would be good to be more specific about the 'broken' part here.
>
> We are passing internals to the caller. Basically this is generally
> frowned upon for many implementations of things as the general idea is
> that the internal page we are using should be a pseudo-private value.

It is an implementation detail, and it is about avoiding calls to
virt_to_page() as mentioned below. I am not sure why that is referred to
as 'broken'; it would be better to explain why it is a bad idea here, as
the 'pseudo-private' wording alone doesn't seem to justify the 'broken'
part.
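
To make the disagreement concrete, the two API shapes being compared are
roughly the below. The names are made up purely for illustration; they are
not the actual prototypes in this series or in patches 11/12:

/* shape used in this series: hand the page straight back to the caller,
 * so a caller that needs the struct page never has to derive it
 */
struct page *frag_prepare_pg(struct page_frag_cache *nc,
			     unsigned int fragsz, gfp_t gfp);

/* suggested alternative: return only the fragment address (or NULL on
 * failure), and let the few callers that really need the struct page
 * recover it themselves
 */
void *frag_prepare_va(struct page_frag_cache *nc,
		      unsigned int fragsz, gfp_t gfp);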
> I understand that you have one or two callers that need it for the use
> cases you have in patches 11/12, but it also seems like you are just
> passing it regardless. For example I noticed in a few cases you added
> the page pointer in 12 to handle the return value, but then just used
> it to check for NULL. My thought would be that rather than returning
> the page here you would be better off just returning 0 or an error and
> then doing the virt_to_page translation for all the cases where the
> page is actually needed since you have to go that route for a cached
> page anyway.

Yes, it is about avoiding calls to virt_to_page() as much as possible.
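
As a caller-side illustration of that difference (hypothetical caller and
names, not code from patches 11/12):

/* with the suggested shape, a caller that needs the struct page has to
 * translate the fragment address back into a page itself
 */
static struct page *grab_frag_page(struct page_frag_cache *nc,
				   unsigned int fragsz, gfp_t gfp)
{
	void *va = frag_prepare_va(nc, fragsz, gfp);

	if (!va)
		return NULL;

	return virt_to_page(va);	/* the extra step I would like to avoid */
}

With the shape in this series the virt_to_page() above simply goes away,
since frag_prepare_pg() already returns the page backing the fragment.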