From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2675C3DA4A for ; Tue, 20 Aug 2024 13:08:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 118416B0088; Tue, 20 Aug 2024 09:08:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0C88E6B0089; Tue, 20 Aug 2024 09:08:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ED2346B008A; Tue, 20 Aug 2024 09:08:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CFCD06B0088 for ; Tue, 20 Aug 2024 09:08:12 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 744B81219E9 for ; Tue, 20 Aug 2024 13:08:12 +0000 (UTC) X-FDA: 82472652024.24.F6A05C1 Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by imf29.hostedemail.com (Postfix) with ESMTP id B3F0E120032 for ; Tue, 20 Aug 2024 13:08:08 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=none; spf=pass (imf29.hostedemail.com: domain of linyunsheng@huawei.com designates 45.249.212.189 as permitted sender) smtp.mailfrom=linyunsheng@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1724159250; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c5vMRiOKfaQQfJmZuxSo5XZjQvnkQIRHPMJZNOLrmTs=; b=1TcyFKKxDBTZdV9IPpOEoLlmzyZsrCnUaeYlTkFxGV7arPMDqvFgB+AduPyNZeHpFScqvW bEFWf1AVKInheVohKW5uQLMIhatgrB/Emx3IdWMm+Jj6TsQUYqJaZ7PwQsz+JIS73SFRQW fKeXpQLvvjUib4WS2hBmlh0E5rW833U= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=none; spf=pass (imf29.hostedemail.com: domain of linyunsheng@huawei.com designates 45.249.212.189 as permitted sender) smtp.mailfrom=linyunsheng@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1724159250; a=rsa-sha256; cv=none; b=vrcJMX6q5tYmguftj5XIgxgKh2dH6GDkq9optCpfjNOyr3ukbdPfUPYukXqp2RsVReUhXi ApHLmOnpLYYLl8/hxtWqnATlsjNOtZUYn9WY7r3a/Oc7cIfnmw3iKvP3nZtqQ8OIiodnbe 5w/cHMRb0yJMiTWk0RwoHdcSUo5R4wI= Received: from mail.maildlp.com (unknown [172.19.88.194]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4Wp8lv5K1LzQq66; Tue, 20 Aug 2024 21:03:23 +0800 (CST) Received: from dggpemf200006.china.huawei.com (unknown [7.185.36.61]) by mail.maildlp.com (Postfix) with ESMTPS id D1AFE140133; Tue, 20 Aug 2024 21:08:04 +0800 (CST) Received: from [10.67.120.129] (10.67.120.129) by dggpemf200006.china.huawei.com (7.185.36.61) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 20 Aug 2024 21:08:04 +0800 Message-ID: <37fc4b01-43f1-4a2c-af35-96cf3f7fe3d5@huawei.com> Date: Tue, 20 Aug 2024 21:08:04 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH net-next v13 11/14] mm: page_frag: introduce prepare/probe/commit API To: Alexander Duyck CC: , , , , , Andrew Morton , References: <20240808123714.462740-1-linyunsheng@huawei.com> <20240808123714.462740-12-linyunsheng@huawei.com> <7f06fa30-fa7c-4cf2-bd8e-52ea1c78f8aa@huawei.com> <5905bad4-8a98-4f9d-9bd6-b9764e299ac7@huawei.com> Content-Language: en-US From: Yunsheng Lin In-Reply-To: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.120.129] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To dggpemf200006.china.huawei.com (7.185.36.61) X-Rspamd-Server: rspam03 X-Rspam-User: X-Rspamd-Queue-Id: B3F0E120032 X-Stat-Signature: g8byujp6kez4t9j6u4946whk5hksbywj X-HE-Tag: 1724159288-403082 X-HE-Meta: U2FsdGVkX1/zLnSU9cKkDVUnwMcj+hQjhOdBj2PC6WnTZxj+jXpphoIFXAGctWIRFhQ7D6lwOXfSIlDITiI2Jtr3WINGF8evIZY7IPv3excG4RVwqGcH4CMPLTBJT7VGQ2HSiAOObMdjPSidi7WMeWCf02ZjK1l+y9seTf/jZ3zXFZUk52owmgwVwC0bidU6r+JOYEiUmLi7kH6XuaDdnpQrzlU7mlyCFELaFYO0i7/9FMSTboYJraa8KYugKpccrD7JSqgQfz4uYSrrRWoX28tYkRh+LF+vpFRwx2ZdW1u1nO6Yehg+lQ2v5Vz/rHkOooKSX3OugVX8UHJPId7wGeP7FfVDnUumbjX5TeFaz5fdpLnVNaJ1w02pW7oZvNPHcbeD39rNFPmQVOvXmapJ7nSjGuzWLmxfxYL6bAzmIg3UA4sC0xHQueLQ4X00FR/uMS+9v+2qPHgyb9fPdFOXQBnwPMxg3yxF00fqYLwHkTRthtI07tFzD56tpzKfUIO6XHxzqUOFVQfVrVvbXUO+Bi6yf/Q5QOaNVVCoKCpbLDY0bxcwp4LOG1XPKR8NofGVe4CJgUA98MKtBEcom0pwl3BElQqE4PCIiXu+FQHto246p6bICNVLlwxUqKd9BVa8WvV3rMi0R30GzAQmfq6UV3Qme+7giDOG8Oo9B0erbRsAsRpTgqHVf2J/lg30NcZQw/nsHJullFG7q3m1tFNsIJsb2HilzhWXYIMFVPm9Xu33/xUttzwseLGqFwbz/hHoNJF2xS+lOCyPhXv2N5qjuwXzdaHsfM2ZD1w+sfEbRQFa3zKx2ZtL0EgtxmdCg/ybhX7ZGrGrrInl+4O9x04wbrXYo+o6vKV+Akmggy7WYsEt9WT5/ep+PUcV8dRHD9oYjveNe8T3UucpLzzJcrkJ1jTFtL7I/FTH/NGhSUKzpRb8NHHd4WlqFEEvhSe2ZgGtyJoIhzXgV29UR1TH+ds uJTX/qNF Guug5RFIpaFv2YIWKTyjHpFg77G1e6O6Qr2EAUjrP0Cacu+H3T7/j80RIazzKsqIuzAnaTmECJpqAJCQe4hmgyK8Nh4TI3cSaRlmLy1DumG2qB3AYK5rHGm0/w7rtalVIbnqFFKwKjbVQZegi7aWHULo4bkm87wuDZfwjT/Y/xepSBwxjWaJexQCqJSeJ0Y3Z/h7WHAXeXkg0Dji91HdWBfEVbRdi1W8qdDPrOhzYtZ6eah+E/rYeN60nzHJ1nSEzky9ethMsQ6ZaoUJN4vN6CLnnX0U+s5q62LZKnbkx07LaXbtSxJHVqCB0pO3AiCOGpbZS17p243QWoTocGMeuL88oPGTLB9EX5Kk9U9AGQpdSYuzSa+3iMIYgEuyg7PjjG6YPA/M48Wf++KQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/8/19 23:52, Alexander Duyck wrote: >> >> Yes, the expectation is that somebody else didn't take an access to the >> page/data to send it off somewhere else between page_frag_alloc_va() >> and page_frag_alloc_abort(), did you see expectation was broken in that >> patch? If yes, we should fix that by using page_frag_free_va() related >> API instead of using page_frag_alloc_abort(). > > The problem is when you expose it to XDP there are a number of > different paths it can take. As such you shouldn't be expecting XDP to > not do something like that. Basically you have to check the reference Even if XDP operations like xdp_do_redirect() or tun_xdp_xmit() return failure, we still can not do that? It seems odd that happens. If not, can we use page_frag_alloc_abort() with fragsz being zero to avoid atomic operation? > count before you can rewind the page. > >>> >>> >>>> >>>>>> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc, >>>>>> + gfp_t gfp_mask) >>>>>> { >>>>>> + struct page *page; >>>>>> + >>>>>> if (likely(nc->encoded_va)) { >>>>>> - if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias)) >>>>>> + page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias); >>>>>> + if (page) >>>>>> goto out; >>>>>> } >>>>>> >>>>>> - if (unlikely(!__page_frag_cache_refill(nc, gfp_mask))) >>>>>> - return false; >>>>>> + page = __page_frag_cache_refill(nc, gfp_mask); >>>>>> + if (unlikely(!page)) >>>>>> + return NULL; >>>>>> >>>>>> out: >>>>>> /* reset page count bias and remaining to start of new frag */ >>>>>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; >>>>>> nc->remaining = page_frag_cache_page_size(nc->encoded_va); >>>>>> - return true; >>>>>> + return page; >>>>>> +} >>>>>> + >>>>> >>>>> None of the functions above need to be returning page. >>>> >>>> Are you still suggesting to always use virt_to_page() even when it is >>>> not really necessary? why not return the page here to avoid the >>>> virt_to_page()? >>> >>> Yes. The likelihood of you needing to pass this out as a page should >>> be low as most cases will just be you using the virtual address >>> anyway. You are essentially trading off branching for not having to >>> use virt_to_page. It is unnecessary optimization. >> >> As my understanding, I am not trading off branching for not having to >> use virt_to_page, the branching is already needed no matter we utilize >> it to avoid calling virt_to_page() or not, please be more specific about >> which branching is traded off for not having to use virt_to_page() here. > > The virt_to_page overhead isn't that high. It would be better to just > use a consistent path rather than try to optimize for an unlikely > branch in your datapath. I am not sure if I understand what do you mean by 'consistent path' here. If I understand your comment correctly, the path is already not consistent to avoid having to fetch size multiple times multiple ways as mentioned in [1]. As below, doesn't it seems nature to avoid virt_to_page() calling while avoiding page_frag_cache_page_size() calling, even if it is an unlikely case as you mentioned: struct page *page_frag_alloc_pg(struct page_frag_cache *nc, unsigned int *offset, unsigned int fragsz, gfp_t gfp) { unsigned int remaining = nc->remaining; struct page *page; VM_BUG_ON(!fragsz); if (likely(remaining >= fragsz)) { unsigned long encoded_va = nc->encoded_va; *offset = page_frag_cache_page_size(encoded_va) - remaining; return virt_to_page((void *)encoded_va); } if (unlikely(fragsz > PAGE_SIZE)) return NULL; page = __page_frag_cache_reload(nc, gfp); if (unlikely(!page)) return NULL; *offset = 0; nc->remaining -= fragsz; nc->pagecnt_bias--; return page; } 1. https://lore.kernel.org/all/CAKgT0UeQ9gwYo7qttak0UgXC9+kunO2gedm_yjtPiMk4VJp9yQ@mail.gmail.com/ > >>> >>> >>>> >>>>>> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc, >>>>>> + unsigned int *offset, unsigned int fragsz, >>>>>> + gfp_t gfp) >>>>>> +{ >>>>>> + unsigned int remaining = nc->remaining; >>>>>> + struct page *page; >>>>>> + >>>>>> + VM_BUG_ON(!fragsz); >>>>>> + if (likely(remaining >= fragsz)) { >>>>>> + unsigned long encoded_va = nc->encoded_va; >>>>>> + >>>>>> + *offset = page_frag_cache_page_size(encoded_va) - >>>>>> + remaining; >>>>>> + >>>>>> + return virt_to_page((void *)encoded_va); >>>>>> + } >>>>>> + >>>>>> + if (unlikely(fragsz > PAGE_SIZE)) >>>>>> + return NULL; >>>>>> + >>>>>> + page = __page_frag_cache_reload(nc, gfp); >>>>>> + if (unlikely(!page)) >>>>>> + return NULL; >>>>>> + >>>>>> + *offset = 0; >>>>>> + nc->remaining = remaining - fragsz; >>>>>> + nc->pagecnt_bias--; >>>>>> + >>>>>> + return page; >>>>>> } >>>>>> +EXPORT_SYMBOL(page_frag_alloc_pg); >>>>> >>>>> Again, this isn't returning a page. It is essentially returning a >>>>> bio_vec without calling it as such. You might as well pass the bio_vec >>>>> pointer as an argument and just have it populate it directly. >>>> >>>> I really don't think your bio_vec suggestion make much sense for now as >>>> the reason mentioned in below: >>>> >>>> "Through a quick look, there seems to be at least three structs which have >>>> similar values: struct bio_vec & struct skb_frag & struct page_frag. >>>> >>>> As your above agrument about using bio_vec, it seems it is ok to use any >>>> one of them as each one of them seems to have almost all the values we >>>> are using? >>>> >>>> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag' >>>>> 'struct bio_vec', as the naming of 'struct page_frag' seems to best match >>>> the page_frag API, 'struct skb_frag' is the second preference because we >>>> mostly need to fill skb frag anyway, and 'struct bio_vec' is the last >>>> preference because it just happen to have almost all the values needed. >>> >>> That is why I said I would be okay with us passing page_frag in patch >>> 12 after looking closer at the code. The fact is it should make the >>> review of that patch set much easier if you essentially just pass the >>> page_frag back out of the call. Then it could be used in exactly the >>> same way it was before and should reduce the total number of lines of >>> code that need to be changed. >> >> So the your suggestion changed to something like below? >> >> int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag); >> >> The API naming of 'page_frag_alloc_pfrag' seems a little odd to me, any better >> one in your mind? > > Well at this point we are populating/getting/pulling a page frag from > the page frag cache. Maybe look for a word other than alloc such as > populate. Essentially what you are doing is populating the pfrag from > the frag cache, although I thought there was a size value you passed > for that isn't there? 'struct page_frag' does have a size field, but I am not sure if I understand what do you mean by 'although I thought there was a size value you passed for that isn't there?‘ yet.