From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander Duyck
CC: Shuah Khan, Andrew Morton, Linux-MM
Subject: Re: [PATCH net-next v2 00/10] Replace page_frag with page_frag_cache (Part-2)
Date: Fri, 13 Dec 2024 20:09:37 +0800
In-Reply-To: <389876b8-e565-4dc9-bc87-d97a639ff585@huawei.com>
References: <20241206122533.3589947-1-linyunsheng@huawei.com>
 <3de1b8a3-ae4f-492f-969d-bc6f2c145d09@huawei.com>
 <15723762-7800-4498-845e-7383a88f147b@huawei.com>
 <389876b8-e565-4dc9-bc87-d97a639ff585@huawei.com>

On 2024/12/11 20:52, Yunsheng Lin wrote:
> It seems that bottleneck is still the freeing side that the above
> result might not be as meaningful as it should be.
Through 'perf top' annotation, there seems to be about 70%+ CPU usage
for the atomic operation of put_page_testzero() in page_frag_free(); it
was unexpected that the atomic operation had that much overhead:(

>
> As we can't use more than one cpu for the free side without some
> lock using a single ptr_ring, it seems something more complicated
> might need to be done in order to support more than one CPU for the
> freeing side?
>
> Before patch 1, __page_frag_alloc_align took up to 3.62% percent of
> CPU using 'perf top'.
> After patch 1, __page_frag_cache_prepare() and __page_frag_cache_commit_noref()
> took up to 4.67% + 1.01% = 5.68%.
> Having a similar result, I am not sure if the CPU usages is able tell us
> the performance degradation here as it seems to be quite large?

Using 'struct page_frag' to pass the parameters also seems to cause some
observable overhead, as the testing is very low level. With the below
patch, which avoids passing 'struct page_frag', the performance
difference seems to be negligible: 3.62% vs 3.27% CPU usage for
__page_frag_alloc_align() before patch 1 and __page_frag_cache_prepare()
after patch 1, respectively.

The new refactoring avoids some overhead for the old API, but might add
some overhead for the new API, as it can no longer skip virt_to_page()
for the refilling and reusing cases, though those seem to be unlikely
cases. Or is there any better idea on how to refactor things to unify
the page_frag API?
diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 41a91df82631..b83e7655654e 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -39,8 +39,24 @@ static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
 void page_frag_cache_drain(struct page_frag_cache *nc);
 void __page_frag_cache_drain(struct page *page, unsigned int count);
-void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
-			      gfp_t gfp_mask, unsigned int align_mask);
+void *__page_frag_cache_prepare(struct page_frag_cache *nc, unsigned int fragsz,
+				gfp_t gfp_mask, unsigned int align_mask);
+
+static inline void *__page_frag_alloc_align(struct page_frag_cache *nc,
+					    unsigned int fragsz, gfp_t gfp_mask,
+					    unsigned int align_mask)
+{
+	void *va;
+
+	va = __page_frag_cache_prepare(nc, fragsz, gfp_mask, align_mask);
+	if (likely(va)) {
+		va += nc->offset;
+		nc->offset += fragsz;
+		nc->pagecnt_bias--;
+	}
+
+	return va;
+}
 
 static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
 					  unsigned int fragsz, gfp_t gfp_mask,
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 3f7a203d35c6..729309aee27a 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -90,9 +90,9 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
-void *__page_frag_alloc_align(struct page_frag_cache *nc,
-			      unsigned int fragsz, gfp_t gfp_mask,
-			      unsigned int align_mask)
+void *__page_frag_cache_prepare(struct page_frag_cache *nc,
+				unsigned int fragsz, gfp_t gfp_mask,
+				unsigned int align_mask)
 {
 	unsigned long encoded_page = nc->encoded_page;
 	unsigned int size, offset;
@@ -151,12 +151,10 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		offset = 0;
 	}
 
-	nc->pagecnt_bias--;
-	nc->offset = offset + fragsz;
-
-	return encoded_page_decode_virt(encoded_page) + offset;
+	nc->offset = offset;
+	return encoded_page_decode_virt(encoded_page);
 }
-EXPORT_SYMBOL(__page_frag_alloc_align);
+EXPORT_SYMBOL(__page_frag_cache_prepare);
 
 /*
  * Frees a page fragment allocated out of either a compound or order 0 page.