From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3aaa607f-0aaa-4973-bbb2-41416f828f44@huawei.com>
Date: Wed, 17 Jul 2024 20:31:15 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH net-next v9 06/13] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'
From: Yunsheng Lin
To: Alexander Duyck, Yunsheng Lin
CC: Andrew Morton
References: <20240625135216.47007-1-linyunsheng@huawei.com>
 <20240625135216.47007-7-linyunsheng@huawei.com>
 <12a8b9ddbcb2da8431f77c5ec952ccfb2a77b7ec.camel@gmail.com>
 <808be796-6333-c116-6ecb-95a39f7ad76e@huawei.com>
 <96b04ebb7f46d73482d5f71213bd800c8195f00d.camel@gmail.com>
 <5daed410-063b-4d86-b544-d1a85bd86375@huawei.com>
 <29e8ac53-f7da-4896-8121-2abc25ec2c95@gmail.com>
 <12ff13d9-1f3d-4c1b-a972-2efb6f247e31@gmail.com>
 <5a3b39b7-c183-4c73-bd9b-184db8b24f6a@huawei.com>
Content-Language: en-US
In-Reply-To: <5a3b39b7-c183-4c73-bd9b-184db8b24f6a@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop:
 owner-majordomo@kvack.org

On 2024/7/16 20:58, Yunsheng Lin wrote:

...

>
> Option 1 assuming nc->remaining as a negative value does not seem to
> make it a more maintainable solution than option 2. How about something
> like below if using a negative value to enable some optimization like LEA
> does not have a noticeable performance difference?

Taking the code below as option 3, it seems that option 3 performs better
than option 2, and option 2 performs better than option 1, going by the
elapsed time measured with the test module (page_frag_test.ko) introduced
in patch 1.

Option 1:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=5120000' (500 runs):

         17.757768      task-clock (msec)         #    0.001 CPUs utilized            ( +-  0.17% )
                 5      context-switches          #    0.288 K/sec                    ( +-  0.28% )
                 0      cpu-migrations            #    0.007 K/sec                    ( +- 12.36% )
                82      page-faults               #    0.005 M/sec                    ( +-  0.06% )
          46128280      cycles                    #    2.598 GHz                      ( +-  0.17% )
          60938595      instructions              #    1.32  insn per cycle           ( +-  0.02% )
          14783794      branches                  #  832.525 M/sec                    ( +-  0.02% )
             20393      branch-misses             #    0.14% of all branches          ( +-  0.13% )

      24.556644680 seconds time elapsed                                          ( +-  0.07% )

Option 2:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=5120000' (500 runs):

         18.443508      task-clock (msec)         #    0.001 CPUs utilized            ( +-  0.61% )
                 6      context-switches          #    0.342 K/sec                    ( +-  0.57% )
                 0      cpu-migrations            #    0.025 K/sec                    ( +-  4.89% )
                82      page-faults               #    0.004 M/sec                    ( +-  0.06% )
          47901207      cycles                    #    2.597 GHz                      ( +-  0.61% )
          60985019      instructions              #    1.27  insn per cycle           ( +-  0.05% )
          14787177      branches                  #  801.755 M/sec                    ( +-  0.05% )
             21099      branch-misses             #    0.14% of all branches          ( +-  0.14% )

      24.413183804 seconds time elapsed                                          ( +-  0.06% )

Option 3:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=5120000' (500 runs):

         17.847031      task-clock (msec)         #    0.001 CPUs utilized            ( +-  0.23% )
                 5      context-switches          #    0.305 K/sec                    ( +-  0.55% )
                 0      cpu-migrations            #    0.017 K/sec                    ( +-  6.86% )
                82      page-faults               #    0.005 M/sec                    ( +-  0.06% )
          46355974      cycles                    #    2.597 GHz                      ( +-  0.23% )
          60848779      instructions              #    1.31  insn per cycle           ( +-  0.03% )
          14758941      branches                  #  826.969 M/sec                    ( +-  0.03% )
             20728      branch-misses             #    0.14% of all branches          ( +-  0.15% )

      24.376161069 seconds time elapsed                                          ( +-  0.06% )

>
> struct page_frag_cache {
> 	/* encoded_va consists of the virtual address, pfmemalloc bit and order
> 	 * of a page.
> 	 */
> 	unsigned long encoded_va;
>
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
> 	__u16 remaining;
> 	__u16 pagecnt_bias;
> #else
> 	__u32 remaining;
> 	__u32 pagecnt_bias;
> #endif
> };
>
> void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
> 				 unsigned int fragsz, gfp_t gfp_mask,
> 				 unsigned int align_mask)
> {
> 	unsigned int size = page_frag_cache_page_size(nc->encoded_va);
> 	unsigned int remaining;
>
> 	remaining = nc->remaining & align_mask;
> 	if (unlikely(remaining < fragsz)) {
> 		if (unlikely(fragsz > PAGE_SIZE)) {
> 			/*
> 			 * The caller is trying to allocate a fragment
> 			 * with fragsz > PAGE_SIZE but the cache isn't big
> 			 * enough to satisfy the request, this may
> 			 * happen in low memory conditions.
> 			 * We don't release the cache page because
> 			 * it could make memory pressure worse
> 			 * so we simply return NULL here.
> 			 */
> 			return NULL;
> 		}
>
> 		if (!__page_frag_cache_refill(nc, gfp_mask))
> 			return NULL;
>
> 		size = page_frag_cache_page_size(nc->encoded_va);
> 		remaining = size;
> 	}
>
> 	nc->pagecnt_bias--;
> 	nc->remaining = remaining - fragsz;
>
> 	return encoded_page_address(nc->encoded_va) + (size - remaining);
> }
>
>
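To make the offset arithmetic in the quoted function easier to check, here
is a minimal user-space sketch of it. It is only a model under stated
assumptions, not the kernel code: struct frag_model, frag_model_alloc() and
MODEL_PAGE_SIZE are hypothetical names, the page size is fixed at 4096, the
refill is modeled as simply starting over with an empty page, and
pagecnt_bias and encoded_va are left out entirely.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size, for the model only */

/* Hypothetical stand-in for struct page_frag_cache. */
struct frag_model {
	unsigned int remaining;	/* unused bytes at the tail of the page */
};

/*
 * Same arithmetic as the quoted function, minus the page itself: returns
 * the fragment's offset within the page, or -1 for an oversized request.
 */
long frag_model_alloc(struct frag_model *fm, unsigned int fragsz,
		      unsigned int align_mask)
{
	unsigned int size = MODEL_PAGE_SIZE;
	unsigned int remaining;

	assert(fm->remaining <= size);

	/*
	 * Rounding 'remaining' down aligns the offset (size - remaining),
	 * given that size is a multiple of the requested alignment.
	 */
	remaining = fm->remaining & align_mask;
	if (remaining < fragsz) {
		if (fragsz > MODEL_PAGE_SIZE)
			return -1;

		/* Modeled refill: start over with a fresh, empty page. */
		remaining = size;
	}

	fm->remaining = remaining - fragsz;

	return (long)(size - remaining);
}
```

Starting from remaining = 0, two 12-byte requests with align_mask = ~0u
land at offsets 0 and 12. A third 12-byte request with align_mask = ~63u
(64-byte alignment) lands at offset 64, because the 4072 bytes left are
first rounded down to 4032.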