Subject: Re: [PATCH] mm: sparse: shift operation instead of division operation for root index
To: Matthew Wilcox
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, wangxiaohua@uniontech.com
References: <20230810103829.10007-1-guohui@uniontech.com>
From: Guo Hui
Message-ID: <691EB0CDB72D100F+cd48ad4b-e33f-eda9-4961-32ac309cf4f8@uniontech.com>
Date: Sat, 12 Aug 2023 16:07:51 +0800

On 8/10/23 9:14 PM, Matthew Wilcox wrote:
> On Thu, Aug 10, 2023 at 06:38:29PM +0800, Guo Hui wrote:
>> In the function __nr_to_section,
>> Use shift operation instead of division operation
>> in order to improve the performance of memory management.
>> There are no functional changes.
>>
>> Some performance data is as follows:
>> Machine configuration: Hygon 128 cores, 256M memory
>>
>> Stream single core:
>>            with patch    without patch   promote
>> Copy       23376.7731    23907.1532      -1.27%
>> Scale      12580.2913    11679.7852      +7.71%
>> Add        11922.9562    11461.8669      +4.02%
>> Triad      12549.2735    11491.9798      +9.20%
> How stable are these numbers?  Because this patch makes no sense to me.

Thank you for your reply.
The improvement is not stable; it varies between 4% and 7%.

>
> #define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
>
> with:
>
> #ifdef CONFIG_SPARSEMEM_EXTREME
> #define SECTIONS_PER_ROOT       (PAGE_SIZE / sizeof (struct mem_section))
> #else
> #define SECTIONS_PER_ROOT       1
> #endif
>
> sizeof(struct mem_section) is a constant power-of-two.  So if this
> result is real, then GCC isn't able to turn a
> divide-by-a-constant-power-of-two into a shift.  That seems _really_
> unlikely to me.  And if that is what's going on, then that needs to be
> fixed!  Can you examine some before-and-after assembly dumps to see if
> that is what's going on?

Thank you for the guidance.
I analyzed the assembly generated for the use of __nr_to_section in the
function online_mem_sections:

ffffffff81383580 :
{
... ...
        return pfn >> PFN_SECTION_SHIFT;
ffffffff8138359d:       48 89 f8                mov    %rdi,%rax
        unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a0:       48 89 f9                mov    %rdi,%rcx
        return pfn >> PFN_SECTION_SHIFT;
ffffffff813835a3:       48 c1 e8 0f             shr    $0xf,%rax
        unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a7:       48 c1 e9 16             shr    $0x16,%rcx    ----------------------------- A
ffffffff813835ab:       e9 38 ea d5 01          jmpq   ffffffff830e1fe8 <_einittext+0x2a78>

At line A, the compiler has already converted the division into a
shift operation.
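As a side note on line A: the single shr $0x16 is consistent with the
compiler folding the section shift (0xf = 15) and the division into one
shift of 15 + 7 = 22 bits. The following is only a minimal user-space
sketch of that equivalence, not kernel code. The value 128 for
SECTIONS_PER_ROOT is an assumption inferred from the dump (for example a
4096-byte page and a 32-byte struct mem_section), and the helper names
root_by_division, root_by_shift and roots_agree are hypothetical.

```c
#include <assert.h>

/* Assumed values, inferred from the shr $0xf / shr $0x16 pair in the dump. */
#define PFN_SECTION_SHIFT  15UL
#define SECTIONS_PER_ROOT  128UL  /* assumption: 4096-byte page, 32-byte struct mem_section */

/* 15 bits for the section number plus log2(128) = 7 bits for the root. */
static_assert(PFN_SECTION_SHIFT + 7 == 0x16, "combined shift should be 0x16");

/* Root index as the source writes it: shift to a section number, then divide. */
static unsigned long root_by_division(unsigned long pfn)
{
	unsigned long nr = pfn >> PFN_SECTION_SHIFT;

	return nr / SECTIONS_PER_ROOT;
}

/* Root index as emitted at line A: one shift by 0x16. */
static unsigned long root_by_shift(unsigned long pfn)
{
	return pfn >> 0x16;
}

/* Returns 1 if both forms agree for every sampled pfn below limit, else 0. */
static int roots_agree(unsigned long limit)
{
	unsigned long pfn;

	for (pfn = 0; pfn < limit; pfn += 4093)
		if (root_by_division(pfn) != root_by_shift(pfn))
			return 0;
	return 1;
}
```

Because the operands are unsigned and the divisor is a power of two, the
two forms give the same result for every input, which is why the compiler
is free to emit the shift on its own.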
My patch has the same effect, so it is unnecessary. The Stream
improvement above must therefore come from something else, and I will
continue to analyze it.

My intention was to use __builtin_popcount to count, at compile time,
the 1 bits in (SECTIONS_PER_ROOT - 1), which gives the number of bits by
which section_nr is shifted.

>
>> Signed-off-by: Guo Hui
>> ---
>>  include/linux/mmzone.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 5e50b78d58ea..8dde6fb56109 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1818,7 +1818,8 @@ struct mem_section {
>>  #define SECTIONS_PER_ROOT 1
>>  #endif
>>
>> -#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
>> +#define SECTION_ROOT_SHIFT (__builtin_popcount(SECTIONS_PER_ROOT - 1))
>> +#define SECTION_NR_TO_ROOT(sec) ((sec) >> SECTION_ROOT_SHIFT)
>>  #define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>>  #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
>>
>> --
>> 2.20.1
>>
>>
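
For reference, the popcount idea described above can be checked outside
the kernel. This is only a sketch under stated assumptions: the value
128 stands in for the real SECTIONS_PER_ROOT, the helper names
nr_to_root_div and nr_to_root_shift are hypothetical, and
__builtin_popcount is the GCC/Clang builtin that takes an unsigned int.
It relies on SECTIONS_PER_ROOT being a power of two, in which case
(SECTIONS_PER_ROOT - 1) has exactly log2(SECTIONS_PER_ROOT) bits set.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's SECTIONS_PER_ROOT; must be a power of two. */
#define SECTIONS_PER_ROOT   128U

/* Shift as proposed in the patch: number of 1 bits in (SECTIONS_PER_ROOT - 1). */
#define SECTION_ROOT_SHIFT  (__builtin_popcount(SECTIONS_PER_ROOT - 1))

/* The trick is only valid for powers of two, so check that at compile time. */
static_assert((SECTIONS_PER_ROOT & (SECTIONS_PER_ROOT - 1)) == 0,
	      "SECTIONS_PER_ROOT must be a power of two");

/* Original form of SECTION_NR_TO_ROOT: plain division. */
static unsigned long nr_to_root_div(unsigned long nr)
{
	return nr / SECTIONS_PER_ROOT;
}

/* Patched form of SECTION_NR_TO_ROOT: shift by the popcount-derived amount. */
static unsigned long nr_to_root_shift(unsigned long nr)
{
	return nr >> SECTION_ROOT_SHIFT;
}
```

With SECTIONS_PER_ROOT defined as 1 (the non-SPARSEMEM_EXTREME case),
the popcount of 0 is 0, so the shift degenerates to a no-op just as the
division by 1 does.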