* [PATCH] mm: sparse: shift operation instead of division operation for root index
@ 2023-08-10 10:38 Guo Hui
2023-08-10 13:14 ` Matthew Wilcox
0 siblings, 1 reply; 3+ messages in thread
From: Guo Hui @ 2023-08-10 10:38 UTC (permalink / raw)
To: akpm, linux-mm; +Cc: wangxiaohua, Guo Hui
In __nr_to_section(), use a shift operation instead of a division to
index the section root, in order to improve the performance of memory
management. There are no functional changes.
Some performance data is as follows:
Machine configuration: Hygon 128 cores, 256M memory
Stream single core:
          with patch    without patch    improvement
Copy      23376.7731    23907.1532       -1.27%
Scale     12580.2913    11679.7852       +7.71%
Add       11922.9562    11461.8669       +4.02%
Triad     12549.2735    11491.9798       +9.20%
Signed-off-by: Guo Hui <guohui@uniontech.com>
---
include/linux/mmzone.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5e50b78d58ea..8dde6fb56109 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1818,7 +1818,8 @@ struct mem_section {
#define SECTIONS_PER_ROOT 1
#endif
-#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
+#define SECTION_ROOT_SHIFT (__builtin_popcount(SECTIONS_PER_ROOT - 1))
+#define SECTION_NR_TO_ROOT(sec) ((sec) >> SECTION_ROOT_SHIFT)
#define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
#define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
--
2.20.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] mm: sparse: shift operation instead of division operation for root index
2023-08-10 10:38 [PATCH] mm: sparse: shift operation instead of division operation for root index Guo Hui
@ 2023-08-10 13:14 ` Matthew Wilcox
2023-08-12 8:07 ` Guo Hui
0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2023-08-10 13:14 UTC (permalink / raw)
To: Guo Hui; +Cc: akpm, linux-mm, wangxiaohua
On Thu, Aug 10, 2023 at 06:38:29PM +0800, Guo Hui wrote:
> In __nr_to_section(), use a shift operation instead of a division to
> index the section root, in order to improve the performance of memory
> management. There are no functional changes.
>
> Some performance data is as follows:
> Machine configuration: Hygon 128 cores, 256M memory
>
> Stream single core:
>           with patch    without patch    improvement
> Copy      23376.7731    23907.1532       -1.27%
> Scale     12580.2913    11679.7852       +7.71%
> Add       11922.9562    11461.8669       +4.02%
> Triad     12549.2735    11491.9798       +9.20%
How stable are these numbers? Because this patch makes no sense to me.
#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
with:
#ifdef CONFIG_SPARSEMEM_EXTREME
#define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof (struct mem_section))
#else
#define SECTIONS_PER_ROOT 1
#endif
sizeof(struct mem_section) is a constant power-of-two. So if this
result is real, then GCC isn't able to turn a
divide-by-a-constant-power-of-two into a shift. That seems _really_
unlikely to me. And if that is what's going on, then that needs to be
fixed! Can you examine some before-and-after assembly dumps to see if
that is what's going on?
> Signed-off-by: Guo Hui <guohui@uniontech.com>
> ---
> include/linux/mmzone.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 5e50b78d58ea..8dde6fb56109 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1818,7 +1818,8 @@ struct mem_section {
> #define SECTIONS_PER_ROOT 1
> #endif
>
> -#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
> +#define SECTION_ROOT_SHIFT (__builtin_popcount(SECTIONS_PER_ROOT - 1))
> +#define SECTION_NR_TO_ROOT(sec) ((sec) >> SECTION_ROOT_SHIFT)
> #define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
> #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
>
> --
> 2.20.1
>
>
* Re: [PATCH] mm: sparse: shift operation instead of division operation for root index
2023-08-10 13:14 ` Matthew Wilcox
@ 2023-08-12 8:07 ` Guo Hui
0 siblings, 0 replies; 3+ messages in thread
From: Guo Hui @ 2023-08-12 8:07 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: akpm, linux-mm, wangxiaohua
On 8/10/23 9:14 PM, Matthew Wilcox wrote:
> On Thu, Aug 10, 2023 at 06:38:29PM +0800, Guo Hui wrote:
>> In __nr_to_section(), use a shift operation instead of a division to
>> index the section root, in order to improve the performance of memory
>> management. There are no functional changes.
>>
>> Some performance data is as follows:
>> Machine configuration: Hygon 128 cores, 256M memory
>>
>> Stream single core:
>>           with patch    without patch    improvement
>> Copy      23376.7731    23907.1532       -1.27%
>> Scale     12580.2913    11679.7852       +7.71%
>> Add       11922.9562    11461.8669       +4.02%
>> Triad     12549.2735    11491.9798       +9.20%
> How stable are these numbers? Because this patch makes no sense to me.
Thank you for your reply.
The increase is not stable, between 4% and 7%.
>
> #define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
>
> with:
>
> #ifdef CONFIG_SPARSEMEM_EXTREME
> #define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof (struct mem_section))
> #else
> #define SECTIONS_PER_ROOT 1
> #endif
>
> sizeof(struct mem_section) is a constant power-of-two. So if this
> result is real, then GCC isn't able to turn a
> divide-by-a-constant-power-of-two into a shift. That seems _really_
> unlikely to me. And if that is what's going on, then that needs to be
> fixed! Can you examine some before-and-after assembly dumps to see if
> that is what's going on?
Thank you for your guidance.
I analyzed the assembly generated for the use of __nr_to_section in
online_mem_sections, as follows:
ffffffff81383580 <online_mem_sections>:
{
... ...
return pfn >> PFN_SECTION_SHIFT;
ffffffff8138359d: 48 89 f8 mov %rdi,%rax
unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a0: 48 89 f9 mov %rdi,%rcx
return pfn >> PFN_SECTION_SHIFT;
ffffffff813835a3: 48 c1 e8 0f shr $0xf,%rax
unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a7: 48 c1 e9 16 shr $0x16,%rcx  ----------------- A
ffffffff813835ab: e9 38 ea d5 01 jmpq ffffffff830e1fe8 <_einittext+0x2a78>
At the instruction marked A, the compiler has already converted the
division into a shift operation. My patch has the same effect, so it is
unnecessary; the Stream improvement above may come from other causes,
and I will continue to analyze it more deeply.
My intention was to use __builtin_popcount to calculate the number of
1s in (SECTIONS_PER_ROOT - 1) at compile time, which is the shift
amount for section_nr.
>
>> Signed-off-by: Guo Hui <guohui@uniontech.com>
>> ---
>> include/linux/mmzone.h | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 5e50b78d58ea..8dde6fb56109 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1818,7 +1818,8 @@ struct mem_section {
>> #define SECTIONS_PER_ROOT 1
>> #endif
>>
>> -#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
>> +#define SECTION_ROOT_SHIFT (__builtin_popcount(SECTIONS_PER_ROOT - 1))
>> +#define SECTION_NR_TO_ROOT(sec) ((sec) >> SECTION_ROOT_SHIFT)
>> #define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>> #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
>>
>> --
>> 2.20.1
>>
>>
end of thread, other threads:[~2023-08-12 8:08 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-10 10:38 [PATCH] mm: sparse: shift operation instead of division operation for root index Guo Hui
2023-08-10 13:14 ` Matthew Wilcox
2023-08-12 8:07 ` Guo Hui