From: Prathu Baronia <prathu.baronia@oneplus.com>
To: Michal Hocko <mhocko@suse.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
gregkh@linuxfoundation.org, gthelen@google.com, jack@suse.cz,
ken.lin@oneplus.com, gasine.xu@oneplus.com,
chintan.pandya@oneplus.com, Huang Ying <ying.huang@intel.com>
Subject: Re: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
Date: Thu, 9 Apr 2020 20:59:14 +0530
Message-ID: <20200409152913.GA9878@oneplus.com>
In-Reply-To: <20200403085201.GX22681@dhcp22.suse.cz>
Following your response, I tried to find out the real benefit of removing the
effective barrier() calls. To that end, I wrote a simple diff (exp-v2), shown
below, on top of the base code:
-------------------------------------------------------
include/linux/highmem.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index b471a88..df908b4 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -145,9 +145,8 @@ do { \
 #ifndef clear_user_highpage
 static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
 {
-	void *addr = kmap_atomic(page);
+	void *addr = page_address(page);
 	clear_user_page(addr, vaddr, page);
-	kunmap_atomic(addr);
 }
 #endif
-------------------------------------------------------
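For reference (and as a sanity check on what exp-v2 actually removes): on a
!CONFIG_HIGHMEM kernel, the generic kmap_atomic()/__kunmap_atomic() pair
reduces to roughly the following, so the diff above only strips the
preempt/pagefault-disable bookkeeping (and the compiler barriers it implies)
around page_address():
-------------------------------------------------------
/* Generic !CONFIG_HIGHMEM definitions, paraphrased from
 * include/linux/highmem.h in this kernel for illustration: */
static inline void *kmap_atomic(struct page *page)
{
	preempt_disable();
	pagefault_disable();
	return page_address(page);
}

static inline void __kunmap_atomic(void *addr)
{
	pagefault_enable();
	preempt_enable();
}
-------------------------------------------------------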
For consistency, I kept the CPU, DDR, and cache clocks on the performance
governor. The target used is Qualcomm's SM8150 with kernel 4.14.117. On this
platform, CPU0 is a Cortex-A55 and CPU6 is a Cortex-A76. The profiling
results for clear_huge_page() are as follows:
-------------------------------------------------------
Ftrace results for clear_huge_page(), all times in microseconds:
-------------------------------------------------------
                              CPU0 (Cortex-A55)        CPU6 (Cortex-A76)
                            Samples  Mean    Std dev  Samples  Mean    Std dev
Base                          95    237.383  31.288     61    258.065  19.97
v1 (original submission)      80    112.298   0.36      83     71.238  13.7819
exp-v2 (diff above)           69    218.911  54.306    101    241.522  19.3068
-------------------------------------------------------
- Comparing base vs exp-v2: simply removing the barriers from the
  kmap_atomic() path does not improve the results significantly.
- Comparing v1 vs exp-v2: a straight memset(0) of the whole 2MB page is
  significantly faster than zeroing the individual 4K pages one by one.
- Analysing base vs exp-v2: CPU6 was expected to outperform CPU0, but the
  per-page zeroing pattern is adversarial for CPU6 and it ends up performing
  worse; under the serialized straight-memset load, CPU6 truly outperforms
  CPU0.

Based on the above three points, it looks like a straight memset(0) of the
huge page does improve execution time, primarily because the access pattern
is predictable for most CPU architectures out there. A minimal sketch of the
straight-clear idea follows.
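Here is a minimal sketch of that straight clear for the !HIGHMEM case (the
function name and the use of HPAGE_PMD_SIZE are illustrative only, not the
actual v1 patch):
-------------------------------------------------------
/* Hypothetical illustration, not the v1 patch itself: clear a
 * PMD-sized THP in one linear pass instead of issuing 512 separate
 * 4K clear_user_highpage() calls. */
static void clear_thp_straight(struct page *page)
{
	/* page_address() is always valid here on !CONFIG_HIGHMEM */
	void *addr = page_address(page);

	memset(addr, 0, HPAGE_PMD_SIZE);	/* 2MB with 4K base pages */
}
-------------------------------------------------------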
Having said that, I also understand that v1 loses the optimization made by
c79b57e462b5, which keeps the caches hot around the faulting address. If
having the caches hot around the faulting address is really that important
(which only numbers can prove, and I don't have the insight to produce those
numbers), it might be better to develop on top of v1 than to not use v1 at
all. For reference, a simplified sketch of that optimization follows.
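A heavily reduced illustration of the c79b57e462b5 idea (the real
clear_huge_page() clears inward from both ends towards the target; this only
shows the "faulting sub-page last" part, and the function name is made up):
-------------------------------------------------------
/* Simplified illustration, not the real clear_huge_page(): clear
 * every sub-page except the faulting one first, then clear the
 * faulting sub-page last so its cache lines are still hot when
 * userspace touches it after the fault returns. */
static void clear_huge_page_target_last(struct page *page,
					unsigned long addr_hint,
					unsigned int nr_subpages)
{
	unsigned long base = addr_hint &
			~((unsigned long)nr_subpages * PAGE_SIZE - 1);
	unsigned int target = (addr_hint - base) >> PAGE_SHIFT;
	unsigned int i;

	for (i = 0; i < nr_subpages; i++) {
		if (i == target)
			continue;	/* save the faulting sub-page for last */
		clear_user_highpage(page + i, base + i * PAGE_SIZE);
	}
	clear_user_highpage(page + target, base + target * PAGE_SIZE);
}
-------------------------------------------------------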
The 04/03/2020 10:52, Michal Hocko wrote:
>
> This is an old kernel. Do you see the same with the current upstream
> kernel? Btw. 60% improvement only from dropping barrier sounds
> unexpected to me. Are you sure this is the only reason? c79b57e462b5
> ("mm: hugetlb: clear target sub-page last when clearing huge page")
> is already 4.14 AFAICS, is it possible that this is the effect of this
> patch? Your patch is effectively disabling this optimization for most
> workloads that really care about it. I strongly doubt that hugetlb is a
> thing on 32b kernels these days. So this really begs for more data about
> the real underlying problem IMHO.
>
> --
> Michal Hocko
> SUSE Labs