linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: Prathu Baronia <prathu.baronia@oneplus.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	gregkh@linuxfoundation.org, gthelen@google.com, jack@suse.cz,
	ken.lin@oneplus.com, gasine.xu@oneplus.com,
	chintan.pandya@oneplus.com, Huang Ying <ying.huang@intel.com>
Subject: Re: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
Date: Thu, 9 Apr 2020 17:45:38 +0200
Message-ID: <20200409154538.GR18386@dhcp22.suse.cz>
In-Reply-To: <20200409152913.GA9878@oneplus.com>

On Thu 09-04-20 20:59:14, Prathu Baronia wrote:
> Following your response, I tried to find out the real benefit of removing the
> effective barrier() calls. To find that out, I wrote a simple diff (exp-v2), shown below,
> on top of the base code:
>
> -------------------------------------------------------
> include/linux/highmem.h | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index b471a88..df908b4 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -145,9 +145,8 @@ do {                                                            \
> #ifndef clear_user_highpage
> static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
> {
> -       void *addr = kmap_atomic(page);
> +       void *addr = page_address(page);
>         clear_user_page(addr, vaddr, page);
> -       kunmap_atomic(addr);
> }
> #endif
> -------------------------------------------------------
> 
> For consistency, I kept the CPU, DDR and cache on the performance governor.
> The target is Qualcomm's SM8150 running kernel 4.14.117. On this platform,
> CPU0 is a Cortex-A55 and CPU6 is a Cortex-A76.
> 
> The results of profiling clear_huge_page() are as follows:
> -------------------------------------------------------
> Ftrace results; all times are in microseconds.
> -------------------------------------------------------
> - Base:
> 	- CPU0:
> 		- Sample size : 95
> 		- Mean : 237.383
> 		- Std dev : 31.288
> 	- CPU6:
> 		- Sample size : 61
> 		- Mean : 258.065
> 		- Std dev : 19.97
> 
> -------------------------------------------------------
> - v1 (original submission):
> 	- CPU0:
> 		- Sample size : 80
> 		- Mean : 112.298
> 		- Std dev : 0.36
> 	- CPU6:
> 		- Sample size : 83
> 		- Mean : 71.238
> 		- Std dev : 13.7819
> -------------------------------------------------------
> - exp-v2 (experimental diff mentioned above):
> 	- CPU0:
> 		- Sample size : 69
> 		- Mean : 218.911
> 		- Std dev : 54.306
> 	- CPU6:
> 		- Sample size : 101
> 		- Mean : 241.522
> 		- Std dev : 19.3068
> -------------------------------------------------------
> 
> - Comparing base vs exp-v2: simply removing the barriers in the kmap_atomic() code
>   doesn't improve the results significantly.
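For context on what the exp-v2 diff takes out: on !CONFIG_HIGHMEM kernels of this era (arm64 has no HIGHMEM), kmap_atomic() creates no mapping at all. It disables preemption and page faults and returns page_address(page), and kunmap_atomic() re-enables them; each of those steps is paired with a compiler barrier(). The sketch below is a simplified userspace model, not kernel code: every model_* name is hypothetical, memset() stands in for clear_user_page(), the barrier uses GCC/Clang inline asm, and the rescheduling check that preempt_enable() may perform is omitted.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Compiler-only barrier, like the kernel's barrier() on GCC/Clang. */
#define model_barrier() __asm__ __volatile__("" ::: "memory")

static int model_preempt_count;      /* stands in for preempt_count() */
static int model_pagefault_disabled; /* stands in for current->pagefault_disabled */

/* Without HIGHMEM every page is permanently mapped; model it as plain data. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
};

static void *model_page_address(struct model_page *page)
{
	return page->data;
}

/* Model of !HIGHMEM kmap_atomic(): only counter updates plus barriers. */
static void *model_kmap_atomic(struct model_page *page)
{
	model_preempt_count++;      /* preempt_disable() */
	model_barrier();
	model_pagefault_disabled++; /* pagefault_disable() */
	model_barrier();
	return model_page_address(page);
}

/* Model of !HIGHMEM kunmap_atomic(): undo the counters; nothing to unmap. */
static void model_kunmap_atomic(void *addr)
{
	(void)addr;
	assert(model_preempt_count > 0 && model_pagefault_disabled > 0);
	model_barrier();
	model_pagefault_disabled--; /* pagefault_enable() */
	model_barrier();
	model_preempt_count--;      /* preempt_enable(), minus the resched check */
}

/* Base variant: map, clear, unmap. */
static void model_clear_base(struct model_page *page)
{
	void *addr = model_kmap_atomic(page);

	memset(addr, 0, MODEL_PAGE_SIZE);
	model_kunmap_atomic(addr);
}

/* exp-v2 variant: clear through the linear address directly. */
static void model_clear_exp_v2(struct model_page *page)
{
	memset(model_page_address(page), 0, MODEL_PAGE_SIZE);
}
```

Both variants leave the page zeroed; the only work exp-v2 skips is the four counter updates and their compiler barriers, which is why the measured difference can only come from those.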

Unless I am misreading those numbers, barrier() doesn't change anything
because the differences are within the noise. So the difference is indeed
caused by the more clever initialization that keeps the faulted address
cache hot.
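One rough way to check the "within the noise" reading against the quoted numbers is to compare each variant's drop in mean time with the base run's per-sample standard deviation, and with the drop that v1 achieves. The sketch below only does that arithmetic on the reported summary statistics; it is not a formal significance test, and the struct, constant, and function names are illustrative.

```c
#include <assert.h>

/* Summary statistics from the quoted ftrace results (microseconds). */
struct run_stats {
	double mean_us;
	double sd_us;
};

static const struct run_stats base_cpu0  = { 237.383, 31.288 };
static const struct run_stats v1_cpu0    = { 112.298, 0.36 };
static const struct run_stats expv2_cpu0 = { 218.911, 54.306 };
static const struct run_stats base_cpu6  = { 258.065, 19.97 };
static const struct run_stats v1_cpu6    = { 71.238, 13.7819 };
static const struct run_stats expv2_cpu6 = { 241.522, 19.3068 };

/* Reduction of the mean time, as a percentage of the base mean. */
static double pct_reduction(struct run_stats base, struct run_stats variant)
{
	assert(base.mean_us > 0.0);
	return 100.0 * (base.mean_us - variant.mean_us) / base.mean_us;
}

/* Drop in mean time, in units of the base run's per-sample standard deviation. */
static double shift_in_base_sd(struct run_stats base, struct run_stats variant)
{
	assert(base.sd_us > 0.0);
	return (base.mean_us - variant.mean_us) / base.sd_us;
}
```

With these inputs, exp-v2 lowers the mean by about 7.8% on CPU0 and 6.4% on CPU6, a shift of roughly 0.59 and 0.83 base standard deviations, while v1 lowers it by about 52.7% and 72.4%, a shift of roughly 4.0 and 9.4 base standard deviations.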

Could you be more specific about how you noticed the slowdown? I mean,
is there any real-world workload for which you have observed a regression
and narrowed it down to zeroing?

I do realize that the initialization improvement patch doesn't really
mention any real-life use case either. It is based on a microbenchmark,
but the objective sounds reasonable. If it regresses other workloads,
then we either have to make it conditional or find out what is causing
the regression and how much that regression actually matters.
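The initialization improvement referred to above clears the subpages of a huge page in an order chosen so that the subpage containing the faulting address is cleared last, with its neighbours just before it, so those cache lines are the most likely to still be hot when userspace touches them. The sketch below reproduces that ordering as a sequence of subpage indices, modeled on the target-last loop used by mainline clear_huge_page(); target_last_order() is a hypothetical name, and the real kernel code clears each subpage (calling cond_resched() between them) instead of recording indices.

```c
#include <assert.h>

/*
 * Fill order[0 .. pages_per_huge_page-1] with the subpage indices in the
 * order they would be cleared, ending with the target subpage.
 */
static void target_last_order(int pages_per_huge_page, int target, int *order)
{
	int k = 0, base, l, i;

	assert(target >= 0 && target < pages_per_huge_page);
	if (2 * target <= pages_per_huge_page) {
		/* Target in the first half: clear the far tail first. */
		base = 0;
		l = target;
		for (i = pages_per_huge_page - 1; i >= 2 * target; i--)
			order[k++] = i;
	} else {
		/* Target in the second half: clear the far head first. */
		base = pages_per_huge_page - 2 * (pages_per_huge_page - target);
		l = pages_per_huge_page - target;
		for (i = 0; i < base; i++)
			order[k++] = i;
	}
	/* Window [base, base + 2l): alternate left/right, converging on the target. */
	for (i = 0; i < l; i++) {
		order[k++] = base + i;
		order[k++] = base + 2 * l - 1 - i;
	}
	assert(k == pages_per_huge_page);
}
```

With 8 subpages and the fault in subpage 2 the order is 7, 6, 5, 4, 0, 3, 1, 2; with the fault in subpage 6 it is 0, 1, 2, 3, 4, 7, 5, 6. Unlike a single linear memset(), this touches memory out of address order near the target, which is the behaviour the v1 patch replaced.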
-- 
Michal Hocko
SUSE Labs

