Date: Thu, 9 Apr 2020 17:45:38 +0200
From: Michal Hocko
To: Prathu Baronia
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, gregkh@linuxfoundation.org,
	gthelen@google.com, jack@suse.cz, ken.lin@oneplus.com, gasine.xu@oneplus.com,
	chintan.pandya@oneplus.com, Huang Ying
Subject: Re: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
Message-ID: <20200409154538.GR18386@dhcp22.suse.cz>
References: <20200403081812.GA14090@oneplus.com> <20200403085201.GX22681@dhcp22.suse.cz> <20200409152913.GA9878@oneplus.com>
In-Reply-To: <20200409152913.GA9878@oneplus.com>

On Thu 09-04-20 20:59:14, Prathu Baronia wrote:
> Following your response, I tried to find out real benefits of removing
> effective barrier() calls. To find that out, I wrote simple diff (exp-v2) as below
> on top of the base code:
>
> -------------------------------------------------------
>  include/linux/highmem.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index b471a88..df908b4 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -145,9 +145,8 @@ do { \
>  #ifndef clear_user_highpage
>  static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>  {
> -	void *addr = kmap_atomic(page);
> +	void *addr = page_address(page);
>  	clear_user_page(addr, vaddr, page);
> -	kunmap_atomic(addr);
>  }
>  #endif
> -------------------------------------------------------
>
> For consistency, I kept CPU, DDR and cache on the performance governor. Target
> used is Qualcomm's SM8150 with kernel 4.14.117.
> In this platform, CPU0 is
> Cortex-A55 and CPU6 is Cortex-A76.
>
> And the result of profiling of clear_huge_page() is as follows:
> -------------------------------------------------------
> Ftrace results: Time mentioned is in micro-seconds.
> -------------------------------------------------------
> - Base:
> 	- CPU0:
> 		- Sample size : 95
> 		- Mean : 237.383
> 		- Std dev : 31.288
> 	- CPU6:
> 		- Sample size : 61
> 		- Mean : 258.065
> 		- Std dev : 19.97
>
> -------------------------------------------------------
> - v1 (original submission):
> 	- CPU0:
> 		- Sample size : 80
> 		- Mean : 112.298
> 		- Std dev : 0.36
> 	- CPU6:
> 		- Sample size : 83
> 		- Mean : 71.238
> 		- Std dev : 13.7819
> -------------------------------------------------------
> - exp-v2 (experimental diff mentioned above):
> 	- CPU0:
> 		- Sample size : 69
> 		- Mean : 218.911
> 		- Std dev : 54.306
> 	- CPU6:
> 		- Sample size : 101
> 		- Mean : 241.522
> 		- Std dev : 19.3068
> -------------------------------------------------------
>
> - Comparing base vs exp-v2: Simply removing barriers from kmap_atomic() code doesn't
>   improve results significantly.

Unless I am misreading those numbers, barrier() doesn't change anything
because the differences are within the noise. So the difference is indeed
caused by the more clever initialization, which keeps the faulted address
cache hot.

Could you be more specific about how you noticed the slowdown? I mean, is
there any real-world workload for which you have observed a regression and
narrowed it down to zeroing?

I do realize that the initialization improvement patch doesn't really
mention any real-life use case either. It is based on a microbenchmark, but
the objective sounds reasonable. If it regresses some other workloads, then
we either have to make it conditional or find out what is causing the
regression and how much that regression actually matters.
-- 
Michal Hocko
SUSE Labs