From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 9 Apr 2020 20:59:14 +0530
From: Prathu Baronia
To: Michal Hocko
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, gregkh@linuxfoundation.org,
	gthelen@google.com, jack@suse.cz, ken.lin@oneplus.com,
	gasine.xu@oneplus.com, chintan.pandya@oneplus.com, Huang Ying
Subject: Re: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
Message-ID: <20200409152913.GA9878@oneplus.com>
References: <20200403081812.GA14090@oneplus.com> <20200403085201.GX22681@dhcp22.suse.cz>
In-Reply-To: <20200403085201.GX22681@dhcp22.suse.cz>
User-Agent: Mutt/1.9.4 (2018-02-28)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Following
your response, I tried to find out the real benefit of removing the
effective barrier() calls. To find that out, I wrote a simple diff (exp-v2),
shown below, on top of the base code:

-------------------------------------------------------
 include/linux/highmem.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index b471a88..df908b4 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -145,9 +145,8 @@ do { \
 #ifndef clear_user_highpage
 static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
 {
-	void *addr = kmap_atomic(page);
+	void *addr = page_address(page);
 	clear_user_page(addr, vaddr, page);
-	kunmap_atomic(addr);
 }
 #endif
-------------------------------------------------------

For consistency, I kept the CPU, DDR and cache on the performance governor.
The target used is Qualcomm's SM8150 with kernel 4.14.117. On this platform,
CPU0 is a Cortex-A55 and CPU6 is a Cortex-A76.

The results of profiling clear_huge_page() are as follows:

-------------------------------------------------------
Ftrace results: Times are in microseconds.
-------------------------------------------------------
- Base:
	- CPU0:
		- Sample size : 95
		- Mean : 237.383
		- Std dev : 31.288
	- CPU6:
		- Sample size : 61
		- Mean : 258.065
		- Std dev : 19.97
-------------------------------------------------------
- v1 (original submission):
	- CPU0:
		- Sample size : 80
		- Mean : 112.298
		- Std dev : 0.36
	- CPU6:
		- Sample size : 83
		- Mean : 71.238
		- Std dev : 13.7819
-------------------------------------------------------
- exp-v2 (experimental diff mentioned above):
	- CPU0:
		- Sample size : 69
		- Mean : 218.911
		- Std dev : 54.306
	- CPU6:
		- Sample size : 101
		- Mean : 241.522
		- Std dev : 19.3068
-------------------------------------------------------

- Comparing base vs exp-v2: Simply removing the barriers from the
  kmap_atomic() code doesn't improve the results significantly.
- Comparing v1 vs exp-v2: A straight memset(0) of the 2MB page is
  significantly faster than zeroing individual pages.
- Analysing base and exp-v2: CPU6 was expected to outperform CPU0, but
  the zeroing pattern is adversarial for CPU6 and it ends up performing
  poorly. In contrast, CPU6 clearly outperforms CPU0 under a serialized
  load.

Based on the above 3 points, it looks like calling memset(0) straight does
improve execution time, primarily because the execution pattern is
predictable for most CPU architectures out there.

Having said that, I also understand that v1 will lose out on the
optimization made by c79b57e462b5, which keeps caches hot around the
faulting address. If keeping caches hot around the faulting address is that
important (which numbers could prove, though I don't have the insight to
get those numbers), it might be better to develop on top of v1 than to not
use v1 at all.

The 04/03/2020 10:52, Michal Hocko wrote:
> 
> This is an old kernel. Do you see the same with the current upstream
> kernel? Btw. 60% improvement only from dropping barrier sounds
> unexpected to me. Are you sure this is the only reason? c79b57e462b5
> ("mm: hugetlb: clear target sub-page last when clearing huge page")
> is already 4.14 AFAICS, is it possible that this is the effect of this
> patch? Your patch is effectively disabling this optimization for most
> workloads that really care about it. I strongly doubt that hugetlb is a
> thing on 32b kernels these days. So this really begs for more data about
> the real underlying problem IMHO.
> 
> -- 
> Michal Hocko
> SUSE Labs
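
For reference, the two zeroing orders discussed in this thread can be
sketched in userspace C. This is only an illustration under assumptions,
not the kernel code: zero_straight(), zero_target_last(), zero_subpage(),
SUBPAGE_SIZE and NR_SUBPAGES are hypothetical names, a 4KB base page and a
2MB huge page are assumed, and the target-last order is loosely modeled on
the idea described by c79b57e462b5 (clear the sub-pages far from the
faulting one first and converge on it, so that it is cleared last).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096u  /* assumed base page size */
#define NR_SUBPAGES  512u   /* assumed: 2MB huge page / 4KB sub-pages */
#define HUGE_SIZE    ((size_t)SUBPAGE_SIZE * NR_SUBPAGES)

/* v1-style zeroing: one linear pass over the whole huge page. */
static void zero_straight(unsigned char *huge)
{
	memset(huge, 0, HUGE_SIZE);
}

/* Zero one sub-page and optionally record the order in which it was hit. */
static void zero_subpage(unsigned char *huge, unsigned int idx,
			 unsigned int *order, unsigned int *pos)
{
	memset(huge + (size_t)idx * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
	if (order)
		order[*pos] = idx;
	(*pos)++;
}

/*
 * Base-style zeroing (simplified sketch): clear the sub-pages far away
 * from 'target' first, then alternate left/right towards 'target' so
 * that the target sub-page is cleared last.
 * Returns the number of sub-pages cleared.
 */
static unsigned int zero_target_last(unsigned char *huge, unsigned int target,
				     unsigned int *order)
{
	unsigned int pos = 0, base, l, i;

	assert(target < NR_SUBPAGES);

	if (2 * target <= NR_SUBPAGES) {
		/* Target in the first half: clear the far end, high to low. */
		base = 0;
		l = target;
		for (i = NR_SUBPAGES; i > 2 * target; i--)
			zero_subpage(huge, i - 1, order, &pos);
	} else {
		/* Target in the second half: clear the far start, low to high. */
		l = NR_SUBPAGES - target;
		base = NR_SUBPAGES - 2 * l;
		for (i = 0; i < base; i++)
			zero_subpage(huge, i, order, &pos);
	}

	/* Converge on the target in a left-right-left-right pattern. */
	for (i = 0; i < l; i++) {
		zero_subpage(huge, base + i, order, &pos);
		zero_subpage(huge, base + 2 * l - 1 - i, order, &pos);
	}

	return pos;
}
```

Both functions leave the same all-zero buffer behind; they differ only in
the access order. zero_straight() walks addresses linearly, while
zero_target_last() jumps between the two sides of the target sub-page,
which is the kind of pattern the numbers above suggest is unfriendly to
CPU6. The sketch makes no attempt to reproduce those timings.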