Date: Fri, 2 Oct 2020 08:35:47 +0300
From: "Kirill A. Shutemov"
To: Lokesh Gidra
Cc: Kalesh Singh, Suren Baghdasaryan, Minchan Kim, Joel Fernandes,
 "Cc: Android Kernel", Catalin Marinas, Will Deacon, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, the arch/x86 maintainers,
 "H. Peter Anvin", Andrew Morton, Shuah Khan, "Aneesh Kumar K.V",
 Kees Cook, Peter Zijlstra, Sami Tolvanen, Masahiro Yamada,
 Arnd Bergmann, Frederic Weisbecker, Krzysztof Kozlowski,
 Hassan Naveed, Christian Brauner, Mark Rutland, Mike Rapoport,
 Gavin Shan, Zhenyu Ye, Jia He, John Hubbard, William Kucharski,
 Sandipan Das, Ralph Campbell, Mina Almasry, Ram Pai, Dave Hansen,
 Kamalesh Babulal, Masami Hiramatsu, Brian Geffon, SeongJae Park,
 linux-kernel, "moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)",
 "open list:MEMORY MANAGEMENT",
 "open list:KERNEL SELFTEST FRAMEWORK"
Subject: Re: [PATCH 0/5] Speed up mremap on large regions
Message-ID: <20201002053547.7roe7b4mpamw4uk2@black.fi.intel.com>
References: <20200930222130.4175584-1-kaleshsingh@google.com>
 <20200930223207.5xepuvu6wr6xw5bb@black.fi.intel.com>
 <20201001122706.jp2zr23a43hfomyg@black.fi.intel.com>

On Thu, Oct 01, 2020 at 05:09:02PM -0700, Lokesh Gidra wrote:
> On Thu, Oct 1, 2020 at 9:00 AM Kalesh Singh wrote:
> >
> > On Thu, Oct 1, 2020 at 8:27 AM Kirill A. Shutemov wrote:
> > >
> > > On Wed, Sep 30, 2020 at 03:42:17PM -0700, Lokesh Gidra wrote:
> > > > On Wed, Sep 30, 2020 at 3:32 PM Kirill A. Shutemov wrote:
> > > > >
> > > > > On Wed, Sep 30, 2020 at 10:21:17PM +0000, Kalesh Singh wrote:
> > > > > > mremap time can be optimized by moving entries at the PMD/PUD level if
> > > > > > the source and destination addresses are PMD/PUD-aligned and
> > > > > > PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
> > > > > > x86.
> > > > > > Other architectures where this type of move is supported and known to
> > > > > > be safe can also opt in to these optimizations by enabling HAVE_MOVE_PMD
> > > > > > and HAVE_MOVE_PUD.
> > > > > >
> > > > > > Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
> > > > > > region on x86 and arm64:
> > > > > >
> > > > > > - HAVE_MOVE_PMD is already enabled on x86 : N/A
> > > > > > - Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
> > > > > >
> > > > > > - Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
> > > > > > - Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
> > > > > >
> > > > > > Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
> > > > > > give a total of ~150x speed up on arm64.
> > > > >
> > > > > Is there a *real* workload that benefits from HAVE_MOVE_PUD?
> > > > >
> > > > We have a Java garbage collector under development which requires
> > > > moving physical pages of multi-gigabyte heap using mremap. During this
> > > > move, the application threads have to be paused for correctness. It is
> > > > critical to keep this pause as short as possible to avoid jitters
> > > > during user interaction. This is where HAVE_MOVE_PUD will greatly
> > > > help.
> > >
> > > Any chance to quantify the effect of mremap() with and without
> > > HAVE_MOVE_PUD?
> > >
> > > I doubt it's a major contributor to the GC pause. I expect you need to
> > > move tens of gigs to get a sizable effect. And if your GC routinely moves
> > > tens of gigs, maybe the problem is somewhere else?
> > >
> > > I'm asking for numbers, because an increase in complexity comes with a
> > > cost. If it doesn't provide a substantial benefit to a real workload,
> > > maintaining the code forever doesn't make sense.
> >
> mremap is indeed the biggest contributor to the GC pause. It has to
> take place in what is typically known as a 'stop-the-world' pause,
> wherein all application threads are paused.
> During this pause the GC
> thread flips the GC roots (threads' stacks, globals etc.), and then
> resumes threads along with concurrent compaction of the heap. This
> GC-root flip differs depending on which compaction algorithm is being
> used.
>
> In our case it involves updating object references in threads' stacks
> and remapping the Java heap to a different location. The threads' stacks
> can be handled in parallel with the mremap. Therefore, the dominant
> factor is indeed the cost of mremap. From patches 2 and 4, it is clear
> that remapping 1GB without this optimization will take ~9ms on arm64.
>
> Although this mremap has to happen only once every GC cycle, and the
> typical size is also not going to be more than a GB or 2, pausing
> application threads for ~9ms is guaranteed to cause jitters. OTOH,
> with this optimization, mremap is reduced to ~60us, which is a totally
> acceptable pause time.
>
> Unfortunately, implementation of the new GC algorithm hasn't yet
> reached the point where I can quantify the effect of this
> optimization. But I can confirm that without this optimization the new
> GC will not be approved.

IIUC, the 9ms -> 60us improvement is attributed to the combination of
HAVE_MOVE_PMD and HAVE_MOVE_PUD, right? I expect HAVE_MOVE_PMD to be
reasonable for some workloads, but the marginal benefit of HAVE_MOVE_PUD
is in doubt. Do you see it being useful for your workload?

--
 Kirill A. Shutemov
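
For readers following the numbers in this thread, here is a rough sketch of the eligibility rule from the quoted cover letter: a range can be moved at the PMD or PUD level only if the source address, the destination address, and the length are all aligned to that level's size. The function names `move_level` and `entries_to_move` are hypothetical, the sizes assume 4 KiB base pages (as is common on x86-64 and arm64), and the sketch treats the whole range at a single level, which is a simplification rather than a description of the kernel code.

```python
# Illustrative sketch only, not kernel code.
# Assumption: 4 KiB base pages, so one PMD entry covers 2 MiB and one
# PUD entry covers 1 GiB.
PAGE_SIZE = 4 * 1024
PMD_SIZE = 2 * 1024 * 1024
PUD_SIZE = 1024 * 1024 * 1024


def move_level(old_addr, new_addr, length):
    """Return (name, size) of the coarsest level usable for the whole range.

    Hypothetical helper: a level qualifies only if both addresses and the
    length are multiples of that level's size.
    """
    for name, size in (("PUD", PUD_SIZE), ("PMD", PMD_SIZE)):
        aligned = old_addr % size == 0 and new_addr % size == 0
        if aligned and length > 0 and length % size == 0:
            return name, size
    return "PTE", PAGE_SIZE


def entries_to_move(old_addr, new_addr, length):
    """Return (level name, number of page-table entries to move)."""
    name, size = move_level(old_addr, new_addr, length)
    # Ceiling division, in case length is not a multiple of the page size.
    return name, -(-length // size)


if __name__ == "__main__":
    one_gib = PUD_SIZE
    # Both addresses 1 GiB-aligned: a single PUD entry.
    print(entries_to_move(0x40000000, 0x80000000, one_gib))  # ('PUD', 1)
    # Destination only 2 MiB-aligned: 512 PMD entries.
    print(entries_to_move(0x40000000, 0x80200000, one_gib))  # ('PMD', 512)
    # Only page-aligned: 262144 PTEs.
    print(entries_to_move(0x40001000, 0x80001000, one_gib))  # ('PTE', 262144)
```

The entry counts (262144 vs. 512 vs. 1 for a 1GB region) give some intuition for why the coarser levels are faster, although they do not by themselves predict the measured speed-ups. As a separate sanity check on the figures quoted above, ~9ms divided by the ~150x combined speed-up comes to ~60us.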