From: Yu Zhao
Date: Mon, 26 Jun 2023 21:30:19 -0600
Subject: Re: [PATCH v1 00/10] variable-order, large folios for anonymous memory
To: Ryan Roberts
Cc: Andrew Morton, "Matthew Wilcox (Oracle)", "Kirill A. Shutemov",
    Yin Fengwei, David Hildenbrand, Catalin Marinas, Will Deacon,
    Geert Uytterhoeven, Christian Borntraeger, Sven Schnelle,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    "H. Peter Anvin", linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
    linux-s390@vger.kernel.org

On Mon, Jun 26, 2023 at 11:14 AM Ryan Roberts wrote:
>
> Hi All,
>
> Following on
> from the previous RFCv2 [1], this series implements variable order,
> large folios for anonymous memory. The objective of this is to improve
> performance by allocating larger chunks of memory during anonymous page faults:
>
>  - Since SW (the kernel) is dealing with larger chunks of memory than base
>    pages, there are efficiency savings to be had; fewer page faults, batched PTE
>    and RMAP manipulation, fewer items on lists, etc. In short, we reduce kernel
>    overhead. This should benefit all architectures.
>  - Since we are now mapping physically contiguous chunks of memory, we can take
>    advantage of HW TLB compression techniques. A reduction in TLB pressure
>    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
>    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>
> This patch set deals with the SW side of things only and based on feedback from
> the RFC, aims to be the most minimal initial change, upon which future
> incremental changes can be added. For this reason, the new behaviour is hidden
> behind a new Kconfig switch, CONFIG_LARGE_ANON_FOLIO, which is disabled by
> default. Although the code has been refactored to parameterize the desired order
> of the allocation, when the feature is disabled (by forcing the order to be
> always 0) my performance tests measure no regression. So I'm hoping this will be
> a suitable mechanism to allow incremental submissions to the kernel without
> affecting the rest of the world.
>
> The patches are based on top of v6.4 plus Matthew Wilcox's set_ptes() series
> [2], which is a hard dependency. I'm not sure of Matthew's exact plans for
> getting that series into the kernel, but I'm hoping we can start the review
> process on this patch set independently. I have a branch at [3].
>
> I've posted a separate series concerning the HW part (contpte mapping) for arm64
> at [4].
>
>
> Performance
> -----------
>
> Below results show 2 benchmarks; kernel compilation and speedometer 2.0 (a
> javascript benchmark running in Chromium). Both cases are running on Ampere
> Altra with 1 NUMA node enabled, Ubuntu 22.04 and XFS filesystem. Each benchmark
> is repeated 15 times over 5 reboots and averaged.
>
> All improvements are relative to baseline-4k. 'anonfolio-basic' is this series.
> 'anonfolio' is the full patch set similar to the RFC with the additional changes
> to the extra 3 fault paths. The rest of the configs are described at [4].
>
> Kernel Compilation (smaller is better):
>
> | kernel          |   real-time |   kern-time |   user-time |
> |:----------------|------------:|------------:|------------:|
> | baseline-4k     |        0.0% |        0.0% |        0.0% |
> | anonfolio-basic |       -5.3% |      -42.9% |       -0.6% |
> | anonfolio       |       -5.4% |      -46.0% |       -0.3% |
> | contpte         |       -6.8% |      -45.7% |       -2.1% |
> | exefolio        |       -8.4% |      -46.4% |       -3.7% |
> | baseline-16k    |       -8.7% |      -49.2% |       -3.7% |
> | baseline-64k    |      -10.5% |      -66.0% |       -3.5% |
>
> Speedometer 2.0 (bigger is better):
>
> | kernel          |   runs_per_min |
> |:----------------|---------------:|
> | baseline-4k     |           0.0% |
> | anonfolio-basic |           0.7% |
> | anonfolio       |           1.2% |
> | contpte         |           3.1% |
> | exefolio        |           4.2% |
> | baseline-16k    |           5.3% |

Thanks for pushing this forward!

> Changes since RFCv2
> -------------------
>
>   - Simplified series to bare minimum (on David Hildenbrand's advice)

My impression is that this series still includes many pieces that can
be split out and discussed separately with followup series.

(I skipped 04/10 and will look at it tomorrow.)