References: <20230626171430.3167004-1-ryan.roberts@arm.com>
From: Yang Shi
Date: Wed, 28 Jun 2023 19:21:06 -0700
Subject: Re: [PATCH v1 00/10] variable-order, large folios for anonymous memory
To: Yu Zhao
Cc: Ryan Roberts, Andrew Morton, "Matthew Wilcox (Oracle)",
 "Kirill A. Shutemov", Yin Fengwei, David Hildenbrand, Catalin Marinas,
 Will Deacon, Geert Uytterhoeven, Christian Borntraeger, Sven Schnelle,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
 linux-s390@vger.kernel.org

On Tue, Jun 27, 2023 at 12:49 AM Yu Zhao wrote:
>
> On Mon, Jun 26, 2023 at 9:30 PM Yu Zhao wrote:
> >
> > On Mon, Jun 26, 2023 at 11:14 AM Ryan Roberts wrote:
> > >
> > > Hi All,
> > >
> > > Following on from the previous RFCv2 [1], this series implements variable order,
> > > large folios for anonymous memory. The objective of this is to improve
> > > performance by allocating larger chunks of memory during anonymous page faults:
> > >
> > >  - Since SW (the kernel) is dealing with larger chunks of memory than base
> > >    pages, there are efficiency savings to be had; fewer page faults, batched PTE
> > >    and RMAP manipulation, fewer items on lists, etc. In short, we reduce kernel
> > >    overhead. This should benefit all architectures.
> > >  - Since we are now mapping physically contiguous chunks of memory, we can take
> > >    advantage of HW TLB compression techniques. A reduction in TLB pressure
> > >    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
> > >    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
> > >
> > > This patch set deals with the SW side of things only and based on feedback from
> > > the RFC, aims to be the most minimal initial change, upon which future
> > > incremental changes can be added. For this reason, the new behaviour is hidden
> > > behind a new Kconfig switch, CONFIG_LARGE_ANON_FOLIO, which is disabled by
> > > default. Although the code has been refactored to parameterize the desired order
> > > of the allocation, when the feature is disabled (by forcing the order to be
> > > always 0) my performance tests measure no regression. So I'm hoping this will be
> > > a suitable mechanism to allow incremental submissions to the kernel without
> > > affecting the rest of the world.
> > >
> > > The patches are based on top of v6.4 plus Matthew Wilcox's set_ptes() series
> > > [2], which is a hard dependency. I'm not sure of Matthew's exact plans for
> > > getting that series into the kernel, but I'm hoping we can start the review
> > > process on this patch set independently. I have a branch at [3].
> > >
> > > I've posted a separate series concerning the HW part (contpte mapping) for arm64
> > > at [4].
> > >
> > >
> > > Performance
> > > -----------
> > >
> > > Below results show 2 benchmarks; kernel compilation and speedometer 2.0 (a
> > > javascript benchmark running in Chromium). Both cases are running on Ampere
> > > Altra with 1 NUMA node enabled, Ubuntu 22.04 and XFS filesystem. Each benchmark
> > > is repeated 15 times over 5 reboots and averaged.
> > >
> > > All improvements are relative to baseline-4k. 'anonfolio-basic' is this series.
> > > 'anonfolio' is the full patch set similar to the RFC with the additional changes
> > > to the extra 3 fault paths. The rest of the configs are described at [4].
> > >
> > > Kernel Compilation (smaller is better):
> > >
> > > | kernel          |   real-time |   kern-time |   user-time |
> > > |:----------------|------------:|------------:|------------:|
> > > | baseline-4k     |        0.0% |        0.0% |        0.0% |
> > > | anonfolio-basic |       -5.3% |      -42.9% |       -0.6% |
> > > | anonfolio       |       -5.4% |      -46.0% |       -0.3% |
> > > | contpte         |       -6.8% |      -45.7% |       -2.1% |
> > > | exefolio        |       -8.4% |      -46.4% |       -3.7% |
> > > | baseline-16k    |       -8.7% |      -49.2% |       -3.7% |
> > > | baseline-64k    |      -10.5% |      -66.0% |       -3.5% |
> > >
> > > Speedometer 2.0 (bigger is better):
> > >
> > > | kernel          |   runs_per_min |
> > > |:----------------|---------------:|
> > > | baseline-4k     |           0.0% |
> > > | anonfolio-basic |           0.7% |
> > > | anonfolio       |           1.2% |
> > > | contpte         |           3.1% |
> > > | exefolio        |           4.2% |
> > > | baseline-16k    |           5.3% |
> >
> > Thanks for pushing this forward!
> >
> > > Changes since RFCv2
> > > -------------------
> > >
> > >  - Simplified series to bare minimum (on David Hildenbrand's advice)
> >
> > My impression is that this series still includes many pieces that can
> > be split out and discussed separately with followup series.
> >
> > (I skipped 04/10 and will look at it tomorrow.)
>
> I went through the series twice. Here's what I think a bare minimum
> series (easier to review/debug/land) would look like:
> 1. a new arch specific function providing a preferred order within (0,
>    PMD_ORDER).
> 2. an extended anon folio alloc API taking that order (02/10, partially).
> 3. an updated folio_add_new_anon_rmap() covering the large() &&
>    !pmd_mappable() case (similar to 04/10).
> 4. s/folio_test_pmd_mappable/folio_test_large/ in page_remove_rmap()
>    (06/10, reviewed-by provided).
> 5. finally, use the extended anon folio alloc API with the arch
>    preferred order in do_anonymous_page() (10/10, partially).
>
> The rest can be split out into separate series and move forward in
> parallel with probably a long list of things we need/want to do.

Yeah, the suggestion makes sense to me. IMHO we should go with the
simplest approach for the time being, unless there is strong
justification for extra optimization.

>
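
To make steps 1, 2 and 5 of the list above a bit more concrete, here is
a rough userspace sketch in C of how an arch-preferred order might be
clamped to (0, PMD_ORDER) and handed to an order-taking allocator. This
is not code from the series. arch_preferred_anon_order(),
clamp_anon_order(), anon_block_size(), alloc_anon_block() and
fault_alloc() are hypothetical names; the 4K base page, PMD_ORDER of 9
and preferred order of 4 are assumptions; aligned_alloc() merely stands
in for a folio allocation; and the fallback to order 0 on failure is an
assumption, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SHIFT 12	/* assumption: 4K base pages */
#define PMD_ORDER  9	/* assumption: 2M PMD with 4K base pages */

/*
 * Hypothetical stand-in for the arch hook in step 1. The name and the
 * value 4 (64K blocks) are illustrative only.
 */
static int arch_preferred_anon_order(void)
{
	return 4;
}

/* Keep the order strictly inside the open interval (0, PMD_ORDER). */
static int clamp_anon_order(int order)
{
	if (order < 1)
		return 1;
	if (order > PMD_ORDER - 1)
		return PMD_ORDER - 1;
	return order;
}

/* Size in bytes of a block of 2^order base pages. */
static size_t anon_block_size(int order)
{
	return (size_t)1 << (PAGE_SHIFT + order);
}

/*
 * Toy model of step 2: an allocator that takes the order. It returns a
 * naturally aligned block, or NULL on failure.
 */
static void *alloc_anon_block(int order)
{
	size_t size = anon_block_size(order);

	return aligned_alloc(size, size);
}

/*
 * Toy model of step 5: try the arch-preferred order first, then fall
 * back to a single base page (assumed policy). The order actually used
 * is reported through order_out.
 */
static void *fault_alloc(int *order_out)
{
	int order = clamp_anon_order(arch_preferred_anon_order());
	void *block = alloc_anon_block(order);

	if (!block) {
		order = 0;
		block = alloc_anon_block(order);
	}
	*order_out = order;
	return block;
}
```

With these assumed values the preferred block is 64K (order 4), so a
caller of fault_alloc() would see either that or the order-0 fallback,
and would release the block with free().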