From: Yu Zhao
Date: Wed, 5 Jul 2023 11:24:30 -0600
Subject: Re: [PATCH v2 3/5] mm: Default implementation of arch_wants_pte_order()
To: Ryan Roberts
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei,
 David Hildenbrand, Catalin Marinas, Will Deacon, Anshuman Khandual,
 Yang Shi, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <20230703135330.1865927-4-ryan.roberts@arm.com>
 <9c5f3515-ad39-e416-902e-96e9387a3b60@arm.com>
In-Reply-To: <9c5f3515-ad39-e416-902e-96e9387a3b60@arm.com>

On Wed, Jul 5, 2023 at 3:11 AM Ryan Roberts
wrote:
>
> On 05/07/2023 03:07, Yu Zhao wrote:
> > On Tue, Jul 4, 2023 at 7:20 AM Ryan Roberts wrote:
> >>
> >> On 03/07/2023 20:50, Yu Zhao wrote:
> >>> On Mon, Jul 3, 2023 at 7:53 AM Ryan Roberts wrote:
> >>>>
> >>>> arch_wants_pte_order() can be overridden by the arch to return the
> >>>> preferred folio order for pte-mapped memory. This is useful as some
> >>>> architectures (e.g. arm64) can coalesce TLB entries when the physical
> >>>> memory is suitably contiguous.
> >>>>
> >>>> The first user for this hint will be FLEXIBLE_THP, which aims to
> >>>> allocate large folios for anonymous memory to reduce page faults and
> >>>> other per-page operation costs.
> >>>>
> >>>> Here we add the default implementation of the function, used when the
> >>>> architecture does not define it, which returns the order corresponding
> >>>> to 64K.
> >>>
> >>> I don't really mind a non-zero default value. But people would ask why
> >>> non-zero and why 64KB. Probably you could argue this is the large size
> >>> all known archs support if they have TLB coalescing. For x86, AMD CPUs
> >>> would want to override this. I'll leave it to Fengwei to decide
> >>> whether Intel wants a different default value.
> >>>
> >>> Also I don't like the vma parameter because it makes
> >>> arch_wants_pte_order() a mix of hw preference and vma policy. From my
> >>> POV, the function should be only about the former; the latter should
> >>> be decided by arch-independent MM code. However, I can live with it if
> >>> ARM MM people think this is really what you want. ATM, I'm skeptical
> >>> they do.
> >>
> >> Here's the big picture for what I'm trying to achieve:
> >>
> >> - In the common case, I'd like all programs to get a performance bump by
> >> automatically and transparently using large anon folios - so no explicit
> >> requirement on the process to opt-in.
> >
> > We all agree on this :)
> >
> >> - On arm64, in the above case, I'd like the preferred folio size to be 64K;
> >> from the (admittedly limited) testing I've done that's about where the
> >> performance knee is and it doesn't appear to increase the memory wastage very
> >> much. It also has the benefits that for 4K base pages this is the contpte size
> >> (order-4) so I can take full benefit of contpte mappings transparently to the
> >> process. And for 16K this is the HPA size (order-2).
> >
> > My highest priority is to get 16KB proven first because it would
> > benefit both client and server devices. So it may be different from
> > yours but I don't see any conflict.
>
> Do you mean 16K folios on a 4K base page system

Yes.

> or large folios on a 16K base
> page system? I thought your focus was on speeding up 4K base page client systems
> but this statement has got me wondering?

Sorry, I should have said 4x4KB.

> >> - On arm64 when the process has marked the VMA for THP (or when
> >> transparent_hugepage=always) but the VMA does not meet the requirements for a
> >> PMD-sized mapping (or we failed to allocate, ...) then I'd like to map using
> >> contpte. For 4K base pages this is 64K (order-4), for 16K this is 2M (order-7)
> >> and for 64K this is 2M (order-5). The 64K base page case is very important since
> >> the PMD size for that base page is 512MB which is almost impossible to allocate
> >> in practice.
> >
> > Which case (server or client) are you focusing on here? For our client
> > devices, I can confidently say that 64KB has to be after 16KB, if it
> > happens at all. For servers in general, I don't know of any major
> > memory-intensive workloads that are not THP-aware, i.e., I don't think
> > "VMA does not meet the requirements" is a concern.
>
> For the 64K base page case, the focus is server. The problem reported by our
> partner is that the 512M huge page size is too big to reliably allocate and so
> the faults always fall back to 64K base pages in practice. I would also speculate
> (happy to be proved wrong) that there are many THP-aware workloads that assume
> the THP size is 2M. In this case, their VMAs may well be too small to fit a 512M
> huge page when running on a 64K base page system.

Interesting. When you have something ready to share, I might be able
to try it on our ARM servers as well.

> But the TL;DR is that Arm has a partner for which enabling 2M THP on a 64K base
> page system is a very real requirement. Our intent is that this will be the
> mechanism we use to enable it.

Yes, contpte makes more sense for what you described. It'd fit in a
lot better in the hugetlb case, but I guess your partner uses anon.
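
For anyone following the order numbers quoted above, here is a small
userspace sketch of the arithmetic. It is only an illustration, not
kernel code: size_to_order() and pmd_size() are hypothetical helpers
made up for this mail, and pmd_size() assumes 8-byte page table
entries, so that one page-table page maps page_size / 8 pages.

```c
#include <assert.h>

#define SZ_KB(n) ((unsigned long)(n) << 10)
#define SZ_MB(n) ((unsigned long)(n) << 20)

/*
 * Hypothetical helper, not a kernel API: returns the folio order such
 * that (page_size << order) == size. Both arguments are assumed to be
 * powers of two.
 */
static unsigned int size_to_order(unsigned long size, unsigned long page_size)
{
	unsigned int order = 0;

	assert(page_size != 0 && size >= page_size);
	while ((page_size << order) < size)
		order++;
	return order;
}

/*
 * Hypothetical helper: size of a PMD-mapped huge page, assuming 8-byte
 * page table entries (one page-table page holds page_size / 8 entries).
 */
static unsigned long pmd_size(unsigned long page_size)
{
	return page_size * (page_size / 8);
}
```

With these, size_to_order(SZ_KB(64), SZ_KB(4)) gives the order-4 quoted
for 4K base pages, and pmd_size(SZ_KB(64)) gives the 512MB PMD size that
makes the 64K base page case hard to satisfy.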