From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <20230703135330.1865927-4-ryan.roberts@arm.com> <CAOUHufa_xFJvFFvmw1Tkdc9cXaZ1GPA1dVSauH+J9zGX-sO1UA@mail.gmail.com>
 <eea2b36d-9c6d-64ca-4e21-57cfd5a93d57@arm.com> <CAOUHufZypv+kLFu3r8iPYbceBh0KSE=gus-_iC1Q35_QVQdnMQ@mail.gmail.com>
 <9c5f3515-ad39-e416-902e-96e9387a3b60@arm.com>
In-Reply-To: <9c5f3515-ad39-e416-902e-96e9387a3b60@arm.com>
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 5 Jul 2023 11:24:30 -0600
Message-ID: <CAOUHufYvRYO=x==+i1aDQHvO=fx_sa6kmi5T4CMvsYiw1wgWqw@mail.gmail.com>
Subject: Re: [PATCH v2 3/5] mm: Default implementation of arch_wants_pte_order()
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, 
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Yin Fengwei <fengwei.yin@intel.com>, 
	David Hildenbrand <david@redhat.com>, Catalin Marinas <catalin.marinas@arm.com>, 
	Will Deacon <will@kernel.org>, Anshuman Khandual <anshuman.khandual@arm.com>, 
	Yang Shi <shy828301@gmail.com>, linux-arm-kernel@lists.infradead.org, 
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Wed, Jul 5, 2023 at 3:11 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 05/07/2023 03:07, Yu Zhao wrote:
> > On Tue, Jul 4, 2023 at 7:20 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 03/07/2023 20:50, Yu Zhao wrote:
> >>> On Mon, Jul 3, 2023 at 7:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>
> >>>> arch_wants_pte_order() can be overridden by the arch to return the
> >>>> preferred folio order for pte-mapped memory. This is useful as some
> >>>> architectures (e.g. arm64) can coalesce TLB entries when the physical
> >>>> memory is suitably contiguous.
> >>>>
> >>>> The first user for this hint will be FLEXIBLE_THP, which aims to
> >>>> allocate large folios for anonymous memory to reduce page faults and
> >>>> other per-page operation costs.
> >>>>
> >>>> Here we add the default implementation of the function, used when the
> >>>> architecture does not define it, which returns the order corresponding
> >>>> to 64K.
> >>>
> >>> I don't really mind a non-zero default value. But people would ask why
> >>> non-zero and why 64KB. Probably you could argue this is the large size
> >>> all known archs support if they have TLB coalescing. For x86, AMD CPUs
> >>> would want to override this. I'll leave it to Fengwei to decide
> >>> whether Intel wants a different default value.
> >>>
> >>> Also I don't like the vma parameter because it makes
> >>> arch_wants_pte_order() a mix of hw preference and vma policy. From my
> >>> POV, the function should be only about the former; the latter should
> >>> be decided by arch-independent MM code. However, I can live with it if
> >>> ARM MM people think this is really what you want. ATM, I'm skeptical
> >>> they do.
> >>
> >> Here's the big picture for what I'm trying to achieve:
> >>
> >>  - In the common case, I'd like all programs to get a performance bump by
> >> automatically and transparently using large anon folios - so no explicit
> >> requirement on the process to opt-in.
> >
> > We all agree on this :)
> >
> >>  - On arm64, in the above case, I'd like the preferred folio size to be 64K;
> >> from the (admittedly limited) testing I've done that's about where the
> >> performance knee is and it doesn't appear to increase the memory wastage very
> >> much. It also has the benefit that for 4K base pages this is the contpte size
> >> (order-4) so I can take full benefit of contpte mappings transparently to the
> >> process. And for 16K this is the HPA size (order-2).
> >
> > My highest priority is to get 16KB proven first because it would
> > benefit both client and server devices. So it may be different from
> > yours but I don't see any conflict.
>
> Do you mean 16K folios on a 4K base page system

Yes.

> or large folios on a 16K base
> page system? I thought your focus was on speeding up 4K base page client systems
> but this statement has got me wondering?

Sorry, I should have said 4x4KB.

> >>  - On arm64 when the process has marked the VMA for THP (or when
> >> transparent_hugepage=always) but the VMA does not meet the requirements for a
> >> PMD-sized mapping (or we failed to allocate, ...) then I'd like to map using
> >> contpte. For 4K base pages this is 64K (order-4), for 16K this is 2M (order-7)
> >> and for 64K this is 2M (order-5). The 64K base page case is very important since
> >> the PMD size for that base page is 512MB which is almost impossible to allocate
> >> in practice.
> >
> > Which case (server or client) are you focusing on here? For our client
> > devices, I can confidently say that 64KB has to be after 16KB, if it
> > happens at all. For servers in general, I don't know of any major
> > memory-intensive workloads that are not THP-aware, i.e., I don't think
> > "VMA does not meet the requirements" is a concern.
>
> For the 64K base page case, the focus is server. The problem reported by our
> partner is that the 512M huge page size is too big to reliably allocate and so
> the faults always fall back to 64K base pages in practice. I would also speculate
> (happy to be proved wrong) that there are many THP-aware workloads that assume
> the THP size is 2M. In this case, their VMAs may well be too small to fit a 512M
> huge page when running on a 64K base page system.

Interesting. When you have something ready to share, I might be able
to try it on our ARM servers as well.

> But the TL;DR is that Arm has a partner for which enabling 2M THP on a 64K base
> page system is a very real requirement. Our intent is that this will be the
> mechanism we use to enable it.

Yes, contpte makes more sense for what you described. It'd fit in a
lot better in the hugetlb case, but I guess your partner uses anon.
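
For reference, the order and size numbers quoted in this thread can be
double-checked with a small userspace sketch. This is not kernel code and
not taken from the patch: size_to_order(), pmd_size() and
check_thread_numbers() are hypothetical names made up for illustration,
and pmd_size() assumes 8-byte page table entries (so one table page holds
base_page_size / 8 entries).

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): smallest order such that
 * (base_page_size << order) >= size.
 */
static unsigned int size_to_order(unsigned long long size,
				  unsigned long long base_page_size)
{
	unsigned int order = 0;

	while ((base_page_size << order) < size)
		order++;
	return order;
}

/*
 * Assumption: 8-byte page table entries, so one table page holds
 * base_page_size / 8 entries and a PMD entry maps that many base pages.
 */
static unsigned long long pmd_size(unsigned long long base_page_size)
{
	return (base_page_size / 8) * base_page_size;
}

/* Checks the figures quoted in the thread; aborts if any is off. */
static void check_thread_numbers(void)
{
	const unsigned long long K = 1024, M = 1024 * K;

	/* 64K folios: order-4 on 4K base pages, order-2 on 16K base pages */
	assert(size_to_order(64 * K, 4 * K) == 4);
	assert(size_to_order(64 * K, 16 * K) == 2);

	/* 2M contpte: order-7 on 16K base pages, order-5 on 64K base pages */
	assert(size_to_order(2 * M, 16 * K) == 7);
	assert(size_to_order(2 * M, 64 * K) == 5);

	/* "4x4KB" is a 16K folio on 4K base pages, i.e. order-2 */
	assert(size_to_order(16 * K, 4 * K) == 2);

	/* PMD size: 2M for 4K base pages, 512M for 64K base pages */
	assert(pmd_size(4 * K) == 2 * M);
	assert(pmd_size(64 * K) == 512 * M);
}
```

Calling check_thread_numbers() from a main() passes silently if the
figures above are consistent with each other.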