From: Yu Zhao
Date: Tue, 4 Jul 2023 18:21:26 -0600
Subject: Re: [PATCH v2 0/5] variable-order, large folios for anonymous memory
To: Yin Fengwei, Ryan Roberts
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", David Hildenbrand,
 Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
In-Reply-To: <449183bd-76ef-2a3a-c3f5-0478a7c574ef@intel.com>
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <69aada71-0b3f-e928-6413-742fe7926576@intel.com>
 <467afd30-c85a-8b9d-97b9-a9ef9d0983af@arm.com>
 <449183bd-76ef-2a3a-c3f5-0478a7c574ef@intel.com>

On Tue, Jul 4, 2023 at 5:53 PM Yin Fengwei wrote:
>
>
>
> On 7/4/23 23:36, Ryan Roberts wrote:
> > On 04/07/2023 08:11, Yu Zhao wrote:
> >> On Tue, Jul 4, 2023 at 12:22 AM Yin, Fengwei wrote:
> >>>
> >>> On 7/4/2023 10:18 AM, Yu Zhao wrote:
> >>>> On Mon, Jul 3, 2023 at 7:53 AM Ryan Roberts wrote:
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> This is v2 of a series to implement variable order, large folios for anonymous
> >>>>> memory. The objective of this is to improve performance by allocating larger
> >>>>> chunks of memory during anonymous page faults. See [1] for background.
> >>>>
> >>>> Thanks for the quick response!
> >>>>
> >>>>> I've significantly reworked and simplified the patch set based on comments from
> >>>>> Yu Zhao (thanks for all your feedback!). I've also renamed the feature to
> >>>>> VARIABLE_THP, on Yu's advice.
> >>>>>
> >>>>> The last patch is for arm64 to explicitly override the default
> >>>>> arch_wants_pte_order() and is intended as an example. If this series is accepted
> >>>>> I suggest taking the first 4 patches through the mm tree and the arm64 change
> >>>>> could be handled through the arm64 tree separately. Neither has any build
> >>>>> dependency on the other.
> >>>>>
> >>>>> The one area where I haven't followed Yu's advice is in the determination of the
> >>>>> size of folio to use. It was suggested that I have a single preferred large
> >>>>> order, and if it doesn't fit in the VMA (due to exceeding VMA bounds, or there
> >>>>> being existing overlapping populated PTEs, etc.) then fall back immediately to
> >>>>> order-0. It turned out that this approach caused a performance regression in the
> >>>>> Speedometer benchmark.
> >>>>
> >>>> I suppose it's a regression against the v1, not the unpatched kernel.
> >>> From the performance data Ryan shared, it's against the unpatched kernel:
> >>>
> >>> Speedometer 2.0:
> >>>
> >>> | kernel                         | runs_per_min |
> >>> |:-------------------------------|-------------:|
> >>> | baseline-4k                    |         0.0% |
> >>> | anonfolio-lkml-v1              |         0.7% |
> >>> | anonfolio-lkml-v2-simple-order |        -0.9% |
> >>> | anonfolio-lkml-v2              |         0.5% |
> >>
> >> I see. Thanks.
> >>
> >> A couple of questions:
> >> 1. Do we have a stddev?
> >
> > | kernel                    | mean_abs | std_abs | mean_rel | std_rel |
> > |:--------------------------|---------:|--------:|---------:|--------:|
> > | baseline-4k               |    117.4 |     0.8 |     0.0% |    0.7% |
> > | anonfolio-v1              |    118.2 |     1.0 |     0.7% |    0.9% |
> > | anonfolio-v2-simple-order |    116.4 |     1.1 |    -0.9% |    0.9% |
> > | anonfolio-v2              |    118.0 |     1.2 |     0.5% |    1.0% |
> >
> > This is with 3 runs per reboot across 5 reboots, with the first run after reboot
> > trimmed (it's always a bit slower, I assume due to cold page cache). So 10 data
> > points per kernel in total.
> >
> > I've rerun the test multiple times and see similar results each time.
> >
> > I've also run anonfolio-v2 with Kconfig FLEXIBLE_THP=disabled, and in this case I
> > see the same performance as baseline-4k.
> >
> >
> >> 2. Do we have a theory why it regressed?
> >
> > I have a woolly hypothesis; I think Chromium is doing mmap/munmap in ways that
> > mean when we fault, order-4 is often too big to fit in the VMA. So we fall back
> > to order-0. I guess this is happening so often for this workload that the cost
> > of doing the checks and fallback is outweighing the benefit of the memory that
> > does end up with order-4 folios.
> >
> > I've sampled the memory in each bucket (once per second) while running, and it's
> > roughly:
> >
> > 64K: 25%
> > 32K: 15%
> > 16K: 15%
> > 4K:  45%
> >
> > 32K and 16K obviously fold into the 4K bucket with anonfolio-v2-simple-order.
> > But potentially, I suspect there is lots of mmap/munmap for the smaller sizes and
> > the 64K contents are more static - that's just a guess though.
> So this is like the out-of-VMA-range thing.
> >
> >
> >> Assuming no bugs, I don't see how a real regression could happen --
> >> falling back to order-0 isn't different from the original behavior.
> >> Ryan, could you `perf record` and `cat /proc/vmstat` and share them?
> >
> > I can, but it will have to be a bit later in the week. I'll do some more test
> > runs overnight so we have a larger number of runs - hopefully that might tell us
> > that this is noise to a certain extent.
> >
> > I'd still like to hear a clear technical argument for why the bin-packing
> > approach is not the correct one!
> My understanding of Yu's comments (Yu, correct me if I am wrong) is that we
> postpone this part of the change and get basic anon large folio support in. Then
> we discuss which approach we should take. Maybe people will agree the retry is
> the right choice, maybe another approach will be taken...
>
> For example, for this out-of-VMA-range case, a per-VMA order should be considered.
> We don't need to decide now that the retry approach should be taken.

I've articulated the reasons in another email. Just to summarize the most
important point here: using more fallback orders makes a system reach
equilibrium faster, at which point it can't allocate folios of the
arch_wants_pte_order() order anymore. IOW, this best-fit policy can reduce the
number of folios of the h/w preferred order on a system that has run long
enough.
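[Editor's note: to make the two policies being compared above concrete, here is
a hypothetical userspace sketch. The function names (`order_fits`,
`pick_order_simple`, `pick_order_bestfit`) are invented for illustration, the
order-4 preference follows the arm64 example in the thread, and this is a
simplified model, not kernel code.]

```python
PAGE_SHIFT = 12
PREFERRED_ORDER = 4  # e.g. 64K folios with 4K base pages, per the arm64 example


def order_fits(addr, order, vma_start, vma_end, populated=frozenset()):
    """True if a naturally aligned order-`order` folio covering `addr` lies
    inside [vma_start, vma_end) and overlaps no already-populated PTE."""
    size = 1 << (order + PAGE_SHIFT)
    start = addr & ~(size - 1)  # natural alignment of the folio
    if start < vma_start or start + size > vma_end:
        return False
    return all(a not in populated
               for a in range(start, start + size, 1 << PAGE_SHIFT))


def pick_order_simple(addr, vma_start, vma_end, populated=frozenset()):
    """'simple-order' policy: the preferred order, or order-0 immediately."""
    if order_fits(addr, PREFERRED_ORDER, vma_start, vma_end, populated):
        return PREFERRED_ORDER
    return 0


def pick_order_bestfit(addr, vma_start, vma_end, populated=frozenset()):
    """'best-fit' (bin-packing) policy: step down through smaller orders."""
    for order in range(PREFERRED_ORDER, 0, -1):
        if order_fits(addr, order, vma_start, vma_end, populated):
            return order
    return 0


# A 32K VMA: too small for a 64K folio, large enough for a 32K one.
vma_start, vma_end, fault = 0x10000, 0x18000, 0x12000
print(pick_order_simple(fault, vma_start, vma_end))   # 0: straight to order-0
print(pick_order_bestfit(fault, vma_start, vma_end))  # 3: settles on 32K
```

In a small VMA the simple-order policy drops straight to order-0, while
best-fit settles on an intermediate order. That difference is what the
equilibrium argument above is about: best-fit keeps succeeding at intermediate
orders, which changes how quickly free memory of the h/w preferred order is
consumed.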