From mboxrd@z Thu Jan  1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 30 Jul 2024 09:56:03 +1200
Subject: Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for zRAM-like swapfile
To: Matthew Wilcox
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com, xiang@kernel.org, yosryahmed@google.com, Chuanhua Han
References: <20240726094618.401593-1-21cnbao@gmail.com> <20240726094618.401593-4-21cnbao@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jul 30, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 3:13 AM Matthew Wilcox wrote:
> >
> > On Tue, Jul 30, 2024 at 01:11:31AM +1200, Barry Song wrote:
> > > for this zRAM case, it is a newly allocated large folio; only
> > > when all conditions are met do we allocate and map
> > > the whole folio.
> > > You can check can_swapin_thp() and
> > > thp_swap_suitable_orders().
> >
> > YOU ARE DOING THIS WRONGLY!
> >
> > All of you anonymous memory people are utterly fixated on TLBs AND THIS
> > IS WRONG.  Yes, TLB performance is important, particularly with crappy
> > ARM designs, which I know a lot of you are paid to work on.  But you
> > seem to think this is the only consideration, and you're making bad
> > design choices as a result.  It's overly complicated, and you're leaving
> > performance on the table.
> >
> > Look back at the results Ryan showed in the early days of working on
> > large anonymous folios.  Half of the performance win on his system came
> > from using larger TLBs.  But the other half came from _reduced software
> > overhead_.  The LRU lock is a huge problem, and using large folios cuts
> > the length of the LRU list, hence LRU lock hold time.
> >
> > Your _own_ data on how hard it is to get hold of a large folio due to
> > fragmentation should be enough to convince you that the more large folios
> > in the system, the better the whole system runs.  We should not decline to
> > allocate large folios just because they can't be mapped with a single TLB!
>
> I am not convinced. For a newly allocated large folio, even alloc_anon_folio()
> in do_anonymous_page() does exactly the same thing:
>
> alloc_anon_folio()
> {
>         /*
>          * Get a list of all the (large) orders below PMD_ORDER that are enabled
>          * for this vma. Then filter out the orders that can't be allocated over
>          * the faulting address and still be fully contained in the vma.
>          */
>         orders = thp_vma_allowable_orders(vma, vma->vm_flags,
>                         TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
>         orders = thp_vma_suitable_orders(vma, vmf->address, orders);
> }
>
> You are not going to allocate an mTHP for an unaligned address on a new
> PF. Please point out where it is wrong.
Let's assume we have a folio whose virtual address range is
0x500000000000 ~ 0x500000000000 + 64KB, swapped out to swap offsets
0x10000 ~ 0x10000 + 64KB. The current code will swap it in as an mTHP
if a page fault occurs at any address within
0x500000000000 ~ 0x500000000000 + 64KB.

In this case, the mTHP enjoys both the decreased TLB pressure and the
reduced software overhead such as shorter LRU lock hold times. So it
sounds like we lose nothing here.

But if the folio is mremap-ed to an unaligned range like
0x600000000000 + 16KB ~ 0x600000000000 + 80KB while its swap offsets
remain 0x10000 ~ 0x10000 + 64KB, the current code won't swap it in as
an mTHP. Sounds like a loss?

If this is the performance problem you are trying to address, my point
is that it is not worth increasing the complexity at this stage, though
it might be doable. We once tracked hundreds of phones running apps
randomly for a couple of days, and we never encountered such a case.
So this is pretty much a corner case.

If your concern goes beyond this, for example, if you want to swap in
large folios even when the swap slots are completely non-contiguous,
that is a different story. I agree this is a potential optimization
direction, but in that case you still need to find an aligned boundary
to handle page faults, just like do_anonymous_page() does; otherwise,
you can end up with all kinds of pointless intersections where one PF
covers the address ranges of other PFs, making PTE checks such as
pte_range_none() completely disordered:

static struct folio *alloc_anon_folio(struct vm_fault *vmf)
{
        ...
        /*
         * Find the highest order where the aligned range is completely
         * pte_none(). Note that all remaining orders will be completely
         * pte_none().
         */
        order = highest_order(orders);
        while (orders) {
                addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
                if (pte_range_none(pte + pte_index(addr), 1 << order))
                        break;
                order = next_order(&orders, order);
        }
}

>
> Thanks
> Barry