From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 30 Jul 2024 09:12:29 +0100
Subject: Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for zRAM-like swapfile
From: Ryan Roberts
To: Matthew Wilcox, Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com, senozhatsky@chromium.org, shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com, xiang@kernel.org, yosryahmed@google.com, Chuanhua Han
References: <20240726094618.401593-1-21cnbao@gmail.com> <20240726094618.401593-4-21cnbao@gmail.com>
On 29/07/2024 16:13, Matthew Wilcox wrote:
> On Tue, Jul 30, 2024 at 01:11:31AM +1200, Barry Song wrote:
>> for this zRAM case, it is a new allocated large folio, only
>> while all conditions are met, we will allocate and map
>> the whole folio. you can check can_swapin_thp() and
>> thp_swap_suitable_orders().
>
> YOU ARE DOING THIS WRONGLY!
I've only scanned the preceding thread, but I think you're talking about the
design decision to only allocate large folios that are naturally aligned in
virtual address space, and you're arguing to remove that restriction? The main
reason we gave ourselves that constraint for anon mTHP was that allowing
misaligned folios would create the possibility of wandering off the end of the
PTE table, and add significant complexity to manage neighbouring PTE tables
and their respective PTLs. If the proposal is to start doing this, then I
don't agree with that approach.

>
> All of you anonymous memory people are utterly fixated on TLBs AND THIS
> IS WRONG. Yes, TLB performance is important, particularly with crappy
> ARM designs, which I know a lot of you are paid to work on. But you
> seem to think this is the only consideration, and you're making bad
> design choices as a result. It's overly complicated, and you're leaving
> performance on the table.
>
> Look back at the results Ryan showed in the early days of working on
> large anonymous folios. Half of the performance win on his system came
> from using larger TLBs. But the other half came from _reduced software
> overhead_.

I would just point out that I think the results you are referring to are for
the kernel compilation workload, and yes, this is indeed what I observed. But
kernel compilation is a bit of an outlier since it does a huge amount of
fork/exec, so the kernel spends a lot of time fiddling with page tables and
faulting. The vast majority of the reduced SW overhead comes from
significantly reducing the number of faults, because we map more pages per
fault. In my experience, though, most workloads that I've tested with tend to
set up their memory at the start, and it's static forevermore; those workloads
benefit mostly from the TLB side - there are very few existing SW overheads to
actually reduce.
> The LRU lock is a huge problem, and using large folios cuts
> the length of the LRU list, hence LRU lock hold time.

I'm sure this is true, and you have lots more experience and data here than
me; it makes intuitive sense. But I've never personally seen it in any of the
workloads that I've benchmarked.

Thanks,
Ryan

>
> Your _own_ data on how hard it is to get hold of a large folio due to
> fragmentation should be enough to convince you that the more large folios
> in the system, the better the whole system runs. We should not decline to
> allocate large folios just because they can't be mapped with a single TLB!
>