From: Nhat Pham <nphamcs@gmail.com>
Date: Fri, 10 Jan 2025 11:29:23 +0700
Subject: Re: [LSF/MM/BPF TOPIC] Large folio (z)swapin
To: Usama Arif
Cc: lsf-pc@lists.linux-foundation.org, Linux Memory Management List <linux-mm@kvack.org>, Johannes Weiner, Barry Song <21cnbao@gmail.com>, Yosry Ahmed, Shakeel Butt
In-Reply-To: <58716200-fd10-4487-aed3-607a10e9fdd0@gmail.com>
On Fri, Jan 10, 2025 at 3:08 AM Usama Arif wrote:
>
> I would like to propose a session to discuss the work going on
> around large folio swapin, whether it's traditional swap, zswap,
> or zram.

I'm interested! Count me in the discussion :)

> Large folios have obvious advantages that have been discussed
> before, like fewer page faults, batched PTE and rmap manipulation,
> a shorter LRU list, and TLB coalescing (on arm64 and AMD).
> However, swapping in large folios has its own drawbacks, like
> higher swap thrashing.
> I had initially sent an RFC for zswapin of large folios in [1],
> but it causes a regression in kernel build time due to swap
> thrashing, which I am confident is happening with zram large folio
> swapin as well (which is merged in the kernel).
>
> Some of the points we could discuss in the session:
>
> - What is the right (preferably open source) benchmark to test
> swapin of large folios with? Kernel build time in a limited-memory
> cgroup shows a regression, while microbenchmarks show a massive
> improvement; maybe there are benchmarks where TLB misses are a big
> factor and show an improvement.
>
> - We could have something like
> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/swapin_enabled
> to enable/disable swapin, but it's going to be difficult to tune,
> might have different optimum values based on workloads, and is
> likely to be left at its default value.

Might even be different across memory regions.

> Is there some dynamic way to decide when to swap in large folios
> and when to fall back to smaller folios? The swapin_readahead
> swapcache path, which only supports 4K folios atm, has a readahead
> window based on hits; however, readahead is a folio flag and not a
> page flag, so this method can't be used: once a large folio is
> swapped in, we won't get a fault, and subsequent hits on other
> pages of the large folio won't be recorded.

Is this beneficial/useful enough to make it into a page flag?

Can we push this to the swap layer, i.e. record the hit information
on a per-swap-entry basis instead? The space is a bit tight, but
we're already in the talk for the new swap abstraction layer. If we
go the dynamic route, we can squeeze this kind of information into
the dynamically allocated per-swap-entry metadata structure (swap
descriptor?).
However, the swap entry can go away after a swapin (see
should_try_to_free_swap()), so that might be busted :)

> - For zswap and zram, it might be that doing larger block
> compression/decompression might offset the regression from swap
> thrashing, but it brings about its own issues. For example, once a
> large folio is swapped out, it could fail to swap in as a large
> folio and fall back to 4K, resulting in redundant decompressions.
> Would this also mean swapin of large folios from traditional swap
> isn't something we should proceed with?

Yeah, the cost/benefit analysis differs between backends. I wonder
if a one-size-fits-all, backend-agnostic policy could ever work -
maybe we need some backend-driven algorithm, or some sort of hinting
mechanism? This would make the logic uglier though.

We've been here before with HDD and SSD swap, except we don't really
care about the former, so we can prioritize optimizing for SSD swap
(in fact, it looks like we're removing the HDD portion of the swap
allocator). In this case, however, zswap, zram, and SSD swap are all
valid options, with different characteristics that can make the
optimal decision differ :)

If we're going the block (de)compression route, there is also the
pesky block-size question. For instance, do we want to store the
entire 2MB in a single block? That would mean we need to decompress
the entire 2MB block at load time. It might be more straightforward
in the mTHP world, but we do need to consider 2MB THP users too.

Finally, the calculus might change once large folio allocation
becomes more reliable. Perhaps we can wait until Johannes and Yu
make this work?

> - Should we even support large folio swapin? You often have high
> swap activity when the system/cgroup is close to running out of
> memory; at this point, maybe the best way forward is to just swap
> in 4K pages and let khugepaged [2], [3] collapse them if the
> surrounding pages are swapped in as well.

Perhaps this is the easiest thing to do :)