From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8D034D767DF
	for <linux-mm@archiver.kernel.org>; Thu, 31 Oct 2024 16:00:27 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id B550C6B007B; Thu, 31 Oct 2024 12:00:26 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id B046C6B0082; Thu, 31 Oct 2024 12:00:26 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 9CB476B0083; Thu, 31 Oct 2024 12:00:26 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11])
	by kanga.kvack.org (Postfix) with ESMTP id 7FFCC6B007B
	for <linux-mm@kvack.org>; Thu, 31 Oct 2024 12:00:26 -0400 (EDT)
Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay03.hostedemail.com (Postfix) with ESMTP id 263E1A1735
	for <linux-mm@kvack.org>; Thu, 31 Oct 2024 16:00:26 +0000 (UTC)
X-FDA: 82734358266.13.BD511C8
Received: from mail-qv1-f50.google.com (mail-qv1-f50.google.com [209.85.219.50])
	by imf21.hostedemail.com (Postfix) with ESMTP id 810CA1C0024
	for <linux-mm@kvack.org>; Thu, 31 Oct 2024 15:59:32 +0000 (UTC)
Authentication-Results: imf21.hostedemail.com;
	dkim=pass header.d=google.com header.s=20230601 header.b=h3FsssmX;
	dmarc=pass (policy=reject) header.from=google.com;
	spf=pass (imf21.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1730390262; a=rsa-sha256;
	cv=none;
	b=FLoMVIkCLcBVHcyFldxu+w/1m6mZZZ9nxVxODSoeZ7B4fCynwV3+K4cwWU72aYk0ldUYjU
	IDDPnBwjxQW0mvhUSLNB3qmiIf8dU8AZHLGCJpJR4hC92lsq3hWlP1I6LPslYf9uT3lvQu
	zDK0c9+5bAv+aJitkXXXT2VLwmvEe5o=
ARC-Authentication-Results: i=1;
	imf21.hostedemail.com;
	dkim=pass header.d=google.com header.s=20230601 header.b=h3FsssmX;
	dmarc=pass (policy=reject) header.from=google.com;
	spf=pass (imf21.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1730390262;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=N037QPcNODAt+T6LQFiIHYByKbWxv8lhGI5bnvDi5rU=;
	b=DlAGwI6tENGi1POsSSkjE80gV6M0sbOOKK3aY2Oy0bUqr3kjIt0rLw39kXfMcPKkm5jgfD
	iH8DITcU3+6MqRHPoiMT883Y3l0Bz1qIpi9lvhpCncFpWY6F5309aBlOfETpT+tYdqhzRW
	zHLwbUUmdgAKafUtIc1O1++tDhhRq+I=
Received: by mail-qv1-f50.google.com with SMTP id 6a1803df08f44-6cbe3ea8e3fso6736196d6.0
        for <linux-mm@kvack.org>; Thu, 31 Oct 2024 09:00:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1730390423; x=1730995223; darn=kvack.org;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=N037QPcNODAt+T6LQFiIHYByKbWxv8lhGI5bnvDi5rU=;
        b=h3FsssmX/+r7Qy/VZOaV2oLdu+MMw9z6r0kxh6N4MOc8y8vVGjPl+RcfbuAnOpLJsS
         C62bPQuszaY+TS+ymKpxEGimX1xDGFZc+Ytpp0QPTTGzaKyQ2bqhTiSDPUtbdASpre5E
         VYuYGHBxQdbOL7jRt2mi8bKima9UUgHv+T/YnX3k5tjT74yw3czQT8pX1foAie6z8c1k
         EYwntFnsCEKyBGWASKTs3CGy8AP/wZZUhNrKqhkuCGGO8z3DbWOH1LOYI30Di5P4iK2M
         IsZliLVB7zEg5kMzdSqyUARgfeJkUOu5LJ/RAKC8/Js+04roSRouV6ItdNXyHv7VD4M/
         qdzQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1730390423; x=1730995223;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=N037QPcNODAt+T6LQFiIHYByKbWxv8lhGI5bnvDi5rU=;
        b=HRskffgckeQFrdHLNuhJgE8ynYiw60uGFEt80MRizlJYpU/13cnVNjmZHqapys1Wqh
         RyJC21rBQlvdq4sZgVNWV7OJK7fJajmzAN+W4gRju2OuiiHYO3K2KccVpuMzkOqk+Smg
         Dlm5gRqjU6xu0l/sEfyUYiApRlr/97DxnE0x/2D0Azp/FinuNF00j1v7Sdr9kmGlyaJM
         mQvYQkkNwmtTkHzG5Sb4KXI0lPmMk0ahQHD0vK3+lowIft8z/XNqMm2Bvw7534Nn4ifT
         bDRPaxosSid7NwnKBpnw92qVC8EiWac87Ulcjpver90hPP0GkBN3r+2YelSFPKGiGwS6
         TH4g==
X-Forwarded-Encrypted: i=1; AJvYcCWUQlE6nvDovYbLBaytZ4CUbW9H2JE3PRpHPK3DqFklVtNQqumkKZf3us1Ar/3P38qFMVyYxO8jTg==@kvack.org
X-Gm-Message-State: AOJu0YyENoQtmZuPnFAdx8M8H+tAP1NwYOSVAKHU5u2v31jJZXpi1C9M
	7VGU4BNS2e5HC/1+0A74KK05Z1zYt2XVOexprqbdhWeisi+pe+1K8wBvBROLPOPEBWHfhUgpNpk
	t7NNQq/EG+hHyBkMQxAbllshdaoZvUmr0tP9Y
X-Google-Smtp-Source: AGHT+IFOamOItuF+r4b4reM3jE9mjWoU9TWypbyuHHQFG7gxgXonzt2LCi2MPyQTtx6nlN+Wun2ISIW3fyLX+bI78hc=
X-Received: by 2002:a05:6214:3f88:b0:6c7:5e6d:3f79 with SMTP id
 6a1803df08f44-6d351b2fb6dmr49677566d6.48.1730390422509; Thu, 31 Oct 2024
 09:00:22 -0700 (PDT)
MIME-Version: 1.0
References: <20241027001444.3233-1-21cnbao@gmail.com> <33c5d5ca-7bc4-49dc-b1c7-39f814962ae0@gmail.com>
 <CAGsJ_4wdgptMK0dDTC5g66OE9WDxFDt7ixDQaFCjuHdTyTEGiA@mail.gmail.com>
 <e8c6d46c-b8cf-4369-aa61-9e1b36b83fe3@gmail.com> <CAJD7tkZ60ROeHek92jgO0z7LsEfgPbfXN9naUC5j7QjRQxpoKw@mail.gmail.com>
 <852211c6-0b55-4bdd-8799-90e1f0c002c1@gmail.com> <CAJD7tkaXL_vMsgYET9yjYQW5pM2c60fD_7r_z4vkMPcqferS8A@mail.gmail.com>
 <c76635d7-f382-433a-8900-72bca644cdaa@gmail.com> <CAJD7tkYSRCjtEwP=o_n_ZhdfO8nga-z-a=RirvcKL7AYO76XJw@mail.gmail.com>
 <20241031153830.GA799903@cmpxchg.org>
In-Reply-To: <20241031153830.GA799903@cmpxchg.org>
From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 31 Oct 2024 08:59:46 -0700
Message-ID: <CAJD7tkZ_xQHMoze_w3yBHgjPhQeDynJ+vWddbYKFzi2c63sT7w@mail.gmail.com>
Subject: Re: [PATCH RFC] mm: mitigate large folios usage and swap thrashing
 for nearly full memcg
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Usama Arif <usamaarif642@gmail.com>, Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, 
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, 
	Barry Song <v-songbaohua@oppo.com>, Kanchana P Sridhar <kanchana.p.sridhar@intel.com>, 
	David Hildenbrand <david@redhat.com>, Baolin Wang <baolin.wang@linux.alibaba.com>, 
	Chris Li <chrisl@kernel.org>, "Huang, Ying" <ying.huang@intel.com>, 
	Kairui Song <kasong@tencent.com>, Ryan Roberts <ryan.roberts@arm.com>, 
	Michal Hocko <mhocko@kernel.org>, Roman Gushchin <roman.gushchin@linux.dev>, 
	Shakeel Butt <shakeel.butt@linux.dev>, Muchun Song <muchun.song@linux.dev>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Stat-Signature: 3hzmtiuzfxf5scauz1wytm8cekexrdi8
X-Rspamd-Queue-Id: 810CA1C0024
X-Rspamd-Server: rspam08
X-Rspam-User: 
X-HE-Tag: 1730390372-906957
X-HE-Meta: U2FsdGVkX18FSFDtOfct6J9+52HF+4ih1wUAJLk0sqCzq26oS0/rVhKcq9RNuXORT4g9EIoL2PGHs0lF8e9ZoB9Oexkr/jdPNciLtdS0/c2g+tmRBvBQMwxybiRCDT5DS0GLT7njIdI4xFTHtCfpHr1+2Ch3EQme6LpAHpmc6ncYvBTMzunGAvQL094BFzIqou8czSldxwvzAFFpzshFXxUwMLq1n/NOc0Of8ND+MZuqUrUKBRY55sxlizA6NcbfHXQl6Aeqx1HQE+WKBNJkaadfFgj10YdE5g35HnmfprXIbEiowQw6jpBmdPJAfky69R2LKKZDPL23pFr98ok6U8f9oIhX0t3QnsJALQ2J/nRd0erpop8yOrBfzn7G09Ytd3veTMLPehvkLKRsJ5AL1FRwK9TIDr5jvE7wqaSmTooDbRY4ReadZo6mlLF2NQp/weA6quV5EFBaBE0UR5pfaBNuarDS6WVV9IAIwkqwPwa5DwFjooPUgah9iXHxXFU6nrQlsfWT//GouCjqcnGfHz7nfViR15GIfv6ipRTtj2nFH6w0Ffw/CwFdVRhciSbJuUVNCV2ntBy5TQ52843U1D8G14Rxnr4ZGbYA8YomE42hT/6QW3aeuBek1kb3EzJljqV5thaCRieRddT4iWlywFzCTS3xmKydTnTDSrmqG7kefVYk0GK83XlLC1DZotQN/xifukFrzB3jEIP0vE2xUXEyGlRrqza/t0zQiDGmeJWN332oC/YywITthVQQ2Vl3YQQbk0waFwkQqalnCWZkYHk1z8MYFYpIo3RqzQgSOQIxHB/cWl0dCqm8IMuq4agb4SwG5Fpi357n59h6iF3mWvZuBd4TFSpHglS9Ery/gOzVWO2jEFxJBhqXXARSBvwftv3pLyKXGgLMq2k7p0ocUcF0R6C0UFL71w3tXqOt4c2SZKx5VXLkDHfPkDZrP6HTbA0kOo4AKKkkyNxuZ6c
 neNBNlJ9
 Avs1NC+hxgLQ1ak6DYal3Z39ODr//jUPJyH3E9UgaR6DH9HB6/GjrD6oeK/2Phl4G1pEnYF133pfrt6+Mgo1OYKJc9IwI2eoZal7g+5bBZmMHwlsofPJqMDoixBLll0g+xIJxIx/64Va6HbRJy92jdg+Y4pj04iEvVgTXW790P0upGqEG6We2eZvknTTqmbMvlumXnWruJCa1eFM9Tz6Ed4dXIOXUr6EZ8SYL+MAQqpDqcDrrzdlCLTiZulgZkSw3ikDjLcPlUBypAffbDeqsWnD4vw==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000006, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Thu, Oct 31, 2024 at 8:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Oct 30, 2024 at 02:18:09PM -0700, Yosry Ahmed wrote:
> > On Wed, Oct 30, 2024 at 2:13 PM Usama Arif <usamaarif642@gmail.com> wrote:
> > > On 30/10/2024 21:01, Yosry Ahmed wrote:
> > > > On Wed, Oct 30, 2024 at 1:25 PM Usama Arif <usamaarif642@gmail.com> wrote:
> > > >>>> I am not sure that the approach we are trying in this patch is the right way:
> > > >>>> - This patch makes it a memcg issue, but you could have memcg disabled and
> > > >>>> then the mitigation being tried here won't apply.
> > > >>>
> > > >>> Is the problem reproducible without memcg? I imagine only if the
> > > >>> entire system is under memory pressure. I guess we would want the same
> > > >>> "mitigation" either way.
> > > >>>
> > > >> What would be a good open source benchmark/workload to test without limiting memory
> > > >> in memcg?
> > > >> For the kernel build test, I can only get zswap activity to happen if I build
> > > >> in cgroup and limit memory.max.
> > > >
> > > > You mean a benchmark that puts the entire system under memory
> > > > pressure? I am not sure, it ultimately depends on the size of memory
> > > > you have, among other factors.
> > > >
> > > > What if you run the kernel build test in a VM? Then you can limit its
> > > > size like a memcg, although you'd probably need to leave more room
> > > > because the entire guest OS will also be subject to the same limit.
> > > >
> > >
> > > I had tried this, but the variance in time/zswap numbers was very high.
> > > Much higher than the AMD numbers I posted in reply to Barry. So found
> > > it very difficult to make comparison.
> >
> > Hmm yeah maybe more factors come into play with global memory
> > pressure. I am honestly not sure how to test this scenario, and I
> > suspect variance will be high anyway.
> >
> > We can just try to use whatever technique we use for the memcg limit
> > though, if possible, right?
>
> You can boot a physical machine with mem=1G on the commandline, which
> restricts the physical range of memory that will be initialized.
> Double check /proc/meminfo after boot, because part of that physical
> range might not be usable RAM.
>
> I do this quite often to test physical memory pressure with workloads
> that don't scale up easily, like kernel builds.
>
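
For anyone trying the mem= approach, the /proc/meminfo check could be
scripted roughly as below. This is only a sketch: the helper names and
the sample figures are made up for illustration, and the only thing
assumed about the file is its usual "Name: <value> kB" line layout.

```python
# Sketch: compare MemTotal against the size requested with mem=.
# Helper names and SAMPLE values are illustrative, not real measurements.

def parse_meminfo(text):
    """Map each 'Name:  <value> kB' line to {name: value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def shortfall_kib(meminfo_text, requested_kib):
    """KiB by which MemTotal falls short of the size requested with mem=."""
    return requested_kib - parse_meminfo(meminfo_text)["MemTotal"]


SAMPLE = """\
MemTotal:         975524 kB
MemFree:          612340 kB
SwapTotal:       2097148 kB
"""

requested = 1 * 1024 * 1024  # mem=1G expressed in KiB
print("MemTotal:", parse_meminfo(SAMPLE)["MemTotal"], "kB")
print("Short of requested range by:", shortfall_kib(SAMPLE, requested), "kB")
```

On a real machine, pass open("/proc/meminfo").read() instead of SAMPLE.
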
> > > >>>> - Instead of this being a large folio swapin issue, is it more of a readahead
> > > >>>> issue? If we zswap (without the large folio swapin series) and change the window
> > > >>>> to 1 in swap_vma_readahead, we might see an improvement in linux kernel build time
> > > >>>> when cgroup memory is limited as readahead would probably cause swap thrashing as
> > > >>>> well.
>
> +1
>
> I also think there is too much focus on cgroup alone. The bigger issue
> seems to be how much optimistic volume we swap in when we're under
> pressure already. This applies to large folios and readahead; global
> memory availability and cgroup limits.

Agreed, although the characteristics of large folios and readahead are
different. But yeah, different flavors of the same problem.

>
> It happens to manifest with THP in cgroups because that's what you
> guys are testing. But IMO, any solution to this problem should
> consider the wider scope.

+1, and I really think this should be addressed separately, not just
rely on large block compression/decompression to offset the cost. It's
probably not just a zswap/zram problem anyway, it just happens to be
what we support large folio swapin for.

>
> > > >>> I think large folio swapin would make the problem worse anyway. I am
> > > >>> also not sure if the readahead window adjusts on memory pressure or
> > > >>> not.
> > > >>>
> > > >> readahead window doesn't look at memory pressure. So maybe the same thing is being
> > > >> seen here as there would be in swapin_readahead?
> > > >
> > > > Maybe readahead is not as aggressive in general as large folio
> > > > swapins? Looking at swap_vma_ra_win(), it seems like the maximum order
> > > > of the window is the smaller of page_cluster (2 or 3) and
> > > > SWAP_RA_ORDER_CEILING (5).
> > > Yes, I was seeing 8 pages swapin (order 3) when testing. So might
> > > be similar to enabling 32K mTHP?
> >
> > Not quite.
>
> Actually, I would expect it to be...
>
> > > > Also readahead will swapin 4k folios AFAICT, so we don't need a
> > > > contiguous allocation like large folio swapin. So that could be
> > > > another factor why readahead may not reproduce the problem.
> >
> > Because of this ^.
>
> ...this matters for the physical allocation, which might require more
> reclaim and compaction to produce the 32k. But an earlier version of
> Barry's patch did the cgroup margin fallback after the THP was already
> physically allocated, and it still helped.
>
> So the issue in this test scenario seems to be mostly about cgroup
> volume. And then 8 4k charges should be equivalent to a singular 32k
> charge when it comes to cgroup pressure.

In this test scenario, yes, because it's only exercising cgroup
pressure. But if we want a general solution that also addresses global
pressure, I expect large folios to be worse because of the contiguity
and the size (compared to default readahead window sizes). So I think
we shouldn't only test with readahead, as it won't cover some of the
large folio cases.
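
To make the size comparison concrete, here is the arithmetic as a small
sketch. It assumes 4 KiB base pages and takes SWAP_RA_ORDER_CEILING = 5
and page_cluster = 2 or 3 from the discussion above; the function names
are hypothetical and this is not the kernel's code.

```python
SWAP_RA_ORDER_CEILING = 5  # value quoted in this thread
PAGE_KIB = 4               # assumption: 4 KiB base pages


def max_ra_window_pages(page_cluster, ceiling=SWAP_RA_ORDER_CEILING):
    """Largest readahead window in pages: 2^min(page_cluster, ceiling)."""
    return 1 << min(page_cluster, ceiling)


def folio_kib(order):
    """Size in KiB of a single folio of the given order."""
    return (1 << order) * PAGE_KIB


# Readahead: many independent order-0 allocations, capped by the window.
for page_cluster in (2, 3):
    pages = max_ra_window_pages(page_cluster)
    print(f"page_cluster={page_cluster}: up to {pages} x {PAGE_KIB}K = {pages * PAGE_KIB}K")

# Large folio swapin: one contiguous allocation of the whole size.
for order in (3, 4, 9):
    print(f"order-{order} folio: one contiguous {folio_kib(order)}K allocation")
```

With page_cluster = 3 the window tops out at 8 pages, the same charge
volume as one order-3 folio, but readahead gets there with eight order-0
allocations rather than one contiguous 32K block, and higher folio
orders exceed what that window reaches.
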