Date: Wed, 30 Oct 2024 21:13:16 +0000
From: Usama Arif <usamaarif642@gmail.com>
Subject: Re: [PATCH RFC] mm: mitigate large folios usage and swap thrashing for nearly full memcg
To: Yosry Ahmed
Cc: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Barry Song, Kanchana P Sridhar, David Hildenbrand,
 Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
References: <20241027001444.3233-1-21cnbao@gmail.com>
 <33c5d5ca-7bc4-49dc-b1c7-39f814962ae0@gmail.com>
 <852211c6-0b55-4bdd-8799-90e1f0c002c1@gmail.com>

On 30/10/2024 21:01, Yosry Ahmed wrote:
> On Wed, Oct 30, 2024 at 1:25 PM Usama Arif wrote:
>>
>> On 30/10/2024 19:51, Yosry Ahmed wrote:
>>> [..]
>>>>> My second point about the mitigation is as follows: for a system (or
>>>>> memcg) under severe memory pressure, especially one without hardware TLB
>>>>> optimization, is enabling mTHP always the right choice? Since mTHP operates at
>>>>> a larger granularity, some internal fragmentation is unavoidable, regardless
>>>>> of optimization. Could the mitigation code help in automatically tuning
>>>>> this fragmentation?
>>>>>
>>>>
>>>> I agree with the point that always enabling mTHP is not the right thing to do
>>>> on all platforms. I also think it might be the case that enabling mTHP
>>>> is a good thing for some workloads, but enabling mTHP swapin along with
>>>> it might not be.
>>>>
>>>> As you said, when you have apps switching between foreground and background
>>>> on Android, it probably makes sense to have large folio swapping, as you
>>>> want to bring in all the pages from the background app as quickly as possible,
>>>> and you also get all the TLB optimizations and smaller LRU overhead once
>>>> you have brought in all the pages.
>>>> The Linux kernel build test doesn't really get to benefit from the TLB optimization
>>>> and smaller LRU overhead, as the pages are probably very short-lived. So I
>>>> think it doesn't show the benefit of large folio swapin properly, and
>>>> large folio swapin should probably be disabled for this kind of workload,
>>>> even though mTHP should be enabled.
>>>>
>>>> I am not sure that the approach we are trying in this patch is the right way:
>>>> - This patch makes it a memcg issue, but you could have memcg disabled, and
>>>>   then the mitigation being tried here won't apply.
>>>
>>> Is the problem reproducible without memcg? I imagine only if the
>>> entire system is under memory pressure. I guess we would want the same
>>> "mitigation" either way.
>>>
>> What would be a good open-source benchmark/workload to test without limiting memory
>> in memcg?
>> For the kernel build test, I can only get zswap activity to happen if I build
>> in a cgroup and limit memory.max.
>
> You mean a benchmark that puts the entire system under memory
> pressure? I am not sure, it ultimately depends on the size of memory
> you have, among other factors.
>
> What if you run the kernel build test in a VM? Then you can limit its
> size like a memcg, although you'd probably need to leave more room
> because the entire guest OS will also be subject to the same limit.
>

I had tried this, but the variance in the time/zswap numbers was very high, much
higher than the AMD numbers I posted in reply to Barry, so I found it very
difficult to make a comparison.

>>
>> I can just run zswap large folio zswapin in production and see, but that will take me a few
>> days.
>> tbh, running in prod is a much better test, and if there isn't any sort of thrashing,
>> then maybe it's not really an issue? I believe Barry doesn't see an issue on Android
>> phones (but please correct me if I am wrong), and if there isn't an issue in Meta
>> production either, that is a good data point for servers as well. And maybe
>> a kernel build in a 4G memcg is not a good test.
>
> If there is a regression in the kernel build, this means some
> workloads may be affected, even if Meta's prod isn't. I understand
> that the benchmark is not very representative of real-world workloads,
> but in this instance I think the thrashing problem surfaced by the
> benchmark is real.
>
>>
>>>> - Instead of this being a large folio swapin issue, is it more of a readahead
>>>>   issue? If we zswap (without the large folio swapin series) and change the window
>>>>   to 1 in swap_vma_readahead, we might see an improvement in Linux kernel build time
>>>>   when cgroup memory is limited, as readahead would probably cause swap thrashing as
>>>>   well.
>>>
>>> I think large folio swapin would make the problem worse anyway. I am
>>> also not sure if the readahead window adjusts on memory pressure or
>>> not.
>>>
>> The readahead window doesn't look at memory pressure. So maybe the same thing is
>> being seen here as there would be in swapin_readahead?
>
> Maybe readahead is not as aggressive in general as large folio
> swapins? Looking at swap_vma_ra_win(), it seems like the maximum order
> of the window is the smaller of page_cluster (2 or 3) and
> SWAP_RA_ORDER_CEILING (5).

Yes, I was seeing 8-page swapins (order 3) when testing, so it might be similar to
enabling 32K mTHP?

>
> Also, readahead will swap in 4K folios AFAICT, so we don't need a
> contiguous allocation like large folio swapin. So that could be
> another factor why readahead may not reproduce the problem.
>
>> Maybe if we check kernel build test
>> performance in a 4G memcg with the below diff, it might get better?
>
> I think you can use the page_cluster tunable to do this at runtime.
>
>>
>> diff --git a/mm/swap_state.c b/mm/swap_state.c
>> index 4669f29cf555..9e196e1e6885 100644
>> --- a/mm/swap_state.c
>> +++ b/mm/swap_state.c
>> @@ -809,7 +809,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>>  	pgoff_t ilx;
>>  	bool page_allocated;
>>
>> -	win = swap_vma_ra_win(vmf, &start, &end);
>> +	win = 1;
>>  	if (win == 1)
>>  		goto skip;
>>
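
For reference, here is a rough userspace sketch of the window clamp described above
(sketch_max_ra_win is a hypothetical helper written only for illustration, not the
kernel's swap_vma_ra_win()): the VMA readahead window is bounded by the smaller of
page_cluster and SWAP_RA_ORDER_CEILING, so with the default page_cluster of 3 the
window tops out at 8 pages (order 3), roughly the footprint of a 32K mTHP swapin.

#include <stdio.h>

#define SWAP_RA_ORDER_CEILING	5	/* ceiling discussed in the thread */

/* hypothetical helper for illustration; not kernel code */
static unsigned int sketch_max_ra_win(unsigned int page_cluster)
{
	unsigned int order = page_cluster < SWAP_RA_ORDER_CEILING ?
			     page_cluster : SWAP_RA_ORDER_CEILING;

	return 1u << order;	/* maximum readahead window, in pages */
}

int main(void)
{
	/* vm.page-cluster defaults to 2 or 3; 0 disables swap readahead */
	for (unsigned int pc = 0; pc <= 5; pc++)
		printf("page_cluster=%u -> max window %u pages\n",
		       pc, sketch_max_ra_win(pc));
	return 0;
}

As noted above, the effect of the diff can also be approximated at runtime by
lowering the vm.page-cluster sysctl rather than hard-coding win = 1; setting it to
0 disables swap readahead entirely.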