Date: Thu, 31 Oct 2024 11:38:30 -0400
From: Johannes Weiner
To: Yosry Ahmed
Cc: Usama Arif, Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song,
    Kanchana P Sridhar, David Hildenbrand, Baolin Wang, Chris Li,
    "Huang, Ying", Kairui Song, Ryan Roberts, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song
Subject: Re: [PATCH RFC] mm: mitigate large folios usage and swap thrashing for nearly full memcg
Message-ID: <20241031153830.GA799903@cmpxchg.org>
References: <20241027001444.3233-1-21cnbao@gmail.com> <33c5d5ca-7bc4-49dc-b1c7-39f814962ae0@gmail.com> <852211c6-0b55-4bdd-8799-90e1f0c002c1@gmail.com>

On Wed, Oct 30, 2024 at 02:18:09PM -0700, Yosry Ahmed wrote:
> On Wed, Oct 30, 2024 at 2:13 PM Usama Arif wrote:
> > On 30/10/2024 21:01, Yosry Ahmed wrote:
> > > On Wed, Oct 30, 2024 at 1:25 PM Usama Arif wrote:
> > >>>> I am not sure that the approach we are trying in this patch is the right way:
> > >>>> - This patch makes it a memcg issue, but you could have memcg disabled and
> > >>>> then the mitigation being tried here wont apply.
> > >>>
> > >>> Is the problem reproducible without memcg? I imagine only if the
> > >>> entire system is under memory pressure. I guess we would want the same
> > >>> "mitigation" either way.
> > >>>
> > >> What would be a good open source benchmark/workload to test without limiting memory
> > >> in memcg?
> > >> For the kernel build test, I can only get zswap activity to happen if I build
> > >> in cgroup and limit memory.max.
> > >
> > > You mean a benchmark that puts the entire system under memory
> > > pressure? I am not sure, it ultimately depends on the size of memory
> > > you have, among other factors.
> > >
> > > What if you run the kernel build test in a VM? Then you can limit is
> > > size like a memcg, although you'd probably need to leave more room
> > > because the entire guest OS will also subject to the same limit.
> > >
> >
> > I had tried this, but the variance in time/zswap numbers was very high.
> > Much higher than the AMD numbers I posted in reply to Barry. So found
> > it very difficult to make comparison.
>
> Hmm yeah maybe more factors come into play with global memory
> pressure. I am honestly not sure how to test this scenario, and I
> suspect variance will be high anyway.
>
> We can just try to use whatever technique we use for the memcg limit
> though, if possible, right?

You can boot a physical machine with mem=1G on the commandline, which
restricts the physical range of memory that will be initialized.
Double check /proc/meminfo after boot, because part of that physical
range might not be usable RAM.
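
A minimal sketch of that /proc/meminfo double check, for illustration
only. The helper names (parse_meminfo, shortfall_kb) are hypothetical,
and the 1G limit is simply the mem=1G value from above expressed in kB.
It compares MemTotal against the requested limit so the gap is visible
before a benchmark run.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. The unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def shortfall_kb(meminfo_text, mem_limit_kb):
    """kB of the mem= range that did not show up as MemTotal."""
    return mem_limit_kb - parse_meminfo(meminfo_text)["MemTotal"]


if __name__ == "__main__":
    limit_kb = 1 << 20  # assumption: booted with mem=1G (1G == 2^20 kB)
    with open("/proc/meminfo") as f:
        text = f.read()
    total_kb = parse_meminfo(text)["MemTotal"]
    print(f"MemTotal: {total_kb} kB, "
          f"{shortfall_kb(text, limit_kb)} kB below the mem= limit")
```

If the shortfall is large relative to the limit, the effective memory
pressure is higher than the mem= value suggests, which is worth noting
next to any numbers that get posted.
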
I do this quite often to test physical memory pressure with workloads
that don't scale up easily, like kernel builds.

> > >>>> - Instead of this being a large folio swapin issue, is it more of a readahead
> > >>>> issue? If we zswap (without the large folio swapin series) and change the window
> > >>>> to 1 in swap_vma_readahead, we might see an improvement in linux kernel build time
> > >>>> when cgroup memory is limited as readahead would probably cause swap thrashing as
> > >>>> well.

+1

I also think there is too much focus on cgroup alone. The bigger issue
seems to be how much optimistic volume we swap in when we're under
pressure already. This applies to large folios and readahead; global
memory availability and cgroup limits.

It happens to manifest with THP in cgroups because that's what you
guys are testing. But IMO, any solution to this problem should
consider the wider scope.

> > >>> I think large folio swapin would make the problem worse anyway. I am
> > >>> also not sure if the readahead window adjusts on memory pressure or
> > >>> not.
> > >>>
> > >> readahead window doesnt look at memory pressure. So maybe the same thing is being
> > >> seen here as there would be in swapin_readahead?
> > >
> > > Maybe readahead is not as aggressive in general as large folio
> > > swapins? Looking at swap_vma_ra_win(), it seems like the maximum order
> > > of the window is the smaller of page_cluster (2 or 3) and
> > > SWAP_RA_ORDER_CEILING (5).
> > Yes, I was seeing 8 pages swapin (order 3) when testing. So might
> > be similar to enabling 32K mTHP?
>
> Not quite.

Actually, I would expect it to be...

> > > Also readahead will swapin 4k folios AFAICT, so we don't need a
> > > contiguous allocation like large folio swapin. So that could be
> > > another factor why readahead may not reproduce the problem.
>
> Because of this ^.

...this matters for the physical allocation, which might require more
reclaim and compaction to produce the 32k.
But an earlier version of Barry's patch did the cgroup margin fallback
after the THP was already physically allocated, and it still helped.

So the issue in this test scenario seems to be mostly about cgroup
volume. And then eight 4k charges should be equivalent to a single 32k
charge when it comes to cgroup pressure.
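
The arithmetic behind that equivalence can be sketched as follows. This
is a simplification of what is described in the thread, not the
kernel's actual code: it assumes 4k base pages, takes the window cap
(the smaller of page_cluster and SWAP_RA_ORDER_CEILING) as quoted above,
and uses hypothetical helper names.

```python
PAGE_SIZE = 4096            # assumption: 4k base pages
SWAP_RA_ORDER_CEILING = 5   # value quoted in the thread


def max_ra_pages(page_cluster):
    """Upper bound on the readahead window, in pages, as described above."""
    return 1 << min(page_cluster, SWAP_RA_ORDER_CEILING)


def charged_bytes(nr_folios, folio_order):
    """Bytes charged to the cgroup for nr_folios folios of a given order."""
    return nr_folios * (PAGE_SIZE << folio_order)


# With page_cluster=3, readahead can bring in as many order-0 folios as
# one order-3 (32k) folio covers, so the charged volume is the same.
window = max_ra_pages(3)
readahead_volume = charged_bytes(window, 0)
mthp_volume = charged_bytes(1, 3)
print(window, readahead_volume, mthp_volume)
```

What differs between the two cases is only the contiguity requirement
on the allocation side, not the amount charged against the limit.
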