From: Yosry Ahmed
Date: Wed, 23 Oct 2024 11:02:50 -0700
Subject: Re: [RFC 0/4] mm: zswap: add support for zswapin of large folios
To: Usama Arif
Cc: Barry Song <21cnbao@gmail.com>, senozhatsky@chromium.org,
    minchan@kernel.org, hanchuanhua@oppo.com, v-songbaohua@oppo.com,
    akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    david@redhat.com, willy@infradead.org, kanchana.p.sridhar@intel.com,
    nphamcs@gmail.com, chengming.zhou@linux.dev, ryan.roberts@arm.com,
    ying.huang@intel.com, riel@surriel.com, shakeel.butt@linux.dev,
    kernel-team@meta.com, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
In-Reply-To: <7a14c332-3001-4b9a-ada3-f4d6799be555@gmail.com>
References: <20241018105026.2521366-1-usamaarif642@gmail.com>
    <5313c721-9cf1-4ecd-ac23-1eeddabd691f@gmail.com>
    <4c30cc30-0f7c-4ca7-a933-c8edfadaee5c@gmail.com>
    <7a14c332-3001-4b9a-ada3-f4d6799be555@gmail.com>

[..]
> >> I suspect the regression occurs because you're running an edge case
> >> where the memory cgroup stays nearly full most of the time (this isn't
> >> an inherent issue with large folio swap-in). As a result, swapping in
> >> mTHP quickly triggers a memcg overflow, causing a swap-out. The
> >> next swap-in then recreates the overflow, leading to a repeating
> >> cycle.
> >>
> >
> > Yes, agreed! Looking at the swap counters, I think this is what is going
> > on as well.
> >
> >> We need a way to stop the cup from repeatedly filling to the brim and
> >> overflowing.
> >> While not a definitive fix, the following change might help
> >> improve the situation:
> >>
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 17af08367c68..f2fa0eeb2d9a 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -4559,7 +4559,10 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> >>  		memcg = get_mem_cgroup_from_mm(mm);
> >>  	rcu_read_unlock();
> >>
> >> -	ret = charge_memcg(folio, memcg, gfp);
> >> +	if (folio_test_large(folio) && mem_cgroup_margin(memcg) < MEMCG_CHARGE_BATCH)
> >> +		ret = -ENOMEM;
> >> +	else
> >> +		ret = charge_memcg(folio, memcg, gfp);
> >>
> >>  	css_put(&memcg->css);
> >>  	return ret;
> >> }
> >>
> >
> > The diff makes sense to me. Let me test later today and get back to you.
> >
> > Thanks!
> >
> >> Please confirm whether it makes the kernel build under the memcg limit
> >> faster. If so, let's work together to figure out an official patch :-)
> >> The above code doesn't consider the parent memcg's overflow, so it's
> >> not an ideal fix.
> >>
>
> Thanks Barry, I think this fixes the regression, and even gives an improvement!
> I think the below might be better to do:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c098fd7f5c5e..0a1ec55cc079 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4550,7 +4550,11 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
>  		memcg = get_mem_cgroup_from_mm(mm);
>  	rcu_read_unlock();
>
> -	ret = charge_memcg(folio, memcg, gfp);
> +	if (folio_test_large(folio) &&
> +	    mem_cgroup_margin(memcg) < max(MEMCG_CHARGE_BATCH, folio_nr_pages(folio)))
> +		ret = -ENOMEM;
> +	else
> +		ret = charge_memcg(folio, memcg, gfp);
>
>  	css_put(&memcg->css);
>  	return ret;
>
>
> Columns:
>   A = mm-unstable
>   B = mm-unstable + large folio zswapin series
>   C = mm-unstable + large folio zswapin + no swap thrashing fix
>
> AMD 16K+32K THP=always
> metric       A            B            C
> real         1m23.038s    1m23.050s    1m22.704s
> user         53m57.210s   53m53.437s   53m52.577s
> sys          7m24.592s    7m48.843s    7m22.519s
> zswpin       612070       999244       815934
> zswpout      2226403      2347979      2054980
> pgfault      20667366     20481728     20478690
> pgmajfault   385887       269117       309702
>
> AMD 16K+32K+64K THP=always
> metric       A            B            C
> real         1m22.975s    1m23.266s    1m22.549s
> user         53m51.302s   53m51.069s   53m46.471s
> sys          7m40.168s    7m57.104s    7m25.012s
> zswpin       676492       1258573      1225703
> zswpout      2449839      2714767      2899178
> pgfault      17540746     17296555     17234663
> pgmajfault   429629       307495       287859
>

Thanks Usama and Barry for looking into this. It seems like this would
fix a regression with large folio swapin regardless of zswap. Can the
same result be reproduced on zram without this series?