Message-ID: <3dca2498-363c-4ba5-a7e6-80c5e5532db5@gmail.com>
Date: Wed, 23 Oct 2024 19:31:46 +0100
Subject: Re: [RFC 0/4] mm: zswap: add support for zswapin of large folios
From: Usama Arif
To: Yosry Ahmed, Barry Song <21cnbao@gmail.com>
Cc: senozhatsky@chromium.org, minchan@kernel.org, hanchuanhua@oppo.com,
 v-songbaohua@oppo.com, akpm@linux-foundation.org, linux-mm@kvack.org,
 hannes@cmpxchg.org, david@redhat.com, willy@infradead.org,
 kanchana.p.sridhar@intel.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 ryan.roberts@arm.com, ying.huang@intel.com, riel@surriel.com,
 shakeel.butt@linux.dev, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org
References: <20241018105026.2521366-1-usamaarif642@gmail.com>
 <5313c721-9cf1-4ecd-ac23-1eeddabd691f@gmail.com>
 <4c30cc30-0f7c-4ca7-a933-c8edfadaee5c@gmail.com>
 <7a14c332-3001-4b9a-ada3-f4d6799be555@gmail.com>

On 23/10/2024 19:02, Yosry Ahmed wrote:
> [..]
>>>> I suspect the regression occurs because you're running an edge case
>>>> where the memory cgroup stays nearly full most of the time (this isn't
>>>> an inherent issue with large folio swap-in). As a result, swapping in
>>>> mTHP quickly triggers a memcg overflow, causing a swap-out. The
>>>> next swap-in then recreates the overflow, leading to a repeating
>>>> cycle.
>>>>
>>>
>>> Yes, agreed! Looking at the swap counters, I think this is what is going
>>> on as well.
>>>
>>>> We need a way to stop the cup from repeatedly filling to the brim and
>>>> overflowing. While not a definitive fix, the following change might help
>>>> improve the situation:
>>>>
>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>
>>>> index 17af08367c68..f2fa0eeb2d9a 100644
>>>> --- a/mm/memcontrol.c
>>>> +++ b/mm/memcontrol.c
>>>>
>>>> @@ -4559,7 +4559,10 @@ int mem_cgroup_swapin_charge_folio(struct folio
>>>> *folio, struct mm_struct *mm,
>>>>         memcg = get_mem_cgroup_from_mm(mm);
>>>>         rcu_read_unlock();
>>>>
>>>> -       ret = charge_memcg(folio, memcg, gfp);
>>>> +       if (folio_test_large(folio) && mem_cgroup_margin(memcg) <
>>>> MEMCG_CHARGE_BATCH)
>>>> +               ret = -ENOMEM;
>>>> +       else
>>>> +               ret = charge_memcg(folio, memcg, gfp);
>>>>
>>>>         css_put(&memcg->css);
>>>>         return ret;
>>>> }
>>>>
>>>
>>> The diff makes sense to me. Let me test later today and get back to you.
>>>
>>> Thanks!
>>>
>>>> Please confirm if it makes the kernel build with memcg limitation
>>>> faster. If so, let's
>>>> work together to figure out an official patch :-) The above code hasn't consider
>>>> the parent memcg's overflow, so not an ideal fix.
>>>>
>>
>> Thanks Barry, I think this fixes the regression, and even gives an improvement!
>> I think the below might be better to do:
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index c098fd7f5c5e..0a1ec55cc079 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -4550,7 +4550,11 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
>>         memcg = get_mem_cgroup_from_mm(mm);
>>         rcu_read_unlock();
>>
>> -       ret = charge_memcg(folio, memcg, gfp);
>> +       if (folio_test_large(folio) &&
>> +           mem_cgroup_margin(memcg) < max(MEMCG_CHARGE_BATCH, folio_nr_pages(folio)))
>> +               ret = -ENOMEM;
>> +       else
>> +               ret = charge_memcg(folio, memcg, gfp);
>>
>>         css_put(&memcg->css);
>>         return ret;
>>
>>
>> AMD 16K+32K THP=always
>> metric      mm-unstable   mm-unstable + large folio zswapin series  mm-unstable + large folio zswapin + no swap thrashing fix
>> real        1m23.038s     1m23.050s                                 1m22.704s
>> user        53m57.210s    53m53.437s                                53m52.577s
>> sys         7m24.592s     7m48.843s                                 7m22.519s
>> zswpin      612070        999244                                    815934
>> zswpout     2226403       2347979                                   2054980
>> pgfault     20667366      20481728                                  20478690
>> pgmajfault  385887        269117                                    309702
>>
>> AMD 16K+32K+64K THP=always
>> metric      mm-unstable   mm-unstable + large folio zswapin series  mm-unstable + large folio zswapin + no swap thrashing fix
>> real        1m22.975s     1m23.266s                                 1m22.549s
>> user        53m51.302s    53m51.069s                                53m46.471s
>> sys         7m40.168s     7m57.104s                                 7m25.012s
>> zswpin      676492        1258573                                   1225703
>> zswpout     2449839       2714767                                   2899178
>> pgfault     17540746      17296555                                  17234663
>> pgmajfault  429629        307495                                    287859
>>
>
> Thanks Usama and Barry for looking into this. It seems like this would
> fix a regression with large folio swapin regardless of zswap. Can the
> same result be reproduced on zram without this series?

Yes, it's a regression in large folio swapin support regardless of zswap/zram.

We need to run 3 tests: one with (probably) the diff below to remove large
folio support, one with current upstream, and one with upstream + the swap
thrashing fix.

We only use zswap and don't have a zram setup (and I am a bit lazy to create one :)).
Any zram volunteers to try this?

diff --git a/mm/memory.c b/mm/memory.c
index fecdd044bc0b..62f6b087beb3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4124,6 +4124,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
        gfp_t gfp;
        int order;
 
+       goto fallback;
+
        /*
         * If uffd is active for the vma we need per-page fault fidelity to
         * maintain the uffd semantics.
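
For anyone following along, here is a rough userspace sketch of the condition in the memcontrol.c diff quoted above (the max() variant). It is not kernel code: sketch_memcg, sketch_margin() and sketch_refuse_large_swapin() are hypothetical names, the margin is modelled as plain limit minus usage for a single cgroup (parent memcgs are ignored, as Barry noted for his version), and the value 64 for the charge batch is an assumption about MEMCG_CHARGE_BATCH.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stands in for the kernel's MEMCG_CHARGE_BATCH, in pages. */
#define SKETCH_CHARGE_BATCH 64UL

/* Hypothetical, simplified model of one memory cgroup (units: pages). */
struct sketch_memcg {
    unsigned long limit_pages;
    unsigned long usage_pages;
};

/* Free room left before the limit is hit; 0 if already at or over it. */
static unsigned long sketch_margin(const struct sketch_memcg *cg)
{
    return cg->limit_pages > cg->usage_pages ?
           cg->limit_pages - cg->usage_pages : 0;
}

/*
 * Mirrors the proposed check: refuse to charge a large folio (more than
 * one page) when the margin is below max(batch, folio size). Single
 * pages are never refused here; they go through the normal charge path.
 */
static bool sketch_refuse_large_swapin(const struct sketch_memcg *cg,
                                       unsigned long nr_pages)
{
    unsigned long need;

    assert(nr_pages > 0);
    need = nr_pages > SKETCH_CHARGE_BATCH ? nr_pages : SKETCH_CHARGE_BATCH;
    return nr_pages > 1 && sketch_margin(cg) < need;
}
```

With a limit of 1000 pages and 990 in use (margin 10), a 16-page folio is refused while a single page is not. With 900 in use (margin 100), the 16-page folio is accepted but a 128-page folio is still refused, which is the case the max() with folio_nr_pages() adds on top of the batch-only check.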