From mboxrd@z Thu Jan 1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 26 Nov 2024 07:32:38 +1300
Subject: Re: [PATCH RFC v3 4/4] mm: fall back to four small folios if mTHP allocation fails
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, axboe@kernel.dk, bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com, linux-block@vger.kernel.org, minchan@kernel.org, nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org, surenb@google.com, terrelln@fb.com, v-songbaohua@oppo.com, wajdi.k.feghali@intel.com, willy@infradead.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, zhengtangquan@oppo.com, zhouchengming@bytedance.com, Chuanhua Han
References: <20241121222521.83458-1-21cnbao@gmail.com> <20241121222521.83458-5-21cnbao@gmail.com> <24f7d8a0-ab92-4544-91dd-5241062aad23@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Nov 26, 2024 at 5:19 AM Usama Arif wrote:
>
>
>
> On 24/11/2024 21:47,
> Barry Song wrote:
> > On Sat, Nov 23, 2024 at 3:54 AM Usama Arif wrote:
> >>
> >>
> >>
> >> On 21/11/2024 22:25, Barry Song wrote:
> >>> From: Barry Song
> >>>
> >>> The swapfile can compress/decompress at 4 * PAGES granularity, reducing
> >>> CPU usage and improving the compression ratio. However, if allocating an
> >>> mTHP fails and we fall back to a single small folio, the entire large
> >>> block must still be decompressed. This results in a 16 KiB area requiring
> >>> 4 page faults, where each fault decompresses 16 KiB but retrieves only
> >>> 4 KiB of data from the block. To address this inefficiency, we instead
> >>> fall back to 4 small folios, ensuring that each decompression occurs
> >>> only once.
> >>>
> >>> Allowing swap_read_folio() to decompress and read into an array of
> >>> 4 folios would be extremely complex, requiring extensive changes
> >>> throughout the stack, including swap_read_folio, zeromap,
> >>> zswap, and final swap implementations like zRAM. In contrast,
> >>> having these components fill a large folio with 4 subpages is much
> >>> simpler.
> >>>
> >>> To avoid a full-stack modification, we introduce a per-CPU order-2
> >>> large folio as a buffer. This buffer is used for swap_read_folio(),
> >>> after which the data is copied into the 4 small folios. Finally, in
> >>> do_swap_page(), all these small folios are mapped.
> >>>
> >>> Co-developed-by: Chuanhua Han
> >>> Signed-off-by: Chuanhua Han
> >>> Signed-off-by: Barry Song
> >>> ---
> >>>  mm/memory.c | 203 +++++++++++++++++++++++++++++++++++++++++++++++++---
> >>>  1 file changed, 192 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/mm/memory.c b/mm/memory.c
> >>> index 209885a4134f..e551570c1425 100644
> >>> --- a/mm/memory.c
> >>> +++ b/mm/memory.c
> >>> @@ -4042,6 +4042,15 @@ static struct folio *__alloc_swap_folio(struct vm_fault *vmf)
> >>>              return folio;
> >>>  }
> >>>
> >>> +#define BATCH_SWPIN_ORDER 2
> >>
> >> Hi Barry,
> >>
> >> Thanks for the series and the numbers in the cover letter.
> >>
> >> Just a few things.
> >>
> >> Should BATCH_SWPIN_ORDER be ZSMALLOC_MULTI_PAGES_ORDER instead of 2?
> >
> > Technically, yes. I'm also considering removing ZSMALLOC_MULTI_PAGES_ORDER
> > and always setting it to 2, which is the minimum anonymous mTHP order. The main
> > reason is that it may be difficult for users to select the appropriate Kconfig?
> >
> > On the other hand, 16KB provides the most advantages for zstd compression and
> > decompression with larger blocks. While increasing from 16KB to 32KB or 64KB
> > can offer additional benefits, the improvement is not as significant
> > as the jump from
> > 4KB to 16KB.
> >
> > As I use zstd to compress and decompress the 'Beyond Compare' software
> > package:
> >
> > root@barry-desktop:~# ./zstd
> > File size: 182502912 bytes
> > 4KB Block: Compression time = 0.765915 seconds, Decompression time = 0.203366 seconds
> > Original size: 182502912 bytes
> > Compressed size: 66089193 bytes
> > Compression ratio: 36.21%
> > 16KB Block: Compression time = 0.558595 seconds, Decompression time = 0.153837 seconds
> > Original size: 182502912 bytes
> > Compressed size: 59159073 bytes
> > Compression ratio: 32.42%
> > 32KB Block: Compression time = 0.538106 seconds, Decompression time = 0.137768 seconds
> > Original size: 182502912 bytes
> > Compressed size: 57958701 bytes
> > Compression ratio: 31.76%
> > 64KB Block: Compression time = 0.532212 seconds, Decompression time = 0.127592 seconds
> > Original size: 182502912 bytes
> > Compressed size: 56700795 bytes
> > Compression ratio: 31.07%
> >
> > In that case, would we no longer need to rely on ZSMALLOC_MULTI_PAGES_ORDER?
> >
>
> Yes, I think if there isn't a very significant benefit of using a larger order,
> then it's better not to have this option. It would also simplify the code.
>
> >>
> >> Did you check the performance difference with and without patch 4?
> >
> > I retested after reverting patch 4, and the sys time increased to over
> > 40 minutes
> > again, though it was slightly better than without the entire series.
> >
> > *** Executing round 1 ***
> >
> > real    7m49.342s
> > user    80m53.675s
> > sys     42m28.393s
> > pswpin: 29965548
> > pswpout: 51127359
> > 64kB-swpout: 0
> > 32kB-swpout: 0
> > 16kB-swpout: 11347712
> > 64kB-swpin: 0
> > 32kB-swpin: 0
> > 16kB-swpin: 6641230
> > pgpgin: 147376000
> > pgpgout: 213343124
> >
> > *** Executing round 2 ***
> >
> > real    7m41.331s
> > user    81m16.631s
> > sys     41m39.845s
> > pswpin: 29208867
> > pswpout: 50006026
> > 64kB-swpout: 0
> > 32kB-swpout: 0
> > 16kB-swpout: 11104912
> > 64kB-swpin: 0
> > 32kB-swpin: 0
> > 16kB-swpin: 6483827
> > pgpgin: 144057340
> > pgpgout: 208887688
> >
> > *** Executing round 3 ***
> >
> > real    7m47.280s
> > user    78m36.767s
> > sys     37m32.210s
> > pswpin: 26426526
> > pswpout: 45420734
> > 64kB-swpout: 0
> > 32kB-swpout: 0
> > 16kB-swpout: 10104304
> > 64kB-swpin: 0
> > 32kB-swpin: 0
> > 16kB-swpin: 5884839
> > pgpgin: 132013648
> > pgpgout: 190537264
> >
> > *** Executing round 4 ***
> >
> > real    7m56.723s
> > user    80m36.837s
> > sys     41m35.979s
> > pswpin: 29367639
> > pswpout: 50059254
> > 64kB-swpout: 0
> > 32kB-swpout: 0
> > 16kB-swpout: 11116176
> > 64kB-swpin: 0
> > 32kB-swpin: 0
> > 16kB-swpin: 6514064
> > pgpgin: 144593828
> > pgpgout: 209080468
> >
> > *** Executing round 5 ***
> >
> > real    7m53.806s
> > user    80m30.953s
> > sys     40m14.870s
> > pswpin: 28091760
> > pswpout: 48495748
> > 64kB-swpout: 0
> > 32kB-swpout: 0
> > 16kB-swpout: 10779720
> > 64kB-swpin: 0
> > 32kB-swpin: 0
> > 16kB-swpin: 6244819
> > pgpgin: 138813124
> > pgpgout: 202885480
> >
> > I guess it is due to the occurrence of numerous partial reads
> > (about 10%, 3505537/35159852).
> >
> > root@barry-desktop:~# cat /sys/block/zram0/multi_pages_debug_stat
> >
> > zram_bio write/read multi_pages count: 54452828 35159852
> > zram_bio failed write/read multi_pages count: 0 0
> > zram_bio partial write/read multi_pages count: 4 3505537
> > multi_pages_miss_free: 0
> >
> > This workload doesn't cause fragmentation in the buddy allocator, so it's
> > likely due to the failure of MEMCG_CHARGE.
> >
> >>
> >> I know that it won't help if you have a lot of unmovable pages
> >> scattered everywhere, but were you able to compare the performance
> >> of defrag=always vs patch 4? I feel like if you have space for 4 folios
> >> then hopefully compaction should be able to do its job and you can
> >> directly fill the large folio if the unmovable pages are better placed.
> >> Johannes' series on preventing type mixing [1] would help.
> >>
> >> [1] https://lore.kernel.org/all/20240320180429.678181-1-hannes@cmpxchg.org/
> >
> > I believe this could help, but defragmentation is a complex issue, especially on
> > phones, where various components like drivers, DMA-BUF, multimedia, and
> > graphics utilize memory.
> >
> > We observed that a fresh system could initially provide mTHP, but after a few
> > hours, obtaining mTHP became very challenging. I'm happy to arrange a test
> > of Johannes' series on phones (sometimes it is quite hard to backport to the
> > Android kernel) to see if it brings any improvements.
> >
>
> I think it's definitely worth trying. If we can improve memory allocation/compaction
> instead of patch 4, then we should go for that. Maybe there won't be a need for TAO
> if allocation is done in a smarter way?
>
> Just out of curiosity, what is the base kernel version you are testing with?

This kernel build testing was conducted on my Intel PC running
mm-unstable, which includes Johannes' series. As mentioned earlier, it
still shows many partial reads without patch 4.
For phones, we have to backport to an Android kernel such as 6.6, 6.1,
etc.: https://android.googlesource.com/kernel/common/+refs
Testing a new patchset can sometimes be quite a pain ....

> Thanks,
> Usama

Thanks
Barry