From: Usama Arif
Date: Mon, 25 Nov 2024 16:19:07 +0000
Subject: Re: [PATCH RFC v3 4/4] mm: fall back to four small folios if mTHP allocation fails
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, axboe@kernel.dk,
 bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com,
 linux-block@vger.kernel.org, minchan@kernel.org, nphamcs@gmail.com,
 ryan.roberts@arm.com, senozhatsky@chromium.org, surenb@google.com,
 terrelln@fb.com, v-songbaohua@oppo.com, wajdi.k.feghali@intel.com,
 willy@infradead.org, ying.huang@intel.com, yosryahmed@google.com,
 yuzhao@google.com, zhengtangquan@oppo.com, zhouchengming@bytedance.com,
 Chuanhua Han
References: <20241121222521.83458-1-21cnbao@gmail.com>
 <20241121222521.83458-5-21cnbao@gmail.com>
 <24f7d8a0-ab92-4544-91dd-5241062aad23@gmail.com>

On 24/11/2024 21:47, Barry Song wrote:
> On Sat, Nov 23, 2024 at 3:54 AM Usama Arif wrote:
>>
>>
>>
>> On 21/11/2024 22:25, Barry Song wrote:
>>> From: Barry Song
>>>
>>> The swapfile can compress/decompress
>>> at 4 * PAGES granularity, reducing
>>> CPU usage and improving the compression ratio. However, if allocating an
>>> mTHP fails and we fall back to a single small folio, the entire large
>>> block must still be decompressed. This results in a 16 KiB area requiring
>>> 4 page faults, where each fault decompresses 16 KiB but retrieves only
>>> 4 KiB of data from the block. To address this inefficiency, we instead
>>> fall back to 4 small folios, ensuring that each decompression occurs
>>> only once.
>>>
>>> Allowing swap_read_folio() to decompress and read into an array of
>>> 4 folios would be extremely complex, requiring extensive changes
>>> throughout the stack, including swap_read_folio, zeromap,
>>> zswap, and final swap implementations like zRAM. In contrast,
>>> having these components fill a large folio with 4 subpages is much
>>> simpler.
>>>
>>> To avoid a full-stack modification, we introduce a per-CPU order-2
>>> large folio as a buffer. This buffer is used for swap_read_folio(),
>>> after which the data is copied into the 4 small folios. Finally, in
>>> do_swap_page(), all these small folios are mapped.
>>>
>>> Co-developed-by: Chuanhua Han
>>> Signed-off-by: Chuanhua Han
>>> Signed-off-by: Barry Song
>>> ---
>>>  mm/memory.c | 203 +++++++++++++++++++++++++++++++++++++++++++++++++---
>>>  1 file changed, 192 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 209885a4134f..e551570c1425 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4042,6 +4042,15 @@ static struct folio *__alloc_swap_folio(struct vm_fault *vmf)
>>>         return folio;
>>>  }
>>>
>>> +#define BATCH_SWPIN_ORDER 2
>>
>> Hi Barry,
>>
>> Thanks for the series and the numbers in the cover letter.
>>
>> Just a few things.
>>
>> Should BATCH_SWPIN_ORDER be ZSMALLOC_MULTI_PAGES_ORDER instead of 2?
>
> Technically, yes. I'm also considering removing ZSMALLOC_MULTI_PAGES_ORDER
> and always setting it to 2, which is the minimum anonymous mTHP order.
> The main reason is that it may be difficult for users to select the
> appropriate Kconfig?
>
> On the other hand, 16KB provides the most advantages for zstd compression and
> decompression with larger blocks. While increasing from 16KB to 32KB or 64KB
> can offer additional benefits, the improvement is not as significant as the
> jump from 4KB to 16KB.
>
> As I use zstd to compress and decompress the 'Beyond Compare' software
> package:
>
> root@barry-desktop:~# ./zstd
> File size: 182502912 bytes
> 4KB Block: Compression time = 0.765915 seconds, Decompression time = 0.203366 seconds
> Original size: 182502912 bytes
> Compressed size: 66089193 bytes
> Compression ratio: 36.21%
> 16KB Block: Compression time = 0.558595 seconds, Decompression time = 0.153837 seconds
> Original size: 182502912 bytes
> Compressed size: 59159073 bytes
> Compression ratio: 32.42%
> 32KB Block: Compression time = 0.538106 seconds, Decompression time = 0.137768 seconds
> Original size: 182502912 bytes
> Compressed size: 57958701 bytes
> Compression ratio: 31.76%
> 64KB Block: Compression time = 0.532212 seconds, Decompression time = 0.127592 seconds
> Original size: 182502912 bytes
> Compressed size: 56700795 bytes
> Compression ratio: 31.07%
>
> In that case, would we no longer need to rely on ZSMALLOC_MULTI_PAGES_ORDER?
>

Yes, I think if there isn't a very significant benefit from using a larger
order, then it's better not to have this option. It would also simplify the
code.

>>
>> Did you check the performance difference with and without patch 4?
>
> I retested after reverting patch 4, and the sys time increased to over
> 40 minutes again, though it was slightly better than without the entire
> series.
>
> *** Executing round 1 ***
>
> real 7m49.342s
> user 80m53.675s
> sys 42m28.393s
> pswpin: 29965548
> pswpout: 51127359
> 64kB-swpout: 0
> 32kB-swpout: 0
> 16kB-swpout: 11347712
> 64kB-swpin: 0
> 32kB-swpin: 0
> 16kB-swpin: 6641230
> pgpgin: 147376000
> pgpgout: 213343124
>
> *** Executing round 2 ***
>
> real 7m41.331s
> user 81m16.631s
> sys 41m39.845s
> pswpin: 29208867
> pswpout: 50006026
> 64kB-swpout: 0
> 32kB-swpout: 0
> 16kB-swpout: 11104912
> 64kB-swpin: 0
> 32kB-swpin: 0
> 16kB-swpin: 6483827
> pgpgin: 144057340
> pgpgout: 208887688
>
> *** Executing round 3 ***
>
> real 7m47.280s
> user 78m36.767s
> sys 37m32.210s
> pswpin: 26426526
> pswpout: 45420734
> 64kB-swpout: 0
> 32kB-swpout: 0
> 16kB-swpout: 10104304
> 64kB-swpin: 0
> 32kB-swpin: 0
> 16kB-swpin: 5884839
> pgpgin: 132013648
> pgpgout: 190537264
>
> *** Executing round 4 ***
>
> real 7m56.723s
> user 80m36.837s
> sys 41m35.979s
> pswpin: 29367639
> pswpout: 50059254
> 64kB-swpout: 0
> 32kB-swpout: 0
> 16kB-swpout: 11116176
> 64kB-swpin: 0
> 32kB-swpin: 0
> 16kB-swpin: 6514064
> pgpgin: 144593828
> pgpgout: 209080468
>
> *** Executing round 5 ***
>
> real 7m53.806s
> user 80m30.953s
> sys 40m14.870s
> pswpin: 28091760
> pswpout: 48495748
> 64kB-swpout: 0
> 32kB-swpout: 0
> 16kB-swpout: 10779720
> 64kB-swpin: 0
> 32kB-swpin: 0
> 16kB-swpin: 6244819
> pgpgin: 138813124
> pgpgout: 202885480
>
> I guess it is due to the occurrence of numerous partial reads
> (about 10%, 3505537/35159852).
>
> root@barry-desktop:~# cat /sys/block/zram0/multi_pages_debug_stat
>
> zram_bio write/read multi_pages count:54452828 35159852
> zram_bio failed write/read multi_pages count 0 0
> zram_bio partial write/read multi_pages count 4 3505537
> multi_pages_miss_free 0
>
> This workload doesn't cause fragmentation in the buddy allocator, so it’s
> likely due to the failure of MEMCG_CHARGE.
>
>>
>> I know that it won't help if you have a lot of unmovable pages
>> scattered everywhere, but were you able to compare the performance
>> of defrag=always vs patch 4? I feel like if you have space for 4 folios
>> then hopefully compaction should be able to do its job and you can
>> directly fill the large folio if the unmovable pages are better placed.
>> Johannes' series on preventing type mixing [1] would help.
>>
>> [1] https://lore.kernel.org/all/20240320180429.678181-1-hannes@cmpxchg.org/
>
> I believe this could help, but defragmentation is a complex issue. Especially on
> phones, where various components like drivers, DMA-BUF, multimedia, and
> graphics utilize memory.
>
> We observed that a fresh system could initially provide mTHP, but after a few
> hours, obtaining mTHP became very challenging. I'm happy to arrange a test
> of Johannes' series on phones (sometimes it is quite hard to backport to the
> Android kernel) to see if it brings any improvements.
>

I think it's definitely worth trying. If we can improve memory
allocation/compaction instead of patch 4, then we should go for that.
Maybe there won't be a need for TAO if allocation is done in a smarter way?

Just out of curiosity, what is the base kernel version you are testing with?

Thanks,
Usama
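
For reference, the read amplification described in the quoted commit message can be put into numbers with a small toy model. This is only a sketch, not code from the patch: the function names and the counting model are assumptions made for illustration, and the sizes simply follow the 4 KiB page / order-2 block figures used in the thread. It also recomputes the partial-read share from the multi_pages_debug_stat counters quoted above.

```python
"""Toy model of swap-in read amplification for one 16 KiB compressed block.

Illustrative only: the names and the counting model are assumptions,
not kernel code. Sizes follow the thread (4 KiB pages, order-2 blocks).
"""

PAGE_SIZE_KIB = 4
BATCH_SWPIN_ORDER = 2                         # order-2 means 4 pages per block
PAGES_PER_BLOCK = 1 << BATCH_SWPIN_ORDER
BLOCK_SIZE_KIB = PAGE_SIZE_KIB * PAGES_PER_BLOCK


def decompressed_kib(fallback: str) -> int:
    """Return the KiB decompressed to bring one whole block back in."""
    if fallback == "single_small_folio":
        # Each of the 4 faults decompresses the full block but keeps 1 page.
        return PAGES_PER_BLOCK * BLOCK_SIZE_KIB
    if fallback == "four_small_folios":
        # One fault decompresses the block once and fills 4 small folios.
        return BLOCK_SIZE_KIB
    raise ValueError(f"unknown fallback: {fallback}")


def amplification(fallback: str) -> float:
    """Decompressed bytes divided by the bytes actually needed."""
    return decompressed_kib(fallback) / BLOCK_SIZE_KIB


def partial_read_share(partial_reads: int, total_reads: int) -> float:
    """Fraction of multi-page reads that were only partial reads."""
    return partial_reads / total_reads


if __name__ == "__main__":
    for mode in ("single_small_folio", "four_small_folios"):
        print(f"{mode}: {decompressed_kib(mode)} KiB decompressed, "
              f"amplification x{amplification(mode):.0f}")
    # Counters taken from the quoted multi_pages_debug_stat output.
    share = partial_read_share(3505537, 35159852)
    print(f"partial reads: {share:.2%} of multi-page reads")
```

Under this simplified model every partial read pays the single-folio cost, which may be one reason why roughly a tenth of reads being partial is enough to show up in sys time; the real cost also depends on factors the model ignores, such as the copy from the per-CPU buffer.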