E=Sophos;i="6.12,146,1728975600"; d="scan'208";a="48705479" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 17:29:08 -0800 X-CSE-ConnectionGUID: MGc6mV94TsKcmP4dRr02gg== X-CSE-MsgGUID: o4vKLpv1RU21iN1q31NYFw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,146,1728975600"; d="scan'208";a="91701896" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by fmviesa005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 17:29:03 -0800 From: "Huang, Ying" To: Barry Song <21cnbao@gmail.com> Cc: linux-mm@kvack.org, akpm@linux-foundation.org, axboe@kernel.dk, bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com, linux-block@vger.kernel.org, minchan@kernel.org, nphamcs@gmail.com, senozhatsky@chromium.org, surenb@google.com, terrelln@fb.com, v-songbaohua@oppo.com, wajdi.k.feghali@intel.com, willy@infradead.org, yosryahmed@google.com, yuzhao@google.com, zhengtangquan@oppo.com, zhouchengming@bytedance.com, usamaarif642@gmail.com, ryan.roberts@arm.com Subject: Re: [PATCH RFC v2 0/2] mTHP-friendly compression in zsmalloc and zram based on multi-pages In-Reply-To: (Barry Song's message of "Tue, 12 Nov 2024 14:25:08 +1300") References: <20241107101005.69121-1-21cnbao@gmail.com> <87iksy5mkh.fsf@yhuang6-desk2.ccr.corp.intel.com> <87pln1z2da.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Tue, 12 Nov 2024 09:25:30 +0800 Message-ID: <87h68dz1it.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Stat-Signature: 5ftsjbhcut3qdbw15bwraumanrtx3584 X-Rspamd-Queue-Id: 85AF440007 X-Rspamd-Server: rspam08 X-Rspam-User: X-HE-Tag: 1731374917-590628 X-HE-Meta: U2FsdGVkX18ZKmkrttWr7srWHLHPy2dZQhgQF/B+AQt3Sjfek2yuM53HMxLf1WfTi0wBSoYtwV97n6i6iF2+VTILjflrgHEeKXFp1VbmJSLN91uu0RP5RRGTUG+CDLTntmNKUdvD4VaXhsCkVePWeiBYNGjhe3oAl1NMoMpI80TzJNm2bb/4tHLbWz+oTXCWVSCQQjEpWldTIKxGjwD6W8eHHl4wLW6T6X2vd0Vs7dB6vTRndAZ3E+TVfhBVtW6rNEbQ8gjShizC0LZaLLD/r01SzRsuo72vnB1BFwqfjSDoKUwH05ptRF3tu9blotD5WeUPsbLdeDerD4z37NzrsM7naCGt52a5St8Gha1PahLDnNZ/WJzX9DFuAnv9hXVsc6/AOt6Mp2x4gZW2ub2fShrNrvCvcefem/o6/Qaz5weqpXAQfKEKvqj12zcqGFGw26M8KNyr8GPYQiYCaxdPHn3MV6wipIGRr1Vvrn15sNRgVbcDOcGEs9kHdFSUTB0jw4H+qHEqA7xy/33hjKp5UDBQyGkVVHyLTcWV0e0ZSG7iECWYpkC9hQlS+adbHj/rbGkKyHBXZYt6QDm54NicgWWRMzovApquydiPNuQYP01xYKSf9AcUc+jC0GBcNjz8z772rb0a+515YkeOn0wv5uSYOlM6N4qhdQTVZyZSxnaWSEmBAlkQCkXT02QdRMygigkUgZjD3xVpcnRRhqAcNzulcVbw6Q6pmjQ+VESNdXGb4JEpDqAepqdELuzewaJumyjFF1hiNxdYbUteMArQ5GTxHOnqS+AUWLhDSNdwX26fnvWVPS3Gb2kHG3A6jfhyKc7ifUrQPBVn7e6SaXRcejA+4mMniTYdNU0Vrh1MLlfCHnrqcouKtHttWZH+ER8uRK6tBPyocEaZmH+FE3lXM+UaQjOso+jZDNcJuEDiYFgyLXGjPuZ4YdfwNZeM8CGbAn2JL1R79UM92PrGfRp GplPkTZO D4eJECNj67o7sMVn62bz5HlHX6Uh98nqs7qQULm6xiFyQp4RNt/4OmHw0oeHLQA7nOgDZGUiKmTzBQpqZRFvjhDw9C3czhiVrKaHUSRg0JNZ0OREG74c5qhWMPdnerAtR7CAQLYSQLwxllM68y1r7zYEWF+lVaFVZQbrJomvQPSsi8LCxASmvFIKAAe5wdWwrIxrFJbNPn5fRknbmHZjU5aPeZxR7wayv4jZWAp1TKRULEoVi69yry89qq0uTVCzCiyzj9a3GgXa5V98= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Barry Song <21cnbao@gmail.com> writes: > On Tue, Nov 12, 2024 at 2:11=E2=80=AFPM Huang, Ying wrote: >> >> Barry Song <21cnbao@gmail.com> writes: >> >> > On Fri, Nov 8, 2024 
>> >>
>> >> Hi, Barry,
>> >>
>> >> Barry Song <21cnbao@gmail.com> writes:
>> >>
>> >> > From: Barry Song
>> >> >
>> >> > When large folios are compressed at a larger granularity, we observe
>> >> > a notable reduction in CPU usage and a significant improvement in
>> >> > compression ratios.
>> >> >
>> >> > mTHP's ability to be swapped out without splitting and swapped back in
>> >> > as a whole allows compression and decompression at larger granularities.
>> >> >
>> >> > This patchset enhances zsmalloc and zram by adding support for dividing
>> >> > large folios into multi-page blocks, typically configured with a
>> >> > 2-order granularity. Without this patchset, a large folio is always
>> >> > divided into `nr_pages` 4KiB blocks.
>> >> >
>> >> > The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
>> >> > setting, where the default of 2 allows all anonymous THP to benefit.
>> >> >
>> >> > Examples include:
>> >> > * A 16KiB large folio will be compressed and stored as a single 16KiB
>> >> >   block.
>> >> > * A 64KiB large folio will be compressed and stored as four 16KiB
>> >> >   blocks.
>> >> >
>> >> > For example, swapping out and swapping in 100MiB of typical anonymous
>> >> > data 100 times (with 16KB mTHP enabled) using zstd yields the following
>> >> > results:
>> >> >
>> >> >                        w/o patches    w/ patches
>> >> > swap-out time(ms)      68711          49908
>> >> > swap-in time(ms)       30687          20685
>> >> > compression ratio      20.49%         16.9%
>> >>
>> >> The data looks good.  Thanks!
>> >>
>> >> Have you considered the situation that the large folio fails to be
>> >> allocated during swap-in?  It's possible because the memory may be very
>> >> fragmented.
>> >
>> > That's correct, good question. On phones, we use a large folio pool to
>> > maintain a relatively high allocation success rate. When mTHP allocation
>> > fails, we have a workaround to allocate nr_pages small folios and map
>> > them together to avoid partial reads. This ensures that the benefits of
>> > larger block compression and decompression are consistently maintained.
>> > That was the code running on production phones.
>> >
>> > We also previously experimented with maintaining multiple buffers for
>> > decompressed large blocks in zRAM, allowing upcoming do_swap_page()
>> > calls to use them when falling back to small folios. In this setup, the
>> > buffers achieved a high hit rate, though I don't recall the exact number.
>> >
>> > I'm concerned that this fault-around-like fallback to nr_pages small
>> > folios may not gain traction upstream. Do you have any suggestions for
>> > improvement?
>>
>> It appears that we still don't have a solution to guarantee a 100% mTHP
>> allocation success rate.  If so, we need a fallback solution for that.
>>
>> Another possible solution is:
>>
>> 1) If we fail to allocate an mTHP with nr_pages, allocate nr_pages normal
>>    (4k) folios instead.
>>
>> 2) Revise the decompression interface to accept a set of folios (instead
>>    of one folio) as the target.  Then, we can decompress to the normal
>>    folios allocated in 1).
>>
>> 3) In do_swap_page(), we can either map all folios or just the faulting
>>    folio.  We can put non-fault folios into the swap cache if necessary.
>>
>> Does this work?
>
> This is exactly what we did on production phones:

I think that this is upstreamable.
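Roughly, the fallback flow under discussion could be sketched as below.
This is only an illustrative sketch, not code from this patchset or from
the OnePlus tree linked below: every helper name (alloc_mthp_for_swapin(),
alloc_order0_folio_for_swapin(), decompress_block_to_folios(),
map_folios_at_fault()) is hypothetical, nr_pages here is the number of
pages in one compressed multi-page block, and locking, reference counting
and error handling are all omitted.

/*
 * Illustrative sketch of the fallback discussed above; every helper
 * below is a hypothetical name, not an existing kernel function.
 * With ZSMALLOC_MULTI_PAGES_ORDER = 2 (from the cover letter), one
 * compressed block covers 16KiB, i.e. at most 4 order-0 folios.
 */
static vm_fault_t swap_in_multi_page_block(struct vm_fault *vmf,
					   swp_entry_t entry,
					   unsigned int nr_pages)
{
	struct folio *large;
	struct folio *small[1 << ZSMALLOC_MULTI_PAGES_ORDER];
	unsigned int i;

	/* Preferred path: swap the whole block back into one mTHP. */
	large = alloc_mthp_for_swapin(vmf, nr_pages);
	if (large) {
		decompress_block_to_folios(entry, &large, 1);
		return map_folios_at_fault(vmf, &large, 1);
	}

	/* Fallback: memory is too fragmented, use nr_pages order-0 folios. */
	for (i = 0; i < nr_pages; i++)
		small[i] = alloc_order0_folio_for_swapin(vmf, i);

	/*
	 * The decompression interface takes a set of folios as its target,
	 * so the block is still decompressed in one pass even though the
	 * destination pages are not contiguous.
	 */
	decompress_block_to_folios(entry, small, nr_pages);

	/*
	 * Map every folio (or only the faulting one); non-fault folios can
	 * instead be parked in the swap cache for later faults.
	 */
	return map_folios_at_fault(vmf, small, nr_pages);
}

The essential interface change is step 2) above: the decompression target
becomes an array of folios, so a multi-page block never has to be split or
recompressed at 4KiB granularity just because the mTHP allocation failed.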
> [1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L4656
> [2] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L5439
>
> I feel that we don't need to fall back to nr_pages (though that's what we
> used on phones); using a dedicated 4 pages should be sufficient, since
> zsmalloc is handling compression and decompression at 16KB granularity.

Yes.  We only need the number of normal folios to make decompression work.

> However, we are not adding them to the swapcache; instead, they are
> mapped immediately.

I think that works.

>>
>> >>
>> >> > -v2:
>> >> > While it is not mature yet, I know some people are waiting for
>> >> > an update :-)
>> >> > * Fixed some stability issues.
>> >> > * Rebased against the latest mm-unstable.
>> >> > * Set the default order to 2, which benefits all anon mTHP.
>> >> > * Multi-pages ZsPageMovable is not supported yet.
>> >> >
>> >> > Tangquan Zheng (2):
>> >> >   mm: zsmalloc: support objects compressed based on multiple pages
>> >> >   zram: support compression at the granularity of multi-pages
>> >> >
>> >> >  drivers/block/zram/Kconfig    |   9 +
>> >> >  drivers/block/zram/zcomp.c    |  17 +-
>> >> >  drivers/block/zram/zcomp.h    |  12 +-
>> >> >  drivers/block/zram/zram_drv.c | 450 ++++++++++++++++++++++++++++++++---
>> >> >  drivers/block/zram/zram_drv.h |  45 ++++
>> >> >  include/linux/zsmalloc.h      |  10 +-
>> >> >  mm/Kconfig                    |  18 ++
>> >> >  mm/zsmalloc.c                 | 232 +++++++++++++-----
>> >> >  8 files changed, 699 insertions(+), 94 deletions(-)
>> >>

--
Best Regards,
Huang, Ying