From: Barry Song <21cnbao@gmail.com>
Date: Wed, 27 Nov 2024 09:20:37 +1300
Subject: Re: [PATCH RFC v3 0/4] mTHP-friendly compression in zsmalloc and zram based on multi-pages
To: Sergey Senozhatsky
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, axboe@kernel.dk,
    bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com,
    hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com,
    linux-block@vger.kernel.org, minchan@kernel.org, nphamcs@gmail.com,
    ryan.roberts@arm.com, surenb@google.com, terrelln@fb.com,
    usamaarif642@gmail.com, v-songbaohua@oppo.com, wajdi.k.feghali@intel.com,
    willy@infradead.org, ying.huang@intel.com, yosryahmed@google.com,
    yuzhao@google.com, zhengtangquan@oppo.com, zhouchengming@bytedance.com
In-Reply-To: <20241126050917.GC440697@google.com>
References: <20241121222521.83458-1-21cnbao@gmail.com> <20241126050917.GC440697@google.com>

On Tue, Nov 26, 2024 at 6:09 PM Sergey Senozhatsky wrote:
>
> On (24/11/22 11:25), Barry Song wrote:
> > When large folios are compressed at a larger granularity, we observe
> > a notable reduction in CPU usage and a significant improvement in
> > compression ratios.
> >
> > This patchset enhances zsmalloc and zram by adding support for dividing
> > large folios into multi-page blocks, typically configured with a
> > 2-order granularity. Without this patchset, a large folio is always
> > divided into `nr_pages` 4KiB blocks.
> >
> > The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> > setting, where the default of 2 allows all anonymous THP to benefit.
>
> I can't say that I'm in love with this part.
>
> Looking at zsmalloc stats, your new size-classes are significantly
> further apart from each other than our traditional size-classes.
> For example, with ZSMALLOC_CHAIN_SIZE of 10, some size-classes are
> more than 400 bytes apart (that's almost 10% of PAGE_SIZE):
>
> // stripped
> 344  9792
> 348 10048
> 351 10240
> 353 10368
> 355 10496
> 361 10880
> 368 11328
> 370 11456
> 373 11648
> 377 11904
> 383 12288
> 387 12544
> 390 12736
> 395 13056
> 400 13376
> 404 13632
> 410 14016
> 415 14336
>
> Which means that every object of size, let's say, 10881 will
> go into the 11328 size-class and have 447 bytes of padding between
> each object.
>
> And with ZSMALLOC_CHAIN_SIZE of 8, it seems, we have even larger
> padding gaps:
>
> // stripped
> 348 10048
> 351 10240
> 353 10368
> 361 10880
> 370 11456
> 373 11648
> 377 11904
> 383 12288
> 390 12736
> 395 13056
> 404 13632
> 410 14016
> 415 14336
> 418 14528
> 447 16384
>
> E.g. 13632 and 13056 are more than 500 bytes apart.
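
Right, and just to spell out that arithmetic: below is a quick user-space
sketch (not zsmalloc code; the class sizes are copied from your
ZSMALLOC_CHAIN_SIZE=10 stats above, and pick_class() is a made-up helper
that just mimics "smallest class that fits"):

#include <stdio.h>

/* Object sizes copied from the quoted CHAIN_SIZE=10 class stats. */
static const unsigned int class_size[] = {
	9792, 10048, 10240, 10368, 10496, 10880, 11328, 11456,
	11648, 11904, 12288, 12544, 12736, 13056, 13376, 13632,
	14016, 14336,
};

/* Return the smallest class that can hold an object of @size. */
static unsigned int pick_class(unsigned int size)
{
	for (size_t i = 0; i < sizeof(class_size) / sizeof(class_size[0]); i++)
		if (class_size[i] >= size)
			return class_size[i];
	return 0;	/* no class fits */
}

int main(void)
{
	unsigned int size = 10881;
	unsigned int class = pick_class(size);

	/* Prints: object 10881 -> class 11328, padding 447 bytes (3.9%) */
	printf("object %u -> class %u, padding %u bytes (%.1f%%)\n",
	       size, class, class - size, 100.0 * (class - size) / class);
	return 0;
}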

>
> > swap-out time(ms)    68711    49908
> > swap-in time(ms)     30687    20685
> > compression ratio    20.49%   16.9%
>
> These are not the only numbers to focus on; really important metrics
> are zsmalloc pages-used and zsmalloc max-pages-used. Then we can
> calculate the pool memory usage ratio (the size of the compressed data
> vs the number of pages the zsmalloc pool allocated to keep it).

To address this, we plan to collect more data and get back to you
afterwards. From my understanding, we still have room to refine
ZSMALLOC_CHAIN_SIZE. Essentially, with the original PAGE_SIZE
granularity, each small object may carry its own padding waste; with
4 * PAGE_SIZE blocks, there is a single instance of waste per block.
If we can keep that ratio under control, this could be optimized.
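
As a back-of-the-envelope illustration of that ratio argument (the
numbers here are made up, and I'm assuming the same average padding per
stored object in both cases, which your stats suggest is optimistic for
the wider multi-page classes):

#include <stdio.h>

int main(void)
{
	const unsigned int folio = 64 * 1024;	/* e.g. a 64KiB mTHP */
	const unsigned int pad = 256;		/* assumed avg padding per object */

	unsigned int nr_small = folio / 4096;	/* one object per 4KiB block */
	unsigned int nr_large = folio / 16384;	/* one object per order-2 block */

	/* 16 objects, ~4096B padding (6.25%) vs 4 objects, ~1024B (1.56%) */
	printf("4KiB blocks:  %2u objects, ~%u bytes padding (%.2f%%)\n",
	       nr_small, nr_small * pad, 100.0 * nr_small * pad / folio);
	printf("16KiB blocks: %2u objects, ~%u bytes padding (%.2f%%)\n",
	       nr_large, nr_large * pad, 100.0 * nr_large * pad / folio);
	return 0;
}

So the win holds as long as the per-object padding of the multi-page
classes stays within roughly 4x of the small classes; that is the ratio
we plan to measure.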

> More importantly, dealing with internal fragmentation in a size-class
> of, let's say, 14528 will be a little painful, as we'll need to move
> around 14K objects.
>
> As for the speed part, well, it's a little unusual to see that you
> are focusing on zstd. zstd is slower than any of the lzX family
> (sort of a fact); zstd sports a better compression ratio, but is
> slower. Do you use zstd in your smartphones? If speed is your main
> metric,

Yes. Essentially, zstd on its own is too slow. However, with mTHP and
this patchset, the swap-out/swap-in bandwidth has improved
significantly. As a result, we are now using zstd directly on phones,
with two zRAM devices:

zRAM0: swaps small folios in and out using lz4;
zRAM1: swaps large folios in and out using zstd.

Without large folios, the latency of zstd on small folios is
unacceptable, which is why zRAM0 uses lz4. zRAM1, on the other hand,
strikes a balance by combining the acceptable speed of large folios
with the memory savings provided by zstd.

> another option might be to just use a faster algorithm and then
> utilize post-processing (re-compression with zstd or writeback) for
> memory savings?

The concern lies in power consumption, as re-compression would require
decompressing the lz4 data and recompressing it with zstd. Mobile
phones are particularly sensitive to both power consumption and
standby time.

On the other hand, I don't see any conflict between recompression and
the large-block compression proposed by this patchset. Even during
recompression, the advantages of large-block compression can be
leveraged to improve speed.

Writeback is another approach we are exploring. The main concern is
that it might require swapping data back in from the backing block
device. We need to ensure that only truly cold data is stored there;
otherwise, it could significantly impact app launch times when an app
transitions from the background to the foreground.

> Do you happen to have some data (pool memory usage ratio, etc.) for
> lzo or lzo-rle, or lz4?

TBH, I don't, because the current use case involves using zstd for
large folios, which is our main focus. We are not using lzo or lz4 for
large folios, but I can definitely collect some data on that.

Thanks
Barry