From: Barry Song <21cnbao@gmail.com>
Date: Mon, 18 Nov 2024 23:27:03 +1300
Subject: Re: [PATCH RFC v2 0/2] mTHP-friendly compression in zsmalloc and zram based on multi-pages
To: Nhat Pham, usamaarif642@gmail.com, ying.huang@intel.com
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, axboe@kernel.dk,
    bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com,
    hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com,
    linux-block@vger.kernel.org, minchan@kernel.org, senozhatsky@chromium.org,
    surenb@google.com, terrelln@fb.com, v-songbaohua@oppo.com,
    wajdi.k.feghali@intel.com, willy@infradead.org, yosryahmed@google.com,
    yuzhao@google.com, zhengtangquan@oppo.com, zhouchengming@bytedance.com,
    ryan.roberts@arm.com
References: <20241107101005.69121-1-21cnbao@gmail.com>

On Tue, Nov 12, 2024 at 10:37 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Nov 12, 2024 at 8:30 AM Nhat Pham wrote:
> >
> > On Thu, Nov 7, 2024 at 2:10 AM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > From: Barry Song
> > >
> > > When large folios are compressed at a larger granularity, we observe
> > > a notable reduction in CPU usage and a significant improvement in
> > > compression ratios.
> > >
> > > mTHP's ability to be swapped out without splitting and swapped back in
> > > as a whole allows compression and decompression at larger granularities.
> > >
> > > This patchset enhances zsmalloc and zram by adding support for dividing
> > > large folios into multi-page blocks, typically configured with a
> > > 2-order granularity. Without this patchset, a large folio is always
> > > divided into `nr_pages` 4KiB blocks.
> > >
> > > The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> > > setting, where the default of 2 allows all anonymous THP to benefit.
> > >
> > > Examples include:
> > > * A 16KiB large folio will be compressed and stored as a single 16KiB
> > >   block.
> > > * A 64KiB large folio will be compressed and stored as four 16KiB
> > >   blocks.
> > >
> > > For example, swapping out and swapping in 100MiB of typical anonymous
> > > data 100 times (with 16KB mTHP enabled) using zstd yields the following
> > > results:
> > >
> > >                         w/o patches        w/ patches
> > > swap-out time(ms)       68711              49908
> > > swap-in time(ms)        30687              20685
> > > compression ratio       20.49%             16.9%
> >
> > The data looks very promising :) My understanding is it also results
> > in memory saving as well right? Since zstd operates better on bigger
> > inputs.
> >
> > Is there any end-to-end benchmarking? My intuition is that this patch
> > series overall will improve the situations, assuming we don't fallback
> > to individual zero order page swapin too often, but it'd be nice if
> > there is some data backing this intuition (especially with the
> > upstream setup, i.e without any private patches). If the fallback
> > scenario happens frequently, the patch series can make a page fault
> > more expensive (since we have to decompress the entire chunk, and
> > discard everything but the single page being loaded in), so it might
> > make a difference.
> >
> > Not super qualified to comment on zram changes otherwise - just a
> > casual observer to see if we can adopt this for zswap. zswap has the
> > added complexity of not supporting THP zswap in (until Usama's patch
> > series lands), and the presence of mixed backing states (due to zswap
> > writeback), increasing the likelihood of fallback :)
>
> Correct. As I mentioned to Usama[1], this could be a problem, and we are
> collecting data. The simplest approach to work around the issue is to fall
> back to four small folios instead of just one, which would prevent the need
> for three extra decompressions.
>
> [1] https://lore.kernel.org/linux-mm/CAGsJ_4yuZLOE0_yMOZj=KkRTyTotHw4g5g-t91W=MvS5zA4rYw@mail.gmail.com/
>

Hi Nhat, Usama, Ying,

I committed to providing data for cases where large folio allocation fails
and swap-in falls back to swapping in small folios. Here is the data that
Tangquan helped collect:

* zstd, 100MB typical anon memory swapout+swapin 100 times

1. 16KiB mTHP swapout + 16KiB mTHP swapin + w/o zsmalloc large block
   (de)compression
   swap-out(ms) 63151
   swap-in(ms)  31551

2. 16KiB mTHP swapout + 16KiB mTHP swapin + w/ zsmalloc large block
   (de)compression
   swap-out(ms) 43925
   swap-in(ms)  21763

3. 16KiB mTHP swapout + 100% fallback to small folios swap-in + w/ zsmalloc
   large block (de)compression
   swap-out(ms) 43423
   swap-in(ms)  68660

Thus case 3, where mTHP allocation always fails (swap-in 68660 ms), is
roughly 3.2x slower than case 2, where mTHP allocation succeeds (swap-in
21763 ms), and about 2.2x slower than case 1 without large block
(de)compression (swap-in 31551 ms).

If there are no objections, I could send a v3 patch to fall back to 4 small
folios instead of one. However, this would significantly increase the
complexity of do_swap_page(). My gut feeling is that the added complexity
might not be well-received :-)

Thanks
Barry
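
The "three extra decompressions" argument in this thread can be illustrated
with a toy count. This is only a sketch under stated assumptions, not kernel
code: it assumes an order-2 block (four 4KiB pages per 16KiB block), that every
page of the block is eventually faulted in, and that each fault decompresses
the whole block and keeps only the pages it maps. The names
`block_decompressions` and `extra_decompressions` are hypothetical.

```python
# Toy model (hypothetical, not kernel code): count how many times one
# compressed 16KiB block is decompressed while all of its pages are
# faulted back in.

MULTI_PAGES_ORDER = 2                     # assumed order-2 granularity
PAGES_PER_BLOCK = 1 << MULTI_PAGES_ORDER  # four 4KiB pages per block


def block_decompressions(pages_per_fault: int) -> int:
    """Decompressions needed to bring in every page of one block.

    Each fault decompresses the entire block and keeps `pages_per_fault`
    pages, so the count is ceil(PAGES_PER_BLOCK / pages_per_fault).
    """
    return -(-PAGES_PER_BLOCK // pages_per_fault)


def extra_decompressions(pages_per_fault: int) -> int:
    """Decompressions beyond the single one a whole-block swap-in needs."""
    return block_decompressions(pages_per_fault) - 1


if __name__ == "__main__":
    cases = (
        ("mTHP swap-in (whole block per fault)", PAGES_PER_BLOCK),
        ("fallback to one small folio per fault", 1),
        ("fallback to four small folios per fault", PAGES_PER_BLOCK),
    )
    for label, pages in cases:
        print(f"{label}: {block_decompressions(pages)} decompression(s), "
              f"{extra_decompressions(pages)} extra")
```

The model counts decompressions only. It says nothing about elapsed time,
allocation cost, or pages that are never touched again, so it should not be
read as a prediction of the measured swap-in numbers.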