Message-ID: <92b25b7b-63e8-4eb1-b2a6-9c221de2b7e4@gmail.com>
Date: Mon, 18 Nov 2024 12:28:56 -0800
Subject: Re: [PATCH RFC v2 0/2] mTHP-friendly compression in zsmalloc and zram based on multi-pages
To: Barry Song <21cnbao@gmail.com>, Nhat Pham, ying.huang@intel.com
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, axboe@kernel.dk,
 bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com,
 hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com,
 linux-block@vger.kernel.org, minchan@kernel.org, senozhatsky@chromium.org,
 surenb@google.com, terrelln@fb.com, v-songbaohua@oppo.com,
 wajdi.k.feghali@intel.com, willy@infradead.org, yosryahmed@google.com,
 yuzhao@google.com, zhengtangquan@oppo.com, zhouchengming@bytedance.com,
 ryan.roberts@arm.com
References: <20241107101005.69121-1-21cnbao@gmail.com>
From: Usama Arif

On 18/11/2024 02:27, Barry
Song wrote:
> On Tue, Nov 12, 2024 at 10:37 AM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Tue, Nov 12, 2024 at 8:30 AM Nhat Pham wrote:
>>>
>>> On Thu, Nov 7, 2024 at 2:10 AM Barry Song <21cnbao@gmail.com> wrote:
>>>>
>>>> From: Barry Song
>>>>
>>>> When large folios are compressed at a larger granularity, we observe
>>>> a notable reduction in CPU usage and a significant improvement in
>>>> compression ratios.
>>>>
>>>> mTHP's ability to be swapped out without splitting and swapped back in
>>>> as a whole allows compression and decompression at larger granularities.
>>>>
>>>> This patchset enhances zsmalloc and zram by adding support for dividing
>>>> large folios into multi-page blocks, typically configured with a
>>>> 2-order granularity. Without this patchset, a large folio is always
>>>> divided into `nr_pages` 4KiB blocks.
>>>>
>>>> The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
>>>> setting, where the default of 2 allows all anonymous THP to benefit.
>>>>
>>>> Examples include:
>>>> * A 16KiB large folio will be compressed and stored as a single 16KiB
>>>>   block.
>>>> * A 64KiB large folio will be compressed and stored as four 16KiB
>>>>   blocks.
>>>>
>>>> For example, swapping out and swapping in 100MiB of typical anonymous
>>>> data 100 times (with 16KB mTHP enabled) using zstd yields the following
>>>> results:
>>>>
>>>>                      w/o patches   w/ patches
>>>> swap-out time(ms)    68711         49908
>>>> swap-in time(ms)     30687         20685
>>>> compression ratio    20.49%        16.9%
>>>
>>> The data looks very promising :) My understanding is it also results
>>> in memory saving as well right? Since zstd operates better on bigger
>>> inputs.
>>>
>>> Is there any end-to-end benchmarking?
>>> My intuition is that this patch
>>> series overall will improve the situations, assuming we don't fallback
>>> to individual zero order page swapin too often, but it'd be nice if
>>> there is some data backing this intuition (especially with the
>>> upstream setup, i.e without any private patches). If the fallback
>>> scenario happens frequently, the patch series can make a page fault
>>> more expensive (since we have to decompress the entire chunk, and
>>> discard everything but the single page being loaded in), so it might
>>> make a difference.
>>>
>>> Not super qualified to comment on zram changes otherwise - just a
>>> casual observer to see if we can adopt this for zswap. zswap has the
>>> added complexity of not supporting THP zswap in (until Usama's patch
>>> series lands), and the presence of mixed backing states (due to zswap
>>> writeback), increasing the likelihood of fallback :)
>>
>> Correct. As I mentioned to Usama[1], this could be a problem, and we are
>> collecting data. The simplest approach to work around the issue is to fall
>> back to four small folios instead of just one, which would prevent the need
>> for three extra decompressions.
>>
>> [1] https://lore.kernel.org/linux-mm/CAGsJ_4yuZLOE0_yMOZj=KkRTyTotHw4g5g-t91W=MvS5zA4rYw@mail.gmail.com/
>>
>
> Hi Nhat, Usama, Ying,
>
> I committed to providing data for cases where large folio allocation fails and
> swap-in falls back to swapping in small folios. Here is the data that Tangquan
> helped collect:
>
> * zstd, 100MB typical anon memory swapout+swapin 100times
>
> 1. 16kb mTHP swapout + 16kb mTHP swapin + w/o zsmalloc large block
> (de)compression
>    swap-out(ms) 63151
>    swap-in(ms) 31551
> 2. 16kb mTHP swapout + 16kb mTHP swapin + w/ zsmalloc large block
> (de)compression
>    swap-out(ms) 43925
>    swap-in(ms) 21763
> 3. 16kb mTHP swapout + 100% fallback to small folios swap-in + w/
> zsmalloc large block (de)compression
>    swap-out(ms) 43423
>    swap-in(ms) 68660
>

Hi Barry,

Thanks for the numbers!

Under what conditions was it falling back to small folios? Did you just
add a hack in alloc_swap_folio() to jump straight to the fallback, or was
it due to memory pressure from a cgroup memory limit?

It would be good to test with something like a kernel build (or another
workload that causes swap thrashing) to see whether the regression gets
worse with large-granularity decompression, i.e. to have numbers for
real-world applications.

> Thus, "swap-in(ms) 68660," where mTHP allocation always fails, is significantly
> slower than "swap-in(ms) 21763," where mTHP allocation succeeds.
>
> If there are no objections, I could send a v3 patch to fall back to 4
> small folios instead of one. However, this would significantly increase
> the complexity of do_swap_page(). My gut feeling is that the added
> complexity might not be well-received :-)
>

If there is space for 4 small folios, then it might be worth passing
__GFP_DIRECT_RECLAIM, as that can trigger compaction and give us a large
folio.

Thanks,
Usama

> Thanks
> Barry
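The fallback cost discussed in this thread comes down to how many times each compressed block is decompressed. Below is a hypothetical userspace model, not code from the patchset or from mm/: the MODEL_* names and helpers are illustrative only, and it assumes the default ZSMALLOC_MULTI_PAGES_ORDER of 2 (16KiB blocks) and that every page of a block is eventually faulted in.

```c
#include <assert.h>

/*
 * Hypothetical model, not code from the patchset. Assumes the cover
 * letter's default ZSMALLOC_MULTI_PAGES_ORDER of 2, i.e. each compressed
 * block covers four 4KiB pages (16KiB).
 */
#define MODEL_MULTI_PAGES_ORDER 2
#define MODEL_PAGES_PER_BLOCK (1u << MODEL_MULTI_PAGES_ORDER)

/* Swap-in strategies discussed in the thread (names are illustrative). */
enum model_swapin {
	MODEL_SWAPIN_LARGE_FOLIO, /* mTHP allocation succeeds */
	MODEL_SWAPIN_ONE_SMALL,   /* fallback: one 4KiB folio per fault */
	MODEL_SWAPIN_ALL_SMALL,   /* proposed v3 fallback: four small folios */
};

/*
 * Number of compressed blocks a folio is split into. The folio size must
 * be a multiple of the block size (16KiB -> 1 block, 64KiB -> 4 blocks).
 */
static unsigned int model_blocks_per_folio(unsigned int nr_pages)
{
	assert(nr_pages % MODEL_PAGES_PER_BLOCK == 0);
	return nr_pages / MODEL_PAGES_PER_BLOCK;
}

/*
 * How many times one block is decompressed before all of its pages are
 * back in memory, assuming every page of the block is eventually touched.
 */
static unsigned int model_decompressions_per_block(enum model_swapin mode)
{
	switch (mode) {
	case MODEL_SWAPIN_ONE_SMALL:
		/* each fault decompresses the whole block but keeps one page */
		return MODEL_PAGES_PER_BLOCK;
	case MODEL_SWAPIN_LARGE_FOLIO:
	case MODEL_SWAPIN_ALL_SMALL:
	default:
		/* a single decompression fills every page of the block */
		return 1;
	}
}

/* Ratio between two measured swap-in times reported in the thread. */
static double model_slowdown(double fallback_ms, double baseline_ms)
{
	return fallback_ms / baseline_ms;
}
```

Under these assumptions, falling back to one small folio per fault costs four decompressions per 16KiB block versus one, i.e. the "three extra decompressions" mentioned above. model_slowdown(68660.0, 21763.0) is about 3.15, matching how much slower case 3 was than case 2. The model counts decompressions only and ignores allocation, fault-handling and other costs, so it illustrates why a four-small-folio fallback should help rather than predicting measured times.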