From: Barry Song <21cnbao@gmail.com>
Date: Tue, 12 Nov 2024 09:31:09 +1300
Subject: Re: [PATCH RFC v2 0/2] mTHP-friendly compression in zsmalloc and zram based on multi-pages
To: Usama Arif
Cc: "Huang, Ying", linux-mm@kvack.org, akpm@linux-foundation.org, axboe@kernel.dk,
    bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com,
    hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com,
    linux-block@vger.kernel.org, minchan@kernel.org, nphamcs@gmail.com,
    senozhatsky@chromium.org, surenb@google.com, terrelln@fb.com,
    v-songbaohua@oppo.com, wajdi.k.feghali@intel.com, willy@infradead.org,
    yosryahmed@google.com, yuzhao@google.com, zhengtangquan@oppo.com,
    zhouchengming@bytedance.com, ryan.roberts@arm.com
In-Reply-To: <28446805-f533-44fe-988a-71dcbdb379ab@gmail.com>
References: <20241107101005.69121-1-21cnbao@gmail.com> <87iksy5mkh.fsf@yhuang6-desk2.ccr.corp.intel.com> <28446805-f533-44fe-988a-71dcbdb379ab@gmail.com>
On Tue, Nov 12, 2024 at 5:43 AM Usama Arif wrote:
>
>
>
> On 08/11/2024 06:51, Barry Song wrote:
> > On Fri, Nov 8, 2024 at 6:23 PM Huang, Ying wrote:
> >>
> >> Hi, Barry,
> >>
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >>> From: Barry Song
> >>>
> >>> When large folios are compressed at a larger granularity, we observe
> >>> a notable reduction in CPU usage and a significant improvement in
> >>> compression ratios.
> >>>
> >>> mTHP's ability to be swapped out without splitting and swapped back in
> >>> as a whole allows compression and decompression at larger granularities.
> >>>
> >>> This patchset enhances zsmalloc and zram by adding support for dividing
> >>> large folios into multi-page blocks, typically configured with a
> >>> 2-order granularity. Without this patchset, a large folio is always
> >>> divided into `nr_pages` 4KiB blocks.
> >>>
> >>> The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> >>> setting, where the default of 2 allows all anonymous THP to benefit.
> >>>
> >>> Examples include:
> >>> * A 16KiB large folio will be compressed and stored as a single 16KiB
> >>>   block.
> >>> * A 64KiB large folio will be compressed and stored as four 16KiB
> >>>   blocks.
> >>>
> >>> For example, swapping out and swapping in 100MiB of typical anonymous
> >>> data 100 times (with 16KB mTHP enabled) using zstd yields the following
> >>> results:
> >>>
> >>>                        w/o patches    w/ patches
> >>> swap-out time(ms)      68711          49908
> >>> swap-in time(ms)       30687          20685
> >>> compression ratio      20.49%         16.9%
> >>
> >> The data looks good.  Thanks!
> >>
> >> Have you considered the situation that the large folio fails to be
> >> allocated during swap-in?  It's possible because the memory may be very
> >> fragmented.
> >
> > That's correct, good question. On phones, we use a large folio pool to
> > maintain a relatively high allocation success rate. When mTHP allocation
> > fails, we have a workaround to allocate nr_pages of small folios and map
> > them together to avoid partial reads. This ensures that the benefits of
> > larger block compression and decompression are consistently maintained.
> > That was the code running on production phones.
> >
>
> Thanks for sending the v2!
>
> How is the large folio pool maintained? I don't think there is something in upstream

In production phones, we have extended the migration type for mTHP
separately during Linux boot[1].

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/page_alloc.c#L2089

These pageblocks have their own migration type, resulting in a separate
buddy free list. We prevent order-0 allocations from drawing memory from
this pool, ensuring a relatively high success rate for mTHP allocations.

In one instance, phones reported an mTHP allocation success rate of less
than 5% after running for a few hours without this kind of reservation
mechanism. Therefore, we need an upstream solution in the kernel to ensure
sustainable mTHP support across all scenarios.
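To make the reservation policy concrete, here is a minimal userspace toy
model (illustrative only; the pool sizes and function names are made up,
and this is not the page_alloc.c code linked in [1]): order-0 requests may
only take pages from the general pool, while order-2 (mTHP) requests may
additionally fall back to the reserve, so high-order allocations keep
succeeding even after order-0 pressure has drained the general pool.

/*
 * Toy model of the mTHP reservation idea: a dedicated pool of order-2
 * blocks that order-0 allocations can never fall back into.
 * Illustrative only -- not the kernel implementation.
 */
#include <stdbool.h>
#include <stdio.h>

#define MTHP_ORDER 2                  /* 2^2 pages = 16KiB with 4KiB pages */

static long normal_free   = 1000;     /* order-0 pages in the general pool */
static long reserved_free = 256;      /* order-2 blocks reserved for mTHP  */

/* order-0 (small folio) allocation: only the general pool is eligible */
static bool alloc_order0(void)
{
	if (normal_free > 0) {
		normal_free--;
		return true;
	}
	return false;                 /* never steals from the mTHP reserve */
}

/* order-2 (mTHP) allocation: general pool first, then the reserve */
static bool alloc_mthp(void)
{
	if (normal_free >= (1 << MTHP_ORDER)) {
		normal_free -= 1 << MTHP_ORDER;
		return true;
	}
	if (reserved_free > 0) {
		reserved_free--;
		return true;
	}
	return false;
}

int main(void)
{
	/* drain the general pool with order-0 allocations */
	while (alloc_order0())
		;
	/* an mTHP allocation still succeeds, thanks to the reserve */
	printf("mTHP allocation after order-0 pressure: %s\n",
	       alloc_mthp() ? "success" : "failure");
	return 0;
}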
> kernel for this? The only thing that I saw on the mailing list is TAO for pmd-mappable
> THPs only? I think that was about 7-8 months ago and wasn't merged?

TAO supports mTHP as long as it is configured through the bootcmd:

nomerge=25%,4

This means we are providing a 4-order mTHP pool with 25% of total memory
reserved. Note that the Android common kernel has already integrated
TAO[2][3], so we are trying to use TAO to replace our previous approach of
extending the migration type.

[2] https://android.googlesource.com/kernel/common/+/c1ff6dcf209e4abc23584d2cd117f725421bccac
[3] https://android.googlesource.com/kernel/common/+/066872d13d0c0b076785f0b794b650de0941c1c9

> The workaround to allocate nr_pages of small folios and map them
> together to avoid partial reads is also not upstream, right?

Correct. It's running on the phones[4][5], but I still don't know how to
handle it upstream properly.

[4] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L4656
[5] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L5439

>
> Do you have any data on how this would perform with the upstream kernel, i.e. without
> a large folio pool and the workaround, and if large granularity compression is worth
> having without those patches?

I'd say large granularity compression isn't a problem, but large
granularity decompression could be. The worst case would be if we swap out
a large block, such as 16KB, but end up swapping in 4 times due to
allocation failures, falling back to smaller folios. In this scenario, we
would need to perform three redundant decompressions. I will work with
Tangquan to provide this data this week.

But once we swap in small folios, they remain small (we can't collapse
them for mTHP). As a result, the next time, they will be swapped out and
swapped in as small folios. Therefore, this potential loss is one-time.
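To spell out that worst case, here is a back-of-the-envelope sketch (an
illustration only, assuming every small-folio fault decompresses the whole
16KB object and copies out a single 4KB subpage; it is not zram code):

/*
 * Worst-case accounting for a 16KB block swapped out as one compressed
 * object but swapped back in as four separate 4KB small folios.
 * Assumes each fault decompresses the whole object -- illustrative only.
 */
#include <stdio.h>

int main(void)
{
	int block_kb   = 16;                     /* multi-page compression unit  */
	int subpage_kb = 4;                      /* small-folio swap-in size     */
	int faults     = block_kb / subpage_kb;  /* 4 separate swap-in faults    */

	int large_swapin = 1;                    /* one fault, one decompression */
	int small_swapin = faults;               /* each fault decompresses the
	                                            whole 16KB object            */

	printf("redundant decompressions: %d\n",
	       small_swapin - large_swapin);     /* prints 3 */
	return 0;
}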
>
> Thanks,
> Usama
>
> > We also previously experimented with maintaining multiple buffers for
> > decompressed large blocks in zRAM, allowing upcoming do_swap_page()
> > calls to use them when falling back to small folios. In this setup,
> > the buffers achieved a high hit rate, though I don't recall the exact
> > number.
> >
> > I'm concerned that this fault-around-like fallback to nr_pages small
> > folios may not gain traction upstream. Do you have any suggestions
> > for improvement?
> >
> >>
> >>> -v2:
> >>> While it is not mature yet, I know some people are waiting for
> >>> an update :-)
> >>> * Fixed some stability issues.
> >>> * Rebased against the latest mm-unstable.
> >>> * Set the default order to 2, which benefits all anon mTHP.
> >>> * Multi-page ZsPageMovable is not supported yet.
> >>>
> >>> Tangquan Zheng (2):
> >>>   mm: zsmalloc: support objects compressed based on multiple pages
> >>>   zram: support compression at the granularity of multi-pages
> >>>
> >>>  drivers/block/zram/Kconfig    |   9 +
> >>>  drivers/block/zram/zcomp.c    |  17 +-
> >>>  drivers/block/zram/zcomp.h    |  12 +-
> >>>  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++--
> >>>  drivers/block/zram/zram_drv.h |  45 ++++
> >>>  include/linux/zsmalloc.h      |  10 +-
> >>>  mm/Kconfig                    |  18 ++
> >>>  mm/zsmalloc.c                 | 232 +++++++++++++-----
> >>>  8 files changed, 699 insertions(+), 94 deletions(-)
> >>
> >> --
> >> Best Regards,
> >> Huang, Ying
>

Thanks
Barry