From mboxrd@z Thu Jan  1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Fri, 8 Nov 2024 09:53:46 +1300
Subject: Re: [PATCH RFC 2/2] zram: support compression at the granularity of multi-pages
To: Usama Arif
Cc: akpm@linux-foundation.org, axboe@kernel.dk, chrisl@kernel.org,
	corbet@lwn.net, david@redhat.com, kanchana.p.sridhar@intel.com,
	kasong@tencent.com, linux-block@vger.kernel.org, linux-mm@kvack.org,
	minchan@kernel.org, nphamcs@gmail.com, senozhatsky@chromium.org,
	surenb@google.com, terrelln@fb.com, v-songbaohua@oppo.com,
	wajdi.k.feghali@intel.com, willy@infradead.org, ying.huang@intel.com,
	yosryahmed@google.com, yuzhao@google.com, zhengtangquan@oppo.com,
	zhouchengming@bytedance.com, bala.seshasayee@linux.intel.com,
	Johannes Weiner
In-Reply-To: <490a923e-d450-4476-a9f5-2a247b6d3a12@gmail.com>
References: <20240327214816.31191-3-21cnbao@gmail.com>
	<20241021232852.4061-1-21cnbao@gmail.com>
	<490a923e-d450-4476-a9f5-2a247b6d3a12@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Fri, Nov 8, 2024 at 12:49 AM Usama Arif wrote:
>
> On 07/11/2024 10:31, Barry Song wrote:
> > On Thu, Nov 7, 2024 at 11:25 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> On Thu, Nov 7, 2024 at 5:23 AM Usama Arif wrote:
> >>>
> >>> On 22/10/2024 00:28, Barry Song wrote:
> >>>>> From: Tangquan Zheng
> >>>>>
> >>>>> +static int zram_bvec_write_multi_pages(struct zram *zram, struct bio_vec *bvec,
> >>>>> +				u32 index, int offset, struct bio *bio)
> >>>>> +{
> >>>>> +	if (is_multi_pages_partial_io(bvec))
> >>>>> +		return zram_bvec_write_multi_pages_partial(zram, bvec, index, offset, bio);
> >>>>> +	return zram_write_page(zram, bvec->bv_page, index);
> >>>>> +}
> >>>>> +
> >>>
> >>> Hi Barry,
> >>>
> >>> I started reviewing this series just to get a better idea of whether we
> >>> can do something similar for zswap. I haven't looked at zram code before,
> >>> so this might be a basic question:
> >>> How would you end up in zram_bvec_write_multi_pages_partial if using zram
> >>> for swap?
> >>
> >> Hi Usama,
> >>
> >> There's a corner case where, for instance, a 32KiB mTHP is swapped out.
> >> If userspace then performs MADV_DONTNEED on the 0-16KiB portion of the
> >> original mTHP, it now consists of 8 swap entries (the mTHP has been
> >> released and unmapped). With swap0-swap3 released due to DONTNEED, they
> >> become available for reallocation, and other folios may be swapped out
> >> to those entries. The result is a combination of new smaller folios and
> >> the original 32KiB mTHP.
> >
>
> Hi Barry,
>
> Thanks for this. So in this example of a 32K folio, when swap slots 0-3 are
> released, zram_slot_free_notify will only clear the ZRAM_COMP_MULTI_PAGES
> flag on indexes 0-3 and return (without calling zram_free_page on them).
>
> I am assuming that if another folio is now swapped out to those entries,
> zram allows those pages to be overwritten, even though they haven't been
> freed?

Correct. This is a typical case for zRAM. zRAM allows zram_slot_free_notify()
to be skipped entirely (known as miss_free). As long as swap_map[] indicates
that the slots are free, they can be reused.

>
> Also, even if it's allowed, I still don't think you will end up in
> zram_bvec_write_multi_pages_partial when you try to write a 16K or smaller
> folio to swap0-3. As want_multi_pages_comp will evaluate to false, since
> 16K is less than 32K, you will just end up in zram_bio_write_page?

Until all slots are cleared of ZRAM_COMP_MULTI_PAGES, these entries remain
available for storing small folios; before that point, the large block
remains intact. For instance, if swap0 to swap3 are free and swap4 to swap7
still reference the old compressed mTHP, writing only to swap0 would modify
the large block.
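
As a rough illustration of that read-modify-write (not code from the series;
PAGE_SIZE, MULTI_PAGES_NR, and write_partial are stand-in names, and memcpy()
stands in for compression and decompression), a userspace model might look
like this:

	#include <stdio.h>
	#include <string.h>

	#define PAGE_SIZE      4096
	#define MULTI_PAGES_NR 8                /* order 3: 8 pages = 32KiB */

	/* stands in for the single zsmalloc object backing slots 0..7 */
	static unsigned char block[MULTI_PAGES_NR * PAGE_SIZE];

	/*
	 * A slot inside a live multi-page block cannot be replaced in
	 * isolation, because the block is compressed as one unit: the old
	 * block has to be read back, patched at the page's offset, and
	 * stored again in full.
	 */
	static void write_partial(unsigned int index, const unsigned char *page)
	{
		unsigned char scratch[MULTI_PAGES_NR * PAGE_SIZE];

		memcpy(scratch, block, sizeof(scratch));      /* "decompress" */
		memcpy(scratch + (index % MULTI_PAGES_NR) * PAGE_SIZE,
		       page, PAGE_SIZE);                      /* patch one page */
		memcpy(block, scratch, sizeof(block));        /* "recompress" */
	}

	int main(void)
	{
		unsigned char page[PAGE_SIZE];

		memset(block, 0xAA, sizeof(block)); /* old 32KiB mTHP in swap0..7 */
		memset(page, 0x55, PAGE_SIZE);      /* new small folio for swap0 */

		write_partial(0, page);

		/* swap0 holds the new data; swap4..7 still see the old block */
		printf("swap0: %#x, swap4: %#x\n", block[0], block[4 * PAGE_SIZE]);
		return 0;
	}

The block itself can only go away once no slot under the same head index is
still flagged, which is what the helper below counts.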
static inline int __test_multi_pages_comp(struct zram *zram, u32 index)
{
	int i;
	int count = 0;
	int head_index = index & ~((unsigned long)ZCOMP_MULTI_PAGES_NR - 1);

	for (i = 0; i < ZCOMP_MULTI_PAGES_NR; i++) {
		if (zram_test_flag(zram, head_index + i, ZRAM_COMP_MULTI_PAGES))
			count++;
	}

	return count;
}

A mapping exists between the head index and the large block of zsmalloc. As
long as any entry with the same head index remains, the large block persists.

Another possible option is: swap4 to swap7 indexes reference the old large
block, while swap0 to swap3 point to new small blocks compressed from small
folios. This approach would greatly increase implementation complexity and
could also raise zRAM's memory consumption. With Chris's and Kairui's swap
allocation optimizations, hopefully, this corner case will remain minimal.

>
> Thanks,
> Usama
>

> > Sorry, I forgot to mention that the assumption is
> > ZSMALLOC_MULTI_PAGES_ORDER=3, so data is compressed in 32KiB blocks.
> >
> > With Chris' and Kairui's new swap optimization, this should be minor,
> > as each cluster has its own order. However, I recall that order-0 can
> > still steal swap slots from other orders' clusters when swap space is
> > limited, by scanning all slots? Please correct me if I'm wrong, Kairui
> > and Chris.
> >
> >>
> >>> We only swap out whole folios. If ZCOMP_MULTI_PAGES_SIZE=64K, any
> >>> folio smaller than 64K will end up in zram_bio_write_page. Folios
> >>> greater than or equal to 64K would be dispatched by
> >>> zram_bio_write_multi_pages to zram_bvec_write_multi_pages in 64K
> >>> chunks. So e.g. a 128K folio would end up calling
> >>> zram_bvec_write_multi_pages twice.
> >>
> >> In v2, I changed the default order to 2, allowing all anonymous mTHP
> >> to benefit from this feature.
> >>
> >>>
> >>> Or is this for the case when you are using zram not for swap? In that
> >>> case, I probably don't need to consider the
> >>> zram_bvec_write_multi_pages_partial write case for zswap.
> >>>
> >>> Thanks,
> >>> Usama
> >>
> >
> > Thanks
> > barry
>

Thanks
Barry