From: Barry Song <21cnbao@gmail.com>
Date: Thu, 11 Apr 2024 14:03:46 +1200
Subject: Re: [PATCH RFC 2/2] zram: support compression at the granularity of multi-pages
To: Sergey Senozhatsky
Cc: akpm@linux-foundation.org, minchan@kernel.org, linux-block@vger.kernel.org, axboe@kernel.dk, linux-mm@kvack.org, terrelln@fb.com, chrisl@kernel.org, david@redhat.com, kasong@tencent.com, yuzhao@google.com, yosryahmed@google.com, nphamcs@gmail.com, willy@infradead.org, hannes@cmpxchg.org, ying.huang@intel.com, surenb@google.com, wajdi.k.feghali@intel.com, kanchana.p.sridhar@intel.com, corbet@lwn.net, zhouchengming@bytedance.com, Tangquan Zheng , Barry Song
References: <20240327214816.31191-1-21cnbao@gmail.com> <20240327214816.31191-3-21cnbao@gmail.com> <20240411014237.GB8743@google.com>
In-Reply-To: <20240411014237.GB8743@google.com>

On Thu, Apr 11, 2024 at 1:42 PM Sergey Senozhatsky wrote:
>
> On (24/03/28 10:48), Barry Song wrote:
> [..]
> > +/*
> > + * Use a temporary buffer to decompress the page, as the decompressor
> > + * always expects a full page for the output.
> > + */
> > +static int zram_bvec_read_multi_pages_partial(struct zram *zram, struct bio_vec *bvec,
> > +				u32 index, int offset)
> > +{
> > +	struct page *page = alloc_pages(GFP_NOIO | __GFP_COMP, ZCOMP_MULTI_PAGES_ORDER);
> > +	int ret;
> > +
> > +	if (!page)
> > +		return -ENOMEM;
> > +	ret = zram_read_multi_pages(zram, page, index, NULL);
> > +	if (likely(!ret)) {
> > +		atomic64_inc(&zram->stats.zram_bio_read_multi_pages_partial_count);
> > +		void *dst = kmap_local_page(bvec->bv_page);
> > +		void *src = kmap_local_page(page);
> > +
> > +		memcpy(dst + bvec->bv_offset, src + offset, bvec->bv_len);
> > +		kunmap_local(src);
> > +		kunmap_local(dst);
> > +	}
> > +	__free_pages(page, ZCOMP_MULTI_PAGES_ORDER);
> > +	return ret;
> > +}
>
> [..]
>
> > +static int zram_bvec_write_multi_pages_partial(struct zram *zram, struct bio_vec *bvec,
> > +				u32 index, int offset, struct bio *bio)
> > +{
> > +	struct page *page = alloc_pages(GFP_NOIO | __GFP_COMP, ZCOMP_MULTI_PAGES_ORDER);
> > +	int ret;
> > +	void *src, *dst;
> > +
> > +	if (!page)
> > +		return -ENOMEM;
> > +
> > +	ret = zram_read_multi_pages(zram, page, index, bio);
> > +	if (!ret) {
> > +		src = kmap_local_page(bvec->bv_page);
> > +		dst = kmap_local_page(page);
> > +		memcpy(dst + offset, src + bvec->bv_offset, bvec->bv_len);
> > +		kunmap_local(dst);
> > +		kunmap_local(src);
> > +
> > +		atomic64_inc(&zram->stats.zram_bio_write_multi_pages_partial_count);
> > +		ret = zram_write_page(zram, page, index);
> > +	}
> > +	__free_pages(page, ZCOMP_MULTI_PAGES_ORDER);
> > +	return ret;
> > +}
>
> What type of testing you run on it? How often do you see partial
> reads and writes?
> Because this looks concerning - zsmalloc memory
> usage reduction is one metrics, but this also can be achieved via
> recompression, writeback, or even a different compression algorithm,
> but higher CPU/power usage/higher requirements for physically contig
> pages cannot be offset easily. (Another corner case, assume we have
> partial read requests on every CPU simultaneously.)

This question brings up an interesting observation. In our actual product,
large folio allocation in do_swap_page succeeds more than 90% of the time,
but it does occasionally fail. When it fails, we do not fall back to partial
reads; instead we allocate 16 small folios and ask zram to fill all of them.
This keeps partial reads close to zero.

However, integrating this into the upstream codebase looks like a
considerable task, so for now it remains part of our out-of-tree code [1],
which is also open source. We are gradually sending patches for the swap-in
path and cleaning up the product code as we go.

To raise the success rate of large folio allocation, we reserve some page
blocks for mTHP. Mainline does not have this either (Yu Zhao is trying to
provide TAO [2]). Until something like it is merged upstream, we expect that
partial reads may reach 50% or more.

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11
[2] https://lore.kernel.org/linux-mm/20240229183436.4110845-1-yuzhao@google.com/

Thanks
Barry
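
The fallback described in the reply (16 small folios filled in one go when
the large folio allocation fails) can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the out-of-tree
code: struct unit, swap_in_unit(), decompress_unit(), free_unit(), the
scratch buffer and the large_ok flag are all hypothetical names, malloc()
stands in for the page allocator, and the static scratch buffer stands in
for whatever preallocated decompression buffer the real path would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define NR_SUBPAGES 16u  /* one order-4 unit: the "16 small folios" case */

struct unit {
	unsigned char *pages[NR_SUBPAGES]; /* pages[i] holds subpage i */
	int contiguous;                    /* 1: one large allocation */
};

/* Assumption: stand-in for a preallocated decompression buffer. */
static unsigned char scratch[NR_SUBPAGES * PAGE_SIZE];

/* Hypothetical decompressor: fills subpage i with the byte value i. */
static void decompress_unit(unsigned char *out)
{
	for (unsigned i = 0; i < NR_SUBPAGES; i++)
		memset(out + i * PAGE_SIZE, (int)i, PAGE_SIZE);
}

static void free_unit(struct unit *u)
{
	if (u->contiguous) {
		free(u->pages[0]);
	} else {
		for (unsigned i = 0; i < NR_SUBPAGES; i++)
			free(u->pages[i]); /* free(NULL) is a no-op */
	}
	memset(u, 0, sizeof(*u));
}

/*
 * Swap in a whole unit. large_ok simulates whether the large folio
 * allocation succeeds. In both cases the unit is decompressed once and
 * every subpage is filled, so no partial read is needed afterwards.
 * Returns 0 on success, -1 if memory is exhausted.
 */
static int swap_in_unit(struct unit *u, int large_ok)
{
	unsigned char *large = large_ok ? malloc(NR_SUBPAGES * PAGE_SIZE) : NULL;

	assert(u != NULL);
	memset(u, 0, sizeof(*u));

	if (large) {
		decompress_unit(large);
		for (unsigned i = 0; i < NR_SUBPAGES; i++)
			u->pages[i] = large + i * PAGE_SIZE;
		u->contiguous = 1;
		return 0;
	}

	/* Fallback: 16 independent small pages, filled from one decompression. */
	for (unsigned i = 0; i < NR_SUBPAGES; i++) {
		u->pages[i] = malloc(PAGE_SIZE);
		if (!u->pages[i]) {
			free_unit(u);
			return -1;
		}
	}
	decompress_unit(scratch);
	for (unsigned i = 0; i < NR_SUBPAGES; i++)
		memcpy(u->pages[i], scratch + i * PAGE_SIZE, PAGE_SIZE);
	return 0;
}
```

Calling swap_in_unit(&u, 1) models the common case and swap_in_unit(&u, 0)
models the fallback; the caller sees the same 16 filled subpages either way
and releases them with free_unit(&u). The model ignores the cost the review
comment raises: the fallback still needs a full-size decompression buffer
and an extra copy per subpage.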