Date: Thu, 11 Apr 2024 13:14:29 +0900
From: Sergey Senozhatsky
To: Barry Song <21cnbao@gmail.com>
Cc: Sergey Senozhatsky, akpm@linux-foundation.org, minchan@kernel.org, linux-block@vger.kernel.org, axboe@kernel.dk, linux-mm@kvack.org, terrelln@fb.com, chrisl@kernel.org, david@redhat.com, kasong@tencent.com, yuzhao@google.com, yosryahmed@google.com, nphamcs@gmail.com, willy@infradead.org, hannes@cmpxchg.org, ying.huang@intel.com, surenb@google.com, wajdi.k.feghali@intel.com, kanchana.p.sridhar@intel.com, corbet@lwn.net, zhouchengming@bytedance.com, Tangquan Zheng, Barry Song
Subject: Re: [PATCH RFC 2/2] zram: support compression at the granularity of multi-pages
Message-ID: <20240411041429.GC8743@google.com>
References: <20240327214816.31191-1-21cnbao@gmail.com> <20240327214816.31191-3-21cnbao@gmail.com> <20240411014237.GB8743@google.com>

On (24/04/11 14:03), Barry Song wrote:
> > [..]
> >
> > > +static int zram_bvec_write_multi_pages_partial(struct zram *zram, struct bio_vec *bvec,
> > > +				u32 index, int offset, struct bio *bio)
> > > +{
> > > +	struct page *page = alloc_pages(GFP_NOIO | __GFP_COMP, ZCOMP_MULTI_PAGES_ORDER);
> > > +	int ret;
> > > +	void *src, *dst;
> > > +
> > > +	if (!page)
> > > +		return -ENOMEM;
> > > +
> > > +	ret = zram_read_multi_pages(zram, page, index, bio);
> > > +	if (!ret) {
> > > +		src = kmap_local_page(bvec->bv_page);
> > > +		dst = kmap_local_page(page);
> > > +		memcpy(dst + offset, src + bvec->bv_offset, bvec->bv_len);
> > > +		kunmap_local(dst);
> > > +		kunmap_local(src);
> > > +
> > > +		atomic64_inc(&zram->stats.zram_bio_write_multi_pages_partial_count);
> > > +		ret = zram_write_page(zram, page, index);
> > > +	}
> > > +	__free_pages(page, ZCOMP_MULTI_PAGES_ORDER);
> > > +	return ret;
> > > +}
> >
> > What type of testing you run on it? How often do you see partial
> > reads and writes? Because this looks concerning - zsmalloc memory
> > usage reduction is one metrics, but this also can be achieved via
> > recompression, writeback, or even a different compression algorithm,
> > but higher CPU/power usage/higher requirements for physically contig
> > pages cannot be offset easily. (Another corner case, assume we have
> > partial read requests on every CPU simultaneously.)
>
> This question brings up an interesting observation. In our actual product,
> we've noticed a success rate of over 90% when allocating large folios in
> do_swap_page, but occasionally, we encounter failures. In such cases,
> instead of resorting to partial reads, we opt to allocate 16 small folios and
> request zram to fill them all. This strategy effectively minimizes partial reads
> to nearly zero. However, integrating this into the upstream codebase seems
> like a considerable task, and for now, it remains part of our
> out-of-tree code[1],
> which is also open-source.
> We're gradually sending patches for the swap-in process, systematically
> cleaning up the product's code.

I see, thanks for the explanation. Does this mean that this series is
ahead of its time?

> To enhance the success rate of large folio allocation, we've reserved some
> page blocks for mTHP. This approach is currently absent from the mainline
> codebase as well (Yu Zhao is trying to provide TAO [2]). Consequently, we
> anticipate that partial reads may reach 50% or more until this method is
> incorporated upstream.

These partial reads/writes are difficult to justify: instead of doing a
single comp_op(PAGE_SIZE), in the worst case we can now do
ZCOMP_MULTI_PAGES_NR calls of comp_op(ZCOMP_MULTI_PAGES_ORDER) (assuming
an access pattern that touches each page of a multi-page individually).
That is a potentially huge increase in CPU/power usage, which is not
something we can easily accept. In fact, I'd say that power usage matters
more here than zspool memory usage (which we already have means to deal
with). Have you evaluated power usage?

I also wonder whether it reduces the number of ZRAM_SAME pages. Suppose
several pages out of a ZCOMP_MULTI_PAGES_ORDER multi-page are filled with
zeroes (or some other recognizable pattern); previously each of them would
have been stored using just an unsigned long. It even makes me wonder
whether the ZRAM_SAME test makes sense for multi-pages at all, for that
matter.