From: Nhat Pham
Date: Thu, 28 Mar 2024 16:19:14 -0700
Subject: Re: [RFC PATCH 6/9] mm: zswap: drop support for non-zero same-filled pages handling
To: Johannes Weiner
Cc: Yosry Ahmed, Andrew Morton, Chengming Zhou, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240328210709.GH7597@cmpxchg.org>
References: <20240325235018.2028408-1-yosryahmed@google.com> <20240325235018.2028408-7-yosryahmed@google.com> <20240328193149.GF7597@cmpxchg.org> <20240328210709.GH7597@cmpxchg.org>

On Thu, Mar 28, 2024 at 2:07 PM Johannes Weiner wrote:
>
> On Thu, Mar 28, 2024 at 01:23:42PM -0700, Yosry Ahmed wrote:
> > On Thu, Mar 28, 2024 at 12:31 PM Johannes Weiner wrote:
> > >
> > > On Mon, Mar 25, 2024 at 11:50:14PM +0000, Yosry Ahmed wrote:
> > > > The current same-filled pages handling supports
> > > > pages filled with any repeated word-sized pattern. However, in
> > > > practice, most of these should be zero pages anyway. Other patterns
> > > > should not be nearly as common.
> > > >
> > > > Drop the support for non-zero same-filled pages, but keep the names
> > > > of knobs exposed to userspace as "same_filled", which isn't entirely
> > > > inaccurate.
> > > >
> > > > This yields some nice code simplification and enables a following
> > > > patch that eliminates the need to allocate struct zswap_entry for
> > > > those pages completely.
> > > >
> > > > There is also a very small performance improvement observed over 50
> > > > runs of kernel build test (kernbench) comparing the mean build time
> > > > on a Skylake machine when building the kernel in a cgroup v1
> > > > container with a 3G limit:
> > > >
> > > >         base            patched         % diff
> > > > real    70.167          69.915          -0.359%
> > > > user    2953.068        2956.147        +0.104%
> > > > sys     2612.811        2594.718        -0.692%
> > > >
> > > > This probably comes from more optimized operations like memchr_inv()
> > > > and clear_highpage(). Note that the percentage of zero-filled pages
> > > > during this test was only around 1.5% on average, and was not
> > > > affected by this patch. Practical workloads could have a larger
> > > > proportion of such pages (e.g. Johannes observed around 10% [1]), so
> > > > the performance improvement should be larger.
> > > >
> > > > [1] https://lore.kernel.org/linux-mm/20240320210716.GH294822@cmpxchg.org/
> > > >
> > > > Signed-off-by: Yosry Ahmed
> > >
> > > This is an interesting direction to pursue, but I actually think it
> > > doesn't go far enough. Either way, I think it needs more data.
> > >
> > > 1) How frequent are non-zero-same-filled pages? Difficult to
> > >    generalize, but if you could gather some from your fleet, that
> > >    would be useful.
> > >    If you can devise a portable strategy, I'd also be more than
> > >    happy to gather this on ours (although I think you have more
> > >    widespread zswap use, whereas we have more disk swap).
> >
> > I am trying to collect the data, but there are... hurdles. It would
> > take some time, so I was hoping the data could be collected elsewhere
> > if possible.
> >
> > The idea I had was to hook a BPF program to the entry of
> > zswap_fill_page() and create a histogram of the "value" argument. We
> > would get more coverage by hooking it to the return of
> > zswap_is_page_same_filled() and only updating the histogram if the
> > return value is true, as it includes pages in zswap that haven't been
> > swapped in.
> >
> > However, with zswap_is_page_same_filled() the BPF program will run in
> > all zswap stores, whereas for zswap_fill_page() it will only run when
> > needed. Not sure if this makes a practical difference tbh.
> >
> > >
> > > 2) The fact that we're doing any of this pattern analysis in zswap at
> > >    all strikes me as a bit misguided. Being efficient about repetitive
> > >    patterns is squarely in the domain of a compression algorithm. Do
> > >    we not trust e.g. zstd to handle this properly?
> >
> > I thought about this briefly, but I didn't follow through. I could try
> > to collect some data by swapping out different patterns and observing
> > how different compression algorithms react. That would be interesting
> > for sure.
> >
> > >
> > >    I'm guessing this goes back to inefficient packing from something
> > >    like zbud, which would waste half a page on one repeating byte.
> > >
> > >    But zsmalloc can do 32-byte objects. It's also a batching slab
> > >    allocator, where storing a series of small, same-sized objects is
> > >    quite fast.
> > >
> > >    Add to that the additional branches, the additional kmap, the extra
> > >    scanning of every single page for patterns - all in the fast path
> > >    of zswap, when we already know that the vast majority of incoming
> > >    pages will need to be properly compressed anyway.
> > >
> > > Maybe it's time to get rid of the special handling entirely?
> >
> > We would still be wasting some memory (~96 bytes between zswap_entry
> > and zsmalloc object), and wasting cycles allocating them. This could
> > be made up for by cycles saved by removing the handling. We will be
> > saving some branches for sure. I am not worried about kmap as I think
> > it's a no-op in most cases.
>
> Yes, true.
>
> > I am interested to see how much we could save by removing scanning for
> > patterns. We may not save much if we abort after reading a few words
> > in most cases, but I guess we could also be scanning a considerable
> > amount before aborting. On the other hand, we would be reading the
> > page contents into cache anyway for compression, so maybe it doesn't
> > really matter?
> >
> > I will try to collect some data about this. I will start by trying to
> > find out how the compression algorithms handle same-filled pages. If
> > they can compress them efficiently, then I will try to get more data on
> > the tradeoff from removing the handling.
>
> I do wonder if this could be overthinking it, too.
>
> Double-checking the numbers on our fleet, an additional 96 bytes for
> each same-filled entry would result in a
>
> 1) p50 waste of 0.008% of total memory, and a
>
> 2) p99 waste of 0.06% of total memory.
>
> And this is without us having even thought about trying to make
> zsmalloc more efficient for this particular use case - which might be
> the better point of attack, if we think it's actually worth it.
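To make the scale of that trade-off concrete, here is a small C sketch of the arithmetic. It assumes 4 KiB pages and takes the ~96-byte per-entry figure from Yosry's estimate in this thread; the helper names are hypothetical, and Johannes's p50/p99 numbers come from fleet data that this sketch does not reproduce. Each same-filled page stored as a regular entry costs 96/4096 ≈ 2.34% of the page it stands in for, so the waste as a share of total memory is that ratio times the share of memory whose same-filled contents were swapped out.

```c
#include <assert.h>

/* Assumptions (illustrative, not taken from kernel headers):
 * 4 KiB pages and ~96 bytes of zswap_entry plus zsmalloc object per
 * same-filled page, per the estimate in this thread. */
#define SKETCH_PAGE_BYTES 4096.0
#define SKETCH_ENTRY_OVERHEAD_BYTES 96.0

/* Hypothetical helper: overhead of one regular entry relative to the
 * size of the same-filled page it stands in for, in percent. */
static double per_page_overhead_pct(void)
{
	return 100.0 * SKETCH_ENTRY_OVERHEAD_BYTES / SKETCH_PAGE_BYTES;
}

/* Hypothetical helper: memory spent on same-filled entries, as a
 * percentage of total memory, if they were stored like any other
 * compressed page instead of being special-cased. */
static double same_filled_waste_pct(unsigned long long nr_same_filled,
				    unsigned long long total_mem_bytes)
{
	assert(total_mem_bytes > 0);
	return 100.0 * SKETCH_ENTRY_OVERHEAD_BYTES * (double)nr_same_filled /
	       (double)total_mem_bytes;
}
```

With made-up inputs of 100,000 same-filled entries on a 64 GiB machine, same_filled_waste_pct() comes out to roughly 0.014% of total memory, the same order of magnitude as the fleet percentiles quoted above.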
>
> So my take is that unless removing it would be outright horrible from
> a %sys POV (which seems pretty unlikely), IMO it would be fine to just
> delete it entirely with a "not worth the maintenance cost" argument.
>
> If you turn the argument around, and somebody submitted the code as
> it is today, with the numbers being what they are above, I'm not sure
> we would even accept it!

The context guy is here :)

Not arguing for one way or another, but I did find the original patch
that introduced same-filled page handling:

https://github.com/torvalds/linux/commit/a85f878b443f8d2b91ba76f09da21ac0af22e07f

https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1/T/#u

The numbers look impressive, and there is some detail about the
experiment setup, but I can't find which allocator and compressor were
used, which, as Johannes has pointed out, matters a lot.

A good compressor (which should handle arguably the most trivial data
pattern there is) plus a backend allocator that is capable of handling
small objects well could make this case really efficient, without
resorting to special handling at the zswap level.
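For anyone who wants to experiment with the question raised in 1), below is a minimal userspace C sketch of the distinction the RFC draws: the zero-only check it keeps versus the generic "every word equals the first word" check it drops. This is not the kernel's zswap_is_page_same_filled(); it assumes 4 KiB pages and uses memcmp() because memchr_inv(), which the RFC credits for part of its speedup, is a kernel-only helper. The helper names and the three-way classification are hypothetical, meant to mirror the zero vs. non-zero split that the proposed BPF histogram would measure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Zero-only check (what the RFC keeps). Compare the page against itself
 * shifted by one byte: if byte 0 is zero and every byte equals its
 * successor, the whole page is zero. */
static bool page_zero_filled(const void *page)
{
	const unsigned char *p = page;

	return p[0] == 0 && memcmp(p, p + 1, SKETCH_PAGE_SIZE - 1) == 0;
}

/* Generic check (what the RFC drops): every machine word equals the
 * first one. memcpy() avoids alignment and strict-aliasing problems
 * when the page is an arbitrary byte buffer. */
static bool page_same_filled(const void *page, unsigned long *value)
{
	const unsigned char *p = page;
	unsigned long first, cur;

	assert(page != NULL && value != NULL);
	memcpy(&first, p, sizeof(first));
	for (size_t off = sizeof(first); off < SKETCH_PAGE_SIZE; off += sizeof(cur)) {
		memcpy(&cur, p + off, sizeof(cur));
		if (cur != first)
			return false; /* bail out at the first mismatching word */
	}
	*value = first;
	return true;
}

/* Hypothetical histogram buckets for the data-collection idea above. */
enum page_fill_kind {
	PAGE_NOT_SAME_FILLED,
	PAGE_ZERO_FILLED,
	PAGE_NONZERO_SAME_FILLED,
};

static enum page_fill_kind classify_page(const void *page)
{
	unsigned long value;

	if (!page_same_filled(page, &value))
		return PAGE_NOT_SAME_FILLED;
	return value == 0 ? PAGE_ZERO_FILLED : PAGE_NONZERO_SAME_FILLED;
}
```

Counting classify_page() results over a sample of pages would give the zero vs. non-zero breakdown Johannes asked about; whether any such scan beats simply handing the page to the compressor is the open question in this thread.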