From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 28 Mar 2024 19:07:37 -0700
Subject: Re: [RFC PATCH 6/9] mm: zswap: drop support for non-zero same-filled pages handling
To: Nhat Pham
Cc: Johannes Weiner, Andrew Morton, Chengming Zhou, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
References: <20240325235018.2028408-1-yosryahmed@google.com>
	<20240325235018.2028408-7-yosryahmed@google.com>
	<20240328193149.GF7597@cmpxchg.org>

On Thu, Mar 28, 2024 at 4:34 PM Nhat Pham wrote:
>
> On Thu, Mar 28, 2024 at 1:24 PM Yosry Ahmed wrote:
> >
> > On Thu, Mar 28, 2024 at 12:31 PM Johannes Weiner wrote:
> > >
> > > On Mon, Mar 25, 2024 at 11:50:14PM +0000, Yosry Ahmed wrote:
> > > > The current same-filled pages handling supports pages filled with any
> > > > repeated word-sized pattern. However, in practice, most of these should
> > > > be zero pages anyway. Other patterns should be far less common.
> > > >
> > > > Drop the support for non-zero same-filled pages, but keep the names of
> > > > knobs exposed to userspace as "same_filled", which isn't entirely
> > > > inaccurate.
> > > >
> > > > This yields some nice code simplification and enables a following patch
> > > > that eliminates the need to allocate struct zswap_entry for those pages
> > > > completely.
> > > >
> > > > There is also a very small performance improvement observed over 50 runs
> > > > of the kernel build test (kernbench) comparing the mean build time on a
> > > > Skylake machine when building the kernel in a cgroup v1 container with a
> > > > 3G limit:
> > > >
> > > >          base        patched     % diff
> > > > real     70.167      69.915      -0.359%
> > > > user     2953.068    2956.147    +0.104%
> > > > sys      2612.811    2594.718    -0.692%
> > > >
> > > > This probably comes from more optimized operations like memchr_inv() and
> > > > clear_highpage(). Note that the percentage of zero-filled pages during
> > > > this test was only around 1.5% on average, and was not affected by this
> > > > patch. Practical workloads could have a larger proportion of such pages
> > > > (e.g. Johannes observed around 10% [1]), so the performance improvement
> > > > should be larger.
> > > >
> > > > [1] https://lore.kernel.org/linux-mm/20240320210716.GH294822@cmpxchg.org/
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > >
> > > This is an interesting direction to pursue, but I actually think it
> > > doesn't go far enough. Either way, I think it needs more data.
> > >
> > > 1) How frequent are non-zero-same-filled pages? Difficult to
> > >    generalize, but if you could gather some from your fleet, that
> > >    would be useful. If you can devise a portable strategy, I'd also be
> > >    more than happy to gather this on ours (although I think you have
> > >    more widespread zswap use, whereas we have more disk swap.)
> >
> > I am trying to collect the data, but there are.. hurdles. It would
> > take some time, so I was hoping the data could be collected elsewhere
> > if possible.
> >
> > The idea I had was to hook a BPF program to the entry of
> > zswap_fill_page() and create a histogram of the "value" argument. We
> > would get more coverage by hooking it to the return of
> > zswap_is_page_same_filled() and only updating the histogram if the
> > return value is true, as it includes pages in zswap that haven't been
> > swapped in.
> >
> > However, with zswap_is_page_same_filled() the BPF program will run on
> > all zswap stores, whereas for zswap_fill_page() it will only run when
> > needed. Not sure if this makes a practical difference tbh.
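For concreteness, a minimal libbpf-style sketch of that hook could look
something like the below. This is just an illustration, not something
being proposed here: it assumes zswap_fill_page() is not inlined (so the
kprobe can attach), and the map layout and program names are made up.

/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Count how often each fill pattern is seen by attaching a kprobe to
 * zswap_fill_page(void *ptr, unsigned long value).
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 4096);
	__type(key, u64);	/* the repeated fill pattern ("value") */
	__type(value, u64);	/* number of pages seen with that pattern */
} fill_patterns SEC(".maps");

SEC("kprobe/zswap_fill_page")
int BPF_KPROBE(count_fill_pattern, void *ptr, unsigned long value)
{
	u64 key = value, one = 1, *cnt;

	cnt = bpf_map_lookup_elem(&fill_patterns, &key);
	if (cnt)
		__sync_fetch_and_add(cnt, 1);
	else
		bpf_map_update_elem(&fill_patterns, &key, &one, BPF_NOEXIST);

	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Dumping the map after a representative workload would show directly how
much of the same-filled population is the zero pattern (key 0) versus
anything else.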
> > >
> > > 2) The fact that we're doing any of this pattern analysis in zswap at
> > >    all strikes me as a bit misguided. Being efficient about repetitive
> > >    patterns is squarely in the domain of a compression algorithm. Do
> > >    we not trust e.g. zstd to handle this properly?
> >
> > I thought about this briefly, but I didn't follow through. I could try
> > to collect some data by swapping out different patterns and observing
> > how different compression algorithms react. That would be interesting
> > for sure.
> >
> > >    I'm guessing this goes back to inefficient packing from something
> > >    like zbud, which would waste half a page on one repeating byte.
> > >
> > >    But zsmalloc can do 32 byte objects. It's also a batching slab
> > >    allocator, where storing a series of small, same-sized objects is
> > >    quite fast.
> > >
> > >    Add to that the additional branches, the additional kmap, the extra
> > >    scanning of every single page for patterns - all in the fast path
> > >    of zswap, when we already know that the vast majority of incoming
> > >    pages will need to be properly compressed anyway.
> > >
> > >    Maybe it's time to get rid of the special handling entirely?
> >
> > We would still be wasting some memory (~96 bytes between the zswap_entry
> > and the zsmalloc object), and wasting cycles allocating them. This could
> > be made up for by the cycles saved by removing the handling. We will be
> > saving some branches for sure. I am not worried about kmap as I think
> > it's a noop in most cases.
>
> A secondary effect of the current same-filled page handling is that
> we're not considering them for reclaim. Which could potentially be
> beneficial, because we're not saving much memory (essentially just the
> zswap entry and associated cost of storing them) by writing these
> pages back - IOW, the cost / benefit ratio for reclaiming these pages
> is quite atrocious.

Yes, but I think this applies even without same-filled pages. Johannes
mentioned that zsmalloc could store compressed pages down to 32 bytes in
size. If these are common, it would be absurd to write them out too. We
already have some kind of heuristic in the shrinker to slow down
writeback if the compression ratio is high. Perhaps it's worth skipping
writeback completely for some pages on the LRU, even if this means we
violate the LRU ordering. We already do this for same-filled pages, so
it may make sense to generalize it.

> Again, all of this is just handwaving without numbers. It'd be nice if
> we can have more concrete data for this conversation :P
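To make the skip-writeback idea above a bit more concrete, a check along
these lines could sit in the writeback path. This is only a sketch, not
code from this series: the helper name and the threshold are invented,
and entry->length is used as a stand-in for "how much memory writeback
would actually free".

/*
 * Illustrative only: skip writeback for entries that compressed so well
 * that reclaiming them frees almost nothing.
 */
#define ZSWAP_WRITEBACK_MIN_SAVINGS	64	/* bytes, made-up cutoff */

static bool zswap_writeback_worth_it(struct zswap_entry *entry)
{
	/*
	 * Same-filled entries store no compressed data (length == 0) and
	 * are already skipped today; this just generalizes that.
	 */
	if (!entry->length)
		return false;

	/* A tiny object costs a full IO but frees almost no memory. */
	return entry->length >= ZSWAP_WRITEBACK_MIN_SAVINGS;
}

Whether any fixed cutoff is worth the LRU inversion it introduces is
exactly the kind of thing that needs numbers, per Nhat's point above.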