From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 6 Jun 2024 16:10:58 -0700
Subject: Re: [PATCH] mm: zsmalloc: share slab caches for all zsmalloc zpools
To: Minchan Kim
Cc: Andrew Morton, Sergey Senozhatsky, Vlastimil Babka, David Rientjes,
 Christoph Lameter, Erhard Furtner, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Yu Zhao, Chengming Zhou
References: <20240604175340.218175-1-yosryahmed@google.com>

On Thu, Jun 6, 2024 at 4:03 PM Yosry Ahmed wrote:
>
> On Thu, Jun 6, 2024 at 3:36 PM Minchan Kim wrote:
> >
> > On Tue, Jun 04, 2024 at 05:53:40PM +0000, Yosry Ahmed wrote:
> > > Zswap creates multiple zpools to improve concurrency. Each zsmalloc
> > > zpool creates its own 'zs_handle' and 'zspage' slab caches. Currently we
> > > end up with 32 slab caches of each type.
> > >
> > > Since each slab cache holds some free objects, we end up with a lot of
> > > free objects distributed among the separate zpool caches. Slab caches
> > > are designed to handle concurrent allocations by using percpu
> > > structures, so having a single instance of each cache should be enough,
> > > and avoids wasting more memory than needed due to fragmentation.
> > >
> > > Additionally, having more slab caches than needed unnecessarily slows
> > > down code paths that iterate slab_caches.
> > >
> > > In the results reported by Eric in [1], the amount of unused slab memory
> > > in these caches goes down from 242808 bytes to 29216 bytes (-88%). This
> > > is calculated by (num_objs - active_objs) * objsize for each 'zs_handle'
> > > and 'zspage' cache. Although this patch did not help with the allocation
> > > failure reported by Eric with zswap + zsmalloc, I think it is still
> > > worth merging on its own.
> > >
> > > [1] https://lore.kernel.org/lkml/20240604134458.3ae4396a@yea/
> >
> > I doubt this is the right direction.
> >
> > Zsmalloc is used for various purposes, each with different object
> > lifecycles. For example, swap operations involve relatively short-lived
> > objects, while filesystem use cases might have longer-lived objects.
> > This mix of lifecycles could lead to fragmentation with this approach.
>
> Even in a swapfile, some objects can be short-lived and some objects
> can be long-lived, and the line between swap and file systems becomes
> blurry with shmem/tmpfs. I don't think having separate caches here is
> vital, but I am not generally familiar with the file system use cases
> and I don't have data to prove/disprove it.
>
> > I believe the original problem arose when zsmalloc reduced its lock
> > granularity from the class level to a global level. And then, zswap went
> > on to mitigate the issue with multiple zpools, but it's essentially another
> > bandaid on top of the existing problem, IMO.
>
> IIRC we reduced the granularity when we added writeback support to
> zsmalloc, which was relatively recent. I think we have seen lock
> contention with zsmalloc long before that. We have had a similar patch
> internally to use multiple zpools in zswap for many years now.
>
> +Yu Zhao
>
> Yu has more historical context about this, I am hoping he will shed
> more light on this.
>
> > The correct approach would be to further reduce the zsmalloc lock
> > granularity.
>
> I definitely agree that the correct approach should be to fix the lock
> contention at the source and drop zswap's usage of multiple zpools.
> Nonetheless, I think this patch provides value in the meantime. The
> fragmentation within the slab caches is real with zswap's use case.
> OTOH, sharing a cache between swap and file system use cases leading
> to fragmentation within the same slab cache is a less severe problem
> in my opinion.
>
> That being said, I don't feel strongly. If you really don't like this
> patch I am fine with dropping it.
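[Editorial note: the unused-slab-memory figure quoted in the patch description
above, (num_objs - active_objs) * objsize per cache, can be recomputed from
/proc/slabinfo. The sketch below is an editorial illustration only, not part of
the patch or the thread; it assumes the slabinfo v2.x column order
(name, active_objs, num_objs, objsize) and that reading /proc/slabinfo
requires root.]

/*
 * Sum (num_objs - active_objs) * objsize for every cache whose name
 * starts with "zs_handle" or "zspage" in /proc/slabinfo.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512];
	long long unused = 0;

	if (!f) {
		perror("fopen /proc/slabinfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		char name[64];
		unsigned long active_objs, num_objs, objsize;

		/* skip the version/header lines and anything unparsable */
		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active_objs, &num_objs, &objsize) != 4)
			continue;

		if (strncmp(name, "zs_handle", 9) && strncmp(name, "zspage", 6))
			continue;

		unused += (long long)(num_objs - active_objs) * objsize;
	}
	fclose(f);

	printf("unused zsmalloc slab memory: %lld bytes\n", unused);
	return 0;
}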
Oh, and I forgot to mention: Chengming said he is already working on
restoring the per-class lock and collecting lock contention data, so
maybe that will be enough after all. Ideally we want to compare
(conceptual sketch below):
- single zpool with per-pool lock
- multiple zpools with per-pool lock (current)
- single zpool with per-class locks
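[Editorial note: to make the locking options concrete, here is a conceptual,
editor-added sketch. It is not actual zsmalloc code; all names
(toy_zs_pool, toy_size_class, NR_TOY_CLASSES) are illustrative.]

#include <linux/spinlock.h>

#define NR_TOY_CLASSES	255		/* illustrative class count */

struct toy_size_class {
	spinlock_t lock;		/* per-class lock */
	/* ... per-class zspage lists and stats would live here ... */
};

struct toy_zs_pool {
	spinlock_t lock;		/* single pool-wide lock */
	struct toy_size_class classes[NR_TOY_CLASSES];
};

/*
 * Per-class locking: two allocations in different size classes take
 * different locks and do not contend with each other.
 */
static void toy_alloc_per_class(struct toy_zs_pool *pool, int class_idx)
{
	spin_lock(&pool->classes[class_idx].lock);
	/* ... carve an object out of this size class ... */
	spin_unlock(&pool->classes[class_idx].lock);
}

/*
 * Pool-wide locking: every allocation in the pool serializes on one
 * lock regardless of size class; spreading entries across multiple
 * zpools (as zswap does today) dilutes exactly this contention.
 */
static void toy_alloc_per_pool(struct toy_zs_pool *pool, int class_idx)
{
	spin_lock(&pool->lock);
	/* ... same work, under the pool-wide lock ... */
	spin_unlock(&pool->lock);
}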