From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E7F1C54E58 for ; Tue, 12 Mar 2024 18:43:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DEB2B6B0275; Tue, 12 Mar 2024 14:43:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D9B216B0276; Tue, 12 Mar 2024 14:43:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C62956B0277; Tue, 12 Mar 2024 14:43:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id B379D6B0275 for ; Tue, 12 Mar 2024 14:43:37 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 8BDDBA1273 for ; Tue, 12 Mar 2024 18:43:37 +0000 (UTC) X-FDA: 81889260474.04.BC5A769 Received: from mail-ot1-f52.google.com (mail-ot1-f52.google.com [209.85.210.52]) by imf27.hostedemail.com (Postfix) with ESMTP id 8924D40013 for ; Tue, 12 Mar 2024 18:43:35 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=vi1PLUS6; spf=pass (imf27.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.210.52 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org; dmarc=pass (policy=none) header.from=cmpxchg.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1710269015; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=rt7f+q+DTeqWhNoRdGZRpb3drbVDdbSe8MdSG9RnJy4=; b=2JbXQnfFM7dmfPhs+x/O7GS8BKS46sBAtilFnH3+nmezBCfyVPewTw75Q40YzKZQXBk/h6 dzGRxB9rzKE8iEfqSGFGLWUzsL+K7YFGoQHp2fqvmnCmgT1rJ15jpyK99mrvhHAZkn5XxH JZW+gy0XdxK+m89/B81jLM4pMxNVdpA= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1710269015; a=rsa-sha256; cv=none; b=0IGA3HetjuqQE/GyflHBsEBtmA+V7FSxVrBDOxsL9HQGL/gLxkbaSdmKn9d4Bb8LGgxRlR 70ElLkEvXvBVTbinaK7Nkqi4L/Nu7eF4+pWCQHeBwuuCepjGZGuAnA+WFtov9Ort5hniFB eY/QdLR4VGxZmuJIPwZ6OxXlbWrpIpI= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=vi1PLUS6; spf=pass (imf27.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.210.52 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org; dmarc=pass (policy=none) header.from=cmpxchg.org Received: by mail-ot1-f52.google.com with SMTP id 46e09a7af769-6e53be9e2cdso63381a34.3 for ; Tue, 12 Mar 2024 11:43:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20230601.gappssmtp.com; s=20230601; t=1710269014; x=1710873814; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=rt7f+q+DTeqWhNoRdGZRpb3drbVDdbSe8MdSG9RnJy4=; b=vi1PLUS6mqRI4+Fabny1jy0j08UGoTA7RUxJSqPd5XItvw3HO+ruTInnJrFHgJpzLj K+NEt4Ox/JKUzKwuubl9LHzo92QrSmnBfJDXM+Xp9JcQ9a+5z1bCkXoF6GOBwR0hAYo1 av1ugk1wAOfx5QrPjkbXsfW7wALzgp5aeD6RAoiVZRhqMx0oafwaYNylQPmfOuGFSBV2 hD3cYFF/bMX1VMfOEnuXKnbI6/hLe7QmXB+8VaVMPtgQn1vtceWllC0aOr2C86z9bHgh Rw79Bbh57RbjEkZ33xvL6pRnUNHx5w3Iv1BRoqaLJXzgxGfy7iDPiVFUINRzJg40MOzL vyNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710269014; x=1710873814; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=rt7f+q+DTeqWhNoRdGZRpb3drbVDdbSe8MdSG9RnJy4=; b=GFbCgyitWx/avXS+zpVjp8uEd6cith2mIdHaVO9iJZdu7dm8PECeR3LxO7GymVKcBI C5mNd9cx5KcM2/fp3x93K4010Tlk5NlyKUnUhKWV4rAPMLC3aBKXZJ349ug0PiWlQtEy V8NCt5Pc1Ro5Had2o+MPHlHfH5karoSNmhXcabnA8/G/OYVCG4GBWa3o2ge/DH027GiO 44IAPM32b2uvR5aDIqG0ZMzP9eZ5Mimr/6/2rm0cYA0QgphvxHNZiydzTLoq8aP3xjah yCNrlizszYJyhpgXUL0BsktZ/wcg1VKAt5qmnYtGJ/v8WPqITHCIZikEH3wGvsZjr4eO BbXA== X-Forwarded-Encrypted: i=1; AJvYcCVLTRoGtHlGzX7p29TSFPPBu2yow580Od4vFyvdn7YWSQVBORU9Fu57EYt5WkddJi8EcZrW9KRFXRPH5uU3agBnjfk= X-Gm-Message-State: AOJu0YxZaH5+snzAoTyce1IvJW+IHz27YfhhfxCLqSZjYJ66bnGxXQFM J/CZhbcfF75VDwyqmfQmmEk5q5YOWlsdpuqTWIud8lr4OzqrYVNMHwjuRvaaEws= X-Google-Smtp-Source: AGHT+IEwgqJbo2Pl85BubNzexezi/41LXH6x4XWMmazgG18/DwJqO+pQ2IZQWHBrvoNHqriIMyRQUw== X-Received: by 2002:a9d:7841:0:b0:6e4:8fa3:d861 with SMTP id c1-20020a9d7841000000b006e48fa3d861mr4294276otm.34.1710269014573; Tue, 12 Mar 2024 11:43:34 -0700 (PDT) Received: from localhost (2603-7000-0c01-2716-da5e-d3ff-fee7-26e7.res6.spectrum.com. [2603:7000:c01:2716:da5e:d3ff:fee7:26e7]) by smtp.gmail.com with ESMTPSA id qp10-20020a05620a388a00b0078870b3ad29sm1966546qkn.126.2024.03.12.11.43.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Mar 2024 11:43:33 -0700 (PDT) Date: Tue, 12 Mar 2024 14:43:29 -0400 From: Johannes Weiner To: Chris Li Cc: Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yosry Ahmed , Nhat Pham , "Matthew Wilcox (Oracle)" , Chengming Zhou , Barry Song , Barry Song , Chengming Zhou Subject: Re: [PATCH v6] zswap: replace RB tree with xarray Message-ID: <20240312184329.GA3501@cmpxchg.org> References: <20240312-zswap-xarray-v6-1-1b82027d7082@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240312-zswap-xarray-v6-1-1b82027d7082@kernel.org> X-Stat-Signature: 9f7m168qm91bic4xkdyfem8xpaieuy7m X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 8924D40013 X-Rspam-User: X-HE-Tag: 1710269015-375886 X-HE-Meta: U2FsdGVkX199DRKmBtwamsiqN83JbYSJpLfSpgT/HAXxVsv3bUvk6ZZlqhz8XphPOzSI9uYT/QMhjwLWxLiyuhu5RwHudvAnJAVMXU47BFgs+QXqcQhJ6irZ9XCJq5iP+zXciAxpT+m1Ged1kdPb7Xz9xk7rvkjEpwRACJqJ87ejOvow502PF5HMn34t0Cloolq7zOb9MRSUBGQtDjWYuamw03Ha+BlATZ0jItsaWxsODkB6UsjmGjFrTvEPrXOrYvreGcf7k2LISiDIfABEbDmCo4RHxjRt/kPBXwuqTMw6cpTvKikNKTgA6TKIKEVXSuXh66+oJLBVyKhbsTP9U5GcweBSQiB19UDjS9CMhQqlwDDRWOenyIbqvSnnhH4isvm32V9Acr2Go/Tu0At9W3nRAkqIO75/6I16YCv0Axk3KWmFA0dyexdA7Vs/aH7sLx95PNQzItBqkYtZ0h0vyv1j2sP4en/pMlb9YTJ15BugPwuNo388j3Rs6Jc0fUPrJv1MqSYsfluvWYjZHyfXXwRJzODNb2aUMgFypLaVlupEQ+HbsOxQxRyvWpr6KL5FQtKoGoLs+obdEhvNoRajKhDHDG+5YTvUmOd+Y4XIunmhSrfVm8Gu8OSxsYXR2ClfuzkHgsgHXdYAl8hH5KcrGAUDbLATYqyPEL1mKmkMfRyJ4Qa6zb03mz9MhLa35sXCYxSI4b35vv1cbyD/TEDe6wYqNoXDGeFHhxLslGtwWp+cnzx56gr7nEdzymjvc4TqvfMOaAZezym+dgUBSKu3yB2Oul3qIPg8qgVKrTVuJLq1VW0npjv/mC92sjBG+FIS/G7R/VXuOs/woYQkU0x/SfnUoWQJPnHG+EA7hPEkyKDXIwIaLEUiSZ1mqs3xu4Y2bll4tDOp5wGDQww8pFweZoyl1yGo1mOyugj4ovwjAljg3mP4NnaNEpsOg+3RomwEWXxhNIlsFrBbomzvCvZ gX89uK59 +I7fBCzf+Po0MYhCt0vQIbgWzz35FYr4FyC3hQAjCWW92MMllwcRH/Gx73xkefpxjb/KkakVdC+J0o3RQ9f9b0swpOyn0ZPnvvK6+BnN9ARYwthWJmWm9I52f2hRdYQBxFAAoAmmdmPv9Yp7QZVZJYuRbIS/OMp+DPgjGBvG2CMf0yIq6WJO6iVflITzu2iRmI9d/PVjYHrXxOUti/wUotJMNQbOAqCts1Gk+X1S8NVGZPCSWlDqwiE2NrviXOOt71qFIDdlh3/xatorcmQav2RyEC88ECGlZD8T4WjM37w+u9wV3R8REYjt6Gd18qeYKnuuShJrOEZsKL0HllCWclfsgXMaQMT3SBpRC9S/kjO7BZcqboSCJLw0aQSaf1jSCo1BTfEtH+lMMTwk= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Mar 12, 2024 at 10:31:12AM -0700, Chris Li wrote: > Very deep RB tree requires rebalance at times. That > contributes to the zswap fault latencies. Xarray does not > need to perform tree rebalance. Replacing RB tree to xarray > can have some small performance gain. > > One small difference is that xarray insert might fail with > ENOMEM, while RB tree insert does not allocate additional > memory. > > The zswap_entry size will reduce a bit due to removing the > RB node, which has two pointers and a color field. Xarray > store the pointer in the xarray tree rather than the > zswap_entry. Every entry has one pointer from the xarray > tree. Overall, switching to xarray should save some memory, > if the swap entries are densely packed. > > Notice the zswap_rb_search and zswap_rb_insert always > followed by zswap_rb_erase. Use xa_erase and xa_store > directly. That saves one tree lookup as well. > > Remove zswap_invalidate_entry due to no need to call > zswap_rb_erase any more. Use zswap_free_entry instead. > > The "struct zswap_tree" has been replaced by "struct xarray". > The tree spin lock has transferred to the xarray lock. > > Run the kernel build testing 10 times for each version, averages: > (memory.max=2GB, zswap shrinker and writeback enabled, > one 50GB swapfile, 24 HT core, 32 jobs) > > mm-9a0181a3710eb xarray v5 > user 3532.385 3535.658 > sys 536.231 530.083 > real 200.431 200.176 This is a great improvement code and complexity wise. I have a few questions and comments below: What kernel version is this based on? It doesn't apply to mm-everything, and I can't find 9a0181a3710eb anywhere. > @@ -1555,28 +1473,35 @@ bool zswap_store(struct folio *folio) > insert_entry: > entry->swpentry = swp; > entry->objcg = objcg; > - if (objcg) { > - obj_cgroup_charge_zswap(objcg, entry->length); > - /* Account before objcg ref is moved to tree */ > - count_objcg_event(objcg, ZSWPOUT); > - } > > - /* map */ > - spin_lock(&tree->lock); > /* > * The folio may have been dirtied again, invalidate the > * possibly stale entry before inserting the new entry. > */ The comment is now somewhat stale and somewhat out of place. It should be above that `if (old)` part... See below. > - if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) { > - zswap_invalidate_entry(tree, dupentry); > - WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry)); > + old = xa_store(tree, offset, entry, GFP_KERNEL); > + if (xa_is_err(old)) { > + int err = xa_err(old); > + if (err == -ENOMEM) > + zswap_reject_alloc_fail++; > + else > + WARN_ONCE(err, "%s: xa_store failed: %d\n", > + __func__, err); > + goto store_failed; No need to complicate it. If we have a bug there, an incorrect fail stat bump is the least of our concerns. Also, no need for __func__ since that information is included in the WARN: if (xa_is_err(old)) { WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err); zswap_reject_alloc_fail++; goto store_failed; } I think here is where that comment above should go: /* * We may have had an existing entry that became stale when * the folio was redirtied and now the new version is being * swapped out. Get rid of the old. */ > + if (old) > + zswap_entry_free(old); > + > + if (objcg) { > + obj_cgroup_charge_zswap(objcg, entry->length); > + /* Account before objcg ref is moved to tree */ > + count_objcg_event(objcg, ZSWPOUT); > } > + > if (entry->length) { > INIT_LIST_HEAD(&entry->lru); > zswap_lru_add(&zswap.list_lru, entry); > atomic_inc(&zswap.nr_stored); > } > - spin_unlock(&tree->lock); We previously relied on the tree lock to finish initializing the entry while it's already in tree. Now we rely on something else: 1. Concurrent stores and invalidations are excluded by folio lock. 2. Writeback is excluded by the entry not being on the LRU yet. The publishing order matters to prevent writeback from seeing an incoherent entry. I think this deserves a comment. > /* update stats */ > atomic_inc(&zswap_stored_pages); > @@ -1585,6 +1510,12 @@ bool zswap_store(struct folio *folio) > > return true; > > +store_failed: > + if (!entry->length) { > + atomic_dec(&zswap_same_filled_pages); > + goto freepage; > + } It'd be good to avoid the nested goto. Why not make the pool operations conditional on entry->length instead: store_failed: if (!entry->length) atomic_dec(&zswap_same_filled_pages); else { zpool_free(zswap_find_zpool(...)); put_pool: zswap_pool_put(entry->pool); } freepage: Not super pretty either, but it's a linear flow at least.