Message-ID: <05cbfe34-bac1-4c16-92fa-f38b09160458@bytedance.com>
Date: Tue, 5 Mar 2024 10:52:24 +0800
Subject: Re: [PATCH v4] zswap: replace RB tree with xarray
To: Chris Li, Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yosry Ahmed,
 Nhat Pham, Johannes Weiner, "Matthew Wilcox (Oracle)", Barry Song
References: <20240304-zswap-xarray-v4-1-c4b45670cc30@kernel.org>
From: Chengming Zhou
In-Reply-To: <20240304-zswap-xarray-v4-1-c4b45670cc30@kernel.org>

Hi Chris,

On 2024/3/5 05:32, Chris Li wrote:
> Very deep RB tree requires rebalance at times. That
> contributes to the zswap fault latencies. Xarray does not
> need to perform tree rebalance. Replacing RB tree to xarray
> can have some small performance gain.
>
> One small difference is that xarray insert might fail with
> ENOMEM, while RB tree insert does not allocate additional
> memory.
>
> The zswap_entry size will reduce a bit due to removing the
> RB node, which has two pointers and a color field. Xarray
> store the pointer in the xarray tree rather than the
> zswap_entry. Every entry has one pointer from the xarray
> tree. Overall, switching to xarray should save some memory,
> if the swap entries are densely packed.
>
> Notice the zswap_rb_search and zswap_rb_insert always
> followed by zswap_rb_erase. Use xa_erase directly. The entry
> erase into zswap_xa_insert as well. That saves one tree
> lookup as well.
>
> Remove zswap_invalidate_entry due to no need to call
> zswap_rb_erase any more. Use zswap_free_entry instead.
>
> The "struct zswap_tree" has been replaced by "struct xarray".
> The tree spin lock has transferred to the xarray lock.
>
> Run the kernel build testing 10 times for each version, averages:
> (memory.max=2GB, zswap shrinker and writeback enabled, one 50GB swapfile.)
>
>         mm-9a0181a3710eb        xarray v4
> user    3526.829                3526.930
> sys     532.754                 526.525
> real    198.748                 198.850
>
> ---
>
>
> Signed-off-by: Chris Li
> ---
> Changes in v4:
> - Remove zswap_xa_search_and_earse, use xa_erase directly.
> - Move charge of objcg after zswap_xa_insert.
> - Avoid erase old entry on insert fail error path.
> - Remove not needed swap_zswap_tree change
> - Link to v3: https://lore.kernel.org/r/20240302-zswap-xarray-v3-1-5900252f2302@kernel.org
>
> Changes in v3:
> - Use xa_cmpxchg instead of zswap_xa_search_and_delete in zswap_writeback_entry.
> - Use xa_store in zswap_xa_insert directly. Reduce the scope of spinlock.
> - Fix xa_store error handling for same page fill case.
> - Link to v2: https://lore.kernel.org/r/20240229-zswap-xarray-v2-1-e50284dfcdb1@kernel.org
>
> Changes in v2:
> - Replace struct zswap_tree with struct xarray.
> - Remove zswap_tree spinlock, use xarray lock instead.
> - Fold zswap_rb_erase() into zswap_xa_search_and_delete() and zswap_xa_insert().
> - Delete zswap_invalidate_entry(), use zswap_free_entry() instead.
> - Link to v1: https://lore.kernel.org/r/20240117-zswap-xarray-v1-0-6daa86c08fae@kernel.org
> ---
>  mm/zswap.c | 186 ++++++++++++++++++++++++-------------------------------------
>  1 file changed, 72 insertions(+), 114 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 011e068eb355..4f4a3f452b76 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -20,7 +20,6 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>  #include
>  #include
> @@ -71,6 +70,8 @@ static u64 zswap_reject_compress_poor;
>  static u64 zswap_reject_alloc_fail;
>  /* Store failed because the entry metadata could not be allocated (rare) */
>  static u64 zswap_reject_kmemcache_fail;
> +/* Store failed because xarray can't insert the entry*/
> +static u64 zswap_reject_xarray_fail;
>
>  /* Shrinker work queue */
>  static struct workqueue_struct *shrink_wq;
> @@ -196,7 +197,6 @@ static struct {
>   * This structure contains the metadata for tracking a single compressed
>   * page within zswap.
>   *
> - * rbnode - links the entry into red-black tree for the appropriate swap type
>   * swpentry - associated swap entry, the offset indexes into the red-black tree
>   * length - the length in bytes of the compressed page data. Needed during
>   *          decompression. For a same value filled page length is 0, and both
> @@ -208,7 +208,6 @@ static struct {
>   * lru - handle to the pool's lru used to evict pages.
>   */
>  struct zswap_entry {
> -	struct rb_node rbnode;
>  	swp_entry_t swpentry;
>  	unsigned int length;
>  	struct zswap_pool *pool;
> @@ -220,12 +219,7 @@ struct zswap_entry {
>  	struct list_head lru;
>  };
>
> -struct zswap_tree {
> -	struct rb_root rbroot;
> -	spinlock_t lock;
> -};
> -
> -static struct zswap_tree *zswap_trees[MAX_SWAPFILES];
> +static struct xarray *zswap_trees[MAX_SWAPFILES];
>  static unsigned int nr_zswap_trees[MAX_SWAPFILES];
>
>  /* RCU-protected iteration */
> @@ -253,7 +247,7 @@ static bool zswap_has_pool;
>  * helpers and fwd declarations
>  **********************************/
>
> -static inline struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
> +static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
>  {
>  	return &zswap_trees[swp_type(swp)][swp_offset(swp)
>  		>> SWAP_ADDRESS_SPACE_SHIFT];
> @@ -805,60 +799,33 @@ void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
>  }
>
>  /*********************************
> -* rbtree functions
> +* xarray functions
>  **********************************/
> -static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
> -{
> -	struct rb_node *node = root->rb_node;
> -	struct zswap_entry *entry;
> -	pgoff_t entry_offset;
> -
> -	while (node) {
> -		entry = rb_entry(node, struct zswap_entry, rbnode);
> -		entry_offset = swp_offset(entry->swpentry);
> -		if (entry_offset > offset)
> -			node = node->rb_left;
> -		else if (entry_offset < offset)
> -			node = node->rb_right;
> -		else
> -			return entry;
> -	}
> -	return NULL;
> -}
>
>  /*
>   * In the case that a entry with the same offset is found, a pointer to
> - * the existing entry is stored in dupentry and the function returns -EEXIST
> + * the existing entry is stored in old and erased from the tree.
> + * Function return error on insert.
>   */
> -static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
> -			   struct zswap_entry **dupentry)
> +static int zswap_xa_insert(struct xarray *tree, struct zswap_entry *entry,
> +			   struct zswap_entry **old)
>  {
> -	struct rb_node **link = &root->rb_node, *parent = NULL;
> -	struct zswap_entry *myentry;
> -	pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry);
> -
> -	while (*link) {
> -		parent = *link;
> -		myentry = rb_entry(parent, struct zswap_entry, rbnode);
> -		myentry_offset = swp_offset(myentry->swpentry);
> -		if (myentry_offset > entry_offset)
> -			link = &(*link)->rb_left;
> -		else if (myentry_offset < entry_offset)
> -			link = &(*link)->rb_right;
> -		else {
> -			*dupentry = myentry;
> -			return -EEXIST;
> -		}
> -	}
> -	rb_link_node(&entry->rbnode, parent, link);
> -	rb_insert_color(&entry->rbnode, root);
> -	return 0;
> -}
> +	int err;
> +	struct zswap_entry *e;
> +	pgoff_t offset = swp_offset(entry->swpentry);
>
> -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> -{
> -	rb_erase(&entry->rbnode, root);
> -	RB_CLEAR_NODE(&entry->rbnode);
> +	e = xa_store(tree, offset, entry, GFP_KERNEL);
> +	err = xa_err(e);
> +
> +	if (err) {
> +		e = xa_erase(tree, offset);
> +		if (err == -ENOMEM)
> +			zswap_reject_alloc_fail++;
> +		else
> +			zswap_reject_xarray_fail++;
> +	}
> +	*old = e;
> +	return err;
>  }
>
>  /*********************************
> @@ -872,7 +839,6 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp, int nid)
>  	entry = kmem_cache_alloc_node(zswap_entry_cache, gfp, nid);
>  	if (!entry)
>  		return NULL;
> -	RB_CLEAR_NODE(&entry->rbnode);
>  	return entry;
>  }
>
> @@ -914,17 +880,6 @@ static void zswap_entry_free(struct zswap_entry *entry)
>  	zswap_update_total_size();
>  }
>
> -/*
> - * The caller hold the tree lock and search the entry from the tree,
> - * so it must be on the tree, remove it from the tree and free it.
> - */
> -static void zswap_invalidate_entry(struct zswap_tree *tree,
> -				   struct zswap_entry *entry)
> -{
> -	zswap_rb_erase(&tree->rbroot, entry);
> -	zswap_entry_free(entry);
> -}
> -
>  /*********************************
>  * compressed storage functions
>  **********************************/
> @@ -1113,7 +1068,9 @@ static void zswap_decompress(struct zswap_entry *entry, struct page *page)
>  static int zswap_writeback_entry(struct zswap_entry *entry,
>  				 swp_entry_t swpentry)
>  {
> -	struct zswap_tree *tree;
> +	struct xarray *tree;
> +	pgoff_t offset = swp_offset(swpentry);
> +	struct zswap_entry *e;
>  	struct folio *folio;
>  	struct mempolicy *mpol;
>  	bool folio_was_allocated;
> @@ -1150,19 +1107,14 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>  	 * be dereferenced.
>  	 */
>  	tree = swap_zswap_tree(swpentry);
> -	spin_lock(&tree->lock);
> -	if (zswap_rb_search(&tree->rbroot, swp_offset(swpentry)) != entry) {
> -		spin_unlock(&tree->lock);
> +	e = xa_cmpxchg(tree, offset, entry, NULL, GFP_KERNEL);
> +	if (e != entry) {

Maybe "if (xa_cmpxchg() != entry)" would look better; then the "e" variable
can be removed, since it is not used anywhere else.

>  		delete_from_swap_cache(folio);
>  		folio_unlock(folio);
>  		folio_put(folio);
>  		return -ENOMEM;
>  	}
>
> -	/* Safe to deref entry after the entry is verified above. */
> -	zswap_rb_erase(&tree->rbroot, entry);
> -	spin_unlock(&tree->lock);
> -
>  	zswap_decompress(entry, &folio->page);
>
>  	count_vm_event(ZSWPWB);
> @@ -1471,10 +1423,12 @@ bool zswap_store(struct folio *folio)
>  {
>  	swp_entry_t swp = folio->swap;
>  	pgoff_t offset = swp_offset(swp);
> -	struct zswap_tree *tree = swap_zswap_tree(swp);
> -	struct zswap_entry *entry, *dupentry;
> +	struct xarray *tree = swap_zswap_tree(swp);
> +	struct zswap_entry *entry, *old;
>  	struct obj_cgroup *objcg = NULL;
>  	struct mem_cgroup *memcg = NULL;
> +	int err;
> +	bool old_erased = false;
>
>  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
>  	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> @@ -1526,6 +1480,7 @@ bool zswap_store(struct folio *folio)
>  			kunmap_local(src);
>  			entry->length = 0;
>  			entry->value = value;
> +			entry->pool = NULL;
>  			atomic_inc(&zswap_same_filled_pages);
>  			goto insert_entry;
>  		}
> @@ -1555,28 +1510,31 @@ bool zswap_store(struct folio *folio)
>  insert_entry:
>  	entry->swpentry = swp;
>  	entry->objcg = objcg;
> -	if (objcg) {
> -		obj_cgroup_charge_zswap(objcg, entry->length);
> -		/* Account before objcg ref is moved to tree */
> -		count_objcg_event(objcg, ZSWPOUT);
> -	}
>
>  	/* map */
> -	spin_lock(&tree->lock);
>  	/*
>  	 * The folio may have been dirtied again, invalidate the
>  	 * possibly stale entry before inserting the new entry.
>  	 */
> -	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> -		zswap_invalidate_entry(tree, dupentry);
> -		WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
> +	err = zswap_xa_insert(tree, entry, &old);
> +	if (old)
> +		zswap_entry_free(old);
> +	if (err) {
> +		old_erased = true;
> +		goto insert_failed;
>  	}

This looks a little complicated to me :( What do you think of something
like the below?

	old = xa_store(tree, offset, entry, GFP_KERNEL);
	if (xa_is_err(old))
		goto store_failed;

	if (old)
		zswap_entry_free(old);

Then the zswap_xa_insert() wrapper can be removed, since it has no other users.
That way the error handling path stays much the same as before, and is simpler.

> +
> +	if (objcg) {
> +		obj_cgroup_charge_zswap(objcg, entry->length);
> +		/* Account before objcg ref is moved to tree */
> +		count_objcg_event(objcg, ZSWPOUT);
> +	}
> +
>  	if (entry->length) {
>  		INIT_LIST_HEAD(&entry->lru);
>  		zswap_lru_add(&zswap.list_lru, entry);
>  		atomic_inc(&zswap.nr_stored);
>  	}
> -	spin_unlock(&tree->lock);
>
>  	/* update stats */
>  	atomic_inc(&zswap_stored_pages);
> @@ -1585,6 +1543,12 @@ bool zswap_store(struct folio *folio)
>
>  	return true;
>
> +insert_failed:
> +	if (!entry->length) {
> +		atomic_dec(&zswap_same_filled_pages);
> +		goto freepage;
> +	}
> +	zpool_free(zswap_find_zpool(entry), entry->handle);

entry->pool can be used here instead of zswap_find_zpool().

Thanks!

>  put_pool:
>  	zswap_pool_put(entry->pool);
>  freepage:
> @@ -1592,17 +1556,19 @@ bool zswap_store(struct folio *folio)
>  reject:
>  	if (objcg)
>  		obj_cgroup_put(objcg);
> +
> +	if (old_erased)
> +		goto failed;
>  check_old:
>  	/*
>  	 * If the zswap store fails or zswap is disabled, we must invalidate the
>  	 * possibly stale entry which was previously stored at this offset.
>  	 * Otherwise, writeback could overwrite the new data in the swapfile.
>  	 */
> -	spin_lock(&tree->lock);
> -	entry = zswap_rb_search(&tree->rbroot, offset);
> +	entry = xa_erase(tree, offset);
>  	if (entry)
> -		zswap_invalidate_entry(tree, entry);
> -	spin_unlock(&tree->lock);
> +		zswap_entry_free(entry);
> +failed:
>  	return false;
>
>  shrink:
> @@ -1615,20 +1581,15 @@ bool zswap_load(struct folio *folio)
>  	swp_entry_t swp = folio->swap;
>  	pgoff_t offset = swp_offset(swp);
>  	struct page *page = &folio->page;
> -	struct zswap_tree *tree = swap_zswap_tree(swp);
> +	struct xarray *tree = swap_zswap_tree(swp);
>  	struct zswap_entry *entry;
>  	u8 *dst;
>
>  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
>
> -	spin_lock(&tree->lock);
> -	entry = zswap_rb_search(&tree->rbroot, offset);
> -	if (!entry) {
> -		spin_unlock(&tree->lock);
> +	entry = xa_erase(tree, offset);
> +	if (!entry)
>  		return false;
> -	}
> -	zswap_rb_erase(&tree->rbroot, entry);
> -	spin_unlock(&tree->lock);
>
>  	if (entry->length)
>  		zswap_decompress(entry, page);
> @@ -1652,19 +1613,17 @@ bool zswap_load(struct folio *folio)
>  void zswap_invalidate(swp_entry_t swp)
>  {
>  	pgoff_t offset = swp_offset(swp);
> -	struct zswap_tree *tree = swap_zswap_tree(swp);
> +	struct xarray *tree = swap_zswap_tree(swp);
>  	struct zswap_entry *entry;
>
> -	spin_lock(&tree->lock);
> -	entry = zswap_rb_search(&tree->rbroot, offset);
> +	entry = xa_erase(tree, offset);
>  	if (entry)
> -		zswap_invalidate_entry(tree, entry);
> -	spin_unlock(&tree->lock);
> +		zswap_entry_free(entry);
>  }
>
>  int zswap_swapon(int type, unsigned long nr_pages)
>  {
> -	struct zswap_tree *trees, *tree;
> +	struct xarray *trees, *tree;
>  	unsigned int nr, i;
>
>  	nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES);
> @@ -1674,11 +1633,8 @@ int zswap_swapon(int type, unsigned long nr_pages)
>  		return -ENOMEM;
>  	}
>
> -	for (i = 0; i < nr; i++) {
> -		tree = trees + i;
> -		tree->rbroot = RB_ROOT;
> -		spin_lock_init(&tree->lock);
> -	}
> +	for (i = 0; i < nr; i++)
> +		xa_init(trees + i);
>
>  	nr_zswap_trees[type] = nr;
>  	zswap_trees[type] = trees;
> @@ -1687,7 +1643,7 @@ int zswap_swapon(int type, unsigned long nr_pages)
>
>  void zswap_swapoff(int type)
>  {
> -	struct zswap_tree *trees = zswap_trees[type];
> +	struct xarray *trees = zswap_trees[type];
>  	unsigned int i;
>
>  	if (!trees)
> @@ -1695,7 +1651,7 @@ void zswap_swapoff(int type)
>
>  	/* try_to_unuse() invalidated all the entries already */
>  	for (i = 0; i < nr_zswap_trees[type]; i++)
> -		WARN_ON_ONCE(!RB_EMPTY_ROOT(&trees[i].rbroot));
> +		WARN_ON_ONCE(!xa_empty(trees + i));
>
>  	kvfree(trees);
>  	nr_zswap_trees[type] = 0;
> @@ -1727,6 +1683,8 @@ static int zswap_debugfs_init(void)
>  			   zswap_debugfs_root, &zswap_reject_kmemcache_fail);
>  	debugfs_create_u64("reject_compress_fail", 0444,
>  			   zswap_debugfs_root, &zswap_reject_compress_fail);
> +	debugfs_create_u64("reject_xarray_fail", 0444,
> +			   zswap_debugfs_root, &zswap_reject_xarray_fail);
>  	debugfs_create_u64("reject_compress_poor", 0444,
>  			   zswap_debugfs_root, &zswap_reject_compress_poor);
>  	debugfs_create_u64("written_back_pages", 0444,
>
> ---
> base-commit: 9a0181a3710eba1f5c6d19eadcca888be3d54e4f
> change-id: 20240104-zswap-xarray-716260e541e3
>
> Best regards,
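
For reference, here is a minimal user-space sketch of the control flow suggested above for zswap_store(). It is not kernel code: everything named toy_* is hypothetical and only stands in for the xarray API. toy_store() mimics the xa_store() contract of returning the previous entry on success or an encoded error on failure. The ERR_PTR-style encoding is an assumption made to keep the toy small, not how xarray encodes errors internally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS	8
#define TOY_MAX_ERRNO	4095

/* Hypothetical stand-in for struct zswap_entry. */
struct toy_entry {
	int id;
	int freed;	/* set by toy_entry_free() so callers can observe it */
};

/* Hypothetical stand-in for struct xarray: a fixed array of slots. */
struct toy_tree {
	struct toy_entry *slot[TOY_SLOTS];
	int fail_next_store;	/* hook to simulate an -ENOMEM store */
};

/* Assumed ERR_PTR-style encoding: small negative values cast to pointers. */
static void *toy_err_ptr(int err)
{
	return (void *)(intptr_t)err;
}

static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-TOY_MAX_ERRNO);
}

static int toy_err(const void *p)
{
	return toy_is_err(p) ? (int)(intptr_t)p : 0;
}

/* Like xa_store(): return the old entry, or an encoded error on failure. */
static void *toy_store(struct toy_tree *t, size_t index, struct toy_entry *e)
{
	struct toy_entry *old;

	assert(index < TOY_SLOTS);
	if (t->fail_next_store) {
		t->fail_next_store = 0;
		return toy_err_ptr(-ENOMEM);	/* slot is left untouched */
	}
	old = t->slot[index];
	t->slot[index] = e;
	return old;
}

static void toy_entry_free(struct toy_entry *e)
{
	e->freed = 1;
}

/* The suggested flow: one store, one error check, then free the old entry. */
static int toy_zswap_store(struct toy_tree *t, size_t offset,
			   struct toy_entry *entry)
{
	struct toy_entry *old;

	old = toy_store(t, offset, entry);
	if (toy_is_err(old))
		goto store_failed;

	if (old)
		toy_entry_free(old);
	return 0;

store_failed:
	return toy_err(old);
}
```

Note that when the store fails, the previous entry is still in the tree, so the caller still has to erase it. In the patch that is what the existing check_old path does.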