Date: Wed, 15 Oct 2025 22:24:17 +0000
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: "Sridhar, Kanchana P"
Cc: Nhat Pham, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	hannes@cmpxchg.org, chengming.zhou@linux.dev, usamaarif642@gmail.com,
	ryan.roberts@arm.com, 21cnbao@gmail.com, ying.huang@linux.alibaba.com,
	akpm@linux-foundation.org, senozhatsky@chromium.org, sj@kernel.org,
	kasong@tencent.com, linux-crypto@vger.kernel.org,
	herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com,
	ardb@kernel.org, ebiggers@google.com, surenb@google.com,
	"Accardi, Kristen C", "Gomes, Vinicius", "Feghali, Wajdi K",
	"Gopal, Vinodh"
Subject: Re: [PATCH v12 22/23] mm: zswap: zswap_store() will process a
	large folio in batches.
References: <20250926033502.7486-1-kanchana.p.sridhar@intel.com>
	<20250926033502.7486-23-kanchana.p.sridhar@intel.com>
	<2qvfjavbepw3sq2pvvcez6jsc3bxkxej27674l4ztfdza7jqaq@xi6qndkj5xhh>

On Wed, Oct 15, 2025 at 10:15:12PM +0000, Sridhar, Kanchana P wrote:
> 
> > -----Original Message-----
> > From: Nhat Pham
> > Sent: Wednesday, October 15, 2025 10:04 AM
> > To: Sridhar, Kanchana P
> > Subject: Re: [PATCH v12 22/23] mm: zswap: zswap_store() will process a
> > large folio in batches.
> >
> > On Tue, Oct 14, 2025 at 8:42 PM Sridhar, Kanchana P wrote:
> > >
> > > > -----Original Message-----
> > > > From: Nhat Pham
> > > > Sent: Tuesday, October 14, 2025 9:35 AM
> > > > To: Yosry Ahmed
> > > > Subject: Re: [PATCH v12 22/23] mm: zswap: zswap_store() will process a
> > > > large folio in batches.
> > > >
> > > > On Tue, Oct 14, 2025 at 8:29 AM Yosry Ahmed wrote:
> > > > >
> > > > > [..]
> > > > > > > > @@ -158,6 +161,8 @@ struct zswap_pool {
> > > > > > > >  	struct work_struct release_work;
> > > > > > > >  	struct hlist_node node;
> > > > > > > >  	char tfm_name[CRYPTO_MAX_ALG_NAME];
> > > > > > > > +	u8 compr_batch_size;
> > > > > > > > +	u8 store_batch_size;
> > > > > > >
> > > > > > > I don't think we need to store store_batch_size, seems trivial to
> > > > > > > calculate at store time (perhaps in a helper).
> > > > > > >
> > > > > > > Taking a step back, is there any benefit to limiting store_batch_size to
> > > > > > > compr_batch_size? Is there a disadvantage to using ZSWAP_MAX_BATCH_SIZE
> > > > > > > even if it's higher than the HW compression batch size?
> > > > > >
> > > > > > Thanks Yosry, for the code review comments. I had a discussion with
> > > > > > Barry earlier on these very same topics as a follow-up to his review
> > > > > > comments for v11, starting with [1]. Can you please go through the
> > > > > > rationale for these design choices, and let me know if you have any
> > > > > > questions:
> > > > > >
> > > > > > [1]: https://patchwork.kernel.org/comment/26530319/
> > > > >
> > > > > I am surprised that calculating the value in zswap_store() causes a
> > > > > regression, but I am fine with keeping the precalculation in this case.
> > > > >
> > > > > I think there's a bigger problem here tho, more below.
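For reference, the kind of store-time helper suggested above would only be a
few lines. The sketch below is illustrative; the policy it encodes (use the
HW compressor's batch size when there is one, otherwise ZSWAP_MAX_BATCH_SIZE)
is an assumption, not necessarily what the patch implements:

static inline unsigned int zswap_store_batch_size(struct zswap_pool *pool)
{
	/* Batch by the HW compressor's batch size if it has one. */
	return pool->compr_batch_size > 1 ? pool->compr_batch_size :
					    ZSWAP_MAX_BATCH_SIZE;
}
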
> > > > > > > > + */
> > > > > > > > +static __always_inline int zswap_entries_cache_alloc_batch(void **entries,
> > > > > > > > +							    unsigned int nr_entries,
> > > > > > > > +							    gfp_t gfp)
> > > > > > > > +{
> > > > > > > > +	return kmem_cache_alloc_bulk(zswap_entry_cache, gfp, nr_entries, entries);
> > > > > > >
> > > > > > > We currently use kmem_cache_alloc_node() in zswap_entry_cache_alloc() to
> > > > > > > allocate the entry on the same node as the compressed page. We use
> > > > > > > entry_to_nid() to get the node for LRU operations.
> > > > > > >
> > > > > > > This breaks that assumption.
> > > > > >
> > > > > > You bring up a good point. I was looking at the code in slub.c and my
> > > > > > understanding thus far is that both bulk allocations and
> > > > > > kmem_cache_alloc_node() allocations are made from a per-CPU "cpu_slab"
> > > > > > that is allocated by SLUB.
> > > > > >
> > > > > > IIUC, the concern you are raising is that in the mainline, the entry is
> > > > > > allocated on the same node as the compressed page, and gets added to the
> > > > > > LRU list of that node. IOW, the node to which the compressed page belongs
> > > > > > is the one to whose LRU the entry will be added.
> > > > > >
> > > > > > With this patch, with kmem_cache_alloc_bulk(), the entry will be created
> > > > > > on the per-CPU slab of the CPU on which zswap_store() is called and will
> > > > > > be added to the LRU of that per-CPU slab's NUMA node. Hence, the end
> > > > > > result could be that the zswap_entry for a page ends up on a different
> > > > > > NUMA node/memcg than the page's NUMA node.
> > > >
> > > > I think only the NUMA node is the problem, not the memcg.
> > > >
> > > > > > This is my thinking as to how this will impact the zswap shrinker:
> > > > > >
> > > > > > 1) memcg shrinker: if the memcg the entry ends up in is on the
> > > > > >    zswap_list_lru, the entry will be written back.
> > > > > > 2) Global shrinker: will cycle through all memcgs that have pages in the
> > > > > >    zswap_list_lru, and the entry will be written back.
> > > > > >
> > > > > > Based on this, it is not clear to me if there is a problem, and I would
> > > > > > like to request you, Nhat, and others to provide insights as well.
> > > > > >
> > > > > > Interestingly, most of the code in slub.c has unlikely(!node_match(slab, node)).
> > > > > > Does this imply some higher-level mm slab allocation requirements?
> > > > > >
> > > > > > I am Ok with just calling zswap_entry_cache_alloc() for "nr_pages" if we
> > > > > > think this would be more correct.
> > > > >
> > > > > I saw your other response as well, but I think one thing is not clear
> > > > > here. The zswap entry will get written back "eventually", sure, but
> > > > > that's not the problem.
> > > > >
> > > > > If the zswap entry is on the wrong node lru, two things happen:
> > > > > (a) When the "right" node is under memory pressure, we cannot free this
> > > > >     entry by writing it back since it's not available in the lru.
> > > > > (b) When the "wrong" node is under memory pressure, it will potentially
> > > > >     writeback entries from other nodes AND report them as being freed
> > > > >     from this node.
> > > > >
> > > > > Both (a) and (b) cause less effective reclaim from the zswap shrinker.
> > > > > Additionally (b) causes the shrinker to report the wrong amount of freed
> > > > > memory from the node. While this may not be significant today, it's very
> > > > > possible that more heuristics start relying on this number in the
> > > > > future.
> > > > >
> > > > > I don't believe we should put zswap entries on the wrong LRU, but I will
> > > > > defer to Nhat for the final verdict if he has a different opinion.
> > > >
> > > > Oh shoot. Yeah I missed that part.
> > > >
> > > > In the past, we sort of did not care - zswap was very poorly designed
> > > > for NUMA architectures in general, and most of our test setups have
> > > > been single-node, so these kinds of discrepancies did not show up in
> > > > performance numbers.
> > > >
> > > > But we are getting more multi-node systems:
> > > >
> > > > 1. Bigger hosts (memory-wise) tend to also have more than one node.
> > > > It scales better that way (especially because a lot of structures and
> > > > locks protecting them are node-partitioned).
> > > >
> > > > 2. We have also seen different memory media that are often expressed
> > > > to the kernel as nodes: CXL, GPU memory, etc.
> > > >
> > > > This will necessitate tightening memory placement. We recently had to
> > > > fix one such issue:
> > > >
> > > > https://github.com/torvalds/linux/commit/56e5a103a721d0ef139bba7ff3d3ada6c8217d5b
> > > >
> > > > So I'm a bit nervous about this change, which will make us use the wrong
> > > > LRU...
> > > >
> > > > Some workarounds:
> > > >
> > > > 1. Can we squeeze an extra int field anywhere in struct zswap_entry?
> > > >
> > > > 2. Can we pump nid all the way to zswap_lru_add()?
> > > >
> > > > This is still not 100% ideal - the metadata (struct zswap_entry) will
> > > > still be allocated on the wrong node. But at least the data are
> > > > properly managed, i.e. on the right LRU.
> > >
> > > Thanks, Nhat and Yosry for the discussion. Thank you Nhat, for the
> > > zsmalloc change log reference and for the workarounds!
> > >
> > > Following your suggestion in (2), can we pass in the folio's nid from
> > > zswap_store_pages() to zswap_lru_add(), as follows:
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 263bc6d7f5c6..44665deece80 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -694,9 +694,9 @@ static inline int entry_to_nid(struct zswap_entry *entry)
> > >  	return page_to_nid(virt_to_page(entry));
> > >  }
> > >
> > > -static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
> > > +static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry,
> > > +			  int nid)
> > >  {
> > > -	int nid = entry_to_nid(entry);
> > >  	struct mem_cgroup *memcg;
> > >
> > >  	/*
> > > @@ -1758,7 +1758,7 @@ static bool zswap_store_pages(struct folio *folio,
> > >  		 * an incoherent entry.
> > >  		 */
> > >  		if (likely(entry->length))
> > > -			zswap_lru_add(&zswap_list_lru, entry);
> > > +			zswap_lru_add(&zswap_list_lru, entry, nid);
> > >  	}
> > >
> > >  	return true;
> > > --
> > >
> > > I believe this will add the entry to the LRU of the node of the folio
> > > being compressed. If so, we may be able to avoid adding an int field to
> > > struct zswap_entry?
> >
> > Hmm that might not work for zswap_lru_del() :(
> >
> > zswap_entry_free() might be called in a context where we do not have
> > access to the node information (zswap_load(), for example).
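A rough sketch of what workaround (1) would amount to, with illustrative
names only: record the folio's nid in the entry at store time, so the free
path can keep deriving the node from the entry alone and zswap_lru_add(),
zswap_lru_del() and zswap_entry_free() keep their current signatures:

struct zswap_entry {
	/* ... existing fields ... */
	int nid;	/* node whose LRU holds this entry */
};

static inline int entry_to_nid(struct zswap_entry *entry)
{
	/* Was: page_to_nid(virt_to_page(entry)) */
	return entry->nid;
}

zswap_store_pages() would set entry->nid = nid for each entry in the batch
right after allocation.
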
> 
> I was thinking that zswap_lru_del() would follow a similar approach,
> i.e., zswap_load() would pass the folio's nid to zswap_lru_del(), but
> you're right, this would not work if the process faulting in the page
> is running on a different node than the one that stored the page.
> 
> > 
> > Another alternative: can we instead determine the node from the
> > compressed object's storage? i.e. store the zswap_entry in the LRU
> > corresponding to the node that holds the compressed data?
> > 
> > You'll probably need the new zsmalloc API to get the node information.
> > And can zsmalloc migrate a backing page to a different node? This
> > seems complicated...
> 
> That's a great idea! It might be worth exploring if our goal is to maintain
> parity with the current status for nodes/LRU/shrinker.
> 
> Good point about zsmalloc migrating a backing page to a different node:
> although wouldn't this be a problem with the current status quo also?
> 
> To summarize my understanding: the current approach ensures that the
> NUMA node in which the page was allocated is the one that will hold the
> compressed data (the zsmalloc commit log you shared), and is the node
> which, under memory pressure, will cause the entry to be written back.
> 
> The entry being allocated on the same NUMA node as the page being
> stored in zswap is, imho, a "mechanism" to achieve the above. When the
> page is faulted in, it is possible that the process has migrated to a
> different node, and the folio is now assigned a different nid. IOW, there
> is no more significance to the entry's nid than to facilitate the current
> approach, IIUC.
> 
> I think your suggestion (1), wherein we store the NUMA node as an int field
> in the entry, can accomplish the same thing. The entry doesn't have to be
> allocated on the same node as the page being stored in zswap: we could
> let the slab allocator decide this (potentially more optimal system-wide?).
> 
> The entry int field could also be more fail-safe than looking up the
> zsmalloc node info (which could have migrated the compressed zspage [still
> need to verify]).
> 
> I think the entry int field might also be cleaner, with the changes
> encapsulated in zswap_lru_add()/del(). If we rely on zsmalloc node
> derivation, it might require changes in zswap_decompress(). The downside
> is we add an int member to the zswap_entry.
> 
> If it's Ok with you, can I evaluate the feasibility of (1) and update
> shortly after gathering data with usemem30 and kernel_compilation?
> I am trying to avoid the latency penalty of not using the bulk allocation
> API, and at the same time, ensure we don't change the NUMA node/LRU
> lists/shrinker functionality. Based on the data I had gathered recently
> in [1], reverting to use kmem_cache_alloc_node() for the batch in
> zswap_store_pages() impacts latency considerably.
> 
> [1] https://patchwork.kernel.org/comment/26590517/
> 
> > Taking a step back though, do we have to use the bulk allocation API
> > here? Calling the single-allocation version 512 times for a PMD-sized
> > page is no worse than the status quo, correct? We can leave this part
> > unoptimized for now - in the future, if the use case justifies it, we
> > can talk to slab allocator maintainers and ask for guidance on a
> > lock-optimized cross-node bulk allocation API.
> 
> Definitely. This is a sound fallback strategy if (1) doesn't work out, or
> if, even though it does, we feel that adding an int field to the metadata
> is not acceptable/needed.
> I will make sure to share before/after data with usemem30 and
> kernel_compilation with the different options (the int field in
> zswap_entry, bulk vs. single allocation).

I am against increasing the size of struct zswap_entry. On x86_64, there is
a 3-byte hole after 'referenced'. We can technically use that, although the
node id is usually an int, which is 4 bytes on x86_64. In practice, I think
2 bytes (i.e. a short) should be enough, but it will be ugly to cast the
node id to a short. We should at least WARN on overflow.

Or we can take the simple route and drop the bulk allocation.
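For concreteness, a sketch of the 2-byte variant; the field name, its exact
placement in the hole, and the helper are illustrative, not a proposed
layout:

struct zswap_entry {
	/* ... */
	unsigned int length;
	bool referenced;
	s16 nid;	/* reuses 2 of the 3 spare bytes after 'referenced' */
	/* ... */
};

static inline void zswap_entry_set_nid(struct zswap_entry *entry, int nid)
{
	/* Node ids comfortably fit in 16 bits today; warn if that changes. */
	WARN_ON_ONCE(nid != (s16)nid);
	entry->nid = (s16)nid;
}

The WARN gives early notice if a config ever has node ids that do not fit
in 16 bits.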