From mboxrd@z Thu Jan 1 00:00:00 1970
From: SeongJae Park <sj@kernel.org>
To: Nhat Pham
Cc: SeongJae Park, Andrew Morton, Chengming Zhou, David Hildenbrand,
	Johannes Weiner, Yosry Ahmed, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Takero Funaki
Subject: Re: [PATCH] mm/zswap: store
On Thu, 7 Aug 2025 16:03:54 -0700 Nhat Pham wrote:

> On Thu, Aug 7, 2025 at 11:16 AM SeongJae Park wrote:
> >
> > When zswap writeback is enabled and zswap fails to compress a given
> > page, the page is swapped out to the backing swap device.
> > This behavior breaks zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes. If the page is compressed
> > without failure, but results in a size of PAGE_SIZE, the LRU order is
> > kept, but the decompression overhead for loading the page back on a
> > later access is unnecessary.
>
> Maybe add the writeback-disabled story - we added it to get around this
> latency issue, but that solution was not satisfactory either: wasting
> CPU cycles retrying incompressible pages, especially when we're under
> reclaim/memory pressure and we've reclaimed most if not all other
> sources.
>
> This problem was pointed out by Yosry:
>
> https://lore.kernel.org/all/CAJD7tkZXS-UJVAFfvxJ0nNgTzWBiqepPYA4hEozi01_qktkitg@mail.gmail.com/

Thank you for sharing the detailed context. I will add this history with
the link to the last paragraph of this section, or to the 'Related Works'
section.

> >
> > Keep the LRU order and avoid the unnecessary decompression overhead in
> > those cases by storing the original content as-is in the zpool. The
> > length field of zswap_entry will be set appropriately, to PAGE_SIZE.
> > Hence whether an entry is saved as-is (and decompression is therefore
> > unnecessary) is identified by 'zswap_entry->length == PAGE_SIZE'.
> >
> > Because the uncompressed data is saved in the zpool, same as the
> > compressed data, this introduces no change in terms of memory
> > management, including movability and migratability of the involved
> > pages.
> >
> > This change also does not increase the per-entry zswap metadata
> > overhead. But as the number of incompressible pages increases, the
> > total zswap metadata overhead increases proportionally. The overhead
> > should not be problematic in usual cases, since the zswap metadata for
> > a single zswap entry is much smaller than PAGE_SIZE, and in common
> > zswap use cases there should be a sufficient amount of compressible
> > pages. It can also be mitigated by zswap writeback.
> >
> > When writeback is disabled, the additional overhead could be
> > problematic. For that case, keep the current behavior that just
> > returns the failure, and let swap_writeout() put the page back to the
> > active LRU list. This is known to be suboptimal when the
> > incompressible pages are cold, since those pages will continuously be
> > tried for zswap-out, burning CPU cycles on compression attempts that
> > will fail anyway. One imaginable solution for the problem is reusing
> > the swapped-out page and its struct page to store the data in the
> > zswap pool. But that's out of the scope of this patch.
> >
> > Tests
> > ----- [...]
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -976,8 +976,25 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >  	 */
> >  	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
> >  	dlen = acomp_ctx->req->dlen;
> > -	if (comp_ret)
> > -		goto unlock;
> > +
> > +	/*
> > +	 * If a page cannot be compressed into a size smaller than PAGE_SIZE,
> > +	 * save the content as is without a compression, to keep the LRU order
> > +	 * of writebacks.  If writeback is disabled, reject the page since it
> > +	 * only adds metadata overhead.  swap_writeout() will put the page back
> > +	 * to the active LRU list in the case.
> > +	 */
> > +	if (comp_ret || dlen >= PAGE_SIZE) {
> > +		if (mem_cgroup_zswap_writeback_enabled(
> > +				folio_memcg(page_folio(page)))) {
> > +			comp_ret = 0;
> > +			dlen = PAGE_SIZE;
> > +			memcpy_from_page(dst, page, 0, dlen);
>
> I wonder if we can save one memcpy here. Would it be safe to map the page:
>
> dst = kmap_local_page(page);

Sounds good, I will do so in the next version.

> then, after we're done with storing (i.e. after zpool_obj_write()), do:
>
> if (dlen == PAGE_SIZE)
> 	kunmap(dst);
>
> (don't forget to unmap the page in the failure paths too!)

Sure, I think the 'unlock' label would be an appropriate place to add the
unmap code.
I also got your correction of s/kunmap/kunmap_local()/ in the reply; I
will not miss that in the next version.

> > +		} else {
> > +			comp_ret = comp_ret ? : -EINVAL;
>
> Does this keep the value of comp_ret if comp_ret != 0 lol.

Yes.

> Syntax looks weird to me.

I agree it could look weird. I prefer keeping the code in a way more
familiar to the ones who will keep maintaining the file, so I will modify
this as below in the next version.

    comp_ret ? comp_ret : -EINVAL

> > +			goto unlock;
> > +		}
> > +	}
>
> Also, can we fix the counter value?
>
> I assume we want:
>
> else if (comp_ret || dlen == PAGE_SIZE)
> 	zswap_reject_compress_fail++;
>
> or something like that.

I'm not very clearly getting your point. I was thinking we should
increase the counter if we "reject" the page (do not save the content in
the zpool) due to failing to compress the page's content into a size
smaller than PAGE_SIZE. This patch implements that behavior. Am I
missing a mis-implementation of the behavior in this patch, or is the
behavior not what you think it should be? More elaboration of your point
would be helpful for me.

> And what happened to the incompressible page stored counter? :)

I don't get what counter you are asking about. Could you please
elaborate?

> >
> >  	zpool = pool->zpool;
> >  	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> > @@ -1012,6 +1029,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >  	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
> >  	obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
> >
> > +	/* zswap entries of PAGE_SIZE are not compressed. */
>
> of length?

Thank you for this nice suggestion, I will change the comment so in the
next version.
> > +	if (entry->length == PAGE_SIZE) {
> > +		memcpy_to_folio(folio, 0, obj, entry->length);
> > +		zpool_obj_read_end(zpool, entry->handle, obj);
> > +		acomp_ctx_put_unlock(acomp_ctx);
> > +		return true;
> > +	}
> > +
> >  	/*
> >  	 * zpool_obj_read_begin() might return a kmap address of highmem when
> >  	 * acomp_ctx->buffer is not used.  However, sg_init_one() does not
> >
> > base-commit: 2ec534125ae474292175ae198483c545eac2161d
> > --
> > 2.39.5

Thank you for your nice review and comments :)


Thanks,
SJ