From: Nhat Pham <nphamcs@gmail.com>
Date: Fri, 27 Jun 2025 16:21:32 -0700
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, andrew.yang@mediatek.com,
    angelogioacchino.delregno@collabora.com, casper.li@mediatek.com,
    chinwen.chang@mediatek.com, hannes@cmpxchg.org, james.hsu@mediatek.com,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mediatek@lists.infradead.org, linux-mm@kvack.org,
    matthias.bgg@gmail.com, minchan@kernel.org, qun-wei.lin@mediatek.com,
    rppt@kernel.org, senozhatsky@chromium.org, sj@kernel.org
In-Reply-To: <20250623051642.3645-1-21cnbao@gmail.com>
References: <20250623051642.3645-1-21cnbao@gmail.com>

On Sun, Jun 22, 2025 at 10:16 PM Barry Song <21cnbao@gmail.com> wrote:
>
> Hi Nhat,
>
> On Wed, Jun 18, 2025 at 2:21 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Sun, Jun 15, 2025 at 8:41 PM Barry Song <21cnbao@gmail.com> wrote:
> > > >>
> > > >> That seems unnecessary. There is an existing method for asynchronous
> > > >> writeback, and pageout() is naturally fully set up to handle this.
> > > >>
> > > >> IMO the better way to do this is to make zswap_store() (and
> > > >> zram_bio_write()?) asynchronous. Make those functions queue the work
> > > >> and wake the compression daemon, and then have the daemon call
> > > >> folio_end_writeback() / bio_endio() when it's done with it.
> >
> > +1.
> >
> > > But,
> > > how could this be possible for zswap? zswap_store() is only a frontend;
> > > we still need its return value to determine whether __swap_writepage()
> > > is required. Waiting for the result of zswap_store() is inherently a
> > > synchronous step.
> >
> > Hmm, I might be misunderstanding either of you, but it sounds like
> > what you're describing here does not contradict what Johannes is
> > proposing?
>
> It seems contradictory: Johannes proposes that zswap could behave like zRAM
> by invoking `folio_end_writeback()` or `bio_endio()`, but this doesn't match
> the actual behavior, since `zswap_store()` might not end `swap_writeout()`;
> it may still fall through to `__swap_writepage()` to complete the final steps.
>
> Meanwhile, Qun-wei's RFC has already explored using `folio_end_writeback()`
> and `bio_endio()` at the end of `__swap_writepage()` for zRAM, though that
> approach also has its own issues.

Hmm OK. I'll let Johannes comment on this then :)
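
For reference, the synchronous dependency Barry describes looks roughly
like this today (a simplified sketch of the swap_writeout() path; locking
and error handling omitted):

int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
{
        /* Compression happens synchronously inside zswap_store(). */
        if (zswap_store(folio)) {
                /* Stored in zswap: no block I/O is needed. */
                folio_unlock(folio);
                return 0;
        }

        /* Rejected by zswap (or zswap disabled): fall back to swap I/O. */
        __swap_writepage(folio, swap_plug);
        return 0;
}

The caller can only decide between "done" and "do real I/O" once
zswap_store() has returned, which is why the compression step cannot
simply be deferred without restructuring this path.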

> > > My point is that folio_end_writeback() and bio_endio() can only be
> > > called after the entire zswap_store() → __swap_writepage() sequence is
> > > completed. That's why both are placed in the new kcompressd.
> >
> > Hmm, how about:
> >
> > 1. Inside zswap_store(), we first obtain the obj_cgroup reference,
> > check the cgroup and pool limits, and grab a zswap pool reference (in
> > effect, determining the slot allocator and compressor).
> >
> > 2. Next, we try to queue the work to kcompressd, saving the folio and
> > the zswap pool (and whatever else we need for the continuation). If
> > this fails, we can proceed with the old synchronous path.
> >
> > 3. In the kcompressd daemon, we perform the continuation of
> > zswap_store(): compression, slot allocation, storing, zswap's LRU
> > modification, etc. If this fails, we check whether the mem_cgroup
> > enables writeback. If it's enabled, we can call __swap_writepage().
> > Ideally, if writeback is disabled, we should activate the page, but
> > that might not be possible since shrink_folio_list() might already
> > have re-added the page to the inactive LRU. Maybe some modification
> > of pageout() and shrink_folio_list() can make this work, but I
> > haven't thought too deeply about it :) If it's impossible, we can
> > perform async compression only for cgroups that enable writeback for
> > now. Once we fix zswap's handling of incompressible pages, we can
> > revisit this decision (+ SJ).
> >
> > TLDR: move the work-queueing step forward a bit, into the middle of
> > zswap_store().
> >
> > One benefit of this is that we skip pages of cgroups that disable
> > zswap, or cases where the zswap pool is full.
>
> I assume you meant something like the following:
>
> bool try_to_sched_async_zswap_store()
> {
>         get_obj_cgroup_from_folio()
>         if (err) goto xxx;
>         zswap_check_limits();
>         if (err) goto xxx;
>         zswap_pool_current_get()
>         if (err) goto xxx;
>
>         queue_folio_to_kcompressd(folio);

Something like this, yeah. Can queue_folio_to_kcompressd() fail? If so,
we can also try synchronous compression on failure here
(__zswap_store() ?).

>         return true;
>
> xxx:
>         error handler things;
>         return false;
> }
>
> If this function returns true, it suggests that compression requests
> have been queued to kcompressd. Following that, in kcompressd():
>
> int __zswap_store(folio)
> {
>         for (i = 0; i < nr_pages; i++) {
>                 zswap_store_page();
>                 if (err)
>                         return err;
>         }
>         return 0;
> }
>
> kcompressd()
> {
>         while (folio_queue_is_not_empty) {
>                 folio = dequeue_folio();
>                 if (folio_queued_by_zswap(folio)) {
>                         if (!__zswap_store(folio))
>                                 continue;
>                 }
>                 if ((zswap_store_page_fails && mem_cgroup_zswap_writeback_enabled()) ||
>                     folio_queued_by_zram) {

If !mem_cgroup_zswap_writeback_enabled(), I wonder if we can activate
the page here?

>                         __swap_writepage();
>                 }
>         }
> }
>
> In kswapd, we will need to do:
>
> int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
> {
>         ...
>         if (try_to_sched_async_zswap_store(folio))
>                 return;
>         if (is_sync_comp_blkdev(swap)) {
>                 queue_folio_to_kcompressd(folio);
>                 return;
>         }
>         __swap_writepage();
> }
>
> To be honest, I'm not sure if there's a flag that indicates whether the
> folio was queued by zswap or zram. If not, we may need to add a member

I don't think there is.

> associated with folio pointers in the queue between kswapd and kcompressd,
> since we need to identify zswap cases. Maybe we can reuse bit 0 of the
> folio pointer?
>
> What I mean is: while queuing, if the folio is queued by zswap, we do
> `pointer |= BIT(0)`. Then in kcompressd, we restore the original folio
> with `folio = pointer & ~BIT(0)`. It's a bit ugly, but I'm not sure
> there's a better approach.

I think this approach is fine. We can also hack struct zswap_entry, but
that would require an extra xarray lookup. OTOH, if we can assume that
zram users will not enable zswap, we might optimize that lookup away?
Not sure if it's much cleaner than just pointer tagging though.

> Thanks
> Barry
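
For concreteness, a minimal sketch of the bit-0 tagging idea (the helper
names and queue entry type here are illustrative, not from any posted
patch):

#define KCOMPRESSD_ZSWAP_TAG    BIT(0)

/* Producer side (e.g. swap_writeout()): tag folios queued by zswap. */
static inline void *kcompressd_tag_folio(struct folio *folio, bool by_zswap)
{
        unsigned long val = (unsigned long)folio;

        if (by_zswap)
                val |= KCOMPRESSD_ZSWAP_TAG;
        return (void *)val;
}

/* Consumer side (kcompressd): recover the folio and who queued it. */
static inline struct folio *kcompressd_untag_folio(void *entry, bool *by_zswap)
{
        unsigned long val = (unsigned long)entry;

        *by_zswap = val & KCOMPRESSD_ZSWAP_TAG;
        return (struct folio *)(val & ~KCOMPRESSD_ZSWAP_TAG);
}

This works because struct folio pointers are always at least word-aligned,
so bit 0 is guaranteed to be clear; it's the same trick the kernel already
uses for xarray value entries.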