From: Nhat Pham <nphamcs@gmail.com>
Date: Tue, 17 Jun 2025 07:21:47 -0700
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression
To: Barry Song <21cnbao@gmail.com>
Cc: hannes@cmpxchg.org, akpm@linux-foundation.org, andrew.yang@mediatek.com,
 angelogioacchino.delregno@collabora.com, casper.li@mediatek.com,
 chinwen.chang@mediatek.com, james.hsu@mediatek.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mediatek@lists.infradead.org, linux-mm@kvack.org,
 matthias.bgg@gmail.com, minchan@kernel.org, qun-wei.lin@mediatek.com,
 rppt@kernel.org, senozhatsky@chromium.org, SeongJae Park
In-Reply-To: <20250616034106.1978-1-21cnbao@gmail.com>
References: <20250616034106.1978-1-21cnbao@gmail.com>

On Sun, Jun 15, 2025 at 8:41 PM Barry Song <21cnbao@gmail.com> wrote:
>
> Hi Nhat, Johannes,
>
> >> The way you implemented this adds time-and-space overhead even on
> >> systems
> >> that don't have any sort of swap compression enabled.
>
> I agree — we can eliminate the time and space overhead by refining the
> code to hook kcompressd only when zswap or zram is enabled.
>
> >>
> >> That seems unnecessary. There is an existing method for asynchronous
> >> writeback, and pageout() is naturally fully set up to handle this.
> >>
> >> IMO the better way to do this is to make zswap_store() (and
> >> zram_bio_write()?) asynchronous. Make those functions queue the work
> >> and wake the compression daemon, and then have the daemon call
> >> folio_end_writeback() / bio_endio() when it's done with it.
>
> +1.
>
> But,
> How could this be possible for zswap? zswap_store() is only a frontend —
> we still need its return value to determine whether __swap_writepage()
> is required. Waiting for the result of zswap_store() is inherently a
> synchronous step.

Hmm, I might be misunderstanding either of you, but it sounds like what
you're describing here does not contradict what Johannes is proposing?

> My point is that folio_end_writeback() and bio_endio() can only be
> called after the entire zswap_store() → __swap_writepage() sequence is
> completed. That's why both are placed in the new kcompressd.

Hmm, how about:

1. Inside zswap_store(), we first obtain the obj_cgroup reference, check
the cgroup and pool limits, and grab a zswap pool reference (in effect,
determining the slot allocator and compressor).

2. Next, we try to queue the work to kcompressd, saving the folio and
the zswap pool (and whatever else we need for the continuation). If this
fails, we can proceed with the old synchronous path.

3. In the kcompressd daemon, we perform the continuation of
zswap_store(): compression, slot allocation, storing, zswap's LRU
modification, etc. If this fails, we check whether the mem_cgroup
enables writeback. If it's enabled, we can call __swap_writepage().
Ideally, if writeback is disabled, we should activate the page, but it
might not be possible since shrink_folio_list() might already re-add
the page to the inactive LRU. Maybe some modification of pageout() and
shrink_folio_list() can make this work, but I haven't thought too
deeply about it :) If it's impossible, we can perform async compression
only for cgroups that enable writeback for now. Once we fix zswap's
handling of incompressible pages, we can revisit this decision (+ SJ).

TLDR: move the work-queueing step forward a bit, into the middle of
zswap_store(). One benefit of this is that we skip pages of cgroups
that disable zswap, and cases where the zswap pool is full.

> The use of folio_end_writeback() and bio_endio() was the case for zRAM
> in Qun-Wei's RFC.
>
> https://lore.kernel.org/linux-mm/20250307120141.1566673-3-qun-wei.lin@mediatek.com/
>
> However, the implementation tightly coupled zRAM with reclamation
> logic. For example, zRAM needed to know whether it was running in the
> kswapd context, which is not ideal for a generic block device — the
> role zRAM is supposed to play. Additionally, the code was not shared
> between zswap and zRAM.
>
> Thanks
> Barry