From: Chris Li
Date: Tue, 5 Mar 2024 10:24:02 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Chengming Zhou
Cc: Matthew Wilcox, Nhat Pham, lsf-pc@lists.linux-foundation.org, linux-mm, ryan.roberts@arm.com, David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chuanhua Han
In-Reply-To: <63cdfad3-c27c-4232-8bca-9cdb3ec0c6f5@linux.dev>

On Tue, Mar 5, 2024 at 12:16 AM Chengming Zhou wrote:
> > We can write out zsmalloc blocks of data as it is, however there is no
> > guarantee the data in zsmalloc blocks have the same LRU order.
>
> Right, so we should choose to write out objects based on the LRU order
> in zswap, but don't decompress it, write out it directly to swap file.

Here is an idea.
Since zsmalloc uses N pages as a block to store the data, we can have a
backend that reads the compressed data and writes it out to another
zsmalloc in N-page blocks, in LRU order. Those N-page blocks are then
written out to the swap file. The zsmalloc metadata that keeps track of
the handles is converted to point at physical locations on the disk.
That metadata stays in memory.

>
> >
> > It makes more sense when writing higher order > 0 swap pages. e.g
> > writing 64K pages in one buffer, then we can write out compressed data
> > as page boundary aligned and page sizes, accepting the waste on the
> > last compressed page, might not fill up the whole page.
> >
> >>
> >> Right, I also thought about this direction for some time.
> >> Apart from fewer IO, there are more advantages we can see:
> >>
> >> 1. Don't need to allocate a page when write out compressed data.
> >>    This method actually has its own problem[1], by allocating a new page and
> >>    put on LRU list, wait for writeback and reclaim.
> >>    If we write out compressed data directly, so don't need to allocated page,
> >>    these problems can be avoided.
> >
> > Does it go through swap cache at all? If not, there will be some
> > interesting synchronization issues when other races swap in the page
> > and modify it.
>
> No, right, we have to handle the races. (Maybe we can leave "shadow" entry in zswap,
> which can be used for synchronization)

I kind of wish the swap cache stored either a folio or a pointer to a
swap entry struct. At the cost of one extra pointer per swap entry, we
could have different types of swap entry struct, e.g. one for zswap. The
shadow would be part of the common swap entry members. Then zswap, or
fancier swap entries, could allocate different types of swap structs.
That would also simplify a lot of the swap cache for-each looping code,
since there would be no need to deal with is_value() on the swap entry.
We would just need a tag to tell whether it is a folio or a swap entry
pointer.

>
> >
> >>
> >> 2. Don't need to decompress when write out compressed data.
> >
> > Yes.
> >
> >>
> >> [1] https://lore.kernel.org/all/20240209115950.3885183-1-chengming.zhou@linux.dev/
> >>
> >>>
> >>> I'm sure it'd be a big redesign, but that seems to be what we're talking
> >>> about anyway.
> >>>
> >>
> >> Yes, we need to do modifications in some parts:
> >>
> >> 1. zsmalloc: compressed objects can be migrated anytime, we need to support pinning.
> >
> > Or use a bounce buffer to read it out.
>
> Yeah, also a choice if pinning is not easy to implement :)

In the separate zsmalloc backend idea above, the bounce buffer is more
or less required anyway, to compact objects of different sizes into
page-aligned blocks. That removes the pinning requirement as well.

Chris
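
To make the repacking idea a bit more concrete, here is a rough
userspace sketch in C. Everything in it is an assumption for
illustration: the sketch_* names, the 4-page block size, and the
write_block callback standing in for block I/O are all hypothetical and
are not zsmalloc or zswap interfaces. It copies compressed objects, fed
in LRU order, through a bounce buffer into fixed N-page blocks and
records where each handle's data ends up.

```c
/* Userspace sketch only; none of these names exist in the kernel. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_BLOCK_PAGES 4u	/* hypothetical value for "N" */
#define SKETCH_BLOCK_SIZE  (SKETCH_PAGE_SIZE * SKETCH_BLOCK_PAGES)

/* In-memory metadata: where one handle's compressed bytes live on disk. */
struct sketch_loc {
	uint32_t block;		/* index of the N-page block in the swap file */
	uint32_t offset;	/* byte offset inside that block */
	uint32_t len;		/* compressed length in bytes */
};

struct sketch_packer {
	unsigned char bounce[SKETCH_BLOCK_SIZE];	/* bounce buffer */
	uint32_t used;		/* bytes filled in the bounce buffer */
	uint32_t next_block;	/* block index the buffer will be written to */
	/* Stand-in for submitting block I/O; may be NULL in this sketch. */
	void (*write_block)(uint32_t block, const void *data, size_t len);
};

/* Write out the current block, if it holds any data, and start a new one. */
static void sketch_flush(struct sketch_packer *p)
{
	if (p->used == 0)
		return;
	/* Zero the tail: the end of a block may be wasted space. */
	memset(p->bounce + p->used, 0, SKETCH_BLOCK_SIZE - p->used);
	if (p->write_block)
		p->write_block(p->next_block, p->bounce, SKETCH_BLOCK_SIZE);
	p->next_block++;
	p->used = 0;
}

/*
 * Append one compressed object. The caller feeds objects in LRU order.
 * Returns 0 and fills *loc on success, -1 if the object cannot fit.
 */
static int sketch_pack(struct sketch_packer *p, const void *obj,
		       uint32_t len, struct sketch_loc *loc)
{
	assert(p->used <= SKETCH_BLOCK_SIZE);
	if (len == 0 || len > SKETCH_BLOCK_SIZE)
		return -1;
	if (len > SKETCH_BLOCK_SIZE - p->used)
		sketch_flush(p);	/* does not fit, start a new block */
	memcpy(p->bounce + p->used, obj, len);
	loc->block = p->next_block;
	loc->offset = p->used;
	loc->len = len;
	p->used += len;
	return 0;
}
```

The caller would keep each returned sketch_loc next to the handle in
memory and call sketch_flush() once at the end. An object never spans
two blocks here, so the tail of a block can be wasted, which is the same
trade-off as the partly filled last compressed page mentioned earlier in
the thread.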