From: Chris Li
Date: Tue, 21 May 2024 13:40:56 -0700
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Karim Manaouil
Cc: Jan Kara, Chuanhua Han, linux-mm, lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com, 21cnbao@gmail.com, david@redhat.com
References: <039190fb-81da-c9b3-3f33-70069cdb27b0@oppo.com> <20240307140344.4wlumk6zxustylh6@quack3>

Hi Karim,

On Fri, May 17, 2024 at 5:12 AM Karim Manaouil wrote:
>
> On Thu, Mar 07, 2024 at 03:03:44PM +0100, Jan Kara wrote:
> > Frankly as I'm reading the discussions here, it seems to me you are trying
> > to reinvent a lot of things from the filesystem space :) Like block
> > allocation with reasonably efficient fragmentation prevention, transparent
> > data compression (zswap), hierarchical storage management (i.e., moving
> > data between different backing stores), efficient way to get from
> > VMA+offset to the place on disk where the content is stored. Sure you still
> > don't need a lot of things modern filesystems do like permissions,
> > directory structure (or even more complex namespacing stuff), all the stuff
> > achieving fs consistency after a crash, etc. But still what you need is a
> > notable portion of what filesystems do.
> >
> > So maybe it would be time to implement swap as a proper filesystem? Or even
> > better we could think about factoring out these bits out of some existing
> > filesystem to share code?
>
> I definitely agree with you on this point. I had the same exact thought,
> reading the discussion.
>
> Filesystems already implemented a lot of solutions for fragmentation
> avoidance that are more appropriate for slow storage media.
>

Swap and filesystems have very different requirements, usage patterns,
and I/O patterns.

> Also, writing chunks of any size (e.g. to directly write compressed
> pages) means slab-based management of swap space might not be ideal
> and will waste space for internal fragmentation. Also compaction
> for slow media is obviously harder and slower to implement compared
> to doing it in memory. You can do it in memory as well, but that is
> at the expense of more I/O.

I am not able to understand what you describe above. The current swap
entry is not allocated from slab. The compressed swap backends, zswap
and zram, both use zsmalloc as the backend to store compressed pages.

>
> It sounds to me that all the problems above can be solved with an
> extent-based filesystem implementation of swap.

It looks good on paper, but once you try to actually implement it you
will find a lot of new obstacles.

One challenging aspect is that the current swap back end has a very low
per-swap-entry memory overhead. It is about 1 byte (swap_map), 2 bytes
(swap cgroup), and 8 bytes (swap cache pointer). The inode struct is
more than 64 bytes per file.
That is a big jump if you map a swap entry to a file. If you map more
than one swap entry to a file, then you need to track the mapping of
file offset to swap entry, and the reverse lookup of swap entry to a
file with offset. Whichever way you cut it, it will significantly
increase the per-swap-entry memory overhead.

Chris
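
To put rough numbers on the overhead comparison above, here is a
back-of-the-envelope sketch in C. It only multiplies out the per-entry
figures quoted in the message. The function names, the one-entry-per-page
assumption, and the use of 64 bytes as the inode cost are illustrative
assumptions, not kernel code; the message says an inode is "more than 64
bytes", so the second figure is a lower bound.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate per-swap-entry costs quoted in the message. */
enum {
    SWAP_MAP_BYTES    = 1,  /* swap_map */
    SWAP_CGROUP_BYTES = 2,  /* swap cgroup */
    SWAP_CACHE_BYTES  = 8,  /* swap cache pointer */
    INODE_MIN_BYTES   = 64, /* lower bound only: "more than 64 bytes" */
};

/* Assumption: one swap entry covers exactly one page of page_bytes. */
uint64_t swap_entries(uint64_t swap_bytes, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    return swap_bytes / page_bytes;
}

/* Metadata cost of the current scheme: 1 + 2 + 8 = 11 bytes per entry. */
uint64_t current_overhead_bytes(uint64_t swap_bytes, uint64_t page_bytes)
{
    return swap_entries(swap_bytes, page_bytes) *
           (SWAP_MAP_BYTES + SWAP_CGROUP_BYTES + SWAP_CACHE_BYTES);
}

/* Hypothetical scheme mapping each swap entry to its own inode. */
uint64_t inode_per_entry_overhead_bytes(uint64_t swap_bytes,
                                        uint64_t page_bytes)
{
    return swap_entries(swap_bytes, page_bytes) * INODE_MIN_BYTES;
}
```

For a hypothetical 64 GiB swap device with 4 KiB pages, this works out to
176 MiB of metadata for the current scheme versus at least 1 GiB with one
inode per entry, before counting any offset-to-entry or reverse-lookup
structures.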