From: Barry Song <21cnbao@gmail.com>
Date: Fri, 8 Mar 2024 05:17:46 +0800
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Jared Hulbert
Cc: Jan Kara, Chuanhua Han, Chris Li, linux-mm, lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com, david@redhat.com
References: <039190fb-81da-c9b3-3f33-70069cdb27b0@oppo.com> <20240307140344.4wlumk6zxustylh6@quack3>

On Fri, Mar 8, 2024 at 5:06 AM Jared Hulbert wrote:
>
> On Thu, Mar 7, 2024 at 9:35 AM Jan Kara wrote:
> >
> > Well, but then if you fill in space of a particular order and need to swap
> > out a page of that order what do you do? Return ENOSPC prematurely?
> >
> > Frankly as I'm reading the discussions here, it seems to me you are trying
> > to reinvent a lot of things from the filesystem space :) Like block
> > allocation with reasonably efficient fragmentation prevention, transparent
> > data compression (zswap), hierarchical storage management (i.e., moving
> > data between different backing stores), efficient way to get from
> > VMA+offset to the place on disk where the content is stored.
> > Sure you still
> > don't need a lot of things modern filesystems do like permissions,
> > directory structure (or even more complex namespacing stuff), all the stuff
> > achieving fs consistency after a crash, etc. But still what you need is a
> > notable portion of what filesystems do.
> >
> > So maybe it would be time to implement swap as a proper filesystem? Or even
> > better we could think about factoring out these bits out of some existing
> > filesystem to share code?
>
> Yes. Thank you. I've been struggling to communicate this.
>
> I'm thinking you can just use existing filesystems as a first step
> with a modest glue layer. See the branch of this thread where I'm
> babbling on to Chris about this.
>
> "efficient way to get from VMA+offset to place on the disk where
> content is stored"
> You mean treat swapped pages like they were mmap'ed files and use the
> same code paths? How big of a project is that? That seems either
> deceptively easy or really hard... I've been away too long and was
> never really good enough to have a clear vision of the scale.

I don't understand why we need this level of complexity. All we need to
know during pageout is the offsets. After that, the large folio is
destroyed, and all offsets are stored in page table entries (PTEs) or in
an xarray (xa). Swap-in doesn't depend on a complex file system; it can
decide for itself how to swap in, based on the values it reads from the
PTEs.

Swap-in doesn't need to know whether the swapped-out folio was large or not.

>
> On the file side we have the page cache, but on the swap side you have
> swap cache and zswap. If we reconciled file pages and swap pages you
> could have page cache and zpage_cache(?) bringing gains in both
> directions. If the argument is that the swap fault path is a lot
> faster, then shouldn't we be talking about fixing the file fault path
> anyway?
>
> I'd love to hear the real experts chime in.

Thanks
Barry
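
To make the point about per-PTE offsets concrete, here is a minimal
userspace sketch in Python. It is a toy model, not kernel code: SwapEntry,
ToySwapDevice, swap_out_folio and swap_in are hypothetical names invented
for illustration, and the dict standing in for the page table is an
assumption. It only shows that once every page's offset is recorded at
pageout, swap-in can resolve each page on its own, without knowing whether
the original folio was large or whether its slots were contiguous.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAGE_SIZE = 4096


@dataclass(frozen=True)
class SwapEntry:
    """Toy stand-in for the value a swapped-out PTE holds (hypothetical)."""
    swap_type: int  # which swap device
    offset: int     # page-sized slot on that device


class ToySwapDevice:
    """Flat array of page-sized slots; None marks a free slot."""

    def __init__(self, nr_slots: int) -> None:
        self.slots: List[Optional[bytes]] = [None] * nr_slots

    def alloc(self) -> int:
        # First-fit allocation; slots for one folio need not be contiguous.
        for offset, slot in enumerate(self.slots):
            if slot is None:
                return offset
        raise MemoryError("no free swap slot")

    def write(self, offset: int, data: bytes) -> None:
        self.slots[offset] = data

    def read(self, offset: int) -> bytes:
        data = self.slots[offset]
        if data is None:
            raise KeyError(f"swap slot {offset} is empty")
        return data


def swap_out_folio(pages: List[bytes], first_vaddr: int,
                   page_table: Dict[int, SwapEntry],
                   dev: ToySwapDevice, swap_type: int = 0) -> None:
    """Write each page of a folio out and record its offset in its own PTE."""
    for i, data in enumerate(pages):
        offset = dev.alloc()
        dev.write(offset, data)
        page_table[first_vaddr + i * PAGE_SIZE] = SwapEntry(swap_type, offset)
    # From here on the folio no longer exists; only the per-page entries do.


def swap_in(vaddr: int, page_table: Dict[int, SwapEntry],
            dev: ToySwapDevice) -> bytes:
    """Fault one page back in using only the entry stored for that address."""
    entry = page_table[vaddr]
    return dev.read(entry.offset)
```

In this sketch a 4-page folio swapped out to a device with one slot already
taken ends up in non-contiguous slots, and each page can still be faulted
back individually from its own entry.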