Date: Thu, 14 Mar 2024 10:03:39 +0100
From: Jan Kara
To: Barry Song <21cnbao@gmail.com>
Cc: Jared Hulbert, Jan Kara, Chuanhua Han, Chris Li, linux-mm,
	lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com,
	david@redhat.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Message-ID: <20240314090339.kieqv4v4m6yyewn5@quack3>
References: <039190fb-81da-c9b3-3f33-70069cdb27b0@oppo.com>
	<20240307140344.4wlumk6zxustylh6@quack3>

On Fri 08-03-24 05:17:46, Barry Song wrote:
> On Fri, Mar 8, 2024 at 5:06 AM Jared Hulbert wrote:
> >
> > On Thu, Mar 7, 2024 at 9:35 AM Jan Kara wrote:
> > >
> > > Well, but then if you fill in space of a particular order and need to swap
> > > out a page of that order what do you do? Return ENOSPC prematurely?
> > >
> > > Frankly as I'm reading the discussions here, it seems to me you are trying
> > > to reinvent a lot of things from the filesystem space :) Like block
> > > allocation with reasonably efficient fragmentation prevention, transparent
> > > data compression (zswap), hierarchical storage management (i.e., moving
> > > data between different backing stores), efficient way to get from
> > > VMA+offset to the place on disk where the content is stored. Sure you still
> > > don't need a lot of things modern filesystems do like permissions,
> > > directory structure (or even more complex namespacing stuff), all the stuff
> > > achieving fs consistency after a crash, etc. But still what you need is a
> > > notable portion of what filesystems do.
> > >
> > > So maybe it would be time to implement swap as a proper filesystem? Or even
> > > better we could think about factoring out these bits out of some existing
> > > filesystem to share code?
> >
> > Yes. Thank you. I've been struggling to communicate this.
> >
> > I'm thinking you can just use existing filesystems as a first step
> > with a modest glue layer. See the branch of this thread where I'm
> > babbling on to Chris about this.
> >
> > "efficient way to get from VMA+offset to place on the disk where
> > content is stored"
> > You mean treat swapped pages like they were mmap'ed files and use the
> > same code paths? How big of a project is that? That seems either
> > deceptively easy or really hard... I've been away too long and was
> > never really good enough to have a clear vision of the scale.
>
> I don't understand why we need this level of complexity. All we need to
> know are the offsets during pageout. After that, the large folio is
> destroyed, and all offsets are stored in page table entries (PTEs) or xa.
> Swap-in doesn't depend on a complex file system; it can make its own
> decision on how to swap-in based on the values it reads from PTEs.

Well, but once compression chimes in (like with zswap) or if you need to
perform compaction on swap space and move swapped out data, things aren't
that simple anymore, are they? So as I was reading this thread I had the
impression that swap complexity is coming close to the complexity of a
(relatively simple) filesystem, so I was brainstorming about the possibility
of sharing some code between filesystems and swap...

								Honza
-- 
Jan Kara
SUSE Labs, CR
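
To make the point about moving swapped-out data concrete, here is a minimal
sketch in Python. It is a toy model, not kernel code: the class names, the
dict-based "PTEs" and the slot numbers are all assumptions made for
illustration. It contrasts storing the on-disk offset directly in each PTE
with storing a stable id that a separate table maps to the current location,
which is roughly the kind of indirection a filesystem keeps between a file
offset and a disk block.

```python
# Toy model only. Every name below is hypothetical and chosen for
# illustration; none of this mirrors actual kernel data structures.

class DirectSwap:
    """Each 'PTE' stores the swap slot number directly."""

    def __init__(self):
        self.slots = {}  # slot number -> page contents
        self.ptes = {}   # virtual page -> slot number

    def swap_out(self, vpage, slot, data):
        self.slots[slot] = data
        self.ptes[vpage] = slot

    def swap_in(self, vpage):
        return self.slots[self.ptes[vpage]]

    def move(self, old_slot, new_slot):
        """Relocate data; returns how many PTEs had to be rewritten."""
        self.slots[new_slot] = self.slots.pop(old_slot)
        rewritten = 0
        # Every mapping that refers to the old slot must be found and fixed.
        for vpage, slot in list(self.ptes.items()):
            if slot == old_slot:
                self.ptes[vpage] = new_slot
                rewritten += 1
        return rewritten


class IndirectSwap:
    """Each 'PTE' stores a stable id; a table maps the id to a slot."""

    def __init__(self):
        self.slots = {}     # slot number -> page contents
        self.location = {}  # swap id -> current slot number
        self.ptes = {}      # virtual page -> swap id

    def swap_out(self, vpage, swap_id, slot, data):
        self.slots[slot] = data
        self.location[swap_id] = slot
        self.ptes[vpage] = swap_id

    def swap_in(self, vpage):
        return self.slots[self.location[self.ptes[vpage]]]

    def move(self, swap_id, new_slot):
        """Relocate data; returns how many PTEs had to be rewritten."""
        old_slot = self.location[swap_id]
        self.slots[new_slot] = self.slots.pop(old_slot)
        self.location[swap_id] = new_slot  # one table update
        return 0                           # PTEs are left untouched


# Two virtual pages share one swapped-out page (as after a fork).
direct = DirectSwap()
direct.swap_out("A", 7, b"data")
direct.swap_out("B", 7, b"data")
direct_rewrites = direct.move(7, 3)

indirect = IndirectSwap()
indirect.swap_out("A", "id0", 7, b"data")
indirect.swap_out("B", "id0", 7, b"data")
indirect_rewrites = indirect.move("id0", 3)
```

In the direct model the cost of relocating data grows with the number of
mappings that reference it, while in the indirect model it is a single table
update at the price of an extra lookup on every swap-in. Which trade-off is
right for swap is exactly what the thread is debating; the sketch only shows
why relocation makes the offset-in-PTE scheme less simple.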