From: Chris Li
Date: Mon, 4 Mar 2024 23:44:04 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Chengming Zhou
Cc: Matthew Wilcox, Nhat Pham, lsf-pc@lists.linux-foundation.org, linux-mm, ryan.roberts@arm.com, David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chuanhua Han

On Mon, Mar 4, 2024 at 7:24 PM Chengming Zhou wrote:
>
> On 2024/3/5 06:58, Matthew Wilcox wrote:
> > On Fri, Mar 01, 2024 at 04:53:43PM +0700, Nhat Pham wrote:
> >> IMHO, one thing this new abstraction should support is seamless
> >> transfer/migration of pages from one backend to another (perhaps from
> >> high to low priority backends, i.e writeback).
> >>
> >> I think this will require some careful redesigns. The closest thing we
> >> have right now is zswap -> backing swapfile. But it is currently
> >> handled in a rather peculiar manner - the underlying swap slot has
> >> already been reserved for the zswap entry. But there's a couple of
> >> problems with this:
> >>
> >> a) This is wasteful. We're essentially having the same piece of data
> >> occupying spaces in two levels in the hierarchies.
> >> b) How do we generalize to a multi-tier hierarchy?
> >> c) This is a bit too backend-specific. It'd be nice if we can make
> >> this as backend-agnostic as possible (if possible).
> >>
> >> Motivation: I'm currently working/thinking about decoupling zswap and
> >> swap, and this is one of the more challenging aspects (as I can't seem
> >> to find a precedent in the swap world for inter-swap backends pages
> >> migration), and especially with respect to concurrent loads (and
> >> swapcache interactions).
> >
> > Have you considered (and already rejected?) the opposite approach --
> > coupling zswap and swap more tightly? That is, we always write out
> > the original pages today. Why don't we write out the compressed pages
> > instead? For the same amount of I/O, we'd free up more memory! That
> > sounds like a win to me.

I have considered that as well; it goes further than writing from one
swap device to another. A swap device currently can't accept a write at
a non-page-aligned offset.
If we allow byte-aligned write-out sizes, the whole swap entry offset
scheme needs some heavy changes.

If we write out 4K pages and each one compresses to more than 50% of
its original size, two compressed pages can't fit into one page. That
means some compressed pages will straddle a page boundary, so reading
one back has to touch a second page. We would more or less need a small
file system to keep track of where the compressed data is stored,
because it is no longer page-aligned or page-sized.

We could write out zsmalloc blocks as they are, but there is no
guarantee that the data in a zsmalloc block shares the same LRU order.

It makes more sense when writing out swap pages of order > 0, e.g.
writing 64K pages in one buffer. Then we can write the compressed data
out page-aligned and in whole pages, accepting the waste in the last
compressed page, which might not be filled completely.

>
> Right, I also thought about this direction for some time.
> Apart from fewer IO, there are more advantages we can see:
>
> 1. Don't need to allocate a page when write out compressed data.
> This method actually has its own problem[1], by allocating a new page and
> put on LRU list, wait for writeback and reclaim.
> If we write out compressed data directly, so don't need to allocated page,
> these problems can be avoided.

Does it go through the swap cache at all? If not, there will be some
interesting synchronization issues when another task races to swap in
the page and modifies it.

>
> 2. Don't need to decompress when write out compressed data.

Yes.

>
> [1] https://lore.kernel.org/all/20240209115950.3885183-1-chengming.zhou@linux.dev/
>
> >
> > I'm sure it'd be a big redesign, but that seems to be what we're talking
> > about anyway.
> >
>
> Yes, we need to do modifications in some parts:
>
> 1. zsmalloc: compressed objects can be migrated anytime, we need to support pinning.

Or use a bounce buffer to read it out.

>
> 2. swapout: need to support non-folio write out.

Yes.
Non-page-aligned write-out will change the swap backend design
dramatically.

>
> 3. zswap: zswap need to handle synchronization between compressed write out and swapin,
> since they share the same swap entry.

Exactly. Same for ZRAM as well.

Chris
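
The page-alignment arithmetic discussed above (two compressed 4K pages
not fitting into one page, and the tail waste when a compressed 64K
buffer is written out in whole pages) can be sketched in plain
user-space C. This is only an illustration, not kernel code: the 4K
page size, the macro SWAP_PAGE_SIZE and the helpers pages_needed(),
tail_waste() and straddles_page() are all hypothetical names made up
for this example.

```c
#include <stddef.h>

/* Assumed page size for the example; hypothetical name, not a kernel macro. */
#define SWAP_PAGE_SIZE 4096u

/* Pages needed to hold `bytes` when the write-out must use whole pages. */
static size_t pages_needed(size_t bytes)
{
	return (bytes + SWAP_PAGE_SIZE - 1) / SWAP_PAGE_SIZE;
}

/* Bytes left unused in the last page of a whole-page write-out. */
static size_t tail_waste(size_t bytes)
{
	return pages_needed(bytes) * SWAP_PAGE_SIZE - bytes;
}

/*
 * Returns 1 if an object of `len` bytes stored at byte offset `off` in a
 * densely packed (byte-aligned) stream crosses a page boundary, else 0.
 */
static int straddles_page(size_t off, size_t len)
{
	if (len == 0)
		return 0;
	return off / SWAP_PAGE_SIZE != (off + len - 1) / SWAP_PAGE_SIZE;
}
```

With made-up sizes: two 4K pages that each compress to 2500 bytes and
are packed back to back give straddles_page(2500, 2500) == 1, i.e. the
second one crosses into the next page. A 64K buffer that compresses to
21000 bytes needs pages_needed(21000) == 6 pages and wastes
tail_waste(21000) == 3576 bytes in the last one.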