From: Chris Li
Date: Fri, 1 Mar 2024 10:57:07 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Nhat Pham
Cc: lsf-pc@lists.linux-foundation.org, linux-mm, ryan.roberts@arm.com,
    David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chuanhua Han

On Fri, Mar 1, 2024 at 1:53 AM Nhat Pham wrote:
> > At the swap entry level, here is the list of existing swap entry usage:
> >
> > * Swap entry allocation and free. Each swap entry needs to be
> > associated with a location of the disk space in the swapfile. (offset
> > of swap entry).
> > * Each swap entry needs to track the map count of the entry. (swap_map)
> > * Each swap entry needs to be able to find the associated memory
> > cgroup. (swap_cgroup_ctrl->map)
> > * Swap cache. Lookup folio/shadow from swap entry
> > * Swap page writes through a swapfile in a file system other than a
> > block device. (swap_extent)
> > * Shadow entry. (store in swap cache)
>
> IMHO, one thing this new abstraction should support is seamless
> transfer/migration of pages from one backend to another (perhaps from
> high to low priority backends, i.e writeback).

Yes, that is the next step. I am just covering the existing usage here.
What you describe is what I call "the swap tiers". I considered that
topic but did not submit it this year. The current swap back end is too
entangled (for lack of a better word). It is very hard to add more
complex data structures to the existing swap back end. That is why I
want to untangle it a bit before attacking the next-level stuff.

>
> I think this will require some careful redesigns. The closest thing we
> have right now is zswap -> backing swapfile. But it is currently
> handled in a rather peculiar manner - the underlying swap slot has
> already been reserved for the zswap entry. But there's a couple of
> problems with this:
>
> a) This is wasteful. We're essentially having the same piece of data
> occupying spaces in two levels in the hierarchies.

Can you elaborate? If you have a ghost swap file, zswap will not store
data in two swap devices.
The price to pay is that you need to allocate another swap slot on the
real backing swap file. That is the same as when you move SSD data to a
hard disk: you need to allocate a new swap entry on the destination
device.

> b) How do we generalize to a multi-tier hierarchy?

If zswap runs on a ghost swap file, flushing from zswap to another real
swap file would be very similar to flushing from one SSD to another.
That is the more generalized case. Zswap sharing a swap slot with the
backing swapfile is a very special case.

> c) This is a bit too backend-specific. It'd be nice if we can make
> this as backend-agnostic as possible (if possible).

Totally agree, that is one of my motivations for the "swap.tiers" idea.

>
> Motivation: I'm currently working/thinking about decoupling zswap and
> swap, and this is one of the more challenging aspects (as I can't seem
> to find a precedent in the swap world for inter-swap backends pages
> migration), and especially with respect to concurrent loads (and
> swapcache interactions).

It will be very messy if you try that in the current swap back end.

Chris

>
> I don't have good answers/designs quite yet - just raising some
> questions/concerns :)
>
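
To illustrate the flush between backends discussed in this message
(zswap on a ghost swap file flushing to a real swap file, in the same
way one SSD would flush to another), here is a rough userspace sketch.
It is not kernel code and not a proposed design: SwapBackend,
alloc_slot, store, free_slot and flush are all hypothetical names
invented for this example. The only point it shows is that a flush
allocates a new slot on the destination and then releases the source
slot, so the data never occupies two backends once the move completes.

```python
from dataclasses import dataclass, field


@dataclass
class SwapBackend:
    """Hypothetical swap backend with a fixed number of slots.

    A "ghost" backend would be one with slots but no disk space behind
    them; this model does not distinguish the two cases.
    """
    name: str
    nr_slots: int
    slots: dict = field(default_factory=dict)  # offset -> page data

    def alloc_slot(self) -> int:
        """Reserve the lowest free offset, or raise if the backend is full."""
        for offset in range(self.nr_slots):
            if offset not in self.slots:
                self.slots[offset] = None
                return offset
        raise MemoryError(f"{self.name}: no free swap slot")

    def store(self, offset: int, data: bytes) -> None:
        """Store page data in a slot that was allocated earlier."""
        if offset not in self.slots:
            raise KeyError(f"{self.name}: slot {offset} is not allocated")
        self.slots[offset] = data

    def free_slot(self, offset: int) -> None:
        """Release a slot so it can be allocated again."""
        del self.slots[offset]


def flush(src: SwapBackend, src_offset: int, dst: SwapBackend) -> int:
    """Move one page from src to dst and return its new offset on dst."""
    # The price to pay: a new slot on the destination device. If this
    # raises, the source slot has not been touched yet.
    dst_offset = dst.alloc_slot()
    dst.store(dst_offset, src.slots[src_offset])
    src.free_slot(src_offset)
    return dst_offset


# Example: a zswap-like ghost backend flushing to a real swap file.
ghost = SwapBackend("zswap-ghost", nr_slots=4)
swapfile = SwapBackend("ssd-swapfile", nr_slots=4)
offset = ghost.alloc_slot()
ghost.store(offset, b"compressed page")
new_offset = flush(ghost, offset, swapfile)
```

The same flush() works for any pair of backends, which is the
generalized case; sharing one slot between zswap and its backing
swapfile would need a special path that this sketch does not model.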