From: Nhat Pham
Date: Fri, 1 Mar 2024 16:53:43 +0700
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Chris Li
Cc: lsf-pc@lists.linux-foundation.org, linux-mm, ryan.roberts@arm.com, David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chuanhua Han

On Fri, Mar 1, 2024 at 4:24 PM Chris Li wrote:
>
> In last year's LSF/MM I talked about a VFS-like swap system. That is
> the pony that was chosen.
> However, I did not have much chance to go into details.

I'd love to attend this talk/chat :)

>
> This year, I would like to discuss what it takes to re-architect the
> whole swap back end from scratch?
>
> Let’s start from the requirements for the swap back end.
>
> 1) support the existing swap usage (not the implementation).
>
> Some other design goals::
>
> 2) low per swap entry memory usage.
>
> 3) low io latency.
>
> What are the functions the swap system needs to support?
>
> At the device level. Swap systems need to support a list of swap files
> with a priority order.
> The same priority of swap device will do round
> robin writing on the swap device. The swap device type includes zswap,
> zram, SSD, spinning hard disk, swap file in a file system.
>
> At the swap entry level, here is the list of existing swap entry usage:
>
> * Swap entry allocation and free. Each swap entry needs to be
> associated with a location of the disk space in the swapfile. (offset
> of swap entry).
> * Each swap entry needs to track the map count of the entry. (swap_map)
> * Each swap entry needs to be able to find the associated memory
> cgroup. (swap_cgroup_ctrl->map)
> * Swap cache. Lookup folio/shadow from swap entry
> * Swap page writes through a swapfile in a file system other than a
> block device. (swap_extent)
> * Shadow entry. (store in swap cache)

IMHO, one thing this new abstraction should support is seamless
transfer/migration of pages from one backend to another (perhaps from
high to low priority backends, i.e. writeback).

I think this will require some careful redesigns. The closest thing we
have right now is zswap -> backing swapfile. But it is currently
handled in a rather peculiar manner - the underlying swap slot has
already been reserved for the zswap entry. But there are a couple of
problems with this:

a) This is wasteful. We're essentially having the same piece of data
occupying space in two levels of the hierarchy.
b) How do we generalize to a multi-tier hierarchy?
c) This is a bit too backend-specific. It'd be nice if we can make
this as backend-agnostic as possible.

Motivation: I'm currently working on/thinking about decoupling zswap
and swap, and this is one of the more challenging aspects (as I can't
seem to find a precedent in the swap world for migrating pages between
swap backends), especially with respect to concurrent loads (and
swapcache interactions).

I don't have good answers/designs quite yet - just raising some
questions/concerns :)
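
To make the indirection idea a bit more concrete, here is a toy
userspace model in Python (not kernel code, and not a proposed
design). It assumes a stable, backend-agnostic swap handle that maps
to a (tier, slot) pair, and it allocates a slot in the lower tier only
at writeback time, so the data never occupies two tiers at once. All
names here (Backend, SwapTable, demote) are hypothetical, and locking,
concurrent loads and swapcache interactions are deliberately ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical swap backend (e.g. zswap or a swapfile) with a fixed capacity."""
    name: str
    capacity: int
    slots: dict = field(default_factory=dict)  # slot number -> page data
    next_slot: int = 0

    def store(self, data: bytes) -> int:
        if len(self.slots) >= self.capacity:
            raise MemoryError(f"{self.name} is full")
        slot = self.next_slot
        self.next_slot += 1
        self.slots[slot] = data
        return slot

    def load(self, slot: int) -> bytes:
        return self.slots[slot]

    def free(self, slot: int) -> None:
        del self.slots[slot]


class SwapTable:
    """Maps a stable swap handle to its current (tier index, slot)."""

    def __init__(self, tiers):
        self.tiers = tiers      # ordered from high to low priority
        self.where = {}         # handle -> (tier index, slot)
        self.next_handle = 0

    def swap_out(self, data: bytes) -> int:
        handle = self.next_handle
        self.next_handle += 1
        # Only the top tier gets a slot; nothing is reserved below it.
        self.where[handle] = (0, self.tiers[0].store(data))
        return handle

    def swap_in(self, handle: int) -> bytes:
        tier, slot = self.where[handle]
        return self.tiers[tier].load(slot)

    def demote(self, handle: int) -> int:
        """Write one entry back to the next lower tier and return that tier's index."""
        tier, slot = self.where[handle]
        if tier + 1 >= len(self.tiers):
            raise ValueError("already in the lowest tier")
        data = self.tiers[tier].load(slot)
        new_slot = self.tiers[tier + 1].store(data)  # lower slot allocated only now
        self.where[handle] = (tier + 1, new_slot)    # the handle itself is unchanged
        self.tiers[tier].free(slot)                  # no duplicate copy is left behind
        return tier + 1


# Two tiers standing in for zswap and a backing swapfile.
zswap = Backend("zswap", capacity=2)
swapfile = Backend("swapfile", capacity=8)
table = SwapTable([zswap, swapfile])

handle = table.swap_out(b"page A")
table.demote(handle)
page = table.swap_in(handle)  # same handle, now served from the swapfile tier
```

Because the handle is decoupled from the slot, the same demote() step
works between any two adjacent tiers, which is roughly what (b) and (c)
are asking for. The hard part this sketch skips is making the
load/store/free sequence safe against a concurrent swap-in.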