From: Jared Hulbert
Date: Tue, 28 May 2024 00:08:12 -0700
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Chris Li
Cc: Karim Manaouil, Jan Kara, Chuanhua Han, linux-mm, lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com, 21cnbao@gmail.com, david@redhat.com
References: <039190fb-81da-c9b3-3f33-70069cdb27b0@oppo.com> <20240307140344.4wlumk6zxustylh6@quack3>

On Tue, May 21, 2024 at 1:43 PM Chris Li wrote:
>
> Swap and file systems have very different requirements and usage
> patterns and IO patterns.

I would counter that the design requirements for a simple filesystem
and what you are proposing to do to support heterogeneously sized
block allocation on a block device are very similar, not very
different. Data is owned by clients, but I've done the profiling on
servers and Android. As I've stated before, databases have reasonably
close usage and IO patterns. Swap usage of block devices is not a
particularly odd usage profile.

> One challenging aspect is that the current swap back end has a very
> low per swap entry memory overhead. It is about 1 byte (swap_map), 2
> byte (swap cgroup), 8 byte(swap cache pointer). The inode struct is
> more than 64 bytes per file. That is a big jump if you map a swap
> entry to a file. If you map more than one swap entry to a file, then
> you need to track the mapping of file offset to swap entry, and the
> reverse lookup of swap entry to a file with offset. Whichever way you
> cut it, it will significantly increase the per swap entry memory
> overhead.

No it won't, because the suggestion is NOT to add some array of inode
structs in place of the structures you've been talking about altering.

IIUC your proposals per the "Swap Abstraction LSF_MM 2024.pdf" are to
more than double the per entry overhead from 11 B to 24 B. Is that
correct? Of course if modernizing the structures to be properly folio
aware requires a few bytes, that seems prudent. Also IIUC 8 bytes of
the 24 are a per swap entry pointer to a dynamically allocated
structure that will be used to manage heterogeneous block size
allocation on block devices. I object to this. That's what the
filesystem abstraction is for. EXT4 too heavy for you? Then make a
simpler filesystem.

So how do you map swap entries to a filesystem without a new mapping
layer? Here is a simple proposal. (It assumes there are only 16 valid
folio orders. There are ways to get around that limit but it would
take longer to explain, so let's just go with it.)

* swap_types (fs inodes) map to different page sizes (page, compound
  order, folio order, mTHP size etc).
  ex. swap_type == 1 -> 4K pages, swap_type == 15 -> 1G hugepages etc
* swap_type = fs inode
* swap_offset = fs file offset
* swap_offset is selected using the same simple allocation scheme as
  today.
  - because the swap entries are all the same size/order per
    swap_type/inode you can just pick the first free slot.
* on freeing a swap entry call fallocate(FALLOC_FL_PUNCH_HOLE)
  - removes the blocks from the file without changing its "size".
  - no changes are required to the swap_offsets to garbage collect
    blocks.

This allows you the following:
* dynamic allocation of block space between sizes/orders
* avoids any new tracking structures in memory for all swap entries
* places burden of tracking on filesystem
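
To make the scheme above concrete, here is a rough userspace sketch. It
is not kernel code and not a proposed implementation. The struct
swap_file and the swap_file_init/swap_alloc/swap_write/swap_free helpers
are hypothetical names made up for illustration; one struct stands in
for one swap_type (one inode) whose slots all share a single size. The
only real interfaces used are ftruncate(2), pwrite(2) and fallocate(2).
Note that fallocate(2) requires FALLOC_FL_PUNCH_HOLE to be ORed with
FALLOC_FL_KEEP_SIZE, which is what keeps the file "size" unchanged.

```c
#define _GNU_SOURCE           /* for fallocate() and the FALLOC_FL_* flags */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for one swap_type: a file of equally sized slots. */
struct swap_file {
    int fd;               /* backing file (plays the role of the fs inode) */
    size_t slot_size;     /* bytes per swap entry, e.g. 4096 << order */
    size_t nr_slots;      /* number of swap_offsets in this file */
    unsigned char *used;  /* one byte per slot, in the spirit of swap_map */
};

/* Size the file once; its apparent size never changes afterwards. */
int swap_file_init(struct swap_file *sf, int fd, size_t slot_size,
                   size_t nr_slots)
{
    sf->fd = fd;
    sf->slot_size = slot_size;
    sf->nr_slots = nr_slots;
    sf->used = calloc(nr_slots, 1);
    if (!sf->used)
        return -1;
    return ftruncate(fd, (off_t)(slot_size * nr_slots));
}

/* Every slot has the same size, so the first free one always fits. */
long swap_alloc(struct swap_file *sf)
{
    for (size_t i = 0; i < sf->nr_slots; i++) {
        if (!sf->used[i]) {
            sf->used[i] = 1;
            return (long)i;
        }
    }
    return -1; /* this swap_type is full */
}

/* Store one entry's data at file offset swap_offset * slot_size. */
int swap_write(struct swap_file *sf, long slot, const void *buf)
{
    off_t off = (off_t)slot * (off_t)sf->slot_size;
    ssize_t n = pwrite(sf->fd, buf, sf->slot_size, off);

    return n == (ssize_t)sf->slot_size ? 0 : -1;
}

/*
 * Free a slot and hand its blocks back to the filesystem.
 * Returns fallocate()'s result; the slot is marked reusable either way,
 * and no other swap_offset has to move.
 */
int swap_free(struct swap_file *sf, long slot)
{
    off_t off;

    if (slot < 0 || (size_t)slot >= sf->nr_slots)
        return -1;
    off = (off_t)slot * (off_t)sf->slot_size;
    sf->used[slot] = 0;
    return fallocate(sf->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     off, (off_t)sf->slot_size);
}
```

In this sketch the only per-entry state held in memory is the one-byte
used[] array; which blocks actually back each slot is left entirely to
the filesystem. Whether hole punching is supported depends on the
filesystem underneath, so swap_free() can fail there while the slot
bookkeeping still works.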