From: Chris Li
Date: Thu, 16 Jan 2025 00:38:13 -0800
Subject: Re: Swap Min Order
To: David Hildenbrand
Cc: Daniel Gomez, Ryan Roberts, Barry Song, Andrew Morton, linux-mm@kvack.org, Luis Chamberlain, Pankaj Raghav, David Rientjes, Kairui Song

On Wed, Jan 8, 2025 at 1:24 PM David Hildenbrand wrote:
>
> On 08.01.25 22:19, Chris Li wrote:
> > On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand wrote:
> >>
> >>>> Maybe the swapcache could somehow abstract that? We currently have the swap
> >>>> slot allocator, that assigns slots to pages.
> >>>>
> >>>> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
> >>>> to explore.
> >>>>
> >>>> For example, we could size swap slots 16 KiB, and assign even 4 KiB pages a
> >>>> single slot. This would waste swap space with small folios, that would go
> >>>> away with large folios.
> >>>
> >>> So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
> >>> KiB) to perform disk writes, right?
> >>
> >> Batching might be one idea, but the first idea I raised here would be
> >> that the swap slot size will match the BS (e.g., 16 KiB) and contain at
> >> most one folio.
> >>
> >> So a order-0 folio would get a single slot assigned and effectively
> >> "waste" 12 KiB of disk space.
> >
> > I prefer not to "waste" that. It will be wasted on the write
> > amplification as well.
>
> If it can be implemented fairly easily, sure! :)
>
> Looking forward to hearing about the proposal!

Hi David,

Sorry, I have been pretty busy with other work-related stuff recently and have not had a chance to do the write-up yet. I might not be able to make next Wednesday's upstream alignment meeting for this topic.

Adding Kairui to the CC list; I have been collaborating with him on the swap-related changes.

I do see a benefit in separating the swap cache part of swap entries (virtual) from the block layer write locations (physical). In this scheme, the current swap allocator allocates the virtual swap entry and still keeps the property that swap entries are contiguous within a folio. The virtual swap entry also owns the current swap count and swap cache reclaim. A lookup array translates the virtual entry to the physical location.

The physical location also needs an allocator, but a much simpler one. Physical location allocation does not participate in swap cache reclaim; that happens on the virtual entry. Nor does it carry the swap count, only 1 bit of information: used or not. The physical entry allocation does not need to be contiguous within the folio either.

This redirection layer will provide the flexibility to do more, e.g. bridge the block size gap between the virtual entry and the physical entry. It can provide an IO batching layer that merges more than one virtual swap entry into a larger physical write block. Similarly, it can allow swap to write compressed zswap/zram data out to the SSD, using similar IO batching.

The memory overhead is 4 bytes per swap entry for the lookup table, plus maybe 1 bit per physical entry to track whether that location is used.

That is the key part of the idea. There are other ideas, like dynamically growing the vmalloc array pages, which can be viewed as incremental local improvements; they do not change the core data structure of swap much.

Chris
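A toy Python model of the virtual/physical split described above may make the two data structures concrete. It is a sketch under stated assumptions, not kernel code: the names PhysAllocator, VirtualSwap, writeback(), put() and overhead_bytes(), the PHYS_NONE sentinel, the bump allocator for virtual entries and the first-fit physical policy are all hypothetical, and swap cache reclaim and IO batching are not modeled.

```python
from array import array

# Hypothetical sentinel meaning "no physical location assigned yet".
PHYS_NONE = 0xFFFFFFFF


class PhysAllocator:
    """Physical slot allocator: 1 bit of state per slot (used or free)."""

    def __init__(self, nr_slots: int) -> None:
        self.nr_slots = nr_slots
        self.bitmap = bytearray((nr_slots + 7) // 8)

    def is_used(self, slot: int) -> bool:
        byte, bit = divmod(slot, 8)
        return bool(self.bitmap[byte] & (1 << bit))

    def alloc(self) -> int:
        # First-fit scan (hypothetical policy). Slots backing one folio
        # need not be contiguous.
        for slot in range(self.nr_slots):
            if not self.is_used(slot):
                byte, bit = divmod(slot, 8)
                self.bitmap[byte] |= 1 << bit
                return slot
        return PHYS_NONE

    def free(self, slot: int) -> None:
        byte, bit = divmod(slot, 8)
        self.bitmap[byte] &= ~(1 << bit) & 0xFF


class VirtualSwap:
    """Virtual swap entries: own the swap count, map to physical slots."""

    def __init__(self, nr_entries: int, phys: PhysAllocator) -> None:
        self.phys = phys
        # Lookup array: one 4-byte physical slot index per virtual entry
        # ('I' is 4 bytes on common CPython platforms).
        self.lookup = array("I", [PHYS_NONE]) * nr_entries
        self.count = [0] * nr_entries  # swap count lives on the virtual side
        self.next_free = 0             # trivial bump allocator (hypothetical)

    def alloc_folio(self, nr_pages: int) -> int:
        # Virtual entries stay contiguous within a folio, as today.
        base = self.next_free
        self.next_free += nr_pages
        for entry in range(base, base + nr_pages):
            self.count[entry] = 1
        return base

    def writeback(self, entry: int) -> int:
        # The physical location is chosen only when the entry is written out.
        slot = self.phys.alloc()
        self.lookup[entry] = slot
        return slot

    def put(self, entry: int) -> None:
        # Dropping the last reference releases the physical slot.
        self.count[entry] -= 1
        if self.count[entry] == 0 and self.lookup[entry] != PHYS_NONE:
            self.phys.free(self.lookup[entry])
            self.lookup[entry] = PHYS_NONE


def overhead_bytes(nr_entries: int, nr_phys_slots: int) -> int:
    # 4 bytes per virtual entry plus 1 bit per physical slot.
    return nr_entries * 4 + (nr_phys_slots + 7) // 8
```

For example, if physical slots 0 and 2 are already in use, writing back a four-page folio at virtual entries 0-3 binds them to physical slots 1, 3, 4 and 5: contiguous on the virtual side, scattered on the physical side. For scale, a hypothetical 64 GiB swap device with 4 KiB entries has 2**24 entries, so overhead_bytes(2**24, 2**24) comes to 64 MiB for the lookup array plus 2 MiB for the bitmap, assuming physical slots are also 4 KiB.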