From: Chris Li <chrisl@kernel.org>
Date: Wed, 8 Jan 2025 13:05:40 -0800
Subject: Re: Swap Min Order
To: Daniel Gomez
Cc: David Hildenbrand, Ryan Roberts, Barry Song, Andrew Morton,
 linux-mm@kvack.org, Luis Chamberlain, Pankaj Raghav
References: <20250107094347.l37isnk3w2nmpx2i@AALNPWDAGOMEZ1.aal.scsc.local>
 <20250107122931.qpkn43yvs4kq3twi@AALNPWDAGOMEZ1.aal.scsc.local>
In-Reply-To: <20250107122931.qpkn43yvs4kq3twi@AALNPWDAGOMEZ1.aal.scsc.local>
On Tue, Jan 7, 2025 at 4:29 AM Daniel Gomez wrote:
>
> On Tue, Jan 07, 2025 at 11:31:05AM +0100, David Hildenbrand wrote:
> > On 07.01.25 10:43, Daniel Gomez wrote:
> > > Hi,
> >
> > Hi,
> >
> > > High-capacity SSDs require writes to be aligned with the drive's
> > > indirection unit (IU), which is typically >4 KiB, to avoid RMW.
> > > To support swap on these devices, we need to ensure that writes
> > > do not cross IU boundaries.
> > > So, I think this may require increasing the minimum
> > > allocation size for swap users.
> >
> > How would we handle swapout/swapin when we have smaller pages (just
> > imagine someone does a mmap(4KiB))?
>
> Swapout would need to be aligned to the IU. An mmap of 4 KiB would
> have to perform an IU-sized write, e.g. 16 KiB or 32 KiB, to avoid any
> potential RMW penalty. So, I think aligning the mmap allocation to the
> IU would guarantee a write of the required granularity and alignment.
> But let's also look at your suggestion below with swapcache.

I think only the writer needs to be grouped by IU size. Ideally the
swap front end doesn't have to know about the IU size.

There are many reasons why forcing the swap entry size in the swap
cache would be tricky. E.g. if the folio is 4K, it is tricky to force
it to be 16K: only one 4K page is cold while a nearby page is hot,
etc.

> Swapin can still be performed at LBA-format granularity (e.g. 4 KiB)
> without the same write penalty implications; I/Os that do not conform
> to these boundaries only affect performance. So, reading at IU
> boundaries is preferred for optimal performance, but it is not a
> 'requirement'.
>
> > Could this be something that gets abstracted/handled by the swap
> > implementation? (i.e., multiple small folios get added to the
> > swapcache but get written out / read in as a single unit?).

Yes.

> Do you mean merging like in the block layer? I'm not entirely sure
> whether this could deterministically guarantee the I/O boundaries the
> same way min-order large folio allocation does in the page cache. But
> I guess it is worth exploring as an optimization.
>
> > I recall that we have been talking about a better swap abstraction
> > for years :)
>
> Adding Chris Li to the cc list in case he has more input.

Sorry I'm a bit late to the party. Yes, I do have some ideas I want to
propose as LSF/MM topics, maybe early next week. Here are some
highlights.
I think we need a separation of the swap cache from the backing I/O of
the swap file. I call it the "virtual swapfile". It is virtual in two
aspects:

1) There is an up-front size at swapon, but no up-front allocation of
   the vmalloc array. The array grows as needed.

2) There is a virtual-to-physical swap entry mapping. The cost is 4
   bytes per swap entry, but it will solve a lot of problems
   altogether.

IU-size write grouping would be a good user of this virtual layer.
Another use case: if we want to write a compressed zswap/zram entry to
the SSD, we might actually encounter the size problem in the other
direction, e.g. writing swap entries smaller than 4K.

I am still working on the write-up. More details will come.

Chris

> > Might be a good topic for LSF/MM (might or might not be a better
> > place than the MM alignment session).
>
> Both options work for me. LSF/MM is in 12 weeks so having a previous
> session would be great.
>
> Daniel
>
> > --
> > Cheers,
> >
> > David / dhildenb
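[Editor's illustration] The virtual-to-physical mapping plus IU-size
write grouping discussed in the thread can be sketched in user space.
This is only a toy model of the idea, not the proposed kernel
implementation: the class name, the 16 KiB IU, and the "flush when one
IU's worth of 4 KiB entries accumulates" policy are all assumptions made
for illustration.

```python
PAGE_SIZE = 4096            # size of one swap entry's data (4 KiB)
IU_SIZE = 16384             # assumed drive indirection unit (16 KiB)
SLOTS_PER_IU = IU_SIZE // PAGE_SIZE


class VirtualSwapfile:
    """Toy model: virtual swap slots map to physical slots, and the
    mapping array grows on demand instead of being allocated up front
    (in the kernel this would cost ~4 bytes per entry)."""

    def __init__(self):
        self.v2p = []        # virtual slot -> physical slot (or None)
        self.pending = []    # virtual slots waiting to fill one IU
        self.next_phys = 0   # next free physical slot, handed out in IU groups

    def swapout(self, virt_slot):
        """Queue one 4 KiB entry for writeout. Returns None while the
        current IU is still filling, or (byte_offset, length) of the
        IU-aligned write once SLOTS_PER_IU entries have accumulated."""
        while len(self.v2p) <= virt_slot:
            self.v2p.append(None)            # grow the mapping as needed
        self.pending.append(virt_slot)
        if len(self.pending) == SLOTS_PER_IU:
            return self._flush()
        return None

    def _flush(self):
        """Assign contiguous physical slots to the pending entries so
        the resulting write covers exactly one IU and never crosses an
        IU boundary."""
        for vslot in self.pending:
            self.v2p[vslot] = self.next_phys
            self.next_phys += 1
        offset = (self.next_phys - SLOTS_PER_IU) * PAGE_SIZE
        self.pending = []
        return (offset, IU_SIZE)
```

The point of the sketch is that the swap front end hands over arbitrary
(even non-contiguous) 4 KiB entries, and only the writer groups them:
four entries for virtual slots 7, 2, 100, 5 would come back as a single
16 KiB write at an IU-aligned offset, with `v2p` recording where each
entry actually landed.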