From: Chris Li
Date: Wed, 8 Jan 2025 13:09:12 -0800
Subject: Re: Swap Min Order
To: David Hildenbrand
Cc: Daniel Gomez, Ryan Roberts, Barry Song, Andrew Morton, linux-mm@kvack.org, Luis Chamberlain, Pankaj Raghav

On Tue, Jan 7, 2025 at 8:41 AM David Hildenbrand wrote:
>
> On 07.01.25 13:29, Daniel Gomez wrote:
> > On Tue, Jan 07, 2025 at 11:31:05AM +0100, David Hildenbrand wrote:
> >> On 07.01.25 10:43, Daniel Gomez wrote:
> >>> Hi,
> >>
> >> Hi,
> >>
> >>>
> >>> High-capacity SSDs require writes to be aligned with the drive's
> >>> indirection unit (IU), which is typically >4 KiB, to avoid
> >>> read-modify-write (RMW).
To > >>> support swap on these devices, we need to ensure that writes do not > >>> cross IU boundaries. So, I think this may require increasing the mini= mum > >>> allocation size for swap users. > >> > >> How would we handle swapout/swapin when we have smaller pages (just im= agine > >> someone does a mmap(4KiB))? > > > > Swapout would require to be aligned to the IU. An mmap of 4 KiB would > > have to perform an IU KiB write, e.g. 16 KiB or 32 KiB, to avoid any > > potential RMW penalty. So, I think aligning the mmap allocation to the > > IU would guarantee a write of the required granularity and alignment. > > We must be prepared to handle and VMA layout with single-page VMAs, > single-page holes etc ... :/ IMHO we should try to handle this > transparently to the application. > > > But let's also look at your suggestion below with swapcache. > > > > Swapin can still be performed at LBA format levels (e.g. 4 KiB) without > > the same write penalty implications, and only affecting performance > > if I/Os are not conformant to these boundaries. So, reading at IU > > boundaries is preferred to get optimal performance, not a 'requirement'= . > > > >> > >> Could this be something that gets abstracted/handled by the swap > >> implementation? (i.e., multiple small folios get added to the swapcach= e but > >> get written out / read in as a single unit?). > > > > Do you mean merging like in the block layer? I'm not entirely sure if > > this could guarantee deterministically the I/O boundaries the same way > > it does min order large folio allocations in the page cache. But I gues= s > > is worth exploring as optimization. > > Maybe the swapcache could somehow abstract that? We currently have the > swap slot allocator, that assigns slots to pages. > > Assuming we have a 16 KiB BS but a 4 KiB page, we might have various > options to explore. > > For example, we could size swap slots 16 KiB, and assign even 4 KiB > pages a single slot. 
> This would waste swap space with small folios; that waste
> would go away with large folios.

We can group multiple 4K swap entries into one 16K write unit. That way
no space on the SSD is wasted.

>
> If we stick to 4 KiB swap slots, maybe pageout() could be taught to
> effectively write back "everything" residing in the relevant swap slots
> that span a BS?
>
> I recall there was a discussion about atomic writes involving multiple
> pages, and how it is hard. Maybe with swapping it is "easier"? I'm
> absolutely no expert on that, unfortunately. Hoping Chris has some ideas.

Yes, see my other email about the "virtual swapfile" idea. A more detailed
write-up is coming next week.

Chris

>
> >
> >>
> >> I recall that we have been talking about a better swap abstraction for years
> >> :)
> >
> > Adding Chris Li to the cc list in case he has more input.
> >
> >>
> >> Might be a good topic for LSF/MM (might or might not be a better place than
> >> the MM alignment session).
> >
> > Both options work for me. LSF/MM is in 12 weeks, so having an earlier
> > session would be great.
>
> Both work for me.
>
> --
> Cheers,
>
> David / dhildenb
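For illustration only, a minimal C sketch of the arithmetic behind grouping 4 KiB swap entries into one 16 KiB write unit, as discussed in this thread. It assumes 4 KiB swap slots and a 16 KiB IU; PAGE_SZ, IU_SZ, struct write_unit and all helper functions are hypothetical names, not existing kernel swap APIs, and the sketch ignores slot allocation, freeing and flushing of partially filled units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 4 KiB swap slots, 16 KiB indirection unit (IU). */
#define PAGE_SZ      4096u
#define IU_SZ        16384u
#define SLOTS_PER_IU (IU_SZ / PAGE_SZ) /* 4 slots per write unit */

/* True if the byte range [off, off + len) lies inside a single IU. */
static bool within_one_iu(uint64_t off, uint64_t len)
{
	if (len == 0)
		return true;
	return off / IU_SZ == (off + len - 1) / IU_SZ;
}

/* First 4 KiB slot of the IU-aligned group that contains @slot. */
static uint64_t iu_first_slot(uint64_t slot)
{
	return slot - slot % SLOTS_PER_IU;
}

/* Byte offset of the aligned, IU-sized write that covers @slot. */
static uint64_t iu_write_offset(uint64_t slot)
{
	return iu_first_slot(slot) * PAGE_SZ;
}

/*
 * Hypothetical write unit: collects SLOTS_PER_IU swap entries so they reach
 * the device as one aligned IU_SZ write instead of several 4 KiB writes.
 */
struct write_unit {
	uint64_t first_slot;  /* first slot of the IU-aligned group */
	unsigned int filled;  /* bit i set => slot first_slot + i is populated */
};

static void wu_init(struct write_unit *wu, uint64_t slot)
{
	wu->first_slot = iu_first_slot(slot);
	wu->filled = 0;
}

/* Mark @slot populated; returns true once every slot in the unit is filled. */
static bool wu_add(struct write_unit *wu, uint64_t slot)
{
	/* @slot must belong to this unit's IU-aligned group. */
	assert(slot - wu->first_slot < SLOTS_PER_IU);
	wu->filled |= 1u << (slot - wu->first_slot);
	return wu->filled == (1u << SLOTS_PER_IU) - 1;
}
```

For example, slot 5 belongs to the group starting at slot 4, so under these assumptions its data would go out as part of the 16 KiB range at byte offset 16384, and the unit would only be written once slots 4 to 7 are all populated. An 8 KiB write starting at byte 12288, by contrast, crosses an IU boundary.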