References: <20250107094347.l37isnk3w2nmpx2i@AALNPWDAGOMEZ1.aal.scsc.local>
 <20250107122931.qpkn43yvs4kq3twi@AALNPWDAGOMEZ1.aal.scsc.local>
 <470be5fa-97d6-4045-a855-5332d3a46443@redhat.com>
 <20250108141406.3gen6dnlb3b4zga6@AALNPWDAGOMEZ1.aal.scsc.local>
 <85e2b81d-9255-4c54-b4ae-de52b2c02e7f@redhat.com>
In-Reply-To: <85e2b81d-9255-4c54-b4ae-de52b2c02e7f@redhat.com>
From: Chris Li <chrisl@kernel.org>
Date: Wed, 8 Jan 2025 13:19:36 -0800
Subject: Re: Swap Min Odrer
To: David Hildenbrand
Cc: Daniel Gomez, Ryan Roberts, Barry Song, Andrew Morton,
 linux-mm@kvack.org, Luis Chamberlain, Pankaj Raghav, David Rientjes
On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand wrote:
>
> >> Maybe the swapcache could somehow abstract that? We currently have the swap
> >> slot allocator, that assigns slots to pages.
> >>
> >> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
> >> to explore.
> >>
> >> For example, we could size swap slots 16 KiB, and assign even 4 KiB pages a
> >> single slot.
> >> This would waste swap space with small folios, that would go
> >> away with large folios.
> >
> > So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
> > KiB) to perform disk writes, right?
>
> Batching might be one idea, but the first idea I raised here would be
> that the swap slot size will match the BS (e.g., 16 KiB) and contain at
> most one folio.
>
> So a order-0 folio would get a single slot assigned and effectively
> "waste" 12 KiB of disk space.

I prefer not to "waste" that. It will be wasted on the write
amplification as well.

>
> An order-2 folio would get a single slot assigned and not waste any memory.
>
> An order-3 folio would get two slots assigned etc. (similar to how it is
> done today for non-order-0 folios)
>
> So the penalty for using small folios would be more wasted disk space on
> such devices.
>
> > Can we also assign different orders
> > to the same slot?
>
> I guess yes.
>
> > And can we batch folios while keeping alignment to the
> > BS (IU)?
>
> I assume with "batching" you would mean that we could actually have
> multiple folios inside a single BS, like up to 4 order-0 folios in a
> single 16 KiB block? That might be one way of doing it, although I
> suspect this can get a bit complicated.

That would be my preference. BTW, another usage case is that if we
want to write compressed swap entries into the SSD (to reduce the wear
on SSD), we will also end up with a similar situation where we want to
combine multiple swap entries into a write unit.

> IIUC, we can perform 4 KiB read/write, but we must only have a single
> write per block, because otherwise we might get the RMW problems,
> correct? Then, maybe a mechanism to guarantee that only a single swap
> writeback within a BS can happen at one point in time might also be an
> alternative.

Yes, I do see that batching and grouping write of the swap entries is
necessary and useful.
> >
> >> If we stick to 4 KiB swap slots, maybe pageout() could be taught to
> >> effectively writeback "everything" residing in the relevant swap slots that
> >> span a BS?
> >>
> >> I recall there was a discussion about atomic writes involving multiple
> >> pages, and how it is hard. Maybe with swaping it is "easier"? Absolutely no
> >> expert on that, unfortunately. Hoping Chris has some ideas.
> >
> > Not sure about the discussion but I guess the main concern for atomic
> > and swaping is the alignment and the questions I raised above.
>
> Yes, I think that's similar.

Agree, it is very much similar. It can share a single solution, the
"virtual swapfile". That is my proposal.

> >>>
> >>>> I recall that we have been talking about a better swap abstraction for years
> >>>> :)
> >>>
> >>> Adding Chris Li to the cc list in case he has more input.
> >>>
> >>>> Might be a good topic for LSF/MM (might or might not be a better place than
> >>>> the MM alignment session).
> >>>
> >>> Both options work for me. LSF/MM is in 12 weeks so, having a previous
> >>> session would be great.
> >>
> >> Both work for me.
> >
> > Can we start by scheduling this topic for the next available MM session?
> > Would be great to get initial feedback/thoughts/concerns, etc while we
> > keep this thread going on.
>
> Yeah, it would probably great to present the problem and the exact
> constraints we have (e.g., things stupid me asks above regarding actual
> sizes in which we can perform reads and writes), so we can discuss
> possible solutions.
>
> @David R., is the slot in two weeks already taken?

Hopefully I can send out the "virtual swapfile" proposal in time and we
can discuss that as one of the possible approaches.

Chris

> --
> Cheers,
>
> David / dhildenb