From: Chris Li <chrisl@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Daniel Gomez <da.gomez@samsung.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <v-songbaohua@oppo.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Luis Chamberlain <mcgrof@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>,
David Rientjes <rientjes@google.com>
Subject: Re: Swap Min Order
Date: Wed, 8 Jan 2025 13:19:36 -0800
Message-ID: <CACePvbXRzK5pomkAYs6VXd8qvWXWDN8BydxDPkD7H9kWPn11Qg@mail.gmail.com>
In-Reply-To: <85e2b81d-9255-4c54-b4ae-de52b2c02e7f@redhat.com>
On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand <david@redhat.com> wrote:
>
> >> Maybe the swapcache could somehow abstract that? We currently have the swap
> >> slot allocator, that assigns slots to pages.
> >>
> >> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
> >> to explore.
> >>
> >> For example, we could size swap slots 16 KiB, and assign even 4 KiB pages a
> >> single slot. This would waste swap space with small folios, that would go
> >> away with large folios.
> >
> > So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
> > KiB) to perform disk writes, right?
>
> Batching might be one idea, but the first idea I raised here would be
> that the swap slot size will match the BS (e.g., 16 KiB) and contain at
> most one folio.
>
> So an order-0 folio would get a single slot assigned and effectively
> "waste" 12 KiB of disk space.
I would prefer not to "waste" that space. The padding would also cost
us in write amplification.
>
> An order-2 folio would get a single slot assigned and not waste any memory.
>
> An order-3 folio would get two slots assigned etc. (similar to how it is
> done today for non-order-0 folios)
>
> So the penalty for using small folios would be more wasted disk space on
> such devices.
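To make the slot accounting quoted above concrete, here is a minimal sketch, assuming a 16 KiB block size and 4 KiB base pages as in the example. The helper name slots_and_waste() is hypothetical and this is not kernel code; it only reproduces the arithmetic.

```python
BLOCK_SIZE = 16 * 1024  # assumed device block size (BS)
PAGE_SIZE = 4 * 1024    # assumed base page size


def slots_and_waste(order):
    """Return (slots, wasted_bytes) for one folio of the given order.

    Hypothetical helper: each slot is BLOCK_SIZE bytes and holds at most
    one folio, so a folio takes ceil(folio_bytes / BLOCK_SIZE) slots.
    """
    folio_bytes = PAGE_SIZE << order
    slots = -(-folio_bytes // BLOCK_SIZE)  # ceiling division
    return slots, slots * BLOCK_SIZE - folio_bytes


for order in (0, 2, 3):
    slots, waste = slots_and_waste(order)
    print(f"order-{order}: {slots} slot(s), {waste // 1024} KiB wasted")
```

Under these assumptions an order-0 folio pads out 12 KiB of its slot, while order-2 and order-3 folios fill their slots exactly.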
>
> > Can we also assign different orders
> > to the same slot?
>
> I guess yes.
>
> > And can we batch folios while keeping alignment to the
> > BS (IU)?
>
> I assume with "batching" you would mean that we could actually have
> multiple folios inside a single BS, like up to 4 order-0 folios in a
> single 16 KiB block? That might be one way of doing it, although I
> suspect this can get a bit complicated.
That would be my preference. BTW, another use case: if we want to
write compressed swap entries to the SSD (to reduce wear on the SSD),
we end up in a similar situation, where we want to combine multiple
swap entries into a single write unit.
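As a rough illustration of the batching idea, the sketch below groups order-0 folios into block-sized write units. It assumes four 4 KiB pages per 16 KiB block; batch_into_blocks() and the folio names are hypothetical, and real code would also have to decide what to do with a partially filled last unit.

```python
PAGES_PER_BLOCK = 4  # assumed: 16 KiB block / 4 KiB page


def batch_into_blocks(folio_ids, per_block=PAGES_PER_BLOCK):
    """Group order-0 folios, in order, into block-sized write units.

    Hypothetical helper: the last unit may be only partially filled.
    """
    return [folio_ids[i:i + per_block]
            for i in range(0, len(folio_ids), per_block)]


units = batch_into_blocks(["f0", "f1", "f2", "f3", "f4", "f5"])
for index, unit in enumerate(units):
    print(f"write unit {index}: {unit}")
```

With six folios this yields one full unit and one unit holding two folios, which would either wait for more entries or be padded.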
>
> IIUC, we can perform 4 KiB read/write, but we must only have a single
> write per block, because otherwise we might get the RMW problems,
> correct? Then, maybe a mechanism to guarantee that only a single swap
> writeback within a BS can happen at one point in time might also be an
> alternative.
Yes, I do see that batching and grouping the writes of swap entries is
both necessary and useful.
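The "only a single swap writeback within a BS at one point in time" alternative could be sketched as a simple gate keyed by block number. This assumes four 4 KiB slots per block; BlockWriteGate is a hypothetical name, and the sketch ignores locking, so it only shows the bookkeeping.

```python
SLOTS_PER_BLOCK = 4  # assumed: 16 KiB block / 4 KiB swap slot


class BlockWriteGate:
    """Hypothetical gate allowing one in-flight write per device block."""

    def __init__(self):
        self._in_flight = set()  # block numbers with a write in progress

    def try_start(self, slot):
        """Return True if a write to this slot may start now."""
        block = slot // SLOTS_PER_BLOCK
        if block in self._in_flight:
            return False  # another slot in the same block is being written
        self._in_flight.add(block)
        return True

    def finish(self, slot):
        """Mark the write to this slot's block as completed."""
        self._in_flight.discard(slot // SLOTS_PER_BLOCK)
```

A caller that gets False from try_start() would have to queue or retry the write once finish() has been called for that block.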
>
> >
> >>
> >> If we stick to 4 KiB swap slots, maybe pageout() could be taught to
> >> effectively writeback "everything" residing in the relevant swap slots that
> >> span a BS?
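For the 4 KiB slot variant quoted above, writing back "everything" in a block first requires finding the slots that share that block. A minimal sketch, assuming four slots per 16 KiB block and block-aligned slot numbering; slots_sharing_block() is a hypothetical helper, not an existing kernel function.

```python
SLOTS_PER_BLOCK = 4  # assumed: 16 KiB block / 4 KiB swap slot


def slots_sharing_block(slot):
    """Return the range of slot numbers in the same block as `slot`."""
    first = slot - slot % SLOTS_PER_BLOCK  # round down to block start
    return range(first, first + SLOTS_PER_BLOCK)


print(list(slots_sharing_block(6)))
```

Writing back slot 6 would then mean writing slots 4 through 7 together as one block.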
> >>
> >> I recall there was a discussion about atomic writes involving multiple
> >> pages, and how it is hard. Maybe with swapping it is "easier"? Absolutely no
> >> expert on that, unfortunately. Hoping Chris has some ideas.
> >
> > Not sure about the discussion but I guess the main concern for atomic
> > and swapping is the alignment and the questions I raised above.
>
> Yes, I think that's similar.
Agreed, the two are very similar. They can share a single solution:
the "virtual swapfile". That is my proposal.
>
> >
> >>
> >>
> >>>
> >>>>
> >>>> I recall that we have been talking about a better swap abstraction for years
> >>>> :)
> >>>
> >>> Adding Chris Li to the cc list in case he has more input.
> >>>
> >>>>
> >>>> Might be a good topic for LSF/MM (might or might not be a better place than
> >>>> the MM alignment session).
> >>>
> >>> Both options work for me. LSF/MM is in 12 weeks so, having a previous
> >>> session would be great.
> >>
> >> Both work for me.
> >
> > Can we start by scheduling this topic for the next available MM session?
> > Would be great to get initial feedback/thoughts/concerns, etc while we
> > keep this thread going on.
>
> Yeah, it would probably be great to present the problem and the exact
> constraints we have (e.g., things stupid me asks above regarding actual
> sizes in which we can perform reads and writes), so we can discuss
> possible solutions.
>
> @David R., is the slot in two weeks already taken?
Hopefully I can send out the "virtual swapfile" proposal in time and
we can discuss that as one of the possible approaches.
Chris
>
> --
> Cheers,
>
> David / dhildenb
>