From: Chris Li <chrisl@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Daniel Gomez <da.gomez@samsung.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <v-songbaohua@oppo.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Luis Chamberlain <mcgrof@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>,
David Rientjes <rientjes@google.com>,
Kairui Song <ryncsn@gmail.com>
Subject: Re: Swap Min Order
Date: Thu, 16 Jan 2025 00:38:13 -0800 [thread overview]
Message-ID: <CACePvbUkMYMencuKfpDqtG1Ej7LiUS87VRAXb8sBn1yANikEmQ@mail.gmail.com> (raw)
In-Reply-To: <127a4c29-e34d-401c-a642-cc73d9d1c2f6@redhat.com>
On Wed, Jan 8, 2025 at 1:24 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.01.25 22:19, Chris Li wrote:
> > On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>> Maybe the swapcache could somehow abstract that? We currently have the swap
> >>>> slot allocator, that assigns slots to pages.
> >>>>
> >>>> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
> >>>> to explore.
> >>>>
> >>>> For example, we could size swap slots 16 KiB, and assign even 4 KiB pages a
> >>>> single slot. This would waste swap space with small folios, that would go
> >>>> away with large folios.
> >>>
> >>> So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
> >>> KiB) to perform disk writes, right?
> >>
> >> Batching might be one idea, but the first idea I raised here would be
> >> that the swap slot size will match the BS (e.g., 16 KiB) and contain at
> >> most one folio.
> >>
> >> So an order-0 folio would get a single slot assigned and effectively
> >> "waste" 12 KiB of disk space.
> >
> > I prefer not to "waste" that. The same waste would show up as write
> > amplification as well.
>
> If it can be implemented fairly easily, sure! :)
>
> Looking forward to hearing about the proposal!
Hi David,

Sorry, I have been pretty busy with other work-related things recently
and have not had a chance to do the write-up yet.

I might not be able to make next Wednesday's upstream alignment
meeting for this topic.

Adding Kairui to the CC list; I have been collaborating with him on
the swap-related changes.
I do see a benefit in separating the swap cache side of swap entries
(virtual) from the block layer write locations (physical).

The current swap allocator would still allocate the virtual swap
entries and keep the property that swap entries are contiguous within
a folio. The virtual swap entries would also own the swap count and
drive swap cache reclaim.
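
To make that concrete, here is a rough user-space sketch of the
per-virtual-entry state (all names and field sizes are mine, purely
for illustration):

struct virt_swap_entry {
	/* References from page tables; the virtual side owns this,
	 * the physical side never sees it. */
	unsigned char swap_count;
	/* e.g. SWAP_HAS_CACHE-style flags driving swap cache reclaim. */
	unsigned char flags;
};

/* Virtual entries for one folio stay contiguous: a folio of nr pages
 * occupies virtual entries [vstart, vstart + nr). */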
A lookup array translates each virtual entry to its physical
location. The physical locations need an allocator as well, but a
much simpler one: physical allocation does not participate in swap
cache reclaim (that happens on the virtual entries), and it carries
no swap count, only one bit of information, used or free. Physical
entries do not need to be contiguous within a folio either.
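
A minimal compilable sketch of the redirection table and the one-bit
physical allocator, with fixed example table sizes (again, names and
sizes are made up; a real version needs locking and kernel allocation
primitives):

#include <limits.h>
#include <stdint.h>

#define NR_VIRT_ENTRIES	(1u << 20)	/* example sizes only */
#define NR_PHYS_SLOTS	(1u << 18)
#define NO_PHYS		UINT32_MAX
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* 4 bytes per virtual swap entry: which physical slot backs it.
 * Entries start out as NO_PHYS (not yet written anywhere). */
static uint32_t vswap_to_phys[NR_VIRT_ENTRIES];

/* 1 bit per physical slot: used or free. That is all the state the
 * physical side keeps; swap count and reclaim live on the virtual side. */
static unsigned long phys_used[NR_PHYS_SLOTS / BITS_PER_WORD];

/* Any free slot will do: the physical slots backing one folio do not
 * need to be contiguous. */
static uint32_t phys_alloc(void)
{
	for (uint32_t i = 0; i < NR_PHYS_SLOTS; i++) {
		unsigned long bit = 1UL << (i % BITS_PER_WORD);

		if (!(phys_used[i / BITS_PER_WORD] & bit)) {
			phys_used[i / BITS_PER_WORD] |= bit;
			return i;
		}
	}
	return NO_PHYS;
}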
This redirection layer provides the flexibility to do more, e.g.
bridge the block size gap between virtual and physical entries. It
can act as an IO batching layer that merges more than one virtual
swap entry into a larger physical write block. Similarly, it can let
swap write compressed zswap/zram data out to the SSD using the same
IO batching.
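
And a hand-wavy sketch of the batching path, building on the previous
sketch and assuming a 16 KiB physical block with 4 KiB virtual entries
(4:1); submit_block_write() is a hypothetical stand-in for the real
block IO submission:

#define VIRT_PER_PHYS	4	/* 16 KiB physical block / 4 KiB entry */

/* Pack up to VIRT_PER_PHYS pending virtual entries into each physical
 * slot and issue one larger write per slot. The stored value encodes
 * slot * VIRT_PER_PHYS + sub-offset, i.e. a 4 KiB-granular location. */
static void swap_writeback_batch(const uint32_t *pending, int nr)
{
	while (nr > 0) {
		uint32_t slot = phys_alloc();
		int chunk = nr < VIRT_PER_PHYS ? nr : VIRT_PER_PHYS;

		if (slot == NO_PHYS)
			return;	/* out of space; error handling elided */
		for (int i = 0; i < chunk; i++)
			vswap_to_phys[pending[i]] = slot * VIRT_PER_PHYS + i;
		/* submit_block_write(slot, chunk): hypothetical, one
		 * 16 KiB IO covering up to four virtual entries. */
		pending += chunk;
		nr -= chunk;
	}
}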
The memory overhead is 4 bytes per swap entry for the lookup table,
plus maybe 1 bit per physical entry to track whether that location is
in use.
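
As a back-of-the-envelope example (my own numbers): with 4 KiB swap
entries, a 64 GiB swap device has 16M entries, so the lookup table
costs 16M * 4 bytes = 64 MiB, about 0.1% of the swap size, and the
used/free bitmap adds another 16M bits = 2 MiB.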
That is the key part of the idea.
There are other ideas, like dynamically growing the vmalloc array
pages, but those can be viewed as incremental local improvements; they
do not change the core swap data structures much.
Chris