Re: Swap Min Odrer - David Hildenbrand

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Chris Li <chrisl@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Barry Song <v-songbaohua@oppo.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Luis Chamberlain <mcgrof@kernel.org>,
	Pankaj Raghav <p.raghav@samsung.com>,
	David Rientjes <rientjes@google.com>,
	Kairui Song <ryncsn@gmail.com>
Subject: Re: Swap Min Odrer
Date: Mon, 20 Jan 2025 13:02:24 +0100	[thread overview]
Message-ID: <5b6b2d8f-b984-41ad-8020-3550ffb81ecb@redhat.com> (raw)
In-Reply-To: <CACePvbUkMYMencuKfpDqtG1Ej7LiUS87VRAXb8sBn1yANikEmQ@mail.gmail.com>

On 16.01.25 09:38, Chris Li wrote:
> On Wed, Jan 8, 2025 at 1:24 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.01.25 22:19, Chris Li wrote:
>>> On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>>> Maybe the swapcache could somehow abstract that? We currently have the swap
>>>>>> slot allocator, that assigns slots to pages.
>>>>>>
>>>>>> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
>>>>>> to explore.
>>>>>>
>>>>>> For example, we could size swap slots 16 KiB, and assign even 4 KiB pages a
>>>>>> single slot. This would waste swap space with small folios, that would go
>>>>>> away with large folios.
>>>>>
>>>>> So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
>>>>> KiB) to perform disk writes, right?
>>>>
>>>> Batching might be one idea, but the first idea I raised here would be
>>>> that the swap slot size will match the BS (e.g., 16 KiB) and contain at
>>>> most one folio.
>>>>
>>>> So a order-0 folio would get a single slot assigned and effectively
>>>> "waste" 12 KiB of disk space.
>>>
>>> I prefer not to "waste" that. It will be wasted on the write
>>> amplification as well.
>>
>> If it can be implemented fairly easily, sure! :)
>>
>> Looking forward to hearing about the proposal!
> 
> Hi David,

Hi!

> 
> Sorry I have been pretty busy with other work related stuff recently.

I'm in a similar situation :D

> I did not have a chance to do the write up yet.
> I might not be able to make the next Wednesday upstream alignment
> meeting for this topic.
> 
> Adding Kairui to the CC list, I have been collerating with him on the
> swap related changes.

Is this similar to

https://lkml.kernel.org/r/20250116092254.204549-1-nphamcs@gmail.com

?

> 
> I do see it is beneficial to separate out the swap cache part of the
> swap entries (virtual) and block layer write locations (physical).
> So the current swap allocator allocates the virtual swap entry and
> still keeps the property of swap entry contiguous within a folio. The
> virtual swap entry also owns the current swap count and swap cache
> reclaim.

Right.

> 
> Have a lookup array to translate the virtual entry to the physical
> location. The physical location also needs an allocator, but much
> simpler. The physical location allocation does not participate in swap
> cache reclaim, those happen in the virtual entry. Nor does it have the
> swap count, only 1 bit of information used or not. The physical entry
> allocation does not need to be contiguous within the folio either.

Agreed.

> 
> This redirection layer will provide the flexibility to do more. e.g.
> bridge the gap between the block size between virtual entry and
> physical entry. It can provide the IO batching layer to merge more
> than one virtual swap entry into a larger physical writing block.
> Similarly it can allow swap to write out compressed zswap/zram into
> the SSD, using similar IO batching.
> 
> The memory overhead is 4 byte per swap entry for the lookup table.
> Maybe 1 bit per physical entry for that location is used or not.
> 
> That is the key part of the idea.

Okay, rings a bell, I think that was raised in some form in the past.

> 
> There are other ideas like dynamic growing the vmalloc array pages can
> be viewed as incremental local improvement, it does not change the
> core data structure of swap much.

Interesting, thanks!

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2025-01-20 12:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20250107094349eucas1p1c973738624046458bbd8ca980cf6fe33@eucas1p1.samsung.com>
2025-01-07  9:43 ` Daniel Gomez
2025-01-07 10:31   ` David Hildenbrand
2025-01-07 12:29     ` Daniel Gomez
2025-01-07 16:41       ` David Hildenbrand
2025-01-08 14:14         ` Daniel Gomez
2025-01-08 20:36           ` David Hildenbrand
2025-01-08 21:19             ` Chris Li
2025-01-08 21:24               ` David Hildenbrand
2025-01-16  8:38                 ` Chris Li
2025-01-20 12:02                   ` David Hildenbrand [this message]
2025-01-09  3:38               ` David Rientjes
2025-01-08 21:09         ` Chris Li
2025-01-08 21:05       ` Chris Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5b6b2d8f-b984-41ad-8020-3550ffb81ecb@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=chrisl@kernel.org \
    --cc=da.gomez@samsung.com \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=rientjes@google.com \
    --cc=ryan.roberts@arm.com \
    --cc=ryncsn@gmail.com \
    --cc=v-songbaohua@oppo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox