From: David Hildenbrand <david@redhat.com>
To: Chris Li <chrisl@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <v-songbaohua@oppo.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Luis Chamberlain <mcgrof@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>,
David Rientjes <rientjes@google.com>,
Kairui Song <ryncsn@gmail.com>
Subject: Re: Swap Min Order
Date: Mon, 20 Jan 2025 13:02:24 +0100
Message-ID: <5b6b2d8f-b984-41ad-8020-3550ffb81ecb@redhat.com>
In-Reply-To: <CACePvbUkMYMencuKfpDqtG1Ej7LiUS87VRAXb8sBn1yANikEmQ@mail.gmail.com>
On 16.01.25 09:38, Chris Li wrote:
> On Wed, Jan 8, 2025 at 1:24 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.01.25 22:19, Chris Li wrote:
>>> On Wed, Jan 8, 2025 at 12:36 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>>> Maybe the swapcache could somehow abstract that? We currently have the swap
>>>>>> slot allocator, that assigns slots to pages.
>>>>>>
>>>>>> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various options
>>>>>> to explore.
>>>>>>
>>>>>> For example, we could size swap slots at 16 KiB and assign even a
>>>>>> 4 KiB page a single slot. This would waste swap space with small
>>>>>> folios, which would go away with large folios.
>>>>>
>>>>> So batching order-0 folios in bigger slots that match the FS BS (e.g. 16
>>>>> KiB) to perform disk writes, right?
>>>>
>>>> Batching might be one idea, but the first idea I raised here would be
>>>> that the swap slot size would match the BS (e.g., 16 KiB) and contain
>>>> at most one folio.
>>>>
>>>> So an order-0 folio would get a single slot assigned and effectively
>>>> "waste" 12 KiB of disk space.
>>>
>>> I prefer not to "waste" that. The wasted space would show up as
>>> write amplification as well.
>>
>> If it can be implemented fairly easily, sure! :)
>>
>> Looking forward to hearing about the proposal!
>
> Hi David,
Hi!
>
> Sorry, I have been pretty busy with other work-related stuff recently.
I'm in a similar situation :D
> I did not have a chance to do the write-up yet.
> I might not be able to make next Wednesday's upstream alignment
> meeting for this topic.
>
> Adding Kairui to the CC list; I have been collaborating with him on
> the swap-related changes.
Is this similar to
https://lkml.kernel.org/r/20250116092254.204549-1-nphamcs@gmail.com
?
>
> I do see it is beneficial to separate the swap cache side of swap
> entries (virtual) from the block-layer write locations (physical).
> The current swap allocator then allocates the virtual swap entries
> and keeps the property that swap entries are contiguous within a
> folio. The virtual swap entry also owns the swap count and swap cache
> reclaim.
Right.
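Roughly what I have in mind for that split, as a simplified sketch
(hypothetical structures, not something to merge as-is):

struct virt_swap_entry {
	atomic_t	swap_count;	/* refcounting, as today */
	u32		phys_slot;	/* index into the physical map */
	/* swap cache / reclaim state stays on the virtual side */
};

/*
 * The physical side is little more than an allocation bitmap over
 * BS-sized blocks: no swap count, no swap cache involvement.
 */
struct phys_swap_map {
	unsigned long	*bitmap;	/* 1 bit per physical block */
	unsigned long	nr_blocks;
};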
>
> A lookup array translates a virtual entry to its physical location.
> The physical locations also need an allocator, but a much simpler
> one: physical allocation does not participate in swap cache reclaim
> (that happens on the virtual side), and it carries no swap count,
> only 1 bit of state: used or not. Physical entries do not need to be
> contiguous within a folio either.
Agreed.
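So the translation on the swap-out/swap-in paths would boil down to
something like this (again, purely hypothetical helpers):

static u32 *virt_to_phys_table;	/* 4 bytes per virtual swap entry */

static inline u32 virt_entry_to_phys(unsigned long virt_slot)
{
	return READ_ONCE(virt_to_phys_table[virt_slot]);
}

static inline void virt_entry_bind(unsigned long virt_slot, u32 phys_slot)
{
	/* Physical blocks need not be contiguous within a folio. */
	WRITE_ONCE(virt_to_phys_table[virt_slot], phys_slot);
}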
>
> This redirection layer will provide the flexibility to do more, e.g.,
> bridge the block-size gap between virtual and physical entries. It
> can provide an IO batching layer that merges more than one virtual
> swap entry into a larger physical write block. Similarly, it can
> allow swap to write out compressed zswap/zram pages to the SSD, using
> the same IO batching.
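The write side of that batching could then look something like this
(pure pseudo-code, reusing the hypothetical helpers from above):

/* Pack several order-0 folios into one larger physical block. */
static void swap_writeout_batch(struct folio **folios, int nr)
{
	u32 phys = phys_alloc_block();	/* one BS-sized block */
	int i;

	for (i = 0; i < nr && i < SLOT_PAGES; i++)
		virt_entry_bind(folio_swap_slot(folios[i]),
				phys * SLOT_PAGES + i);

	submit_block_write(phys, folios, i);
}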
>
> The memory overhead is 4 bytes per swap entry for the lookup table,
> plus maybe 1 bit per physical entry to track whether that location is
> used.
>
> That is the key part of the idea.
Okay, rings a bell, I think that was raised in some form in the past.
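Back-of-the-envelope, that overhead sounds acceptable: a 64 GiB swap
device has 16M 4 KiB entries, so the lookup table would cost
16M * 4 bytes = 64 MiB (~0.1% of the device), plus at most
16M bits = 2 MiB for the physical bitmap (less if physical blocks are
larger than 4 KiB).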
>
> There are other ideas, like dynamically growing the vmalloc'ed array
> pages, that can be viewed as incremental local improvements; they do
> not change the core swap data structures much.
Interesting, thanks!
--
Cheers,
David / dhildenb