From: Chris Li <chrisl@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Daniel Gomez <da.gomez@samsung.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <v-songbaohua@oppo.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Luis Chamberlain <mcgrof@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>
Subject: Re: Swap Min Order
Date: Wed, 8 Jan 2025 13:09:12 -0800 [thread overview]
Message-ID: <CACePvbUJHcPgus-02fcuYtBoTeWub-ibKue1mcF2EdiK6UeFHg@mail.gmail.com> (raw)
In-Reply-To: <470be5fa-97d6-4045-a855-5332d3a46443@redhat.com>
On Tue, Jan 7, 2025 at 8:41 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.01.25 13:29, Daniel Gomez wrote:
> > On Tue, Jan 07, 2025 at 11:31:05AM +0100, David Hildenbrand wrote:
> >> On 07.01.25 10:43, Daniel Gomez wrote:
> >>> Hi,
> >>
> >> Hi,
> >>
> >>>
> >>> High-capacity SSDs require writes to be aligned with the drive's
> >>> indirection unit (IU), which is typically >4 KiB, to avoid RMW. To
> >>> support swap on these devices, we need to ensure that writes do not
> >>> cross IU boundaries. So, I think this may require increasing the minimum
> >>> allocation size for swap users.
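As a rough sketch of the alignment constraint described above (hypothetical 16 KiB IU, not tied to any real device):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IU size for illustration only. */
#define IU_SIZE (16 * 1024)

/* A write avoids a read-modify-write inside the drive only if both its
 * offset and its length are multiples of the indirection unit. */
static bool write_avoids_rmw(size_t offset, size_t len)
{
    return (offset % IU_SIZE == 0) && (len % IU_SIZE == 0);
}
```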
> >>
> >> How would we handle swapout/swapin when we have smaller pages (just imagine
> >> someone does a mmap(4KiB))?
> >
> > Swapout would need to be aligned to the IU. An mmap of 4 KiB would
> > still have to perform a full-IU write, e.g. 16 KiB or 32 KiB, to avoid
> > any potential RMW penalty. So, I think aligning the mmap allocation to
> > the IU would guarantee a write of the required granularity and alignment.
>
> We must be prepared to handle any VMA layout with single-page VMAs,
> single-page holes etc ... :/ IMHO we should try to handle this
> transparently to the application.
>
> > But let's also look at your suggestion below with swapcache.
> >
> > Swapin can still be performed at LBA-format granularity (e.g. 4 KiB)
> > without the same write penalty implications, with performance affected
> > only when I/Os do not conform to these boundaries. So, reading at IU
> > boundaries is preferred for optimal performance, but is not a
> > 'requirement'.
> >
> >>
> >> Could this be something that gets abstracted/handled by the swap
> >> implementation? (i.e., multiple small folios get added to the swapcache but
> >> get written out / read in as a single unit?).
> >
> > Do you mean merging like in the block layer? I'm not entirely sure if
> > this could deterministically guarantee the I/O boundaries the same way
> > min-order large folio allocations do in the page cache. But I guess
> > it's worth exploring as an optimization.
>
> Maybe the swapcache could somehow abstract that? We currently have the
> swap slot allocator, that assigns slots to pages.
>
> Assuming we have a 16 KiB BS but a 4 KiB page, we might have various
> options to explore.
>
> For example, we could size swap slots at 16 KiB, and assign even a
> 4 KiB page a single slot. This would waste swap space with small
> folios; the waste would go away with large folios.
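A quick sketch of the cost of that fixed-slot option, with hypothetical sizes (16 KiB slots, 4 KiB pages):

```c
#include <stddef.h>

/* Hypothetical sizes for illustration. */
#define SLOT_SIZE (16 * 1024)
#define PAGE_SZ   (4 * 1024)

/* Swap space left unused when a folio of `nr_pages` 4 KiB pages
 * occupies exactly one fixed 16 KiB slot. */
static size_t slot_waste(unsigned int nr_pages)
{
    return SLOT_SIZE - (size_t)nr_pages * PAGE_SZ;
}
```

An order-0 folio wastes three quarters of its slot; an order-2 (16 KiB) folio wastes nothing.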
We can group multiple 4K swap entries into one 16K write unit, so no
SSD space is wasted.
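The grouping amounts to a simple index split; a sketch with hypothetical names (not existing kernel API), assuming four 4K entries per 16K write unit:

```c
/* Four consecutive 4K swap entries share one 16K write unit.
 * Names are illustrative, not kernel API. */
#define ENTRIES_PER_UNIT 4

/* Which 16K write unit a 4K swap entry lands in. */
static unsigned long entry_to_unit(unsigned long swp_offset)
{
    return swp_offset / ENTRIES_PER_UNIT;
}

/* Position of the entry inside that unit. */
static unsigned int entry_subslot(unsigned long swp_offset)
{
    return swp_offset % ENTRIES_PER_UNIT;
}
```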
>
> If we stick to 4 KiB swap slots, maybe pageout() could be taught to
> effectively writeback "everything" residing in the relevant swap slots
> that span a BS?
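As a sketch of what that gathering could look like, assuming a hypothetical 16 KiB BS over 4 KiB slots (four slots per BS):

```c
/* Given one dirty 4 KiB swap slot, compute the [start, end) range of
 * slots whose contents would have to be written together so the I/O
 * covers one whole block-size unit. Illustrative only. */
#define SLOTS_PER_BS 4

static void bs_span(unsigned long slot, unsigned long *start,
                    unsigned long *end)
{
    *start = slot - (slot % SLOTS_PER_BS);
    *end = *start + SLOTS_PER_BS;
}
```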
>
> I recall there was a discussion about atomic writes involving multiple
> pages, and how hard it is. Maybe with swapping it is "easier"? Absolutely
> no expert on that, unfortunately. Hoping Chris has some ideas.
Yes, see my other email about the "virtual swapfile" idea. More
detailed write up coming next week.
Chris
>
>
> >
> >>
> >> I recall that we have been talking about a better swap abstraction for years
> >> :)
> >
> > Adding Chris Li to the cc list in case he has more input.
> >
> >>
> >> Might be a good topic for LSF/MM (might or might not be a better place than
> >> the MM alignment session).
> >
> > Both options work for me. LSF/MM is in 12 weeks, so having an
> > earlier session would be great.
>
> Both work for me.
>
> --
> Cheers,
>
> David / dhildenb
>