From: Chris Li <chrisl@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: chenridong <chenridong@huawei.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Chen Ridong <chenridong@huaweicloud.com>,
	akpm@linux-foundation.org, mhocko@suse.com,  hannes@cmpxchg.org,
	yosryahmed@google.com, yuzhao@google.com,  david@redhat.com,
	ryan.roberts@arm.com, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, wangweiyang2@huawei.com,
	xieym_ict@hotmail.com,  Kairui Song <ryncsn@gmail.com>
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the tail of LRU after shrinking
Date: Tue, 26 Nov 2024 16:17:03 -0800
Message-ID: <CACePvbWpbGa9w3MNsATYHMcTSkzOu6OWw6tdiGS_=PdXYXzH1w@mail.gmail.com>
In-Reply-To: <CAGsJ_4w5Tna1c0xO7w4=c+SRw1jQgHCCzNELkBURbCiAgxZ-cg@mail.gmail.com>

On Mon, Nov 18, 2024 at 1:56 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Mon, Nov 18, 2024 at 10:41 PM chenridong <chenridong@huawei.com> wrote:
> >
> >
> >
> > On 2024/11/18 12:14, Barry Song wrote:
> > > On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >>
> > >> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> > > >>> 2. In the shrink_page_list function, if folioN is a THP (2M), it may be split
> > > >>>    and added to the swap cache folio by folio. After a folio is added to the
> > > >>>    swap cache, I/O is submitted to write it back to swap, which is asynchronous.
> > > >>>    When shrink_page_list finishes, the isolated folios list is
> > > >>>    moved back to the head of the inactive LRU. The inactive LRU may then look
> > > >>>    like this, with 512 folios having been moved to the head of the inactive LRU.
> > >>
> > >> I was hoping that we'd be able to stop splitting the folio when adding
> > > >> to the swap cache.  Ideally, we'd add the whole 2MB and write it back
> > >> as a single unit.
> > >
> > > This is already the case: adding to the swapcache doesn’t require splitting
> > > THPs, but failing to allocate 2MB of contiguous swap slots will.
> > >
> > >>
> > >> This is going to become much more important with memdescs.  We'd have to
> > >> allocate 512 struct folios to do this, which would be about 10 4kB pages,
> > >> and if we're trying to swap out memory, we're probably low on memory.
> > >>
> > >> So I don't like this solution you have at all because it doesn't help us
> > >> get to the solution we're going to need in about a year's time.
> > >>
> > >
> > > Ridong might need to clarify why this splitting is occurring. If it’s due to the
> > > failure to allocate swap slots, we still need a solution to address it.
> > >
> > > Thanks
> > > Barry
> >
> > shrink_folio_list
> >   add_to_swap
> >     folio_alloc_swap
> >       get_swap_pages
> >         scan_swap_map_slots
> >         /*
> >         * Swapfile is not block device or not using clusters so unable
> >         * to allocate large entries.
> >         */
> >         if (!(si->flags & SWP_BLKDEV) || !si->cluster_info)
> >           return 0;
> >
> > In my test, I use a file as swap, which is not 'SWP_BLKDEV', so
> > get_swap_pages failed.
>
> Alright, a proper non-rotating swap block device would be much
> better. In your case, though, cluster allocation isn’t supported.

Ah yes. The later part of the swap allocator series removes the
non-cluster allocation code path. It is not merged into mm-unstable
yet. Once it is, even a swap file that is not a block device will
get the cluster allocator.
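
To make the check quoted above concrete, here is a minimal user-space
sketch of that gate, not the kernel code itself. The struct, the flag
value SWP_BLKDEV_MODEL, and the helper name can_alloc_large_entry are
all hypothetical stand-ins chosen for illustration; only the shape of
the condition is taken from the quoted snippet.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's SWP_BLKDEV flag. The bit
 * position here is an assumption, not the value in the kernel headers.
 */
#define SWP_BLKDEV_MODEL (1u << 0)

/* Hypothetical, simplified model of a swap device descriptor. */
struct swap_info_model {
	unsigned int flags;        /* device property bits */
	const void *cluster_info;  /* NULL when clusters are not in use */
};

/*
 * Mirrors the quoted condition: a large (multi-slot) entry can only be
 * allocated when the swap area is a block device AND uses clusters.
 * Returns false where the quoted code does "return 0".
 */
static bool can_alloc_large_entry(const struct swap_info_model *si)
{
	if (!(si->flags & SWP_BLKDEV_MODEL) || !si->cluster_info)
		return false;
	return true;
}
```

With this model, a file-backed swap area (flag clear) fails the check
even when cluster_info is set, which matches the fallback to splitting
the THP that Ridong observed.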

>
> >
> > I think this is a race between 'shrink_folio_list' executing and
> > asynchronous writeback. In my test, 512 folios (from a THP split) were
> > added to swap, and only about 60 folios had not been written back when
> > 'move_folios_to_lru' was invoked after 'shrink_folio_list'. What if
> > writeback is faster? This might happen even when only 32 folios (without
> > THP) are in the 'folio_list' passed to shrink_folio_list.
>
> On a real non-rotating swap device, the race condition would occur only when
> contiguous 2MB swap slots are unavailable.
>
> Hi Chris,
> I recall you mentioned unifying the code for swap devices and swap files, or
> for non-rotating and rotating devices. I assume a swap file (not a block device)
> would also be a practical use case?

I assume you mean non-SSD vs. SSD devices. In this follow-up series to
the swap allocator from Kairui, the old non-cluster allocator is
removed and the cluster allocator is used all the time.

https://lore.kernel.org/linux-mm/20241022192451.38138-4-ryncsn@gmail.com/

Chris

