From: Yosry Ahmed <yosryahmed@google.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>,
akpm@linux-foundation.org, hannes@cmpxchg.org, hughd@google.com,
shakeel.butt@linux.dev, ryan.roberts@arm.com,
ying.huang@intel.com, chrisl@kernel.org, david@redhat.com,
kasong@tencent.com, willy@infradead.org,
viro@zeniv.linux.org.uk, baohua@kernel.org,
chengming.zhou@linux.dev, linux-mm@kvack.org,
kernel-team@meta.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] remove SWAP_MAP_SHMEM
Date: Tue, 24 Sep 2024 08:07:33 -0700 [thread overview]
Message-ID: <CAJD7tkZFu3DbovTwyRdQmEG=7nQtmzrjQVgyhE4mNzbCtZxFZA@mail.gmail.com> (raw)
In-Reply-To: <CAKEwX=PiOdrR7Ad5XoT8pRZDLB=q6B_fmwQ3ScgWFPNptBuHPw@mail.gmail.com>
On Tue, Sep 24, 2024 at 7:32 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Mon, Sep 23, 2024 at 8:25 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> > On 2024/9/24 10:15, Yosry Ahmed wrote:
> > > On Mon, Sep 23, 2024 at 6:55 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2024/9/24 07:11, Nhat Pham wrote:
> > >>> The SWAP_MAP_SHMEM state was originally introduced in the commit
> > >>> aaa468653b4a ("swap_info: note SWAP_MAP_SHMEM"), to quickly determine if a
> > >>> swap entry belongs to shmem during swapoff.
> > >>>
> > >>> However, swapoff has since been rewritten drastically in the commit
> > >>> b56a2d8af914 ("mm: rid swapoff of quadratic complexity"). Now
> > >>> having swap count == SWAP_MAP_SHMEM value is basically the same as having
> > >>> swap count == 1, and swap_shmem_alloc() behaves analogously to
> > >>> swap_duplicate()
> > >>>
> > >>> This RFC proposes the removal of this state and the associated helper to
> > >>> simplify the state machine (both mentally and code-wise). We will also
> > >>> have an extra state/special value that can be repurposed (for swap entries
> > >>> that never gets re-duplicated).
> > >>>
> > >>> Another motivation (albeit a bit premature at the moment) is the new swap
> > >>> abstraction I am currently working on, that would allow for swap/zswap
> > >>> decoupling, swapoff optimization, etc. The fewer states and swap API
> > >>> functions there are, the simpler the conversion will be.
> > >>>
> > >>> I am sending this series first as an RFC, just in case I missed something
> > >>> or misunderstood this state, or if someone has a swap optimization in mind
> > >>> for shmem that would require this special state.
> > >>
> > >> The idea makes sense to me. I did a quick test with shmem mTHP, and
> > >> encountered the following warning which is triggered by
> > >> 'VM_WARN_ON(usage == 1 && nr > 1)' in __swap_duplicate().
> > >
> > > Apparently __swap_duplicate() does not currently handle increasing the
> > > swap count for multiple swap entries by 1 (i.e. usage == 1) because it
> > > does not handle rolling back count increases when
> > > swap_count_continued() fails.
> > >
> > > I guess this voids my Reviewed-by until we sort this out. Technically
> > > swap_count_continued() won't ever be called for shmem because we only
> > > ever increment the count by 1, but there is no way to know this in
> > > __swap_duplicate() without SWAP_HAS_SHMEM.
>
> Ah this is my bad. I compiled with CONFIG_THP_SWAP, but forgot to
> remove the swapfile check (that's another can of worms, but I need
> data before submitting the patch to remove it...)
>
> One thing we can do is instead of warning here, we can handle it in
> the for loop check, where we have access to count - that's the point
> of having that for-loop check anyway? :)
>
> There's a couple of ways to go about it:
>
> 1. VM_WARN_ON(usage == 1 && nr > 1 && count != 0 );
Hmm that should work, although it's a bit complicated tbh.
> (or more accurately, (count & ~COUNT_CONTINUED) >= SWAP_MAP_MAX))
I think this will make the warning very hard to hit if there's a
misuse of __swap_duplicate(). It will only be hit when an entry needs
count continuation, which I am not sure is very common. If there's a
bug, the warning will potentially catch it too late, if ever.
The side effect here is failing to decrement the swap count of some
swap entries which will lead to them never being freed, essentially
leaking swap capacity slowly over time. I am not sure if there are
more detrimental effects.
>
> 2. Alternatively, instead of warning here, we can simply return
> -ENOMEM. Then, at shmem callsite, have a VM_WARN_ON/VM_BUG_ON(), since
> this MUST succeed.
We still fail to rollback incremented counts though when we return
-ENOMEM, right? Maybe I didn't get what you mean.
>
> Either solutions should follow with careful documentation to make it
> clear the expectation/guarantee of the new API.
>
> Yosry, Baolin, how do you two feel about this? Would something like
> this work? I need to test it first, but let me know if I'm missing
> something.
>
> If this does not work, we can do what Baolin is suggesting, and
> perhaps maintain the swap_shmem_alloc() helper. It's less than ideal,
> but at least we still lose a state...
Depending on the complexity tbh, right now removing SWAP_MAP_SHMEM is
just a cleanup with small wins, so if it's too complicated to remove
it it may not be worth it. I am assuming with your ongoing work, it
becomes much more valuable, so maybe if it's too complicated we can
defer it until the benefits are realizable?
next prev parent reply other threads:[~2024-09-24 15:08 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-23 23:11 Nhat Pham
2024-09-23 23:11 ` [RFC PATCH 1/2] swapfile: add a batched variant for swap_duplicate() Nhat Pham
2024-09-23 23:11 ` [RFC PATCH 2/2] swap: shmem: remove SWAP_MAP_SHMEM Nhat Pham
2024-09-24 0:32 ` Yosry Ahmed
2024-09-24 0:20 ` [RFC PATCH 0/2] " Yosry Ahmed
2024-09-24 1:55 ` Baolin Wang
2024-09-24 2:15 ` Yosry Ahmed
2024-09-24 3:25 ` Baolin Wang
2024-09-24 14:32 ` Nhat Pham
2024-09-24 15:07 ` Yosry Ahmed [this message]
2024-09-24 15:48 ` Nhat Pham
2024-09-24 18:11 ` Yosry Ahmed
2024-09-25 6:26 ` Barry Song
2024-09-25 7:24 ` Huang, Ying
2024-09-25 7:38 ` Barry Song
2024-09-25 1:53 ` Baolin Wang
2024-09-25 14:37 ` Nhat Pham
2024-09-26 1:59 ` Huang, Ying
2024-09-26 3:30 ` Baolin Wang
2024-09-26 3:59 ` Barry Song
2024-09-26 22:50 ` Nhat Pham
2024-09-26 4:00 ` Barry Song
2024-09-25 7:19 ` Huang, Ying
2024-09-25 7:32 ` Barry Song
2024-09-25 14:21 ` Nhat Pham
2024-09-25 14:24 ` Nhat Pham
2024-09-25 14:28 ` Nhat Pham
2024-09-24 20:15 ` Chris Li
2024-09-24 21:30 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJD7tkZFu3DbovTwyRdQmEG=7nQtmzrjQVgyhE4mNzbCtZxFZA@mail.gmail.com' \
--to=yosryahmed@google.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=chengming.zhou@linux.dev \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=kasong@tencent.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=ryan.roberts@arm.com \
--cc=shakeel.butt@linux.dev \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox