From: Baoquan He <bhe@redhat.com>
To: Chris Li <chrisl@kernel.org>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
kasong@tencent.com, baohua@kernel.org, nphamcs@gmail.com,
shikemeng@huaweicloud.com
Subject: Re: [PATCH 2/3] mm/swapfile.c: use swap_info[] to find the swap device
Date: Fri, 3 Oct 2025 10:38:47 +0800 [thread overview]
Message-ID: <aN83NwGpMCFvrOEy@MiWiFi-R3L-srv> (raw)
In-Reply-To: <CACePvbXeFxbtH3c0Wvn56m+wsK0wqxP86ge5Ud73OvqnfDOwMw@mail.gmail.com>
On 10/02/25 at 08:59am, Chris Li wrote:
> On Tue, Sep 30, 2025 at 9:35 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > Now, swap_active_head is only used to find a present swap device when
> > trying to swapoff it. In fact, swap_info[] is a short array which is
> > 32 at maximum, and usually the unused one can be reused, so the
> > searching for target mostly only iterates the foremost several used
> > slots. And swapoff is a rarely used operation, efficiency is not so
> > important. Then it's unnecessary to get a plist to make it.
> >
> > Here go by iterating swap_info[] to find the swap device instead of
> > iterating swap_active_head.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> > mm/swapfile.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 5d71c748a2fe..18b52cc20749 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -2641,6 +2641,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> > struct inode *inode;
> > struct filename *pathname;
> > int err, found = 0;
> > + unsigned int type;
> >
> > if (!capable(CAP_SYS_ADMIN))
> > return -EPERM;
> > @@ -2658,7 +2659,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> >
> > mapping = victim->f_mapping;
> > spin_lock(&swap_lock);
> > - plist_for_each_entry(p, &swap_active_head, list) {
> > + for (type = 0; type < nr_swapfiles; type++) {
> > + p = swap_info[type];
>
> After some careful thinking of the swapoff behavior, I now consider
> this patch buggy.
>
> On behavior change is that, previously the swapoff is dependent on the
> swap_active_list and only swapon device is on that list.
>
> Now you change to enumerate every device at swap off and depend on the
> p->flags & SWP_WRITEOK to filter out the device. Which will go through
> all 32 swap devices regardless.
>
> > if (p->flags & SWP_WRITEOK) {
> > if (p->swap_file->f_mapping == mapping) {
>
> Considering the following race:
>
> CPU1:
> swapoff( "/dev/sda1")
> spin_lock(&swap_lock);
>
> for (type = 0; type < nr_swapfiles; type++) {
> if (p->flags & SWP_WRITEOK) {
> // found the device.
> }
> ....
> spin_lock(&p->lock);
> del_from_avail_list(p, true);
~~~~~~~~~~~~~~~~~~~~~~~~~~~
> plist_del(&p->list, &swap_active_head);
> atomic_long_sub(p->pages, &nr_swap_pages);
> total_swap_pages -= p->pages;
> spin_unlock(&p->lock);
> spin_unlock(&swap_lock);
>
> // at this point the p->flags still have SWP_WRITEOK.
> // swap_lock and p->lock are both unlocked.
No, SWP_WRITEOK is cleared in del_from_avail_list(). At this point,
there's no more SWP_WRITEOK in p->flags when both swap_lock and p->lock
are lifted.
>
> Now the second CPU swapoff race begins
>
> CPU2:
> swapoff( "/dev/sda1")
>
> spin_lock(&swap_lock); // can take this lock
>
> for (type = 0;
> type < nr_swapfiles; type++) {
> if (p->flags &
> SWP_WRITEOK) {
> // also
> found the device. because p->flags are not clear to 0 yet.
> }
> ....
>
> spin_lock(&p->lock); // can also take this lock
>
> del_from_avail_list(p, true);
>
> plist_del(&p->list, &swap_active_head); <=== blow up here
> // remove p not
> the in list.
>
> atomic_long_sub(p->pages, &nr_swap_pages);
> // double subtracting ..
>
> total_swap_pages -= p->pages;
> // double
> subtracting as well.
> spin_unlock(&p->lock);
> spin_unlock(&swap_lock);
>
> Please confirm or deny if that race is possible.
> If it is, consider it a NACK to this patch.
>
> Please consider that even trivial patches can have subtle behavior
> change and it has non zero cost to reviewers. It can introduce subtle
> bugs unintentionally as well. In this case, I feel that the
> gain(removing a list, total less than 1K) is too small, not a very
> good return on investment.
>
> Chris
>
next prev parent reply other threads:[~2025-10-03 2:55 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-01 4:34 [PATCH 0/3] mm/swap: remove plist swap_active_head Baoquan He
2025-10-01 4:34 ` [PATCH 1/3] mm/swapfile.c: remove __has_usable_swap() Baoquan He
2025-10-01 4:34 ` [PATCH 2/3] mm/swapfile.c: use swap_info[] to find the swap device Baoquan He
2025-10-02 15:59 ` Chris Li
2025-10-03 2:38 ` Baoquan He [this message]
2025-10-03 4:50 ` Chris Li
2025-10-03 5:29 ` Baoquan He
2025-10-01 4:34 ` [PATCH 3/3] mm/swap: remove unneeded swap_active_head Baoquan He
2025-10-02 8:33 ` Chris Li
2025-10-02 13:42 ` Baoquan He
2025-10-09 3:26 ` Andrew Morton
2025-10-09 7:47 ` Baoquan He
2025-10-09 17:09 ` Chris Li
2025-10-10 2:56 ` YoungJun Park
2025-10-10 1:28 ` Andrew Morton
2025-10-10 2:14 ` Baoquan He
2025-10-10 2:34 ` Chris Li
2025-10-10 2:33 ` Chris Li
2025-10-10 2:52 ` Chris Li
2025-10-02 6:04 ` [PATCH 0/3] mm/swap: remove plist swap_active_head Chris Li
2025-10-02 13:09 ` Baoquan He
2025-10-02 16:23 ` Chris Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aN83NwGpMCFvrOEy@MiWiFi-R3L-srv \
--to=bhe@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=chrisl@kernel.org \
--cc=kasong@tencent.com \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=shikemeng@huaweicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox