From: Chris Li <chrisl@kernel.org>
To: Baoquan He <bhe@redhat.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
kasong@tencent.com, baohua@kernel.org, nphamcs@gmail.com,
shikemeng@huaweicloud.com
Subject: Re: [PATCH 2/3] mm/swapfile.c: use swap_info[] to find the swap device
Date: Thu, 2 Oct 2025 21:50:51 -0700
Message-ID: <CACePvbWgKCjW0fHT03QUXWsj9sV0+1Co9_=t9OWQA5-ApEJyRg@mail.gmail.com>
In-Reply-To: <aN83NwGpMCFvrOEy@MiWiFi-R3L-srv>
On Thu, Oct 2, 2025 at 7:39 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 10/02/25 at 08:59am, Chris Li wrote:
> > On Tue, Sep 30, 2025 at 9:35 PM Baoquan He <bhe@redhat.com> wrote:
> > >
> > > > Now, swap_active_head is only used to find a present swap device when
> > > > trying to swapoff it. In fact, swap_info[] is a short array with at
> > > > most 32 entries, and unused slots are usually reused, so the search
> > > > for the target mostly iterates only the first few used slots. And
> > > > swapoff is a rarely used operation, so efficiency is not so
> > > > important. A plist is therefore unnecessary for this lookup.
> > > >
> > > > So find the swap device by iterating swap_info[] instead of
> > > > iterating swap_active_head.
> > >
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > > mm/swapfile.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index 5d71c748a2fe..18b52cc20749 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -2641,6 +2641,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> > > struct inode *inode;
> > > struct filename *pathname;
> > > int err, found = 0;
> > > + unsigned int type;
> > >
> > > if (!capable(CAP_SYS_ADMIN))
> > > return -EPERM;
> > > @@ -2658,7 +2659,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> > >
> > > mapping = victim->f_mapping;
> > > spin_lock(&swap_lock);
> > > - plist_for_each_entry(p, &swap_active_head, list) {
> > > + for (type = 0; type < nr_swapfiles; type++) {
> > > + p = swap_info[type];
> >
> > After some careful thinking of the swapoff behavior, I now consider
> > this patch buggy.
> >
> > One behavior change is that previously swapoff depended on
> > swap_active_head, and only swapon'ed devices are on that list.
> >
> > Now you enumerate every device at swapoff and depend on
> > p->flags & SWP_WRITEOK to filter out the device, which will go through
> > all 32 swap devices regardless.
> >
> > > if (p->flags & SWP_WRITEOK) {
> > > if (p->swap_file->f_mapping == mapping) {
> >
> > Considering the following race:
> >
> > CPU1:
> > swapoff( "/dev/sda1")
> > spin_lock(&swap_lock);
> >
> > for (type = 0; type < nr_swapfiles; type++) {
> > if (p->flags & SWP_WRITEOK) {
> > // found the device.
> > }
> > ....
> > spin_lock(&p->lock);
> > del_from_avail_list(p, true);
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > plist_del(&p->list, &swap_active_head);
> > atomic_long_sub(p->pages, &nr_swap_pages);
> > total_swap_pages -= p->pages;
> > spin_unlock(&p->lock);
> > spin_unlock(&swap_lock);
> >
> > // at this point the p->flags still have SWP_WRITEOK.
> > // swap_lock and p->lock are both unlocked.
>
> No, SWP_WRITEOK is cleared in del_from_avail_list(). At this point,
> there's no more SWP_WRITEOK in p->flags when both swap_lock and p->lock
> are lifted.
Ah, you are right, I was wrong. I missed the "si->flags &=
~SWP_WRITEOK;" inside del_from_avail_list().
At the end of swapoff I saw si->flags set to 0 and assumed that was
where SWP_WRITEOK got cleared. It is actually cleared earlier, before
swap_lock and p->lock are dropped, so a second swapoff cannot find the
device again.
That should be fine. Sorry about the misread.
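
To make the corrected reasoning concrete, here is a minimal userspace sketch of the sequence discussed above. It is only a model, not the kernel code: every `model_*` / `MODEL_*` name is hypothetical, the locks are left out because the scenario in question is the second caller arriving after the first has already dropped both locks, and the assert stands in for the plist_del() that I expected to blow up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects named in the thread. */
#define MODEL_WRITEOK 0x1u          /* models SWP_WRITEOK */

struct model_swap_dev {
    unsigned int flags;             /* models p->flags */
    bool on_active_list;            /* models membership in swap_active_head */
    long pages;                     /* models p->pages */
};

static long model_total_pages;      /* models total_swap_pages */

/*
 * Models the locked section of swapoff for one device.
 * Returns 0 if this caller tore the device down, -1 if the
 * device was not found (the flag filter rejected it).
 */
static int model_swapoff(struct model_swap_dev *dev)
{
    /* The lookup only accepts devices that still have the flag set. */
    if (!(dev->flags & MODEL_WRITEOK))
        return -1;

    /* Models del_from_avail_list(): the flag is cleared here,
     * before the locks would be released. */
    dev->flags &= ~MODEL_WRITEOK;

    /* Models plist_del(): removing an entry twice would trip this. */
    assert(dev->on_active_list);
    dev->on_active_list = false;

    model_total_pages -= dev->pages;
    return 0;
}

/* Runs "CPU1" then "CPU2" on the same device; returns how many succeeded. */
static int model_race_two_swapoffs(void)
{
    struct model_swap_dev dev = {
        .flags = MODEL_WRITEOK,
        .on_active_list = true,
        .pages = 1024,
    };
    int wins = 0;

    model_total_pages = dev.pages;
    if (model_swapoff(&dev) == 0)   /* CPU1 */
        wins++;
    if (model_swapoff(&dev) == 0)   /* CPU2, after CPU1 "unlocked" */
        wins++;
    return wins;
}
```

In this model only the first caller gets past the flag check, so the list removal and the page accounting each happen once. If the flag were cleared only at the very end of swapoff, as I had misread, the second call would reach the assert.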
>
> >
> > Now the second CPU swapoff race begins
> >
> > CPU2:
> > swapoff( "/dev/sda1")
> >
> > spin_lock(&swap_lock); // can take this lock
> > for (type = 0; type < nr_swapfiles; type++) {
> > if (p->flags & SWP_WRITEOK) {
> > // also found the device, because p->flags is not cleared to 0 yet.
> > }
> > ....
> > spin_lock(&p->lock); // can also take this lock
> > del_from_avail_list(p, true);
> > plist_del(&p->list, &swap_active_head); <=== blow up here
> > // removes p, which is not in the list.
> > atomic_long_sub(p->pages, &nr_swap_pages); // double subtracting
> > total_swap_pages -= p->pages; // double subtracting as well
> > spin_unlock(&p->lock);
> > spin_unlock(&swap_lock);
> >
> > Please confirm or deny if that race is possible.
> > If it is, consider it a NACK to this patch.
I withdraw the NACK. Sorry about that.
Chris