From: Takero Funaki <flintglass@gmail.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>,
	Yosry Ahmed <yosryahmed@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	 Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration
Date: Fri, 14 Jun 2024 13:39:57 +0900	[thread overview]
Message-ID: <CAPpoddcQRJ5EYFVaUhM+f5pV-RMtq=1kzkTuWQ2GRkcux-Yzcg@mail.gmail.com> (raw)
In-Reply-To: <pezgvebjcykwgawtmvymqwktul25pgw5orxvvrbm24hjc3sizv@3yg7tbpwnlnf>

On Fri, Jun 14, 2024 at 1:49 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Jun 13, 2024 at 08:04:39AM GMT, Nhat Pham wrote:
> [...]
> > > > >
> > > > > Is the idea here to avoid moving the iterator to another offline memcg
> > > > > that zswap_memcg_offline_cleanup() was already called for, to avoid
> > > > > holding a ref on that memcg until the next run of zswap shrinking?
> > > > >
> > > > > If yes, I think it's probably worth doing. But why do we need to
> > > > > release and reacquire the lock in the loop above?
> > > >
> > > > Yes, the existing cleaner might leave the offline, already-cleaned memcg.
> > > >
> > > > The reacquiring lock is to not loop inside the critical section.
> > > > In shrink_worker of v0 patch, the loop was restarted on offline memcg
> > > > without releasing the lock. Nhat pointed out that we should drop the
> > > > lock after every mem_cgroup_iter() call. v1 was changed to reacquire
> > > > once per iteration like the cleaner code above.
> > >
> > > I am not sure how often we'll run into a situation where we'll be
> > > holding the lock for too long tbh. It should be unlikely to keep
> > > encountering offline memcgs for a long time.
> > >
> > > Nhat, do you think this could cause a problem in practice?
> >
> > I don't remember prescribing anything to be honest :) I think I was
> > just asking why can't we just drop the lock, then "continue;". This is
> > mostly for simplicity's sake.
> >
> > https://lore.kernel.org/linux-mm/CAKEwX=MwrRc43iM2050v5u-TPUK4Yn+a4G7+h6ieKhpQ7WtQ=A@mail.gmail.com/

I apologize for misinterpreting your comments. I will remove the release/reacquire.

> >
> > But I think as Takero pointed out, it would still skip over the memcg
> > that was (concurrently) updated to zswap_next_shrink by the memcg
> > offline callback.
>
> What's the issue with keep traversing until an online memcg is found?
> Something like the following:
>
>
>         spin_lock(&zswap_shrink_lock);
>         do {
>                 zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
>         } while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
>
>         if (!zswap_next_shrink)
>                 zswap_next_shrink = mem_cgroup_iter(NULL, NULL, NULL);
>         ....
>
> Is the concern that there can be a lot of offlined memcgs which may
> cause need_resched warnings?

To avoid the goto-based loop, here is the new version, incorporating
Shakeel's suggestion:
```c
    do {
        spin_lock(&zswap_shrink_lock);
        /*
         * Advance the cursor to start shrinking from the next memcg
         * after zswap_next_shrink.  One memcg might be skipped from
         * shrinking if the cleaner also advanced the cursor, but it
         * will happen at most once per offlining memcg.
         */
        do {
            zswap_next_shrink = mem_cgroup_iter(NULL,
                    zswap_next_shrink, NULL);
            memcg = zswap_next_shrink;
        } while (memcg && !mem_cgroup_tryget_online(memcg));

        if (!memcg) {
            spin_unlock(&zswap_shrink_lock);
```
We can add a `spin_unlock(); spin_lock();` pair just after
mem_cgroup_iter(), or leave it out, as needed.
I believe the behavior is identical to v1 except for the starting
point of the iteration.
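
For illustration only, the pair would go right after the mem_cgroup_iter()
call in the inner loop, roughly like this (a sketch of the placement, not
the actual patch; the rest of the function is as above):
```c
    do {
        spin_lock(&zswap_shrink_lock);
        do {
            zswap_next_shrink = mem_cgroup_iter(NULL,
                    zswap_next_shrink, NULL);
            /*
             * Optionally drop and retake the lock here, so that a long
             * run of offline memcgs does not keep zswap_shrink_lock
             * held across many mem_cgroup_iter() calls.  The cleaner
             * may advance zswap_next_shrink while the lock is
             * released, which is the (rare) skipping discussed below.
             */
            spin_unlock(&zswap_shrink_lock);
            spin_lock(&zswap_shrink_lock);
            memcg = zswap_next_shrink;
        } while (memcg && !mem_cgroup_tryget_online(memcg));
```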

Regarding Nhat's comment #2, "no skipping over a zswap_next_shrink
updated by the memcg offline cleaner":
while this held for v1, I am now inclined to accept the skipping, as it
is negligibly rare. As Yosry commented, v1 retried the memcg left over
from the previous shrink_worker() run.

There are several options for where shrink_worker() should start:
1. Starting from the next memcg after zswap_next_shrink: it might skip
one memcg, but this should be quite rare. This is the current behavior
before the patch.

2. Starting from zswap_next_shrink: it might shrink one more page from
that memcg, in addition to the one shrunk by the last shrink_worker()
run. This should also be rare, but probably more frequent than option 1.
This is the v0 patch behavior (see the sketch after this list).

3. Addressing both: save the last memcg as well. The worker checks
whether the cursor has been modified by the cleaner and advances it
only if it has not, like this:
```c
        do {
            /* Advance only if the cleaner has not already moved the cursor. */
            if (zswap_last_shrink == zswap_next_shrink) {
                zswap_next_shrink = mem_cgroup_iter(NULL,
                        zswap_next_shrink, NULL);
            }
            memcg = zswap_next_shrink;
        } while (memcg && !mem_cgroup_tryget_online(memcg));
        zswap_last_shrink = memcg;
```
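
For comparison, a rough sketch of option 2 (starting from
zswap_next_shrink itself, as v0 did) could look like the following.
This is only an illustration, not the v0 code verbatim, and it omits
the NULL/restart handling shown in Shakeel's snippet above:
```c
        /* Try the memcg the cursor already points at first. */
        memcg = zswap_next_shrink;
        while (memcg && !mem_cgroup_tryget_online(memcg)) {
            /* The cursor points at an offline memcg; advance it. */
            zswap_next_shrink = mem_cgroup_iter(NULL,
                    zswap_next_shrink, NULL);
            memcg = zswap_next_shrink;
        }
```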

Which one would be better? Or do you have any other ideas?

