linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Takero Funaki <flintglass@gmail.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Nhat Pham <nphamcs@gmail.com>,
	 Chengming Zhou <chengming.zhou@linux.dev>,
	Jonathan Corbet <corbet@lwn.net>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration
Date: Tue, 11 Jun 2024 23:50:08 +0900	[thread overview]
Message-ID: <CAPpodddsnOF7nKX-ijsujAcnjvLHB2UTyyZknc6uTgb3UWXY2g@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkaKaMpni2tA_G6DhiRLdV+O3AmXE81JyKY=PEN54o=aAw@mail.gmail.com>

On 2024/06/11 4:16, Yosry Ahmed wrote:
>
> I am really finding it difficult to understand what the diff is trying
> to do. We are holding a lock that protects zswap_next_shrink. We
> always access it with the lock held. Why do we need all of this?
>
> Adding READ_ONCE() and WRITE_ONCE() where they are not needed is just
> confusing imo.

I initially thought that reading new values from external variables
inside a loop required protection from compiler optimization. I will
remove the access macros in v2.

> 'memcg' will always be NULL on the first iteration, so we will always
> start by shrinking 'zswap_next_shrink' for a second time before moving
> the iterator.
>
>> +               } else {
>> +                       /* advance cursor */
>> +                       memcg = mem_cgroup_iter(NULL, memcg, NULL);
>> +                       WRITE_ONCE(zswap_next_shrink, memcg);
> Again, I don't see what this is achieving. The first iteration will
> always set 'memcg' to 'zswap_next_shrink', and then we will always
> move the iterator forward. The only difference I see is that we shrink
> 'zswap_next_shrink' twice in a row now (last 'memcg' in prev call, and
> first 'memcg' in this call).

The reason for checking if `memcg != next_memcg` was to ensure that we
do not skip memcg that might be modified by the cleaner.  For example,
say we get memcg A and save it. When the cleaner advances the cursor
from A to B, we then advance from B to C, shrink C. We have to check
that A in the zswap_next_shrink is untouched before advancing the
cursor.

If this approach is overly complicated and ignoring B is acceptable,
the beginning of the loop can be simplified to:

        do {
+iternext:
                spin_lock(&zswap_shrink_lock);

                zswap_next_shrink = mem_cgroup_iter(NULL,
zswap_next_shrink, NULL);
                memcg = zswap_next_shrink;



>> @@ -1434,16 +1468,25 @@ static void shrink_worker(struct work_struct *w)
>>                 }
>>
>>                 if (!mem_cgroup_tryget_online(memcg)) {
>> -                       /* drop the reference from mem_cgroup_iter() */
>> -                       mem_cgroup_iter_break(NULL, memcg);
>> -                       zswap_next_shrink = NULL;
>> +                       /*
>> +                        * It is an offline memcg which we cannot shrink
>> +                        * until its pages are reparented.
>> +                        *
>> +                        * Since we cannot determine if the offline cleaner has
>> +                        * been already called or not, the offline memcg must be
>> +                        * put back unconditonally. We cannot abort the loop while
>> +                        * zswap_next_shrink has a reference of this offline memcg.
>> +                        */
> You actually deleted the code that actually puts the ref to the
> offline memcg above.
>
> Why don't you just replace mem_cgroup_iter_break(NULL, memcg) with
> mem_cgroup_iter(NULL, memcg, NULL) here? I don't understand what the
> patch is trying to do to be honest. This patch is a lot more confusing
> than it should be.


>>                         spin_unlock(&zswap_shrink_lock);
>> -
>> -                       if (++failures == MAX_RECLAIM_RETRIES)
>> -                               break;
>> -
>> -                       goto resched;
>> +                       goto iternext;
>>                 }

Removing the `break` on max failures from the if-offline branch is
required to not leave the reference of the next memcg.

If we just replace the mem_cgroup_iter_break with `memcg =
zswap_next_shrink = mem_cgroup_iter(NULL, memcg, NULL);` and break the
loop on failure, the next memcg will be left in zswap_next_shrink. If
zswap_next_shrink is also offline, the reference will be held
indefinitely.

When we get offline memcg, we cannot determine if the cleaner has
already been called or will be called later. We have to put back the
offline memcg reference before returning from the worker function.
This potential memcg leak is the reason why I think we cannot break
the loop here.
In this patch, the `goto iternext` ensures the offline memcg is
released in the next iteration (or by cleaner waiting for our unlock).

>
> Also, I would like Nhat to weigh in here. Perhaps the decision to
> reset the iterator instead of advancing it in this case was made for a
> reason that we should honor. Maybe cgroups are usually offlined
> together so we will keep running into offline cgroups here if we
> continue? I am not sure.

From comment I removed,

>> -                * We need to retry if we have gone through a full round trip, or if we
>> -                * got an offline memcg (or else we risk undoing the effect of the
>> -                * zswap memcg offlining cleanup callback). This is not catastrophic
>> -                * per se, but it will keep the now offlined memcg hostage for a while.

I think this mentioned the potential memcg leak, which is now resolved
by this patch modifying the offline memcg case.


  reply	other threads:[~2024-06-11 14:50 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-08 15:53 [PATCH v1 0/3] mm: zswap: global shrinker fix and proactive shrink Takero Funaki
2024-06-08 15:53 ` [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration Takero Funaki
2024-06-10 19:16   ` Yosry Ahmed
2024-06-11 14:50     ` Takero Funaki [this message]
2024-06-11 18:26   ` Nhat Pham
2024-06-11 23:03     ` Shakeel Butt
2024-06-12 18:16     ` Takero Funaki
2024-06-12 18:28       ` Yosry Ahmed
2024-06-13  2:13         ` Takero Funaki
2024-06-13  2:18           ` Yosry Ahmed
2024-06-13  2:35             ` Takero Funaki
2024-06-13  2:57               ` Yosry Ahmed
2024-06-13 15:04                 ` Nhat Pham
2024-06-13 16:49                   ` Shakeel Butt
2024-06-14  4:39                     ` Takero Funaki
2024-06-13 16:08   ` Nhat Pham
2024-06-13 16:09     ` Nhat Pham
2024-06-08 15:53 ` [PATCH v1 2/3] mm: zswap: fix global shrinker error handling logic Takero Funaki
2024-06-10 20:27   ` Yosry Ahmed
2024-06-11 15:21     ` Takero Funaki
2024-06-11 15:51       ` Nhat Pham
2024-06-11 18:15     ` Nhat Pham
2024-06-08 15:53 ` [PATCH v1 3/3] mm: zswap: proactive shrinking before pool size limit is hit Takero Funaki
2024-06-13 15:13   ` Nhat Pham
2024-06-11 18:10 ` [PATCH v1 0/3] mm: zswap: global shrinker fix and proactive shrink Nhat Pham
2024-06-13 15:22 ` Nhat Pham
2024-06-14  4:09   ` Takero Funaki
2024-06-14 22:34     ` Nhat Pham
2024-06-14 22:48     ` Nhat Pham
2024-06-15  0:19     ` Yosry Ahmed
2024-06-20  1:03       ` Takero Funaki
2024-06-20 22:45         ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPpodddsnOF7nKX-ijsujAcnjvLHB2UTyyZknc6uTgb3UWXY2g@mail.gmail.com \
    --to=flintglass@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=chengming.zhou@linux.dev \
    --cc=corbet@lwn.net \
    --cc=hannes@cmpxchg.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox