* [PATCH v4 0/2] mm: zswap: fixes for global shrinker
@ 2024-07-27 23:06 Takero Funaki
  2024-07-27 23:06 ` [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration Takero Funaki
  2024-07-27 23:06 ` [PATCH v4 2/2] mm: zswap: fix global shrinker error handling logic Takero Funaki
  0 siblings, 2 replies; 7+ messages in thread
From: Takero Funaki @ 2024-07-27 23:06 UTC (permalink / raw)
  To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou, Andrew Morton
  Cc: Takero Funaki, linux-mm, linux-kernel

This series addresses issues in the zswap global shrinker that prevented
it from shrinking stored pages. With this series, the shrinker more
reliably continues to shrink pages until it reaches the accept threshold,
which gives much higher writeback when the zswap pool limit is hit.
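
As a rough illustration of the two limits mentioned above (not part of
this series): to my understanding, the pool limit is a percentage of
total RAM (max_pool_percent) and the accept threshold is a percentage of
that limit (accept_threshold_percent). The sketch below assumes defaults
of 20 and 90, which are not stated in this thread, and the sim_* function
names are hypothetical stand-ins rather than the kernel helpers.

```c
/*
 * Userspace sketch of the pool limit and accept threshold arithmetic.
 * The sim_* names and the default percentages used in the example
 * (20 and 90) are assumptions made for illustration.
 */

/* Pool limit: a percentage of total RAM, in pages. */
static unsigned long sim_zswap_max_pages(unsigned long totalram_pages,
					 unsigned int max_pool_percent)
{
	return totalram_pages * max_pool_percent / 100;
}

/* Accept threshold: a percentage of the pool limit, in pages. */
static unsigned long sim_zswap_accept_thr_pages(unsigned long max_pages,
						unsigned int accept_thr_percent)
{
	return max_pages * accept_thr_percent / 100;
}
```

Under those assumptions, a machine with 1,000,000 pages of RAM hits the
pool limit at 200,000 pages, and the global shrinker then tries to
reclaim down to 180,000 pages.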

v4 is for cleanup and improvement in code formatting, comments, and
commit logs. No behavioral changes have been made since v3:
https://lore.kernel.org/linux-mm/20240720044127.508042-1-flintglass@gmail.com/

Changes in v4:
- Updated comments and commit logs to clarify expected behaviors (Yosry,
  Nhat)
- Merged duplicated spin_unlock() in if branches (Nhat)
- Renamed writeback attempts counter (Nhat, Chengming)

Changes in v3:
- Extract fixes for shrinker as a separate patch series.
- Fix comments and commit messages. (Chengming, Yosry)
- Drop logic to detect rare doubly advancing cursor. (Yosry)

Changes in v2:
mm: zswap: fix global shrinker memcg iteration:
- Change the loop style (Yosry, Nhat, Shakeel)
mm: zswap: fix global shrinker error handling logic:
- Change error code for no-writeback memcg. (Yosry)
- Use nr_scanned to check if lru is empty. (Yosry)

Changes in v1:
mm: zswap: fix global shrinker memcg iteration:
- Drop and reacquire spinlock before skipping a memcg.
- Add some comments to clarify the locking mechanism.

---

Takero Funaki (2):
  mm: zswap: fix global shrinker memcg iteration
  mm: zswap: fix global shrinker error handling logic

 mm/zswap.c | 112 +++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 82 insertions(+), 30 deletions(-)

-- 
2.43.0




* [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration
  2024-07-27 23:06 [PATCH v4 0/2] mm: zswap: fixes for global shrinker Takero Funaki
@ 2024-07-27 23:06 ` Takero Funaki
  2024-07-29 18:24   ` Yosry Ahmed
  2024-07-30 19:18   ` Nhat Pham
  2024-07-27 23:06 ` [PATCH v4 2/2] mm: zswap: fix global shrinker error handling logic Takero Funaki
  1 sibling, 2 replies; 7+ messages in thread
From: Takero Funaki @ 2024-07-27 23:06 UTC (permalink / raw)
  To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou, Andrew Morton
  Cc: Takero Funaki, linux-mm, linux-kernel

This patch fixes an issue where the zswap global shrinker stopped
iterating through the memcg tree.

The problem was that shrink_worker() considered an offline memcg a
failure and restarted the memcg tree walk from the root, so it aborted
shrinking after encountering the same offline memcg 16 times, even if
there was only one offline memcg. After this change, an offline memcg in
the tree is no longer considered a failure. This allows the shrinker to
continue shrinking the other online memcgs regardless of whether an
offline memcg exists, resulting in higher zswap writeback activity.

To avoid holding a reference to an offline memcg encountered during the
memcg tree walk, shrink_worker() must continue iterating past it. This
releases the offline memcg and ensures that the memcg stored in the
cursor is online.

The offline memcg cleaner has also been changed to avoid the same issue.
Previously, when the memcg following the offlined memcg was also offline,
the reference stored in the iteration cursor was held until the next
shrink_worker() run. The cleaner must now keep advancing the cursor,
releasing each offline memcg, until it finds an online memcg or reaches
the end of the tree.

Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki <flintglass@gmail.com>
---
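
For illustration only (not part of the patch): a minimal userspace sketch
of the behavior described in the commit message, assuming the memcg tree
can be modeled as a flat array. All sim_* names are hypothetical, the
reference counting and locking are left out, and shrinking a memcg is
reduced to writing back one page. The skip_offline flag selects between
the old handling (an offline memcg counts as a failure and resets the
cursor to the root) and the new handling (offline memcgs are skipped
inside the walk).

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for a memcg: only what the walk needs. */
struct sim_memcg {
	bool online;
	int pages;	/* pages this memcg has stored in zswap */
};

/*
 * Mimics mem_cgroup_iter(NULL, prev, NULL) on a flat array: returns the
 * first node when prev is NULL, and NULL after the last node.
 */
static struct sim_memcg *sim_iter(struct sim_memcg *tree, size_t n,
				  struct sim_memcg *prev)
{
	size_t next = prev ? (size_t)(prev - tree) + 1 : 0;

	return next < n ? &tree[next] : NULL;
}

/* Write back one page if there is one; 0 on success, -1 otherwise. */
static int sim_shrink(struct sim_memcg *memcg)
{
	if (memcg->pages == 0)
		return -1;
	memcg->pages--;
	return 0;
}

/* Returns the number of pages written back before reaching the target
 * or giving up after MAX_RECLAIM_RETRIES failures. */
static int sim_shrink_worker(struct sim_memcg *tree, size_t n, int target,
			     bool skip_offline)
{
	struct sim_memcg *cursor = NULL, *memcg;
	int failures = 0, written = 0;

	while (written < target) {
		if (skip_offline) {
			/* New: keep advancing until online or end of walk. */
			do {
				memcg = sim_iter(tree, n, cursor);
				cursor = memcg;
			} while (memcg && !memcg->online);
		} else {
			/* Old: an offline memcg resets the cursor to the
			 * root and is counted as a failure below. */
			memcg = sim_iter(tree, n, cursor);
			cursor = memcg;
			if (memcg && !memcg->online) {
				cursor = NULL;
				memcg = NULL;
			}
		}

		if (!memcg) {
			if (++failures == MAX_RECLAIM_RETRIES)
				break;
			continue;
		}

		if (sim_shrink(memcg) == 0)
			written++;
		else if (++failures == MAX_RECLAIM_RETRIES)
			break;
	}
	return written;
}
```

In this model, with a tree of { online, offline, online } and a target of
20 pages, the old handling never reaches the third memcg and gives up
after 16 pages, while the new handling alternates between the two online
memcgs and reaches the target.
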
 mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 24 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index adeaf9c97fde..e9b5343256cd 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -765,12 +765,31 @@ void zswap_folio_swapin(struct folio *folio)
 	}
 }
 
+/*
+ * This function should be called when a memcg is being offlined.
+ *
+ * Since the global shrinker shrink_worker() may hold a reference
+ * of the memcg, we must check and release the reference in
+ * zswap_next_shrink.
+ *
+ * shrink_worker() must handle the case where this function releases
+ * the reference of memcg being shrunk.
+ */
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
 {
 	/* lock out zswap shrinker walking memcg tree */
 	spin_lock(&zswap_shrink_lock);
-	if (zswap_next_shrink == memcg)
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+	if (zswap_next_shrink == memcg) {
+		do {
+			zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+		} while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
+		/*
+		 * We verified the next memcg is online.  Even if the next
+		 * memcg is being offlined here, another cleaner must be
+		 * waiting for our lock.  We can leave the online memcg
+		 * reference.
+		 */
+	}
 	spin_unlock(&zswap_shrink_lock);
 }
 
@@ -1304,43 +1323,49 @@ static void shrink_worker(struct work_struct *w)
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
-	/* global reclaim will select cgroup in a round-robin fashion. */
+	/* global reclaim will select cgroup in a round-robin fashion.
+	 *
+	 * We save iteration cursor memcg into zswap_next_shrink,
+	 * which can be modified by the offline memcg cleaner
+	 * zswap_memcg_offline_cleanup().
+	 *
+	 * Since the offline cleaner is called only once, we cannot leave an
+	 * offline memcg reference in zswap_next_shrink.
+	 * We can rely on the cleaner only if we get online memcg under lock.
+	 *
+	 * If we get an offline memcg, we cannot determine if the cleaner has
+	 * already been called or will be called later. We must put back the
+	 * reference before returning from this function. Otherwise, the
+	 * offline memcg left in zswap_next_shrink will hold the reference
+	 * until the next run of shrink_worker().
+	 */
 	do {
 		spin_lock(&zswap_shrink_lock);
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
-		memcg = zswap_next_shrink;
 
 		/*
-		 * We need to retry if we have gone through a full round trip, or if we
-		 * got an offline memcg (or else we risk undoing the effect of the
-		 * zswap memcg offlining cleanup callback). This is not catastrophic
-		 * per se, but it will keep the now offlined memcg hostage for a while.
-		 *
+		 * Start shrinking from the next memcg after zswap_next_shrink.
+		 * When the offline cleaner has already advanced the cursor,
+		 * advancing the cursor here overlooks one memcg, but this
+		 * should be negligibly rare.
+		 */
+		do {
+			memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+			zswap_next_shrink = memcg;
+		} while (memcg && !mem_cgroup_tryget_online(memcg));
+		/*
 		 * Note that if we got an online memcg, we will keep the extra
 		 * reference in case the original reference obtained by mem_cgroup_iter
 		 * is dropped by the zswap memcg offlining callback, ensuring that the
 		 * memcg is not killed when we are reclaiming.
 		 */
-		if (!memcg) {
-			spin_unlock(&zswap_shrink_lock);
-			if (++failures == MAX_RECLAIM_RETRIES)
-				break;
-
-			goto resched;
-		}
-
-		if (!mem_cgroup_tryget_online(memcg)) {
-			/* drop the reference from mem_cgroup_iter() */
-			mem_cgroup_iter_break(NULL, memcg);
-			zswap_next_shrink = NULL;
-			spin_unlock(&zswap_shrink_lock);
+		spin_unlock(&zswap_shrink_lock);
 
+		if (!memcg) {
 			if (++failures == MAX_RECLAIM_RETRIES)
 				break;
 
 			goto resched;
 		}
-		spin_unlock(&zswap_shrink_lock);
 
 		ret = shrink_memcg(memcg);
 		/* drop the extra reference */
-- 
2.43.0




* [PATCH v4 2/2] mm: zswap: fix global shrinker error handling logic
  2024-07-27 23:06 [PATCH v4 0/2] mm: zswap: fixes for global shrinker Takero Funaki
  2024-07-27 23:06 ` [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration Takero Funaki
@ 2024-07-27 23:06 ` Takero Funaki
  1 sibling, 0 replies; 7+ messages in thread
From: Takero Funaki @ 2024-07-27 23:06 UTC (permalink / raw)
  To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou, Andrew Morton
  Cc: Takero Funaki, linux-mm, linux-kernel

This patch fixes the zswap global shrinker, which did not shrink the
zpool as expected.

The issue addressed is that shrink_worker() did not distinguish between
unexpected errors and expected errors, such as failed writeback from an
empty memcg. The shrinker would stop shrinking after iterating through
the memcg tree 16 times, even if there was only one empty memcg.

With this patch, the shrinker no longer considers encountering an empty
memcg, encountering a memcg with writeback disabled, or reaching the end
of a memcg tree walk as a failure, as long as there are memcgs that are
candidates for writeback. Systems with one or more empty memcgs will now
observe significantly higher zswap writeback activity after the zswap
pool limit is hit.

To avoid an infinite loop when there are no writeback candidates, this
patch tracks writeback attempts during memcg tree walks and limits
retries if no writeback candidates are found.

To handle the empty memcg case, the helper function shrink_memcg() is
modified to check if the memcg is empty and then return -ENOENT.

Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki <flintglass@gmail.com>
---
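
For illustration only (not part of the patch): a minimal userspace sketch
of the retry accounting described above, assuming all memcgs are online
and the tree is a flat array. The sim_* names and the unwritable flag are
hypothetical; the flag stands in for any case where entries are scanned
but cannot be written back.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for a memcg. */
struct sim_memcg {
	bool writeback_enabled;
	int pages;		/* entries on this memcg's zswap LRU */
	bool unwritable;	/* entries are scanned but writeback fails */
};

/*
 * -ENOENT: not a writeback candidate (writeback disabled or empty).
 * -EAGAIN: a candidate, but nothing could be written back.
 * 0:       one page written back.
 */
static int sim_shrink_memcg(struct sim_memcg *memcg)
{
	if (!memcg->writeback_enabled)
		return -ENOENT;
	if (memcg->pages == 0)
		return -ENOENT;
	if (memcg->unwritable)
		return -EAGAIN;
	memcg->pages--;
	return 0;
}

/* Returns the number of pages written back. */
static int sim_shrink_worker(struct sim_memcg *tree, size_t n, int target)
{
	size_t next = 0;	/* next == n models a NULL cursor */
	int failures = 0, attempts = 0, written = 0;
	int ret;

	while (written < target) {
		if (next == n) {
			/* End of a tree walk: only a failure if the walk
			 * found no writeback candidates at all. */
			next = 0;
			if (!attempts && ++failures == MAX_RECLAIM_RETRIES)
				break;
			attempts = 0;
			continue;
		}

		ret = sim_shrink_memcg(&tree[next++]);
		if (ret == -ENOENT)
			continue;	/* skip, no attempt, no failure */
		++attempts;

		if (ret == 0)
			written++;
		else if (++failures == MAX_RECLAIM_RETRIES)
			break;
	}
	return written;
}
```

In this model, an empty memcg and a writeback-disabled memcg placed in
front of a memcg with 40 pages do not stop the worker from writing back
30 pages, while a tree with no candidates at all still terminates after
MAX_RECLAIM_RETRIES walks.
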
 mm/zswap.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index e9b5343256cd..60c8b1232ec9 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1293,10 +1293,10 @@ static struct shrinker *zswap_alloc_shrinker(void)
 
 static int shrink_memcg(struct mem_cgroup *memcg)
 {
-	int nid, shrunk = 0;
+	int nid, shrunk = 0, scanned = 0;
 
 	if (!mem_cgroup_zswap_writeback_enabled(memcg))
-		return -EINVAL;
+		return -ENOENT;
 
 	/*
 	 * Skip zombies because their LRUs are reparented and we would be
@@ -1310,20 +1310,34 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 
 		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
 					    &shrink_memcg_cb, NULL, &nr_to_walk);
+		scanned += 1 - nr_to_walk;
 	}
+
+	if (!scanned)
+		return -ENOENT;
+
 	return shrunk ? 0 : -EAGAIN;
 }
 
 static void shrink_worker(struct work_struct *w)
 {
 	struct mem_cgroup *memcg;
-	int ret, failures = 0;
+	int ret, failures = 0, attempts = 0;
 	unsigned long thr;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
-	/* global reclaim will select cgroup in a round-robin fashion.
+	/*
+	 * Global reclaim will select cgroup in a round-robin fashion from all
+	 * online memcgs, but memcgs that have no pages in zswap and
+	 * writeback-disabled memcgs (memory.zswap.writeback=0) are not
+	 * candidates for shrinking.
+	 *
+	 * Shrinking will be aborted if we encounter the following
+	 * MAX_RECLAIM_RETRIES times:
+	 * - No writeback-candidate memcgs found in a memcg tree walk.
+	 * - Shrinking a writeback-candidate memcg failed.
 	 *
 	 * We save iteration cursor memcg into zswap_next_shrink,
 	 * which can be modified by the offline memcg cleaner
@@ -1361,9 +1375,14 @@ static void shrink_worker(struct work_struct *w)
 		spin_unlock(&zswap_shrink_lock);
 
 		if (!memcg) {
-			if (++failures == MAX_RECLAIM_RETRIES)
+			/*
+			 * Continue shrinking without incrementing failures if
+			 * we found candidate memcgs in the last tree walk.
+			 */
+			if (!attempts && ++failures == MAX_RECLAIM_RETRIES)
 				break;
 
+			attempts = 0;
 			goto resched;
 		}
 
@@ -1371,8 +1390,16 @@ static void shrink_worker(struct work_struct *w)
 		/* drop the extra reference */
 		mem_cgroup_put(memcg);
 
-		if (ret == -EINVAL)
-			break;
+		/*
+		 * There are no writeback-candidate pages in the memcg.
+		 * This is not an issue as long as we can find another memcg
+		 * with pages in zswap. Skip this without incrementing attempts
+		 * and failures.
+		 */
+		if (ret == -ENOENT)
+			continue;
+		++attempts;
+
 		if (ret && ++failures == MAX_RECLAIM_RETRIES)
 			break;
 resched:
-- 
2.43.0




* Re: [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration
  2024-07-27 23:06 ` [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration Takero Funaki
@ 2024-07-29 18:24   ` Yosry Ahmed
  2024-07-30  5:24     ` Takero Funaki
  2024-07-30 19:18   ` Nhat Pham
  1 sibling, 1 reply; 7+ messages in thread
From: Yosry Ahmed @ 2024-07-29 18:24 UTC (permalink / raw)
  To: Takero Funaki
  Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Andrew Morton,
	linux-mm, linux-kernel

On Sat, Jul 27, 2024 at 4:06 PM Takero Funaki <flintglass@gmail.com> wrote:
>
> This patch fixes an issue where the zswap global shrinker stopped
> iterating through the memcg tree.
>
> The problem was that shrink_worker() would restart iterating memcg tree
> from the tree root, considering an offline memcg as a failure, and abort
> shrinking after encountering the same offline memcg 16 times even if
> there is only one offline memcg. After this change, an offline memcg in
> the tree is no longer considered a failure. This allows the shrinker to
> continue shrinking the other online memcgs regardless of whether an
> offline memcg exists, resulting in higher zswap writeback activity.
>
> To avoid holding refcount of offline memcg encountered during the memcg
> tree walking, shrink_worker() must continue iterating to release the
> offline memcg to ensure the next memcg stored in the cursor is online.
>
> The offline memcg cleaner has also been changed to avoid the same issue.
> When the next memcg of the offlined memcg is also offline, the refcount
> stored in the iteration cursor was held until the next shrink_worker()
> run. The cleaner must release the offline memcg recursively.
>
> Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> Signed-off-by: Takero Funaki <flintglass@gmail.com>
> ---
>  mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 49 insertions(+), 24 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index adeaf9c97fde..e9b5343256cd 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -765,12 +765,31 @@ void zswap_folio_swapin(struct folio *folio)
>         }
>  }
>
> +/*
> + * This function should be called when a memcg is being offlined.
> + *
> + * Since the global shrinker shrink_worker() may hold a reference
> + * of the memcg, we must check and release the reference in
> + * zswap_next_shrink.
> + *
> + * shrink_worker() must handle the case where this function releases
> + * the reference of memcg being shrunk.
> + */
>  void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
>  {
>         /* lock out zswap shrinker walking memcg tree */
>         spin_lock(&zswap_shrink_lock);
> -       if (zswap_next_shrink == memcg)
> -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> +       if (zswap_next_shrink == memcg) {
> +               do {
> +                       zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> +               } while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
> +               /*
> +                * We verified the next memcg is online.  Even if the next
> +                * memcg is being offlined here, another cleaner must be
> +                * waiting for our lock.  We can leave the online memcg
> +                * reference.
> +                */

I thought we agreed to drop this comment :)

> +       }
>         spin_unlock(&zswap_shrink_lock);
>  }
>
> @@ -1304,43 +1323,49 @@ static void shrink_worker(struct work_struct *w)
>         /* Reclaim down to the accept threshold */
>         thr = zswap_accept_thr_pages();
>
> -       /* global reclaim will select cgroup in a round-robin fashion. */
> +       /* global reclaim will select cgroup in a round-robin fashion.

nit: s/global/Global

> +        *
> +        * We save iteration cursor memcg into zswap_next_shrink,
> +        * which can be modified by the offline memcg cleaner
> +        * zswap_memcg_offline_cleanup().
> +        *
> +        * Since the offline cleaner is called only once, we cannot leave an
> +        * offline memcg reference in zswap_next_shrink.
> +        * We can rely on the cleaner only if we get online memcg under lock.
> +        *
> +        * If we get an offline memcg, we cannot determine if the cleaner has
> +        * already been called or will be called later. We must put back the
> +        * reference before returning from this function. Otherwise, the
> +        * offline memcg left in zswap_next_shrink will hold the reference
> +        * until the next run of shrink_worker().
> +        */
>         do {
>                 spin_lock(&zswap_shrink_lock);
> -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> -               memcg = zswap_next_shrink;
>
>                 /*
> -                * We need to retry if we have gone through a full round trip, or if we
> -                * got an offline memcg (or else we risk undoing the effect of the
> -                * zswap memcg offlining cleanup callback). This is not catastrophic
> -                * per se, but it will keep the now offlined memcg hostage for a while.
> -                *
> +                * Start shrinking from the next memcg after zswap_next_shrink.
> +                * When the offline cleaner has already advanced the cursor,
> +                * advancing the cursor here overlooks one memcg, but this
> +                * should be negligibly rare.
> +                */
> +               do {
> +                       memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> +                       zswap_next_shrink = memcg;
> +               } while (memcg && !mem_cgroup_tryget_online(memcg));

Let's move spin_lock() and spin_unlock() to be right before and after
the do-while loop, similar to zswap_memcg_offline_cleanup(). This
should make it more obvious what the lock is protecting.

Actually, maybe it would be cleaner at this point to move the
iteration to find the next online memcg under lock into a helper, and
use it here and in zswap_memcg_offline_cleanup(). zswap_shrink_lock
and zswap_next_shrink can be made static to this helper and maybe some
of the comments could live there instead. Something like
zswap_next_shrink_memcg().

This will abstract this whole iteration logic and make shrink_worker()
significantly easier to follow. WDYT?

I can do that in a followup cleanup patch if you prefer this as well.

> +               /*
>                  * Note that if we got an online memcg, we will keep the extra
>                  * reference in case the original reference obtained by mem_cgroup_iter
>                  * is dropped by the zswap memcg offlining callback, ensuring that the
>                  * memcg is not killed when we are reclaiming.
>                  */
> -               if (!memcg) {
> -                       spin_unlock(&zswap_shrink_lock);
> -                       if (++failures == MAX_RECLAIM_RETRIES)
> -                               break;
> -
> -                       goto resched;
> -               }
> -
> -               if (!mem_cgroup_tryget_online(memcg)) {
> -                       /* drop the reference from mem_cgroup_iter() */
> -                       mem_cgroup_iter_break(NULL, memcg);
> -                       zswap_next_shrink = NULL;
> -                       spin_unlock(&zswap_shrink_lock);
> +               spin_unlock(&zswap_shrink_lock);
>
> +               if (!memcg) {
>                         if (++failures == MAX_RECLAIM_RETRIES)
>                                 break;
>
>                         goto resched;
>                 }
> -               spin_unlock(&zswap_shrink_lock);
>
>                 ret = shrink_memcg(memcg);
>                 /* drop the extra reference */
> --
> 2.43.0
>



* Re: [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration
  2024-07-29 18:24   ` Yosry Ahmed
@ 2024-07-30  5:24     ` Takero Funaki
  2024-07-30 18:23       ` Yosry Ahmed
  0 siblings, 1 reply; 7+ messages in thread
From: Takero Funaki @ 2024-07-30  5:24 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Andrew Morton,
	linux-mm, linux-kernel

On Tue, Jul 30, 2024 at 3:24 Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Sat, Jul 27, 2024 at 4:06 PM Takero Funaki <flintglass@gmail.com> wrote:
> >
> > This patch fixes an issue where the zswap global shrinker stopped
> > iterating through the memcg tree.
> >
> > The problem was that shrink_worker() would restart iterating memcg tree
> > from the tree root, considering an offline memcg as a failure, and abort
> > shrinking after encountering the same offline memcg 16 times even if
> > there is only one offline memcg. After this change, an offline memcg in
> > the tree is no longer considered a failure. This allows the shrinker to
> > continue shrinking the other online memcgs regardless of whether an
> > offline memcg exists, resulting in higher zswap writeback activity.
> >
> > To avoid holding refcount of offline memcg encountered during the memcg
> > tree walking, shrink_worker() must continue iterating to release the
> > offline memcg to ensure the next memcg stored in the cursor is online.
> >
> > The offline memcg cleaner has also been changed to avoid the same issue.
> > When the next memcg of the offlined memcg is also offline, the refcount
> > stored in the iteration cursor was held until the next shrink_worker()
> > run. The cleaner must release the offline memcg recursively.
> >
> > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> > Signed-off-by: Takero Funaki <flintglass@gmail.com>
> > ---
> >  mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++------------------
> >  1 file changed, 49 insertions(+), 24 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index adeaf9c97fde..e9b5343256cd 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -765,12 +765,31 @@ void zswap_folio_swapin(struct folio *folio)
> >         }
> >  }
> >
> > +/*
> > + * This function should be called when a memcg is being offlined.
> > + *
> > + * Since the global shrinker shrink_worker() may hold a reference
> > + * of the memcg, we must check and release the reference in
> > + * zswap_next_shrink.
> > + *
> > + * shrink_worker() must handle the case where this function releases
> > + * the reference of memcg being shrunk.
> > + */
> >  void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
> >  {
> >         /* lock out zswap shrinker walking memcg tree */
> >         spin_lock(&zswap_shrink_lock);
> > -       if (zswap_next_shrink == memcg)
> > -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > +       if (zswap_next_shrink == memcg) {
> > +               do {
> > +                       zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > +               } while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
> > +               /*
> > +                * We verified the next memcg is online.  Even if the next
> > +                * memcg is being offlined here, another cleaner must be
> > +                * waiting for our lock.  We can leave the online memcg
> > +                * reference.
> > +                */
>
> I thought we agreed to drop this comment :)
>
> > +       }
> >         spin_unlock(&zswap_shrink_lock);
> >  }
> >
> > @@ -1304,43 +1323,49 @@ static void shrink_worker(struct work_struct *w)
> >         /* Reclaim down to the accept threshold */
> >         thr = zswap_accept_thr_pages();
> >
> > -       /* global reclaim will select cgroup in a round-robin fashion. */
> > +       /* global reclaim will select cgroup in a round-robin fashion.
>
> nit: s/global/Global
>
> > +        *
> > +        * We save iteration cursor memcg into zswap_next_shrink,
> > +        * which can be modified by the offline memcg cleaner
> > +        * zswap_memcg_offline_cleanup().
> > +        *
> > +        * Since the offline cleaner is called only once, we cannot leave an
> > +        * offline memcg reference in zswap_next_shrink.
> > +        * We can rely on the cleaner only if we get online memcg under lock.
> > +        *
> > +        * If we get an offline memcg, we cannot determine if the cleaner has
> > +        * already been called or will be called later. We must put back the
> > +        * reference before returning from this function. Otherwise, the
> > +        * offline memcg left in zswap_next_shrink will hold the reference
> > +        * until the next run of shrink_worker().
> > +        */
> >         do {
> >                 spin_lock(&zswap_shrink_lock);
> > -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > -               memcg = zswap_next_shrink;
> >
> >                 /*
> > -                * We need to retry if we have gone through a full round trip, or if we
> > -                * got an offline memcg (or else we risk undoing the effect of the
> > -                * zswap memcg offlining cleanup callback). This is not catastrophic
> > -                * per se, but it will keep the now offlined memcg hostage for a while.
> > -                *
> > +                * Start shrinking from the next memcg after zswap_next_shrink.
> > +                * When the offline cleaner has already advanced the cursor,
> > +                * advancing the cursor here overlooks one memcg, but this
> > +                * should be negligibly rare.
> > +                */
> > +               do {
> > +                       memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > +                       zswap_next_shrink = memcg;
> > +               } while (memcg && !mem_cgroup_tryget_online(memcg));
>
> Let's move spin_lock() and spin_unlock() to be right before and after
> the do-while loop, similar to zswap_memcg_offline_cleanup(). This
> should make it more obvious what the lock is protecting.
>
> Actually, maybe it would be cleaner at this point to move the
> iteration to find the next online memcg under lock into a helper, and
> use it here and in zswap_memcg_offline_cleanup(). zswap_shrink_lock
> and zswap_next_shrink can be made static to this helper and maybe some
> of the comments could live there instead. Something like
> zswap_next_shrink_memcg().
>
> This will abstract this whole iteration logic and make shrink_worker()
> significantly easier to follow. WDYT?
>
> I can do that in a followup cleanup patch if you prefer this as well.
>

I'd really appreciate it. Sorry to have kept you waiting for a novice
coder. Thank you for all your comments and support.



* Re: [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration
  2024-07-30  5:24     ` Takero Funaki
@ 2024-07-30 18:23       ` Yosry Ahmed
  0 siblings, 0 replies; 7+ messages in thread
From: Yosry Ahmed @ 2024-07-30 18:23 UTC (permalink / raw)
  To: Takero Funaki
  Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Andrew Morton,
	linux-mm, linux-kernel

On Mon, Jul 29, 2024 at 10:25 PM Takero Funaki <flintglass@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 3:24 Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Sat, Jul 27, 2024 at 4:06 PM Takero Funaki <flintglass@gmail.com> wrote:
> > >
> > > This patch fixes an issue where the zswap global shrinker stopped
> > > iterating through the memcg tree.
> > >
> > > The problem was that shrink_worker() would restart iterating memcg tree
> > > from the tree root, considering an offline memcg as a failure, and abort
> > > shrinking after encountering the same offline memcg 16 times even if
> > > there is only one offline memcg. After this change, an offline memcg in
> > > the tree is no longer considered a failure. This allows the shrinker to
> > > continue shrinking the other online memcgs regardless of whether an
> > > offline memcg exists, resulting in higher zswap writeback activity.
> > >
> > > To avoid holding refcount of offline memcg encountered during the memcg
> > > tree walking, shrink_worker() must continue iterating to release the
> > > offline memcg to ensure the next memcg stored in the cursor is online.
> > >
> > > The offline memcg cleaner has also been changed to avoid the same issue.
> > > When the next memcg of the offlined memcg is also offline, the refcount
> > > stored in the iteration cursor was held until the next shrink_worker()
> > > run. The cleaner must release the offline memcg recursively.
> > >
> > > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> > > Signed-off-by: Takero Funaki <flintglass@gmail.com>
> > > ---
> > >  mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++------------------
> > >  1 file changed, 49 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index adeaf9c97fde..e9b5343256cd 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -765,12 +765,31 @@ void zswap_folio_swapin(struct folio *folio)
> > >         }
> > >  }
> > >
> > > +/*
> > > + * This function should be called when a memcg is being offlined.
> > > + *
> > > + * Since the global shrinker shrink_worker() may hold a reference
> > > + * of the memcg, we must check and release the reference in
> > > + * zswap_next_shrink.
> > > + *
> > > + * shrink_worker() must handle the case where this function releases
> > > + * the reference of memcg being shrunk.
> > > + */
> > >  void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
> > >  {
> > >         /* lock out zswap shrinker walking memcg tree */
> > >         spin_lock(&zswap_shrink_lock);
> > > -       if (zswap_next_shrink == memcg)
> > > -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > > +       if (zswap_next_shrink == memcg) {
> > > +               do {
> > > +                       zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > > +               } while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
> > > +               /*
> > > +                * We verified the next memcg is online.  Even if the next
> > > +                * memcg is being offlined here, another cleaner must be
> > > +                * waiting for our lock.  We can leave the online memcg
> > > +                * reference.
> > > +                */
> >
> > I thought we agreed to drop this comment :)
> >
> > > +       }
> > >         spin_unlock(&zswap_shrink_lock);
> > >  }
> > >
> > > @@ -1304,43 +1323,49 @@ static void shrink_worker(struct work_struct *w)
> > >         /* Reclaim down to the accept threshold */
> > >         thr = zswap_accept_thr_pages();
> > >
> > > -       /* global reclaim will select cgroup in a round-robin fashion. */
> > > +       /* global reclaim will select cgroup in a round-robin fashion.
> >
> > nit: s/global/Global
> >
> > > +        *
> > > +        * We save iteration cursor memcg into zswap_next_shrink,
> > > +        * which can be modified by the offline memcg cleaner
> > > +        * zswap_memcg_offline_cleanup().
> > > +        *
> > > +        * Since the offline cleaner is called only once, we cannot leave an
> > > +        * offline memcg reference in zswap_next_shrink.
> > > +        * We can rely on the cleaner only if we get online memcg under lock.
> > > +        *
> > > +        * If we get an offline memcg, we cannot determine if the cleaner has
> > > +        * already been called or will be called later. We must put back the
> > > +        * reference before returning from this function. Otherwise, the
> > > +        * offline memcg left in zswap_next_shrink will hold the reference
> > > +        * until the next run of shrink_worker().
> > > +        */
> > >         do {
> > >                 spin_lock(&zswap_shrink_lock);
> > > -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > > -               memcg = zswap_next_shrink;
> > >
> > >                 /*
> > > -                * We need to retry if we have gone through a full round trip, or if we
> > > -                * got an offline memcg (or else we risk undoing the effect of the
> > > -                * zswap memcg offlining cleanup callback). This is not catastrophic
> > > -                * per se, but it will keep the now offlined memcg hostage for a while.
> > > -                *
> > > +                * Start shrinking from the next memcg after zswap_next_shrink.
> > > +                * When the offline cleaner has already advanced the cursor,
> > > +                * advancing the cursor here overlooks one memcg, but this
> > > +                * should be negligibly rare.
> > > +                */
> > > +               do {
> > > +                       memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> > > +                       zswap_next_shrink = memcg;
> > > +               } while (memcg && !mem_cgroup_tryget_online(memcg));
> >
> > Let's move spin_lock() and spin_unlock() to be right before and after
> > the do-while loop, similar to zswap_memcg_offline_cleanup(). This
> > should make it more obvious what the lock is protecting.
> >
> > Actually, maybe it would be cleaner at this point to move the
> > iteration to find the next online memcg under lock into a helper, and
> > use it here and in zswap_memcg_offline_cleanup(). zswap_shrink_lock
> > and zswap_next_shrink can be made static to this helper and maybe some
> > of the comments could live there instead. Something like
> > zswap_next_shrink_memcg().
> >
> > This will abstract this whole iteration logic and make shrink_worker()
> > significantly easier to follow. WDYT?
> >
> > I can do that in a followup cleanup patch if you prefer this as well.
> >
>
> I'd really appreciate it. Sorry to have kept you waiting on a novice
> coder. Thank you for all your comments and support.

I will send a followup patch after this lands in mm-unstable. For this
patch, feel free to add:

Acked-by: Yosry Ahmed <yosryahmed@google.com>
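
For readers following the helper suggestion above, here is a minimal
user-space sketch of the idea. It is not the followup patch and not kernel
code: struct memcg_model, model_iter() and next_online_memcg() are
hypothetical names, the memcg tree is flattened into an array, and
model_iter() only imitates the reference handoff of
mem_cgroup_iter(NULL, prev, NULL). Locking is omitted; the real helper
would run with zswap_shrink_lock held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; not the kernel type. */
struct memcg_model {
	bool online;
	int refcount;
};

#define NR_MEMCG 4
static struct memcg_model tree[NR_MEMCG];
static struct memcg_model *cursor;	/* models zswap_next_shrink */

/*
 * Models mem_cgroup_iter(NULL, prev, NULL): drop the reference on prev,
 * take a reference on its successor, and return NULL after the last
 * entry (one full round trip).
 */
static struct memcg_model *model_iter(struct memcg_model *prev)
{
	struct memcg_model *next;

	if (!prev)
		next = &tree[0];
	else if (prev == &tree[NR_MEMCG - 1])
		next = NULL;
	else
		next = prev + 1;

	if (prev)
		prev->refcount--;
	if (next)
		next->refcount++;
	return next;
}

/*
 * Hypothetical helper: advance the cursor until it holds an online
 * memcg or NULL, so no offline memcg reference is left in the cursor.
 */
static struct memcg_model *next_online_memcg(void)
{
	do {
		cursor = model_iter(cursor);
	} while (cursor && !cursor->online);
	return cursor;
}
```

In this model every offline entry that is skipped has its reference
dropped by the next model_iter() call, which is the property both
shrink_worker() and zswap_memcg_offline_cleanup() rely on. The
shrink_worker() side would additionally take its own reference on the
returned memcg, as the patch does with mem_cgroup_tryget_online().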



* Re: [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration
  2024-07-27 23:06 ` [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration Takero Funaki
  2024-07-29 18:24   ` Yosry Ahmed
@ 2024-07-30 19:18   ` Nhat Pham
  1 sibling, 0 replies; 7+ messages in thread
From: Nhat Pham @ 2024-07-30 19:18 UTC (permalink / raw)
  To: Takero Funaki
  Cc: Johannes Weiner, Yosry Ahmed, Chengming Zhou, Andrew Morton,
	linux-mm, linux-kernel

On Sat, Jul 27, 2024 at 4:06 PM Takero Funaki <flintglass@gmail.com> wrote:
>
> This patch fixes an issue where the zswap global shrinker stopped
> iterating through the memcg tree.
>
> The problem was that shrink_worker() would restart iterating the memcg
> tree from the tree root, considering an offline memcg as a failure, and
> abort shrinking after encountering the same offline memcg 16 times even
> if there is only one offline memcg. After this change, an offline memcg
> in the tree is no longer considered a failure. This allows the shrinker
> to continue shrinking the other online memcgs regardless of whether an
> offline memcg exists, which gives higher zswap writeback activity.
>
> To avoid holding the refcount of an offline memcg encountered during the
> memcg tree walk, shrink_worker() must continue iterating past it. This
> releases the offline memcg and ensures that the memcg stored in the
> cursor is online.
>
> The offline memcg cleaner has also been changed to avoid the same issue.
> When the next memcg of the offlined memcg was also offline, the refcount
> stored in the iteration cursor was held until the next shrink_worker()
> run. The cleaner must keep advancing, releasing each offline memcg, until
> the cursor holds an online memcg.
>
> Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> Signed-off-by: Takero Funaki <flintglass@gmail.com>
> ---
>  mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 49 insertions(+), 24 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index adeaf9c97fde..e9b5343256cd 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -765,12 +765,31 @@ void zswap_folio_swapin(struct folio *folio)
>         }
>  }
>
> +/*
> + * This function should be called when a memcg is being offlined.
> + *
> + * Since the global shrinker shrink_worker() may hold a reference
> + * of the memcg, we must check and release the reference in
> + * zswap_next_shrink.
> + *
> + * shrink_worker() must handle the case where this function releases
> + * the reference of memcg being shrunk.
> + */
>  void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
>  {
>         /* lock out zswap shrinker walking memcg tree */
>         spin_lock(&zswap_shrink_lock);
> -       if (zswap_next_shrink == memcg)
> -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> +       if (zswap_next_shrink == memcg) {
> +               do {
> +                       zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> +               } while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
> +               /*
> +                * We verified the next memcg is online.  Even if the next
> +                * memcg is being offlined here, another cleaner must be
> +                * waiting for our lock.  We can leave the online memcg
> +                * reference.
> +                */
> +       }
>         spin_unlock(&zswap_shrink_lock);
>  }
>
> @@ -1304,43 +1323,49 @@ static void shrink_worker(struct work_struct *w)
>         /* Reclaim down to the accept threshold */
>         thr = zswap_accept_thr_pages();
>
> -       /* global reclaim will select cgroup in a round-robin fashion. */
> +       /* global reclaim will select cgroup in a round-robin fashion.
> +        *
> +        * We save iteration cursor memcg into zswap_next_shrink,
> +        * which can be modified by the offline memcg cleaner
> +        * zswap_memcg_offline_cleanup().
> +        *
> +        * Since the offline cleaner is called only once, we cannot leave an
> +        * offline memcg reference in zswap_next_shrink.
> +        * We can rely on the cleaner only if we get online memcg under lock.
> +        *
> +        * If we get an offline memcg, we cannot determine if the cleaner has
> +        * already been called or will be called later. We must put back the
> +        * reference before returning from this function. Otherwise, the
> +        * offline memcg left in zswap_next_shrink will hold the reference
> +        * until the next run of shrink_worker().
> +        */
>         do {
>                 spin_lock(&zswap_shrink_lock);
> -               zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> -               memcg = zswap_next_shrink;
>
>                 /*
> -                * We need to retry if we have gone through a full round trip, or if we
> -                * got an offline memcg (or else we risk undoing the effect of the
> -                * zswap memcg offlining cleanup callback). This is not catastrophic
> -                * per se, but it will keep the now offlined memcg hostage for a while.
> -                *
> +                * Start shrinking from the next memcg after zswap_next_shrink.
> +                * When the offline cleaner has already advanced the cursor,
> +                * advancing the cursor here overlooks one memcg, but this
> +                * should be negligibly rare.
> +                */
> +               do {
> +                       memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
> +                       zswap_next_shrink = memcg;
> +               } while (memcg && !mem_cgroup_tryget_online(memcg));
> +               /*

Yeah, I agree with Yosry's comment: the do-while loop looks like it
could become a helper in some form. That aside, the rest LGTM:

Reviewed-by: Nhat Pham <nphamcs@gmail.com>
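
The behaviour change described in the commit message quoted above can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions: the memcg tree is a flat array of online flags, each
successful visit stands for one shrink attempt, and both function names
are hypothetical. Only the "restart from the root, count a failure, give
up after 16" rule is taken from the commit message; the wrap-around and
stop conditions are simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RETRIES 16	/* the "16 times" from the commit message */

/*
 * Old behaviour as described in the commit message: an offline memcg
 * counts as a failure and the walk restarts from the first entry.
 * Returns the number of shrink attempts; visited[i] counts how often
 * entry i was shrunk.
 */
static int walk_restart_on_offline(const bool *online, size_t n,
				   int target, int *visited)
{
	int failures = 0, done = 0;
	size_t i = 0;

	while (done < target && n > 0) {
		if (i == n) {		/* full round trip: wrap around */
			i = 0;
			continue;
		}
		if (!online[i]) {
			if (++failures == MAX_RETRIES)
				break;
			i = 0;		/* restart from the root */
			continue;
		}
		visited[i]++;
		done++;
		i++;
	}
	return done;
}

/*
 * New behaviour: offline entries are skipped without counting as a
 * failure. The walk stops early only if a whole round trip finds no
 * online entry (a simplification made for this model).
 */
static int walk_skip_offline(const bool *online, size_t n,
			     int target, int *visited)
{
	int done = 0;
	size_t i = 0, skipped = 0;

	while (done < target && skipped < n) {
		if (i == n)
			i = 0;
		if (!online[i]) {
			skipped++;	/* consecutive offline entries */
			i++;
			continue;
		}
		skipped = 0;
		visited[i]++;
		done++;
		i++;
	}
	return done;
}
```

With a single offline entry in the second slot, the first walk never
reaches the entries behind it and gives up once the failure limit is hit,
while the second walk keeps cycling over all online entries until the
target is met.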



end of thread, other threads:[~2024-07-30 19:18 UTC | newest]

Thread overview: 7+ messages
2024-07-27 23:06 [PATCH v4 0/2] mm: zswap: fixes for global shrinker Takero Funaki
2024-07-27 23:06 ` [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration Takero Funaki
2024-07-29 18:24   ` Yosry Ahmed
2024-07-30  5:24     ` Takero Funaki
2024-07-30 18:23       ` Yosry Ahmed
2024-07-30 19:18   ` Nhat Pham
2024-07-27 23:06 ` [PATCH v4 2/2] mm: zswap: fix global shrinker error handling logic Takero Funaki
