linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Takero Funaki <flintglass@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Yosry Ahmed <yosryahmed@google.com>,
	 Chengming Zhou <chengming.zhou@linux.dev>,
	Jonathan Corbet <corbet@lwn.net>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 6/6] mm: zswap: interrupt shrinker writeback while pagein/out IO
Date: Mon, 8 Jul 2024 12:17:12 -0700	[thread overview]
Message-ID: <CAKEwX=NzLWSO4oiNBFhsqurOCX=cYVWTUkhRdmOsrqxOz+xi0g@mail.gmail.com> (raw)
In-Reply-To: <20240706022523.1104080-7-flintglass@gmail.com>

On Fri, Jul 5, 2024 at 7:25 PM Takero Funaki <flintglass@gmail.com> wrote:
>
> To prevent the zswap global shrinker from writing back pages
> simultaneously with IO performed for memory reclaim and faults, delay
> the writeback when zswap_store() rejects pages or zswap_load() cannot
> find entry in pool.
>
> When the zswap shrinker is running and zswap rejects an incoming page,
> simulatenous zswap writeback and the rejected page lead to IO contention
> on swap device. In this case, the writeback of the rejected page must be
> higher priority as it is necessary for actual memory reclaim progress.
> The zswap global shrinker can run in the background and should not
> interfere with memory reclaim.

Do you see this problem actually manifesting in real life? Does not
sound infeasible to me, but I wonder how likely this is the case.

Do you have any userspace-visible metrics, or any tracing logs etc.
that proves that it actually happens?

This might also affect the dynamic shrinker as well FWIW.

>
> The same logic applies to zswap_load(). When zswap cannot find requested
> page from pool and read IO is performed, shrinker should be interrupted.
>
> To avoid IO contention, save the timestamp jiffies when zswap cannot
> buffer the pagein/out IO and interrupt the global shrinker. The shrinker
> resumes the writeback in 500 msec since the saved timestamp.
>
> Signed-off-by: Takero Funaki <flintglass@gmail.com>
> ---
>  mm/zswap.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index def0f948a4ab..59ba4663c74f 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -35,6 +35,8 @@
>  #include <linux/pagemap.h>
>  #include <linux/workqueue.h>
>  #include <linux/list_lru.h>
> +#include <linux/delay.h>
> +#include <linux/jiffies.h>
>
>  #include "swap.h"
>  #include "internal.h"
> @@ -176,6 +178,14 @@ static bool zswap_next_shrink_changed;
>  static struct work_struct zswap_shrink_work;
>  static struct shrinker *zswap_shrinker;
>
> +/*
> + * To avoid IO contention between pagein/out and global shrinker writeback,
> + * track the last jiffies of pagein/out and delay the writeback.
> + * Default to 500msec in alignment with mq-deadline read timeout.

If there is a future version, could you include the reason why you
select 500msec in the patch's changelog as well?

> + */
> +#define ZSWAP_GLOBAL_SHRINKER_DELAY_MS 500
> +static unsigned long zswap_shrinker_delay_start;
> +
>  /*
>   * struct zswap_entry
>   *
> @@ -244,6 +254,14 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
>         pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,         \
>                  zpool_get_type((p)->zpools[0]))
>
> +static inline void zswap_shrinker_delay_update(void)
> +{
> +       unsigned long now = jiffies;
> +
> +       if (now != zswap_shrinker_delay_start)
> +               zswap_shrinker_delay_start = now;
> +}

Hmmm is there a reason why we do not just do:

zswap_shrinker_delay_start = jiffies;

or, more unnecessarily:

unsigned long now = jiffies;

zswap_shrinker_delay_start = now;

IOW, why the branching here? Does not seem necessary to me, but
perhaps there is a correctness/compiler reason I'm not seeing?

In fact, if it's the first version, then we could manually inline it.

Additionally/alternatively, I wonder if it is more convenient to do it
at the mm/page_io.c zswap callsites, i.e whenever zswap_store() and
zswap_load() returns false, then delay the shrinker before proceeding
with the IO steps.

> +
>  /*********************************
>  * pool functions
>  **********************************/
> @@ -1378,6 +1396,8 @@ static void shrink_worker(struct work_struct *w)
>         struct mem_cgroup *memcg;
>         int ret, failures = 0, progress;
>         unsigned long thr;
> +       unsigned long now, sleepuntil;
> +       const unsigned long delay = msecs_to_jiffies(ZSWAP_GLOBAL_SHRINKER_DELAY_MS);
>
>         /* Reclaim down to the accept threshold */
>         thr = zswap_accept_thr_pages();
> @@ -1405,6 +1425,21 @@ static void shrink_worker(struct work_struct *w)
>          * until the next run of shrink_worker().
>          */
>         do {
> +               /*
> +                * delay shrinking to allow the last rejected page completes
> +                * its writeback
> +                */
> +               sleepuntil = delay + READ_ONCE(zswap_shrinker_delay_start);

I assume we do not care about racy access here right? Same goes for
updates - I don't see any locks protecting these operations (but I
could be missing something).


> +               now = jiffies;
> +               /*
> +                * If zswap did not reject pages for long, sleepuntil-now may
> +                * underflow.  We assume the timestamp is valid only if
> +                * now < sleepuntil < now + delay + 1
> +                */
> +               if (time_before(now, sleepuntil) &&
> +                               time_before(sleepuntil, now + delay + 1))
> +                       fsleep(jiffies_to_usecs(sleepuntil - now));
> +
>                 spin_lock(&zswap_shrink_lock);
>
>                 /*
> @@ -1526,8 +1561,10 @@ bool zswap_store(struct folio *folio)
>         VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
>
>         /* Large folios aren't supported */
> -       if (folio_test_large(folio))
> +       if (folio_test_large(folio)) {
> +               zswap_shrinker_delay_update();
>                 return false;
> +       }
>
>         if (!zswap_enabled)
>                 goto check_old;
> @@ -1648,6 +1685,8 @@ bool zswap_store(struct folio *folio)
>         zswap_entry_cache_free(entry);
>  reject:
>         obj_cgroup_put(objcg);
> +       zswap_shrinker_delay_update();
> +
>         if (need_global_shrink)
>                 queue_work(shrink_wq, &zswap_shrink_work);
>  check_old:
> @@ -1691,8 +1730,10 @@ bool zswap_load(struct folio *folio)
>         else
>                 entry = xa_load(tree, offset);
>
> -       if (!entry)
> +       if (!entry) {
> +               zswap_shrinker_delay_update();
>                 return false;
> +       }
>
>         if (entry->length)
>                 zswap_decompress(entry, page);
> @@ -1835,6 +1876,8 @@ static int zswap_setup(void)
>         if (ret)
>                 goto hp_fail;
>
> +       zswap_shrinker_delay_update();
> +
>         shrink_wq = alloc_workqueue("zswap-shrink",
>                         WQ_UNBOUND, 1);
>         if (!shrink_wq)
> --
> 2.43.0
>


  reply	other threads:[~2024-07-08 19:17 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-06  2:25 [PATCH v2 0/6] mm: zswap: global shrinker fix and proactive shrink Takero Funaki
2024-07-06  2:25 ` [PATCH v2 1/6] mm: zswap: fix global shrinker memcg iteration Takero Funaki
2024-07-08  4:54   ` Chengming Zhou
2024-07-17  1:54   ` Yosry Ahmed
2024-07-06  2:25 ` [PATCH v2 2/6] mm: zswap: fix global shrinker error handling logic Takero Funaki
2024-07-17  2:39   ` Yosry Ahmed
2024-07-06  2:25 ` [PATCH v2 3/6] mm: zswap: proactive shrinking before pool size limit is hit Takero Funaki
2024-07-12 23:18   ` Nhat Pham
2024-07-06  2:25 ` [PATCH v2 4/6] mm: zswap: make writeback run in the background Takero Funaki
2024-07-06  2:25 ` [PATCH v2 5/6] mm: zswap: store incompressible page as-is Takero Funaki
2024-07-06 23:53   ` Nhat Pham
2024-07-07  9:38     ` Takero Funaki
2024-07-12 22:36       ` Nhat Pham
2024-07-08  3:56   ` Chengming Zhou
2024-07-08 13:44     ` Takero Funaki
2024-07-09 13:26       ` Chengming Zhou
2024-07-12 22:47         ` Nhat Pham
2024-07-16  2:30           ` Chengming Zhou
2024-07-06  2:25 ` [PATCH v2 6/6] mm: zswap: interrupt shrinker writeback while pagein/out IO Takero Funaki
2024-07-08 19:17   ` Nhat Pham [this message]
2024-07-09  0:57   ` Nhat Pham
2024-07-10 21:21     ` Takero Funaki
2024-07-10 22:10       ` Nhat Pham
2024-07-15  7:33         ` Takero Funaki
2024-07-06 17:32 ` [PATCH v2 0/6] mm: zswap: global shrinker fix and proactive shrink Andrew Morton
2024-07-07 10:54   ` Takero Funaki
2024-07-09  0:53 ` Nhat Pham
2024-07-10 22:26   ` Takero Funaki
2024-07-12 23:02     ` Nhat Pham
2024-07-15  8:20       ` Takero Funaki
2024-07-26 18:13         ` Nhat Pham
2024-07-26 18:25           ` Nhat Pham
2024-07-17  2:53     ` Yosry Ahmed
2024-07-17 17:49       ` Nhat Pham
2024-07-17 18:05         ` Yosry Ahmed
2024-07-17 19:01           ` Nhat Pham
2024-07-19 14:55           ` Takero Funaki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=NzLWSO4oiNBFhsqurOCX=cYVWTUkhRdmOsrqxOz+xi0g@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=chengming.zhou@linux.dev \
    --cc=corbet@lwn.net \
    --cc=flintglass@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox