linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Chris Li <chrisl@kernel.org>
To: SeongJae Park <sj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	 David Hildenbrand <david@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Nhat Pham <nphamcs@gmail.com>,
	 Yosry Ahmed <yosry.ahmed@linux.dev>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 Takero Funaki <flintglass@gmail.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
Date: Fri, 15 Aug 2025 19:20:35 -0700	[thread overview]
Message-ID: <CAF8kJuPCpnGAoz=1CBwe46ytpcR0ZUMAEWLFQce7eUWkb+0ERA@mail.gmail.com> (raw)
In-Reply-To: <20250816000701.90784-1-sj@kernel.org>

On Fri, Aug 15, 2025 at 5:07 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Fri, 15 Aug 2025 15:28:59 -0700 Chris Li <chrisl@kernel.org> wrote:
> > Sure. I am most interested in getting the best overall solution. No
> > objects to get it now vs later.
>
> Thank you for being flexible, Chris.  I'm also looking forward to keep
> collaborating with you on the followup works!

Likewise.

>
> [...]
> > > > Is there any reason you want to store the page in zpool when the
> > > > compression engine (not the ratio) fails?
> > >
> > > The main problem this patch tries to solve is the LRU order corruption.  In any
> > > case, I want to keep the order if possible.
> >
> > In that case, does it mean that in order to keep the LRU, you never
> > want to write from zswap to a real back end device?
>
> Not always, but until we have to write back zswapped pages to real back end
> deevice, and all zswapped pages of lower LRU-order are already wrote back.

I see, that makes sense. Only writers to the lower tier while
maintaining LRU order.
Thanks for the explanation. I can see the bigger picture now.

> > I slept over it a bit. Now I think we should make this a counter of
> > how many uncompressed pages count stored in zswap. Preperbelly as per
> > memcg counter.
>
> I agree that could be useful.  I will add the counter in the next version (v4).
> But making it for each memcg may be out of the scope of this patch, in my
> opinion.  Would you mind doing per-memcg counter implementation as a followup?

No objections. If the incompressible page count is very bad, it will
drag down the overall compression ratio as well. So we do have some
per memcg signal to pick it up. The trade off is how complex we want
to go for those marginal finer grain measurements.

> > I saw that you implement it as a counter in your V1.
>
> Yes, though it was only for internal usage and therefore not exposed to the
> user space.  I will make it again and expose to the user space via debugfs.
> Say, stored_uncompressed_pages?

I am not very good at naming either. Maybe
"stored_incompressible_pages"? Uncompressed pages are the resulting
state, it sounds like we choose to store them uncompressed. The root
cause is that those pages are incompressible. That is the nature of
the page,  we don't have a choice to compress them.

>
> > Does the zsmalloc
> > already track this information in the zspool class?
>
> zsmalloc provides such information when CONFIG_ZSMALLOC_STAT is enabled, to my
> understanding.

Thanks. I think that is likely global not per memcg.

>
> [...]
> > I am actually less interested in the absolute failure number which
> > keeps increasing, more on how much incompressible zswap is stored.
> > That counter + number of engine errors should be perfect.
>
> Sounds good.  So the next version (v4) of this patch will provide two new
> debugfs counters, namely compress_engine_fail, and stored_uncompressed_pages.

Sounds good. Maybe "crypto_compress_fail", the engine is the common
name as I call it. Here the engine refers to the crypto library so
"crypto_compress_fail" matches the code better.

>
> [...]
> > > I agree this code is nice to read.  Nonetheless I have to say the behavior is
> > > somewhat different from what I want.
> >
> > Even if you keep the current behavior, you can move the invert the
> > test condition and then remove the "else + goto" similar to the above.
> > That will make your code less and flatter. I will need to think about
> > whether we can assign the return value less.
>
> Nice idea.  What about below?
>
>         if (comp_ret) {

You can consider adding (comp_ret || !dlen) , because if dlen == 0,
something is seriously wrong with the compression, that is a real
error.

>                 zswap_compress_engine_fail++;
>                 dlen = PAGE_SIZE;
>         }
>         if (dlen >= PAGE_SIZE) {
>                 zswap_compress_fail++;
>                 if (!mem_cgroup_zswap_writeback_enabled(
>                                         folio_memcg(page_folio(page)))) {
>                         comp_ret = -EINVAL;
>                         goto unlock;
>                 }
>                 comp_ret = 0;
>                 dlen = PAGE_SIZE;
>                 dst = kmap_local_page(page);
>         }

Looks very good to me. Much cleaner than the V2. Thanks for adopting that.

> > Another point I would like to make is that you currently make the cut
> > off threshold as page size. The ideal threshold might be something
> > slightly smaller than page size. The reason is that the zsmalloc has a
> > fixed size chuck to store it. If your page is close to full, it will
> > store the data in the same class as the full page. You are not gaining
> > anything from zsmalloc if the store page compression ratio is 95%.
> > That 5% will be in the waste in the zsmalloc class fragment anyway. So
> > the trade off is, decompress 95% of a page vs memcpy 100% of a page.
> > It is likely memcpy 100% is faster. I don't know the exact ideal cut
> > off of threshold. If I had to guess, I would guess below 90%.
>
> Agreed, this could be another nice followup work to do.

Ack.

>
> >
> > >
> > > > I can
> > > > be biased towards my own code :-).
> > > > I think we should treat the compression engine failure separately from
> > > > the compression rate failure. The engine failure is rare and we should
> > > > know about it as a real error. The compression rate failure is normal.
> > >
> > > Again, I agree having the observability would be nice.  I can add a new counter
> > > for that, like below attached patch, for example.
> >
> > I would love that. Make that per memcg if possible :-)
>
> As mentioned above, I would like to make only a global counter on debugfs for
> now, if you don't mind.  Let me know if you mind.

I don't mind. We can take the incremental approach.

Chris


  reply	other threads:[~2025-08-16  2:20 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-12 17:00 SeongJae Park
2025-08-12 18:52 ` Nhat Pham
2025-08-13 17:07 ` Chris Li
2025-08-13 18:20   ` SeongJae Park
2025-08-15 22:28     ` Chris Li
2025-08-15 23:08       ` Nhat Pham
2025-08-16  0:14         ` SeongJae Park
2025-08-16  2:23           ` Chris Li
2025-08-18 18:18             ` SeongJae Park
2025-08-18 20:33               ` Chris Li
2025-08-16  0:30         ` Chris Li
2025-08-16  0:07       ` SeongJae Park
2025-08-16  2:20         ` Chris Li [this message]
2025-08-13 18:32   ` Nhat Pham
2025-08-15 22:34     ` Chris Li
2025-08-15 22:44       ` Nhat Pham
2025-08-13 18:58   ` Hugh Dickins
2025-08-15 22:38     ` Chris Li
2025-08-13 19:42   ` Shakeel Butt
2025-08-13 20:48     ` Johannes Weiner
2025-08-15 22:44       ` Chris Li
2025-08-15 22:42     ` Chris Li
2025-08-14 16:23 ` Johannes Weiner
2025-08-14 17:04   ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF8kJuPCpnGAoz=1CBwe46ytpcR0ZUMAEWLFQce7eUWkb+0ERA@mail.gmail.com' \
    --to=chrisl@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=david@redhat.com \
    --cc=flintglass@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=sj@kernel.org \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox