linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	nphamcs@gmail.com, chengming.zhou@linux.dev,
	 usamaarif642@gmail.com, shakeel.butt@linux.dev,
	ryan.roberts@arm.com,  ying.huang@intel.com, 21cnbao@gmail.com,
	akpm@linux-foundation.org,  nanhai.zou@intel.com,
	wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
Date: Wed, 25 Sep 2024 11:30:34 -0700	[thread overview]
Message-ID: <CAJD7tkY8D14j-e6imW9NxZCjTbx8tu_VaKDbRRQMdSeKX_kBuw@mail.gmail.com> (raw)
In-Reply-To: <20240925134008.GA875661@cmpxchg.org>

[..]
> > > +       /*
> > > +        * Check cgroup limits:
> > > +        *
> > > +        * The cgroup zswap limit check is done once at the beginning of an
> > > +        * mTHP store, and not within zswap_store_page() for each page
> > > +        * in the mTHP. We do however check the zswap pool limits at the
> > > +        * start of zswap_store_page(). What this means is, the cgroup
> > > +        * could go over the limits by at most (HPAGE_PMD_NR - 1) pages.
> > > +        * However, the per-store-page zswap pool limits check should
> > > +        * hopefully trigger the cgroup aware and zswap LRU aware global
> > > +        * reclaim implemented in the shrinker. If this assumption holds,
> > > +        * the cgroup exceeding the zswap limits could potentially be
> > > +        * resolved before the next zswap_store, and if it is not, the next
> > > +        * zswap_store would fail the cgroup zswap limit check at the start.
> > > +        */
> >
> > I do not really like this. Allowing going one page above the limit is
> > one thing, but one THP above the limit seems too much. I also don't
> > like relying on the repeated limit checking in zswap_store_page(), if
> > anything I think that should be batched too.
> >
> > Is it too unreasonable to maintain the average compression ratio and
> > use that to estimate limit checking for both memcg and global limits?
> > Johannes, Nhat, any thoughts on this?
>
> I honestly don't think it's much of an issue. The global limit is
> huge, and the cgroup limit is to the best of my knowledge only used as
> a binary switch. Setting a non-binary limit - global or cgroup - seems
> like a bit of an obscure usecase to me, because in the vast majority
> of cases it's preferable to keep compresing over declaring OOM.
>
> And even if you do have some granular limit, the workload size scales
> with it. It's not like you have a thousand THPs in a 10M cgroup.

The memcg limit and zswap limit can be disproportionate, although that
shouldn't be common.

>
> If this ever becomes an issue, we can handle it in a fastpath-slowpath
> scheme: check the limit up front for fast-path failure if we're
> already maxed out, just like now; then make obj_cgroup_charge_zswap()
> atomically charge against zswap.max and unwind the store if we raced.
>
> For now, I would just keep the simple version we currently have: check
> once in zswap_store() and then just go ahead for the whole folio.

I am not totally against this but I feel like this is too optimistic.
I think we can keep it simple-ish by maintaining an ewma for the
compression ratio, we already have primitives for this (see
DECLARE_EWMA).

Then in zswap_store(), we can use the ewma to estimate the compressed
size and use it to do the memcg and global limit checks once, like we
do today. Instead of just checking if we are below the limits, we
check if we have enough headroom for the estimated compressed size.
Then we call zswap_store_page() to do the per-page stuff, then do
batched charging and stats updates.

If you think that's an overkill we can keep doing the limit checks as
we do today,
but I would still like to see batching of all the limit checks,
charging, and stats updates. It makes little sense otherwise.


  reply	other threads:[~2024-09-25 18:31 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-24  1:17 [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios Kanchana P Sridhar
2024-09-24  1:17 ` [PATCH v7 1/8] mm: Define obj_cgroup_get() if CONFIG_MEMCG is not defined Kanchana P Sridhar
2024-09-24 16:45   ` Nhat Pham
2024-09-24  1:17 ` [PATCH v7 2/8] mm: zswap: Modify zswap_compress() to accept a page instead of a folio Kanchana P Sridhar
2024-09-24 16:50   ` Nhat Pham
2024-09-24  1:17 ` [PATCH v7 3/8] mm: zswap: Refactor code to store an entry in zswap xarray Kanchana P Sridhar
2024-09-24 17:16   ` Nhat Pham
2024-09-24 20:40     ` Sridhar, Kanchana P
2024-09-24 19:14   ` Yosry Ahmed
2024-09-24 22:22     ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 4/8] mm: zswap: Refactor code to delete stored offsets in case of errors Kanchana P Sridhar
2024-09-24 17:25   ` Nhat Pham
2024-09-24 20:41     ` Sridhar, Kanchana P
2024-09-24 19:20   ` Yosry Ahmed
2024-09-24 22:32     ` Sridhar, Kanchana P
2024-09-25  0:43       ` Yosry Ahmed
2024-09-25  1:18         ` Sridhar, Kanchana P
2024-09-25 14:11         ` Johannes Weiner
2024-09-25 18:45           ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 5/8] mm: zswap: Compress and store a specific page in a folio Kanchana P Sridhar
2024-09-24 19:28   ` Yosry Ahmed
2024-09-24 22:45     ` Sridhar, Kanchana P
2024-09-25  0:47       ` Yosry Ahmed
2024-09-25  1:49         ` Sridhar, Kanchana P
2024-09-25 13:53           ` Johannes Weiner
2024-09-25 18:45             ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store() Kanchana P Sridhar
2024-09-24 17:33   ` Nhat Pham
2024-09-24 20:51     ` Sridhar, Kanchana P
2024-09-24 21:08       ` Nhat Pham
2024-09-24 21:34         ` Yosry Ahmed
2024-09-24 22:16           ` Nhat Pham
2024-09-24 22:18             ` Sridhar, Kanchana P
2024-09-24 22:28             ` Yosry Ahmed
2024-09-24 22:17           ` Sridhar, Kanchana P
2024-09-24 19:38   ` Yosry Ahmed
2024-09-24 20:51     ` Nhat Pham
2024-09-24 21:38       ` Yosry Ahmed
2024-09-24 23:11         ` Nhat Pham
2024-09-25  0:05           ` Sridhar, Kanchana P
2024-09-25  0:52           ` Yosry Ahmed
2024-09-24 23:21       ` Sridhar, Kanchana P
2024-09-24 23:02     ` Sridhar, Kanchana P
2024-09-25 13:40     ` Johannes Weiner
2024-09-25 18:30       ` Yosry Ahmed [this message]
2024-09-25 19:10         ` Sridhar, Kanchana P
2024-09-25 19:49           ` Yosry Ahmed
2024-09-25 20:49             ` Johannes Weiner
2024-09-25 19:20         ` Johannes Weiner
2024-09-25 19:39           ` Yosry Ahmed
2024-09-25 20:13             ` Johannes Weiner
2024-09-25 21:06               ` Yosry Ahmed
2024-09-25 22:29                 ` Sridhar, Kanchana P
2024-09-26  3:58                   ` Sridhar, Kanchana P
2024-09-26  4:52                     ` Yosry Ahmed
2024-09-26 16:40                       ` Sridhar, Kanchana P
2024-09-26 17:19                         ` Yosry Ahmed
2024-09-26 17:29                           ` Sridhar, Kanchana P
2024-09-26 17:34                             ` Yosry Ahmed
2024-09-26 19:36                               ` Sridhar, Kanchana P
2024-09-26 18:43                             ` Johannes Weiner
2024-09-26 18:45                               ` Yosry Ahmed
2024-09-26 19:40                                 ` Sridhar, Kanchana P
2024-09-26 19:39                               ` Sridhar, Kanchana P
2024-09-25 14:27   ` Johannes Weiner
2024-09-25 18:17     ` Yosry Ahmed
2024-09-25 18:48     ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 7/8] mm: swap: Count successful mTHP ZSWAP stores in sysfs mTHP zswpout stats Kanchana P Sridhar
2024-09-24  1:17 ` [PATCH v7 8/8] mm: Document the newly added mTHP zswpout stats, clarify swpout semantics Kanchana P Sridhar
2024-09-24 17:36   ` Nhat Pham
2024-09-24 20:52     ` Sridhar, Kanchana P
2024-09-24 19:34 ` [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios Yosry Ahmed
2024-09-24 22:50   ` Sridhar, Kanchana P
2024-09-25  6:35 ` Huang, Ying
2024-09-25 18:39   ` Sridhar, Kanchana P
2024-09-26  0:44     ` Huang, Ying
2024-09-26  3:48       ` Sridhar, Kanchana P
2024-09-26  6:47         ` Huang, Ying
2024-09-26 21:44           ` Sridhar, Kanchana P

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkY8D14j-e6imW9NxZCjTbx8tu_VaKDbRRQMdSeKX_kBuw@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=hannes@cmpxchg.org \
    --cc=kanchana.p.sridhar@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nanhai.zou@intel.com \
    --cc=nphamcs@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=vinodh.gopal@intel.com \
    --cc=wajdi.k.feghali@intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox