linux-mm.kvack.org archive mirror
From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: Barry Song <21cnbao@gmail.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"yosryahmed@google.com" <yosryahmed@google.com>,
	"nphamcs@gmail.com" <nphamcs@gmail.com>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"Zou, Nanhai" <nanhai.zou@intel.com>,
	"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
	"Gopal, Vinodh" <vinodh.gopal@intel.com>,
	"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Subject: RE: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store vmstat event counters.
Date: Thu, 15 Aug 2024 01:37:07 +0000	[thread overview]
Message-ID: <SJ0PR11MB56783F8762A5AE6FF20BBAF2C9802@SJ0PR11MB5678.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CAGsJ_4wuoWKSnzeJ-2Xoc=_du3ZL3Ms8s6K58w8En3_h8-q_ng@mail.gmail.com>

Hi Barry,

> -----Original Message-----
> From: Barry Song <21cnbao@gmail.com>
> Sent: Wednesday, August 14, 2024 4:25 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; akpm@linux-
> foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store
> vmstat event counters.
> 
> On Thu, Aug 15, 2024 at 5:40 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > Hi Barry,
> >
> > > -----Original Message-----
> > > From: Barry Song <21cnbao@gmail.com>
> > > Sent: Wednesday, August 14, 2024 12:49 AM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> akpm@linux-
> > > foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > Subject: Re: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store
> > > vmstat event counters.
> > >
> > > On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > > Added vmstat event counters per mTHP-size that can be used to account
> > > > for folios of different sizes being successfully stored in ZSWAP.
> > > >
> > > > For this RFC, it is not clear if these zswpout counters should instead
> > > > be added as part of the existing mTHP stats in
> > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats.
> > > >
> > > > The following is also a viable option, should it make better sense,
> > > > for instance, as:
> > > >
> > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout.
> > > >
> > > > If so, we would be able to distinguish between mTHP zswap and
> > > > non-zswap swapouts through:
> > > >
> > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout
> > > >
> > > > and
> > > >
> > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout
> > > >
> > > > respectively.
> > > >
> > > > Comments would be appreciated as to which approach is preferable.
> > >
> > > Even though swapout might go through zswap, from the perspective of
> > > the mm core, it shouldn't be aware of that. Shouldn't zswpout be part
> > > of swpout? Why are they separate? no matter if a mTHP has been
> > > put in zswap, it has been swapped-out to mm-core? No?
> >
> > Thanks for the code review comments. This is a good point. I was keeping in
> > mind the convention used by existing vmstat event counters that distinguish
> > zswpout/zswpin from pswpout/pswpin events.
> >
> > If we want to keep the distinction in mTHP swapouts, would adding a
> > separate MTHP_STAT_ZSWPOUT to "enum mthp_stat_item" be Ok?
> >
> 
> I'm not entirely sure how important the zswpout counter is. To me, it doesn't
> seem as critical as swpout and swpout_fallback, which are more useful for
> system profiling. zswapout feels more like an internal detail related to
> how the swap-out process is handled? If this is the case, we might not
> need this per-size counter.
> 
> Otherwise, I believe sysfs is a better place to avoid all the chaos in vmstat
> to handle various orders and sizes. So the question is, per-size zswpout
> counter is really important or just for debugging purposes?

I agree, sysfs would be a cleaner mTHP stats accounting solution, given the
existing mTHP swpout stats under the per-order
/sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout.
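If the per-order sysfs route is taken, the change might look roughly like the
following (a sketch only; exact item name and placement would be for v2, and the
new attribute would also need wiring into the existing stats attribute group):

```c
/* include/linux/huge_mm.h -- add an item to the existing enum */
enum mthp_stat_item {
	/* ... existing items ... */
	MTHP_STAT_SWPOUT,
	MTHP_STAT_SWPOUT_FALLBACK,
	MTHP_STAT_ZSWPOUT,	/* proposed: mTHP folios stored in zswap */
	__MTHP_STAT_COUNT
};

/* mm/huge_memory.c -- expose it as hugepages-<size>kB/stats/zswpout */
DEFINE_MTHP_STAT_ATTR(zswpout, MTHP_STAT_ZSWPOUT);

/* zswap store success path -- account by folio order */
count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
```

This reuses the per-order bookkeeping that already backs stats/swpout, so no
per-size counter names need to be enumerated anywhere.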

I personally find distinct zswap vs. bdev/fs swapout accounting useful
for debugging, and for overall reclaim path characterization:
for instance, the impact of different zswap compressors' compression latency
on zswpout activity for a given workload. Is a slowdown in compression latency
causing active/hot memory to be reclaimed and immediately faulted back in?
Does better zswap compression efficiency correlate with more cold memory
being reclaimed as mTHP? How does the reclaim path efficiency gained by
improving zswap_store mTHP performance correlate with zswap utilization
and memory savings? I have found these counters useful in understanding
some of these characteristics.

I also believe it helps to account for the number of mTHP folios stored in
each compression tier: for example, how many mTHP folios were stored in zswap
vs. rejected and written to the backing swap device. This could help, say,
in provisioning zswap memory, and in understanding the impact of zswap
compression path latency on scaling.

Another interesting characteristic that mTHP zswpout accounting could help
with is understanding compressor incompressibility and/or zpool fragmentation,
by making it possible to better correlate the zswap/reject_* sysfs counters
with the mTHP [z]swpout stats.

I look forward to input from you and others on the direction and next steps.

Thanks,
Kanchana

> 
> > In any case, it looks like all that would be needed is a call to
> > count_mthp_stat(folio_order(folio), MTHP_STAT_[Z]SWPOUT) in the
> > general case.
> >
> > I will make this change in v2, depending on whether or not the
> > separation of zswpout vs. non-zswap swpout is recommended for
> > mTHP.
> >
> > >
> > >
> > > >
> > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > > ---
> > > >  include/linux/vm_event_item.h | 15 +++++++++++++++
> > > >  mm/vmstat.c                   | 15 +++++++++++++++
> > > >  2 files changed, 30 insertions(+)
> > > >
> > > > diff --git a/include/linux/vm_event_item.h
> > > b/include/linux/vm_event_item.h
> > > > index 747943bc8cc2..2451bcfcf05c 100644
> > > > --- a/include/linux/vm_event_item.h
> > > > +++ b/include/linux/vm_event_item.h
> > > > @@ -114,6 +114,9 @@ enum vm_event_item { PGPGIN, PGPGOUT,
> > > PSWPIN, PSWPOUT,
> > > >                 THP_ZERO_PAGE_ALLOC,
> > > >                 THP_ZERO_PAGE_ALLOC_FAILED,
> > > >                 THP_SWPOUT,
> > > > +#ifdef CONFIG_ZSWAP
> > > > +               ZSWPOUT_PMD_THP_FOLIO,
> > > > +#endif
> > > >                 THP_SWPOUT_FALLBACK,
> > > >  #endif
> > > >  #ifdef CONFIG_MEMORY_BALLOON
> > > > @@ -143,6 +146,18 @@ enum vm_event_item { PGPGIN, PGPGOUT,
> > > PSWPIN, PSWPOUT,
> > > >                 ZSWPIN,
> > > >                 ZSWPOUT,
> > > >                 ZSWPWB,
> > > > +               ZSWPOUT_4KB_FOLIO,
> > > > +#ifdef CONFIG_THP_SWAP
> > > > +               mTHP_ZSWPOUT_8kB,
> > > > +               mTHP_ZSWPOUT_16kB,
> > > > +               mTHP_ZSWPOUT_32kB,
> > > > +               mTHP_ZSWPOUT_64kB,
> > > > +               mTHP_ZSWPOUT_128kB,
> > > > +               mTHP_ZSWPOUT_256kB,
> > > > +               mTHP_ZSWPOUT_512kB,
> > > > +               mTHP_ZSWPOUT_1024kB,
> > > > +               mTHP_ZSWPOUT_2048kB,
> > > > +#endif
> > >
> > > This implementation hardcodes assumptions about the page size being
> 4KB,
> > > but page sizes can vary, and so can the THP orders?
> >
> > Agreed, will address in v2.
> >
> > >
> > > >  #endif
> > > >  #ifdef CONFIG_X86
> > > >                 DIRECT_MAP_LEVEL2_SPLIT,
> > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > index 8507c497218b..0e66c8b0c486 100644
> > > > --- a/mm/vmstat.c
> > > > +++ b/mm/vmstat.c
> > > > @@ -1375,6 +1375,9 @@ const char * const vmstat_text[] = {
> > > >         "thp_zero_page_alloc",
> > > >         "thp_zero_page_alloc_failed",
> > > >         "thp_swpout",
> > > > +#ifdef CONFIG_ZSWAP
> > > > +       "zswpout_pmd_thp_folio",
> > > > +#endif
> > > >         "thp_swpout_fallback",
> > > >  #endif
> > > >  #ifdef CONFIG_MEMORY_BALLOON
> > > > @@ -1405,6 +1408,18 @@ const char * const vmstat_text[] = {
> > > >         "zswpin",
> > > >         "zswpout",
> > > >         "zswpwb",
> > > > +       "zswpout_4kb_folio",
> > > > +#ifdef CONFIG_THP_SWAP
> > > > +       "mthp_zswpout_8kb",
> > > > +       "mthp_zswpout_16kb",
> > > > +       "mthp_zswpout_32kb",
> > > > +       "mthp_zswpout_64kb",
> > > > +       "mthp_zswpout_128kb",
> > > > +       "mthp_zswpout_256kb",
> > > > +       "mthp_zswpout_512kb",
> > > > +       "mthp_zswpout_1024kb",
> > > > +       "mthp_zswpout_2048kb",
> > > > +#endif
> > >
> > > The issue here is that the number of THP orders
> > > can vary across different platforms.
> >
> > Agreed, will address in v2.
> >
> > Thanks,
> > Kanchana
> >
> > >
> > > >  #endif
> > > >  #ifdef CONFIG_X86
> > > >         "direct_map_level2_splits",
> > > > --
> > > > 2.27.0
> > > >
> 
> Thanks
> Barry


Thread overview: 11+ messages
2024-08-14  6:28 [RFC PATCH v1 0/4] mm: ZSWAP swap-out of mTHP folios Kanchana P Sridhar
2024-08-14  6:28 ` [RFC PATCH v1 1/4] mm: zswap: zswap_is_folio_same_filled() takes an index in the folio Kanchana P Sridhar
2024-08-14  6:28 ` [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store vmstat event counters Kanchana P Sridhar
2024-08-14  7:48   ` Barry Song
2024-08-14 17:40     ` Sridhar, Kanchana P
2024-08-14 23:24       ` Barry Song
2024-08-15  1:37         ` Sridhar, Kanchana P [this message]
2024-08-14  6:28 ` [RFC PATCH v1 3/4] mm: zswap: zswap_store() extended to handle mTHP folios Kanchana P Sridhar
2024-08-14  6:28 ` [RFC PATCH v1 4/4] mm: page_io: Count successful mTHP zswap stores in vmstat Kanchana P Sridhar
2024-08-14  7:53   ` Barry Song
2024-08-14 17:47     ` Sridhar, Kanchana P
