RE: [PATCH v2 0/3] mm/damon: Profiling enhancements for DAMON


From: "Prasad, Aravinda" <aravinda.prasad@intel.com>
To: SeongJae Park <sj@kernel.org>
Cc: "damon@lists.linux.dev" <damon@lists.linux.dev>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"s2322819@ed.ac.uk" <s2322819@ed.ac.uk>,
	"Kumar, Sandeep4" <sandeep4.kumar@intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"Subramoney, Sreenivas" <sreenivas.subramoney@intel.com>,
	"Kervinen, Antti" <antti.kervinen@intel.com>,
	"Kanevskiy, Alexander" <alexander.kanevskiy@intel.com>
Subject: RE: [PATCH v2 0/3] mm/damon: Profiling enhancements for DAMON
Date: Tue, 19 Mar 2024 10:56:42 +0000
Message-ID: <MW5PR11MB5907F44D802C298E8182A1B6F22C2@MW5PR11MB5907.namprd11.prod.outlook.com>
In-Reply-To: <20240319052054.100167-1-sj@kernel.org>



> -----Original Message-----
> From: SeongJae Park <sj@kernel.org>
> Sent: Tuesday, March 19, 2024 10:51 AM
> To: Prasad, Aravinda <aravinda.prasad@intel.com>
> Cc: damon@lists.linux.dev; linux-mm@kvack.org; sj@kernel.org;
> linux-kernel@vger.kernel.org; s2322819@ed.ac.uk; Kumar, Sandeep4
> <sandeep4.kumar@intel.com>; Huang, Ying <ying.huang@intel.com>;
> Hansen, Dave <dave.hansen@intel.com>; Williams, Dan J
> <dan.j.williams@intel.com>; Subramoney, Sreenivas
> <sreenivas.subramoney@intel.com>; Kervinen, Antti
> <antti.kervinen@intel.com>; Kanevskiy, Alexander
> <alexander.kanevskiy@intel.com>
> Subject: Re: [PATCH v2 0/3] mm/damon: Profiling enhancements for DAMON
> 
> Hi Aravinda,
> 
> 
> Thank you for posting this new revision!
> 
> I remember telling you, in my reply to the previous revision of this
> patch[1], that I didn't see any significant high-level problems, but I have a
> concern now.  Sorry for not raising this earlier, but let me explain my humble
> concerns before it gets even later.

Sure, no problem. We can discuss. I will get back to you with a detailed note.

Regards,
Aravinda

> 
> On Mon, 18 Mar 2024 18:58:45 +0530 Aravinda Prasad
> <aravinda.prasad@intel.com> wrote:
> 
> > DAMON randomly samples one or more pages in every region and tracks
> > accesses to them using the ACCESSED bit in PTE (or PMD for 2MB pages).
> > When the region size is large (e.g., several GBs), which is common for
> > large footprint applications, detecting whether the region is accessed
> > or not completely depends on whether the pages that are actively
> > accessed in the region are picked during random sampling.
> > If such pages are not picked for sampling, DAMON fails to identify the
> > region as accessed. However, increasing the sampling rate or
> > increasing the number of regions increases CPU overheads of kdamond.
> 
> DAMON uses sampling because it considers a region accessed if a portion of
> the region that is big enough to be detected via sampling is accessed.  If a
> region has some pages that are really accessed but the proportion is too
> small to be found via sampling, I think DAMON could say the overall access to
> the region is only modest and could even be ignored.  In my humble opinion,
> this fits the definition of a DAMON region: a memory address range that
> consists of pages having similar access frequency.
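
The sampling argument above can be illustrated with a small probability
model. This is only a sketch under simplifying assumptions that are not from
the thread: one uniformly chosen page is checked per region per sampling
interval, samples are independent, and a fixed fraction of the region's pages
is accessed within each interval. DAMON's adaptive region splitting and
merging is ignored, and the function names are hypothetical.

```python
def samples_per_aggregation(sampling_ms, aggregation_ms):
    """Number of access checks made on a region in one aggregation interval."""
    return aggregation_ms // sampling_ms


def detection_probability(accessed_fraction, num_samples):
    """Probability that at least one of num_samples random page checks
    lands on a page that was accessed during its sampling interval.

    accessed_fraction: fraction of the region's pages accessed per interval
                       (assumed constant, which is a simplification).
    """
    if not 0.0 <= accessed_fraction <= 1.0:
        raise ValueError("accessed_fraction must be within [0, 1]")
    return 1.0 - (1.0 - accessed_fraction) ** num_samples


if __name__ == "__main__":
    # 5ms sampling and 200ms aggregation, as in the evaluation setup.
    n = samples_per_aggregation(5, 200)
    for fraction in (0.1, 0.01, 0.0025):
        p = detection_probability(fraction, n)
        print(f"accessed fraction {fraction:.4f}: P(detect) = {p:.3f}")
```

Under this model, the smaller the share of a region that is actually touched
in one sampling interval, the more likely the region is reported as idle.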


> 
> >
> > This patch proposes profiling different levels of the
> > application's page table tree to detect whether a region is
> > accessed or not. This patch set is based on the observation that, when
> > the accessed bit for a page is set, the accessed bits at the higher
> > levels of the page table tree (PMD/PUD/PGD) corresponding to the path
> > of the page table walk are also set. Hence, it is efficient to check
> > the accessed bits at the higher levels of the page table tree to
> > detect whether a region is accessed or not. For example, if the access
> > bit for a PUD entry is set, then one or more pages in the 1GB PUD
> > subtree is accessed as each PUD entry covers 1GB mapping. Hence,
> > instead of sampling thousands of 4K/2M pages to detect accesses in a
> > large region, sampling at the higher level of page table tree is faster
> > and efficient.
> 
> For the reason above, I am concerned that this could bias DAMON's
> monitoring results toward reporting more accesses than really occur.
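
To make the coverage figures in the cover letter concrete, here is a rough
sketch of how many entries exist at each page table level for a given region
size. It assumes x86-64 paging with 4KiB base pages and 512 entries per table;
the function names are hypothetical, and the code only counts entries, it does
not read any page table or accessed bit.

```python
PAGE_SIZE = 4096          # 4KiB base page (assumption: x86-64)
ENTRIES_PER_TABLE = 512   # 9 address bits per level (assumption: x86-64)
LEVELS = {"PTE": 0, "PMD": 1, "PUD": 2}


def entry_coverage(level):
    """Bytes of virtual address space mapped by one entry at this level."""
    return PAGE_SIZE * ENTRIES_PER_TABLE ** LEVELS[level]


def entries_to_cover(region_bytes, level):
    """Number of entries at this level needed to span region_bytes
    (ceiling division, assuming the region is aligned to the entry size)."""
    coverage = entry_coverage(level)
    return (region_bytes + coverage - 1) // coverage


if __name__ == "__main__":
    region = 10 << 30  # a 10GiB region
    for level in ("PUD", "PMD", "PTE"):
        print(level, entry_coverage(level), entries_to_cover(region, level))
```

For a 10GiB region this gives 10 PUD entries versus 5120 PMD entries or
2621440 PTEs, which is the scale difference the cover letter relies on. A set
accessed bit on a PUD entry only says that something in its 1GiB range was
touched, which is the source of the bias concern raised in the reply.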
> 
> >
> > This patch set is based on 6.8-rc5 kernel (commit: f48159f8,
> > mm-unstable tree)
> >
> > Changes since v1 [1]
> > ====================
> >
> >  - Added support for 5-level page table tree
> >  - Split the patch to mm infrastructure changes and DAMON enhancements
> >  - Code changes as per comments on v1
> >  - Added kerneldoc comments
> >
> > [1] https://lkml.org/lkml/2023/12/15/272
> >
> > Evaluation:
> >
> > - MASIM benchmark with 1GB, 10GB, 100GB footprint with 10% hot data
> >   and 5TB with 10GB hot data.
> > - DAMON: 5ms sampling, 200ms aggregation interval. All other
> >   parameters are set to their default values.
> > - DAMON+PTP: Page table profiling applied to DAMON with the above
> >   parameters.
> >
> > Profiling efficiency in detecting hot data:
> >
> > Footprint	1GB	10GB	100GB	5TB
> > ---------------------------------------------
> > DAMON		>90%	<50%	 ~0%	  0%
> > DAMON+PTP	>90%	>90%	>90%	>90%
> 
> The sampling interval is a time interval that is assumed to be long enough
> for the workload to make a meaningful amount of accesses within it.  Hence,
> a meaningful sampling interval depends on the workload's characteristics and
> the system's memory bandwidth.
> 
> Here, the size of the hot memory region is about 100MB, 1GB, 10GB, and
> 10GB for the four cases, respectively.  And you set the sampling interval to
> 5ms.  Let's assume the system can access, say, 50 GB per second, and hence it
> can access only up to 250 MB per 5ms.  So, in case of 1GB and
> footprint, the whole hot memory region would be accessed while DAMON is
> waiting for the next sampling interval.  Hence, DAMON would be able to see
> most accesses via sampling.  But for the 100GB footprint case, only
> 250MB / 10GB = about 2.5% of the hot memory region would be accessed within
> one sampling interval.  DAMON cannot see all of the accesses, and hence the
> precision could be low.
> 
> I don't know the exact memory bandwidth of the system, but to detect the
> 10 GB hot region with a 5ms sampling interval, the system should be able to
> access 2GB of memory per millisecond, or about 2TB of memory per second.  I
> think systems with such memory bandwidth are not that common.
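
The bandwidth arithmetic above can be reproduced with a short sketch. The
50 GB/s figure is the assumption stated in the reply, not a measured value;
decimal units (1 GB = 10^9 bytes) and the function names are choices made
here for illustration.

```python
MB = 10**6
GB = 10**9


def accessed_fraction(hot_bytes, bandwidth_bytes_per_s, sampling_ms):
    """Upper bound on the fraction of the hot region that can be touched
    during one sampling interval at the given memory bandwidth."""
    reachable = bandwidth_bytes_per_s * sampling_ms / 1000
    return min(1.0, reachable / hot_bytes)


def required_bandwidth(hot_bytes, sampling_ms):
    """Bandwidth (bytes/s) needed to touch the whole hot region within
    one sampling interval."""
    return hot_bytes * 1000 / sampling_ms


if __name__ == "__main__":
    bandwidth = 50 * GB  # assumed in the reply above
    for hot in (100 * MB, 1 * GB, 10 * GB):
        frac = accessed_fraction(hot, bandwidth, 5)
        print(f"hot region {hot / GB:g} GB: at most {frac:.1%} per 5ms")
    print(f"needed for 10 GB in 5ms: {required_bandwidth(10 * GB, 5) / 1e12:g} TB/s")
```

With these inputs, a 100MB hot region can be fully covered within 5ms, while
a 10GB hot region is covered by at most 2.5% per interval and would need about
2 TB/s to be covered completely.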
> 
> I see you also explored a configuration with a higher aggregation interval.
> But because each sampling checks only the accesses made within the sampling
> interval, that might not help in this setup.  I'm wondering if you also
> explored increasing the sampling interval.
> 
> Sorry again for not raising this concern earlier.  But I think we may need
> to discuss this first.
> 
> [1] https://lkml.kernel.org/r/20231215201159.73845-1-sj@kernel.org
> 
> 
> Thanks,
> SJ
> 
> 
> >
> > CPU overheads (in billion cycles) for kdamond:
> >
> > Footprint	1GB	10GB	100GB	5TB
> > ---------------------------------------------
> > DAMON		1.15	19.53	3.52	9.55
> > DAMON+PTP	0.83	 3.20	1.27	2.55
> >
> > A detailed explanation and evaluation can be found in the arXiv paper:
> > https://arxiv.org/pdf/2311.10275.pdf
> >
> >
> > Aravinda Prasad (3):
> >   mm/damon: mm infrastructure support
> >   mm/damon: profiling enhancement
> >   mm/damon: documentation updates
> >
> >  Documentation/mm/damon/design.rst |  42 ++++++
> >  arch/x86/include/asm/pgtable.h    |  20 +++
> >  arch/x86/mm/pgtable.c             |  28 +++-
> >  include/linux/mmu_notifier.h      |  36 +++++
> >  include/linux/pgtable.h           |  79 ++++++++++
> >  mm/damon/vaddr.c                  | 233 ++++++++++++++++++++++++++++--
> >  6 files changed, 424 insertions(+), 14 deletions(-)
> >
> > --
> > 2.21.3
