From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Yiannis Nikolakopoulos <yiannis.nikolakop@gmail.com>
Cc: Wei Xu <weixugc@google.com>, David Rientjes <rientjes@google.com>,
Gregory Price <gourry@gourry.net>,
Matthew Wilcox <willy@infradead.org>,
Bharata B Rao <bharata@amd.com>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, <dave.hansen@intel.com>,
<hannes@cmpxchg.org>, <mgorman@techsingularity.net>,
<mingo@redhat.com>, <peterz@infradead.org>,
<raghavendra.kt@amd.com>, <riel@surriel.com>, <sj@kernel.org>,
<ying.huang@linux.alibaba.com>, <ziy@nvidia.com>,
<dave@stgolabs.net>, <nifan.cxl@gmail.com>,
<xuezhengchu@huawei.com>, <akpm@linux-foundation.org>,
<david@redhat.com>, <byungchul@sk.com>, <kinseyho@google.com>,
<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
<balbirs@nvidia.com>, <alok.rathore@samsung.com>,
<yiannis@zptcorp.com>, Adam Manzanares <a.manzanares@samsung.com>
Subject: Re: [RFC PATCH v2 0/8] mm: Hot page tracking and promotion infrastructure
Date: Mon, 20 Oct 2025 15:23:45 +0100
Message-ID: <20251020152345.00003d61@huawei.com>
In-Reply-To: <CAOi6=wS6s2FAAbMbxX5zCZzPQE7Mm73pbhxpiM_5e44o6yyPMw@mail.gmail.com>
On Thu, 16 Oct 2025 18:16:31 +0200
Yiannis Nikolakopoulos <yiannis.nikolakop@gmail.com> wrote:
> On Thu, Sep 25, 2025 at 5:01 PM Jonathan Cameron
> <jonathan.cameron@huawei.com> wrote:
> >
> > On Thu, 25 Sep 2025 16:03:46 +0200
> > Yiannis Nikolakopoulos <yiannis.nikolakop@gmail.com> wrote:
> >
> > Hi Yiannis,
> Hi Jonathan! Thanks for your response!
>
Hi Yiannis,
This is way more fun than doing real work ;)
> [snip]
> > > There are several things that may be done on the device side. For now, I
> > > think the kernel should be unaware of these. But with what I described
> > > above, the goal is to have the capacity thresholds configured in a way
> > > that we can absorb the occasional dirty cache lines that are written back.
> >
> > In the worst case they are far from occasional. It's not hard to imagine a malicious
> This is correct. Any simplification on my end is mainly based on the
> empirical evidence of the use cases we are testing for (tiering). But
> I fully respect that we need to be proactive and assume the worst case
> scenario.
> > program that ensures that all L3 in a system (say 256MiB+) is full of cache lines
> > from the far compressed memory, all of which are changed in a fashion that makes
> > the allocation much less compressible. If you are doing compression at cache line
> > granularity that's not so bad, because only a 256MiB margin would be needed.
> > If the system in question is doing large block size compression, say 4KiB,
> > then we have a 64x write amplification multiplier. If the virus is streaming over
> This is insightful indeed :). However, even in the case of the 64x
> amplification, you implicitly assume that each of the cache lines in
> the L3 belongs to a different page. But then a single cache line would
> not degrade the compressed size of the entire page that much (the
> bandwidth amplification on the device is a different -performance-
> story).
This is putting limits on what compression algorithm is used. We could do
that, but then we'd have to never support anything different. Maybe if the
device itself provided the worst-case amplification numbers that would do.
Any device that gets this wrong is buggy - but it might be hard to detect
that if people don't publish their compression algorithms and the proofs of
worst-case blow-up of compression blocks.
I guess we could do the maths on what the device manufacturer says and,
if we don't believe them or they haven't provided enough info to check,
double it :)
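To make "do the maths" concrete, here is a rough back-of-envelope sketch of
the kind of calculation I mean, assuming 64-byte cache lines, a 4KiB device
compression block and 256MiB of dirty far-memory lines - all numbers purely
illustrative, nothing here comes from a real device:

#include <stdio.h>

int main(void)
{
	/* Illustrative numbers only - not from any real device. */
	unsigned long long l3_bytes        = 256ULL << 20; /* dirty far-memory lines in L3 */
	unsigned long long cacheline_bytes = 64;
	unsigned long long comp_block      = 4096;          /* device compresses 4KiB blocks */

	/* Worst case: every dirty line forces its whole compression block
	 * to be stored (nearly) uncompressed. */
	unsigned long long amplification = comp_block / cacheline_bytes; /* 64x */
	unsigned long long margin        = l3_bytes * amplification;     /* 16 GiB */

	printf("worst-case margin: %llu MiB (and double it if we don't trust the vendor)\n",
	       margin >> 20);
	return 0;
}

With those numbers the margin comes out at 16GiB rather than 256MiB, which
is the scale of the problem.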
> So even in the 4K case the two ends of the spectrum are to
> either have big amplification with low compression ratio impact, or
> small amplification with higher compression ratio impact.
> Another practical assumption here is that the different HMU
> mechanisms would help promote the contended pages before this becomes
> a big issue - which of course might still not be enough for the
> malicious streaming-writes workload.
Using promotion to get you out of this is a non-starter unless you have
a backstop, because we'll have annoying things like pinning going on or
bandwidth bottlenecks at the promotion target.
Under normal conditions promotion might of course massively reduce the
performance impact.
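For what it's worth, the shape of backstop I have in mind is roughly this
(entirely hypothetical - none of these helpers exists anywhere, it's just to
show the decision order):

enum backstop_action { PROMOTE, THROTTLE_WRITES, POISON_AND_DROP };

static enum backstop_action pick_action(unsigned long long device_margin_bytes,
					unsigned long long watermark_bytes,
					int target_has_headroom,
					int folio_pinned)
{
	if (device_margin_bytes > watermark_bytes &&
	    target_has_headroom && !folio_pinned)
		return PROMOTE;		/* normal path: move the hot page up a tier */
	if (device_margin_bytes > 0)
		return THROTTLE_WRITES;	/* can't promote; stop making it worse */
	return POISON_AND_DROP;		/* margin exhausted: fail hard, app crashes */
}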
> Overall, I understand these are heuristics and I do see your point
> that this needs to be robust even for the maliciously behaving
> programs.
> > memory, the evictions we are seeing as the result of new lines being fetched
> > will be made much less compressible.
> >
> > Add an accelerator (say DPDK or other zero-copy into userspace buffers) into the
> > mix and you have a mess. You'll need to be extremely careful with what goes
> Good point about the zero copy stuff.
> > in this compressed memory, or hold enormous buffer capacity against fast
> > changes in compressibility.
> In my experience the buffer capacity factor would be closer to the
> benefit that you get from the compression (e.g. 2x the cache size in
> your example).
> But I understand the burden of proof is on our end. As we move further
> with this I will try to provide data as well.
If we are aiming for generality the nasty problem is that either we have to
write rules on what Linux will cope with, or design it to cope with the
worst possible implementation :(
I can think of lots of plausible-sounding cases that have horrendous
multiplication factors if done in a naive fashion:
* De-duplication
* Metadata flag for all 0s
* Some general-purpose compression algorithms are very vulnerable to the
  tails of the probability distributions. Some will flip between multiple
  modes with very different characteristics, perhaps to meet latency
  guarantees.
It would be fun to ask an information theorist / compression expert to lay
out an algorithm with the worst possible tail performance but with good
average behaviour.
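As a sketch of the kind of adversarial pattern I mean: fill a region with
zeros (which de-duplication or an all-zeros metadata flag stores almost for
free), then dirty one high-entropy word per cache line so every compression
block suddenly has to be stored close to uncompressed. Purely illustrative
userspace code:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define REGION (1ULL << 30)	/* 1 GiB region, adjust to taste */
#define LINE   64		/* cache line size */

int main(void)
{
	/* Best case for the device: the whole region is zeros. */
	uint8_t *buf = calloc(1, REGION);
	uint64_t x = 0x9e3779b97f4a7c15ULL;

	if (!buf)
		return 1;
	/* Now make it as bad as possible: one high-entropy word per line. */
	for (size_t off = 0; off < REGION; off += LINE) {
		x ^= x << 13; x ^= x >> 7; x ^= x << 17;	/* xorshift PRNG */
		memcpy(buf + off, &x, sizeof(x));
	}
	free(buf);
	return 0;
}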
> >
> > Key is that all software is potentially malicious (sometimes accidentally so ;)
> >
> > Now, if we can put this into a special pool where it is acceptable to drop the writes
> > and return poison (so the application crashes), then that may be fine.
> >
> > Or block writes. Running compressed memory as read-only CoW is one way to
> > avoid this problem.
> These could be good starting points, as I see in the rest of the thread.
>
Fun problems. Maybe we start with very conservative handling and then
argue for relaxations later.
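By "very conservative" I mean something in the spirit of the read-only CoW
idea above. A userspace analogy, just to illustrate the shape of it (in the
kernel this would be ordinary CoW on the PTE, not a signal handler):

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long pagesz;

static void on_fault(int sig, siginfo_t *si, void *uc)
{
	(void)sig; (void)uc;
	void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(pagesz - 1));
	/* "Promote" the page: here we just flip it writable in place; a real
	 * implementation would copy it out of the compressed tier first. */
	mprotect(page, pagesz, PROT_READ | PROT_WRITE);
}

int main(void)
{
	struct sigaction sa;

	pagesz = sysconf(_SC_PAGESIZE);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_fault;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);

	/* Stand-in for the compressed tier: read-only anonymous memory. */
	char *p = mmap(NULL, pagesz, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	p[0] = 'x';	/* write faults, handler makes it writable, write retries */
	printf("wrote %c after the fault\n", p[0]);
	return 0;
}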
Jonathan
> Thanks,
> Yiannis