From: Johannes Weiner <hannes@cmpxchg.org>
To: Shakeel Butt <shakeelb@google.com>
Cc: lsf-pc@lists.linux-foundation.org, Linux MM <linux-mm@kvack.org>,
Michal Hocko <mhocko@kernel.org>, Rik van Riel <riel@surriel.com>,
Roman Gushchin <guro@fb.com>
Subject: Re: [LSF/MM TOPIC] Proactive Memory Reclaim
Date: Tue, 23 Apr 2019 13:31:28 -0400 [thread overview]
Message-ID: <20190423173128.GA3601@cmpxchg.org> (raw)
In-Reply-To: <CALvZod4V+56pZbPkFDYO3+60Xr0_ZjiSgrfJKs_=Bd4AjdvFzA@mail.gmail.com>
Hi Shakeel,
On Tue, Apr 23, 2019 at 08:30:46AM -0700, Shakeel Butt wrote:
> Though this is quite late, I still want to propose a topic for
> discussion during LSFMM'19 which I think will be beneficial for Linux
> users in general, but particularly for data center operators who run a
> range of different workloads and want to reduce memory cost.
>
> Topic: Proactive Memory Reclaim
>
> Motivation/Problem: Memory overcommit is the most commonly used
> technique by large infrastructure owners to reduce the cost of
> memory. However, memory overcommit can adversely impact the
> performance of latency-sensitive applications by triggering direct
> memory reclaim. Direct reclaim is unpredictable and disastrous for
> latency-sensitive applications.
>
> Solution: Proactively reclaim memory from the system to drastically
> reduce the occurrences of direct reclaim. Target cold memory to keep
> the refault rate of the applications acceptable (i.e. no impact on
> performance).
>
> Challenges:
> 1. Tracking cold memory efficiently.
> 2. Lack of infrastructure to reclaim specific memory.
>
> Details: The existing "Idle Page Tracking" interface allows tracking
> cold memory on a system, but it becomes prohibitively expensive as
> the machine size grows. Also, there is no way from user space to
> reclaim a specific 'cold' page. I want to present our implementation
> of cold memory tracking and reclaim. The aim is to make it generally
> beneficial to a lot more users and to upstream it.
>
> More details:
> "Software-driven far-memory in warehouse-scale computers", ASPLOS'19.
> https://youtu.be/aKddds6jn1s
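For readers unfamiliar with the interface mentioned above: the idle page
tracking bitmap has one bit per page frame, and user space has to write
and later re-read the whole thing to find cold pages, which is roughly
where the scan cost comes from on big machines. A hedged sketch (the
sysfs path is the upstream one; the offset math assumes the documented
u64-array layout, and actually using it requires root):

```python
# Sketch of driving /sys/kernel/mm/page_idle/bitmap from user space.
# Writing set bits marks pages idle; the kernel clears a page's bit on
# access, so a later read shows which frames stayed cold.
import struct

BITMAP_PATH = "/sys/kernel/mm/page_idle/bitmap"
PAGES_PER_WORD = 64  # the bitmap is an array of u64s, one bit per PFN

def word_offset(pfn):
    """Byte offset into the bitmap of the u64 word covering this PFN."""
    return (pfn // PAGES_PER_WORD) * 8

def mark_idle(f, pfn_start, pfn_count):
    """Set the idle bits for pfn_count frames starting at pfn_start."""
    all_ones = struct.pack("Q", 0xFFFFFFFFFFFFFFFF)
    f.seek(word_offset(pfn_start))
    # One 8-byte write per 64 page frames: the work is linear in the
    # number of frames scanned, i.e. in machine size.
    for _ in range((pfn_count + PAGES_PER_WORD - 1) // PAGES_PER_WORD):
        f.write(all_ones)
```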
I would be very interested to hear about this as well.
As Rik mentions, I've been working on a way to determine the "true"
memory working sets of our workloads. I'm using a pressure feedback
loop of psi and dynamically adjusted cgroup limits to harness the
kernel's LRU/clock algorithm to sort out what's cold and what isn't.
This does use direct reclaim, but since psi quantifies the exact time
cost of that, it backs off before our SLAs are violated. Of course, if
necessary, this work could easily be punted to a kthread or something.
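For concreteness, the core of such a feedback loop can be sketched like
this (illustrative only: the cgroup path, pressure target, and step
size are made up, and a real implementation needs to back off much more
carefully than a fixed step):

```python
# Sketch of a psi-driven limit controller: tighten memory.high while
# reclaim pressure stays below a target, back off when it rises above.
CGROUP = "/sys/fs/cgroup/workload"  # assumed cgroup2 path
TARGET = 0.1        # target "some" memory pressure over the 10s window
STEP = 4 << 20      # adjust memory.high in 4MB increments (illustrative)

def read_some_avg10(pressure_text):
    """Parse the avg10 value from the 'some' line of memory.pressure,
    e.g. 'some avg10=0.25 avg60=0.10 avg300=0.05 total=123'."""
    for line in pressure_text.splitlines():
        if line.startswith("some"):
            for field in line.split():
                if field.startswith("avg10="):
                    return float(field.split("=")[1])
    return 0.0

def next_limit(current_high, pressure):
    """One controller step: probe downward while the workload is not
    feeling reclaim, give memory back as soon as it is."""
    if pressure < TARGET:
        return current_high - STEP
    return current_high + STEP
```

In the real loop this runs periodically against the cgroup's
memory.pressure and memory.high files; the time the workload spends in
direct reclaim is exactly what psi quantifies, so the controller can
stop probing before the SLA is at risk.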
The additional refault IO also has not been a problem in practice for
us so far, since our pressure parameters are fairly conservative. But
that is a bit harder to manage - by the time you experience those you
might have already oversteered. This is where compression could help
reduce the cost of being aggressive. That said, even with conservative
settings I've managed to shave off 25-30% of the memory footprint of
common interactive jobs without affecting their performance. I suspect
that in many workloads (depending on the exact slope of their access
locality bell curve) shaving off more would require a disproportionately
larger amount of pressure/CPU/IO, and so might not be worthwhile.
Anyway, I'd love to hear your insights on this.
Thread overview: 12+ messages
2019-04-23 15:30 Shakeel Butt
2019-04-23 15:58 ` Mel Gorman
2019-04-23 16:33 ` Shakeel Butt
2019-04-23 16:49 ` Yang Shi
2019-04-23 17:12 ` Shakeel Butt
2019-04-23 18:26 ` Yang Shi
2019-04-23 16:08 ` Rik van Riel
2019-04-23 17:04 ` Shakeel Butt
2019-04-23 17:49 ` Johannes Weiner
2019-04-23 17:34 ` Suren Baghdasaryan
2019-04-23 17:31 ` Johannes Weiner [this message]
2019-04-24 16:28 ` Christopher Lameter