From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Yosry Ahmed <yosryahmed@google.com>, Huan Yang <link@vivo.com>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Huang Ying <ying.huang@intel.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Peter Xu <peterx@redhat.com>,
	"Vishal Moola (Oracle)" <vishal.moola@gmail.com>,
	Yue Zhao <findns94@gmail.com>, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 0/1] Add swappiness argument to memory.reclaim
Date: Thu, 30 Nov 2023 11:56:42 -0500
Message-ID: <20231130165642.GA386439@cmpxchg.org>
In-Reply-To: <ZWiw9cEsDap1Qm5h@tiehlicka>

On Thu, Nov 30, 2023 at 04:57:41PM +0100, Michal Hocko wrote:
> On Thu 30-11-23 07:36:53, Dan Schatzberg wrote:
> [...]
> > In contrast, I argue in favor of a swappiness setting not as a way to implement
> > custom reclaim algorithms but rather to bias the balance of anon vs file due to
> > differences of proactive vs reactive reclaim. In this context, swappiness is the
> > existing interface for controlling this balance and this patch simply allows for
> > it to be configured differently for proactive vs reactive reclaim.
> 
> I do agree that swappiness is a better interface than explicit anon/file
> but the problem with swappiness is that it is more of a hint for the reclaim
> rather than a real control. Just look at get_scan_count and its history.
> Not only its range has been extended also the extent when it is actually
> used has been changing all the time and I think it is not a stretch to
> assume that trend to continue.

Right, we did tweak the edge behavior of e.g. swappiness=0. And we
extended the range to express "anon is cheaper than file", which
wasn't possible before, to support the compressed memory case.
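
To make the range extension concrete, here is a minimal sketch that
assumes the simple split anon = swappiness, file = 200 - swappiness and
an extended range of 0..200. The names prio_split_for() and
anon_is_cheaper() are hypothetical helpers for illustration, not kernel
functions.

```c
#include <assert.h>

/* Hypothetical model, not kernel code: the anon/file scan weights
 * implied by a swappiness value under anon = s, file = 200 - s. */
struct prio_split {
	int anon;
	int file;
};

static struct prio_split prio_split_for(int swappiness)
{
	struct prio_split p;

	/* Assumption: the extended range 0..200 is in effect. */
	assert(swappiness >= 0 && swappiness <= 200);
	p.anon = swappiness;
	p.file = 200 - swappiness;
	return p;
}

/* Nonzero when anon gets more scan weight than file, i.e. when anon
 * paging IO is declared cheaper than file IO. */
static int anon_is_cheaper(int swappiness)
{
	struct prio_split p = prio_split_for(swappiness);

	return p.anon > p.file;
}
```

With a ceiling of 100, anon_is_cheaper() can never return nonzero: the
best case is a tie at swappiness=100. Only values above 100 can say
"anon is cheaper than file".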

However, its meaning and impact have been remarkably stable over the
years: it allows userspace to specify the relative cost of paging IO
between file and anon pages. This comment is from 2.6.28:

        /*
         * With swappiness at 100, anonymous and file have the same priority.
         * This scanning priority is essentially the inverse of IO cost.
         */
        anon_prio = sc->swappiness;
        file_prio = 200 - sc->swappiness;

And this is it today:

	/*
	 * Calculate the pressure balance between anon and file pages.
	 *
	 * The amount of pressure we put on each LRU is inversely
	 * proportional to the cost of reclaiming each list, as
	 * determined by the share of pages that are refaulting, times
	 * the relative IO cost of bringing back a swapped out
	 * anonymous page vs reloading a filesystem page (swappiness).
	 *
	 * Although we limit that influence to ensure no list gets
	 * left behind completely: at least a third of the pressure is
	 * applied, before swappiness.
	 *
	 * With swappiness at 100, anon and file have equal IO cost.
	 */
	total_cost = sc->anon_cost + sc->file_cost;
	anon_cost = total_cost + sc->anon_cost;
	file_cost = total_cost + sc->file_cost;
	total_cost = anon_cost + file_cost;

	ap = swappiness * (total_cost + 1);
	ap /= anon_cost + 1;

	fp = (200 - swappiness) * (total_cost + 1);
	fp /= file_cost + 1;

So swappiness still means the same thing it did 15 years ago. We haven't
changed the default swappiness setting, and we haven't broken any
existing swappiness configurations through VM changes in that time.
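
The two excerpts above can be compared directly in a small userspace
sketch. It copies the arithmetic as quoted; the struct scan_weights
type and the function names legacy_weights() and current_weights() are
hypothetical, and the cost inputs stand in for sc->anon_cost and
sc->file_cost.

```c
#include <assert.h>
#include <stdint.h>

struct scan_weights {
	uint64_t ap;	/* pressure weight for the anon LRU */
	uint64_t fp;	/* pressure weight for the file LRU */
};

/* The 2.6.28 split: weights come from swappiness alone. */
static struct scan_weights legacy_weights(unsigned int swappiness)
{
	struct scan_weights w;

	assert(swappiness <= 200);
	w.ap = swappiness;
	w.fp = 200 - swappiness;
	return w;
}

/* The current arithmetic as quoted above: swappiness scaled by the
 * inverse of each list's share of the observed reclaim cost. */
static struct scan_weights current_weights(unsigned int swappiness,
					   uint64_t sc_anon_cost,
					   uint64_t sc_file_cost)
{
	uint64_t total_cost, anon_cost, file_cost;
	struct scan_weights w;

	assert(swappiness <= 200);
	total_cost = sc_anon_cost + sc_file_cost;
	anon_cost = total_cost + sc_anon_cost;
	file_cost = total_cost + sc_file_cost;
	total_cost = anon_cost + file_cost;

	w.ap = swappiness * (total_cost + 1);
	w.ap /= anon_cost + 1;

	w.fp = (200 - swappiness) * (total_cost + 1);
	w.fp /= file_cost + 1;
	return w;
}
```

With no observed cost on either list, current_weights(s, 0, 0) returns
exactly the legacy split. With all of the observed cost on the anon
side, current_weights(100, 1000, 0) gives ap=149 and fp=299, so anon
still receives roughly a third of the pressure, as the comment
describes.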

There are a few scenarios where swappiness doesn't apply:

- No swap. Oh well, that seems reasonable.

- Priority=0. This applies to near-OOM situations where the MM system
  tries to save itself. This isn't a range in which proactive
  reclaimers (should) operate.

- sc->file_is_tiny. This doesn't apply to cgroup reclaim, and thus not
  to proactive reclaim.

- sc->cache_trim_mode. This implements clean cache dropbehind, and
  applies in the presence of large, non-refaulting inactive cache. The
  assumption there is that this data is reclaimable without involving
  IO to evict, and without the expectation of refault IO in the
  future. Without IO involvement, the relative IO cost isn't a
  factor. This will back off when refaults are observed, and the IO
  cost setting is then taken into account again as expected.

  If you consider swappiness to mean "reclaim what I ask you to", then
  this would override that, yes. But under the definition of relative
  IO cost, this decision making is permissible.

  Note that this applies to the global swappiness setting as well, and
  nobody has complained about it.
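
The four exceptions above can be read as an ordered set of early exits
before swappiness is consulted. The following is a simplified sketch of
that reading, not the kernel's get_scan_count(). The struct, the enum,
and pick_balance() are hypothetical, and the outcome chosen in each
branch is an assumption for illustration: the list above only says that
swappiness is not applied there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the anon/file balancing decision. */
enum balance_mode {
	BALANCE_FILE_ONLY,	/* assumed: scan only file pages */
	BALANCE_EQUAL,		/* assumed: scan both lists equally */
	BALANCE_ANON_ONLY,	/* assumed: scan only anon pages */
	BALANCE_BY_SWAPPINESS,	/* swappiness-weighted split applies */
};

/* Hypothetical snapshot of the inputs named in the list above. */
struct reclaim_state {
	bool swap_available;
	int priority;		/* 0 means near-OOM */
	bool cgroup_reclaim;	/* true for memory.reclaim requests */
	bool file_is_tiny;
	bool cache_trim_mode;
};

static enum balance_mode pick_balance(const struct reclaim_state *rs)
{
	assert(rs != NULL);

	/* No swap: anon cannot be paged out, so there is no balance. */
	if (!rs->swap_available)
		return BALANCE_FILE_ONLY;

	/* Near-OOM: the MM is trying to save itself. */
	if (rs->priority == 0)
		return BALANCE_EQUAL;

	/* Only relevant outside cgroup (and thus proactive) reclaim. */
	if (rs->file_is_tiny && !rs->cgroup_reclaim)
		return BALANCE_ANON_ONLY;

	/* Clean cache dropbehind: no IO involved, so IO cost is moot. */
	if (rs->cache_trim_mode)
		return BALANCE_FILE_ONLY;

	return BALANCE_BY_SWAPPINESS;
}
```

For a proactive reclaimer running against a cgroup with swap available
and a nonzero priority, the only branch in this model that can bypass
swappiness is cache_trim_mode.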

So I wouldn't say it's merely a reclaim hint. It controls a very
concrete and influential factor in VM decision making. And since the
global swappiness is long-established ABI, I don't expect its meaning
to change significantly any time soon.


