From: Fengguang Wu <fengguang.wu@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <andrea@suse.de>,
linux-mm@kvack.org, Nick Piggin <npiggin@suse.de>
Subject: Re: make swappiness safer to use
Date: Tue, 7 Aug 2007 13:00:33 +0800
Message-ID: <386462833.05717@ustc.edu.cn>
Message-ID: <20070807050032.GA16179@mail.ustc.edu.cn>
In-Reply-To: <20070806112154.f8c5bcdc.akpm@linux-foundation.org>
On Mon, Aug 06, 2007 at 11:21:54AM -0700, Andrew Morton wrote:
> On Wed, 1 Aug 2007 04:33:15 +0200 Andrea Arcangeli <andrea@suse.de> wrote:
>
> > On Wed, Aug 01, 2007 at 09:32:08AM +0800, Fengguang Wu wrote:
> > > Here's the updated patch without underflows.
> >
> > this is ok.
>
> I lost the plot a bit here. Can I please have a resend of the full and
> final patch?
OK, here it is.
===
From: Andrea Arcangeli <andrea@suse.de>
Subject: make swappiness safer to use
Swappiness isn't a safe sysctl. Setting it to 0, for example, can hang a
system. That's a corner case, but even setting it to 10 or lower can
waste enormous amounts of CPU without making much progress. We have
customers who want to use swappiness but can't because of the
current implementation: if you change it so the system stops swapping,
it really stops swapping, and nothing works sanely anymore if you really
had to swap something to make progress.
This patch from Kurt Garloff makes swappiness safer to use (no more
huge cpu usage or hangs with low swappiness values).
I think prev_priority can also be nuked, since it wastes 4 bytes
per zone (that would be an incremental patch, but I'm waiting for
nr_scan_[in]active to be nuked first for similar reasons). Clearly
somebody at some point noticed how broken that thing was and had
to add min(priority, prev_priority) to give it some reliability, but
they didn't go the last mile and nuke prev_priority too. Calculating
distress only as a function of the non-racy priority is correct and surely
more than enough, without having to add randomness into the equation.
The patch has been tested on older kernels; on this tree it compiles,
and it's quite simple, so...
Overall I'm not very satisfied with the swappiness tweak, since it
doesn't really do anything about the dirty pagecache that may be
inactive. We need another kind of tweak that controls the inactive
scan and tunes the can_writepage feature (not yet in mainline despite
having been submitted a few times), not only the active one. That new
tweak would tell the kernel how hard to scan the inactive list for pure
clean pagecache (something the mainline kernel isn't capable of
yet). We already have that feature working, with a reasonable default
tuning, in all our enterprise kernels; without it they can't even run a
read-only backup with tar without triggering huge write I/O. I think it
should also become available in mainline later.
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Kurt Garloff <garloff@suse.de>
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
mm/vmscan.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
--- linux-2.6.22-rc6-mm1.orig/mm/vmscan.c
+++ linux-2.6.22-rc6-mm1/mm/vmscan.c
@@ -887,6 +887,7 @@ static void shrink_active_list(unsigned
 		long mapped_ratio;
 		long distress;
 		long swap_tendency;
+		long imbalance;
 
 		if (zone_is_near_oom(zone))
 			goto force_reclaim_mapped;
@@ -922,6 +923,46 @@ static void shrink_active_list(unsigned
 		swap_tendency = mapped_ratio / 2 + distress + sc->swappiness;
 
 		/*
+		 * If there's huge imbalance between active and inactive
+		 * (think active 100 times larger than inactive) we should
+		 * become more permissive, or the system will take too much
+		 * cpu before it start swapping during memory pressure.
+		 * Distress is about avoiding early-oom, this is about
+		 * making swappiness graceful despite setting it to low
+		 * values.
+		 *
+		 * Avoid div by zero with nr_inactive+1, and max resulting
+		 * value is vm_total_pages.
+		 */
+		imbalance = zone_page_state(zone, NR_ACTIVE);
+		imbalance /= zone_page_state(zone, NR_INACTIVE) + 1;
+
+		/*
+		 * Reduce the effect of imbalance if swappiness is low,
+		 * this means for a swappiness very low, the imbalance
+		 * must be much higher than 100 for this logic to make
+		 * the difference.
+		 *
+		 * Max temporary value is vm_total_pages*100.
+		 */
+		imbalance *= (vm_swappiness + 1);
+		imbalance /= 100;
+
+		/*
+		 * If not much of the ram is mapped, makes the imbalance
+		 * less relevant, it's high priority we refill the inactive
+		 * list with mapped pages only in presence of high ratio of
+		 * mapped pages.
+		 *
+		 * Max temporary value is vm_total_pages*100.
+		 */
+		imbalance *= mapped_ratio;
+		imbalance /= 100;
+
+		/* apply imbalance feedback to swap_tendency */
+		swap_tendency += imbalance;
+
+		/*
 		 * Now use this metric to decide whether to start moving mapped
 		 * memory onto the inactive list.
 		 */