[RFC PATCH] mm, oom: oom ratelimit auto tuning

From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org, mhocko@kernel.org
Cc: linux-mm@kvack.org, Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH] mm, oom: oom ratelimit auto tuning
Date: Sat, 11 Apr 2020 05:36:14 -0400
Message-ID: <1586597774-6831-1-git-send-email-laoar.shao@gmail.com>

Recently we found an issue where the server becomes almost unresponsive
for several minutes when an OOM happens. It is caused by a slow serial
console set with "console=ttyS1,19200". Because this serial line is so
slow, it takes almost 10 seconds to print a full OOM message to it.
Meanwhile all tasks allocating pages are blocked, as almost no pages
can be reclaimed. During that time the memory pressure stays around 90
for a long time. If we don't print the OOM messages to this serial
console, a full OOM message takes less than 1ms and the memory pressure
is less than 40.
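
As a rough cross-check of the 10-second figure, the sketch below converts
a baud rate into a drain time. It assumes 8N1 framing (10 bits on the wire
per byte), which the report does not state, and the helper names are
hypothetical.

```c
#include <assert.h>

/*
 * Bytes per second on a serial line, assuming 8N1 framing:
 * 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
 * The framing is an assumption; the report only gives the baud rate.
 */
static unsigned long serial_bytes_per_sec(unsigned long baud)
{
	return baud / 10;
}

/* Milliseconds needed to push nbytes through the line. */
static unsigned long serial_drain_ms(unsigned long baud, unsigned long nbytes)
{
	assert(baud >= 10);	/* avoid dividing by zero below */
	return nbytes * 1000 / serial_bytes_per_sec(baud);
}
```

Under that assumption a 19200-baud line carries 1920 bytes per second, so
serial_drain_ms(19200, 19200) is 10000: an OOM report of roughly 19 KB
would account for a 10-second stall.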

We can avoid printing OOM messages to the slow serial console by
adjusting /proc/sys/kernel/printk, but then no message at KERN_WARNING
level can be printed to it either, which may lose some useful messages
when we want to collect messages from the console for debugging
purposes.

So it is better to decrease the ratelimit. We could introduce some
sysctl knobs similar to printk_ratelimit and printk_ratelimit_burst, but
that would burden the admin. Letting the kernel automatically adjust the
ratelimit is a better choice.

The OOM ratelimit starts at a slow rate. It increases slowly if the
console is fast and decreases rapidly if the console is slow.
oom_rs.burst stays in [1, 10], and the fast path never lowers
oom_rs.interval below 5 * HZ.
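
The adjustment rule can be tried outside the kernel with a small
user-space sketch. It mirrors the branch structure of the hunk in
mm/oom_kill.c; struct oom_rl, oom_rl_tune() and the HZ value of 1000 are
assumptions made for illustration, not kernel definitions.

```c
#include <assert.h>

#define HZ 1000	/* assumed tick rate for this sketch; the kernel's HZ is configurable */

/* Hypothetical stand-in for the two ratelimit_state fields the patch tunes. */
struct oom_rl {
	int interval;	/* in jiffies */
	int burst;
};

/* delta is the time, in jiffies, that printing one OOM report took. */
static void oom_rl_tune(struct oom_rl *rs, int delta)
{
	assert(rs);

	if (delta < rs->interval / 10) {
		/* Fast console: shorten the interval, allow a larger burst. */
		if (rs->interval >= 10 * HZ)
			rs->interval /= 2;
		else if (rs->interval > 6 * HZ)
			rs->interval -= HZ;

		if (rs->burst < 10)
			rs->burst += 1;
	} else if (rs->burst > 1) {
		/* Slow console: back off to one report per 4 * delta. */
		rs->burst = 1;
		rs->interval = 4 * delta;
	}
}
```

Starting from { 20 * HZ, 1 }, three fast prints leave the interval at
10 * HZ, 5 * HZ and 5 * HZ while burst rises to 4. A following print that
takes 10 * HZ resets burst to 1 and sets the interval to 40 * HZ. Note
that a slow print while burst is still 1 changes nothing.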

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/oom_kill.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dfc357614e56..23dba8ccf313 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -954,8 +954,10 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 {
 	struct task_struct *victim = oc->chosen;
 	struct mem_cgroup *oom_group;
-	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
-					      DEFAULT_RATELIMIT_BURST);
+	static DEFINE_RATELIMIT_STATE(oom_rs, 20 * HZ, 1);
+	int delta;
+	unsigned long start;
+	unsigned long end;

 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -972,8 +974,51 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	}
 	task_unlock(victim);

-	if (__ratelimit(&oom_rs))
+	if (__ratelimit(&oom_rs)) {
+		start = jiffies;
 		dump_header(oc, victim);
+		end = jiffies;
+		delta = end - start;
+
+		/*
+		 * The OOM messages may be printed to a serial console with
+		 * very low speed, e.g. console=ttyS1,19200. It takes a long
+		 * time to print these OOM messages to this console, and
+		 * meanwhile processes allocating pages are all blocked
+		 * because pages can hardly be reclaimed. That causes high
+		 * memory pressure and the system may be unresponsive for a
+		 * long time.
+		 * In this case, we should decrease the OOM ratelimit or
+		 * avoid printing OOM messages to the slow console. But if
+		 * we avoid printing OOM messages to the slow console, no
+		 * message at KERN_WARNING level can be printed to it
+		 * either, which may lose some useful messages when we
+		 * want to collect messages from the console for debugging
+		 * purposes. So it is better to decrease the ratelimit. We
+		 * could introduce some sysctl knobs similar to
+		 * printk_ratelimit and burst, but that would burden the
+		 * admin. Letting the kernel automatically adjust the
+		 * ratelimit is a better choice.
+		 * The algorithm below decreases the OOM ratelimit
+		 * rapidly if the console is slow and increases the OOM
+		 * ratelimit slowly if the console is fast. oom_rs.burst
+		 * stays in [1, 10] and the fast path never lowers
+		 * oom_rs.interval below 5 * HZ.
+		 */
+		if (delta < oom_rs.interval / 10) {
+			if (oom_rs.interval >= 10 * HZ)
+				oom_rs.interval /= 2;
+			else if (oom_rs.interval > 6 * HZ)
+				oom_rs.interval -= HZ;
+
+			if (oom_rs.burst < 10)
+				oom_rs.burst += 1;
+		} else if (oom_rs.burst > 1) {
+			oom_rs.burst = 1;
+			oom_rs.interval = 4 * delta;
+		}
+
+	}

 	/*
 	 * Do we need to kill the entire memory cgroup?
-- 
2.18.2
